All things Radeon VII / Vega 20

Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,722,415,759
RAC: 1,866,174

Thanks mmonnin. Hoping I don't have a bigger issue here.

After updating to the latest AMD driver and discovering the multi-card issue, I uninstalled that version (19.4.1) via Add/Remove Programs, restarted the PC, then tried to reinstall the previous driver (19.1.1). However, after 6 hrs it's stuck at "please wait while detecting hardware".

I then ran the AMD driver removal tool, unplugged my GPU, plugged it back in again and started over. Still stuck looking for the hardware.

I do see in GPU-Z that it says - Driver Version 10.0.17763.1/Win 10 64. 

Would that be the issue you're referring to above?

 


Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,722,415,759
RAC: 1,866,174

So I just disconnected from the internet, ran AMD driver removal tool in safe mode, restarted my pc and am trying to install 19.1.1 again.

GPU-Z still shows the Microsoft basic display adapter though.

I'm confounded that this simple thing has resulted in my hardware not being found. 

Surely a driver can't soft brick a GPU?

Windows Device Manager under Display adapters only shows the Microsoft adapter, which is surprising after physically removing and reinstalling the GPU.

Still trying to detect hardware.....

I can feel a full blown wipe of C: drive coming on :( May have to test another GPU first though.


Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,722,415,759
RAC: 1,866,174

FWIW - I just tried 2 different system restore points and neither worked :(

Something is seriously wrong.

*sigh*… BOINC & PCs were made to test us!


archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,229,348,214
RAC: 1,147,915

Chooka,

I once had a case of an Nvidia card not being observed when I added it as a second graphics card to a host which had been running OK.  After trying everything I could think of, I returned it and got another card--same problem, so not likely a hardware defect.  Although I had been running DDU on some of my driver installs (and had tried more than one driver version), I had not been ticking the "clean install" box.  The first time I tried that, suddenly all was well.

OK, that was over in the Nvidia world, and you are in the AMD world, which is red-shifted in many ways.  I confess I have no idea at all what specifically has happened to you, but I would not be surprised if it turns out that somewhere on your system there is a residual something or other not yet cleaned up by the things you have tried.

I wish I had a useful suggestion, but I don't.

 

 

Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,722,415,759
RAC: 1,866,174

Well...the only option I see is to format C:

Anyone else update to the latest Radeon driver? Had any issues?


solling2
Joined: 20 Nov 14
Posts: 219
Credit: 1,577,651,302
RAC: 20,150

Chooka wrote:
I do see in GPU-Z that it says - Driver Version 10.0.17763.1/Win 10 64. 

What about giving the official driver rollback path a chance: in Device Manager, select your GPU, then right-click and uninstall? Then update W10 to the current build, then do a new Adrenalin install as archae86 said.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3,424,106,540
RAC: 3,782,740

I've never checked GPU-Z to see what it says before any GPU driver installation. I'd guess there would have to be some kind of default driver to get a display output? Linux has nouveau. There wouldn't be a checkbox for either OpenCL or CUDA with just a default driver, but with a Win10 driver I would expect to see something else.

Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,722,415,759
RAC: 1,866,174

Well I just bit the bullet and wiped C: drive :( Only took a couple of hours to bring everything back up again. The Windows updates take the longest.

Whatever the issue was, it's gone now. Back up and running!

As I said, my system restores didn't work. DDU didn't work (even in safe mode). Quite the mystery. I am now using the latest driver btw - 19.4.1

 

 


Peter van Kalleveen
Joined: 15 Jan 19
Posts: 45
Credit: 250,329,645
RAC: 0

Hey guys,

Just wanted to share my VII experiences, because in raw FLOPS it's a heavy-duty card, but in my case not without its setup issues.

I’m running on a Threadripper 2950X platform.

Out of the box it was horribly unstable: multiple crashes in 24-hour runs, situations where I needed to reinstall the driver set after a crash, and slowdowns over time from 20 min/task to 8 hours/task.

It took a lot of tweaking. The last time I thought it was stable I wanted to post here, but then it crashed while I was typing the post. Ironic.

I found that giving it too many tasks at the same time is a big source of trouble. At the moment I'm running only 2 FGRP tasks at once, with around 90% compute utilization. It takes roughly 7:55 min to complete a task, which results in about 15 completed FGRP tasks an hour.
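For anyone who wants to check that tasks-per-hour figure or plug in their own numbers, here is a minimal sketch in Python. It assumes every concurrent task takes the same elapsed time and that the GPU is never idle between tasks; the function name `tasks_per_hour` is just an illustrative choice, not anything from BOINC.

```python
def tasks_per_hour(concurrent_tasks: int, minutes: int, seconds: int) -> float:
    """Completed tasks per hour when `concurrent_tasks` run side by side
    and each one takes minutes:seconds of elapsed time."""
    elapsed_hours = (minutes * 60 + seconds) / 3600
    return concurrent_tasks / elapsed_hours


# 2 FGRP tasks at once, roughly 7:55 elapsed each
rate = tasks_per_hour(2, 7, 55)
print(f"{rate:.1f} tasks/hour")
```

With 2 tasks at 7:55 each this comes out a little above 15 per hour, which matches the figure quoted in the post.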

I have one 'CPU' per WU.

Running 3 tasks might be possible for a long time, but 4 will result in a crash sooner or later.

This seems independent of clock, heat and voltage, because I tried it with a severe downclock on the card and that seemed to have no influence on stability with 4 tasks.

Also, turning off all virtualization in the BIOS and OS (no VirtualBox) seems to improve throughput and efficiency per thread.

In the end I’m running the card at 1699 MHz at around 977 mV to bring the heat, noise and power consumption down to an acceptable level, since the computer sits in the same room where we watch TV and the girlfriend sits and complains about the running computer.

As I see from other people's results here, there is a bit more to squeeze out of it, but to be fair I'm just very glad it's rock solid and stable now.

Temp is 67 °C with Tjunc at 88 °C, ambient around 22 °C, and power limit -20%.

archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,229,348,214
RAC: 1,147,915

Peter van Kalleveen wrote:
Just wanted to share my VII experiences

Thanks for that--more reported experience helps everybody.

Quote:
at the moment i'm running only 2 FGRP task's at once

As it happens, just today, inspired by positive 3X results on a machine running an RX 570, I had a try at 3X on my Radeon VII machines.  I have been running 2X since very near the beginning.  I'm just running Gamma-Ray Pulsar work.

The good news: a clear productivity improvement if it worked consistently, as the elapsed times per task at 2X run about 6:10, while at 3X they run about 8:45 (breakeven would be 9:15).

The bad news: about 10% of the tasks I have run at 3X have terminated early, reporting Computation error (65,).  The early termination has varied from Binary Point 83 up to Binary point 1543.

The tail end of stderr.txt for a representative example reads:

% Starting semicoherent search over f0 and f1.
% nf1dots: 41  df1dot: 2.512676418e-015  f1dot_start: -1e-013  f1dot_band: 1e-013
% Filling array of photon pairs
Error in computing index of fft input array, i:1121821346 pair:281376
ERROR: prepare_ts_2_phase_diff_sorted() returned with error 18934968
09:51:52 (34116): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags:  PRECISION
09:52:04 (34116): [normal]: done. calling boinc_finish(65).
09:52:04 (34116): called boinc_finish

</stderr_txt>
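The 2X versus 3X comparison above can be worked through with a rough Python sketch. It uses only the elapsed times and the roughly 10% error rate quoted in the post. The helper names are hypothetical, and the failure-adjusted figure rests on a pessimistic assumption that is not from the original post: a failed task is treated as occupying its slot for as long as a successful one. Since the failures actually ended early, the real figure probably lies somewhere between the adjusted and unadjusted 3X numbers.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' elapsed time string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds: float) -> str:
    """Format a number of seconds as 'M:SS'."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"


def breakeven_elapsed(base_elapsed: str, base_n: int, new_n: int) -> str:
    """Elapsed time per task at `new_n` concurrent tasks that would give
    the same throughput as `base_n` tasks at `base_elapsed`."""
    return to_mmss(to_seconds(base_elapsed) * new_n / base_n)


def valid_tasks_per_hour(n: int, elapsed: str, failure_rate: float = 0.0) -> float:
    """Valid results per hour; assumes (pessimistically) that a failed
    task costs as much slot time as a successful one."""
    return n * 3600 / to_seconds(elapsed) * (1 - failure_rate)


print("breakeven at 3X:", breakeven_elapsed("6:10", 2, 3))
print(f"2X:               {valid_tasks_per_hour(2, '6:10'):.1f} tasks/hour")
print(f"3X, no failures:  {valid_tasks_per_hour(3, '8:45'):.1f} tasks/hour")
print(f"3X, 10% failures: {valid_tasks_per_hour(3, '8:45', 0.10):.1f} tasks/hour")
```

Under that pessimistic assumption, a 10% error rate would more than cancel the gain from going to 3X, so the error rate matters as much as the elapsed times.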
