Radeon Vega

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

koschi wrote:
So Einstein became the GPU project of this year's Pentathlon. Vega 56 ordered and arriving tomorrow, exciting!

I hope it's going to serve you as well as mine has served me!

koschi
Joined: 17 Mar 05
Posts: 86
Credit: 1687727555
RAC: 819433

It does!

2 WUs in 9:36min, 1 in 5:35min.

However, it runs next to an RX580 and steals all the work from it (not the Vega's fault, though).

I run Ubuntu 18.04 with AMDGPU-PRO 19.10 (legacy and PAL OpenCL installed), BOINC client 7.14.2 & 7.15.0.

Both cards are recognised by BOINC: 

Wed 08 May 2019 20:20:27 CEST | | OpenCL: AMD/ATI GPU 0: Radeon RX Vega (driver version 2841.4 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (2841.4), 8176MB, 8176MB available, 11397 GFLOPS peak)
Wed 08 May 2019 20:20:27 CEST | | OpenCL: AMD/ATI GPU 1: Radeon RX 580 Series (driver version 2841.4, device version OpenCL 1.2 AMD-APP (2841.4), 7295MB, 7295MB available, 5161 GFLOPS peak)
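
For anyone who wants to compare what BOINC detected on each card, here is a minimal Python sketch that pulls the device index, name, driver string and peak GFLOPS out of startup lines like the two above. The regular expression is inferred only from those two lines, so the field layout is an assumption and may differ for other client versions or vendors; `parse_gpu_lines` is a hypothetical helper name.

```python
import re

# Pattern inferred from the two BOINC startup lines quoted in this post.
# Assumption: other BOINC versions may format these lines differently.
GPU_LINE = re.compile(
    r"OpenCL: AMD/ATI GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>[^,]+?)(?: \((?P<backend>[^)]*)\))?, "
    r".*?(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpu_lines(log_text):
    """Return one dict per OpenCL GPU line found in the log text."""
    gpus = []
    for m in GPU_LINE.finditer(log_text):
        gpus.append({
            "index": int(m.group("index")),
            "name": m.group("name"),
            "driver": m.group("driver"),
            # The bracketed tag after the driver version, e.g. "PAL,HSAIL";
            # None when the line carries no such tag (the RX 580 line here).
            "backend": m.group("backend"),
            "gflops": int(m.group("gflops")),
        })
    return gpus
```

In the log above, only the Vega line carries the `(PAL,HSAIL)` tag, so a `backend` of `None` for the RX 580 is a quick way to see that the two cards are served by different OpenCL implementations.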

 

<use_all_gpus>1</use_all_gpus> is set and acknowledged by BOINC:

Wed 08 May 2019 20:20:28 CEST | | Config: use all coprocessors

Regardless of how many WUs I run in parallel (tested 1 and 2), they all end up on the Vega. The RX580 shows no load and no increase in temperature.

With ngpus 1.0 the BOINC client sends one WU to each GPU; in the manager this is shown in the status column as (device 0) and (device 1). The FGRPB1G app is called correctly, once with --device 0 and once with --device 1:

root 28013 11934 14 23:13 pts/2 00:01:03 ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile LATeah1049X.dat --alpha 1.41058464281 --delta -0.444366280137 --skyRadius 5.090540e-07 --ldiBins 30 --f0start 180.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 2.512676418e-15 --ephemdir JPLEPH.405 --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile templates_LATeah1049X_0188_2669947.dat --debug 1 --debugCommandLineMangling --device 1

root 28592 11934 57 23:20 pts/2 00:00:05 ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile LATeah1049X.dat --alpha 1.41058464281 --delta -0.444366280137 --skyRadius 5.090540e-07 --ldiBins 30 --f0start 180.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 2.512676418e-15 --ephemdir JPLEPH.405 --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile templates_LATeah1049X_0188_2793903.dat --debug 1 --debugCommandLineMangling --device 0
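
To check the device assignment without reading the long command lines by eye, a small Python sketch like the following could extract the `--device` value from each running FGRPB1G process. The helper names are hypothetical, and the `ps -eo pid=,args=` call assumes a Linux system with procps; only the `--device N` convention is taken from the process listing above.

```python
import shlex
import subprocess


def device_from_cmdline(cmdline):
    """Return the integer following --device, or None if the flag is absent."""
    args = shlex.split(cmdline)
    for i, arg in enumerate(args[:-1]):
        if arg == "--device":
            return int(args[i + 1])
    return None


def running_fgrp_devices(marker="hsgamma_FGRPB1G"):
    """Map PID -> requested device for processes whose command line contains marker.

    Assumption: `ps -eo pid=,args=` is available (Linux procps).
    """
    out = subprocess.run(
        ["ps", "-eo", "pid=,args="],
        capture_output=True, text=True, check=True,
    ).stdout
    devices = {}
    for line in out.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and marker in parts[1]:
            devices[int(parts[0])] = device_from_cmdline(parts[1])
    return devices
```

This only shows which device BOINC asked for; it says nothing about where the OpenCL runtime actually placed the work.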

 

However, lm-sensors, amdgpu-utils and the WU runtimes indicate that both WUs are running on the Vega, while the RX580 remains idle.
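
One more way to cross-check the load per card is to read the amdgpu driver's `gpu_busy_percent` file in sysfs. This is a minimal Python sketch, assuming a kernel whose amdgpu driver exposes that file under `/sys/class/drm/cardN/device/`; `gpu_busy` is a hypothetical helper, and the sysfs card numbering does not necessarily match the OpenCL device index BOINC reports.

```python
import re
from pathlib import Path


def gpu_busy(sysfs_root="/sys/class/drm"):
    """Map card name (e.g. 'card0') to its current busy percentage.

    Cards without a readable gpu_busy_percent file are skipped.
    """
    result = {}
    pattern = "card[0-9]*/device/gpu_busy_percent"
    for busy_file in sorted(Path(sysfs_root).glob(pattern)):
        card = busy_file.parent.parent.name
        # Skip connector entries such as 'card0-DP-1'.
        if not re.fullmatch(r"card\d+", card):
            continue
        try:
            result[card] = int(busy_file.read_text().strip())
        except (OSError, ValueError):
            continue
    return result
```

With two tasks running, a result in which one card sits near 100 and the other near 0 would match the symptom described here.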

 

Quite a strange problem. I'm not sure at what level this goes wrong. Most likely not BOINC, since it was sending WUs to devices 0 and 1, as shown by the manager and the FGRPB1G processes. Is it the Einstein executable that ignores the device parameter (and runs everything on device 0), or something in OpenCL that schedules these tasks onto the more powerful card?

 

I'm a bit out of ideas...

cecht
Joined: 7 Mar 18
Posts: 1533
Credit: 2900482224
RAC: 2187725

Koschi wrote:
However, it runs next to an RX580 and steals all the work from it (not the Vega's fault, though).

Is the RX 580 getting enough power? Perhaps try using amdgpu-utils (amdgpu-pac --execute) to state mask the Vega, so it draws less system power while crunching, and see if the 580 then goes to work.  Maybe even state mask both cards to really drop their combined power needs for this test. I'd recommend a state mask of 0,4 for both just to see if it's a power issue (don't forget to suspend BOINC Mgr before applying the masks). For this test, I think masking might work better than power capping.

 

Ideas are not fixed, nor should they be; we live in model-dependent reality.

koschi
Joined: 17 Mar 05
Posts: 86
Credit: 1687727555
RAC: 819433

My system is powered by a BeQuiet Straight Power 11 650W (93% / Gold). The base system (undervolted R7 1700) draws 120W under load and the RX580 with mining BIOS draws 82W doing FGRP, so there should be plenty of room for the V56 (Sapphire Pulse), which has a default PL of 180W.
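
The power budget above can be added up in a few lines of Python. This is only a rough sketch using the figures from this post; it assumes all three numbers are steady-state draws on the same side of the PSU and ignores conversion losses and short transient spikes, which Vega cards are known for. `power_headroom` is a hypothetical helper name.

```python
def power_headroom(psu_watts, loads):
    """Return (total load, remaining headroom) in watts."""
    total = sum(loads.values())
    return total, psu_watts - total


# Figures taken from the post above.
loads = {
    "base system (undervolted R7 1700)": 120,
    "RX580 (mining BIOS, FGRP)": 82,
    "Vega 56 (default power limit)": 180,
}
total, headroom = power_headroom(650, loads)
print(f"total {total} W, headroom {headroom} W")
```

On these numbers the combined load is 382W, leaving 268W of nominal headroom on the 650W unit.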

 

I did set the SCLK mask to 0,4 (and 950mV on the Vega) for both cards. It drops the power consumption on the V56 from 180W to 140W, but doesn't get the RX580 crunching. The RX580 is the primary card in this setup and renders my desktop etc. very well, so it's not entirely disabled, just not working on Einstein tasks now.

koschi
Joined: 17 Mar 05
Posts: 86
Credit: 1687727555
RAC: 819433

I fired up an Ubuntu 19.04 installation with the AMDGPU-PRO 18.50 OpenCL, kernel 5.0.0 and BOINC 7.14.2.

Same issue: both tasks are computed on the Vega, and the Polaris doesn't get any work.

koschi
Joined: 17 Mar 05
Posts: 86
Credit: 1687727555
RAC: 819433

Long story short, the problem described above seems to result from the AMDGPU-PRO OpenCL not being able to handle the legacy (Polaris) and PAL (Vega) implementations at once, and instead scheduling everything onto the Vega.

Replacing the official OpenCL with AMD ROCm did the trick, though it resulted in slightly slower computing. The RX580 made up for it though.

Now that the RX580 is removed again, I went back to OpenCL PAL from 19.10 on my Ubuntu 19.04. WU completion times decreased again. At 940MHz HBM2, 160W and 3 dedicated threads for Einstein, I am able to complete 2 WUs in around 9:45min, which results in a theoretical RAC of over 1 million credits per day. Tuning isn't final yet; I am aiming at <10min with 140W, let's see whether that works out. The Vega needs to be supported by free CPU capacity, so right now I keep 2 x 1.5 cores free, as even with 2 x 1 core the runtime increases by 15 sec.
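
The "over 1 million credits per day" estimate can be reproduced with a short Python calculation. The credit per task is not stated in this thread; the value of 3465 used below is an assumption about FGRPB1G tasks, and `credits_per_day` is a hypothetical helper. The result is a theoretical ceiling that ignores downtime, invalid results and the lag in RAC averaging.

```python
SECONDS_PER_DAY = 86400
# Assumption: credit granted per FGRPB1G task; not stated in the thread.
CREDITS_PER_TASK = 3465


def credits_per_day(tasks_in_parallel, minutes, seconds,
                    credits_per_task=CREDITS_PER_TASK):
    """Theoretical daily credit when a batch of parallel tasks takes minutes:seconds."""
    batch_seconds = minutes * 60 + seconds
    batches_per_day = SECONDS_PER_DAY / batch_seconds
    return batches_per_day * tasks_in_parallel * credits_per_task


# 2 WUs in around 9:45 min, as reported above.
print(round(credits_per_day(2, 9, 45)))
```

Under that assumption, 2 WUs per 9:45 min works out to roughly 295 tasks per day, which is where the figure of just over one million comes from.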

My Sapphire Pulse Vega 56 now costs just 275€ in Germany, which puts it in a really nice sweet spot. IMHO it offers more throughput and better efficiency (at 160W) than Polaris cards at around the same price (1 Vega <-> 2 Polaris). Looking in the other direction, two of them have more throughput than a Radeon VII for a lower price. However, they will consume more energy and PCIe slots.

QuantumHelos
Joined: 5 Nov 17
Posts: 190
Credit: 64239858
RAC: 0

koschi wrote:

Long story short, the problem described above seems to result from the AMDGPU-PRO OpenCL not being able to handle the legacy (Polaris) and PAL (Vega) implementations at once, and instead scheduling everything onto the Vega.

Replacing the official OpenCL with AMD ROCm did the trick, though it resulted in slightly slower computing. The RX580 made up for it though.

 

So ROCm does the trick! ROCm is being improved at AMD thanks to projects like:

https://www.amd.com/system/files/documents/lawrence-livermore-national-laboratory-case-study.pdf

https://www.amd.com/en/case-studies/lawrence-livermore-national-laboratory

and Frontier... (note the ROCm updates for Cray systems)

VinodK
Joined: 31 Jan 17
Posts: 15
Credit: 246751087
RAC: 0

I am getting a Vega 64 card tomorrow. I am thinking about undervolting as well. The current approach seems to be "reduce voltage -> run some WUs -> reduce more if there are no invalids, and repeat". I wonder if there is a better way than this. All the online instructions are gaming-focused.
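
The "reduce voltage -> run some WUs -> check for invalids -> repeat" procedure can be written down as a simple loop. This Python sketch only illustrates the search logic; `run_batch` is a hypothetical callback that would apply a voltage, crunch some WUs and report how many results came back invalid or errored, and the millivolt values in the usage note are made up.

```python
def find_lowest_stable_voltage(start_mv, floor_mv, step_mv, run_batch):
    """Step the voltage down until a batch produces invalid results.

    run_batch(mv) is a hypothetical callback: it applies the voltage, runs
    some WUs and returns the number of invalid or errored results.
    Returns the lowest voltage that produced a clean batch, or None if even
    the starting voltage failed.
    """
    stable = None
    mv = start_mv
    while mv >= floor_mv:
        if run_batch(mv) > 0:
            break  # first failing setting: keep the previous one
        stable = mv
        mv -= step_mv
    return stable
```

For example, with a stand-in callback that reports errors below 950mV, `find_lowest_stable_voltage(1200, 800, 25, lambda mv: 0 if mv >= 950 else 1)` returns 950. In practice each batch would need enough validated WUs to be meaningful, since validation on the project side can lag by hours or days.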

solling2
Joined: 20 Nov 14
Posts: 219
Credit: 1577514649
RAC: 19784

VinodK wrote:
I am getting a Vega 64 card tomorrow. I am thinking about undervolting as well. The current approach seems to be "reduce voltage -> run some WUs -> reduce more if there are no invalids, and repeat". I wonder if there is a better way than this. All the online instructions are gaming-focused.

If you go through previous posts in this forum, you'll notice different approaches. They all have their justification. It just depends on the cruncher's goal: highest throughput, lowest power draw, or best bang for the buck? Some are happy with just lowering the power limit. I prefer to reduce the voltage and raise the memory clock. Also check how many tasks you can best run at the same time. Please report how your efforts go. :-)

VinodK
Joined: 31 Jan 17
Posts: 15
Credit: 246751087
RAC: 0

I got the card and was having a terrible time with system crashes. It was crashing the system every few hours. I have a decent power supply, an EVGA 850W Gold, but it looks like that is not enough. I am testing with the power limit set to -50%; no crashes so far. The performance loss doesn't seem to be too large.
