Einstein@Home GPU/APU Application for AMD/ATI Graphics Cards: discussion thread

.clair.
Joined: 20 Nov 06
Posts: 62
Credit: 1051176770
RAC: 0

One thing I have come to accept is that you cannot overclock a GPU anywhere near as much for crunching as you can for games.
A pixel out of place in a game can be ignored.
A pixel's worth of data out of place in a work unit is a terminal error.
And the auto-overclock apps that come with a graphics card do not allow for this by a long way.
Does anyone disagree?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 734172508
RAC: 1292785

Quote:


I plan to make some evaluation of runtime by GPU type based on the statistics in the E@H database next week, after that there should be some overview which type of cards should better not be used atm.

Cheers
HBE

So I did this today. There are a few uncertainties involved in the analysis, e.g. it's sometimes not clear how many units were run in parallel, and in most cases BOINC only reports the GPU type (e.g. "Cedar", "Cypress", etc.) and not the card model itself.

But anyway... the analysis shows that, by and large, there are only a few instances where GPU tasks take much longer than CPU tasks. However, volunteers with GPUs from the "Cedar", "Wrestler" (and other APU-embedded GPUs) and perhaps also "Caicos" series might want to check their statistics to see whether it is really worth it for them to run the BRP4 jobs on their GPUs.

Cheers
HBE

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 51

Quote:
..and in most cases BOINC only reports the GPU type (e.g. "Cedar", "Cypress", etc) and not the card model itself.


That's because:
a) They're not showing with model name and number in the OpenCL (OpenCl.dll, opencl_lib, libOpenCL.so) libraries, or CAL (aticalrt64.dll, amdcalrt64.dll, aticalrt.dll, amdcalrt.dll, callib) libraries. So BOINC detects which group they're in and shows that.
b) Only Nvidia GPUs will show with complete model name, read from the CUDA (nvcuda.dll, cudalib, libcuda.dylib, libcuda.so) libraries.

So for AMD you then get in coproc_detect.cpp, lines 1429-1510:

{
        case CAL_TARGET_600:
            gpu_name="ATI Radeon HD 2900 (RV600)";
            break;
        case CAL_TARGET_610:
            gpu_name="ATI Radeon HD 2300/2400/3200 (RV610)";
            attribs.numberOfSIMD=1;        // set correct values (reported wrong by driver)
            attribs.wavefrontSize=32;
            break;
        case CAL_TARGET_630:
            gpu_name="ATI Radeon HD 2600 (RV630)";
            // set correct values (reported wrong by driver)
            attribs.numberOfSIMD=3;
            attribs.wavefrontSize=32;
            break;
        case CAL_TARGET_670:
            gpu_name="ATI Radeon HD 3800 (RV670)";
            break;
        case CAL_TARGET_710:
            gpu_name="ATI Radeon HD 4350/4550 (R710)";
            break;
        case CAL_TARGET_730:
            gpu_name="ATI Radeon HD 4600 series (R730)";
            break;
        case CAL_TARGET_7XX:
            gpu_name="ATI Radeon (RV700 class)";
            break;
        case CAL_TARGET_770:
            gpu_name="ATI Radeon HD 4700/4800 (RV740/RV770)";
            break;
        case 8:
            gpu_name="ATI Radeon HD 5800 series (Cypress)";
            break;
        case 9:
            gpu_name="ATI Radeon HD 5700 series (Juniper)";
            break;
        case 10:
            gpu_name="ATI Radeon HD 5x00 series (Redwood)";
            break;
        case 11:
            gpu_name="ATI Radeon HD 5x00 series (Cedar)";
            break;
//
// looks like we mixed the CAL TargetID because all other tools identify CAL_TARGETID 13 as Sumo (not SuperSumo) so
// we have to fix this and some other strings here too
// CAL_TARGETID 12 is still unknown .. maybe this is SuperSumo inside AMDs upcoming Trinity
// 
// 
        case 12:
            gpu_name="AMD Radeon HD (unknown)";
            break;
        case 13:
            gpu_name="AMD Radeon HD 6x00 series (Sumo)";
            break;
// AMD released some more Wrestler so we have at the moment : 6250/6290/6310/6320/7310/7340 (based on Catalyst 12.2 preview)
        case 14:
            gpu_name="AMD Radeon HD 6200/6300/7300 series (Wrestler)";
            break;
        case 15:
            gpu_name="AMD Radeon HD 6900 series (Cayman)";
            break;
// the last unknown ... AMD Radeon HD (unknown) looks better !
        case 16:
            gpu_name="AMD Radeon HD (unknown)";
            break;
        case 17:
            gpu_name="AMD Radeon HD 6800 series (Barts)";
            break;
        case 18:
            gpu_name="AMD Radeon HD 6x00 series (Turks)";
            break;
        case 19:
            gpu_name="AMD Radeon HD 6300 series (Caicos)";
            break;
        case 20:
            gpu_name = "AMD Radeon HD 79x0 series (Tahiti)";
            break;
// there arent any other target ids inside the Shadercompiler (YET !!! )
// but because of ATI was bought by AMD and is not existing anymore the default should be changed too
        default:
            gpu_name="AMD Radeon HD (unknown)";
            break;
        }

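
The switch above boils down to a lookup from CAL target ID to a display string, with a catch-all for IDs the client does not know yet. A minimal sketch of the same idea as a table, using only a few of the entries from the excerpt; `gpu_name_for_target` is a hypothetical helper for illustration and is not part of BOINC:

```cpp
#include <map>
#include <string>

// Hypothetical helper (not BOINC code): maps a CAL target ID to a display
// name. The IDs and strings are copied from the excerpt above; anything not
// in the table falls back to the same default string the excerpt uses.
std::string gpu_name_for_target(int target) {
    static const std::map<int, std::string> names = {
        {8,  "ATI Radeon HD 5800 series (Cypress)"},
        {11, "ATI Radeon HD 5x00 series (Cedar)"},
        {14, "AMD Radeon HD 6200/6300/7300 series (Wrestler)"},
        {19, "AMD Radeon HD 6300 series (Caicos)"},
        {20, "AMD Radeon HD 79x0 series (Tahiti)"},
    };
    auto it = names.find(target);
    return it != names.end() ? it->second : "AMD Radeon HD (unknown)";
}
```

This is why the host pages show a family name such as "Cedar" rather than a specific card: several retail models share one target ID, so the ID alone cannot tell them apart.
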
Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 734172508
RAC: 1292785

Ah I see..thanks for the explanation. Something must be different on Macs, tho, right? E.g. see mine here:

http://einsteinathome.org/host/5234603

Coprocessors CAL ATI Radeon HD 6770M (1024MB)

Cheers
HB

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 51

As far as I understand from Charlie, the OpenCL (RAM) detection under OS X is broken, so instead of using any OpenCL library, they use the OpenGL library and read the RAM size and model name from that.

Nico
Joined: 29 Dec 11
Posts: 5
Credit: 21977480
RAC: 0

My BRP4 tasks take 4 hours 23 minutes when using 0.5 CPU + 1.0 ATI GPU (Radeon HD 5570), compared with almost 40 hours when run on one core of my i5 quad core. I rebooted an hour ago and the latest BOINC Manager estimate for BRP4 tasks is 3 hours 23 minutes. These are welcome improvements - but still a factor of 3 or 4 less than the one hour BRP4 tasks reported earlier in this thread. I would like to speed things up if I can.

My computing preferences are 75% of the processors and 100% of CPU time.

Maybe unused processes are slowing things down on the x64 Win7 machine with 8GB ram. Skype is there but idle, Windows TaskManager runs in background. Plus several gadgets: CPU usage, network usage, drive usage, GPU usage, BOINC monitor, weather. I am managing the computer remotely using TeamViewer and monitoring cpu temperatures with CoreTemp 0.99.7, they are running 60-68C. The antivirus / firewall is Comodo.

Is there a point to running only selected applications - limiting tasks for example to BRP4? I now accept all Einstein@home tasks and typically I have 4 jobs running in parallel: x1 BRP4 job, and three others: a mix of Gravitational Wave S6 and Gamma Ray pulsar jobs. Sometimes I see 2 BRP4 jobs at once, one 4 hour job in the ATI GPU and one 40 hour job in the CPU. I expected to see both ATI GPUs running BRP4s.

While this may be the PC's best practical performance or a limitation of the Radeon HD 5570 I really appreciate any suggestions.

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Quote:
My BRP4 tasks take 4 hours 23 minutes when using 0.5 CPU + 1.0 ATI GPU (Radeon HD 5570), compared with almost 40 hours when run on one core of my i5 quad core. I rebooted an hour ago and the latest BOINC Manager estimate for BRP4 tasks is 3 hours 23 minutes. These are welcome improvements - but still a factor of 3 or 4 less than the one hour BRP4 tasks reported earlier in this thread. I would like to speed things up if I can.


forgive me, for i was too lazy to go back through the entire thread looking for those documented 60-minute run times and the types of GPUs they ran on. but if i had to guess, i'd say your GPU has reached its limits. my HD 6950 2GB GPU took 64 minutes to crunch through a single BRP4 ATI task...granted i was able to get the run times down to 40 minutes by running 4 of them simultaneously. but the HD 6950 is much more powerful than an HD 5570. i'm actually quite puzzled as to how a restart shaved an hour off your run times...

Quote:
Is there a point to running only selected applications - limiting tasks for example to BRP4? I now accept all Einstein@home tasks and typically I have 4 jobs running in parallel: x1 BRP4 job, and three others: a mix of Gravitational Wave S6 and Gamma Ray pulsar jobs. Sometimes I see 2 BRP4 jobs at once, one 4 hour job in the ATI GPU and one 40 hour job in the CPU. I expected to see both ATI GPUs running BRP4s.


there's no point other than to run the apps you want to run. you have 2 GPUs in your machine? i'm not sure why they're not both running BRP4 ATI tasks...perhaps you should try setting BOINC to use 100% of the CPU (all 4 cores) and see what happens. also, a suggestion - since you now know that your GPU can complete 10 BRP4 tasks (maybe more) in the time it takes your CPU to complete a BRP4 task, you should uncheck the "Run CPU versions of applications for which GPU versions are available" box - that will keep the BRP4 tasks running on the GPUs only, freeing up more resources for the other 3 Einstein@Home CPU applications.

Nico
Joined: 29 Dec 11
Posts: 5
Credit: 21977480
RAC: 0

Thanks again Sunny129, your HD 6950 has a factor of 4 floating point advantage over my HD 5570:
HD 6950 2253 GFLOPS
HD 5570 520 GFLOPS
http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units

...so my 4 hour time for a GPU BRP4 seems in line with your 40-64 minutes.
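
As a rough check of that scaling argument, the estimate can be written out. This is only a sketch: it assumes run time is inversely proportional to peak GFLOPS, which real workloads only approximate, and `scaled_minutes` is a hypothetical helper name.

```cpp
// Hypothetical helper: scales a run time measured on a reference GPU to a
// target GPU by the ratio of their peak GFLOPS figures.
// Assumption: run time is inversely proportional to peak throughput.
double scaled_minutes(double ref_minutes, double ref_gflops, double target_gflops) {
    return ref_minutes * ref_gflops / target_gflops;
}
```

With the figures quoted above, `scaled_minutes(64.0, 2253.0, 520.0)` comes to roughly 277 minutes (about 4 hours 37 minutes), and `scaled_minutes(40.0, 2253.0, 520.0)` to roughly 173 minutes, so a run time of 4 hours 23 minutes on the HD 5570 sits inside that range.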

I can't explain where 3 hours running times came from after reboot instead of the 4 hours 20 minutes or so stable estimates from the last few days. The timing continues to change slowly after reboot, now a few hours later the queued BRP4 tasks are estimated at 3:46:38 instead of closer to 3 hours. Meanwhile the BRP4 task currently running in the GPU is estimated to take 4 hours 13 minutes. Things seem to be running OK.

Today I tried setting up for all cores: BOINC uses 100% of the CPU/all 4 cores with 100% duty cycle. Result: I still get only one GPU + 0.5 CPU job plus 4 other (non-GPU) Einstein@home jobs. I know it adds up to 4.5 cores on my 4-core i5. While the arithmetic seems off, the multi-tasking works fine and work is happening much faster, overall almost twice as fast considering RAC. I reset back to 3 dedicated cores, thinking that 5 simultaneous jobs would cause enough resource conflicts to marginally reduce output.

I took your suggestion to uncheck the "Run CPU versions of applications for which GPU versions are available" box, maybe performance will improve. In any case the 10:1 advantage using just one of my 2 GPUs is pretty satisfying, kudos to the BOINC team programmers. And thanks to you for answering my questions, I am quite happy with my performance improvement and now I am thinking about checking Craigslist for good used ATI graphics cards. Maybe an HD 6950 or two.

Dr.Alexx
Joined: 14 Aug 05
Posts: 22
Credit: 5135173
RAC: 257

Quote:
Quote:
Quote:

...where "n" is the GPU utilization factor. when set to a value of 1, the BRP4 GPU apps would run only 1 GPU task at a time. when set to 0.5, the apps would run 2 GPU tasks simultaneously. when set to 0.33, the apps would run 3 GPU tasks simultaneously, and so on and so forth...

Thank you for a good explanation. Could you please specify a little:

I have a factor of 0,5 by default for Radeon 5750. Is this good or bad? I didn't change anything.


hmm...the default GPU utilization factor for all of my various machines under my Einstein@Home web preferences was 1, so i kind of assumed that everyone's GPUs would default to running only 1 GPU task at a time. either way, you should be fine with a factor of 0.5 (running 2 GPU tasks simultaneously) provided your HD 5750 is a 1GB model. you see, they also made some HD 5750s with only 512MB of VRAM onboard. considering each BRP4 ATI task consumes approx. 355MB of VRAM, an HD 5750 w/ only 512MB of VRAM would only be able to handle 1 task at a time efficiently. if you tried to run 2 tasks simultaneously on a 512MB card, you'd be over-utilizing the VRAM. while i'm fairly confident that it would work and not cause compute errors, the VRAM bottleneck would probably significantly increase your GPU task run times and take a serious toll on your GPU's compute efficiency. long story short, if you have a 512MB HD 5750, change your GPU utilization factor to 1 (and run only 1 GPU task at a time). if you have a 1GB HD 5750, leave your GPU utilization factor at 0.5 (and run 2 GPU tasks simultaneously).

Hello! Thank you for the explanation. I have a 1GB model/Core i7-2600 (8 virtual cores), ratio 0,5. But nevertheless I have only 9 tasks running at a time. Why so? I thought there would be 10 tasks (8 cores and 2 GPUs). 2 of the tasks are marked "0,5 CPU + 0,5 GPU" - what's that supposed to mean?
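
The rule of thumb in that reply can be sketched as a small calculation. It assumes the approximate 355MB per BRP4 ATI task quoted above and ignores VRAM used by the desktop or other programs; all function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Number of GPU tasks a utilization factor asks for: 1 -> 1, 0.5 -> 2,
// 0.33 -> 3. Assumes 0 < factor <= 1.
int tasks_from_factor(double factor) {
    // Small epsilon guards against floating-point rounding just below a whole number.
    return static_cast<int>(std::floor(1.0 / factor + 1e-9));
}

// Number of tasks that fit in VRAM, using the approx. 355 MB per BRP4 ATI
// task mentioned in the post (an estimate, not a measured constant).
int tasks_that_fit(int vram_mb, int per_task_mb = 355) {
    return vram_mb / per_task_mb;
}

// The smaller of the two limits is the number of tasks worth running at once.
int recommended_tasks(double factor, int vram_mb) {
    return std::min(tasks_from_factor(factor), tasks_that_fit(vram_mb));
}
```

For a 512MB card with a factor of 0.5, `recommended_tasks(0.5, 512)` gives 1; for a 1GB card, `recommended_tasks(0.5, 1024)` gives 2, which matches the advice above.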

PS Noticed lags in VLC 1.1.11 if GPU tasks are running in the background while I watch the film.
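
A label such as "0,5 CPU + 0,5 GPU" means that the task reserves half a CPU core and half of the GPU. One simplified way to model the task counts reported in this thread is to assume that the CPU shares reserved by GPU tasks are summed and only whole cores are taken away from CPU tasks. This is an assumption that happens to reproduce the numbers posted here, not BOINC's actual scheduling code, and `expected_running_tasks` is a hypothetical name.

```cpp
#include <cmath>

// Simplified model (assumption, not BOINC's scheduler): GPU tasks reserve
// cpu_per_gpu_task cores each; only whole reserved cores reduce the number
// of CPU tasks that run alongside them.
int expected_running_tasks(int usable_cpus, int gpu_tasks, double cpu_per_gpu_task) {
    int reserved = static_cast<int>(std::floor(gpu_tasks * cpu_per_gpu_task + 1e-9));
    int cpu_tasks = usable_cpus - reserved;
    if (cpu_tasks < 0) cpu_tasks = 0;
    return gpu_tasks + cpu_tasks;
}
```

Under this model, 8 usable cores with 2 GPU tasks at 0.5 CPU each gives 7 CPU tasks plus 2 GPU tasks, i.e. the 9 tasks observed. With 4 usable cores and a single GPU task at 0.5 CPU, no whole core is reserved, which gives the 4 CPU jobs plus 1 GPU job reported earlier in the thread.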

Petrion
Joined: 30 Apr 08
Posts: 53
Credit: 1243186
RAC: 0

@sunny129

Quote:

forgive me, for i was too lazy to go back through the entire thread looking for those documented 60-minute run times and the types of GPUs they ran on. but if i had to guess, i'd say your GPU has reached its limits.


My 6850 crunches in 65 minutes, but it's a lot more powerful than a 5570, with more than double the shaders - http://www.gpureview.com/show_cards.php?card1=624&card2=636

But the limit of my 6850 is that running 2 tasks takes twice as long to complete as the GPU resources get split evenly between the WUs, so no advantage there for me.

Quote:
i'm actually quite puzzled as to how a restart shaved an hour off your run times...


I have a theory on that: I noticed my times slip from 65 minutes to 105 minutes after not having rebooted in a couple of days. This leads me to believe that some memory leaks are occurring and garbage is piling up in RAM that needs to be flushed out.

I don't know if BOINC \ E@Home is the cause or it's other software I'm running but flushing the memory with a reboot seems to work.

I'd like to know if other people are having the same issue and, if they are, whether they play Skyrim on their PC... I have a theory...
