After more than a year of work by Oliver Bock, Bernd Machenschalk, Heinz-Bernd Eggenstein and other developers, we are pleased to announce the release of the first Einstein@Home application for ATI/AMD Graphics Cards.
This OpenCL application, which searches Arecibo data for new radio pulsars, is about a factor of ten faster than the same search running on a typical CPU. The application is currently available for Windows and Linux computers with Radeon HD 5000 or better graphics cards. We hope to have a version for Macintosh (Apple OS X 10.8, Mountain Lion) sometime this summer, but there are still some problems that need to be fixed or worked around.
Volunteers who wish to run this application will need to install version 7.0.27 or later of the BOINC client. Please see this thread for more information, or if you want to ask questions.
Many thanks to the AMD/ATI team for their support in the OpenCL software development effort.
Bruce Allen
Director, Einstein@Home
Comments
Einstein@Home GPU Application for ATI/AMD Graphics Cards
)
SP I would presume?
SP indeed. No support is
)
SP indeed. No support is offered for HD 4xxx generation cards (without support for OpenCL 1.1), anything more recent should do fine.
Cheers
HB
Seems like it is time to buy
)
Seems like it is time to buy AMD-card! :) (had no reasons to buy card before and still use Sandy Bridge HD3000)
Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?
CPU does not even compare.
)
CPU does not even compare. CPUs are also incredibly inefficient compared to GPUs. I believe they said the NVIDIA app is currently 20x faster, and the AMD app will currently be 10x faster. I would expect this number to increase as changes are made, IMHO.
thx for the ati/amd app, it
)
thx for the ati/amd app, it works on my hd5870
my first result: http://einsteinathome.org/task/288520169
but the gpu-load is only ~60%, one cpu-core is only for gpu-tasks (cpu: i7-2600k)
should i set 'GPU utilization factor' to 0,5?
on my second machine is a 560ti running, there i have the best results with 'GPU utilization factor' 0,33
michel
Would assume the same
)
Would assume the same applies, my 680 also runs at around 60% with one applied, and around 90% with .33 set.
Nice! Yes, I would
)
Nice!
Yes, I would encourage experiments with the utilization factor. You can use different "venues" in BOINC-speak to assign different settings to different hosts.
CU
HB
It
)
It works!!!
http://einsteinathome.org/host/5353241/tasks
I run it with the "dangerous" option of 0.5 2tasks at once.
And it runs 1 + 1 on the GPU together with Milkyway, SETI, Primegrid and POEM Which are also using 0.5 in the app_info.xml file.
RE: SP indeed. No support
)
So i think 4xxx will never be supported? or only @ the beginning now?
DSKAG Austria Research Team: [LINK]http://www.research.dskag.at[/LINK]
RE: RE: SP indeed. No
)
I don't think we will make an OpenCL 1.0 app, at least not for BRP4. It would be another code branch to maintain and it would almost double the memory requirements, thus a task would not fit in 512MB.
Also I doubt that the limited computing power of the 4xxx would gain us much.
BM
ok i see, where does this
)
ok i see, where does this 512MB limit come from? OpenCL 1.0?
This is not a hard limit,
)
This is not a hard limit, just a target that we set to get the most of our population of ATI cards.
BM
BM
Hm are there so much 5xxx or
)
Hm, are there so many 5xxx or higher cards with 512MB?
Yes, in fact, all of them
)
Yes, in fact, all of them :-)
Oliver
Einstein@Home Project
I ment ONLY 512MB ;) ;) But i
)
I meant ONLY 512MB ;) ;) But I see half of the cards could possibly have only 512MB. Hm, sad.
Hi! As Bernd mentioned
)
Hi!
As Bernd mentioned already, more than 512 MB would only be needed for a second code branch that would be able to support OpenCL 1.0. It's mostly the 4xxx cards that would benefit from support of OpenCL 1.0, not the HD 5000 series. So the question would be: are there many 4000-series cards with more than 512 MB? Those were popular when? 2008? 2009? More than 512 MB video RAM wasn't the norm back then. So by supporting OpenCL 1.0, we would be able to utilize only a certain fraction of the already shrinking population of older cards, to get a not-so-great performance per card ===> it just doesn't make too much sense.
Cheers
HB
I'll leave my 4850 to do
)
I'll leave my 4850 to do Milkyway and Collatz for as long as it lives.
I can't put any better card in that computer since its PCI Express slot is only 1.x something.....
My HD 4870 with 1 GB (and a
)
My HD 4870 with 1 GB (and a GeForce 320M with 256 MB) is eagerly awaiting something cooler than boring math projects. No dice as of yet.
noderaser: use it like im do
)
noderaser: use it like I do with my 4850 with 1GB, on POEM (if you don't like MW). Three WUs at once are possible, 84000 credits/day. Thank god they have been supporting OpenCL for a short while now. It's medical research.
The BOINC site is unavailable
)
The BOINC site has been unavailable for 2 days! Cannot download the new client! Can someone send the 64-bit Windows client ver 27 to kido00 (a t) ya.ru ?
RE: The BOINC site is
)
There has been a power failure at Berkeley due to a shorted underground cable. It has been repaired but the servers are still down.
Tullio
RE: The BOINC site is
)
Please see main thread...
Einstein@Home Project
Local copies of 7.0.28 are
)
Local copies of 7.0.28 are linked here.
BM
BM
[img][/img][url]ok george
)
[img][/img][url]ok george kalemakis.[/url]
RE: Seems like it is time
)
You can find some performance figures here: http://albert.phys.uwm.edu/forum_thread.php?id=8888&nowrap=true#112053
A HD6950 runs 1% /min @ 2 wu's concurrent.
Yesterday I attached my
)
Yesterday I attached my mainsys with 2 AMD-cards to the project.
Until now 12 wu's were done, 4 validated (2 against AMD, 1 against Cuda and one against SSE).
Looks like the team has done a wonderful job! THX!
Thanks to everyone involved
)
Thanks to everyone involved in the development of the AMD/OpenCL app!
I'm still waiting for the first result to validate, but so far so good on a HD 7950 with Win8 preview:
http://einsteinathome.org/task/289196474
BOINC 7.0.27 (x64)
Catalyst 12.4 (installed in Win7 comp. mode)
Windows 8 Dev. Preview x64
Performance is not overwhelming yet; it appears to be only ~9% faster than my GTX 560 Ti, but since this is the first public release (and in my case not the real Win8 driver), who knows what is still to come ... :-)
Regards
Mark my words and remember me. - 11th Hour, Lamb of God
RE: RE: Seems like it is
)
I just checked: I get 0.6% /min with 3 concurrent Einstein tasks on an i7-2600K at 4.5GHz. So is the possible average GPU acceleration (AMD or NV) in my case just about 2 times?..
RE: RE: RE: Seems like
)
You need to compare the same types of wu's. AMD-wu's are BRP(Arecibo) wu's (500 credits).
2 concurrent means that 2 wu's are running simultaneously on one GPU; my PC finishes 2 wu's every 1:45 on the HD6950 and 2 wu's every 2:45 on the HD5850 (no overclocking).
CPU is i7-860 @ 2.8GHz, win7-64
Midrange HD7xxx should perform better.
RE: I'm still waiting for
)
Three results have been validated ok. And there are noticeable run time differences between all results. I guess it is not the amount of calculations which is varying that much? So the OpenCL application is affected maybe more by other processes than the CUDA app?
Regards
Mark my words and remember me. - 11th Hour, Lamb of God
RE: So the OpenCL
)
Yes, that's an observation we also made during our testing phase. OpenCL, well at least AMD's implementation, is much more sensitive to the amount of CPU-power available to serve the GPU than NVIDIA's CUDA. Also, for CUDA we (as developers) can decide and influence how to trade GPU efficiency against CPU-consumption quite a bit - OpenCL doesn't offer such fine-tuning.
Best,
Oliver
Einstein@Home Project
RE: RE: I just checked:
)
It was not easy to find the same task... well, here it is: http://einsteinathome.org/task/288218220 . That BRP (Arecibo) 500-credit task used 21,678 sec of my CPU, so 1% takes 217 sec. Your device1 speed is 1% per 97 sec. So again your device1 with OpenCL is just about 2 times faster, and your device0 GPU is about 3 times faster. I don't know how energy efficiency has evolved from your 160 W HD5850 to the HD7970, but you have 75 W per task, and my i7 has 35 W per task (consumption of the 4-core i7-2600 overclocked to 4.5 GHz is 150 W). We have almost the same energy efficiency! No reason to install a GPU?..
By the way, the same task (see http://einsteinathome.org/workunit/123117782 ) took 10 times longer on a Pentium 4 at 3 GHz. This progress in CPUs is more impressive than the benefits of the current OpenCL version. So sad...
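The power-per-task comparison above can be turned into energy per task by multiplying each device's share of power by the time one task takes. A minimal sketch, using only the rough figures quoted in this post (150 W for the overclocked i7 running 4 tasks, 160 W for the HD 5850 running 2 tasks, 21,678 sec per CPU task, 97 sec per 1% on the GPU). The helper name `energy_per_task_joules` is hypothetical, the wattages are the poster's estimates rather than measurements, and the CPU power needed to feed the GPU is ignored.

```python
def energy_per_task_joules(device_watts, concurrent_tasks, seconds_per_task):
    """Energy attributed to one task when several tasks share one device."""
    watts_per_task = device_watts / concurrent_tasks
    return watts_per_task * seconds_per_task


# Figures quoted in the post (rough estimates, not measurements).
cpu_joules = energy_per_task_joules(150, 4, 21_678)    # i7-2600K, 4 tasks at once
gpu_joules = energy_per_task_joules(160, 2, 97 * 100)  # HD 5850, 2 tasks at once

print(f"CPU: {cpu_joules / 3.6e6:.3f} kWh per task")
print(f"GPU: {gpu_joules / 3.6e6:.3f} kWh per task")
print(f"CPU/GPU energy ratio: {cpu_joules / gpu_joules:.2f}")
```

With these inputs the ratio comes out close to 1, which is where the "almost the same energy efficiency" remark comes from. The result is only as good as the power estimates that go in.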
When crunching, gpus use less
)
When crunching, gpus use less energy than if they were playing video. No video output.
RE: It was not easy to
)
Sorry, you missed one important fact:
in these ' 1% for 97sec ' TWO wu's make this progress, they run at the same time, not one after the other.
So the GPUs are not 2 / 3 times faster, they are 4 / 6 times faster, and the power consumption is not almost the same but 50% of your CPU's per wu.
Another way to calculate it: in 21678 sec (~6 hrs) my faster GPU crunches 6.8 wu's. And don't forget: both my GPUs are outdated; current ones are faster and consume less power.
Another fact: my mobo has one x16 slot and one x8 slot; here at Einstein you can find another thread explaining the differences. A better mobo would give better figures. It does not really reflect the capabilities of the 'slower' GPU.
Anyway, we do not fight a war 'CPU against GPU', we do scientific work. There are different ways to do that. Speaking for myself, I'm happy to participate in science with the capabilities I have.
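The throughput argument in this post can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: "1:45" and "2:45" are read as hours:minutes per batch of 2 wu's (taken from the earlier post), 21,678 sec is the single-core CPU time quoted above, and the function name `tasks_in_window` is made up for illustration.

```python
def tasks_in_window(window_seconds, batch_seconds, tasks_per_batch):
    """Tasks finished in a time window when batches run back to back."""
    return window_seconds / batch_seconds * tasks_per_batch


CPU_TASK_SECONDS = 21_678  # one BRP4 task on one i7-2600K core, as quoted above

hd6950 = tasks_in_window(CPU_TASK_SECONDS, 105 * 60, 2)  # 2 wu's every 1:45
hd5850 = tasks_in_window(CPU_TASK_SECONDS, 165 * 60, 2)  # 2 wu's every 2:45

print(f"HD6950: {hd6950:.1f} wu's while one CPU core finishes 1")
print(f"HD5850: {hd5850:.1f} wu's while one CPU core finishes 1")
```

The HD6950 figure lands between 6.8 and 6.9, in line with the number given in the post. Note that the CPU can run several tasks at once as well, so this compares a whole GPU against one core.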
RE: Sorry, you missed one
)
I did not miss it. That's why I wrote "you have 75 W per task", not "150 W per task" (not sure if 2 tasks use 100% of your GPU power). And my consumption is 140 W / 4 = 35 W per task, because 4 tasks can run simultaneously and the speed is the same.
Yes, but nobody here answers about 7970 or 680 speed and efficiency. Thank you for your information, even for an outdated GPU. I wonder if Bruce Allen's team has no information of this kind to share with us.
Sure! Peace, dude :) My initial question was "Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?". We still have no charts and are just trying to find out the truth: is it worth buying a 7970 for powerful and energy-efficient calculations? Because if it is worth it, I will buy one.
why would anyone buy ati for
)
why would anyone buy ati for this project is beyond me. CUDA runs faster here. If you're going to buy a card specifically for this project you should buy NVIDIA. 680 on W7 can run 3 tasks at a time and average around 3000 seconds per task with PCIe 3.0, a little more if CPU is running other projects (3100).
On Linux it's even a little faster
RE: why would anyone buy
)
Why do you think so? Do you have some charts for 680 CUDA vs 7970 OpenCL (assuming both codes are written by perfect programmers)? This is what I am looking for! AFAIK properly written OpenCL code has performance equal to CUDA for FFT and almost all other kinds of math.
I prefer AMD at least because OpenCL provides an open, industry-standard framework. No one but NVidia can use CUDA - this is wrong way. And don't believe NVidia advertising, it is very aggressive, half-truth, biased and often even deceptive.
This is useful information, thank you. So it is ~7 times faster and takes ~2..3 times more power consumption, therefore it is ~2..3 times more energy-efficient. Let us wait for someone's 7970 report.
Found this on
)
Found this on Albert
http://albert.phys.uwm.edu/results.php?hostid=2209&offset=0&show_names=0&state=3&appid=
Based on the time stamps, I only managed to find one where they were close enough together that would give me the GUESS that they were running 2x at a time. The CPU being used can bring in some speculation. But even with those times, and the increase in TDP of the 7970, CUDA is still a better choice.
Also don't forget what Oliver stated, "Yes, that's an observation we also made during our testing phase. OpenCL, well at least AMD's implementation, is much more sensitive to the amount of CPU-power available to serve the GPU than NVIDIA's CUDA. Also, for CUDA we (as developers) can decide and influence how to trade GPU efficiency against CPU-consumption quite a bit - OpenCL doesn't offer such fine-tuning."
This statement does not seem to be in favor of OpenCl from the dev's perspective. What is very noted from the PCIe discussion in cruncher's corner, is that PCIe 3 is MUCH better than 2 when loading multiple tasks.
EDIT: Since many people will not have a 7970, I would send them a private message. That person ive seen in quite a few forums, so I'm "sure" they would be willing to help.
EDIT 2x= Even if this person was running 2, the time would still be higher than my 680 running 3. Thereby DRASTICALLY reducing efficiency.
Well, this type of "vendor A
)
Well, this type of "vendor A vs. vendor B" discussions have a tendency to spin out of control sooner or later and I don't want to get too involved into it :-), I'd just like to stress one important point: the BRP4 app versions for CUDA and OpenCL respectively should NOT, IMHO, be used in a "benchmark" type of sense to make general comparisons between AMD vs NVIDIA or CUDA vs. OpenCL.
The two versions use completely different libraries for the FFT, they even use slightly different approaches for the FFT because of limitations of the FFT lib used in the OpenCL case. The OpenCL app is "younger" and in general I would consider it less optimized to its target platform.
Cheers
HBE
Bikeman, no war, no spinning
)
Bikeman, no war, no spinning out - just measurement, some statistics and a few ideas about the attractiveness of different approaches for GPGPU. You are deep in OpenCL and CUDA for this project, so can you give us an estimate of the new 7970's energy efficiency in this particular kind of calculation? ( "Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?" ) I believe you have some info and measurement results. Sure, OpenCL is younger, and your version of this GPGPU library is the first and may not be perfect yet. I am not even trying to use it as a benchmark in the "AMD vs NVidia" war.
I remember that in Jul 2011 we were told at the Hannover meeting that on average GPGPU is approximately 5 times more energy efficient than CPU. But one year has passed: the i7, AMD 7970, NVidia 680 and your OpenCL library have appeared. So please tell me like a Holy Father to the parishioner: should I buy an AMD GPU or not ( games alone are not enough motivation for me :) ).
Here are some tasks from my
)
Here are some tasks from my 7970 running one wu at a time while seti was down cos of their power grid failing.
http://einsteinathome.org/host/5365549/tasks
The motherboard is an xfx780i with PCIe v2 16x slot.
The CPU is a 3.6 GHz P4 with HT on, running other CPU projects, so the GPU is starved of CPU time to run Einstein and times are longer than they should be.
I also have other problems with this PC, having now had to go back one month with System Restore, which removed CCC 12.4 and BM 7.0.28.
When I am sure the other problems are gone/fixed I will upgrade again and try again with Einstein.
I built this system with ATI gpu so that it can run SETI VLAR workunits
einstein work was just for fun and fill in time,
I was lucky that E@H come up with OCL app in time :¬)
RE: No one but NVidia can
)
I'd like to correct this one, while CUDA itself isn't an open standard like OpenCL, NVIDIA opened their LLVM-based CUDA compiler. This allows all interesting parties to target their GPUs, APUs and CPUs with CUDA. There is already a CUDA compiler targeting multi/many-core CPUs (by PGI). In this sense CUDA has now become a full-fledged competitor for OpenCL. It's now up to the Khronos Group to win this competition - as always, survival of the fittest...
I'm also in favor of open standards but they also need to deliver and be turned into marketable products. The best standard doesn't help if it's not adopted by a critical mass. If the Khronos Group would adopt something like the Java Community Process to develop the OpenCL standard itself things might work out, but right now they don't perform as they probably should in a competitive environment.
JM2C,
Oliver
PS: Back to topic! :-)
Einstein@Home Project
RE: just measurement, some
)
You can compare HD79XX against my HD6950 here:
http://albert.phys.uwm.edu/workunit.php?wuid=75885
Computer 2209 runs a HD79XX
You are familiar with my configuration
These two jobs are not really
)
These two jobs are not really comparable tho: one is a brand-new (1.25) prototype for an OpenCL app specifically modified to cure validation problems of HD6900 series cards. It is slower than the previous version 1.24.
Cheers
HB
RE: These two jobs are not
)
1:45 @ Einstein (1.24) : 2:09 @ Albert (1.25)
Bikeman, Oliver Bock, Alex
)
Bikeman, Oliver Bock, Alex and others - thank you all! I bought it. Let me share results of my new 7970, for Arecibo 1.24 atiOpenCL:
1 task on GPU (0.5 CPU + 1 ATI GPU): 18-25 min, GPU load by GPU-Z 40-45% (although Catalyst Control Center shows "activity 60%"), CPU load by W7 Task Manager - 5% (it is ~40% load of one core)
2 tasks on GPU (0.5 CPU + 0.5 ATI GPU): ~38 min, GPU load by GPU-Z 58-62% (Catalyst CC - 80-84%), CPU load - 3%
I have no idea why the dispersion (17-25 min) in the case of 1 task is so large if the tasks need an equal(?) amount of calculation. The CPU is not heavily loaded by other tasks.
Whatever... if we assume that "1.22 BRP4 SSE" and "1.24 atiOpenCL" need the same amount of calculation, then for one task, even on PCIe 2.0, the 7970 GPU at 1 GHz is ~20 times faster (and ~5-7 times more energy efficient) than my i7-2600k 4.5 GHz CPU. Good job!
And thank you for inducing me to buy a GPU; I am going to check the progress in the game industry over the last 7-8 years.
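The "~20 times faster" estimate above can be reproduced from the run times reported in this post. This is a rough sketch that assumes the CPU reference is the 21,678 sec single-core time quoted earlier in the thread for the same kind of task; `speedup_vs_cpu_core` is a hypothetical helper name, and the comparison is against one CPU core, not the whole CPU.

```python
CPU_TASK_SECONDS = 21_678  # one task on one i7-2600K core (figure from earlier in the thread)


def speedup_vs_cpu_core(gpu_minutes_per_batch, tasks_per_batch=1):
    """Ratio of single-core CPU time to effective GPU time per task."""
    gpu_seconds_per_task = gpu_minutes_per_batch * 60 / tasks_per_batch
    return CPU_TASK_SECONDS / gpu_seconds_per_task


one_fast = speedup_vs_cpu_core(18)                        # fastest single-task run
one_slow = speedup_vs_cpu_core(25)                        # slowest single-task run
two_at_once = speedup_vs_cpu_core(38, tasks_per_batch=2)  # 2 tasks in ~38 min

print(f"1 task, 18 min: {one_fast:.1f}x")
print(f"1 task, 25 min: {one_slow:.1f}x")
print(f"2 tasks, 38 min: {two_at_once:.1f}x")
```

The fastest single-task run gives roughly the quoted factor of 20; the slower runs and the two-at-once case come out somewhat lower, so the spread in run times matters for any such estimate.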
Congratulations! To get the
)
Congratulations!
To get the same results with both my cards I need to switch over to the 36h-day!
You're welcome! We will
)
You're welcome!
We will continue to improve the ATi/AMD app, so that the energy efficiency should even go up some more in the not so distant future.
Cheers
HB
RE: Bikeman, Oliver Bock,
)
Nice. My rig isn't as buff but I can crunch one on average in 65 min (Catalyst GPU load 80%), but doing 2 tasks splits my GPU work between them, taking 125 min (Catalyst GPU load 92%) to do both. Both ways use 5% CPU load.
Win 7Pro X64, i5-2500K CPU @ 3.30GHz (OC 4.5GHz), AMD HD6850, PCIe 2.0, 8GB 1600 RAM, BOINC 7.0.28
Run time(sec) 3,739.87
CPU time(sec) 530.31
Claimed credit 6.90
Granted credit 500.00
RE: And thank you for
)
I'm a power-gamer and I use my gaming rig to crunch, thus my HD 6850. And the gaming industry has progressed a lot in the last 7-8 years. You should have fun. :)