CUDA, Stream Computing and Ct

bloed_brot
Joined: 5 Apr 05
Posts: 70
Credit: 91124558
RAC: 0
Topic 193727

With the latest graphics boards hitting the teraflop mark, I wonder when, if at all, Einstein@home will be able to run the kind of parallel calculations these bad boys crunch so effectively.

Folding@home has shown that they can be up to 60(!) times faster than a quad-core 3.4 GHz CPU. So please give me a clue: can we expect some support for GPGPU apps from the Einstein@home team, and if so, which framework is likely to be supported?

Thanks and keep up the good work

:
your thoughts - the ways :: the knowledge - your space
:

Alexander W. Janssen
Joined: 20 Feb 05
Posts: 56
Credit: 4543686
RAC: 0

I second that. There's only one question left... CUDA or Brook? Can the project afford maintaining two GPU-platforms?

Alex.

P.S.: No, I don't want to kick off an NVIDIA vs. ATI discussion ;-)

"I am tired of all this sort of thing called science here... We have spent
millions in that sort of thing for the last few years, and it is time it
should be stopped."
-- Simon Cameron, U.S. Senator, on the Smithsonian Institution, 1861.

bloed_brot
Joined: 5 Apr 05
Posts: 70
Credit: 91124558
RAC: 0

Message 82438 in response to message 82437

Quote:

I second that. There's only one question left... CUDA or Brook? Can the project afford maintaining two GPU-platforms?

Alex.

P.S.: No, I don't want to kick off an NVIDIA vs. ATI discussion ;-)

No, me neither, no fanboy flame war, please ;-)

I too think that, for starters, it is only possible to support one platform/framework. Since I am no programmer, I cannot comment on the voices saying that NVIDIA's CUDA is easier to program than ATI's Stream Computing. Yet ATI has made its framework open source, so maybe it is more likely to be widely adopted in the future. Despite Brook being around for years (5?), it is still early doors for GPGPU. I think the GPU manufacturers have picked up the pace recently, though, and maybe Einstein@home can develop an app that benefits from these developments. After all, the SSE2 version of the Linux power app is a beast. :)

:
your thoughts - the ways :: the knowledge - your space
:

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251911654
RAC: 33112

I got some code from a master's student who worked on porting the FStat engine to CUDA. It looks like a factor of 7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an app some time during S5R4.

BM

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

Message 82440 in response to message 82439

Quote:

I got some code from a master's student who worked on porting the FStat engine to CUDA. It looks like a factor of 7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an app some time during S5R4.

BM

If double precision is a major requirement, you could just limit CUDA to the GT200 series of cards. Unlike their predecessors, they have 64-bit FPUs.

[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1923040
RAC: 0

Message 82441 in response to message 82440

Quote:
Quote:

I got some code from a master's student who worked on porting the FStat engine to CUDA. It looks like a factor of 7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an app some time during S5R4.

BM

If double precision is a major requirement, you could just limit CUDA to the GT200 series of cards. Unlike their predecessors, they have 64-bit FPUs.

Yes, they comply with IEEE 754R.
But... only about 30 of the 240 ALUs support double precision.

God created a few good looking guys.. and for the rest he put hairs on top..

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251911654
RAC: 33112

Message 82442 in response to message 82440

Quote:
Quote:

I got some code from a master's student who worked on porting the FStat engine to CUDA. It looks like a factor of 7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an app some time during S5R4.

BM

If double precision is a major requirement, you could just limit CUDA to the GT200 series of cards. Unlike their predecessors, they have 64-bit FPUs.


There are rather few of them. Right now I'm not sure that supporting GPU is worth the effort at all. Anyway, I'm pretty sure that the remaining issues can be resolved by emulating double precision, e.g. with two floats or a float and an int. But first you'll have to find out what precisely goes wrong, and that's where we're stuck at the moment.

BM
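
The "two floats" idea mentioned in the post above is commonly known as float-float (or double-single) arithmetic: a value is stored as an unevaluated sum of a high and a low single-precision part, and error-free additions keep track of what plain rounding would throw away. The following is only a minimal sketch of that general technique, not the code discussed in this thread. It is written in Python, and it assumes that single-precision rounding can be emulated by packing each intermediate result into 4 bytes with `struct`; the helper names (`f32`, `two_sum`, `quick_two_sum`, `ff_from_double`, `ff_add`) and the test sum are made up for illustration.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Error-free addition (Knuth): s = fl(a + b) and e with s + e == a + b."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def quick_two_sum(a, b):
    """Cheaper variant of two_sum; only valid when |a| >= |b|."""
    s = f32(a + b)
    e = f32(b - f32(s - a))
    return s, e

def ff_from_double(x):
    """Split a double into a (hi, lo) pair of float32 values."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo

def ff_add(x, y):
    """Add two (hi, lo) pairs using only float32-rounded operations."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    return quick_two_sum(s, e)

# Hypothetical test case: accumulate 0.1 one hundred thousand times.
N = 100_000
step = ff_from_double(0.1)

naive = 0.0          # plain float32 accumulator
pair = (0.0, 0.0)    # float-float accumulator
for _ in range(N):
    naive = f32(naive + step[0])
    pair = ff_add(pair, step)

emulated = pair[0] + pair[1]
print("plain float32 sum :", naive)
print("float-float sum   :", emulated)
```

In this toy run the plain float32 accumulator drifts visibly away from 10000, while the float-float pair stays very close to it, at the price of roughly ten single-precision operations per addition. On a real GPU the rounding comes from the hardware rather than from `struct`, and the compiler would have to be kept from reordering or fusing these operations, since the error terms depend on the exact sequence of roundings.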

bloed_brot
Joined: 5 Apr 05
Posts: 70
Credit: 91124558
RAC: 0

Message 82443 in response to message 82442

Quote:
Right now I'm not sure that supporting GPU is worth the effort at all.
BM

How come? Won't CUDA become more sophisticated, and won't GPU processing power continue to outperform serialised instruction computing on the x86 architecture? Or does this mean that parallelising the e@h app is rather tricky, even though parallelisation is the key to unlocking shed loads of processing power?

I am sorry that I do not understand the precise problems (e.g. double float vs. int+float). From a long-term perspective, however, I would say that harnessing the power of GPUs holds more potential than harnessing the power of CPUs.

BM or somebody else equally knowledgeable (AkosF, etc.): could you please explain why Folding@home has managed to get a GPU client working, whereas it proves difficult for e@h? Please explain from the point of view of the architecture of the apps :)

Thank you ever so much!

:
your thoughts - the ways :: the knowledge - your space
:

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251911654
RAC: 33112

Message 82444 in response to message 82443

Quote:
Quote:
Right now I'm not sure that supporting GPU is worth the effort at all.
BM

How come? Won't CUDA become more sophisticated, and won't GPU processing power continue to outperform serialised instruction computing on the x86 architecture? Or does this mean that parallelising the e@h app is rather tricky, even though parallelisation is the key to unlocking shed loads of processing power?

I am sorry that I do not understand the precise problems (e.g. double float vs. int+float). From a long-term perspective, however, I would say that harnessing the power of GPUs holds more potential than harnessing the power of CPUs.

BM or somebody else equally knowledgeable (AkosF, etc.): could you please explain why Folding@home has managed to get a GPU client working, whereas it proves difficult for e@h? Please explain from the point of view of the architecture of the apps :)

Thank you ever so much!


There is no standard for GPU computing (yet). Picking one particular model: how many Einstein@home participants do have an NVidia Quadro card that they want to actually use for crunching? Remember that displaying anything is not (yet) possible when using the GPU for numerical calculations.

As far as I understand the Folding@home application is based on Brook or some similar higher level language, the Einstein@home application is (currently) not. Our "Fstat engine" could be thought of as an FFT for narrow frequency bands. It's actually possible to use standard FFT implementations to calculate it, but in the current framework this would be rather inefficient. The current code was chosen for Einstein@home because it allows us to split the frequency bands into many small pieces (workunits), keeping computing time and data transfer volume within the bounds of a volunteer computing project.

Pinkesh Patel (an LSC member) is working on a program that actually uses standard FFT algorithms (I think with small modifications) for calculating the F-Statistic, but his code isn't ready to be used yet (at least not on E@H). Using it would require a completely different search and workunit design, and it would be much more demanding for machines and their connection to the servers than what we currently expect our participants to have.

I definitely think that using high-level languages / libraries like Brook that have efficient implementations for every platform is the way to go in the future, but for the moment (i.e. S5R4) we need to stick to what we have.

BM
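
The description of the Fstat engine as "an FFT for narrow frequency bands" in the post above can be illustrated with a toy example. The sketch below is not the Einstein@home code and does not compute the F-statistic. It only assumes a plain discrete Fourier transform that is evaluated directly for a handful of frequency bins, with the band cut into independent chunks in roughly the way a project might cut it into workunits. The names `dft_band` and `split_band`, the test signal and all numbers are made up for illustration.

```python
import cmath
import math

def dft_band(samples, k_start, k_stop):
    """Directly evaluate DFT bins k_start .. k_stop-1 of `samples`."""
    n = len(samples)
    band = []
    for k in range(k_start, k_stop):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        band.append(acc)
    return band

def split_band(k_start, k_stop, bins_per_unit):
    """Cut the bin range [k_start, k_stop) into independent chunks."""
    return [(k, min(k + bins_per_unit, k_stop))
            for k in range(k_start, k_stop, bins_per_unit)]

# Hypothetical input: a pure tone sitting exactly on bin 37 of 256 samples.
n = 256
signal = [math.cos(2 * math.pi * 37 * t / n) for t in range(n)]

# Search only bins 32..47, four bins per chunk; each chunk could run anywhere.
units = split_band(32, 48, 4)
spectrum = []
for lo, hi in units:
    spectrum.extend(dft_band(signal, lo, hi))

peak_bin = 32 + max(range(len(spectrum)), key=lambda i: abs(spectrum[i]))
print("chunks:", units)
print("strongest bin:", peak_bin)
```

Evaluating a bin directly costs one pass over the data per bin, so it is cheap when only a few bins are wanted, and every chunk can be computed without knowing anything about the others. A standard FFT produces all bins of the transform at once, which is far more efficient per bin but ties the work to the whole band rather than to small independent pieces.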

bloed_brot
Joined: 5 Apr 05
Posts: 70
Credit: 91124558
RAC: 0

Thank you BM, for making it clearer.
So the design and generation of the workunits limits the processing methods the app can be built upon, and since GPUs and CPUs process data differently, the workunit generation can only target one type (GPU or CPU), correct?

Well, I am sure that sooner or later you guys will come up with a solution! :)

:
your thoughts - the ways :: the knowledge - your space
:

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 770944315
RAC: 1160329

Hi!

I understand that even for Folding@Home, the workunits crunched by the GPU beta clients are different from those for the other platforms. But they did manage to do visualization and GPU processing at the same time now, so that you can still use your PC's video capabilities while crunching, which should improve acceptance.

Bikeman
