New Einstein@Home Radio Pulsar Search and NVIDIA GPU Code

Einstein@Home is beginning a new round of searching for radio pulsars in short-orbital-period binary systems.

This is accompanied by the release of a new application (called BRP3). The new application is particularly efficient on NVIDIA Graphics Processor Cards (up to a factor of 20 faster than the CPU-only application). In addition, when running on an NVIDIA GPU card, this new application makes very little use of the CPU (typically around 20% CPU use when the GPU is devoted to Einstein@Home).

The NVIDIA GPU application is initially available for Windows and Linux only. We hope to have a Macintosh version available soon. Due to limitations in the NVIDIA drivers, the Linux version still makes heavy use of the CPU. This will be fixed in Spring 2011, when a new version of the NVIDIA Driver is released. Many thanks to NVIDIA technical support for their assistance!

Because we have exhausted the backlog of data from Arecibo Observatory, this new application is being shipped with data from the Parkes Multibeam Pulsar Survey (from the Parkes Radio Telescope in Australia). In the coming weeks we also expect to start using this new application on fresh Arecibo data taken with the latest 'Mock Spectrometer' back-end.

Questions, problems or bug reports related to this new application and search should be reported in this news item thread as a 'Comment'.

Bruce Allen
Director, Einstein@Home

Comments

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

RE: Thanks! I got the

Quote:

Thanks!

I got the impression that tasks hang only if started with run_client in the background without an active X desktop, but rarely if ever stall when started with run_manager and BOINC manager open. Just a first impression, tho.

CU

HB

I'm currently running console-only via a small busybox image, with run_client instead of run_manager. I did notice yesterday that it took a few minutes for the CUDA tasks to start up. Initially the tasks were in a paused state, and then after five minutes or so they started up.

telegd
telegd
Joined: 17 Apr 07
Posts: 91
Credit: 10212522
RAC: 0

RE: The cross validation

Quote:
The cross validation problem is not a driver issue, it will be fixed with the next app version.


I figured as much. Thanks for the confirmation.

Quote:
16 bit Suse 11.2 box


I got a smile from your typo...

I haven't done any proper testing, but I notice that (sometimes) the X server process starts taking a lot of CPU time (almost a full core) when the GPU app is running. It happens with my PrimeGrid app too. I am not sure what triggers it, and I can't be sure that it never happened on the 260 drivers. However, I don't think I ever saw it before...

mickydl*
mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200374822
RAC: 0

I'm running the 270 driver

I've been running the 270 driver for the past two weeks now. Except for a few errors I got while I was still messing around with the app_info.xml, not a single WU has failed, hung or come back invalid.

It's running on two machines, one with a GeForce 9800 and one with a GTX 470.
The OS is a self-compiled 64-bit LFS Linux with 32-bit compatibility libs installed. No X on either machine.

Michael

astrocrab
astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

RE: As for the 270 driver:

Quote:


As for the 270 driver: It runs flawlessly on my 16 bit Suse 11.2 box, but I got hanging CUDA apps since updating the driver on my 64 bit Linux box.

Does anybody see the same on a 64 bit Linux w/ 270 driver?

CU
HB

i've been running the 270 driver on several ubuntu 10.10 x64 machines with gtx 560 for several weeks. no hangs at all.

i thought 16 bit machines extinct soon after dynosaurs.

when will the new app be available (estimated)? =)

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728194737
RAC: 1184045

RE: i thought 16 bit

Quote:


i thought 16 bit machines extinct soon after dynosaurs.

Lol!!!!

Oh my dear...but by typos like that you recognize people who actually have done asm programming on 8 bit [sic] processors and thought that 16 bit was heaven :-)

CU
HB

Donald A. Tevault
Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: RE: i thought 16 bit

Quote:
Quote:


i thought 16 bit machines extinct soon after dynosaurs.

Lol!!!!

Oh my dear...but by typos like that you recognize people who actually have done asm programming on 8 bit [sic] processors and thought that 16 bit was heaven :-)

CU
HB

Have you ever worked with transistorized computers? They're much more fun than the modern ones.

astrocrab
astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

RE: Oh my dear...but by

Quote:


Oh my dear...but by typos like that you recognize people who actually have done asm programming on 8 bit [sic] processors and thought that 16 bit was heaven :-)

CU
HB

i did some programming on the Z80 =) and today's many-gigabyte software scares me ))
but where did you get a 16-bit cpu? even the ancient 386 was already 32-bit.

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728194737
RAC: 1184045

Not wanting to hijack the

Not wanting to hijack the thread, but since you asked:

My first assembly program was on the 6502 8 bit CPU of a Commodore VIC 20. Early '80s of the previous century.

Then I had an Intel 8086 based PC (or 8088, can't remember), which was logically a 16 bit CPU.

:-)

HB

Mike Hewson
Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 317507858
RAC: 374648

RE: Not wanting to hijack

Quote:

Not wanting to hijack the thread, but since you asked:

My first assembly program was on the 6502 8 bit CPU of a Commodore VIC 20. Early '80s of the previous century.

Then I had an Intel 8086 based PC (or 8088, can't remember), which was logically a 16 bit CPU.

:-)

HB


Ah, there are other dinosaurs about that know of what you speak! :-):-)

Yeah, I did 6502 as well, on the C-64. Quite a laugh sorting out their indirection/pointer commands, as I recall. Pretty well everything bar the power switch was memory mapped, so you got direct access to the lot. I too graduated to the 8088 first, then the 8086, using MASM and then "Programmer's Workbench" - a good IDE for its day. The learning hump for me was understanding stack frames correctly. The 8088 is the same internally and code-wise, but had only an 8-bit memory bus, so word alignment was a performance issue only for the 8086.

16 bit was like: "Really? Wow! Can I have a try? Please....."

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

tullio
tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

I have used the Z80 and the

I have used the Z80 and the Z8000, which was 16 bit. The 32-bit Z80000 never appeared, and I switched to the 68010 and subsequent Motorola chips.
Tullio

Ver Greeneyes
Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9562235
RAC: 0

If you've ever looked at or

If you've ever looked at or created a boot sector for your HDD, you'll know that even today's CPUs still start up in a legacy 16-bit mode :)

Donald A. Tevault
Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: Not wanting to hijack

Quote:

Not wanting to hijack the thread, but since you asked:

My first assembly program was on the 6502 8 bit CPU of a Commodore VIC 20. Early '80s of the previous century.

Then I had an Intel 8086 based PC (or 8088, can't remember), which was logically a 16 bit CPU.

:-)

HB

Some interesting posts here, but not quite on topic. A new thread, perhaps?

Oliver Behnke
Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171438
RAC: 34

RE: Update: I checked the

Quote:

Update: I checked the 260.19.36 as well as the 270.18 beta driver. While the former doesn't fix the issue (as expected), the latter does. Well, at least in a first test run that still needs to finish... If all turns out fine we might publish an unofficial release of our Linux 32-bit app that relies on this new driver. You are free to install and run the new driver/app combo manually as you please (using an appropriate app_info.xml file).

Stay tuned for more...

Yet another update: we will shortly release a Linux CUDA app specifically for use with the NVIDIA 270.xx beta driver. As soon as you install this driver, our server will send you the new app, which behaves like a normal BOINC CUDA app, reducing CPU consumption as much as possible.

We'll post a tech news item as soon as the new app is released (it's imminent).

Cheers,
Oliver

Einstein@Home Project

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

RE: Yet another update: we

Quote:


Yet another update: we will shortly release a Linux CUDA app specifically for use with the NVIDIA 270.xx beta driver. As soon as you install this driver, our server will send you the new app, which behaves like a normal BOINC CUDA app, reducing CPU consumption as much as possible.

We'll post a tech news item as soon as the new app is released (it's imminent).

Cheers,
Oliver

That is great news. Thanks.

Donald A. Tevault
Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: Yet another update: we

Quote:


Yet another update: we will shortly release a Linux CUDA app specifically for use with the NVIDIA 270.xx beta driver. As soon as you install this driver, our server will send you the new app, which behaves like a normal BOINC CUDA app, reducing CPU consumption as much as possible.

We'll post a tech news item as soon as the new app is released (it's imminent).

Cheers,
Oliver

Cool. As it happens, I installed an nVidia card and the 270 driver in one of my machines only yesterday.

astrocrab
astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

i'm sorry, i don't understand

i'm sorry, i don't understand clearly enough: when can we use the new 1.07 version?

Michael Karlinsky
Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

RE: i'm sorry, i don't

Quote:
i'm sorry, i don't understand clearly enough: when can we use the new 1.07 version?

1.07 for BRP3 is an official app, which downloads automatically; see the apps page.

Michael

PS: For Linux && NVIDIA 270.* beta driver, see Oliver's post below.

astrocrab
astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

RE: PS: For linux &&

Quote:

PS: For Linux && NVIDIA 270.* beta driver, see Oliver's post below.


below? i can't see any (

Bernd Machenschalk
Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250533320
RAC: 34175

There are three new Apps, all

There are three new apps, all with version number 1.07. All are built from basically the same code, which should avoid GPU-CPU cross-validation problems.

One is for Windows; one is for Linux, works with all drivers, but uses a full CPU core. These two you should get automatically from now on.

There is a third one for Linux that will work only with driver version 270. If you feel you need to, you can already download the executable from here (it will take some work for you to get it to run). However, as soon as I get to it I will modify the scheduler so that Linux users who have installed the 270 driver will get this app automatically.

hth

BM

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728194737
RAC: 1184045

According to Bernd's tech

According to Bernd's tech news item, the Linux app taking advantage of the driver bug fix in the 270.* Linux drivers will come online on Monday 21st Feb 2011. The Linux and Windows 1.07 CUDA apps that are distributed starting today fix all known GPU/CPU cross-validation problems.

HB

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

I have the new 270 based app

I currently have the new 270-based app running with my GTX 295. I don't have much runtime yet, but so far so good. Here is an updated app_info.xml in case anyone else is interested. CPU load is near zero and the app seems to be performing great from what I have seen so far.

1841 49.3 2.2 83972 90440 ? RNl 17:55 1:24 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP3_1.07_i686-pc-linux-gnu__BRP3cuda32nv270
1843 49.7 2.1 83012 88972 ? RNl 17:55 1:24 ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP3_1.07_i686-pc-linux-gnu__BRP3cuda32nv270

load average: 0.02, 0.02, 0.03

astrocrab
astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

With this new

With this new einsteinbinary_BRP3_1.07_i686-pc-linux-gnu__BRP3cuda32nv270 application
i got a floating cpu load of 25-80% (averaging about 50%) instead of the constant 100% with the previous fullCPU app, but the time to complete a WU also rose from ~4000 seconds to ~5000 seconds. =(
this means the fullcpu app works faster than the 270 app.
what am i doing wrong?

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117646739464
RAC: 35138659

RE: I have the new 270

Quote:
I currently have the new 270-based app running with my GTX 295. I don't have much runtime yet, but so far so good ....


I have a lot of hosts running Linux and a few running Windows, but no nVidia cards as yet. I have 12 HD4850s on MWAH and I'm hoping that an OpenCL app might appear soon, so I've been resisting the urge to buy a few nVidia cards. However, I've been keeping track of the CUDA app development, and it's pretty hard to resist putting some cards in a few Linux machines, particularly now that the remaining 'impediments' seem to be disappearing rather quickly.

I have (or have access to) hosts running Linux, MacOSX and Windows, and my preference is very much towards Linux and MacOSX, with Windows a distant last. They (being Unix) suit my style of micromanagement (writing shell scripts, etc.) much better :-). I've just finished browsing your linked app_info.xml and I have a few comments you might be interested in.

* You've catered for GC1HF, ABP2 and BRP3, but surely you could omit ABP2 since the chances of getting any must be virtually nil.

* Even your most recently returned results are listed as '1.06' - there's no transition to '1.07' showing on the website. Having perused your app_info.xml, I think I can tell you what to do to correct that. Are your most recently downloaded new tasks listed in BOINC Manager also showing as '1.06'? If they are, and if your working app_info.xml is similar to the one in the link, all you need to do is swap the order of the two <app_version> clauses. Just put the one with a <version_num> of 107 first and the 106 one second (see the sketch after this list).

* You appear to be still getting validate errors, and some 'inconclusive' matches as well, in your recent returns. It looks like there are still problems with the 1.07 nv270 app.

* Your app_info.xml says that you will be doing '1.06' branded tasks with the 1.07 app. This is fine, but it also implies that tasks started with 1.06 would have been completed with 1.07. That is also fine in the CPU world (usually), as long as the format of a checkpoint hasn't changed. I don't know about the GPU world, but can you perhaps check whether the tasks now showing as validate errors were started with 1.06 and finished with 1.07? Maybe there's a problem doing that.
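For illustration, here is roughly what that ordering looks like. This is a minimal sketch rather than Jeroen's actual file: the tags are the standard BOINC app_info.xml ones, the plan class and executable name are copied from the nv270 binary quoted later in this thread, and the avg_ncpus value is only an example.

<app_info>
  <app>
    <name>einsteinbinary_BRP3</name>
  </app>
  <file_info>
    <name>einsteinbinary_BRP3_1.07_i686-pc-linux-gnu__BRP3cuda32nv270</name>
    <executable/>
  </file_info>
  <!-- per the advice above: the 1.07 clause goes first -->
  <app_version>
    <app_name>einsteinbinary_BRP3</app_name>
    <version_num>107</version_num>
    <avg_ncpus>0.2</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <plan_class>BRP3cuda32nv270</plan_class>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>einsteinbinary_BRP3_1.07_i686-pc-linux-gnu__BRP3cuda32nv270</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <!-- the 106 clause second, kept only while unfinished 1.06 tasks remain -->
  <app_version>
    <app_name>einsteinbinary_BRP3</app_name>
    <version_num>106</version_num>
    ...
  </app_version>
</app_info>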

I've got to go right now so I'll add some more to the above list when I get a chance. Not sure when that will be as I've got a few pressing commitments right now.

Cheers,
Gary.

telegd
telegd
Joined: 17 Apr 07
Posts: 91
Credit: 10212522
RAC: 0

RE: With this new

Quote:
With this new einsteinbinary_BRP3_1.07_i686-pc-linux-gnu__BRP3cuda32nv270 application
i got a floating cpu load of 25-80% (averaging about 50%) instead of the constant 100% with the previous fullCPU app, but the time to complete a WU also rose from ~4000 seconds to ~5000 seconds. =(
this means the fullcpu app works faster than the 270 app.
what am i doing wrong?

I have a couple of valid WUs with the new 1.07, and my completion time has also gone up a little. I just figured the old app used teamwork between the CPU & GPU to get a slightly better time. I don't mind, though - freeing up a core for other work is worth it, even for a slight slowdown on the GPU.

My "0.05 CPU" for the new app runs consistently at 20% of an i7-860 core (non-shared). Seems OK to me.

I suppose that means (for people with better cards than mine) that you could run about 4 or 5 GPU apps using one CPU core. Just a guess...

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

RE: I have a lot of hosts

Quote:

I have a lot of hosts running Linux and a few running Windows, but no nVidia cards as yet. I have 12 HD4850s on MWAH and I'm hoping that an OpenCL app might appear soon, so I've been resisting the urge to buy a few nVidia cards. However, I've been keeping track of the CUDA app development, and it's pretty hard to resist putting some cards in a few Linux machines, particularly now that the remaining 'impediments' seem to be disappearing rather quickly.

Hopefully we will see an OpenCL application to cover the ATI cards as well. Regarding adding CUDA cards, there are some good deals on eBay for the previous two generations of NVIDIA cards, since people are upgrading to the 5xx series.

Quote:

I have (or have access to) hosts running Linux, MacOSX and Windows, and my preference is very much towards Linux and MacOSX, with Windows a distant last. They (being Unix) suit my style of micromanagement (writing shell scripts, etc.) much better :-).

I have the same preference. I prefer not having more Windows systems on my network than necessary, due to having to keep them updated and secure. These days I boot my Linux image via a PXE server and store the project data on NFS, so as not to need separate disks and OS installs on each system.

Quote:

I've just finished browsing your linked app_info.xml and I have a few comments you might be interested in.
* You've catered for GC1HF, ABP2 and BRP3, but surely you could omit ABP2 since the chances of getting any must be virtually nil.
    ...
I've got to go right now so I'll add some more to the above list when I get a chance. Not sure when that will be as I've got a few pressing commitments right now.

Thanks for all the comments! I went ahead and updated my app_info.xml file with the suggested changes, including removing ABP2 and reordering the versions for BRP3. I'll keep an eye on the WU processing to check for WUs that fail validation. Prior to the latest apps, I was seeing anywhere from 6-24 invalid WUs per day. When I started running the new app yesterday, there were two work units still in progress that I switched versions on. Perhaps it would have been better to finish those up with the old app.

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

RE: With this new

Quote:
With this new einsteinbinary_BRP3_1.07_i686-pc-linux-gnu__BRP3cuda32nv270 application
i got a floating cpu load of 25-80% (averaging about 50%) instead of the constant 100% with the previous fullCPU app, but the time to complete a WU also rose from ~4000 seconds to ~5000 seconds. =(
this means the fullcpu app works faster than the 270 app.
what am i doing wrong?

I am seeing a similar performance difference between the full CPU app and the 270 app. This is running one WU per GPU.

FullCPU App: 2954 seconds
270 App: 3674 seconds

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728194737
RAC: 1184045

Hi! At what niceness level

Hi!

At what niceness level is the 270 app running on your host?
(usually the column "NI" in top).

A small performance drop should be expected in return for the lower CPU utilization, but the reported figures seem a bit too slow. If I remember correctly, the app should run with niceness 10 or so, while the other CPU apps should run at nice level 19, to ensure that the CUDA app is a bit more likely to get the CPU once GPU computations are finished.

Note that if you are using your own app_info.xml file, make sure to set avg_ncpus to a value < 1.0 when using the new nv270 app variant, because otherwise BOINC will start it with niceness 19.
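For illustration, the fragment in question would look something like this; a minimal sketch using the same tags as a standard app_info.xml, with 0.2 as an example value only:

  <app_version>
    <app_name>einsteinbinary_BRP3</app_name>
    <version_num>107</version_num>
    <plan_class>BRP3cuda32nv270</plan_class>
    <avg_ncpus>0.2</avg_ncpus>  <!-- keep below 1.0, otherwise BOINC starts the app at nice level 19 -->
    ...
  </app_version>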

CU
HB

mickydl*
mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200374822
RAC: 0

On my machine the niceness

On my machine the niceness level seems OK. The CUDA app is running with a nice level of 10, everything else with 19.

Michael

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728194737
RAC: 1184045

Good to know, thanks. I

Good to know, thanks.

I did some "back of the envelope" calculations and I'm now less surprised about the runtime increase. Here's the essence of it:

One BRP3 task consists of 4 sub-units. Each sub-unit tries ca. 12k orbital templates on a Parkes data sample, so every task performs ca. 50k templates.

For every template, several so-called CUDA kernels (code executed on the GPU) have to be started in sequence. I don't know the exact number of kernel invocations, but from what I do know it must be > 10. Maybe more like 20, depending on how the FFT part works.

That means there will be more than ca. 500k kernel invocations per task. If you divide the observed slowdown of ca. 1000 seconds (which seems to be pretty independent of GPU speed) by this number, you get an increase in CUDA kernel invocation latency of ca. 2 milliseconds. This is the same order of magnitude as the time slice of a "niced" process in most Linux kernels.
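Spelling that estimate out (all inputs are the rough figures above, not measured values):

\[
N_{\text{kernels}} \approx 4 \times 12\,000 \times 10 \approx 5 \times 10^{5},
\qquad
\Delta t_{\text{kernel}} \approx \frac{1000\ \text{s}}{5 \times 10^{5}} \approx 2\ \text{ms}.
\]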

Not sure what this means for the project, tho. Some people don't like the GPU app occupying a whole core; others don't mind and insist on max productivity. Maybe it would be best to make this configurable somehow.

CU
HB

astrocrab
astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

25% of performance is a too

25% of performance is too huge a piece to ignore. i think the app should be optimized
1. to make fewer kernel calls
2. to utilise today's powerful GPU core effectively. GTX 580, 570, 560, 480, 470 use only 40-50% of the GPU when crunching a single WU, and we must make magic with app_info.xml to increase output and perform manual upgrades to newer versions. we can't make install_and_forget type machines.

do you agree?

astrocrab
astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

i mean "to utilise today's

i mean "to utilise today's powerful GPU more effectively."

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728194737
RAC: 1184045

RE: 25% of performance is a

Quote:

25% of performance is too huge a piece to ignore. i think the app should be optimized
1. to make fewer kernel calls
2. to utilise today's powerful GPU core effectively. GTX 580, 570, 560, 480, 470 use only 40-50% of the GPU when crunching a single WU, and we must make magic with app_info.xml to increase output and perform manual upgrades to newer versions. we can't make install_and_forget type machines.

do you agree?

Well, that's easier said than done :-). You cannot decrease the number of kernel invocations at will; some things have to be computed by one kernel before another can work on the output. I don't see that much potential for optimization here. Maybe it's possible to reduce kernel invocations by (say) 20 to 25% at most, leaving us with a performance difference of 750 s instead of 1000 s per WU.

The other alternative is, of course, to go back to the full-CPU method: sacrifice a full CPU core per GPU task in order to avoid the increased latency in the GPU processing.

CU
HB

Donald A. Tevault
Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: RE: 25% of

Quote:
Quote:

25% of performance is too huge a piece to ignore. i think the app should be optimized
1. to make fewer kernel calls
2. to utilise today's powerful GPU core effectively. GTX 580, 570, 560, 480, 470 use only 40-50% of the GPU when crunching a single WU, and we must make magic with app_info.xml to increase output and perform manual upgrades to newer versions. we can't make install_and_forget type machines.

do you agree?

Well, that's easier said than done :-). You cannot decrease the number of kernel invocations at will; some things have to be computed by one kernel before another can work on the output. I don't see that much potential for optimization here. Maybe it's possible to reduce kernel invocations by (say) 20 to 25% at most, leaving us with a performance difference of 750 s instead of 1000 s per WU.

The other alternative is, of course, to go back to the full-CPU method: sacrifice a full CPU core per GPU task in order to avoid the increased latency in the GPU processing.

CU
HB

Actually, I like the idea of sticking with the full-core method. With modern processors having at least four cores, I don't think that's much of a sacrifice for increased performance.

Edit: Okay, disregard the above. I just saw Bernd's note in the Technical News section.

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

RE: Hi! At what niceness

Quote:

Hi!

At what niceness level is the 270 app running on your host?
(usually the column "NI" in top).

A small performance drop should be expected in return for the lower CPU utilization, but the reported figures seem a bit too slow. If I remember correctly, the app should run with niceness 10 or so, while the other CPU apps should run at nice level 19, to ensure that the CUDA app is a bit more likely to get the CPU once GPU computations are finished.

Note that if you are using your own app_info.xml file, make sure to set avg_ncpus to a value < 1.0 when using the new nv270 app variant, because otherwise BOINC will start it with niceness 19.

CU
HB

I left the niceness at the default, as I am only running the two BRP3 CUDA work units currently. There is nothing else consuming CPU resources at the moment.

Thanks.

art
art
Joined: 3 May 07
Posts: 2
Credit: 37643293
RAC: 9567

I'm curious as to how this

I'm curious as to how this decision was made. Will the increased processing speed compensate for shutting out every computer that doesn't have an NVIDIA GPU, a number I would assume is substantial? Of those now shut out, I'm curious how many will remove the project and not return.

Would it not have been better to hold off on launching the NVIDIA GPU code until the OpenGL code was ready?

Tony DeBari
Tony DeBari
Joined: 29 Apr 05
Posts: 30
Credit: 38576823
RAC: 0

RE: I'm curious as to how

Quote:
I'm curious as to how this decision was made. Will the increased processing speed compensate for shutting out every computer that doesn't have an NVIDIA GPU, a number I would assume is substantial? Of those now shut out, I'm curious how many will remove the project and not return.

No one has been shut out. The BRP3 CPU app is still available to run on computers that do not have a CUDA-capable GPU, and it can also run concurrently with the CUDA app on those computers that do. I have one such host, and even though the GPU is tied up with Seti@Home at the moment, the CPU is happily crunching any BRP3 WUs that come its way.

Quote:
Would it not have been better to hold off on launching the NVIDIA GPU code until the OpenGL code was ready?

I'm guessing you meant OpenCL, as OpenGL is for graphics and has nothing to do with distributed computing, except possibly for rendering the graphics in a screen saver. I see no reason why the release of the CUDA app should have been delayed. In the time it will take to finish the OpenCL app, the CUDA app will have crunched many times more WUs than could have been done by CPUs alone. It would have been of no benefit to the project to leave that processing power untapped.

-- Tony D.

art
art
Joined: 3 May 07
Posts: 2
Credit: 37643293
RAC: 9567

Well, something seems to be

Well, something seems to be off kilter. I've not had any new jobs from Einstein@home on my ATI-based workstation for over a week, only a message indicating I don't have an NVIDIA GPU.

Is it because there are no other jobs, or is there a setting I need to change?

Tony DeBari
Tony DeBari
Joined: 29 Apr 05
Posts: 30
Credit: 38576823
RAC: 0

RE: Well, something seems

Quote:

Well, something seems to be off kilter. I've not had any new jobs from Einstein@home on my ATI-based workstation for over a week, only a message indicating I don't have an NVIDIA GPU.

Is it because there are no other jobs, or is there a setting I need to change?

That message indicates that the host requested GPU work for your ATI card and the project responded (correctly) that the only GPU work available is for nVidia cards.

The thing to check is if the host is requesting CPU work at all. It didn't the last time it contacted the E@H server. (The log of the most recent contact is available here.) It's possible that the host is paying back long-term debt to one of the other projects for which you crunch - my guess would be Seti@Home, which just had an extended outage and continues to have intermittent work distribution issues. If that's the case, BOINC will resume asking for E@H work once the debt evens out.

(Mods: Sorry for the thread hijack. This discussion should probably be moved to Cruncher's Corner at this point.)

-- Tony D.

Oliver Behnke
Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171438
RAC: 34

FYI, the performance decrease

FYI, the performance decrease was due to a missing optimization step during the build of the 1.07 apps (see this post). Version 1.08 fixes that, and performance should be almost on par with the full-CPU (260.x driver) version.

Cheers,
Oliver

Einstein@Home Project

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728194737
RAC: 1184045

Just for completeness: I

Just for completeness:

I wrote earlier:

Quote:

As for the 270 driver: It runs flawlessly on my 16 bit Suse 11.2 box, but I got hanging CUDA apps since updating the driver on my 64 bit Linux box.

It seems that was a hardware issue related to my particular box. Not related to the driver update at all.
I swapped the graphics card in that box, and then the app no longer hung... it produced results that would not validate :-(.

I rebooted and later re-inserted the card.... and now it was detected as a PCIe x1 (!!) card, running real sloooooooooow. What .......$&$&$ ????

I re-inserted it again with considerable force, gave the box a kick... and now it's working fine again as a PCIe x16 card, and it validates fine. I won't touch that box again.

CU
HB

Elvis
Elvis
Joined: 6 Oct 06
Posts: 2
Credit: 142096740
RAC: 29762

Hi There ! I have resume

Hi There !

I have resumed the Einstein@home project and crunching with BOINC after a two-year break, but I still have 80,846 points on Einstein.
I now have a four-core CPU and an ATI Radeon HD 4850 video card.
The BOINC manager tells me at boot that this GPU can produce 1120 GFLOPS peak.

Do you know when Einstein@home will use ATI video cards?
Why are GPUs "so much" more powerful than CPUs, and/or producing so many points compared to CPU calculation?

Thanks, and hurry up with the ATI support! ;-)

Elvis

rados
rados
Joined: 28 Feb 11
Posts: 1
Credit: 185405
RAC: 0

my settings allows Boinc to

my settings allow Boinc to run when inactive for 2 min.
that's fine, but when i start to use my computer again everything stops except the Einstein@home cuda32 version tasks

they show as stopped in the manager, but i can see them in the windows task manager and can tell from the noise of the graphics card fan...

induktio
induktio
Joined: 1 Oct 10
Posts: 15
Credit: 10144774
RAC: 0

This does seem very

This does seem very interesting. Currently I am doing CPU-only crunching in a Linux environment but once these CUDA drivers mature it would be tempting to acquire a GPU to help in the process.

Getting to the point: yesterday I read that Amazon Web Services had begun offering Cluster GPU instances with these specs:

Quote:

The Cluster GPU instance family currently contains a single instance type, the Cluster GPU Quadruple Extra Large with the following specifications:

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core "Nehalem" architecture)
2 x NVIDIA Tesla "Fermi" M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge


It's not hard to put 1+1 together and see the computing power potential here. Although that is the most expensive instance they offer, the GPUs are supposedly very powerful. One thing I wonder is: has this Einstein@Home app been tested to run reliably on the above Tesla M2050 GPU? Do you have any estimates of how quickly it could complete binary search workunits?

Oliver Behnke
Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171438
RAC: 34

RE: One thing I wonder is

Quote:
One thing I wonder is: has this Einstein@Home app been tested to run reliably on the above Tesla M2050 GPU? Do you have any estimates of how quickly it could complete binary search workunits?

We have lots of C2050 cards (same architecture). The speed-up compared to the Xeon CPU-only performance of their host machines is currently roughly a factor of 20.

Oliver

Einstein@Home Project

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

RE: This does seem very

Quote:

This does seem very interesting. Currently I am doing CPU-only crunching in a Linux environment but once these CUDA drivers mature it would be tempting to acquire a GPU to help in the process.

One thing I wonder is has this Einstein@Home app been tested to run reliably on the above Tesla M2050 GPU? Do you have any estimates how quickly it could complete binary search workunits?

The M2050 has 3 GB of memory. Since it is a Fermi card, you should be able to run at least 3-4 work units at once on each card for improved production. Each work unit needs 300-400 MB of GPU memory.

From searching the stats of other users with Tesla cards, the C2050s are completing work units in 2800-3200 seconds. I am not sure how many work units these GPUs are running at once, though.

induktio
induktio
Joined: 1 Oct 10
Posts: 15
Credit: 10144774
RAC: 0

RE: The M2050 has 3GB

Quote:


The M2050 has 3 GB of memory. Since it is a Fermi card, you should be able to run at least 3-4 work units at once on each card for improved production. Each work unit needs 300-400 MB of GPU memory.

From searching the stats of other users with Tesla cards, the C2050s are completing work units in 2800-3200 seconds. I am not sure how many work units these GPUs are running at once, though.

I recall seeing some GeForce GTX 580s completing WUs in ~3000 seconds. I'm not sure how those two architectures compare, but both the C2050 and the GTX 580 seem to have roughly the same number of CUDA cores and memory bandwidth. The GTX 580 is also a Fermi card, so it's likely it was running multiple work units concurrently.

From a practical point of view, the GTX 580 seems to deliver the same performance as the Tesla C2050 at 1/5th of the cost. It doesn't make much sense to buy a Tesla unless one really needs the bigger, ECC-enabled memory (which admittedly is required for some serious work).

Jeroen
Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

RE: I recall seeing some

Quote:


I recall seeing some GeForce GTX 580s completing WUs in ~3000 seconds. I'm not sure how those two architectures compare, but both the C2050 and the GTX 580 seem to have roughly the same number of CUDA cores and memory bandwidth. The GTX 580 is also a Fermi card, so it's likely it was running multiple work units concurrently.

From a practical point of view, the GTX 580 seems to deliver the same performance as the Tesla C2050 at 1/5th of the cost. It doesn't make much sense to buy a Tesla unless one really needs the bigger, ECC-enabled memory (which admittedly is required for some serious work).

My 580s are able to complete 3 tasks at once in around 3500-3600 seconds. I would guess the Tesla Fermi cards would perform similarly, given the similar number of CUDA cores. EVGA is coming out with a 3GB version of the 580 in early April, which I think will be perfect for this project. The 1.5GB version of the 580 can run four tasks at once in most cases, but this uses up almost all the GPU memory, and in some cases the fourth task will not run due to memory constraints. I am not sure what the price will be on the 3GB version, though.
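For reference, the number of tasks run per card is controlled by the <count> element of the <coproc> block in app_info.xml. The fragment below is a sketch, assuming a BOINC client recent enough to honour fractional coprocessor counts:

  <coproc>
    <type>CUDA</type>
    <count>0.33</count>  <!-- each task claims a third of the card, so BOINC schedules three per GPU -->
  </coproc>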

mickydl*
mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200374822
RAC: 0

Another difference of the

Another difference with the Teslas is that they provide their full double-precision FP performance. Compared to the consumer cards this means:

Tesla: double-precision speed = 1/2 single-precision speed
GTX 580: double-precision speed = 1/8 single-precision speed

Not that you need it for Einstein, but if you are planning on using them for other projects as well...

Michael

egg Films Graphics 2
egg Films Graphics 2
Joined: 22 Mar 11
Posts: 1
Credit: 453850
RAC: 0

We just installed Quadro 4000

We just installed Quadro 4000 cards in four 8-core Mac Pros. Can't wait to see the GPU app and the data we can crunch. Hope the GPU app ships soon!

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117646739464
RAC: 35138659

RE: .... Hope the GPU app

Quote:
.... Hope the GPU app ships soon!


Hi,
Welcome to the project.

Depending on what version of OS X you are running, the app is already available. The latest info I recall seeing about this is here.

I just had a look at the computer you have attached to the project. It's not showing as having a compatible GPU.

Cheers,
Gary.