First CUDA App for Windows available for Beta Test

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251835422
RAC: 34578
Topic 194474

The first Einstein@home CUDA App for Windows is available for beta testing on the Beta Test Page.

This is a big package to download (17 MB), as it contains both applications (S5R5 and ABP1).

We are still new to writing app_info.xml files for CUDA applications. The one contained in this package should allow one ABP1 task to run per CUDA device, occupying a full CPU core for support, while S5R5 tasks run on the remaining CPU cores.
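The resource split described above is set in the ABP1 `<app_version>` section. As a minimal sketch, assuming the standard BOINC anonymous-platform tags (`plan_class`, `avg_ncpus`, `max_ncpus`, `coproc`), such an entry could be generated in Python like this. The helper name `cuda_app_version` is hypothetical; the values mirror the ABP1 entry in the app_info.xml that Mark Henderson posts later in this thread.

```python
import xml.etree.ElementTree as ET


def cuda_app_version(app_name, version_num, exe, api_version="6.7.0",
                     ncpus=1.0, gpus=1):
    """Build an <app_version> element for a CUDA app (hypothetical helper)."""
    av = ET.Element("app_version")
    ET.SubElement(av, "app_name").text = app_name
    ET.SubElement(av, "version_num").text = str(version_num)
    ET.SubElement(av, "plan_class").text = "cuda"
    # CPU share claimed per task: 1.0 reserves a whole core for host-side work,
    # leaving the remaining cores for CPU (S5R5) tasks.
    ET.SubElement(av, "avg_ncpus").text = f"{ncpus:.1f}"
    ET.SubElement(av, "max_ncpus").text = f"{ncpus:.1f}"
    # One CUDA device per task, i.e. one ABP1 task per GPU.
    coproc = ET.SubElement(av, "coproc")
    ET.SubElement(coproc, "type").text = "CUDA"
    ET.SubElement(coproc, "count").text = str(gpus)
    ET.SubElement(av, "api_version").text = api_version
    ref = ET.SubElement(av, "file_ref")
    ET.SubElement(ref, "file_name").text = exe
    ET.SubElement(ref, "main_program")  # marks the executable BOINC starts
    return av


if __name__ == "__main__":
    entry = cuda_app_version("einsteinbinary_ABP1", 307,
                             "einsteinbinary_ABP1_3.07_windows_intelx86_cuda.exe")
    print(ET.tostring(entry, encoding="unicode"))
```

Lowering `ncpus` would be the knob to turn if the CUDA app turns out to need less than a full core.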

The app_info.xml has no entries for App versions older than the latest ones (3.05 / 3.07), so tasks already assigned to older Apps will error out.

The ABP1 CUDA App has undergone only limited testing, with BOINC Core Client 6.6.36 and NVidia driver 181.20 on a GTX285 system. We can confirm that the App doesn't crash immediately on such a setup, but not much more. We have already run into some mysteries of the CC scheduler, though.

Again, this is our very first experience with CUDA Applications, so please bear with us and don't expect everything to work right away.

Please test and report, and include important information (like the NVidia driver and Core Client version) in your posts.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2984403641
RAC: 738237

OK, I'm game to give it a try: will install on host 1001564 in a couple of hours, when existing work is complete. That's a Q6600 with a 9800GT CUDA card, running Windows XP SP3 (32-bit). BOINC is v6.6.37 and nVidia driver is 190.38 (current WHQL version). The host has done CUDA work for SETI, GPUGrid and AQUA already, so it (and I!) have experience of this sort of thing.

Some initial observations:

It's a shame you feel the need to devote a whole CPU core to the CUDA app, though understandable at this early stage of testing. I'll be watching to see how much CPU time is actually used, and seeing if the request can be turned down a bit.

I was surprised to see cudart.lib and cufft.lib in the download - other BOINC Windows CUDA applications distribute cudart.dll and cufft.dll. The version of cufft used, in particular, makes a tremendous difference to the speed of SETI tasks (v2.3 is about twice the speed of v2.1, with no change to the application). Another one to watch.

The app_info.xml looks correct, though the acid test of installation is yet to come. One potential problem comes with assessing the speed of the CUDA card/app - I suspect that people may see wild fluctuations in estimated runtimes, and the (shared) project DCF, as CPU and CUDA tasks run. Again, I'll watch and report.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251835422
RAC: 34578

RE: It's a shame you feel

Message 94029 in response to message 94028

Quote:
It's a shame you feel the need to devote a whole CPU core to the CUDA app, though understandable at this early stage of testing. I'll be watching to see how much CPU time is actually used, and seeing if the request can be turned down a bit.


Currently the application uses CUDA only for the FFT; the other calculations are still done on the CPU. I think the CPU part is still the bottleneck. This will change in the future.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2984403641
RAC: 738237

You definitely need those CUDA DLLs!

First task (135918825) errored instantly with an exit code -1073741515 (0xc0000135).
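The two numbers in that error are the same 32-bit value: BOINC printed it as a signed integer, and reinterpreting it as unsigned gives 0xC0000135, which Windows defines as STATUS_DLL_NOT_FOUND. That fits the missing cudart/cufft DLLs. A minimal Python sketch of the conversion (the lookup table deliberately holds only this one code, it is not a full NTSTATUS list):

```python
def to_hex_status(exit_code: int) -> str:
    """Reinterpret a signed 32-bit Windows exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# Only the code seen in this thread; not a complete NTSTATUS table.
KNOWN_STATUS = {0xC0000135: "STATUS_DLL_NOT_FOUND"}

code = -1073741515  # exit code reported for the first task
hex_code = to_hex_status(code)
print(hex_code, KNOWN_STATUS.get(int(hex_code, 16), "unknown"))
```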

I then added references to cudart.dll and cufft.dll to app_info.xml (duplicated the .lib entries), and copied the DLLs themselves from the nearest available project directory. With that amendment, the next task is running OK.
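The same amendment can be scripted. This is a minimal Python sketch, not the project's tooling: the helper `add_dlls` is hypothetical, it assumes each DLL is declared in a `<file_info>` with `<executable/>` like the .exe files, and it keeps the new `<file_info>` blocks ahead of the `<app_version>` blocks as a conservative ordering choice.

```python
import xml.etree.ElementTree as ET


def add_dlls(root, app_name, dlls):
    """Declare each DLL in a <file_info> and reference it from app_name's
    <app_version>, skipping entries that already exist (hypothetical helper)."""
    declared = {fi.findtext("name") for fi in root.findall("file_info")}
    first_av = root.find("app_version")
    pos = list(root).index(first_av) if first_av is not None else len(root)
    for dll in dlls:
        if dll not in declared:
            fi = ET.Element("file_info")
            ET.SubElement(fi, "name").text = dll
            ET.SubElement(fi, "executable")  # assumption: declared like the .exe files
            root.insert(pos, fi)  # keep file_info blocks ahead of app_version blocks
            pos += 1
    for av in root.findall("app_version"):
        if av.findtext("app_name") != app_name:
            continue
        referenced = {fr.findtext("file_name") for fr in av.findall("file_ref")}
        for dll in dlls:
            if dll not in referenced:
                fr = ET.SubElement(av, "file_ref")
                ET.SubElement(fr, "file_name").text = dll
    return root
```

Usage, presumably with BOINC stopped and from the Einstein@Home project directory: `tree = ET.parse("app_info.xml")`, then `add_dlls(tree.getroot(), "einsteinbinary_ABP1", ["cudart.dll", "cufft.dll"])`, then `tree.write("app_info.xml")`. The DLL files themselves still have to be copied into that directory. Running it twice changes nothing, because existing entries are skipped.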

I happened to grab the v2.3 DLLs (279 KB and 8,435 KB respectively) because they were nearest and I know my driver can handle them. To maintain compatibility down to driver 181.20, as in your OP, you'll need to stick to the v2.1 DLLs (188 KB and 380 KB). It will be interesting to see if that makes a difference to the speed of the ABP1 app (v2.3 doubled SETI's speed, made no difference at all at GPUGrid).

While typing that, task #2 has reached 5% in 21m 15s - a definite improvement on the 10h 30m the last CPU one took (though that was with the slow v3.07 build). You can watch task 135918835 to see if it reports overnight.
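As a rough check on that comparison: a naive linear extrapolation from 5% after 21m 15s gives about 7 hours for the whole task, versus 10h 30m for the last (slow v3.07) CPU task. BOINC's progress figure is not necessarily linear, so treat this Python sketch as back-of-envelope only:

```python
def projected_runtime_s(elapsed_s: float, fraction_done: float) -> float:
    """Naive linear extrapolation: total time = elapsed / fraction done."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done


def hms(seconds: float) -> str:
    """Format seconds as 'Xh MMm SSs'."""
    s = round(seconds)
    return f"{s // 3600}h {s % 3600 // 60:02d}m {s % 60:02d}s"


gpu_elapsed = 21 * 60 + 15       # 21m 15s to reach 5%
cpu_total = 10 * 3600 + 30 * 60  # last CPU-only task (v3.07 build)
est = projected_runtime_s(gpu_elapsed, 0.05)
print(hms(est), f"speed-up ~{cpu_total / est:.2f}x")
```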

Oh, and yes, it is taking practically a whole CPU core as well...

Mark Henderson
Joined: 19 Feb 05
Posts: 34
Credit: 39095897
RAC: 3437

I could not get a WU to even start on XP64 until I did as described in the previous post. The error was "There are no child processes to wait for. (0x80) - exit code 128 (0x80)".

I copied the two CUDA 2.3 DLLs from the SETI folder to the Einstein folder and added references to them. It is running well now. Thanks, Richard, for the idea.
CPU usage is about 50% of one core: 13% of the whole machine on a quad-core works out to about 50% of one core.
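Task Manager reports CPU load as a share of the whole machine, so on a quad-core each core counts as 25%. A small Python sketch of the conversion, also comparing the measured load with the full core reserved by `avg_ncpus` = 1.0 in the ABP1 app_version (the "unused" figure assumes that reservation really keeps one CPU task from running):

```python
def share_of_one_core(total_cpu_percent: float, n_cores: int) -> float:
    """Convert a whole-machine CPU percentage (as Task Manager shows it)
    into a fraction of one core."""
    return total_cpu_percent * n_cores / 100.0


used = share_of_one_core(13, 4)  # 13% on a quad-core
reserved = 1.0                   # avg_ncpus for the ABP1 CUDA app
print(f"ABP1 uses about {used:.2f} of a core; "
      f"about {reserved - used:.2f} of the reserved core goes unused")
```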

Would the two CUDA .lib files even be necessary now with this arrangement?

Here is my app_info.xml:

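<app_info>
  <app>
    <name>einstein_S5R5</name>
  </app>
  <file_info>
    <name>einstein_S5R5_3.05_windows_intelx86.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>einstein_S5R5_3.05_windows_intelx86_0.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>einstein_S5R5_3.05_windows_intelx86_1.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>einstein_S5R5_3.05_windows_intelx86_2.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>einstein_S5R5_3.05_graphics_windows_intelx86.exe</name>
    <executable/>
  </file_info>
  <app>
    <name>einsteinbinary_ABP1</name>
  </app>
  <file_info>
    <name>einsteinbinary_ABP1_3.07_graphics_windows_intelx86.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>einsteinbinary_ABP1_3.07_windows_intelx86_cuda.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>cudart.lib</name>
    <executable/>
  </file_info>
  <file_info>
    <name>cufft.lib</name>
    <executable/>
  </file_info>
  <file_info>
    <name>cudart.dll</name>
    <executable/>
  </file_info>
  <file_info>
    <name>cufft.dll</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>einstein_S5R5</app_name>
    <version_num>305</version_num>
    <api_version>6.3.0</api_version>
    <file_ref>
      <file_name>einstein_S5R5_3.05_windows_intelx86.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>einstein_S5R5_3.05_windows_intelx86_0.exe</file_name>
    </file_ref>
    <file_ref>
      <file_name>einstein_S5R5_3.05_windows_intelx86_1.exe</file_name>
    </file_ref>
    <file_ref>
      <file_name>einstein_S5R5_3.05_windows_intelx86_2.exe</file_name>
    </file_ref>
    <file_ref>
      <file_name>einstein_S5R5_3.05_graphics_windows_intelx86.exe</file_name>
      <open_name>graphics_app</open_name>
    </file_ref>
  </app_version>
  <app_version>
    <app_name>einsteinbinary_ABP1</app_name>
    <version_num>307</version_num>
    <plan_class>cuda</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <max_ncpus>1.0</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <api_version>6.7.0</api_version>
    <file_ref>
      <file_name>einsteinbinary_ABP1_3.07_windows_intelx86_cuda.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>einsteinbinary_ABP1_3.07_graphics_windows_intelx86.exe</file_name>
      <open_name>graphics_app</open_name>
    </file_ref>
    <file_ref>
      <file_name>cudart.lib</file_name>
    </file_ref>
    <file_ref>
      <file_name>cufft.lib</file_name>
    </file_ref>
    <file_ref>
      <file_name>cudart.dll</file_name>
    </file_ref>
    <file_ref>
      <file_name>cufft.dll</file_name>
    </file_ref>
  </app_version>
</app_info>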

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 2

RE: Error was "There are no

Message 94032 in response to message 94031

Quote:
Error was "There are no child processes to wait for. (0x80) - exit code 128 (0x80)"


Mainly for Bernd: This error code crops up when the user is using an outdated DirectX version (outdated against the machine you built the applications on). Minimum needed for BOINC is DirectX 9.0c.

It also happens when you build the applications on a machine that has a version of .NET installed that users do not have, or when you used a compiler with additional dependencies on that .NET version. You may not even have needed .NET to build the applications; the additional dependencies may be a remnant of something done earlier with the same compiler.

Either way, the .NET version on the build machine then needs to be on the user's machine as well: if you used 3.5 and he has 2.0, he will get that error. The only way around that is to remove .NET as an additional dependency (probably best by uninstalling it completely) before you recompile the application.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Hi!

I'm running a 9600GT, driver version 190.38, under Windows 7 RC 64-bit (this host).
I have added the .dll's (v2.3) as Richard and Mark noted.

I've been running SETI's CUDA app for quite some time now and all seems fine.
When I try to run this one, I get an error like this in BOINC:

04/08/2009 09:56:03 Einstein@Home [sched_op_debug] Starting scheduler request
04/08/2009 09:56:03 Einstein@Home Sending scheduler request: Requested by user.
04/08/2009 09:56:03 Einstein@Home Reporting 1 completed tasks, requesting new tasks for GPU
04/08/2009 09:56:03 Einstein@Home [sched_op_debug] CPU work request: 0.00 seconds; 0 idle CPUs
04/08/2009 09:56:03 Einstein@Home [sched_op_debug] CUDA work request: 8640.86 seconds; 1 idle GPUs
04/08/2009 09:56:08 Einstein@Home Scheduler request completed: got 1 new tasks
04/08/2009 09:56:08 Einstein@Home [sched_op_debug] Server version 607
04/08/2009 09:56:08 Einstein@Home Message from server: To get more Einstein@Home work, finish current work, stop BOINC, remove app_info.xml file, and restart.
04/08/2009 09:56:08 Einstein@Home Project requested delay of 14400 seconds
04/08/2009 09:56:08 Einstein@Home [sched_op_debug] estimated total CPU job duration: 0 seconds
04/08/2009 09:56:08 Einstein@Home [sched_op_debug] estimated total CUDA job duration: 31835 seconds
04/08/2009 09:56:08 Einstein@Home [sched_op_debug] handle_scheduler_reply(): got ack for result p2030_53648_85868_0088_G63.91+00.63.N_2.dm_1_1
04/08/2009 09:56:08 Einstein@Home [sched_op_debug] Deferring communication for 4 hr 0 min 0 sec
04/08/2009 09:56:08 Einstein@Home [sched_op_debug] Reason: requested by project
04/08/2009 09:56:10 Einstein@Home Started download of p2030_53648_85868_0088_G63.91+00.63.N_2_150.binary
04/08/2009 09:56:10 Einstein@Home [file_xfer_debug] URL: http://einstein-dl.aei.uni-hannover.de/EinsteinAtHome/download/16e/p2030_53648_85868_0088_G63.91+00.63.N_2_150.binary
04/08/2009 09:56:12 Einstein@Home work fetch suspended by user
04/08/2009 09:56:13 Einstein@Home [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
04/08/2009 09:56:13 Einstein@Home [file_xfer_debug] file transfer status 0
04/08/2009 09:56:13 Einstein@Home Finished download of p2030_53648_85868_0088_G63.91+00.63.N_2_150.binary
04/08/2009 09:56:13 Einstein@Home [file_xfer_debug] Throughput 802677 bytes/sec
04/08/2009 09:56:14 Einstein@Home Starting p2030_53648_85868_0088_G63.91+00.63.N_2.dm_150_0
04/08/2009 09:56:14 Einstein@Home [cpu_sched] Starting p2030_53648_85868_0088_G63.91+00.63.N_2.dm_150_0 (initial)
04/08/2009 09:56:14 Einstein@Home Starting task p2030_53648_85868_0088_G63.91+00.63.N_2.dm_150_0 using einsteinbinary_ABP1 version 307
04/08/2009 09:56:19 Einstein@Home Computation for task p2030_53648_85868_0088_G63.91+00.63.N_2.dm_150_0 finished
04/08/2009 09:56:19 Einstein@Home Output file p2030_53648_85868_0088_G63.91+00.63.N_2.dm_150_0_0 for task p2030_53648_85868_0088_G63.91+00.63.N_2.dm_150_0 absent

In the task details I found this:

[09:56:17][5804][ERROR] Error during CUDA host->device data transfer (unknown error)
[09:56:17][5804][ERROR] Demodulation failed (error: 3)!

Any ideas on what went wrong?

Thanks
Holmis

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2984403641
RAC: 738237

Second CUDA task completed and reported. Not very impressive for speed:

- CUDA time plus CPU time (both resources used) adds up to more than it would have taken on the CPU alone! Still, it ran and reported success - we just have to await the arrival of a wingmate to see if it validates. See my task list for easy monitoring.

Notes:

I checked, and this host has no version of .NET installed, so no .NET dependencies can have crept in from the development machine. The errors Mark and I saw seem to be caused simply by the missing CUDA DLLs: interesting that the messages are so different for x86 and x64. (I assume that the test machine in your lab had a development SDK or suchlike installed, and was able to find a shared set of DLLs associated with that.)

stderr says that my CUDA card is "GeForce 9800 GT" (508.03 GFLOPS). Very flattering, and nVidia's marketing department would be very proud of you. David Anderson has created a new definition of real-world BOINC GFLOPS, and this card scores 60 on that scale.
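To put the two figures side by side, a tiny Python sketch of the ratio between the peak figure printed in stderr and the real-world BOINC score quoted for the same card (both numbers are taken from this post; nothing else is assumed):

```python
peak_gflops = 508.03  # figure printed in stderr by the ABP1 app
boinc_gflops = 60.0   # "real-world BOINC GFLOPS" score for the same card
ratio = peak_gflops / boinc_gflops
print(f"stderr figure is about {ratio:.1f}x the real-world score")
```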

Something to watch out for as you transfer more and more of the computation onto the CUDA card will be intensive CUDA operations getting in the way of the card's other function of rendering the display, and making the computer feel 'laggy' and unresponsive. I'm pleased to say there's no sign of that yet, at least at the beginning and end of a task where I tested it.

Edit: Like Holmis, I'm getting that "Message from server: To get more Einstein@Home work, finish current work, stop BOINC, remove app_info.xml file, and restart.", plus the 4-hour deferral. But both types of work are being downloaded and run OK - no problems with the S5R5 part of the app_info.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251835422
RAC: 34578

The lib/dll thing was my mistake: the package worked on our machines because CUDA was installed there.

I replaced the .libs with the (hopefully) correct .dlls in the package and in the app_info.xml (and updated the md5 sum on the Beta page).

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2984403641
RAC: 738237

RE: (hopefully) correct

Message 94036 in response to message 94035

Quote:
(hopefully) correct .dlls ....


Well, they're smaller than the v2.2 files, but much bigger than the sizes I posted last night. Maybe these are v2.1 (unfortunately there's no embedded Windows versioning information) and last night's were 2.0. I took the sizes from the original SETI downloads when BOINC/CUDA was launched in December 2008.

Anyway, they work with my driver - just replaced the files and started another run. We'll find out if Einstein is speed-sensitive to the FFT version in about six hours (or longer....)

Mark Henderson
Joined: 19 Feb 05
Posts: 34
Credit: 39095897
RAC: 3437

Message 94037 in response to message 94035

Bernd,
Would you rather we test this new app with the project-supplied CUDA 2.1 DLLs, or is it fine to go with the 2.3 DLLs if we have them?

Also, the Beta download page says:
New work should get assigned to the new App einstein_S5R4_6.10.
Is this incorrect?

Quote:

The lib/dll thing was my mistake, the package did work on our machines because CUDA was installed there.

I replaced the .libs by the (hopefully) correct .dlls in the package and in the app_info.xml (and updated the md5 sum on the Beta page).

BM
