Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6"

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 729112417
RAC: 1194171

I have deprecated the BRP6 CUDA beta app versions for now; we will look into this issue in more detail on Monday.

HB

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958812867
RAC: 714592

That's the sort of thing I want to look into. Process Explorer confirms that --device 0 is being passed at the tail of the command line for Parkes v1.39, though I haven't yet found, in either the command line or init_data.xml, whether this means NVIDIA device zero or Intel device zero. I have both in the affected host.

I'm going to try passing API version 7.5.0 to v1.39 to see what messages are sent, and what the reaction is, though the failure happens so quickly I may not be able to catch it. If I can, I'll try rebranding some v1.39 tasks to v1.47 with a lower API version, and repeat.

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

Quote:
I'm going to try passing 7.5.0 to v1.39 to see what messages are sent, and what the reaction is - though the failure happens so quickly I may not be able to catch it. If I can, I'll try rebranding some v1.39 tasks to v1.47 with a lower API, and repeat.


Try using --exit_after_app_start 1

http://albertathome.org/goto/comment/80543

Claggy

Bikeman (Heinz-Bernd Eggenstein)

Quote:

I'm going to try passing 7.5.0 to v1.39 to see what messages are sent, and what the reaction is - though the failure happens so quickly I may not be able to catch it.

I guess you could make an app_config.xml file to add the command line option in question, just to make sure the tasks never fail. You could then see if BOINC adds another instance of the --device option or not.

Cheers
HB

Richard Haselgrove

Ta. I'll try that, though it'll be messy on a multi-core, multi-GPU machine, which may be a clue, because that's the only one I'm having errors on.

Meanwhile, I've polished my glasses, and found the GPU selector in init_data.xml for v1.39:

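<gpu_type>NVIDIA</gpu_type>
<gpu_device_num>0</gpu_device_num>
<gpu_opencl_dev_index>0</gpu_opencl_dev_index>
<gpu_usage>0.500000</gpu_usage>
<ncpus>0.040000</ncpus>
<rsc_fpops_est>590000000000000.000000</rsc_fpops_est>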


etc. - hidden between the User info and the Host info.

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186610495
RAC: 0

I changed the API setting from 7.5.0 to 7.2.2 as suggested, and the tasks ran after that (with the --device parameter). I changed it back and all of them crashed immediately, and I caught a glimpse of one without --device.

(Edit: Removed an incorrect statement)

Richard Haselgrove

Likewise. I now have two v1.39 downloads, running under v1.47 by setting the API version to 7.2.2 and rebranding.

I messed up the initial setting of --device 0 via app_config.xml (not easy to spot with a command-line BOINC; thanks are due to BOINC Manager for the GUI Notice about the stray character), but I know it's right now because two copies of --device show up on the command line of the running tasks, and the app doesn't seem to mind that. I'll let these run and finish in peace, then try again with the API 7.5.0 setting for the next pair.

Bikeman (Heinz-Bernd Eggenstein)

While beta-testing this new app, experienced GPU users might want to keep an eye on GPU temperature and fan speed (= noise level) in comparison to the current official apps. During limited testing with some older cards in the medium price and performance range, we saw the GPUs get up to about 10 °C hotter with this app version. Of course this varies widely with the different cards and cooling systems used, but it can't hurt to keep an eye (and ear) on this, just to know what to expect, especially for the high-performance cards.

HB

Bill592
Joined: 25 Feb 05
Posts: 786
Credit: 70825065
RAC: 0

Quote:

While beta-testing this new app, experienced GPU users might want to keep an eye on GPU temperature and fan speed (= noise level) in comparison to the current official apps. During limited testing with some older cards in the medium price and performance range, we saw the GPUs get up to about 10 °C hotter with this app version. Of course this varies widely with the different cards and cooling systems used, but it can't hurt to keep an eye (and ear) on this, just to know what to expect, especially for the high-performance cards.

HB

Yep! Mine is running about 5°C hotter, and the computer is drawing 55 W more power from the wall.
That is with two tasks running at once on a Radeon 7970.

Thanks, Herr Bikeman :)

Bikeman (Heinz-Bernd Eggenstein)

Quote:

Yep! Mine is running about 5°C hotter, and the computer is drawing 55 W more power from the wall.
That is with two tasks running at once on a Radeon 7970.


55 W is a bit unexpected, but hey, it's for science ...

Back to the API issue...

I was curious how CUDA device assignment is now handled in the sample CUDA app supplied with BOINC. Surely it has some best-practice code to copy and paste ;-). I found this in samples/nvcuda/cuda.cpp:

// NOTE: As currently written, this sample is of limited usefulness, as it
// is missing two important features:
// * Code to determine the correct device assigned by BOINC.  It needs to get
//   the device number from the gpu_opencl_dev_index field of init_data.xml
//   if it exists, else from the gpu_device_num field of init_data.xml if that
//   exists, else from the --device or -device argument passed by the client.
//   See api/boinc_opencl.cpp for code which does this.
// * Code to select which NVIDIA GPU to use if there are more than one on the
//   system; it needs to call cudaSetDevice().

Oh well...

But really... what use is it to look at gpu_opencl_dev_index first? For a CUDA app this would be wrong, I guess?

Let's hide this stuff behind an API helper function, just like for OpenCL.
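
A minimal sketch of what such a helper might look like for a CUDA app, just to make the idea concrete. This is not BOINC API code: InitDataGpu, device_from_args and pick_cuda_device are hypothetical names, and the struct is only a stand-in for the gpu_device_num field of init_data.xml. It also assumes (as guessed above) that a CUDA app should skip gpu_opencl_dev_index, and that the first --device wins if the option shows up twice.

```cpp
// Hypothetical sketch, not part of the BOINC API.
#include <climits>
#include <cstdlib>
#include <cstring>

// Stand-in for the GPU fields of init_data.xml; -1 means "field not present".
struct InitDataGpu {
    int gpu_device_num = -1;
};

// Returns N from "--device N" or "-device N", or -1 if absent or not a number.
// Assumption: if the option appears more than once, the first occurrence wins.
int device_from_args(int argc, const char* const* argv) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") == 0 ||
            std::strcmp(argv[i], "-device") == 0) {
            char* end = nullptr;
            long n = std::strtol(argv[i + 1], &end, 10);
            // Accept only a complete, non-negative integer that fits in an int.
            if (end != argv[i + 1] && *end == '\0' && n >= 0 && n <= INT_MAX) {
                return static_cast<int>(n);
            }
        }
    }
    return -1;
}

// Precedence for a CUDA app in this sketch: gpu_device_num from init_data.xml
// if present, else the command-line option. gpu_opencl_dev_index is
// deliberately ignored here (an assumption, see above).
int pick_cuda_device(const InitDataGpu& aid, int argc, const char* const* argv) {
    if (aid.gpu_device_num >= 0) {
        return aid.gpu_device_num;
    }
    return device_from_args(argc, argv);
}
```

The app would then pass the result to cudaSetDevice() and treat -1 as "no device assigned by the client".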

Cheers
HB
