Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6"

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 729112417

RAC: 1194171

28 Feb 2015 15:38:40 UTC

Message 129851

I deprecated the BRP6 CUDA beta app versions for now, we will look into this issue in more detail on Monday.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2958812867

RAC: 714592

28 Feb 2015 15:43:02 UTC

Message 129852 in response to message 129850

That's the sort of thing I want to look into. Process Explorer confirms that we are passing --device 0 at the tail of the command line for Parkes v1.39, though I haven't yet found, in either the command line or init_data.xml, whether this means NVIDIA device zero or Intel device zero. I have both in the affected host.

I'm going to try passing 7.5.0 to v1.39 to see what messages are sent, and what the reaction is - though the failure happens so quickly I may not be able to catch it. If I can, I'll try rebranding some v1.39 tasks to v1.47 with a lower API, and repeat.

Claggy

Joined: 29 Dec 06

Posts: 560

Credit: 2699403

RAC: 0

28 Feb 2015 15:46:58 UTC

Message 129853 in response to message 129852

Quote:

I'm going to try passing 7.5.0 to v1.39 to see what messages are sent, and what the reaction is - though the failure happens so quickly I may not be able to catch it. If I can, I'll try rebranding some v1.39 tasks to v1.47 with a lower API, and repeat.

Try using --exit_after_app_start 1

http://albertathome.org/goto/comment/80543

Claggy

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 729112417

RAC: 1194171

28 Feb 2015 15:55:31 UTC

Message 129854 in response to message 129852

Quote:

I'm going to try passing 7.5.0 to v1.39 to see what messages are sent, and what the reaction is - though the failure happens so quickly I may not be able to catch it.

I guess you could make an app_config.xml file to add the command line option in question, just to make sure the tasks never fail. You could then see if BOINC adds another instance of the --device option or not.

Cheers
HB

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2958812867

RAC: 714592

28 Feb 2015 15:58:51 UTC

Message 129855 in response to message 129853

Ta. I'll try that, though it'll be messy on a multi-core, multi-GPU machine, which may be a clue, because that's the only one I'm having errors on.

Meanwhile, I've polished my glasses, and found the GPU selector in init_data.xml for v1.39:

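<gpu_type>NVIDIA</gpu_type>
<gpu_device_num>0</gpu_device_num>
<gpu_opencl_dev_index>0</gpu_opencl_dev_index>
<gpu_usage>0.500000</gpu_usage>
<ncpus>0.040000</ncpus>
<rsc_fpops_est>590000000000000.000000</rsc_fpops_est>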

etc. - hidden between the User info and the Host info.

floyd

Joined: 12 Sep 11

Posts: 133

Credit: 186610495

RAC: 0

28 Feb 2015 16:20:52 UTC

Message 129856

I changed the API setting from 7.5.0 to 7.2.2 as suggested and the tasks were running after that (with --device parameter). Changed it back and all crashed immediately, and I caught a glimpse of one without --device.

(Edit: Removed an incorrect statement)

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2958812867

RAC: 714592

28 Feb 2015 18:16:39 UTC

Message 129857

Likewise. I now have two v1.39 downloads, running under v1.47 by setting the API version to 7.2.2 and rebranding.

I messed up the initial setting of --device 0 via app_config (not easy to spot with a command-line BOINC - thanks are due to BOINC Manager for the GUI Notice about the stray character), but I know it's right now because I've got two copies of the option showing for the running tasks - the app doesn't seem to mind that. I'll let these run and finish in peace, then try again with the API 7.5.0 setting for the next pair.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 729112417

RAC: 1194171

28 Feb 2015 19:38:05 UTC

Message 129858

While beta-testing this new app, experienced GPU users might want to keep an eye on GPU temperature and fan speed (= noise level) in comparison to the current official apps. During limited testing with some older cards in the medium price and performance range, we saw the GPUs get up to about 10 °C hotter with this app version. Of course this varies widely with the different cards and cooling systems used, but it can't hurt to keep an eye (and ear) on this, just to know what to expect, especially for the high-performance cards.

Bill592

Joined: 25 Feb 05

Posts: 786

Credit: 70825065

RAC: 0

28 Feb 2015 19:51:21 UTC

Message 129859 in response to message 129858

Quote:

While beta-testing this new app, experienced GPU users might want to keep an eye on GPU temperature and fan speed (= noise level) in comparison to the current official apps. During limited testing with some older cards in the medium price and performance range, we saw the GPUs get up to about 10 °C hotter with this app version. Of course this varies widely with the different cards and cooling systems used, but it can't hurt to keep an eye (and ear) on this, just to know what to expect, especially for the high-performance cards.

HB

Yep! Mine is running about 5°C hotter, and the computer is
drawing 55 watts more power from the wall.
That is running two tasks at once on a Radeon 7970.

Thanks Herr Bikeman )

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 729112417

RAC: 1194171

28 Feb 2015 21:07:42 UTC

Message 129860 in response to message 129859

Quote:

Yep! Mine is running about 5°C hotter, and the computer is
drawing 55 watts more power from the wall.
That is running two tasks at once on a Radeon 7970.

55 W is a bit unexpected, but hey, it's for science ...

Back to the API issue...

I was curious how CUDA device assignment is now handled in the BOINC-supplied sample CUDA app (surely it has some best-practice code to copy and paste ;-) ), and I found this in samples/nvcuda/cuda.cpp:

// NOTE: As currently written, this sample is of limited usefulness, as it
// is missing two important features:
// * Code to determine the correct device assigned by BOINC.  It needs to get
//   the device number from the gpu_opencl_dev_index field of init_data.xml
//   if it exists, else from the gpu_device_num field of init_data.xml if that
//   exists, else from the --device or -device argument passed by the client.
//   See api/boinc_opencl.cpp for code which does this.
// * Code to select which NVIDIA GPU to use if there are more than one on the
//   system; it needs to call cudaSetDevice().

Oh well...

But really.... what use is it to look at gpu_opencl_dev_index first? For a CUDA app this would be wrong, I guess???

Let's hide this stuff behind an API helper function, just like for OpenCL.
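
For what it's worth, the lookup order in that comment, adjusted for the CUDA objection above, could be hidden behind a helper roughly like this. It is only a sketch, not BOINC's actual API: GpuInitData, GpuApi, device_from_cmdline and pick_gpu_device are made-up names, -1 as the marker for a missing field is an assumption, and letting the last --device win when the option shows up twice is an arbitrary choice.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the two init_data.xml fields discussed above.
// Assumption: -1 means "field not present".
struct GpuInitData {
    int gpu_device_num = -1;
    int gpu_opencl_dev_index = -1;
};

enum class GpuApi { Cuda, OpenCL };

// Scan argv for "--device N" or "-device N".
// If the option appears more than once, the last occurrence wins.
// Returns -1 if the option is not present.
int device_from_cmdline(int argc, const char* const* argv) {
    int found = -1;
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") == 0 ||
            std::strcmp(argv[i], "-device") == 0) {
            found = std::atoi(argv[i + 1]);
            ++i;  // skip the value we just consumed
        }
    }
    return found;
}

// Pick the device index for the given API.
// Only an OpenCL app looks at gpu_opencl_dev_index; a CUDA app starts
// with gpu_device_num. Both fall back to the command line.
// Returns -1 if no source names a device.
int pick_gpu_device(GpuApi api, const GpuInitData& init,
                    int argc, const char* const* argv) {
    if (api == GpuApi::OpenCL && init.gpu_opencl_dev_index >= 0) {
        return init.gpu_opencl_dev_index;
    }
    if (init.gpu_device_num >= 0) {
        return init.gpu_device_num;
    }
    return device_from_cmdline(argc, argv);
}
```

A CUDA app would then hand a non-negative result to cudaSetDevice(), as the sample's comment asks, and decide for itself what to do when the helper returns -1.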

Cheers
HB
