I deprecated the BRP6 CUDA
I have deprecated the BRP6 CUDA beta app versions for now; we will look into this issue in more detail on Monday.
HB
That's the sort of thing I
That's the sort of thing I want to look into. Process Explorer confirms that we are passing --device 0 on the tail of the command line for Parkes v1.39 - though I haven't yet found, in either the command line or init_data.xml, whether this means NVIDIA device zero or Intel device zero - I have both in the affected host.
I'm going to try passing 7.5.0 to v1.39 to see what messages are sent, and what the reaction is - though the failure happens so quickly I may not be able to catch it. If I can, I'll try rebranding some v1.39 tasks to v1.47 with a lower API, and repeat.
RE: I'm going to try
Try using --exit_after_app_start 1
http://albertathome.org/goto/comment/80543
Claggy
RE: I'm going to try
I guess you could make an app_config.xml file to add the command line option in question, just to make sure the tasks never fail. You could then see if BOINC adds another instance of the --device option or not.
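For reference, a minimal sketch of what such an app_config.xml might look like. The app name and plan class are placeholders (assumptions, not values from this thread); the real ones would have to be looked up in client_state.xml. The file would go in the project's directory under the BOINC data directory, and the client needs to re-read its config files (or be restarted) before it takes effect.

```xml
<!-- Sketch only: APP_NAME and PLAN_CLASS are placeholders, not real values
     from this thread; take the actual ones from client_state.xml. -->
<app_config>
  <app_version>
    <app_name>APP_NAME</app_name>
    <plan_class>PLAN_CLASS</plan_class>
    <cmdline>--device 0</cmdline>
  </app_version>
</app_config>
```

With this in place, the task's command line should show whether the client appends its own --device option in addition to the one given here.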
Cheers
HB
Ta. I'll try that, though
Ta. I'll try that, though it'll be messy on a multi-core, multi-gpu machine: which may be a clue, because that's the only one I'm having errors on.
Meanwhile, I've polished my glasses, and found the GPU selector in init_data.xml for v1.39:
etc. - hidden between the User info and the Host info.
I changed the API setting
I changed the API setting from 7.5.0 to 7.2.2 as suggested and the tasks were running after that (with --device parameter). Changed it back and all crashed immediately, and I caught a glimpse of one without --device.
(Edit: Removed an incorrect statement)
Likewise. I now have two
Likewise. I now have two v1.39 downloads, running under v1.47 by setting the API version to 7.2.2 and rebranding.
I messed up the initial setting of --device 0 via app_config (not easy to spot with a commandline BOINC - thanks due to BOINC Manager for the GUI Notice about the stray character), but I know it's right now because I've got two copies showing for the running tasks - the app doesn't seem to mind that. I'll let these run and finish in peace, then try again with the API 7.5.0 setting for the next pair.
While beta-testing this new
While beta-testing this new app, experienced GPU users might want to keep an eye on GPU temperature and fan speed (= noise level) in comparison to the current official apps. During limited testing with some older cards in the medium price and performance range, we saw the GPUs get up to about 10 °C hotter with this app version. Of course this varies widely with the different cards and cooling systems used, but it can't hurt to keep an eye (and ear) on this, just to know what to expect, especially for the high-performance cards.
HB
RE: While beta-testing this
Yep! Mine is running about 5°C hotter, and the computer is drawing 55 watts more power from the wall.
That is with two tasks running at once on a Radeon 7970.
Thanks, Herr Bikeman
RE: Yep ! Mine is
55 W is a bit unexpected, but hey, it's for science ...
Back to the API issue...
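I was curious how CUDA device assignment is now handled in the BOINC-supplied sample CUDA app, surely it has some best-practice code to copy and paste ;-), and I found this in samples/nvcuda/cuda.cpp:
// NOTE: As currently written, this sample is of limited usefulness, as it
// is missing two important features:
// * Code to determine the correct device assigned by BOINC. It needs to get
// the device number from the gpu_opencl_dev_index field of init_data.xml
// if it exists, else from the gpu_device_num field of init_data.xml if that
// exists, else from the --device or -device argument passed by the client.
// See api/boinc_opencl.cpp for code which does this.
// * Code to select which NVIDIA GPU to use if there are more than one on the
// system; it needs to call cudaSetDevice().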
Oh well...
But really.... what use is it to look at gpu_opencl_dev_index first? For a CUDA app this would be wrong, I guess???
Let's hide this stuff behind an API helper function, just like for OpenCL.
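A rough sketch of what such a helper might look like for a CUDA app, just to make the lookup order concrete. Everything here is hypothetical: GpuInitData is a stand-in for the two init_data.xml fields named in the sample's comment, not the real BOINC structure, and the function names are made up. It assumes (as guessed above, not confirmed) that gpu_opencl_dev_index should be ignored for CUDA, so it checks gpu_device_num first and falls back to --device / -device. If the option appears twice, the last one wins, which is an arbitrary choice.

```cpp
#include <climits>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the two init_data.xml fields discussed above.
// A negative value means "field not present".
struct GpuInitData {
    int gpu_device_num = -1;
    int gpu_opencl_dev_index = -1;  // present, but unused by the CUDA path below
};

// Look for "--device N" or "-device N" on the command line.
// Returns -1 if the option is absent or N is not a non-negative integer.
// If the option is given more than once, the last valid one wins.
int device_from_cmdline(int argc, const char* const* argv) {
    int device = -1;
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") == 0 ||
            std::strcmp(argv[i], "-device") == 0) {
            char* end = nullptr;
            long v = std::strtol(argv[i + 1], &end, 10);
            if (end != argv[i + 1] && *end == '\0' && v >= 0 && v <= INT_MAX) {
                device = static_cast<int>(v);
            }
        }
    }
    return device;
}

// CUDA path: prefer gpu_device_num, else the command line, else -1.
// gpu_opencl_dev_index is deliberately not consulted (assumption, see above).
int select_cuda_device(const GpuInitData& init, int argc,
                       const char* const* argv) {
    if (init.gpu_device_num >= 0) {
        return init.gpu_device_num;
    }
    return device_from_cmdline(argc, argv);
}
```

The app would then pass a non-negative result to cudaSetDevice(), and treat -1 as "no device assigned" rather than silently defaulting to device 0.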
Cheers
HB