Thanks @petri33 & @Ian&Steve C. for making this available. I'm fairly new to BOINC, so I don't quite get the nuances of configuring the Anonymous Platform.
Though I tried to set two tasks per GPU in app_info.xml, on systems with 8GB VRAM the second task would fail with a Computational Error once I reloaded boinc-client:
<pre>
<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message>
process exited with code 65 (0x41, -191)</message>
<stderr_txt>
13:29:23 (3697396): [normal]: This Einstein@home App (v1.0 by petri33) was built at: Apr 28 2022 18:47:15
13:29:23 (3697396): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0'.
13:29:23 (3697396): [debug]: 1e+16 fp, 6.4e+09 fp/s, 1647640 s, 457h40m40s30
13:29:23 (3697396): [normal]: % CPU usage: 1.000000, GPU usage: 0.500000
command line: ../../projects/einstein.phys.uwm.edu/HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0 --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3012L11.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 764.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3012L11_0772_33406920.dat --debug 0 -o LATeah3012L11_772.0_0_0.0_33406920_0_0.out
output files: 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out' '../../projects/einstein.phys.uwm.edu/LATeah3012L11_772.0_0_0.0_33406920_0_0' 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah3012L11_772.0_0_0.0_33406920_0_1'
13:29:23 (3697396): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
13:29:23 (3697396): [debug]: glibc version/release: 2.35/stable
13:29:23 (3697396): [debug]: Set up communication with graphics process.
EAH_SLEEP file found, value 0
kernel_compact 256 threads
kernel_raz 256 threads
kernel_ts_2_phase_diff_sorted 64 threads
kernel_prepare_power_toplist 256 threads
kernel_prepareSort 1024 threads
kernel_SortedPhoton 64 threads
kernel_setupPhotonPairsArray 64 threads
kernel_extractPhotonIndex 512 threads
Eah sleep true, 0
boinc_get_opencl_ids returned [0x55efcd653c40 , 0x55efcd649af0]
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "NVIDIA GeForce GTX 1070 Ti" by: NVIDIA Corporation
Max allocation limit: 2127691776
Global mem size: 8510767104
Could not open file: /tmp/dep-b9eb4b.d
OpenCL device has FP64 support
Could not open file: /tmp/dep-272e61.d
SemiCoh mode 0 start
skypoints(1)read_checkpoint(): Couldn't open file 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out.cpt': No such file or directory (2)
skypoint loop(1)
S0:
binpoints loop 639
set_up_fft samples:16777216
% fft length: 16777216(0x1000000)
Using alternate fft kernel file: ../../clfft.kernel.Transpose2.cl.alt
Could not open file: /tmp/dep-a7ca6a.d
Using alternate fft kernel file: ../../clfft.kernel.Stockham3.cl.alt
Could not open file: /tmp/dep-ee75f4.d
Using alternate fft kernel file: ../../clfft.kernel.Transpose4.cl.alt
Could not open file: /tmp/dep-3d35bf.d
Using alternate fft kernel file: ../../clfft.kernel.Stockham5.cl.alt
Could not open file: /tmp/dep-e68c2c.d
Using alternate fft kernel file: ../../clfft.kernel.Transpose6.cl.alt
Could not open file: /tmp/dep-743a39.d
% Scratch buffer size: 136314880
ZError in OpenCL context: Unknown error executing clFlush on NVIDIA GeForce GTX 1070 Ti (Device 0).
... {above repeated many times } ...
Failed to allocate tmp buffer for photon data
13:29:28 (3697396): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags:
mv: cannot stat 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out': No such file or directory
mv: cannot stat 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out.cohfu': No such file or directory
13:29:28 (3697396): [normal]: done. calling boinc_finish(65).
13:29:28 (3697396): called boinc_finish(65)
Warning: Program terminating, but clFFT resources not freed.
Please consider explicitly calling clfftTeardown( ).
</stderr_txt>
]]></pre>
Still, running just 1 Task/GPU I'm seeing a 45% decrease in time compared to the stock application (EVGA 1070ti @ 90W; Ubuntu 22.04 LTS; NVIDIA 510.73.05) and similar gains on a 2060, 2060 Super and a 1660ti.
Thanks Ian and Petri. The v0.95 app is up and running on my 3 gpus. Well done!
Nothing unusual so far. Will keep an eye on error ratio.
You don't change task concurrency in the coproc_info.xml file. That file is autogenerated when the client detects the system's GPUs. It is not meant to be edited by the user.
To crunch multiple tasks concurrently on the GPU, make the change either in the project's Computing Preferences settings or in an app_config.xml file, which the user has to write.
So change either here:
Project Preferences >> GPU utilization factor of FGRP apps: 1.00 >> 0.50
or here:
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.9</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
The app_config.xml file goes into the project directory >> einstein.phys.uwm.edu
Choose one or the other method. Not both. Project Preferences is the easiest.
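If you would rather generate the file than type it by hand, here is a minimal Python sketch that builds the same app_config.xml text for a chosen number of tasks per GPU. The function name `build_app_config` and its parameters are made up for illustration. The app name and the 0.9 CPU value are simply the ones from the example above.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float) -> str:
    """Return app_config.xml text that asks for `tasks_per_gpu` tasks per GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of a GPU each task claims: 2 tasks/GPU -> 0.5
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpu_usage:g}"

    # ET.indent only exists on Python 3.9+; older versions emit a single line.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


xml_text = build_app_config("hsgamma_FGRPB1G", tasks_per_gpu=2, cpu_usage=0.9)
print(xml_text)
```

Save the printed text as app_config.xml in the einstein.phys.uwm.edu project directory, then have the client re-read its config files or restart boinc-client.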
Keith, that coproc section he posted is actually from the app_info file, not the coproc_info file.
But yeah, use an app_config to do it.
_________________________________________________________________________
This is the same problem that's popped up for a few folks (mostly Keith) with Ryzen systems.
If you keep getting a lot of errors, you could consider running 2x v0.95 app tasks, which might be faster than 1x v1.0 task.
_________________________________________________________________________
Thanks for the corrections, Ian. I skimmed the post too quickly to see what was really going on.
Yes, with the clFlush errors, you need to drop back to the v0.95 version. That stops the errors on my Ryzen hosts.
It's still faster than the stock 1.28 application.
Thank you @Ian&Steve C. & @Keith Myers.
I'll revert to the 0.95 version (yes, these are Ryzen systems), put back the app_config.xml file that I removed, and give it a try.
Curious.
In my specific use case, running my GPUs at their lowest power limit (Pascal) or at a reduced graphics clock (Turing), v1.0 significantly outperforms v0.95, to the point that 1 task/GPU on v1.0 beats 2 tasks/GPU on v0.95.
For my 1070 Ti, for example, v1.0 is 45% faster than the stock application, but v0.95 is only 24.7% faster with 1 task/GPU and 22.2% faster with 2 tasks/GPU.
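To put those percentages on a common scale, here is a rough Python sketch that turns each one into throughput relative to the stock application. It assumes "X% faster" means X% less run time per task (matching the "45% decrease in time" wording earlier in the thread) and that the 2 tasks/GPU figure is already the effective per-task time. Both are assumptions, not something stated outright in the posts. The helper name `relative_throughput` is made up for illustration.

```python
def relative_throughput(time_reduction_pct: float) -> float:
    """Tasks completed per unit time, relative to stock (stock = 1.0)."""
    if not 0 <= time_reduction_pct < 100:
        raise ValueError("time reduction must be in the range [0, 100)")
    # A task that takes (1 - r) of the stock time finishes 1 / (1 - r) times as often.
    return 1.0 / (1.0 - time_reduction_pct / 100.0)


# Figures quoted for the 1070 Ti in the post above.
figures = {
    "v1.0, 1 task/GPU": 45.0,
    "v0.95, 1 task/GPU": 24.7,
    "v0.95, 2 tasks/GPU": 22.2,
}

for label, pct in figures.items():
    print(f"{label}: {relative_throughput(pct):.2f}x stock throughput")
```

Under those assumptions, v1.0 at 1 task/GPU gives roughly 1.8x the stock throughput, against roughly 1.3x for either v0.95 setup.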
If I understand correctly, you are able to run the v1.0 application with just a single task per GPU and it doesn't error out?
If so, that is a new data point for troubleshooting the application on Ryzen systems and 8GB cards.
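For the 8GB-card angle, the memory figures the app printed in the stderr output earlier in the thread can be converted to more readable units. Here is a small Python sketch that does just that. The three log lines are copied from that output, and the helper name `read_bytes` is made up for illustration. It only restates what the driver reported and says nothing about how much memory each task actually uses.

```python
import re

# Lines copied from the stderr output posted earlier in this thread.
LOG = """\
Max allocation limit: 2127691776
Global mem size: 8510767104
% Scratch buffer size: 136314880
"""

GIB = 1024 ** 3
MIB = 1024 ** 2


def read_bytes(label: str, text: str) -> int:
    """Find 'label: <number>' in the log text and return the number of bytes."""
    match = re.search(rf"{re.escape(label)}:\s*(\d+)", text)
    if match is None:
        raise ValueError(f"{label!r} not found in log")
    return int(match.group(1))


max_alloc = read_bytes("Max allocation limit", LOG)
global_mem = read_bytes("Global mem size", LOG)
scratch = read_bytes("Scratch buffer size", LOG)

print(f"Largest single allocation: {max_alloc / GIB:.2f} GiB")
print(f"Total device memory:       {global_mem / GIB:.2f} GiB")
print(f"clFFT scratch buffer:      {scratch / MIB:.0f} MiB")
print(f"Total / largest allocation: {global_mem / max_alloc:.1f}")
```

On these numbers the largest single buffer the driver will hand out is a quarter of the card's total memory.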