I'm opening this forum as a place to report optimized crunch settings on various system configurations for the Beta version of the Gravitational Wave Search O2 Multi-Directional GPU app. The current app version is 2.08 and only available for Linux, but as other Beta versions become available, those results can be posted here also.
To kick things off, here are my results for this host with a
4-thread Pentium Gold G5600 CPU @ 3.90GHz, running two
AMD RX 570s 4 GB, under
Ubuntu 18.04.4 & amdgpu 20.10-1048554 OpenCL drivers
GW GPU app Beta v. 2.08:
Average individual task times at different run multiples:
Task multiple | 2x   | 3x
Time, min     | 10.8 | 16.2
# errors      | 0    | 3
# samples     | 498  | 77
While running tasks at 3x (gpu_usage = 0.33 in app_config), most tasks ran at about 8.5 min, but then would hit a stretch of long-running tasks that increased the overall average task time (T = run time × gpu_usage). Running 3x also had the disadvantage of spitting out errors at a rate of ~4% (3 of 77 tasks). I only ran at 3x for about 20 hr because I wasn't impressed with the results and didn't want to generate more errors.
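For anyone who wants to apply the same T = run time × gpu_usage conversion to their own numbers, here is a minimal Python sketch. The function name and the 21.6 min elapsed time are made up for illustration (21.6 min is simply what a 10.8 min per-task time would look like as elapsed run time at gpu_usage = 0.5); nothing here comes from the app or from BOINC itself.

```python
def effective_task_time(elapsed_min: float, gpu_usage: float) -> float:
    """Per-task time T = elapsed run time x gpu_usage.

    elapsed_min: run time reported for one task, in minutes
    gpu_usage:   fraction of a GPU assigned to each task (0.5 for 2x, 0.33 for 3x)
    """
    if not 0 < gpu_usage <= 1:
        raise ValueError("gpu_usage must be in the range (0, 1]")
    return elapsed_min * gpu_usage


# Hypothetical example: a task that reports 21.6 min of run time while
# sharing the GPU with one other task (gpu_usage = 0.5).
t = effective_task_time(21.6, 0.5)
print(f"Effective per-task time: {t:.1f} min")
```

Dividing the elapsed time by the task multiple gives nearly the same answer; multiplying by gpu_usage just matches the value actually set in app_config.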
At 2x, most tasks ran a little slower compared to 3x, but the long-running tasks ran quite a bit faster than at 3x, and there were no errors.
So, for a set-it-and-forget-it configuration with an RX 570 and 4-thread CPU, I recommend a 2x task multiple for best task throughput.
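To put the comparison in throughput terms, the table values can be turned into tasks per day with a few lines of Python. This is only a rough sketch: it assumes the averaged task times are per GPU and that the error rate seen in the 3x sample would hold over a longer run. The function and variable names are mine.

```python
MINUTES_PER_DAY = 24 * 60

# Averages, error counts, and sample sizes copied from the results table above.
results = {
    "2x": {"time_min": 10.8, "errors": 0, "samples": 498},
    "3x": {"time_min": 16.2, "errors": 3, "samples": 77},
}


def tasks_per_day(time_min: float) -> float:
    """Tasks completed per day on one GPU at the given average task time."""
    return MINUTES_PER_DAY / time_min


def error_rate(errors: int, samples: int) -> float:
    """Fraction of sampled tasks that ended in an error."""
    return errors / samples


for label, r in results.items():
    per_day = tasks_per_day(r["time_min"])
    rate = error_rate(r["errors"], r["samples"])
    # Valid tasks per day, assuming the observed error rate stays the same.
    valid_per_day = per_day * (1 - rate)
    print(f"{label}: {per_day:.1f} tasks/day, "
          f"{rate:.1%} errors, {valid_per_day:.1f} valid tasks/day")
```

With the numbers above, that works out to roughly 133 tasks per day per GPU at 2x against roughly 89 at 3x, before even counting the errors.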
Ideas are not fixed, nor should they be; we live in model-dependent reality.
Copyright © 2024 Einstein@Home. All rights reserved.
cecht wrote: I'm opening this ...
Thanks for posting performance observations relevant to your particular hardware setup. I'm running the same GPU (RX 570 4GB) supported by a 6C/12T CPU and don't run any CPU tasks. I see similar times to what you describe, perhaps with a bit better luck at x3. Many of mine at that multiplicity average around 12 mins per task. Your choice of having a single thread supporting each task would seem to be quite sufficient - and a good reason to stick to x2.
The beta version of the app you refer to was produced to deal with the problem reported in this thread. It was announced by Bernd in this message and was designed to deal with a problem exposed by anyone using the experimental AMD ROCm compute libs for Linux - probably very few would be. Since only those few would need this fix, there won't be any beta versions for other operating systems. It's unlikely that people using the non-beta version with the standard OpenCL libs would see any change in crunch times by changing to the beta version. As far as I'm aware, there has been no 'optimisation' of, or any sort of performance change to, the app itself.
Cheers,
Gary.