There seems to be a lot of FGRP5 (CPU?) work to do. Can this not be moved over to the graphics cards?
A very good question. If it were easy to switch them between CPU and GPU tasks, someone would probably have noticed it and told the rest of us (me included).
A very small stream of grp#1 tasks is being downloaded, but they usually require either a lot of manual project updates to get them or some kind of automated script that triggers the update using the BOINC command-line interface.
As it is, if you have an operating system/card combination that will run BRP7 (aka MeerKAT), you can process those much more easily.
Here is what I am told is available.
Hope it helps.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
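For anyone wondering what such an automated update script might look like, here is a minimal sketch (my own illustration, not an official recipe). It assumes boinccmd is on the PATH and the host is already attached to the project; the URL and the 15-minute interval are assumptions you should adjust (and depending on your setup, boinccmd may also need its --host/--passwd options). Please be considerate about how often you poke the scheduler.

import subprocess
import time

# Minimal sketch: periodically ask the local BOINC client to contact
# the project scheduler, the same as clicking "Update" in BOINC Manager.
# The URL and interval below are assumptions -- adjust to your own setup.
PROJECT_URL = "https://einsteinathome.org/"
INTERVAL_SECONDS = 15 * 60

while True:
    subprocess.run(["boinccmd", "--project", PROJECT_URL, "update"], check=False)
    time.sleep(INTERVAL_SECONDS)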
Question out of curiosity: Is it true that CPU-based searches are more thorough but slow while GPU searches are generally less thorough but fast?
That is a great question, and I don't have a clue. I am presuming the actual algorithms are the same; it's just that GPUs can run the code in parallel, which speeds it up significantly. :)
But you raise a good point. Gary? Petri? Someone who has asked that question and/or studied the code(s)?
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
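As a toy illustration of that point (this is only an analogy in NumPy, not the project's actual code): running the same arithmetic element by element versus in a data-parallel, vectorised form gives identical results, only faster, which is the sense in which a parallel port is not "less thorough".

import time
import numpy as np

# Toy analogy only: the same arithmetic done one element at a time
# (like a single CPU thread) versus on the whole array at once
# (loosely analogous to a GPU's data parallelism).
x = np.random.rand(1_000_000)

t0 = time.time()
serial = np.empty_like(x)
for i in range(x.size):
    serial[i] = x[i] * x[i] + 1.0
t1 = time.time()

parallel = x * x + 1.0
t2 = time.time()

print("identical results:", np.array_equal(serial, parallel))
print(f"element-by-element {t1 - t0:.2f}s, vectorised {t2 - t1:.2f}s")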
TRAPPIST-713 wrote: Question out of curiosity: Is it true that CPU-based searches are more thorough but slow while GPU searches are generally less thorough but fast?
Not exactly. There are a couple of things that determine whether a search would run on the CPUs or the GPUs. These include:
- total computing power needed for a search: of course
- boundary conditions: the Gamma-Ray search for isolated pulsars (currently FGRP5) requires an FFT length that we can't get on the GPUs, at least not with the libraries and GPU memory that were available at the time we started it.
- memory access / algorithm: GPU memory access is horribly slow compared to computation. It took us a couple of years and four (IIRC) attempts to find and implement an algorithm for the GW search with a memory access pattern such that using the GPU is really faster than using the CPU.
- workunit "size": With the first GPU apps we had for the Arecibo Radio-Pulsar search (an early BRP, I think even called ABP2) the turnaround times became so short that it blew the whole system (DB, workunit generator etc.). Ultimately we learned to deal with it, mostly by "bundling" "atomic" tasks together for shipping. Putting the current FGRP5 tasks on the GPUs as they are might have a similar effect, though our whole system is much more powerful and robust nowadays.
- computing/data ratio: it doesn't help much if the GPU computing is held up by downloading the tasks. Searches that have a low computing/data ratio are better run on the CPU (I don't think we currently have such searches, though). (A rough back-of-the-envelope sketch of this ratio follows after this post.)
BM
Bernd Machenschalk
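To put very rough numbers on that last computing/data-ratio point: the sketch below is purely illustrative (the task size, bandwidth and crunch time are made-up figures, not real Einstein@Home numbers), but it shows the kind of check that matters.

# Back-of-the-envelope check of a computing/data ratio.
# All three figures below are assumptions for illustration only,
# not actual Einstein@Home task sizes or run times.
download_mb     = 8.0     # data shipped per task, in MB
link_mbit_per_s = 50.0    # host download bandwidth, in Mbit/s
gpu_crunch_s    = 600.0   # GPU compute time per task, in seconds

download_s = download_mb * 8.0 / link_mbit_per_s   # time spent downloading
ratio = gpu_crunch_s / download_s                  # computing/data ratio

print(f"download {download_s:.1f}s, crunch {gpu_crunch_s:.0f}s, "
      f"ratio roughly {ratio:.0f}:1")
# The closer this ratio gets to 1, the more a fast GPU ends up waiting
# for data instead of computing, which is when a search is better kept
# on the CPU or its tasks made larger.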
And that, my friends, is why Bernd is there working with others of his caliber at Einstein, while we are just sitting around watching our computers do the things that Bernd says we should do.
Way to go, Bernd!
Proud member of the Old Farts Association
Bernd, great insight! Thanks.
And of course the most obvious one is still missing in that list: E@H is in the comfortable situation that we finally have GPU versions of basically all our search applications. However, a completely new application is much easier to develop and deploy for CPUs; porting it to the GPU is then another step that takes some extra time and effort.
BM
Question. So, these tasks are set to use 0.2 of a CPU and I have them set to use 0.33 of a GPU. In our system that has three GPUs (so it is running 9 of these tasks at any given time), will it also "fill" a CPU core and use one core for 5 of these tasks simultaneously (0.2 + 0.2 + 0.2 + 0.2 + 0.2), OR is a core completely reserved for one work unit at a time? Would there be any advantage to changing the CPU usage from 0.2?
I hope this makes sense.
The tasks do not really use 0.2 of a CPU. They use exactly as much or as little of a CPU as they need for each task. No one seems to grasp the concept that the application itself determines how much CPU it needs for a task.
BOINC has no control over the science application resource usage.
The only thing you are doing is describing to BOINC how much CPU a task might use, for BOINC scheduling purposes.
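To make the bookkeeping in the question above concrete (this only shows BOINC's scheduling arithmetic with the numbers from that post; as noted above, the science application uses however much CPU it actually needs, regardless of these settings):

# BOINC's scheduling arithmetic for the scenario in the question:
# 3 GPUs, with 0.33 GPUs and 0.2 CPUs reserved per task. These numbers
# only tell BOINC how many tasks to start; they are not a cap on, or a
# guarantee of, what the science app really consumes.
gpus      = 3
gpu_usage = 0.33
cpu_usage = 0.2

concurrent_tasks = int(gpus / gpu_usage)          # -> 9 tasks at once
cpu_reserved     = concurrent_tasks * cpu_usage   # -> 1.8 CPUs "on paper"

print(f"{concurrent_tasks} GPU tasks running, {cpu_reserved:.1f} CPUs "
      "reserved in BOINC's budget")
# BOINC subtracts that reservation from the CPU budget it uses when
# deciding how many pure CPU tasks to run alongside; no core is
# exclusively pinned to any single work unit.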
Indeed. What I do is set BOINC to use 100% of CPU cores, so it tries to run GPU and CPU work all at once. I then check with MSI Afterburner (or similar) to see the CPU and GPU usage percentages. I try to get both near 100%, especially on the more powerful GPU. If GPU usage is too low, I first try running multiple instances at once (up to 4 per card), limited also by GPU RAM usage. If that isn't enough, I allocate CPU cores to the GPU tasks in app_config, one core at a time, until I get a fully utilised GPU.
For example, take an i5 8600K (6-core CPU) with two Radeon R9 280X (Tahiti) GPUs. I set 100% CPU usage and this in the app_config for Einstein:
<app_config>
<app>
<name>einsteinbinary_BRP7</name>
<gpu_versions>
<gpu_usage>0.500000</gpu_usage>
<cpu_usage>0.500000</cpu_usage>
</gpu_versions>
</app>
</app_config>
This means 2 per card, totalling 4 tasks, which together reserve 2 CPU cores.
This is better than manually lowering the number of CPU cores BOINC uses overall, because the right amount to reserve is different for every project.
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.