Keith, thank you for the quick response.
Would the same apply to the Binary Radio Pulsar Search (MeerKAT) (BRP7) tasks that I am running?
Bill F
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.
Yes, same principle. But in my experience so far with MeerKAT tasks, they don't respond as well to running 2X as the optimized app for GR#1 does. You won't see as much benefit, if any.
They also use more VRAM. 2X may be cutting it close with 6GB.
Keith,
Thank you, I will probably stay with singles on the MeerKAT tasks then.
Bill F
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.
Blake wrote:
... I take it this is the thread where GPU concurrency would be discussed despite its title....
Not really. It's far better to start your own thread, giving detailed info about your hardware and the search (or searches) you want to run, so that others with a similar setup can give you their current experiences.
This thread is quite old, and the opening post is an example of a question being asked by a few users at that time who, having tried the app_config.xml mechanism, wanted to stop using it. I took the opportunity of using one such post to explain that 'undoing' the mechanism didn't happen automatically if you just deleted the app_config.xml file. I pinned the question and my response so it could be pointed to more easily when similar questions arose.
Several E@H searches can run multiple tasks per GPU. There are two ways to do this. Firstly, in your project preferences, there is a GPU utilization factor that can be set (and easily undone), which is quite good for a quick check to see if there is a benefit. Its drawback is that the default CPU 'support' allocation cannot be changed.
The second method is more complicated in that you have to construct an app_config.xml file, part of which gets incorporated into the state file (client_state.xml) and doesn't automatically get removed if app_config.xml gets deleted. The benefit of app_config.xml is that you have control over the 'budgeting' of both fractional GPU and CPU resources. I mention this because the values specified DO NOT change what a task will actually need or use. It's quite important to test carefully, since badly selected values will just allow BOINC to stuff things up more easily :-)
Blake wrote:
Has anyone figured out the optimal GPU task concurrency for Einstein running on an NVidia Titan V?
There is no such thing as an optimal concurrency. You really need to 'suck it and see', since there are lots of variables that can have an impact. One of these is the nature of a given type of search, which can change over time.
Cheers,
Gary.
I have two AMD W6600 GPUs on a Threadripper 3995.
This app_config.xml works for me for the project's
All-Sky Gravitational Wave search on O3 v1.07 () windows_x86_64:
<app_config>
  <app>
    <name>einstein_O3AS</name>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>.9</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
Next I will test .25 instead of .5 for gpu_usage.
Sixteen tasks is the maximum that can run in parallel on this AMD GPU.
But my limit is four per GPU (.25).
This thread was started almost 5 years ago when the FGRPB1G search was in progress. Since that search has finished and the completely different O3AS search is now in progress, you should have started a new thread to publish your information.
The big differences are that there is significant CPU support required for efficient running of the O3AS tasks, and the VRAM requirements are larger than for the previous search. The app_config.xml file you provide (cpu_usage 0.9) allows the possibility for the BOINC client to perhaps run an extra CPU task (if you run CPU tasks for other projects), and this could impact GPU performance. Why use the more complicated app_config.xml mechanism if you are just duplicating what a GPU utilisation factor of 0.5 would do anyway? You really should set cpu_usage to 1.0 to make sure a full thread per GPU task is budgeted for CPU support.
maeax wrote:
Next I will test .25 instead of .5 for gpu_usage.
You run a significant risk of tasks crashing due to lack of VRAM if you do that. Your GPUs show as having 8GB, and the current version of the app was released because, previously, single tasks were crashing on 4GB GPUs. The latest version (1.07) of the app halves the memory use, so whilst 2 tasks can easily share an 8GB GPU, 4 tasks may not. It would be much safer to test 0.33 to get 3 tasks per GPU.
If you're not familiar with the 'running out of memory' problem for O3AS tasks, you should read the full announcement thread in Tech News. There were quite a few examples of this that ultimately led to the app change from 1.06 to 1.07.
maeax wrote:
Sixteen tasks is the maximum that can run in parallel on this AMD GPU.
You really need to read up about VRAM requirements for O3AS tasks :-).
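To get 3 tasks per GPU, your file would look something like this (remember, these values only budget resources - they don't change what a task will actually need or use):
<app_config>
  <app>
    <name>einstein_O3AS</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
With 0.33, three tasks fit on each GPU (3 x 0.33 = 0.99), and a full CPU thread is budgeted for each of them.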
Cheers,
Gary.
Gary,
this app_config.xml doesn't work without the parameter for CPU.
.9 CPU is the same usage for this GPU task as without app_config.xml.
OK, I won't test .25 for the GPU, thank you for the info.
Of course it doesn't.
I suggested you use a value of 1.0 - i.e. <cpu_usage>1.0</cpu_usage>. In fact, since this budgets for only one thread, you may get better GPU task times by using an even greater value. It all depends on what else is running besides Einstein GPU tasks. If you have lots of CPU tasks as well, both CPU and GPU times will suffer if you don't reserve enough CPU support for each GPU task that is running.
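Applied to the file you posted, the change would look something like this:
<app_config>
  <app>
    <name>einstein_O3AS</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
That keeps 2 tasks per GPU but budgets a full thread of CPU support for each one.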
Cheers,
Gary.
Gary,
I have no Hyperthreading.
That's not an issue. BOINC shows your CPU has 64 processors. If you load too many of them with compute-intensive tasks, there is very likely to be a bottleneck somewhere which will impact overall performance.
The <cpu_usage> mechanism is a way to make sure enough cores are reserved for GPU support. It doesn't specify or control how much will be used, or when. Each GPU task takes what it needs precisely when it needs it. If support isn't immediately available when asked for, the task will slow down, perhaps quite a lot.
You don't mention how many CPU-type tasks (from any project) you are running in addition to O3AS tasks. If it happens to be a lot, your GPU performance could be degraded, and one way to see if that is the case would be to reserve more cores by increasing the <cpu_usage> value. You can often make significant performance improvements by not over-committing your hardware resources.
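As an example, if you wanted to reserve two threads per GPU task, you could use something like:
<app_config>
  <app>
    <name>einstein_O3AS</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>2.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
With your two GPUs at 0.5 (four GPU tasks), that budgets 4 x 2 = 8 of your 64 processors for GPU support, leaving 56 for CPU tasks.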
Cheers,
Gary.