How do you get your GPU to run more than one task at a time?
This is running stock clocks, in case anyone is interested, running 3x at ~1400s each, which works out to about 640K RAC once things get stable. I think about 7K from CPU tasks as well.
The power draw at the wall is about 300W, although I'd have to average over a time interval to get a good figure. Temps are 79°C.
If you are using published RAC numbers or if you are taking them from the host_id file, are you assuming the hosts are not crunching other CPU tasks - or are you counting them out? They do make a bit of a difference and RAC values take nearly a month to settle to a steady state.
hth
Michamus wrote: How do you get your GPU to run more than one task at a time?
On your account page, click the 'Preferences' link and then click on 'project' preferences. Scroll down until you find the various GPU Utilization factors (all 1.0 by default). You need to set the one for FGRP tasks to 0.5 (each task 'uses' half a GPU) in order to be able to run 2 tasks concurrently. Each task will 'reserve' a full CPU core for support so if running two GPU tasks there will be fewer CPU tasks running (if you are running CPU tasks at all).
The change to running two GPU tasks will not be immediate, even if you 'update' the project. The change is notified to the client the next time new work is fetched. You can speed up the change by triggering a work fetch (make a small increase in the size of your cached work). You can reverse this work fetch change once it has had the desired effect.
If you have several machines with different GPUs, you need to be careful to segregate any that are not suited to running multiple GPU tasks, or if you need different numbers of concurrent tasks on different machines. You can achieve this by using 'locations' and setting different preferences for the different locations. There are up to 4 locations to choose from - generic, home, school, work. If you need more control than the GPU utilization factor allows, or if you want instant changes without having to wait for new work, the app_config.xml mechanism is the ultimate method for finer-grained control on an individual host basis. It looks complicated at first glance, but it's really not once you have read and understood the documentation.
Cheers,
Gary.
AgentB wrote: chester_4 ...
Hi AgentB, do you know the power consumption figure w/o crunching with the CPU?
For information only.
My 3 x 1070 EVGA Hybrid host produces > 1.5MM/day (real production, not RAC), running 2 WUs at a time on each GPU (could do a little more with 3 at a time) plus 4 S@H CPU WUs on the CPU (could do 6, since my CPU has 6 cores/12 threads) on an X99 MB, with a total of less than 500 watts from the wall (492W measured by a Kill A Watt), all running at stock speed (no OC).
Gary Roberts wrote: Michamus ...
Thanks! I have segregated this machine per your suggestion. I set it to 0.5, as you provided in your example. Is there a general formula that you use? For instance, should I be more concerned about the number of CUDA/Stream cores I have allocated, or the VRAM utilization, or both?
Trotador wrote: Hi AgentB, do you know the power consumption figure w/o crunching with the CPU?
I ran the power meter over a couple of hours, as the wattage values tend to fluctuate quite a bit, and took an average via the (total kWh / total hours) method:
2 CPU + 3 GPU tasks average 275W
Disabling the CPU tasks:
0 CPU + 3 GPU tasks average 225W
Standard clocks. hth
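The (total kWh / total hours) averaging method can be sketched as a small shell helper. The `avg_watts` function and the sample meter readings are illustrative, chosen to be consistent with the figures above:

```shell
# Average wall power from a cumulative energy-meter reading:
#   watts = (total kWh / total hours) * 1000
avg_watts() { awk -v e="$1" -v h="$2" 'BEGIN { printf "%.0f\n", e / h * 1000 }'; }

avg_watts 0.55 2   # e.g. 0.55 kWh logged over 2 hours -> prints 275
```

This smooths out the second-to-second fluctuation you see on an instantaneous wattage display.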
Michamus wrote: I have segregated this machine per your suggestion. I set it to 0.5, as you provided in your example.
As I tried to suggest, you should think of the value you set as the fractional part of a GPU that the task will occupy. This is just an artificial thought process. In reality (I'm no expert in the finer details), there aren't multiple tasks running simultaneously at all - just a single task at any instant. The increase in efficiency comes from the GPU being able to begin work on the second task whenever the other task has to pause for things like loading/unloading between main memory and GPU memory. It's undoubtedly a lot more complicated than this simplified explanation, but the idea is to keep the GPU working close to 100% of the time if possible. You may get further small improvements by running three tasks (factor = 0.33). The only way to find out is to experiment and make careful measurements over quite a number of completed tasks.
In reviewing the results of the linked machine, I see two quite different completion time ranges. Some tasks take less than 800 sec whereas others take close to 2500 sec. These go back to before you started asking questions about running multiple tasks, so I don't think it's anything to do with that. Do these differences mean that the two GPUs are not both 980 Tis? It's not wise to give advice about parameters to set if the two GPUs involved are not pretty much the same model. A lower-grade GPU might not be able to handle the conditions.
Yeah, it's called the 'suck it and see' formula :-). Since there is really only one task running at any instant, you don't need to concern yourself about the allocation of GPU resources. You just experiment with different parameters until you find what works best. For most mid-range to high-end GPUs, running 2 concurrent tasks will make a significant improvement. Going to 3 may give you a little more, but at best it will be marginal. Beyond a certain number, performance is very likely to degrade. Another point to be aware of is that each GPU task requires around 1GB of GPU RAM, so you can't run 2 tasks per GPU unless you have at least 2GB.
Whilst tweaking the GPU utilization factor is very convenient, it has two particular disadvantages. I've already mentioned that you don't get a change without new work being downloaded. The second disadvantage is that if you are also doing CPU tasks (as you are), running two concurrent tasks on each GPU will remove 4 of your CPU cores from being able to crunch CPU tasks. That's not the problem - that has to happen. The problem is that BOINC still sees 8 cores and will fetch CPU tasks for all 8 - double the number you should be getting with just 4 cores able to crunch. It won't necessarily cause work to be returned late, as long as you keep your work cache settings relatively small, but it is inconvenient to have far more tasks than you really need. Compounded with other factors, it can increase the risk of missing deadlines.
Both disadvantages can be avoided by using an app_config.xml file which will override any GPU utilization factor settings. To run 2 concurrent tasks on each GPU (4 in total) you could return the factor to the default value of 1.0 and use the following text file (called exactly app_config.xml) in your Einstein project directory (check the documentation link in my previous message).
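A minimal app_config.xml matching that description might look like this. The app name hsgamma_FGRPB1G is my assumption for the FGRP GPU search - check the app names in your client_state.xml if your host reports something different:

```xml
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <!-- 0.5 of a GPU per task = 2 concurrent tasks per GPU -->
      <gpu_usage>0.5</gpu_usage>
      <!-- 4 tasks x 0.2 < 1.0, so BOINC doesn't budget a full core per task -->
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```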
The two important parameters are gpu_usage and cpu_usage. 0.5 for the first indicates that each GPU can run 2 concurrent tasks. 0.2 for the second means that the 4 concurrent GPU tasks in total will not reserve a full CPU core. However you do still need to reserve 4 CPU cores for GPU support duties so you do this in such a way that BOINC knows that these are unavailable for crunching CPU tasks. You set local preferences (which override website prefs) to allow BOINC on this machine to use 50% of the CPU cores for CPU crunching (4 out of the total 8). That way BOINC knows it only needs to fetch enough CPU work for 4 cores and not the full 8.
It's important to realise that you can set cpu_usage to whatever you want to meet the conditions you choose to use. If you wanted to run 3 GPU tasks per GPU, you would set it to something less than 0.166 (0.1 would be fine) so that 6 times the value chosen is less than 1.0. You would then set the % of CPU cores to 25% so that there would be 6 out of 8 available to support the 6 GPU tasks. None of this has any harmful effect on the CPU support provided for the GPU tasks.
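Under the same assumption about the app name, the 3-tasks-per-GPU variant described above would just change the two values (and you would set the local 'use at most % of CPUs' preference to 25% as well):

```xml
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>  <!-- 3 concurrent tasks per GPU -->
      <cpu_usage>0.1</cpu_usage>   <!-- 6 x 0.1 < 1.0 -->
    </gpu_versions>
  </app>
</app_config>
```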
However, I strongly suggest you tell us exactly what is in the machine, hardware-wise, and what other projects you wish to run on it before you go making any changes.
Cheers,
Gary.
AgentB wrote: 2 CPU + 3 GPU tasks average 275W ...
I'd like to run my system at low fan speeds. Do you know of any way to control voltage/frequency and fan speed in Linux? I looked around and couldn't find any. The AMD OverDrive control wouldn't install on Ubuntu 16.04. Thanks.
Vasishk-Taneya wrote: I'd like to run my system at low fan speeds.
Hmm, not sure running at low fan speeds is wise without undervolting, but there are some clues to both at Phoronix.
Vasishk-Taneya wrote: I'd like to run my system at low fan speeds.
Target temp for the default AMDGPU-PRO driver is 80°C; it runs the fans as quietly as possible to keep the temp below that.
I'm using this script https://github.com/DominiLux/amdgpu-pro-fans for higher fan speeds; temps stay around 70°C.
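For reference, scripts like that essentially drive the amdgpu hwmon sysfs interface by hand. A rough sketch - the `pct_to_pwm` helper is mine, and the exact hwmon index varies from system to system, so treat the paths as illustrative:

```shell
# Illustrative helper: convert a fan percentage to the 0-255 PWM value
# that the amdgpu hwmon interface expects.
pct_to_pwm() { echo $(( $1 * 255 / 100 )); }

# The sysfs writes below assume the amdgpu/AMDGPU-PRO driver and need root;
# find your card's hwmon directory first.
# HWMON=$(ls -d /sys/class/drm/card0/device/hwmon/hwmon*)
# echo 1 | sudo tee "$HWMON/pwm1_enable"   # 1 = manual fan control
# pct_to_pwm 60 | sudo tee "$HWMON/pwm1"   # set roughly 60% fan speed
# cat "$HWMON/temp1_input"                 # GPU temperature in millidegrees C
```

Writing 2 back to pwm1_enable returns fan control to the driver's automatic mode.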