I have a couple of questions regarding tuning my computer
- Is it possible to reduce the number of CPUs used by a GPU WU to less than one?
- Is it still possible to run multiple WUs on a single GPU? I found this thread, but the link to the app_info.xml settings is dead.
Thank you!
Wiyosaya wrote:Is it possible
The answer depends on what sort of GPU (brand and model) you are intending to use. As your computers are hidden, you won't get a proper answer without providing those details. You could temporarily unhide your computers, or provide a link to the host in question, or tell us all the hardware (CPU and GPU) you are using.
Please realise that a given GPU task needs a certain level of CPU support, which can vary considerably. If that support is not rapidly forthcoming when needed, GPU crunch times can suffer. You can easily tweak settings to provide less CPU support, but doing so unwisely can be very counterproductive. The defaults are there to protect users from unwise decisions :-).
This is also very easy to do in a couple of different ways, once the hardware information is known. A GPU task requires a lot of GPU memory, so how much your card has is important information. Things are quite different these days from how they were back in 2012, so you should disregard any advice from, say, more than about a year ago.
If you want the greatest control over both the number of concurrent GPU tasks and the amount of CPU support that BOINC will budget for (which is quite different from what a running GPU task will actually 'take' for itself) you need to become familiar with the application configuration mechanism which uses a config file called app_config.xml (NOT app_info.xml).
There are also project preferences (GPU utilization factor) which will allow concurrent GPU tasks. However this will not allow you to tweak the default CPU support requirement of 1 CPU core per GPU task. app_config.xml does give you that control.
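For reference, a minimal sketch of what such an app_config.xml can look like, using the hsgamma_FGRPB1G application name that appears later in this thread; the 0.5 values are illustrative, not a recommendation:

```xml
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <!-- 0.5 of a GPU per task = 2 concurrent tasks per GPU -->
      <gpu_usage>0.5</gpu_usage>
      <!-- what BOINC budgets per GPU task, not what the task actually uses -->
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After editing the file, use the BOINC Manager's "Options -> Read config files" (or restart the client) for the changes to take effect.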
Cheers,
Gary.
Thanks for your reply.
I have a 6GB GTX 980 Ti running on a Xeon E5-1650v2 (6 cores, 12 threads), and a 3GB GTX 1060 running on a Phenom II 965 X4.
I have been following this thread and had the same questions. Wiyosaya's #2 can be most easily answered by going to project preferences and selecting something like "0.5" or less for GPU utilization.
Question 1 is still unanswered. I would like to know if it is possible to set the CPU usage down so it is less than 1. Why? Because some of my systems are Core 2 Quads and only have 4 cores and no Hyper-Threading possibility.
My GTX 1070 can run at least 4 apps, but each app seems to require a full core. Some projects like GPUGRID show "0.9 CPUs + 1 NVIDIA"; others (e.g. SETI) show way less, like "0.1 CPUs + 1 NVIDIA".
I tried running 2, then 4 apps simultaneously, and it worked up until I tried to do something else, like open a bunch of windows and run some non-BOINC apps (I have another life besides BOINC). I have 16 GB on this Core 2 Quad (Dell 730), but the system became so slow it was unusable, and then I started getting NVIDIA driver resets. When I closed the other windows and shut down some running apps, the driver resets went away. I even had to terminate Chrome as it would not close using the [x]. The system went back to normal. Looking at the WUs, they didn't error out after the driver reset, which was totally unexpected.
If it is actually possible to assign only 1 CPU to handle, for example, 4 concurrent Einstein WUs, please advise.
Thanks for looking!
BeemerBiker wrote:If it is
Yes, it's possible. You can edit and save the file app_config.xml in the Einstein project folder. Use this line to slice one CPU core (or thread) across multiple GPU tasks:
<cpu_usage>.25</cpu_usage>
So the file could contain something like this:
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
If you have 2 GPUs, that should result in 2 tasks running per GPU and only 1 CPU in total dedicated to them all.
Richie_9 wrote: and
erm... No.
These parameters govern scheduling and task launch. They do not govern how the tasks, once started, actually run on the machine. Despite the sloppy language commonly used on these forums, BOINC does not in fact "dedicate" CPU resources.
If you really want to make all of your GPU tasks share a single CPU core (why?), you can do it by using something to set the CPU affinity. While one can do this for a single running instance of these tasks just by using Process Explorer (for example), that affinity assignment evaporates as soon as the GPU task it serves terminates.
I think most of us who tinker with CPU affinity use Process Lasso. One of the (many) options it gives is to require any task matching a name string you provide to run on a subset (down to 1) of cores as you specify.
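Process Lasso is a Windows tool; for what it's worth, the same core-pinning idea can be sketched on Linux with nothing but Python's standard library. This is an illustrative one-shot sketch, not how Process Lasso works internally (Lasso persistently re-applies its rules by process name):

```python
import os

def pin_to_cores(pid, cores):
    """Restrict a process to the listed CPU core indices.

    Linux-only: uses the sched_setaffinity(2) syscall via the stdlib.
    Unlike a Process Lasso rule, this applies once; a new process
    started later is not affected.
    """
    os.sched_setaffinity(pid, set(cores))
    # Return the affinity set actually in effect for verification.
    return os.sched_getaffinity(pid)

# Example: pin the current process (pid 0 means "self") to core 0.
print(pin_to_cores(0, [0]))
```

To pin a BOINC science app you would pass its real PID instead of 0; finding that PID by name is left out here since the app executable names vary by project and version.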
I'll warn, strongly, that this is not likely to do just what you expect in terms of either task performance or system impact, but it is a powerful tool.
I, myself, currently run four GPU tasks on two GPUs on a four core system, and have used Process Lasso to restrain the resource appetite of the support tasks with the specific goal of preserving good interactive performance (I'm typing this on that system, which is my daily driver). I did not, however (after tests) restrict CPU affinity of the GPU support task.
Just because the results of my trials usually surprise me, and some results are quite unfavorable, does not mean it is not fun, and sometimes significantly helpful.
I should mention that all my comments reflect experience and understanding from Windows systems.
The configuration options I currently employ in Process Lasso include:
CPU priority, I/O priority, Memory Priority
archae86 wrote:Richie_9
Thanks for straightening that out. My English is far from native, and I understand now I should've used some word other than "dedicate". In my head I had understood that neither BOINC nor any project app is itself capable of forcing a modern multi-core CPU to share the workload in a particular way inside the chip.
Wiyosaya wrote:I have a 6GB
I don't have any recent generation NVIDIA GPUs. The most recent are 750 Tis, which run fairly poorly and do suffer if the default allocation of 1 CPU core per GPU task is overridden. It's a different story with AMD GPUs. For example, I have a couple of machines with Pentium dual core (G3260) CPUs supporting 2x R7 370 GPUs. Each host runs 1 CPU task and 4 GPU tasks.
I control these hosts using app_config.xml together with BOINC preferences. I set BOINC to use 50% of the available cores. This guarantees that only one CPU task can run and it also stops BOINC from over-fetching CPU work. There can be real problems if BOINC decides to go into panic (high priority) mode because of excess CPU tasks.
For your 12-thread Xeon, if you left the default allocation (1 CPU thread reserved, and so unable to crunch a CPU task, for each GPU task), you would still have the ability to use 10 CPU threads while crunching 2 GPU tasks concurrently. You could test that scenario by setting the GPU utilization factor for FGRPB1G tasks to 0.5. As soon as you download new GPU tasks, that setting would apply to all GPU tasks in your work cache. You will know if things are working properly by the new crunch time: it should be rather less than double what the single-task crunch time was. If it doesn't improve the overall output, you should try changing the BOINC preference for % of cores to use, stopping an extra CPU task or two until you find the optimum conditions.
For the quad core Phenom II, I would suggest putting it in a different location (generic, home, school, work) from the Xeon, as you will likely need different settings for it. If you use an app_config.xml file you can quickly change settings and get pretty quick feedback on whether or not a change is beneficial. You could always start with the GPU utilization factor of 0.5 to get times while crunching 2x CPU tasks and 2x GPU tasks, which would be the default outcome for that factor. Then you could try setting up an app_config.xml file (the link to documentation was in my previous message). To try allowing 3 CPU tasks to run, you would use these settings:-
1. Set BOINC to use 75% of CPU cores.
2. Install this file (named app_config.xml) in the Einstein project directory. It overrides the GPU utilization factor.
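The file contents for step 2 did not survive in this copy of the thread. Consistent with the 0.2 CPU figure and the two-concurrent-GPU-tasks goal discussed in the next paragraph, it would have looked something like this (a reconstruction, not the original file):

```xml
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <!-- 2 concurrent GPU tasks -->
      <gpu_usage>0.5</gpu_usage>
      <!-- 2 x 0.2 = 0.4, so no extra core is budgeted away from CPU tasks -->
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```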
Please note that the purpose of the above scheme is to control the CPU tasks allowed to run with the BOINC %cores setting whilst making sure that running two GPU tasks will NOT cause a further core to be prevented from running a CPU task. The setting of 0.2 just needs to be low enough so that twice the setting doesn't add up to a further full core. This is just artificial fiddling to get the mix you want. It does NOT control the actual support being used by GPU tasks. They will take whatever they need.
If you don't provide enough support, performance will suffer as the GPU tasks will just have to fight for CPU cycles when they're needed. If GPU performance is suffering, the only remedy is to restrict (through %cores setting) the number of running CPU tasks. You will be able to get a benefit from running two concurrent GPU tasks. You could try running three but I suspect you won't be able to improve on running two. The only way to find out is to do proper experiments.
Cheers,
Gary.
I tried a couple of things on both a new-generation single 1070 Ti and an older pair of 670s, and answered some questions but raised more. These systems were Core 2 Quads, so a total of 4 cores is all I had on these Win10 x64 machines.
1. The 1070 Ti has 8 GB, but OpenCL can only use 4 GB. Not sure why; if OpenCL has a 32-bit address restriction (I am guessing), then obviously it cannot use all 8. I discovered this when I was unable to run 8 tasks on the 1070; only 4 ran, so there was no advantage other than speed over my Linux-based 1050 Tis that have only 4 GB. I tried setting gpu_usage to 0.125, which should have given me 8. I also set cpu_usage to 0.5 so as to make all 4 cores available for the 8 tasks.
2. It was also advantageous to run 4 Einstein tasks on my 1050 Ti (4 GB) Ubuntu systems. Although the speed dropped, the fact that 4 tasks were running simultaneously made up for the loss in speed enough to make it worthwhile. These systems run on a pair of cheap Xeons, so 8 cores x 2 for 16 threads were available.
3. I was able to run a pair of Einstein tasks efficiently on each of a pair of GTX 670s with 2 GB of memory each. However, something strange happened that I cannot account for. I allowed a single PrimeGrid task to run as a test. I expected that task to bump two of the Einstein tasks off one of the GTX 670 boards, putting them into suspension. Instead, three of the Einstein tasks went into suspension. Thus I had only 2 tasks active and 2 CPUs that were otherwise unused. This should not have happened; there is no reason why the other GTX 670 was forced to have only 1 Einstein task.
[EDIT]
I am running BOINC 7.8.3. I also want to mention that I did not suspend any of the Einstein tasks; I simply enabled "allow more work" on PrimeGrid, and a total of 6 WUs downloaded before I could revert to "no more work". However, only 1 of the 6 WUs started. Maybe the question should be "why didn't PrimeGrid start on the other GPU?"
BEEMERBIKER wrote:I tried a
Let's take these one at a time. For the first statement, we need to break it up into its two parts.
You have discovered something that has only been known to a few who have bothered to research this particular issue.
OpenCL on NVIDIA platforms is restricted to 25% of the card's available RAM. It's a well-known fact seen in their documentation. Intel and AMD allow much higher usage, between 50-70% of the GPU's available RAM.
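Under that 25% assumption, the back-of-envelope arithmetic for how many tasks fit in GPU memory looks like this; the per-task footprint below is a hypothetical number for illustration, not a measured Einstein@Home figure:

```python
def usable_opencl_mem_gb(total_gb, fraction=0.25):
    # Per the claim above, NVIDIA's OpenCL driver limits usable memory
    # to a quarter of the card's RAM; treat that as the available pool.
    return total_gb * fraction

def max_concurrent_tasks(total_gb, per_task_gb):
    # per_task_gb is a hypothetical per-task VRAM footprint.
    return int(usable_opencl_mem_gb(total_gb) // per_task_gb)

# 8 GB card with an assumed ~0.5 GB per task: 2 GB usable -> 4 tasks,
# which matches the "only 4 ran" observation on the 1070 Ti above.
print(max_concurrent_tasks(8, 0.5))  # prints 4
```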
OpenCL tasks for Einstein require 1 full core apiece. When you set your value to 0.5, that's more of a "guideline" that is quickly overridden by the requirements of the work. In short, you are bottlenecking the system, as there aren't enough cores to go around. Work units have to wait for spare cycles on the CPU to crunch.
I would recommend decreasing the number of tasks to 4 total and setting your cpu_usage value to 1.0. That should increase throughput.
Again, depending on how much RAM the GPU has, you may be handicapping the system while work units wait for free memory. The best way to find the best setting is to see how long 1 task takes; do several and find the average. Then run 2 tasks for a while; find the average and divide by 2. Run 3 tasks for a while; find the average and divide by 3, and so on (you get the picture). At some point the time to complete will stop going down and start going up. That means you have gone too far, and you should use the setting with the lowest average run time. For the 750 Ti, I believe it was 2 work units per card.
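That averaging procedure can be sketched as a tiny script; the timing numbers here are made up for illustration:

```python
def effective_time(avg_elapsed, concurrency):
    # When `concurrency` tasks run at once and each takes avg_elapsed
    # wall-clock time, the effective cost per task is the quotient.
    return avg_elapsed / concurrency

def best_concurrency(trials):
    # trials maps concurrency -> average wall-clock time per task.
    # Pick the concurrency with the lowest effective per-task time.
    return min(trials, key=lambda n: effective_time(trials[n], n))

# Hypothetical averages in minutes: 1 task takes 30, with 2 running
# each takes 50, with 3 running each takes 80.
trials = {1: 30.0, 2: 50.0, 3: 80.0}
print(best_concurrency(trials))  # prints 2 (25 min/task beats 30 and 26.7)
```

Once the effective time starts rising, as with concurrency 3 here, you have found the point Zalster describes and should back off to the previous setting.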
This one is hard to answer since I don't run PrimeGrid. Yes, it should have bumped only 1 GPU and its 2 tasks. Why 3? I don't know. What are the physical memory requirements of that PrimeGrid work unit? How much RAM are those computers running? There could be issues with accessing the CPU. Is PrimeGrid single precision or double precision? I can't remember. Maybe someone else will have an idea on this one.
Maybe it's how you have resource shares set up in preferences? Both set at 100%, so maybe each project gets 50% of your computer? Again, a question better answered by someone more familiar with how BOINC decides what to run.
Zalster