more CPUs running than what is allowed in preferences

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0
Topic 207997

max CPUs used: 3

2 x CPU task + 2 x (1 CPU + 0.5 Nvidia GPUs) running

What's the reason for that? I've seen this happen a few times in the past, but I don't remember what tasks were running back then. I know "panic mode" could generally change which project/tasks to run, but is it also able to utilize more CPUs than what is set in the preferences?

* a picture of the situation today: http://imgur.com/a/sFmki

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7219904931
RAC: 952005

Richie_9 wrote:
I know "panic mode" could generally change which project/tasks to run, but is it also able to utilize more CPUs than what is set in the preferences?

Yes, or at least I have observed it to do so on one of my systems recently.

The short deadlines on the CPU work in the current tuning test run, combined with the long-known difficulties in prefetch and completion estimation for mixed CPU/GPU workloads, make it prudent to use extremely short queue time requests at the moment if you are running Einstein work of both CPU and GPU type on a particular PC.  I suggest 0.2 days (no, I am not joking).

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4338
Credit: 3203135779
RAC: 1953319

I have seen the same thing happening too. I have two GTX970s and they are running Seti and Einstein tasks: either one Seti task or two Einstein tasks at a time. On the CPU I run CPDN and LHC. LHC is mostly running ATLAS tasks, with two CPU cores on one task. The problems started with the multi-core ATLAS tasks, and nowadays I keep one CPU core free for this reason.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Now my host was running that 2 x CPU task + 2 x (1 CPU + 0.5 Nvidia GPUs) setup. I noticed something funny. At first I had set "Store at least 2 days of work". Then all those 4 tasks were running, even though "Use at most..." CPUs was set to 3. The CPU tasks running at that time were one tuning task and one of the new regular tasks. I had let Boinc download only one or two bursts of tasks. Then I had hit "Won't get new tasks". Boinc was still hungry to download more tasks at that point.

Okay. Then I changed "Store at least (X) days of work" from 2 to 1.

That caused the regular CPU task to change its state to "Waiting to run". Only the tuning task plus two GPU tasks were running.

I thought that was somewhat funny. It was as if, with the "2 days" setting, Boinc was thinking: "Okay. I've already screwed up scheduling everything... and now I need to force an extra CPU to manage the mess... BUT if you let me download some more tasks now, then I'll screw up the scheduling even more!"

After changing to the "1 day" setting, Boinc thought: "Okay... so you won't let me download any more tasks this time?... I see... Okay, I was ready to load the ship to breaking point and put on a good show... but okay, we don't need the fourth CPU then. I might be able to manage the present situation even with three CPUs."

It's like Boinc knows how things are going with the deadlines... up to a point. Beyond that point, though, it loses the ability to see how downloading more work would make things even worse. It would download more and keep utilizing the "emergency" CPU.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117515866936
RAC: 35401205

Richie_9 wrote:
Now my host was running that 2 x CPU task + 2 x (1 CPU + 0.5 Nvidia GPUs) setup. I noticed something funny. At first I had set "Store at least 2 days of work". Then all those 4 tasks were running, even though "Use at most..." CPUs was set to 3.

Unfortunately, this is exactly what BOINC will do if it thinks there could be a deadline issue with CPU tasks.

The best way to prevent BOINC from thinking there might be a deadline issue is to ensure that it doesn't download too much CPU work in the first place.  If you have allowed BOINC to use 3 CPU cores, it will try to fetch enough CPU tasks to feed 3 CPU cores using whatever completion estimate it has available to it (which could be quite low if GPU tasks finishing quickly have driven the DCF right down to too low a value for the 'comfort' of CPU tasks).  You have two choices to compensate for BOINC's tendency to overfetch.  The first and easiest is just to set a low enough work cache setting (and not fiddle with it), as archae86 has suggested on a number of occasions.  The second (and in some ways preferable) is to let BOINC know that it will only be using one CPU core to crunch CPU tasks.

The settings as you describe them above (assuming your host is a quad core - I haven't checked) should have one CPU task running out of the 3 allowed since a further two are being 'reserved' by the 1 CPU + 0.5 GPUs setting for GPU tasks. However BOINC will still be fetching CPU tasks as if there were 3 cores crunching.  This is a flaw in the way BOINC works (or possibly a flaw in the way GPU utilization factor works) but it is easily worked around.  Just allow BOINC to use 1 core (25% on a quad core host) and using app_config.xml rather than the GPU utilization factor, set <gpu_usage> to 0.5 and <cpu_usage> to less than 0.5 (I use 0.3) so that no further CPUs are 'reserved' by this process.  This way BOINC will know it is only fetching work for 1 CPU core so it shouldn't get into panic mode even if you use a 2 day work cache setting.
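For illustration, a minimal app_config.xml along those lines might look something like the sketch below. The application name is only a placeholder - it has to be replaced with the actual Einstein application name as reported in BOINC's event log - and the file goes into the Einstein@Home project folder inside the BOINC data directory, after which 'Options -> Read config files' in BOINC Manager (or a client restart) makes it take effect.

<app_config>
  <app>
    <name>einstein_gpu_app</name>     <!-- placeholder: use the real application name here -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>      <!-- two GPU tasks share each GPU -->
      <cpu_usage>0.3</cpu_usage>      <!-- less than half a core 'reserved' per GPU task -->
    </gpu_versions>
  </app>
</app_config>

Combined with a 25% "Use at most ... % of the CPUs" setting on a quad core, BOINC then budgets its CPU work fetch for just the single core that is actually crunching CPU tasks.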

The disadvantage with using app_config.xml is that if you ever want to stop using it, you will need to do a full project reset before you can get the GPU utilization factor to work properly.

Quote:
The CPU tasks running at that time were one tuning task and one of the new regular tasks. I had let Boinc download only one or two bursts of tasks. Then I had hit "Won't get new tasks". Boinc was still hungry to download more tasks at that point.

That one 'tuning' task with the short deadline would have been what created the 'panic' mode for BOINC.

Quote:
Okay. Then I changed "Store at least (X) days of work" from 2 to 1.

Causing panic mode to be no longer needed.

Quote:
That caused the regular CPU task to change its state to "Waiting to run". Only the tuning task plus two GPU tasks were running.

This is all perfectly 'normal' behaviour for BOINC.  Just remember that BOINC will now want to fetch more CPU tasks because it thinks there are 3 cores available to do the crunching instead of just one.  As long as the deadline remains at 14 days, there shouldn't be too much risk of further panic mode activity.  But put things in perspective. Assume you set a 2 day cache. BOINC fetches for 3 cores instead of 1 so you really have 6 days.  The fast finishing GPU tasks reduce the DCF so that the CPU estimate goes below what it really should be.  If that effect were enough to reduce the estimate to say half of the true value, your 2 day cache becomes a 12 day cache for CPU tasks and BOINC would be likely to enter panic mode.
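Putting those numbers into one rough line:

   effective CPU cache = cache setting x (cores fetched for / cores actually crunching CPU work) / (fraction the estimate has been driven down to)
                       = 2 days x (3 / 1) / 0.5
                       = 12 days

which is why an apparently modest 2 day setting can leave very little headroom against a 14 day deadline.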

As set out above, you have two choices on how you decide to deal with this.  I actually use both.

Cheers,
Gary.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Gary Roberts wrote:
However BOINC will still be fetching CPU tasks as if there were 3 cores crunching.

That had been unclear to me. I thought Boinc somehow took into account the CPU values set for the GPU tasks in app_config.xml. Now I understand it better.

Quote:
This is a flaw in the way BOINC works (or possibly a flaw in the way GPU utilization factor works) but it is easily worked around.  Just allow BOINC to use 1 core (25% on a quad core host) and using app_config.xml rather than the GPU utilization factor, set <gpu_usage> to 0.5 and <cpu_usage> to less than 0.5 (I use 0.3) so that no further CPUs are 'reserved' by this process.  This way BOINC will know it is only fetching work for 1 CPU core so it shouldn't get into panic mode even if you use a 2 day work cache setting.

That was new to me. I hadn't thought of it that way. I might start using that. Thanks for sharing that tip.

Quote:
This is all perfectly 'normal' behaviour for BOINC.  Just remember that BOINC will now want to fetch more CPU tasks because it thinks there are 3 cores available to do the crunching instead of just one.  As long as the deadline remains at 14 days, there shouldn't be too much risk of further panic mode activity.  But put things in perspective. Assume you set a 2 day cache. BOINC fetches for 3 cores instead of 1 so you really have 6 days.  The fast finishing GPU tasks reduce the DCF so that the CPU estimate goes below what it really should be.  If that effect were enough to reduce the estimate to say half of the true value, your 2 day cache becomes a 12 day cache for CPU tasks and BOINC would be likely to enter panic mode.

And thanks for describing how the DCF actually affects the cache.

Quote:
As set out above, you have two choices on how you decide to deal with this.  I actually use both.

I haven't really had much trouble with excessive amounts of downloaded tasks. Sorry if I sounded like I was complaining about the behaviour of Boinc. I meant it more as observing... observing something that I didn't understand well enough. I think I understand Boinc a bit better now.

I've kept quite low work cache settings, especially recently in new situations where I haven't yet known how 'aggressively' tasks would be downloaded. Another thing is that most of my hosts are connected to the internet only when I'm uploading or downloading tasks. I watch in real time how things are going... and if I see too many tasks downloading... I'm able to hit 'no more work'. But that's of course only a quick "safety net". It's best to find proper settings and let Boinc schedule how it likes. Still, I have to say I like to micro-manage, restricting and allowing things in real time. It makes this hobby more lively (but I only have a few computers).

Thanks to archae86 also.
