Problem with BRP4 tasks - Stderr message is "Maximum disk usage exceeded ..."

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119032878229
RAC: 24723713
Topic 196707

It's deeper than just a Linux version. I'm running Win7 and many BRP tasks have completed successfully, but just as many have failed. My BOINC version is 7.0.28.

For me, stderr says "maximum disk usage exceeded". I had this problem earlier in the year and for a brief moment, people seemed interested in fixing it. A lot of pointers were sent my way in "how" I should set up several parameters. I did everything everybody said and all seemed to be ok, but only for awhile. BRP's started failing again.

I went a long time without getting any BRP's but recently I have been getting a lot of them. I don't understand the space requirements. I have a 1.5tb drive of which partition C has over 1tb available. With this kind of space available I shouldn't be having any problems in this regard, but yet here we go again. If these were 20 minute tasks, I wouldn't really give a hoot about failures, but these are 13-14 hr tasks and this causes me concern.

Does anybody have any clues as to what is going on or how to correct this?

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3163
Credit: 7343261687
RAC: 2328720

Quote:

For me, stderr says "maximum disk usage exceeded".

With this kind of space available I shouldn't be having any problems in this regard, but yet here we go again.

The first possibility is that everything is working as intended. To check that, you could first check which location (a.k.a. "venue") your troubled host is set to, then check in the Computing Preferences section of your Einstein account what all the disk usage parameters (5) are set to, and post them here.

Then you could check in boincmgr|Tools|Computing Preferences on the offending host to check whether the general disk parameters set on the web site are currently being overridden by local settings on the host.

Lots of folks are running these tasks successfully, so the surprise-free hypothesis is that there is something different than usual about your system--and these parameters would be the place to start.

mdawson
Joined: 23 Feb 05
Posts: 77
Credit: 6575069
RAC: 0

Archae86,

Ok.... here's my event log. Do you see anything in there that could be a next step in diagnosis?

12/25/2012 3:22:09 AM | | Starting BOINC client version 7.0.28 for windows_x86_64
12/25/2012 3:22:09 AM | | log flags: file_xfer, sched_ops, task
12/25/2012 3:22:09 AM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
12/25/2012 3:22:09 AM | | Data directory: C:\ProgramData\BOINC
12/25/2012 3:22:09 AM | | Running under account mdawson
12/25/2012 3:22:09 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz [Family 6 Model 26 Stepping 5]
12/25/2012 3:22:09 AM | | Processor: 256.00 KB cache
12/25/2012 3:22:09 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx tm2 popcnt pbe
12/25/2012 3:22:09 AM | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
12/25/2012 3:22:09 AM | | Memory: 12.00 GB physical, 24.00 GB virtual
12/25/2012 3:22:09 AM | | Disk: 1.36 TB total, 1.14 TB free
12/25/2012 3:22:09 AM | | Local time is UTC -8 hours
12/25/2012 3:22:09 AM | | NVIDIA GPU 0: GeForce GTX 680 (driver version 306.97, CUDA version 5.0, compute capability 3.0, 2048MB, 8382276MB available, 3252 GFLOPS peak)
12/25/2012 3:22:09 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 680 (driver version 306.97, device version OpenCL 1.1 CUDA, 2048MB, 8382276MB available)
12/25/2012 3:22:09 AM | | Config: use all coprocessors
12/25/2012 3:22:09 AM | | Config: don't use GPUs while launcher.exe is running
12/25/2012 3:22:09 AM | | Config: don't use GPUs while swtor.exe is running
12/25/2012 3:22:09 AM | Collatz Conjecture | URL http://boinc.thesonntags.com/collatz/; Computer ID 49610; resource share 100
12/25/2012 3:22:09 AM | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 2221041; resource share 100
12/25/2012 3:22:09 AM | Einstein@Home | General prefs: from Einstein@Home (last modified 14-Oct-2012 17:13:50)
12/25/2012 3:22:09 AM | Einstein@Home | Computer location: home
12/25/2012 3:22:09 AM | Einstein@Home | General prefs: no separate prefs for home; using your defaults
12/25/2012 3:22:09 AM | | Reading preferences override file
12/25/2012 3:22:09 AM | | Preferences:
12/25/2012 3:22:09 AM | | max memory usage when active: 6143.21MB
12/25/2012 3:22:09 AM | | max memory usage when idle: 6143.21MB
12/25/2012 3:22:09 AM | | max disk usage: 20.00GB
12/25/2012 3:22:09 AM | | max CPUs used: 6
12/25/2012 3:22:09 AM | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
12/25/2012 3:22:09 AM | | Not using a proxy

BOINC Manager Disk and Memory Usage tab indicates:
Use at most 20gb
Leave at least 1gb free
Use at most 50%
Checkpoint 60 secs
Use at most 50% page file

I remember changing these parameters when I last tried to figure out what was going on. It's entirely possible that I have a mis-config somewhere, but I'll be darned if I can spot it.

archae86
Joined: 6 Dec 05
Posts: 3163
Credit: 7343261687
RAC: 2328720

Quote:

from the event log:
max disk usage: 20.00GB

... and then from your list:

Use at most 20gb


That could be a problem--depending on the queue lengths you choose to run, the application mix your choices interacting with Einstein's bestowal give you, leftovers not deleted, and such.

I'm not sure just what set of directories comes under this limit, but on my daily driver system, which has a Nehalem-type processor and a single GTX460 video card, Windows Explorer says the sum of all files under the
Program Data/BOINC
directory is at 12 Gbyte at the moment.

I, personally, have that limit set to 60 Gbytes on all my locations.

Given that you have lots of available storage, and a history of this type of trouble, why not bump it up, a lot? Say, maybe 100 Gb. I don't think it "reserves" the space in any sense, although setting the limit to lots less than your total disk gives you some protection in the case of a runaway application.

Also, it might be informative for both us and you to check the disk usage under Program Data/BOINC and report here.
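
If Windows Explorer is awkward for this, a short script can total the directory instead. This is only a minimal sketch: the function name `dir_size_bytes` is made up for illustration, and the path is the data directory shown in the event log earlier in the thread, so adjust it if yours differs.

```python
import os

def dir_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                pass
    return total

if __name__ == "__main__":
    # Assumed location, taken from the "Data directory" line of the event log.
    data_dir = r"C:\ProgramData\BOINC"
    print(f"{dir_size_bytes(data_dir) / 1e9:.2f} GB under {data_dir}")
```

Comparing the printed figure with the "max disk usage" line in the event log shows how close the host is to its configured limit.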

Good luck

archae86
Joined: 6 Dec 05
Posts: 3163
Credit: 7343261687
RAC: 2328720

mdawson wrote:
12/25/2012 3:22:09 AM | Einstein@Home | Computer location: home
12/25/2012 3:22:09 AM | Einstein@Home | General prefs: no separate prefs for home; using your defaults
12/25/2012 3:22:09 AM | | Reading preferences override file
12/25/2012 3:22:09 AM | | Preferences:
12/25/2012 3:22:09 AM | | max memory usage when active: 6143.21MB
12/25/2012 3:22:09 AM | | max memory usage when idle: 6143.21MB
12/25/2012 3:22:09 AM | | max disk usage: 20.00GB
12/25/2012 3:22:09 AM | | max CPUs used: 6
12/25/2012 3:22:09 AM | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)

One point I was unsure of when I posted my first response, but now need to make, is that your event log makes clear that on this host you are not running with the web-site preferences but have instead intervened locally (probably by going to boincmgr|Tools|Computing Preferences). So to change any computing preference you would need to go there again to make changes, or go there and disable the local override so that the web-site preferences take control again.
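
The "Reading preferences override file" line can also be checked from disk. The sketch below assumes the client keeps local overrides in a file called `global_prefs_override.xml` in the BOINC data directory, and that the disk settings use the tags listed in `DISK_TAGS`. Both are my assumptions about the client's file layout, not something shown in this thread, and the function name is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

# Assumed tag names for the three disk preferences.
DISK_TAGS = ("disk_max_used_gb", "disk_min_free_gb", "disk_max_used_pct")

def read_override_disk_prefs(data_dir):
    """Return the disk settings found in the override file, or None if absent."""
    path = os.path.join(data_dir, "global_prefs_override.xml")  # assumed file name
    if not os.path.exists(path):
        return None  # no local override; web-site preferences apply
    root = ET.parse(path).getroot()
    prefs = {}
    for tag in DISK_TAGS:
        node = root.find(".//" + tag)
        if node is not None and node.text:
            prefs[tag] = float(node.text)
    return prefs

if __name__ == "__main__":
    print(read_override_disk_prefs(r"C:\ProgramData\BOINC"))
```

A result of `None` would suggest there is no local override in place; a dictionary shows which local values are taking precedence over the web site.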

For comparison, here is a fragment of my startup event log on a host for which I have the location (venue) set to default, and which has no local override operating:

12/26/2012 9:46:39 AM | Einstein@Home | General prefs: from Einstein@Home (last modified 25-Dec-2012 19:51:46)
12/26/2012 9:46:39 AM | Einstein@Home | Host location: none
12/26/2012 9:46:39 AM | Einstein@Home | General prefs: using your defaults
12/26/2012 9:46:39 AM |  | Preferences:
12/26/2012 9:46:39 AM |  | max memory usage when active: 12286.09MB
12/26/2012 9:46:39 AM |  | max memory usage when idle: 12286.09MB
12/26/2012 9:46:39 AM |  | max disk usage: 60.00GB
12/26/2012 9:46:39 AM |  | max CPUs used: 3
12/26/2012 9:46:39 AM |  | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119032878229
RAC: 24723713

Quote:
It's deeper than just a Linux version. I'm running Win7 and many BRP tasks have completed successfully, but just as many have failed. My BOINC version is 7.0.28.


In the other thread, the problem was failure of the new FGRP2 tasks with messages like, "Task LAT##### exited with a zero status but no 'finished' file" or "00:41:33 (453): No heartbeat from client for 30 sec - exiting". These are related messages to do with premature exiting of the FGRP2 science app and have no link to error messages about disk usage for the BRP4 science app. The Binary Radio Pulsar search is a totally different sub-project.

Quote:

For me, stderr says "maximum disk usage exceeded". I had this problem earlier in the year and for a brief moment, people seemed interested in fixing it. A lot of pointers were sent my way in "how" I should set up several parameters. I did everything everybody said and all seemed to be ok, but only for awhile. BRP's started failing again.

I went a long time without getting any BRP's but recently I have been getting a lot of them. I don't understand the space requirements. I have a 1.5tb drive of which partition C has over 1tb available. With this kind of space available I shouldn't be having any problems in this regard, but yet here we go again. If these were 20 minute tasks, I wouldn't really give a hoot about failures, but these are 13-14 hr tasks and this causes me concern.

Does anybody have any clues as to what is going on or how to correct this?


Archae86 is giving you the correct advice so read it carefully. You need to tell us how much disk space BOINC is currently using and it would also be a good idea to set your 20GB limit to something a bit higher until we work out exactly what is going on.

There is another preference setting that I don't think you have mentioned, 'Disk: use at most xxx% of total'. What do you have that set at?

EDIT: Sorry, on looking back I see you have that at 50%. Try setting that to 95% or 100%. If you have that at 50%, I think that BOINC may start complaining once you reach half of your (current) 20GB limit. I'm not sure about that but many years ago I found I could stop the complaints by setting that to 100%. I've had no further problems after that. I only have a 10GB limit but I'm going to double that right now before any problems start :-).

EDIT2: On more reflection, I'm sure I'm wrong in what I said above about the 50% setting. I believe that with that setting at 50%, BOINC can only continue to function if at least half of the whole partition on which BOINC is installed remains free. So if you had a 1TB partition where BOINC was installed and you had 500GB of other stuff stored there and 500GB free, BOINC would not be allowed to store anything with a setting of 50%, even though you had plenty of free space.

So, for the partition on which BOINC is installed, can you supply the following values:

  1. Full size of the partition
  2. Size of the free space left
  3. Size of all non-BOINC usage
  4. Size of all BOINC usage

2+3+4 above should equal 1.
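
The consistency check on those four values, plus the free fraction that the 50% question turns on, can be sketched as follows. The function name and the 1 GB tolerance are illustrative choices, and the numbers in the usage lines are the hypothetical 1TB example from the EDIT2 above, not measurements from any real host.

```python
def partition_report(total_gb, free_gb, non_boinc_gb, boinc_gb, tol_gb=1.0):
    """Check that items 2+3+4 add up to item 1 and report the shares."""
    accounted = free_gb + non_boinc_gb + boinc_gb
    return {
        "consistent": abs(accounted - total_gb) <= tol_gb,
        "free_fraction": free_gb / total_gb,
        "boinc_fraction": boinc_gb / total_gb,
    }

if __name__ == "__main__":
    # Hypothetical partition: 1000 GB total, 500 GB free, 500 GB other, no BOINC data.
    report = partition_report(1000.0, 500.0, 500.0, 0.0)
    print(report)
```

If "consistent" comes out false, one of the four sizes was probably measured on a different partition or includes files counted twice.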

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119032878229
RAC: 24723713

Quote:

It's deeper than just a Linux version. I'm running Win7 and many BRP tasks have completed successfully, but just as many have failed. My BOINC version is 7.0.28.

For me, stderr says "maximum disk usage exceeded"....


When I first read your report about this problem, I took at face value your statement that BRP (Binary Radio Pulsar Search #4, using Arecibo radio telescope data) tasks were involved. At the time, I was supposed to be elsewhere, so I quickly moved all the posts to a separate thread, where they really need to be anyway as the 'disk usage' problem has nothing to do with the 'premature exits' problem. I didn't have time then to double check the accuracy of your report. I just wanted to clean up the thread hijack quickly.

I've now had time to look more closely at your full list of results. You don't have any errors at all with BRP4 tasks. All your 'disk usage' errors are in fact with FGRP tasks.

I was quite surprised to find this because I had imagined you might have lots of BRP4 tasks eating up your disk space. I'm also puzzled to note the differences in the output for a 'good' task compared to an 'error' task. Your 'error' tasks run for about 300 secs which is almost enough time to get to the first checkpoint but produce no 'normal' output, just the error message. By comparison, there is a lot of output up to the time of the first checkpoint in a 'good' task.

That caused me to think more about the error message, "Maximum disk usage exceeded". When you receive tasks, data about them is stored in the state file, client_state.xml. Here is what is stored for a FGRP task on one of my hosts.

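<workunit>
    <name>LATeah0004U_848.0_15140_0.0</name>
    <app_name>hsgamma_FGRP2</app_name>
    <version_num>1</version_num>
    <rsc_fpops_est>15000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>300000000000000.000000</rsc_fpops_bound>
    <rsc_memory_bound>350000000.000000</rsc_memory_bound>
    <rsc_disk_bound>20000000.000000</rsc_disk_bound>
    <command_line>
     .....
    </command_line>
    <file_ref>
        <file_name>LATeah0004U.dat</file_name>
        <open_name>LATeah0004U.dat</open_name>
    </file_ref>
    <file_ref>
        <file_name>skygrid_LATeah0004U_0848.0.dat</file_name>
        <open_name>skygrid_LATeah0004U_0848.0.dat</open_name>
    </file_ref>
    <file_ref>
        <file_name>JPLEPH.405</file_name>
        <open_name>JPLEPH.405</open_name>
    </file_ref>
</workunit>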

Note the <rsc_disk_bound> line (the last of the four numeric values above). It specifies the maximum disk space to be used by the task - in the above example 20,000,000 bytes (20MB). For some reason some of your tasks must be exceeding that value in the disk space they need when they first start to crunch. I believe that the error message is not about how much space BOINC is using. It's about what an individual task is trying to use. This is not something you have any control over through preference settings so please forget previous comments about those settings.
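
To see the per-task limit for every task on a host, the state file can be scanned for that value. This is a rough sketch only: it assumes client_state.xml parses as well-formed XML and that each task sits in a `<workunit>` element containing `<name>` and `<rsc_disk_bound>`, as in the fragment above. The function name is made up, and the sample string is a cut-down stand-in for a real state file.

```python
import xml.etree.ElementTree as ET

def disk_bounds(state_xml_text):
    """Map each workunit name to its rsc_disk_bound in bytes."""
    root = ET.fromstring(state_xml_text)
    bounds = {}
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_disk_bound")
        if name and bound:
            bounds[name.strip()] = float(bound)
    return bounds

if __name__ == "__main__":
    # Cut-down sample; on a real host, read the text of client_state.xml instead.
    sample = """<client_state>
      <workunit>
        <name>LATeah0004U_848.0_15140_0.0</name>
        <rsc_disk_bound>20000000.000000</rsc_disk_bound>
      </workunit>
    </client_state>"""
    for name, limit in disk_bounds(sample).items():
        print(f"{name}: {limit / 1e6:.0f} MB per-task disk limit")
```

Work on a copy of the state file rather than the live one, since the client rewrites it while running.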

Crunching of a task is done in one of the sub-directories under your 'slots' directory, which is in your BOINC Data directory. I've just checked the size of mine and they are all very low - about 13KB - way below the limit. How about you check the size of all of yours? They have names '0', '1', '2', ... Can you find any that are close to or above 20MB? If you can, then there's your answer. Perhaps there's a large rogue file of some sort left in a slot dir that's not being deleted but is causing the over-limit condition when a new task starts to crunch there. You can easily tell if any particular slot dir is being used to crunch FGRP tasks. It will contain a link to the hsgamma_FGRP executable.
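
Checking every slot by hand gets tedious, so here is a minimal sketch that totals each slot directory and names its largest file. The function name is hypothetical, the slots path assumes the data directory from the event log earlier in the thread, and the 20,000,000 byte threshold is simply the per-task limit from the example above.

```python
import os

def slot_sizes(slots_dir):
    """Return {slot_name: (total_bytes, largest_file_name, largest_file_bytes)}."""
    report = {}
    for entry in sorted(os.listdir(slots_dir)):
        slot = os.path.join(slots_dir, entry)
        if not os.path.isdir(slot):
            continue
        total, biggest, biggest_size = 0, None, 0
        for dirpath, _dirs, files in os.walk(slot):
            for name in files:
                try:
                    size = os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    continue  # file vanished or is unreadable
                total += size
                if size > biggest_size:
                    biggest, biggest_size = name, size
        report[entry] = (total, biggest, biggest_size)
    return report

if __name__ == "__main__":
    slots = r"C:\ProgramData\BOINC\slots"  # assumed location
    limit = 20_000_000  # per-task limit from the example above
    if os.path.isdir(slots):
        for slot, (total, biggest, size) in slot_sizes(slots).items():
            flag = "  <-- near or over limit" if total > 0.9 * limit else ""
            print(f"slot {slot}: {total} bytes, largest file {biggest} ({size}){flag}")
```

A slot whose largest file is something other than the task's normal output is the first place to look for a leftover.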

Let us know what you can find.

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 798163107
RAC: 1204213

Hi!

Some time ago a similar problem was reported, see this thread:

http://einsteinathome.org/node/196455&nowrap=true#118497

which involved old stderr.* dumps in slot directories. Might be worth checking if this is a similar case.

Cheers
HB
