Any idea why I might be getting "Maximum disk usage exceeded" errors on this host? I've been getting the same errors on MilkyWay@Home but haven't gotten any feedback. I switched to Einstein as a test to see if I had the same problem and it looks like maybe I do.
I've upgraded to the latest driver from AMD and am now concerned I have a hardware problem.
Thanks for any feedback.
MarkR
Copyright © 2024 Einstein@Home. All rights reserved.
Maximum disk usage exceeded
Well, the first question would be: what have you got the disk usage limits set to? And the second, how does that compare to actual disk space in use for BOINC?
First:
Look at that computer's details on your account page at Einstein, and determine which location (aka venue) it is assigned to: default, home, work, or school.
Second:
Starting from your account page, go to computing preferences, and look at the disk entries for the location your computer is assigned to.
There are three distinct entries, and you could be violating any of them. Two are "use at most" restrictions, one expressed in gigabytes and the other as a percentage. The third is a "leave free at least" limit expressed in gigabytes.
Lastly:
The limits set from your account web page apply unless you have set a local preference on the specific host. If the web limits appear not to be violated, check for a local preference.
Go to Boincmgr|Tools|Computing Preferences|disk and memory usage
If you have set any limits here, they take precedence over the ones set using the web site.
Let us know what you find out.
Thanks, archae86. I think everything you asked about and recommended I check follows:
I have local preferences set that are pretty wide-open compared to what I need (I think!). The host has a 1 TB HDD with only about 250 GB used. Local disk limits are set to:
Use at most -- 150GB (most restrictive)
Leave at least -- 0.1 GB (least restrictive)
Use at most -- 50% of total (less restrictive)
The BOINC manager shows 26 GB is used for BOINC with 124GB available and that Einstein is using less than 300MB.
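For anyone following along, here is a rough sketch of how the three limits might combine into the figure the manager shows. It assumes the client simply takes the most restrictive of the three, which is a simplification and not taken from the BOINC source. The function name and parameters are made up for illustration, and 1 TB is treated as 1000 GB with 750 GB free.

```python
def allowed_boinc_disk_gb(total_gb, free_gb, boinc_used_gb,
                          max_used_gb, min_free_gb, max_used_pct):
    """Hypothetical model: BOINC may use the smallest of the three limits."""
    # "Use at most N GB"
    by_absolute = max_used_gb
    # "Use at most P% of total"
    by_percent = total_gb * max_used_pct / 100.0
    # "Leave at least N GB free": what BOINC already holds plus the free
    # space it could still take without dipping below the reserve
    by_min_free = boinc_used_gb + (free_gb - min_free_gb)
    return min(by_absolute, by_percent, by_min_free)


# Figures from the post above (1 TB disk, about 250 GB used, 26 GB of it by BOINC)
boinc_used = 26
allowed = allowed_boinc_disk_gb(total_gb=1000, free_gb=750,
                                boinc_used_gb=boinc_used,
                                max_used_gb=150, min_free_gb=0.1,
                                max_used_pct=50)
available = allowed - boinc_used
print(f"allowed: {allowed} GB, still available to BOINC: {available} GB")
```

Under those assumptions the 150 GB "use at most" entry is the binding one, and 150 minus the 26 GB in use leaves 124 GB, which agrees with what the manager reports. So the overall preferences do not look like the cause.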
I think the error is thrown by boinc, not by anything to do with an individual project, so it is probably pointless to check space used specifically by Einstein, for instance.
The interplay between the local limits (which are stated to prevail), and the web site ones is mildly mysterious in detail to me.
Unless you have other hosts on other projects relying on the web site preference values, it might be wise to rule out that source of trouble by making sure all three limits are very unrestrictive for all four locations as set on the Einstein web site (then be sure to hit update in boincmgr with Einstein selected).
I admit this seems unlikely to help. Perhaps someone else will come along who will have another idea.
I have two other hosts running Einstein with similarly non-restrictive disk limitations and using the same location/preferences as the problem host and they are having no issues at all...
Actually, I think ritterm was on the right lines in his post at Milkyway, where he posted the associated with the individual task.
The exit code for EXIT_DISK_LIMIT_EXCEEDED was one of a batch added in April 2012 - the ones which are still causing problems for projects, like this one, which haven't updated their web code to handle the plain-language descriptions for the new codes.
The full changeset includes the replacement of
Error code:
-#define ERR_RSC_LIMIT_EXCEEDED -177 (resource limit exceeded)
with three specific Exit codes:
+#define EXIT_DISK_LIMIT_EXCEEDED 196
+#define EXIT_TIME_LIMIT_EXCEEDED 197
+#define EXIT_MEM_LIMIT_EXCEEDED 198
All of those are to do with the individual resource limits for each task - the time limit, in particular, makes no sense as an overall BOINC preference setting.
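To illustrate the kind of lookup a project's web code would need in order to show plain-language text for the new codes, here is a minimal sketch. The numeric values and constant names are the ones quoted above. The wording for 196 is the message from this thread; the wording for 197 and 198 and the function name are my own placeholders, not taken from any project's code.

```python
# Exit codes from the changeset quoted above
EXIT_DISK_LIMIT_EXCEEDED = 196
EXIT_TIME_LIMIT_EXCEEDED = 197
EXIT_MEM_LIMIT_EXCEEDED = 198

# Descriptions: 196 is the text seen in this thread, the rest are placeholders
EXIT_CODE_TEXT = {
    EXIT_DISK_LIMIT_EXCEEDED: "Maximum disk usage exceeded",
    EXIT_TIME_LIMIT_EXCEEDED: "Time limit exceeded (placeholder wording)",
    EXIT_MEM_LIMIT_EXCEEDED: "Memory limit exceeded (placeholder wording)",
}


def describe_exit_code(code):
    """Return a readable description, falling back to the bare number."""
    return EXIT_CODE_TEXT.get(code, f"Unknown exit code {code}")


print(describe_exit_code(196))
```

A site that has not been updated would land in the fallback branch, which is why some projects only show the bare number.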
So, my next enquiry would be: what's going on in the slot directory while these tasks are running? Any task-related working files would be written there.
I now think I might have a GPU hardware problem. Many of the tasks I've checked that errored out for me have been completed by other hosts without a problem. If the tasks I ran had a bad parameter, would the same task work for another host?
Depends on the failure mode of the GPU. If it writes very verbose error logs, you might exceed the disk bound, while users with working GPUs might not. Have a look at what it writes into the slot directory while running.
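If it helps, the slot directory check can be scripted. The sketch below totals the bytes in each slot and lists any unusually large files. The function names and the threshold are my own, and you have to pass the path to the slots folder inside your BOINC data directory yourself, since its location differs between installs.

```python
import os


def slot_usage(slots_dir):
    """Return {slot_name: total_bytes} for each subdirectory of slots_dir."""
    usage = {}
    for entry in sorted(os.listdir(slots_dir)):
        slot_path = os.path.join(slots_dir, entry)
        if not os.path.isdir(slot_path):
            continue
        total = 0
        for root, _dirs, files in os.walk(slot_path):
            for name in files:
                path = os.path.join(root, name)
                if os.path.islink(path):
                    continue  # skip symlinks so link targets are not counted
                try:
                    total += os.path.getsize(path)
                except OSError:
                    pass  # file vanished or is unreadable while a task runs
        usage[entry] = total
    return usage


def large_files(slots_dir, threshold_bytes):
    """Return [(path, size)] for files at or above threshold_bytes, largest first."""
    found = []
    for root, _dirs, files in os.walk(slots_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue
            try:
                size = os.path.getsize(path)
            except OSError:
                continue
            if size >= threshold_bytes:
                found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling `slot_usage(path_to_slots)` a few times while a task runs shows whether a slot keeps growing, and `large_files(path_to_slots, 1024**3)` lists anything of 1 GiB or more. The script only reads sizes and deletes nothing.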
I didn't have a chance to test here to see what was going on in the slots my failing tasks were using. However... Following the suggestion of a forum post about a similar problem at another project, I checked all my host's slots directories and found two "stray" VM image files left by one of the VM projects (probably CERN's CMS-dev), each of which was over 5GB. I deleted those files and slots and that seems to have solved my problem.
I'm not sure I understand, though, why those slots presented a problem. Could BOINC have tried to use them thinking they were empty only to find a large file which exceeded the disk limit? If so, did the VM task not clean something up like it should have or is BOINC not managing the slots properly?
That does sound like the most plausible explanation so far, and the size of 'over 5 GB' matches your report of 'Peak disk usage 5,741.01 MB' in the Milkyway thread.
I think it's BOINC, rather than the project supplying the VM image, which is responsible for cleaning the slot files, but I'm not sure exactly what rules, or how many rules, are supposed to be followed. Choose from:
1) After successful task completion
2) After unsuccessful task completion (crash)
3) At BOINC startup
4) Before new task startup
I have a suspicion that rules (1) and (3) are active, but I'm not sure about the others: that might be one that Jord could run past the developers?
I reported this problem to the BOINC developers, and got this reply from David Anderson:
So, help needed.
Under what circumstances does the CMS .vdi image get left behind? Is there a difference between successful task completions and abnormal (error) exits?
Can the .vdi be deleted manually? Immediately? Later? After BOINC restart? After reboot?
Does BOINC ever clean it up by itself, say after a client restart?
And anything else you can think of.