Apologies if this is covered elsewhere and I have missed it.
Since upgrading to the newest BOINC client, the results from my main crunching box are all showing as "Aborted by user." I have changed no other settings either to the software or on the machine, and needless to say I have not manually aborted any of them.
Is this a known error/behavior? Does anyone have any troubleshooting suggestions?
Results showing "Aborted by user"
This is actually a mis-reported error message, due to the old version of the website server code in use at this project.
The individual tasks each show "Exit status 200 (0xc8)", and in a list of BOINC exit codes rather newer than the one this site displays, that is
#define EXIT_UNSTARTED_LATE 200
- in other words, your computer hadn't even started running the tasks before the deadline was reached. (You can check that the supposed 'abort' happened a few seconds after the deadline for each task.)
I'm afraid you'll have to look at BOINC Manager locally, especially the Event Log, to see if you can find a reason why the tasks aren't even being started - there's no clue in the task reports I can see.
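For anyone who wants to decode these exit statuses without hunting through the BOINC headers, here is a minimal sketch in Python. The table only holds the three codes quoted in this thread (196, 197 and 200); the real list is longer, and the function name `describe_exit_status` is made up for illustration.

```python
# Exit codes quoted in this thread only; the full BOINC list is longer.
BOINC_EXIT_CODES = {
    196: "EXIT_DISK_LIMIT_EXCEEDED",
    197: "EXIT_TIME_LIMIT_EXCEEDED",
    200: "EXIT_UNSTARTED_LATE",
}


def describe_exit_status(code: int) -> str:
    """Format a code the way the task page shows it, plus its symbolic name."""
    name = BOINC_EXIT_CODES.get(code, "unknown (not in this short table)")
    return f"Exit status {code} (0x{code:x}): {name}"


# 0xc8 is the hex value shown on the task pages; it is 200 in decimal.
print(describe_exit_status(0xC8))
```

The point is simply that 0xc8 and 200 are the same number, and that it maps to "unstarted late" rather than to a user abort.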
Thank you for the info...but that's rather odd, since the machine has been sitting there cranking out work units the entire time. There was at one point a rather large backlog of completed results to be submitted, but a simple restart remedied that.
I'll dig around and see what I can find. If nothing obvious presents itself, I'll simply reinstall and see if that takes care of it.
Could your cache be too big with the latest units? You are taking about 5 days to return your units and some of the latest units seem to have short deadlines, e.g.:
25 Apr 2015 23:31:45 UTC 2 May 2015 23:31:45 UTC
whereas other units have deadlines like:
25 Apr 2015 23:31:45 UTC 9 May 2015 23:31:45 UTC
It looks like the GPU units have shorter deadlines than the CPU units, which could be a problem since BOINC doesn't differentiate between them when it comes to the cache.
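To put numbers on the two examples above, here is a rough sketch of the arithmetic in Python. It assumes the first timestamp in each pair is the time the task was sent and the second is its deadline, and that month names parse in an English locale; `days_allowed` is just an illustrative helper.

```python
from datetime import datetime

# Format of the timestamps quoted above, without the trailing "UTC".
FMT = "%d %b %Y %H:%M:%S"


def days_allowed(sent: str, deadline: str) -> float:
    """Days between the time a task was sent and its deadline."""
    delta = datetime.strptime(deadline, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / 86400


short = days_allowed("25 Apr 2015 23:31:45", "2 May 2015 23:31:45")
long_ = days_allowed("25 Apr 2015 23:31:45", "9 May 2015 23:31:45")

TURNAROUND_DAYS = 5  # the "about 5 days" return time mentioned above

print(short, short - TURNAROUND_DAYS)  # 7-day deadline, 2 days of margin
print(long_, long_ - TURNAROUND_DAYS)  # 14-day deadline, 9 days of margin
```

With a 5-day turnaround, the shorter deadline leaves only about two days of slack, so any stall eats it quickly.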
I doubt that is the issue, as this machine has been crunching in the vicinity of a 100,000 pt/day clip since I added it...right up until the newest update of BOINC.
I am still trying to determine the cause of the problem. I uninstalled and reinstalled the latest version, and though the client claimed work units were processing--as it has since the update--a further check of CPU activity showed that no actual work was being done. I then uninstalled again, deleted all data, reinstalled the previous version of BOINC, and things are back to normal.
When I have some time I will upgrade again and see if the problems repeat. Perhaps it's an issue with BOINC.
Have you updated to Yosemite recently? There have been problems with GPU tasks.
This was posted in this thread, http://einsteinathome.org/node/198054. I haven't had any luck with the beta versions either. So I'm just running CPU tasks for now.
What version of BOINC are you running?
That particular machine has been running Yosemite since the day I added it, and the newest version of BOINC was not processing either CPU or GPU tasks. Prior to the BOINC update, the two GPUs were responsible for the majority of my WUs.
I don't have access to that box at the moment, but the BOINC versions I'm referencing are the last two releases--revisions 36 and 42, maybe? Something like that.
Michael -
Look again at your cache size (mentioned earlier). I have had your issue also.
Einstein can, and does, download more work than can be accomplished before the deadlines.
A simple example - you have your cache set at 10 days. Each task takes 24 hours - so you want, and have, 10 days of work. But, if 8 of the tasks have due dates only 6 days from now, all of the work can't possibly be accomplished before the due dates. BOINC will abort some of your tasks when it realizes the problem.
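That example can be sketched in a few lines of Python. This is only a toy model, not how the BOINC scheduler is actually written: it assumes a single core running one task at a time, earliest deadline first, and it assumes the two remaining tasks have 14-day deadlines (the example above doesn't say). The name `late_tasks` is made up.

```python
def late_tasks(deadlines_days, runtime_days=1.0):
    """Run tasks back to back, earliest deadline first.

    Returns the deadlines of the tasks that would finish after their deadline.
    """
    clock = 0.0
    late = []
    for deadline in sorted(deadlines_days):
        clock += runtime_days  # this task finishes at `clock` days from now
        if clock > deadline:
            late.append(deadline)
    return late


# 10 days of cache: 8 tasks due in 6 days, 2 tasks due in 14 days (assumed).
queue = [6] * 8 + [14] * 2
missed = late_tasks(queue)
print(len(missed), "tasks cannot finish in time")
```

In this model the seventh and eighth short-deadline tasks would finish on days 7 and 8, past their 6-day deadline, so those are the ones that can't make it.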
I didn't work my way all the way back to the start of the problem, but if tasks stopped running at all after one or other upgrade...
The first task will have run for hours and hours, spinning its wheels in the sand. All tasks backed up behind it will get later and later and later, because the stalled one will block everything up for much longer than the estimated runtime. If the cache was anywhere near full at the time, the tail-end charlies won't have a chance of making it in time.
There is an extra-special contribution from a single super-long duration task (however it happens). The estimated time for all the remaining tasks gets boosted as soon as the unusually long-completing task finishes, not by a small averaged-in adjustment, but to the full effect of the single slow observation.
Recovery begins as soon as the first task with normal completion time is done, but for "faster than currently predicted" completion the programming responds intentionally slowly, whereas if something is slower than expected by an appreciable margin (don't know the current definition of appreciable, but maybe something like 20%), then the new prediction is bumped all the way up.
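A rough sketch of that asymmetric behaviour, in Python. This is not the BOINC client source: the function name `update_dcf`, the 20% margin (my guess above) and the 10% smoothing step are all assumptions for illustration only.

```python
def update_dcf(dcf, actual_hours, estimated_hours, margin=0.2, smoothing=0.1):
    """Return a new duration correction factor after one completed task.

    `margin` and `smoothing` are illustrative guesses, not BOINC's constants.
    """
    ratio = actual_hours / estimated_hours
    if ratio > dcf * (1 + margin):
        # Much slower than predicted: jump all the way up in one step.
        return ratio
    # Otherwise move only a fraction of the way toward the observed ratio.
    return dcf + smoothing * (ratio - dcf)


dcf = 1.0
dcf = update_dcf(dcf, actual_hours=10.0, estimated_hours=2.0)  # one very slow task
after_slow = dcf
dcf = update_dcf(dcf, actual_hours=2.0, estimated_hours=2.0)   # a normal task
after_normal = dcf
print(after_slow, after_normal)
```

One slow task moves the factor from 1.0 to 5.0 immediately, while the next normal task only brings it back a tenth of the way, which is why the inflated estimates linger.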
Details aside, this sort of thing is part of the risk profile of large work queues, and one of the reasons why it is especially wise to work down one's queue (by suspending fetch) before making changes.
That's certainly the design of DCF, as still in use at this project. But IIRC, DCF only updates on successful task completion, so that wheel-spinner wouldn't trigger that characteristic sawtooth uptick in runtime estimates for the following tasks, when - I suspect - it's killed by BOINC for 'maximum time exceeded'. But I suspect we all need a refresher course in how BOINC used to work (and how it works now, which is quite different) - preferably with a working example open on screen in front of us.
And I agree with keeping a modest cache size, not only when planning upgrades or other changes.
Edit - see, for example, task 495024967 from the OP (they're easier to find now that most have been purged)
Exit 197 is indeed EXIT_TIME_LIMIT_EXCEEDED, from the list I posted in the adjacent thread about EXIT_DISK_LIMIT_EXCEEDED (196).