Any ideas on this situation?...
I've been getting assigned a lot of WUs (issue #4) because host 93643 isn't returning any. When I look at this host, there are 928 results to his credit dating back to 27 Feb, and only 1 has been returned and received any credit. WUs continue to be issued but none get returned as far as I can tell. It doesn't really matter to me since I'm getting credit and also helping 2 other hosts to finally get their credit. Still, why keep issuing WUs to someone who isn't going to return them? I don't want to interfere with someone else's business, just trying to understand what gives.
What's up with host 93643?
Hard to say what preferences apply to this host (WU cache size).
Anyway, it has reached its daily quota of 1 WU/CPU, and this should stop it draining WUs.
Well, it was running the outdated BOINC 4.26 back in February, going by the only result it completed and actually sent back to the server. This may be the issue...
If you look at the computer status instead of just its results status, you'll see that it has done much work in the past, but the low RAC suggests it stopped contributing weeks ago, consistent with the one result returned in late February. As the Maximum daily WU quota per CPU currently shows at 1/day, it appears that the system is "on to it" in some sense, and issuing new work at a very low rate (it appears 2/day in the last few days, consistent with 1/day/CPU).
I'd presume the system is set to maintain a very big work queue, but I don't understand why failure to return work did not bump down its quota/day much earlier. For example, it got 35 new results on March 22, I suppose as backfill for results in its queue that had expired.
So I'll add my wonderment to NBIT's: is this the way it is supposed to work?
If so, people might add another reason to avoid setting large queues, and to tidy up when reducing project priority. It would also help if the client bug which fails to consider resource share allocation in fetching work got fixed some day.
Personally, I wish folks who set very
I've mentioned this on another project, but here's my $.02 worth.
I think all projects should have an "Outstanding work limit".
If you want to have a week's worth of work on your machine, fine. But prove to the project that you're going to return the work first.
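To make the "outstanding work limit" idea concrete, here is a minimal sketch. It is not how the BOINC scheduler works; the names (`Host`, `can_send`, `MAX_LIMIT`), the starting cap of 2, and the doubling rule are all made up for illustration. A host may only hold a capped number of unreturned results, and the cap grows only when work actually comes back.

```python
from dataclasses import dataclass

MAX_LIMIT = 64  # hypothetical ceiling on unreturned results per host


@dataclass
class Host:
    in_progress: int = 0  # results sent but not yet returned
    limit: int = 2        # hypothetical starting cap for an unproven host


def can_send(host: Host) -> bool:
    """Allow new work only while the host is under its outstanding cap."""
    return host.in_progress < host.limit


def on_sent(host: Host) -> None:
    host.in_progress += 1


def on_returned(host: Host) -> None:
    """A returned result frees a slot and earns the host a larger cap."""
    host.in_progress -= 1
    host.limit = min(host.limit * 2, MAX_LIMIT)


def simulate(requests: int, returns_work: bool) -> tuple[int, int]:
    """Count results issued over a number of work requests."""
    host = Host()
    issued = 0
    for _ in range(requests):
        if can_send(host):
            on_sent(host)
            issued += 1
            if returns_work:
                on_returned(host)
    return issued, host.limit


print(simulate(100, returns_work=False))  # (2, 2)
print(simulate(10, returns_work=True))    # (10, 64)
```

Under these assumptions a host that never reports back is stuck at 2 results no matter how often it asks, while a host that returns its work quickly earns a large queue.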
For what it's worth, I just emailed the user telling them they have a misconfigured host. Hopefully they will fix the problem.
Thanks David!
I think the BOINC developers should consider some means to prevent hosts from running amok like this one did, either by a 'max unfinished results' quota or by some program which walks through the database and sends out mails automatically.
BTW: Was the outage the planned one, to prepare the double-speed clients?
It's a Xeon running a server version of Windows, so perhaps it's a corporate computer sitting behind a corporate firewall.
My guess is that a change in the firewall prevents WUs from downloading the files they need, but doesn't block WUs from being requested. Thus, the computer gets a WU, tries to download the files for it, gives up and requests another WU, and so on until it hits the quota limit.
Looking at the work assignments, it actually looks like this host has some kind of connection problem: it never successfully gets back the reply to its RPC from the server, so in other words it only generates "ghost" tasks. This shows up as 1-minute backoffs initially, which then expand up to a maximum of 4 hours, but the client also reverts to 1 minute after every 10 RPCs...
Now, if this host is running v4.26, the client doesn't send any info about which tasks it already has on the computer, meaning that instead of being re-sent the same tasks as under v4.45, it gets assigned new tasks every time.
As for the daily quota, it only decreases when a task is reported as an error or when a task isn't returned by the deadline. From the look of things, the tasks started to time out and decreased his quota on 21-22 March, indicating the computer went "bad" on 07 March...
So, the quota system works, but there is a delay before it kicks in.
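The size of that delay can be worked out with a little date arithmetic. This is only a sketch: it assumes the 14-day wait mentioned elsewhere in this thread is the result deadline, and the year is an arbitrary placeholder that does not affect the answer.

```python
from datetime import date, timedelta

# Assumption: the 14-day wait mentioned in this thread is the result deadline.
DEADLINE = timedelta(days=14)

# The year is an arbitrary placeholder; only the day and month matter here.
went_bad = date(2006, 3, 7)

# A task issued the day the host went bad cannot time out, and so cannot
# lower the daily quota, until a full deadline period has passed.
first_timeout = went_bad + DEADLINE
print(first_timeout.strftime("%d %B"))  # 21 March
```

So a host that silently stops returning work on 07 March can keep collecting its full daily quota for two weeks before the first missed deadline starts pushing the quota down.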
Also, Einstein@Home has enabled re-issuing of "lost" tasks, meaning only pre-v4.45 clients will continue to grab the full quota, while v4.45 and later will just be re-issued the same small group of tasks again and again...
Einstein@Home can choose to set v4.45 as the minimum allowed BOINC client...
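The difference between the two client generations can be sketched roughly as follows. This is an illustration of the idea only, not the actual BOINC scheduler code; the function name, parameters, and task names are hypothetical.

```python
from typing import Optional


def tasks_to_send(server_in_progress: list[str],
                  client_reported: Optional[list[str]],
                  new_tasks: list[str],
                  wanted: int) -> list[str]:
    """Pick the tasks to send for one scheduler request.

    server_in_progress: tasks the server believes the host is holding
    client_reported:    tasks the client says it holds, or None for an
                        old client that sends no such list
    """
    if client_reported is not None:
        # Newer client: anything the server has on record that the client
        # does not report is a "ghost" and gets re-sent before new work.
        lost = sorted(set(server_in_progress) - set(client_reported))
        if lost:
            return lost[:wanted]
    # Old client (or nothing lost): the server can only hand out new tasks.
    return new_tasks[:wanted]


on_record = ["wu_1", "wu_2"]     # ghosts: issued, but never arrived
fresh = ["wu_3", "wu_4"]

print(tasks_to_send(on_record, None, fresh, 2))  # ['wu_3', 'wu_4']
print(tasks_to_send(on_record, [], fresh, 2))    # ['wu_1', 'wu_2']
```

With the old client the pile of ghosts keeps growing by a full quota each day; with the newer one the same small group of tasks just goes round again until it finally arrives.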
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
Ghosts or something... I now see several hundred results issued to this host on Mar 28... so I guess I don't understand why the quota parameter isn't preventing this. And two other hosts will have to wait 14 days before the results are re-issued to someone else and before credit is granted.
I also got alarmed on seeing that this morning, but then came to doubt the time stamps. It appears to me the total is about the same as before the outage, when only two/day were time-stamped for this host in the last couple of days. Now nearly everything shown bears the same time-stamp.
Seems likely this is an artifact of the late unpleasantness with the project servers, rather than a new outbreak of trouble in handling this particular host?
By the way, my thanks to the posters who have added insight on this thread.
Looks like the user has finally upgraded his client, and he's now being re-assigned many of these "ghost" tasks. If you look a little closer, you'll see he's got a new date on his work, while the other users were assigned this work on 21.03 or thereabouts.
Now, even if he successfully gets all of this work, I'm not sure he'll manage to return everything before the deadline...