What's up with host 93643?

Nothing But Idl...

Joined: 24 Aug 05

Posts: 158

Credit: 289204

RAC: 0

27 Mar 2006 16:57:29 UTC

Topic 190972

Any ideas on this situation?...
I've been getting assigned a lot of WUs (issue #4) because host 93643 isn't returning any. When I look at this host there are 928 results to his credit dating back to 27 Feb and only 1 has been returned and received any credit. WUs continue to be issued but none get returned as far as I can tell. It doesn't really matter to me since I'm getting credit and also helping 2 other hosts to finally get their credit. Still, why keep issuing WUs to someone who isn't going to return them? Don't want to interfere with someone else's business, just trying to understand what gives?

Honza

Joined: 10 Nov 04

Posts: 136

Credit: 3332354

RAC: 0

What's up with host 93643?

27 Mar 2006 17:50:13 UTC

Message 26714

Hard to say what preferences apply to this host (WU cache size).
Anyway, its daily quota is now down to 1 WU/CPU, and this should stop it draining WUs.

Well, it was running the outdated BOINC 4.26 back in February, when it completed the only result that was actually sent back to the server. This may be the issue...

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7376661687

RAC: 2135039

RE: Any ideas on this

27 Mar 2006 18:01:16 UTC

Message 26715

Quote:

Any ideas on this situation?...
I've been getting assigned a lot of WUs (issue #4) because host 93643 isn't returning any. When I look at this host there are 928 results to his credit dating back to 27 Feb and only 1 has been returned and received any credit. WUs continue to be issued but none get returned as far as I can tell. It doesn't really matter to me since I'm getting credit and also helping 2 other hosts to finally get their credit. Still, why keep issuing WUs to someone who isn't going to return them? Don't want to interfere with someone else's business, just trying to understand what gives?

If you look at the computer status instead of just its results status, you'll see that it has done much work in the past, but the low RAC suggests it stopped contributing weeks ago, consistent with the one result returned in late February. As the Maximum daily WU quota per CPU currently shows at 1/day, it appears that the system is "on to it" in some sense, and issuing new work at a very low rate (it appears 2/day in the last few days, consistent with 1/day/CPU).

I'd presume the host is set to maintain a very big work queue, but I don't understand why its failure to return work did not bump down its daily quota much earlier. For example, it got 35 new results on March 22, I suppose as backfill for results in its queue that had expired.

So I'll add my wonderment to NBIT's--is this the way it is supposed to work?

If so, people might add another reason to avoid setting large queues, and to tidy up when reducing project priority. It would also help if the client bug which fails to consider resource share allocation when fetching work got fixed some day.

Personally, I wish folks who set very

Robert Everly

Joined: 18 Jan 05

Posts: 9

Credit: 10393199

RAC: 0

I've mentioned this on

28 Mar 2006 0:45:47 UTC

Message 26716

I've mentioned this on another project, but here's my $.02 worth.

I think all projects should have an "Outstanding work limit".

If you want to have a week's worth of work on your machine, fine. But prove to the project that you're going to return the work first.
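
To make the "Outstanding work limit" idea concrete, here is a minimal sketch. Nothing in it is an existing BOINC feature: the function names, the starting allowance of 2 and the cap of 50 are all made up for illustration. The only point is that the allowance grows as the host proves it returns work.

```python
# Hypothetical sketch of an "outstanding work limit".
# None of these names or numbers come from BOINC; they are assumptions.

def outstanding_limit(returned_ok: int, base: int = 2, cap: int = 50) -> int:
    """Allowed number of unreturned tasks for a host.

    Starts at `base` for a host with no track record and grows by one
    for every result the host has returned, up to `cap`.
    """
    return min(cap, base + returned_ok)


def may_send(outstanding: int, returned_ok: int) -> bool:
    """True if the host is still below its allowance and may get more work."""
    return outstanding < outstanding_limit(returned_ok)
```

Under these assumed numbers, a brand-new host could hold 2 tasks, and a host that had returned only one result could never build up a backlog of hundreds.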

David Hammer

Joined: 15 Oct 04

Posts: 360

Credit: 1672886

RAC: 0

For what it's worth I just

28 Mar 2006 1:05:19 UTC

Message 26717

For what it's worth, I just emailed the user telling them they have a misconfigured host. Hopefully they will fix the problem.

Wurgl (speak^Wc...

Joined: 11 Feb 05

Posts: 321

Credit: 140550008

RAC: 0

RE: For what it's worth I

28 Mar 2006 1:11:11 UTC

Message 26718 in response to message 26717

Quote:

For what it's worth, I just emailed the user telling them they have a misconfigured host. Hopefully they will fix the problem.

Thanks David!

I think the BOINC developers should consider some means to prevent hosts from running amok like this one did, either a 'max unfinished results' quota or some program which walks through the database and sends out mails automatically.
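
As a rough illustration of the "walk through the database" idea, the sketch below flags hosts that have been issued many results but returned almost none. The record layout, the function name and both thresholds are assumptions for the example, not anything from the BOINC server code. The figures for host 93643 (928 issued, 1 returned) are the ones quoted at the top of this thread; the other two hosts are invented.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    # Hypothetical per-host summary; a real server would pull this from its database.
    host_id: int
    issued: int    # results sent to the host
    returned: int  # results the host has reported back


def runaway_hosts(hosts, min_issued=100, max_return_ratio=0.05):
    """Return IDs of hosts with many issued results and a very low return ratio.

    min_issued must be positive, which also keeps the division safe.
    """
    flagged = []
    for h in hosts:
        if h.issued >= min_issued and h.returned / h.issued <= max_return_ratio:
            flagged.append(h.host_id)
    return flagged


sample = [
    HostStats(93643, 928, 1),  # numbers quoted in this thread
    HostStats(1, 500, 480),    # invented healthy host
    HostStats(2, 10, 0),       # invented new host, too little history to judge
]
flagged = runaway_hosts(sample)
```

The flagged list could then feed either an automatic mail or a hold on new work; which of the two is a policy choice the sketch does not make.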

BTW: Was the outage the planned one, to prepare for the double-speed clients?

Spare_Cycles

Joined: 18 Feb 06

Posts: 2

Credit: 20780

RAC: 0

It's a Xeon running a server

28 Mar 2006 1:15:10 UTC

Message 26719

It's a Xeon running a server version of Windows, so perhaps it's a corporate computer sitting behind a corporate firewall.

My guess is that a change in the firewall prevents the client from downloading the files the WUs need, but doesn't block it from requesting WUs. Thus, the computer gets a WU, tries to download the files for it, gives up and requests another WU, and so on until it hits the quota limit.

Ingleside

Joined: 23 Jan 05

Posts: 33

Credit: 82140582

RAC: 1949

Looking on the

28 Mar 2006 1:16:39 UTC

Message 26720

Looking at the work assignments, it actually looks like this host has some kind of connection problem: it never successfully gets back the server's reply to its RPC, so in other words it only generates "ghost" tasks. This will show as 1-minute backoffs initially, which then expand up to a maximum of 4 hours, but the client also reverts back to 1 minute after every 10 RPCs...
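
A small sketch of that backoff pattern, for anyone who wants to see the numbers. The 1-minute start, the 4-hour maximum and the reset after every 10 RPCs come from the description above; the doubling between attempts is only an assumption, since the real growth rule is not stated here.

```python
MIN_BACKOFF_MIN = 1        # backoff starts at 1 minute
MAX_BACKOFF_MIN = 4 * 60   # and grows to at most 4 hours
RESET_EVERY = 10           # client reverts to 1 minute after every 10 RPCs


def backoff_schedule(failed_rpcs: int) -> list:
    """Return the wait in minutes that follows each consecutive failed RPC."""
    delays = []
    delay = MIN_BACKOFF_MIN
    for attempt in range(1, failed_rpcs + 1):
        delays.append(delay)
        if attempt % RESET_EVERY == 0:
            delay = MIN_BACKOFF_MIN                  # periodic reset
        else:
            delay = min(delay * 2, MAX_BACKOFF_MIN)  # assumed doubling, capped
    return delays


schedule = backoff_schedule(12)
```

Because of the periodic reset, a host in this state never settles at the 4-hour maximum; it keeps coming back with bursts of closely spaced requests.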

Now, if this host is running v4.26, the client doesn't send any info about which tasks it already has on the computer, meaning that instead of being re-sent the same tasks as under v4.45, it will keep getting assigned new tasks.

As for the daily quota, this only decreases when a task is reported as an error or when a task is not returned by its deadline. From the look of things, the tasks started to time out and decrease his quota on 21-22 March, indicating the computer went "bad" on 7 March...
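
The date arithmetic behind that estimate can be sketched as follows. The 14-day deadline is an assumption taken from the 14-day wait mentioned elsewhere in this thread, and the "minus one per failed task, never below 1" rule in `apply_timeouts` is a simplification for illustration, not a statement of the scheduler's exact rule.

```python
from datetime import date, timedelta

DEADLINE_DAYS = 14  # assumed deadline, matching the 14-day wait mentioned in this thread


def first_quota_drop(first_lost_task_sent: date, deadline_days: int = DEADLINE_DAYS) -> date:
    """Earliest date on which a never-returned task can time out and cut the quota."""
    return first_lost_task_sent + timedelta(days=deadline_days)


def apply_timeouts(quota: int, timed_out: int, floor: int = 1) -> int:
    """Assumed rule: each timed-out or errored task lowers the quota by one,
    but never below `floor`."""
    return max(floor, quota - timed_out)


drop_date = first_quota_drop(date(2006, 3, 7))  # host assumed to go "bad" on 7 March
```

With those assumptions, tasks sent on 7 March cannot affect the quota before 21 March, which is the delay being described.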

So, the quota system works, but there is a delay before it kicks in.
Also, Einstein@home has enabled re-issuing of "lost" tasks, meaning only pre-v4.45 clients will continue to grab the full quota, while v4.45 and later will just be re-issued the same small group of tasks again and again...
Einstein@home can choose to set v4.45 as the minimum allowed BOINC client...
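
Here is a simplified sketch of the difference between the two client generations as described above. It is not the real scheduler code: the function name and arguments are hypothetical, and `client_reported=None` stands in for an old client that sends no list of the tasks it holds.

```python
def choose_tasks(server_in_progress, client_reported, fresh_pool, wanted):
    """Pick up to `wanted` tasks to send to a host.

    server_in_progress: tasks the server believes the host already holds
    client_reported:    tasks the client says it holds, or None if the
                        client is too old to report them
    fresh_pool:         unsent tasks available for new assignment
    """
    if client_reported is None:
        # Old client: the server cannot tell which tasks were lost,
        # so every request is filled with fresh tasks.
        return list(fresh_pool[:wanted])

    have = set(client_reported)
    lost = [t for t in server_in_progress if t not in have]  # "ghost" tasks
    resend = lost[:wanted]
    top_up = list(fresh_pool[: wanted - len(resend)])        # only if room is left
    return resend + top_up
```

In this model, a newer client stuck in the same failure loop keeps being offered the same lost tasks, so the pile of ghosts stops growing.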

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

Nothing But Idl...

Joined: 24 Aug 05

Posts: 158

Credit: 289204

RAC: 0

Ghosts or something... I now

28 Mar 2006 17:58:22 UTC

Message 26721

Ghosts or something... I now see several hundred results issued to this host on Mar 28... so I guess I don't understand why the quota parameter isn't preventing this. And two other hosts will have to wait 14 days before the results are re-issued to someone else and before credit is granted.

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7376661687

RAC: 2135039

RE: Ghosts or something...

28 Mar 2006 19:04:55 UTC

Message 26722 in response to message 26721

Quote:

Ghosts or something... I now see several hundred results issued to this host on Mar 28... so I guess I don't understand why the quota parameter isn't preventing this. And two other hosts will have to wait 14 days before the results are re-issued to someone else and before credit is granted.

I also got alarmed on seeing that this morning, but then came to doubt the time stamps. It appears to me the total is about the same as before the outage, when only two/day were time-stamped for this host in the last couple of days. Now nearly everything shown bears the same time-stamp.

Seems likely this is an artifact of the late unpleasantness with the project servers, rather than a new outbreak of trouble in handling this particular host?

By the way, my thanks to the posters who have added insight on this thread.

Ingleside

Joined: 23 Jan 05

Posts: 33

Credit: 82140582

RAC: 1949

RE: Ghosts or something...

28 Mar 2006 23:27:31 UTC

Message 26723 in response to message 26721

Quote:

Ghosts or something... I now see several hundred results issued to this host on Mar 28... so I guess I don't understand why the quota parameter isn't preventing this. And two other hosts will have to wait 14 days before the results are re-issued to someone else and before credit is granted.

Looks like the user has finally upgraded his client, and he is now being re-assigned many of these "ghost" tasks. If you look a little closer, you'll see he's got a new date on his work, while the other users were assigned this work on 21.03 or thereabouts.

Now, if he does successfully get all of this work, I'm not sure he'll manage to return everything before the deadline...

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
