August Outage Aftermath

Stan Pope

Joined: 22 Dec 05

Posts: 80

Credit: 426811575

RAC: 0

RE: Some versions of BOINC

26 Aug 2009 6:53:00 UTC

Message 94590 in response to message 94589

Quote:

Some versions of BOINC won't contact a project for a week or two after a long outage, unless a manual "update" is done. That could explain why many crunchers seem to have stopped crunching.

Interesting. Thank you for that background.

My "end of UTC day" total "claimed CS but pending" has dropped gradually from 21,304 CS to 13,005 in the 7 days since E@h recovery. Still well above the usual range here, but moving in the right direction. Need credit for about 4000 of those pending CS to get down to the peak for the prior month and another 3000 to get down to the average pending CS for the prior month.

Insofar as I can tell, none of the pending WUs are waiting because of a mismatch between two reported results. Each is just waiting for another completion to be reported!

It will be interesting to see how long it takes.
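As a rough back-of-the-envelope estimate, here is a minimal sketch that assumes the backlog keeps shrinking at the same average daily rate seen over the first 7 days. That linear assumption is mine and real validation traffic need not follow it; the function name is purely illustrative.

```python
def days_to_clear(start_cs, now_cs, days_elapsed, remaining_cs):
    """Estimate days needed to clear remaining_cs at the average rate so far.

    Assumes a constant (linear) daily drop, which is only an approximation.
    """
    daily_rate = (start_cs - now_cs) / days_elapsed  # pending CS cleared per day
    return remaining_cs / daily_rate


# Figures from the post: 21,304 CS pending at recovery, 13,005 CS seven days later.
rate = (21304 - 13005) / 7
days_to_peak = days_to_clear(21304, 13005, 7, 4000)     # down to prior month's peak
days_to_average = days_to_clear(21304, 13005, 7, 7000)  # down to prior month's average

print(f"average drop: {rate:.0f} CS/day")
print(f"to prior-month peak: {days_to_peak:.1f} days")
print(f"to prior-month average: {days_to_average:.1f} days")
```

At that pace the backlog would reach the prior month's peak in a bit over three days and the prior month's average in about six, but only if the other hosts keep reporting at the same rate.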

Stan

Bill592

Joined: 25 Feb 05

Posts: 786

Credit: 70825065

RAC: 0

RE: It will be

28 Aug 2009 10:33:59 UTC

Message 94591 in response to message 94590

Quote:

It will be interesting to see how long it takes.

Howdy Stan,

Veering off topic for a second. I happened to see this from the link on your profile.

2004 YD5 passed 0.00023 AU from the Earth on Dec. 19.86 UT.

That is very interesting! How close is that compared to the distance to the Moon?

Best Regards,

Bill

Gundolf Jahn

Joined: 1 Mar 05

Posts: 1079

Credit: 341280

RAC: 0

RE: 2004 YD5 passed 0.00023

28 Aug 2009 12:03:35 UTC

Message 94592 in response to message 94591

Quote:

2004 YD5 passed 0.00023 AU from the Earth on Dec. 19.86 UT.

That is very interesting! How close is that compared to the distance to the Moon?

[pre]1 AU = 149597870.691 km
* 0.00023 = 34407.510 km
distance earth - moon = 384400 km[/pre]
So, it's about a tenth
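
The same arithmetic as a small script, in case anyone wants to try other close approaches. The constants are the ones quoted above (the Earth-Moon figure is a mean distance); the function and variable names are just illustrative.

```python
AU_KM = 149597870.691      # kilometres per astronomical unit (value quoted above)
EARTH_MOON_KM = 384400.0   # mean Earth-Moon distance in km (value quoted above)


def au_to_km(distance_au):
    """Convert a distance in astronomical units to kilometres."""
    return distance_au * AU_KM


miss_km = au_to_km(0.00023)           # closest approach of 2004 YD5
fraction = miss_km / EARTH_MOON_KM    # as a fraction of the lunar distance

print(f"{miss_km:.3f} km, {fraction:.3f} of the lunar distance")
```

The ratio comes out just under 0.09, which is where "about a tenth" comes from.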

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117629469560

RAC: 35242062

RE: RE: Some versions of

28 Aug 2009 22:12:54 UTC

Message 94593 in response to message 94590

Quote:

Quote:
Some versions of BOINC won't contact a project for a week or two after a long outage, unless a manual "update" is done. That could explain why many crunchers seem to have stopped crunching.

Interesting. Thank you for that background.

Hi Stan,

Very nice indeed to see you actively here at E@H!!

That problem - a 1 week backoff - existed in BOINC several years ago and, if I remember correctly, was reverted to a maximum of one day once the implications were realised. Hopefully there aren't too many machines left on such old versions.

Quote:

My "end of UTC day" total "claimed CS but pending" has dropped gradually from 21,304 CS to 13,005 in the 7 days since E@h recovery. Still well above the usual range here, but moving in the right direction. Need credit for about 4000 of those pending CS to get down to the peak for the prior month and another 3000 to get down to the average pending CS for the prior month.

There would be a number of factors leading to your higher than normal pendings:-

* The deadline for all tasks in flight at the time of the outage was extended by a week. The effect of that on pendings should be pretty obvious :-).
* There is always a certain background 'rate of attrition' for hosts. An extended outage would tend to 'help the decision' for someone who was becoming bored and just thinking about doing something different. Under normal circumstances people usually set NNT when they intend to retire a host. Bit hard to do in an extended outage.
* Those hosts still on BOINC 5.10.21 to 5.10.45 will have created 'validate errors' for tasks from cache completed (and attempted to upload) during the outage. I believe there would be many more of these than hosts with the 7 day backoff problem. These hosts would have given a spike in the pendings rather than the expected flood of completions once the project was back up.
* There would have been hosts switched off during the outage that took a while to get started again. I contributed to that. I took advantage of the extra deadline to do a major fleet reorganisation so that 200+ hosts of mine have become about 30 or so. All the machines that are to be retired have only just now been restarted to return the last of their work. I'll write a separate story about that :-).

Quote:

Insofar as I can tell, none of the pending WUs are waiting because of a mismatch between two reported results. Each is just waiting for another completion to be reported!

It will be interesting to see how long it takes.

Exactly so! (on both points) :-).

Cheers,
Gary.

Bill592

Joined: 25 Feb 05

Posts: 786

Credit: 70825065

RAC: 0

RE: So, it's about a

30 Aug 2009 12:27:33 UTC

Message 94594 in response to message 94592

Quote:

So, it's about a tenth

Regards,
Gundolf

Thanks, Gundolf,

That is pretty close!

I don't know how big that asteroid was, but it would have created a nice-sized crater somewhere if it had hit us.

Unless it exploded like Tunguska.

Regards,

Bill

Stan Pope

Joined: 22 Dec 05

Posts: 80

Credit: 426811575

RAC: 0

RE: RE: It will be

30 Aug 2009 15:24:00 UTC

Message 94595 in response to message 94591

Quote:

Quote:

It will be interesting to see how long it takes.

Howdy Stan,

Veering off topic for a second. I happened to see this from the link on your profile.

2004 YD5 passed 0.00023 AU from the Earth on Dec. 19.86 UT.

That is very interesting! How close is that compared to the distance to the Moon?

Best Regards,

Bill

Too darned close! Closer than our geostationary "birds!"
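
A quick sanity check on the geostationary comparison, assuming the quoted 0.00023 AU is measured from Earth's center. The AU value is the one used earlier in the thread; the two geostationary figures (about 35,786 km altitude, about 42,164 km orbital radius) are standard round numbers added here, not something from the original posts.

```python
AU_KM = 149597870.691       # kilometres per astronomical unit
GEO_ALTITUDE_KM = 35786.0   # approximate geostationary altitude above the equator
GEO_RADIUS_KM = 42164.0     # approximate geostationary orbit radius from Earth's center

miss_km = 0.00023 * AU_KM   # closest approach of 2004 YD5 in km

# Compare against both conventions, since the post does not say which is meant.
closer_than_altitude = miss_km < GEO_ALTITUDE_KM
closer_than_radius = miss_km < GEO_RADIUS_KM

print(f"miss distance: {miss_km:.0f} km")
print(f"inside geostationary altitude: {closer_than_altitude}")
print(f"inside geostationary radius:   {closer_than_radius}")
```

Either way the comparison holds: the pass was inside the geostationary belt.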

But at its size, about 5-10 meters, it would have burnt up "in the air" with little ground effect (or so the optimists tell us).

The sad part is that we don't see about half the near-earth objects until they are going away ... much too late to yell "Duck!"

Hopefully, we will soon have enough of the Tunguska-sized rocks cataloged to reduce the odds of an asteroid strike "wiping us all out" and make the next ice age our primary worry ... speaking of which ... isn't one about due??? :(

Stan

Stan Pope

Joined: 22 Dec 05

Posts: 80

Credit: 426811575

RAC: 0

RE: RE: My "end of UTC

30 Aug 2009 16:14:22 UTC

Message 94596 in response to message 94593

Quote:

Quote:
My "end of UTC day" total "claimed CS but pending" has dropped gradually from 21,304 CS to 13,005 in the 7 days since E@h recovery. Still well above the usual range here, but moving in the right direction. Need credit for about 4000 of those pending CS to get down to the peak for the prior month and another 3000 to get down to the average pending CS for the prior month.

There would be a number of factors leading to your higher than normal pendings

Quote:
Insofar as I can tell, none of the pending WUs are waiting because of a mismatch between two reported results. Each is just waiting for another completion to be reported!

It will be interesting to see how long it takes.

Exactly so! (on both points) :-).

Thank you for all that info.

After 5 more days, the backlog has reduced to the prior month's peak! :)

I poked about my pending list to try to characterize the causes:

* A small number were "Aborted by user" and have been sent to another member.
* Enough to account for the "excess over history" were "No reply" and have been sent to another member to crunch.
* Rather more than I'd like are nearing expiration and will become "No reply" in the next couple of days.

This suggests that host attrition is playing a dominant role. I understand that my observations may not be representative because the sample is biased ... we tend to get paired up with the same member for multiple WUs.

(Speaking of "host attrition", one of my Q6600's and my router got nailed by a "blip" in its UPS Friday afternoon and may contribute to the NoReply list for some other folks unless I can figure out what broke. The HD is okay ... I moved it temporarily to secure the "latest and greatest" personal files. The front panel lights that are normally on when powered down are on, but no response to "power on" button ... not even a "beep". I think this one will be a job for a "real tech".

The router just had to be completely reprogrammed ... all my settings disappeared and my LAN was sitting "naked and exposed" for all the InternetNoGoodNiks to beat on my computers' Firewalls.)

Stan

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2957929622

RAC: 711904

SETI ran out of data to split

31 Aug 2009 10:28:20 UTC

Message 94597

SETI ran out of data to split for multibeam work about four hours ago, and will probably remain dry for at least six hours until the start of Berkeley's week. That usually puts a much bigger strain on Einstein as SETI users' backup projects start downloading work: I think that both of the recent Einstein outages have coincided with SETI problems.

I've just found the Einstein website running very slow - I hope that's not the first indication of a third set of problems. Anything the admins can do (since for a change this is happening during working hours) to throttle back the demand before it tips us over the edge?

paul milton

Joined: 16 Sep 05

Posts: 329

Credit: 35825044

RAC: 0

RE: SETI ran out of data to

31 Aug 2009 14:48:23 UTC

Message 94598 in response to message 94597

Quote:

SETI ran out of data to split for multibeam work about four hours ago, and will probably remain dry for at least six hours until the start of Berkeley's week. That usually puts a much bigger strain on Einstein as SETI users' backup projects start downloading work: I think that both of the recent Einstein outages have coincided with SETI problems.

I've just found the Einstein website running very slow - I hope that's not the first indication of a third set of problems. Anything the admins can do (since for a change this is happening during working hours) to throttle back the demand before it tips us over the edge?

i got "connection time out" for about an hour around 7am est

seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

Stan Pope

Joined: 22 Dec 05

Posts: 80

Credit: 426811575

RAC: 0

RE: i got "connection time

31 Aug 2009 19:51:25 UTC

Message 94599 in response to message 94598

Quote:

i got "connection time out" for about an hour around 7am est

Me, too, Paul. From 1035Z to 1220Z (6:35-8:20 am EDT), I logged intermittent host comm failures.
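
For anyone matching their own logs against these times, here is a minimal sketch of the Zulu-to-local conversion. It assumes US Eastern daylight time (UTC-4), which is what applies in late August; the fixed offset and the helper name are my own choices for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern daylight time, a fixed UTC-4 offset in late August.
EDT = timezone(timedelta(hours=-4), "EDT")


def zulu_to_edt(hhmm, year=2009, month=8, day=31):
    """Convert a Zulu (UTC) time given as 'HHMM' to Eastern daylight time."""
    utc = datetime(year, month, day, int(hhmm[:2]), int(hhmm[2:]), tzinfo=timezone.utc)
    return utc.astimezone(EDT)


for z in ("1035", "1220"):
    print(z + "Z ->", zulu_to_edt(z).strftime("%H:%M %Z"))
```

With these inputs the window works out to 06:35 through 08:20 local daylight time, which brackets the "around 7am" timeouts reported above.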

Richard, good catch on the SETI connection!

Stan
