August Outage Aftermath

Stan Pope
Joined: 22 Dec 05
Posts: 80
Credit: 426811575
RAC: 0

RE: Some versions of BOINC

Message 94590 in response to message 94589

Quote:
Some versions of BOINC won't contact a project for a week or two after a long outage, unless a manual "update" is done. That could explain why many crunchers seem to have stopped crunching.


Interesting. Thank you for that background.

My "end of UTC day" total "claimed CS but pending" has dropped gradually from 21,304 CS to 13,005 in the 7 days since E@h recovery. Still well above the usual range here, but moving in the right direction. Need credit for about 4000 of those pending CS to get down to the peak for the prior month and another 3000 to get down to the average pending CS for the prior month.

In so far as I can tell, none of the pending WU's are waiting because of a mismatch between two reported results. Each is just waiting for another completion to be reported!

It will be interesting to see how long it takes.
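For a rough feel of "how long", here is a minimal sketch that extrapolates from the figures quoted above. It assumes the pending total keeps falling at the same average daily rate as over the first 7 days, which is only a guess; the variable names are illustrative and not taken from any BOINC tool.

```python
# Figures quoted in the post above (credits, "CS").
start_pending = 21_304      # pending total at recovery
current_pending = 13_005    # pending total 7 days later
days_elapsed = 7
to_prior_peak = 4_000       # further drop needed to reach last month's peak
to_prior_average = 3_000    # additional drop needed to reach last month's average

# ASSUMPTION: the drain continues linearly at the observed average rate.
daily_drop = (start_pending - current_pending) / days_elapsed
days_to_peak = to_prior_peak / daily_drop
days_to_average = (to_prior_peak + to_prior_average) / daily_drop

print(f"average drop: {daily_drop:.0f} CS/day")
print(f"days to prior peak: {days_to_peak:.1f}")
print(f"days to prior average: {days_to_average:.1f}")
```

A linear projection is probably optimistic, since the tasks that remain pending longest tend to be the ones waiting on slow or absent hosts.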

Stan

Bill592
Joined: 25 Feb 05
Posts: 786
Credit: 70825065
RAC: 0

RE: It will be

Message 94591 in response to message 94590

Quote:

It will be interesting to see how long it takes.

Howdy Stan,

Veering off topic for a second. I happened to see this from the link on your profile.

2004 YD5 passed 0.00023 AU from the Earth on Dec. 19.86 UT.

That is very interesting ! How close is that compared to the distance
to the Moon ?

Best Regards,

Bill

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: 2004 YD5 passed 0.00023

Message 94592 in response to message 94591

Quote:

2004 YD5 passed 0.00023 AU from the Earth on Dec. 19.86 UT.

That is very interesting ! How close is that compared to the distance to the Moon ?

[pre]1 AU = 149597870.691 km
* 0.00023 = 34407.510 km
distance earth - moon = 384400 km[/pre]
So, it's about a tenth of the distance to the Moon.
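The same arithmetic as a short sketch, using the AU and Earth-Moon values quoted above (the variable names are just for illustration):

```python
AU_KM = 149_597_870.691   # 1 AU in km, as quoted above
MOON_KM = 384_400         # Earth-Moon distance in km, as quoted above

miss_au = 0.00023         # closest approach of 2004 YD5 in AU
miss_km = miss_au * AU_KM
fraction_of_moon = miss_km / MOON_KM

print(f"{miss_km:.1f} km, i.e. {fraction_of_moon:.1%} of the lunar distance")
```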

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117629469560
RAC: 35242062

RE: RE: Some versions of

Message 94593 in response to message 94590

Quote:
Quote:
Some versions of BOINC won't contact a project for a week or two after a long outage, unless a manual "update" is done. That could explain why many crunchers seem to have stopped crunching.

Interesting. Thank you for that background.

Hi Stan,

Very nice indeed to see you actively here at E@H!!

That problem - 1 week backoff - existed in BOINC several years ago and, if I remember correctly, was reverted to a one-day maximum once the implications were realised. Hopefully there aren't too many machines left on such old versions.

Quote:
My "end of UTC day" total "claimed CS but pending" has dropped gradually from 21,304 CS to 13,005 in the 7 days since E@h recovery. Still well above the usual range here, but moving in the right direction. Need credit for about 4000 of those pending CS to get down to the peak for the prior month and another 3000 to get down to the average pending CS for the prior month.


There would be a number of factors leading to your higher than normal pendings:-

  * The deadline for all tasks in flight at the time of the outage was extended by a week. The effect of that on pendings should be pretty obvious :-).
  * There is always a certain background 'rate of attrition' for hosts. An extended outage would tend to 'help the decision' for someone who was becoming bored and just thinking about doing something different. Under normal circumstances people usually set NNT when they intend to retire a host. Bit hard to do in an extended outage.
  * Those hosts still on BOINC 5.10.21 to 5.10.45 will have created 'validate errors' for tasks from cache completed (and attempted to upload) during the outage. I believe there would be many more of these than hosts with the 7 day backoff problem. These hosts would have given a spike in the pendings rather than the expected flood of completions once the project was back up.
  * There would have been hosts switched off during the outage that took a while to get started again. I contributed to that. I took advantage of the extra deadline to do a major fleet reorganisation so that 200+ hosts of mine have become about 30 or so. All the machines that are to be retired have only just now been restarted to return the last of their work. I'll write a separate story about that :-).

Quote:

In so far as I can tell, none of the pending WU's are waiting because of a mismatch between two reported results. Each is just waiting for another completion to be reported!

It will be interesting to see how long it takes.


Exactly so! (on both points) :-).

Cheers,
Gary.

Bill592
Joined: 25 Feb 05
Posts: 786
Credit: 70825065
RAC: 0

RE: So, it's about a

Message 94594 in response to message 94592

Quote:


So, it's about a tenth

Regards,
Gundolf

Thanks, Gundolf,

That is pretty close !

I don't know how big that asteroid was, but it would have created a nice-sized crater somewhere if it had hit us.

Unless it exploded like Tunguska.

Regards,

Bill

Stan Pope
Joined: 22 Dec 05
Posts: 80
Credit: 426811575
RAC: 0

RE: RE: It will be

Message 94595 in response to message 94591

Quote:
Quote:

It will be interesting to see how long it takes.

Howdy Stan,

Veering off topic for a second. I happened to see this from the link on your profile.

2004 YD5 passed 0.00023 AU from the Earth on Dec. 19.86 UT.

That is very interesting ! How close is that compared to the distance
to the Moon ?

Best Regards,

Bill


Too darned close! Closer than our geostationary "birds!"

But at its size, about 5-10 meters, it would have burnt "in the air" with little ground effects (or so the optimists tell us.)

The sad part is that we don't see about half the near-earth objects until they are going away ... much too late to yell "Duck!"

Hopefully, we will soon have enough of the Tunguska-sized rocks cataloged so as to reduce the odds of asteroid strike "wiping us all out" and make the next ice age our primary worry ... speaking of which ... isn't one about due??? :(

Stan

Stan Pope
Joined: 22 Dec 05
Posts: 80
Credit: 426811575
RAC: 0

RE: RE: My "end of UTC

Message 94596 in response to message 94593

Quote:

Quote:
My "end of UTC day" total "claimed CS but pending" has dropped gradually from 21,304 CS to 13,005 in the 7 days since E@h recovery. Still well above the usual range here, but moving in the right direction. Need credit for about 4000 of those pending CS to get down to the peak for the prior month and another 3000 to get down to the average pending CS for the prior month.

There would be a number of factors leading to your higher than normal pendings

Quote:

In so far as I can tell, none of the pending WU's are waiting because of a mismatch between two reported results. Each is just waiting for another completion to be reported!

It will be interesting to see how long it takes.


Exactly so! (on both points) :-).


Thank you for all that info.

After 5 more days, the backlog has reduced to the prior month's peak! :)

I poked about my pending list to try to characterize the causes:

  * A small number were "Aborted by user" and have been sent to another member.
  * Enough to account for the "excess over history" were "No reply" and have been sent to another member to crunch.
  * Rather more than I'd like are nearing expiration and will become "No reply" in the next couple of days.

This suggests that host attrition is playing a dominant role. I understand that my observations may not be representative because the sample is biased ... we tend to get paired up with the same member for multiple WU's.

(Speaking of "host attrition", one of my Q6600's and my router got nailed by a "blip" in its UPS Friday afternoon and may contribute to the NoReply list for some other folks unless I can figure out what broke. The HD is okay ... I moved it temporarily to secure the "latest and greatest" personal files. The front panel lights that are normally on when powered down are on, but no response to "power on" button ... not even a "beep". I think this one will be a job for a "real tech".

The router just had to be completely reprogrammed ... all my settings disappeared and my LAN was sitting "naked and exposed" for all the InternetNoGoodNiks to beat on my computers' Firewalls.)

Stan

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2957929622
RAC: 711904

SETI ran out of data to split

SETI ran out of data to split for multibeam work about four hours ago, and will probably remain dry for at least six hours until the start of Berkeley's week. That usually puts a much bigger strain on Einstein as SETI users' backup projects start downloading work: I think that both of the recent Einstein outages have coincided with SETI problems.

I've just found the Einstein website running very slow - I hope that's not the first indication of a third set of problems. Anything the admins can do (since for a change this is happening during working hours) to throttle back the demand before it tips us over the edge?

paul milton
Joined: 16 Sep 05
Posts: 329
Credit: 35825044
RAC: 0

RE: SETI ran out of data to

Message 94598 in response to message 94597

Quote:

SETI ran out of data to split for multibeam work about four hours ago, and will probably remain dry for at least six hours until the start of Berkeley's week. That usually puts a much bigger strain on Einstein as SETI users' backup projects start downloading work: I think that both of the recent Einstein outages have coincided with SETI problems.

I've just found the Einstein website running very slow - I hope that's not the first indication of a third set of problems. Anything the admins can do (since for a change this is happening during working hours) to throttle back the demand before it tips us over the edge?

i got "connection time out" for about an hour around 7am est

seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

Stan Pope
Joined: 22 Dec 05
Posts: 80
Credit: 426811575
RAC: 0

RE: i got "connection time

Message 94599 in response to message 94598

Quote:
i got "connection time out" for about an hour around 7am est


Me, too, Paul. From 1035Z to 1220Z (6:35-8:20 am est), I logged intermittent host comm failures.

Richard, good catch on the SETI connection!

Stan
