Some versions of BOINC won't contact a project for a week or two after a long outage, unless a manual "update" is done. That could explain why many crunchers seem to have stopped crunching.
RE: Some versions of BOINC
Interesting. Thank you for that background.
My "end of UTC day" total "claimed CS but pending" has dropped gradually from 21,304 CS to 13,005 in the 7 days since E@h recovery. Still well above the usual range here, but moving in the right direction. Need credit for about 4000 of those pending CS to get down to the peak for the prior month and another 3000 to get down to the average pending CS for the prior month.
In so far as I can tell, none of the pending WU's are waiting because of a mismatch between two reported results. Each is just waiting for another completion to be reported!
It will be interesting to see how long it takes.
Stan
RE: It will be
Howdy Stan,
Veering off topic for a second. I happened to see this from the link on your profile.
2004 YD5 passed 0.00023 AU from the Earth on Dec. 19.86 UT.
That is very interesting! How close is that compared to the distance to the Moon?
Best Regards,
Bill
RE: 2004 YD5 passed 0.00023
[pre]1 AU = 149597870.691 km
* 0.00023 = 34407.510 km
distance earth - moon = 384400 km[/pre]
So, it's about a tenth of the lunar distance.
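Gundolf's arithmetic checks out; as a quick sketch, here is the same calculation in Python, with the constants taken from the post above:

```python
# Values as quoted in the [pre] block above.
AU_KM = 149_597_870.691      # kilometres per astronomical unit
MOON_KM = 384_400.0          # mean Earth-Moon distance in kilometres

miss_km = 0.00023 * AU_KM    # 2004 YD5's closest approach
ratio = miss_km / MOON_KM

print(f"{miss_km:.1f} km, {ratio:.2f} of the lunar distance")
# → 34407.5 km, 0.09 of the lunar distance
```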
Regards,
Gundolf
Computers aren't everything in life. (Just a little joke)
RE: RE: Some versions of
Hi Stan,
Very nice indeed to see you actively here at E@H!!
That problem - 1 week backoff - existed in BOINC several years ago and was reverted to one day max once the implications were realised if I remember correctly. Hopefully there aren't too many machines left on such old versions.
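As an aside, the kind of capped, randomised backoff Gary describes can be sketched in a few lines. The constants and function name here are illustrative only, not BOINC's actual source:

```python
import random

def next_backoff(n_failures, base=60.0, cap=86400.0):
    """Return the next retry delay in seconds after n_failures
    consecutive scheduler contact failures: exponential growth,
    capped at one day (the post-fix maximum mentioned above)."""
    delay = min(base * 2 ** n_failures, cap)
    # Randomise within [delay/2, delay] so a fleet of hosts does not
    # reconnect in lockstep the moment an outage ends.
    return random.uniform(delay / 2, delay)

# Even a host that has failed many times waits at most a day:
print(next_backoff(20) <= 86400.0)  # → True
```

With a one-week cap instead, a single long outage could silence a host for days after recovery, which is exactly the behaviour that was reverted.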
There would be a number of factors leading to your higher than normal pendings:-
* The deadline for all tasks in flight at the time of the outage was extended by a week. The effect of that on pendings should be pretty obvious :-).
* There is always a certain background 'rate of attrition' for hosts. An extended outage would tend to 'help the decision' for someone who was becoming bored and just thinking about doing something different. Under normal circumstances people usually set NNT when they intend to retire a host. Bit hard to do in an extended outage.
* Those hosts still on BOINC 5.10.21 to 5.10.45 will have created 'validate errors' for tasks from cache completed (and attempted to upload) during the outage. I believe there would be many more of these than hosts with the 7 day backoff problem. These hosts would have given a spike in the pendings rather than the expected flood of completions once the project was back up.
* There would have been hosts switched off during the outage that took a while to get started again. I contributed to that. I took advantage of the extra deadline to do a major fleet reorganisation so that 200+ hosts of mine have become about 30 or so. All the machines that are to be retired have only just now been restarted to return the last of their work. I'll write a separate story about that :-).
Exactly so! (on both points) :-).
Cheers,
Gary.
RE: So, it's about a
Thanks, Gundolf,
That is pretty close !
I don't know how big that asteroid was, but it would have created a nice-sized crater somewhere if it hit us.
Unless it exploded like Tunguska.
Regards,
Bill
RE: RE: It will be
Too darned close! Closer than our geostationary "birds"!
But at its size, about 5-10 meters, it would have burnt "in the air" with little ground effects (or so the optimists tell us.)
The sad part is that we don't see about half the near-earth objects until they are going away ... much too late to yell "Duck!"
Hopefully, we will soon have enough of the Tunguska-sized rocks cataloged so as to reduce the odds of an asteroid strike "wiping us all out" and make the next ice age our primary worry ... speaking of which ... isn't one about due??? :(
Stan
RE: RE: My "end of UTC
Thank you for all that info.
After 5 more days, the backlog has come down to the prior month's peak! :)
I poked about my pending list to try to characterize the causes:
* a small number were "Aborted by user" and have been sent to another member.
* enough to account for the "excess over history" were "No reply" and have been sent to another member to crunch.
* rather more than I'd like are nearing expiration and will become "No reply" in the next couple of days.
This suggests that host attrition is playing a dominant role. I understand that my observations may not be representative because the sample is biased ... we tend to get paired up with the same member for multiple WU's.
(Speaking of "host attrition", one of my Q6600's and my router got nailed by a "blip" in its UPS Friday afternoon and may contribute to the NoReply list for some other folks unless I can figure out what broke. The HD is okay ... I moved it temporarily to secure the "latest and greatest" personal files. The front panel lights that are normally on when powered down are on, but no response to "power on" button ... not even a "beep". I think this one will be a job for a "real tech".
The router just had to be completely reprogrammed ... all my settings disappeared and my LAN was sitting "naked and exposed" for all the InternetNoGoodNiks to beat on my computers' Firewalls.)
Stan
SETI ran out of data to split
SETI ran out of data to split for multibeam work about four hours ago, and will probably remain dry for at least six hours until the start of Berkeley's week. That usually puts a much bigger strain on Einstein as SETI users' backup projects start downloading work: I think that both of the recent Einstein outages have coincided with SETI problems.
I've just found the Einstein website running very slow - I hope that's not the first indication of a third set of problems. Anything the admins can do (since for a change this is happening during working hours) to throttle back the demand before it tips us over the edge?
RE: SETI ran out of data to
I got "connection time out" for about an hour around 7 am EST.
seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.
RE: i got "connection time
Me, too, Paul. From 1035Z to 1220Z (6:35-8:20 am EST), I logged intermittent host comm failures.
Richard, good catch on the SETI connection!
Stan