Well, it's clearly winding down now. The work units are no longer being emitted, not to me anyway :
Quote:
Sending scheduler request: To fetch work. Requesting 4044 seconds of work, reporting 1 completed tasks
Scheduler request succeeded: got 0 new tasks
Message from server: Project is temporarily shut down for maintenance
and a couple of these :
Quote:
Scheduler request failed: HTTP internal server error
'cos the server's a rather busy bee at the moment, I expect, but it's OK on retry.
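For anyone scripting their own requests against a busy server, the "fails once, fine on retry" pattern above can be sketched as a retry loop with exponential backoff. This is only an illustration, not how the BOINC client itself schedules retries; `retry_with_backoff`, its parameters, and the `request` callable are all made-up names for the example.

```python
import random
import time


def retry_with_backoff(request, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call request() until it succeeds or the attempts run out.

    `request` is any zero-argument callable that raises on failure
    (hypothetical stand-in for a scheduler request).
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            # Double the wait each time, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The jitter is the part that matters for a server that is already struggling: without it, every client that failed at the same moment comes back at the same moment.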
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
We should all pitch in by FedExing regular donut and coffee shipments to them for this changeover weekend! :-)
Cheers, Mike.
Indeed!
You know how it is with computer stuff ... Murphy's Law usually strikes when you least expect it. So good luck with the upgrade!
CU
Bikeman
Best thing is that you often get two different faults at the same time and it takes extra brainstorming to recognize this special case.
Or is this just my very special personal luck? ;-)
cu,
Michael
You're giving me flashbacks of the days when I had to help troubleshoot the old transistorized computers that ran the Poseidon missile system.
Hmmm...
Well I saw the scheduler was down for most of the afternoon, and just now I looked and just about everything is not running.
Alinator
The 32-bit system the old server is running on is currently at its limits [Edit]Various components struggle to get enough memory[/Edit]. That is one reason to upgrade software and hardware [Edit]This will also include an update of the DB structure for the new server-side software[/Edit].
It currently looks like the server will be down for the whole weekend to perform these upgrades, and we'll hopefully start the new week and month with an all-new server and a new run.
BM
Several hours ago, it was red virtually right across the board.
More recently (last couple of hours) the board has been green with just the scheduler not running (red).
Just now everything is green but it still reports "project shut down for maintenance" when a work request is given.
It has been an "interesting" day in this part of the world (night where most of you come from). I arrived at my building around 11 hours ago to be greeted by stony silence and a coldness (middle of winter here) I haven't experienced in quite a while :-). Upon investigation, circuit breakers were all tripped and after getting a machine or two going, I found a common theme of the timestamp on client_state.xml being around 1:50AM.
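The timestamp check described above (looking at when client_state.xml was last written to work out when the power went) can be sketched in a few lines of Python. This is a minimal illustration only: `last_write_times` is a made-up helper name, and the paths you pass in depend entirely on where your BOINC data directories live.

```python
from datetime import datetime
from pathlib import Path


def last_write_times(paths):
    """Map each existing path to its last-modified time as a local datetime.

    Paths that do not exist are skipped rather than raising.
    """
    return {
        str(p): datetime.fromtimestamp(Path(p).stat().st_mtime)
        for p in paths
        if Path(p).exists()
    }
```

Calling it on the client_state.xml of each host (for example over mounted shares, which is an assumption about the setup) and finding the times clustered within a minute or two of each other points to a single event such as a power cut.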
Apparently there had been a short power outage, and the current spike when the power was restored tripped the circuit breakers. Getting back into production required physically disconnecting enough machines that the breakers could be reset without immediately tripping again. I have quite a number of servers that immediately try to restart when power is restored, but even machines that don't try to restart are apparently a problem, because supplying just the standby voltage and current still requires quite a large, short-duration spike of power to charge up the power supply.
So, it wasn't too hard to disconnect most stuff and get the breakers reset. Certain more recent servers fire up without keyboard and mouse, but some older ones aren't too happy, and basically all of the desktops and towers need the keyboard there or else you get a "Keyboard missing" followed by "Hit F1 to continue", the incongruity of which I always find highly amusing.
So, nothing else to do but bite the bullet and go around individually to each machine and plug in KVM and watch each one boot safely back into production. You start to question your sanity after doing this for 100+ machines, some of which aren't exactly in a convenient position to easily attach KVM or even get to the power button sometimes :-).
Everything is now basically back to normal with my farm so it would be nice if the scheduler would start cooperating shortly :-).
EDIT: In the time spent composing this message the scheduler has gone to condition red again.
"Cap'n she canna take much more o' this! She's gonna blow any minute now ...!!"
"Hold her together Scotty! We've just got to make it to Friday ...!!"
"Aye, Cap'n! I'm sure gonna miss this old lass when Starfleet Command give us our new ship ...!!"
Cheers,
Gary.
I just queued up a bunch (24 to be exact). Hopefully that'll get this machine through the outage.
Kathryn :o)
Einstein@Home Moderator
I have all my systems set up to have a day's worth of work to do with 1 day backup. I was thinking about adding an additional day, but I'm not sure I can handle the no-connects and having a bunch waiting for the server to come back up. I really don't want to reconnect to another project right now, as I have been pushing for some time to get my stats here to the 1 mil mark. Seti is unreliable and CPDN takes too long to finish.
[edit] Maybe part of the hosts are new systems being upgraded. I've been in the process the past few weeks and have been combining hosts. Next week my new system should be online with an AMD X2 6400+ engine. RMA on the board was supposed to have shipped Tuesday. We'll see. [/edit]