S5R3 Nearing Completion

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 316849823
RAC: 362360

Message 82809 in response to message 82808

Quote:

LOL...

Regarding the first remark... Good One! :-D

Pertaining to the second... There goes their continuous uptime record! Still, pretty impressive though. ;-)

We should all pitch in by Fedex'ing some regular donuts and coffee shipments to them for this changeover w/end! :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 726998560
RAC: 1241045

Indeed!

You know how it is with computer stuff ... usually Murphy's Law does strike when you least expect it. So good luck with the upgrade!

CU
Bikeman

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

Message 82811 in response to message 82810

Quote:

Indeed!

You know how it is with computer stuff ... usually Murphy's Law does strike when you least expect it. So good luck with the upgrade!

CU
Bikeman

Best thing is that you often get two different faults at the same time and it takes extra brainstorming to recognize this special case.
Or is this just my very special personal luck? ;-)

cu,
Michael

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

Message 82812 in response to message 82811

Quote:

Best thing is that you often get two different faults at the same time and it takes extra brainstorming to recognize this special case.
Or is this just my very special personal luck? ;-)

cu,
Michael

You're giving me flashbacks of the days when I had to help troubleshoot the old transistorized computers that ran the Poseidon missile system.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 316849823
RAC: 362360

Well, it's clearly winding down now. The work units are no longer being emitted, not to me anyway :

Quote:
Sending scheduler request: To fetch work. Requesting 4044 seconds of work, reporting 1 completed tasks
Scheduler request succeeded: got 0 new tasks
Message from server: Project is temporarily shut down for maintenance

and a couple of these :

Quote:
Scheduler request failed: HTTP internal server error

'cos it's a rather busy bee I expect at the moment, but on retry is OK.

Cheers, Mike.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Message 82814 in response to message 82813

Quote:

Well, it's clearly winding down now. The work units are no longer being emitted, not to me anyway :

Quote:
Sending scheduler request: To fetch work. Requesting 4044 seconds of work, reporting 1 completed tasks
Scheduler request succeeded: got 0 new tasks
Message from server: Project is temporarily shut down for maintenance

and a couple of these :

Quote:
Scheduler request failed: HTTP internal server error

'cos it's a rather busy bee I expect at the moment, but on retry is OK.

Cheers, Mike.

Hmmm...

Well I saw the scheduler was down for most of the afternoon, and just now I looked and just about everything is not running.

Alinator

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250488199
RAC: 34718

The 32-bit system the old server is running on is currently at its limits [Edit]Various components struggle to get enough memory[/Edit]. That is one reason to upgrade software and hardware [Edit]This will also include an update of the DB structure for the new server-side software[/Edit].

It currently looks like the server will be down for the whole weekend to perform these, and we'll hopefully start the new week and month with an all-new server and a new run.

BM


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117595756447
RAC: 35213939

Several hours ago, it was red virtually right across the board.

More recently (last couple of hours) the board has been green with just the scheduler not running (red).

Just now everything is green but it still reports "project shut down for maintenance" when a work request is given.

It has been an "interesting" day in this part of the world (night where most of you come from). I arrived at my building around 11 hours ago to be greeted by stony silence and a coldness (middle of winter here) I haven't experienced in quite a while :-). Upon investigation, circuit breakers were all tripped and after getting a machine or two going, I found a common theme of the timestamp on client_state.xml being around 1:50AM.

Apparently there had been a short power outage, and the current spike when the power was restored tripped the circuit breakers. Getting back into production required physically disconnecting enough machines that the breakers could be reset without immediately tripping again. I have quite a number of servers that immediately try to restart when power is restored, but even machines that don't try to restart are apparently a problem, because supplying just the standby voltage and current still requires quite a large, short-duration spike of power to charge up the power supply.

So, it wasn't too hard to disconnect most stuff and get the breakers reset. Certain more recent servers fire up without keyboard and mouse, but some older ones aren't too happy, and basically all of the desktops and towers need the keyboard there or else you get a "Keyboard missing" followed by "Hit F1 to continue", the incongruity of which I always find highly amusing.

So, nothing else to do but bite the bullet and go around individually to each machine and plug in KVM and watch each one boot safely back into production. You start to question your sanity after doing this for 100+ machines, some of which aren't exactly in a convenient position to easily attach KVM or even get to the power button sometimes :-).

Everything is now basically back to normal with my farm so it would be nice if the scheduler would start cooperating shortly :-).

EDIT: In the time spent composing this message the scheduler has gone to condition red again.

"Cap'n she canna take much more o' this! She's gonna blow any minute now ...!!"

"Hold her together Scotty! We've just got to make it to Friday ...!!"

"Aye, Cap'n! I'm sure gonna miss this old lass when Starfleet Command give us our new ship ...!!"

Cheers,
Gary.

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

I just queued up a bunch (24 to be exact). Hopefully that'll get this machine through the outage.

Kathryn :o)

Einstein@Home Moderator

Arion
Joined: 20 Mar 05
Posts: 147
Credit: 1626747
RAC: 0

Message 82818 in response to message 82817

Quote:
I just queued up a bunch (24 to be exact). Hopefully that'll get this machine through the outage.

I have all my systems set up to have a day's worth of work to do with 1 day backup. I was thinking about adding an additional day, but I'm not sure I can handle the no-connects and having a bunch waiting for the server to come back up. I really don't want to reconnect to another project right now, as I have been pushing for some time to get my stats here to the 1 mil mark. Seti is unreliable and CPDN takes too long to finish.

[edit] Maybe part of the hosts are new systems being upgraded. I've been in the process the past few weeks and have been combining hosts. Next week my new system should be online with an AMD x2 6400+ engine. The RMA on the board was supposed to have shipped Tuesday. We'll see. [/edit]
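
For the cache sizing mentioned above (one day of work plus one day of backup, versus a server that may be down for the whole weekend), a rough back-of-envelope check could look like the sketch below. It is only an illustration: the function names are made up, the 2.5-day outage is a hypothetical figure, and it assumes the cache is completely full when the outage starts and that the runtime estimates are accurate.

```python
def buffer_covers_outage(min_days: float, extra_days: float, outage_days: float) -> bool:
    """Return True if a full work cache would last at least as long as the outage.

    min_days    -- the base amount of work kept on hand, in days
    extra_days  -- the additional backup buffer, in days
    outage_days -- assumed length of the server outage, in days (hypothetical)
    """
    return (min_days + extra_days) >= outage_days


def idle_days(min_days: float, extra_days: float, outage_days: float) -> float:
    """Return how many days a host would sit idle once its cache runs dry."""
    return max(0.0, outage_days - (min_days + extra_days))


if __name__ == "__main__":
    # 1 day of work + 1 day backup against an assumed 2.5-day outage.
    print(buffer_covers_outage(1, 1, 2.5))
    print(idle_days(1, 1, 2.5))
    # Adding one more day of backup, as considered above.
    print(buffer_covers_outage(1, 2, 2.5))
```

By this estimate the extra day would only matter if the outage really does run longer than two days, and it says nothing about whether the server will hand out that much work before it goes down.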
