Project downtime tomorrow

Herman van Kempen
Joined: 21 May 09
Posts: 18
Credit: 380674723
RAC: 30327

After running out of work I get this:

9-10-2014 0:05:25 | Einstein@Home | Requesting new tasks for CPU and ATI
9-10-2014 0:05:35 | Einstein@Home | Scheduler request failed: HTTP file not found

As I am not a specialist in programming, perhaps one can indicate where I have to change to the new scheduler URL.
It would have been more user-friendly if this information had been given before the system shutdown yesterday.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7232261310
RAC: 1159898

Quote:
perhaps one can indicate where I have to change to the new scheduler URL


As mentioned in previous posts in this thread, and in a thread on the Problems and Bug Reports board, the application will accumulate about 10 failures and then automatically fetch a new scheduler list, after which normal function resumes.

If you are in a hurry, you can just click Update a few times. Otherwise it will fix itself in time.
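
As a rough illustration of the behaviour described above (not BOINC's actual code), the sketch below models a client that counts consecutive failed scheduler requests and refetches the scheduler list once the count reaches a threshold. The threshold of 10 comes from the "about 10 failures" figure mentioned in this thread; the function name and the exact counting are assumptions made purely for illustration.

```python
# Toy model only: the threshold and the counting rule are assumptions
# based on the "about 10 failures" figure quoted in this thread.
ASSUMED_FAILURE_THRESHOLD = 10


def manual_updates_needed(failures_so_far: int,
                          threshold: int = ASSUMED_FAILURE_THRESHOLD) -> int:
    """Return how many more failed requests (e.g. Update clicks) are needed
    before the modelled client refetches the scheduler list."""
    # Never return a negative number if the threshold was already passed.
    return max(threshold - failures_so_far, 0)


if __name__ == "__main__":
    # Hypothetical hosts: one failed a few times by itself, one was switched off.
    for label, failed in [("host that already failed 5 times", 5),
                          ("host with no failed requests yet", 0)]:
        print(label, "->", manual_updates_needed(failed), "more failures needed")
```

In this model, a host that has already failed several times by itself needs correspondingly fewer manual clicks on Update.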

Herman van Kempen
Joined: 21 May 09
Posts: 18
Credit: 380674723
RAC: 30327

It works!! I should have been more patient.
Thank you very much archae86

David S
Joined: 6 Dec 05
Posts: 2473
Credit: 22936222
RAC: 0

Quote:
Quote:
Quote:
5th update gets the master file.

Perhaps there is a difference depending on whether work is being requested.

Three of my PCs that wanted work seemed to take about 5 update requests each, but my laptop, which was off all night and had work to report but none to request, logged eleven "Scheduler request failed: HTTP file not found" entries before finally doing the "Fetching scheduler list, Master file download succeeded" pair, after which the next update request succeeded.


Computers which were active during the outage (European day / American night) probably got through their first few attempts during the 'down for maintenance' period, so fewer were needed to reach the "after 10 consecutive failures" trigger that Bernd mentioned. If the machine has been off, you need to do them all yourself.


My primary cruncher is always on, but it wasn't asking for new work, so it may not have tried at all during the outage to report the three tasks it had finished. I had to kick it eleven times before it downloaded the Master file.

David

Miserable old git
Patiently waiting for the asteroid with my name on it.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117851264965
RAC: 34837110

Quote:
Quote:
Quote:
5th update gets the master file.

Perhaps there is a difference depending on whether work is being requested.

Three of my PCs that wanted work seemed to take about 5 update requests each, but my laptop, which was off all night and had work to report but none to request, logged eleven "Scheduler request failed: HTTP file not found" entries before finally doing the "Fetching scheduler list, Master file download succeeded" pair, after which the next update request succeeded.


Computers which were active during the outage (European day / American night) probably got through their first few attempts during the 'down for maintenance' period, so fewer were needed to reach the "after 10 consecutive failures" trigger that Bernd mentioned. If the machine has been off, you need to do them all yourself.


The version of BOINC matters as well. The bulk of my hosts don't ever request work 'on their own'. Their cache settings are manipulated by an external script that makes sure they have up-to-date common data files before making a work request. These controlled work requests are rather infrequent. Machines on more 'current' versions of BOINC report work soon after completion and hence will have made a number of contacts anyway without requesting work, but those on v6 versions of BOINC will not have made contact. They report only about once per day when not requesting work.

I've just 'updated' machines at home and those on v6 needed a full 12 clicks whilst those on 7.2.42 needed just a couple. I'll have to head off shortly and attend to a very much larger group at a different location. Fortunately most of them are on 7.2.42 so just completing and reporting tasks should get them out of trouble on their own.

Cheers,
Gary.

Mike.Gibson
Joined: 17 Dec 07
Posts: 21
Credit: 3759410
RAC: 0

08/10/2014 23:03:09 | Einstein@Home | Scheduler request failed: HTTP file not found

I now have 13 units waiting to report. All have uploaded.

I have "No new tasks" set and 24 hours work left.

Version 7.2.47

Mike

Mike.Gibson
Joined: 17 Dec 07
Posts: 21
Credit: 3759410
RAC: 0

Switched to "Allow new tasks" and all reported and new tasks downloaded.

The problem seems to be linked to the "No new tasks" setting.

Mike

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3531237247
RAC: 1447623

Quote:
Tomorrow I may also give a more extensive report on what we actually did.

We'd certainly appreciate such a report :-)

-----

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4313
Credit: 250745757
RAC: 34802

Basically we have been running on the spare wheel with the DB server for about a year. There were three identical servers set up at UWM, two of which had already stopped working without a clear sign of what went wrong (hardware, OS, software, whatever) or how to fix these problems. The third was (still) running our master DB. Our fingers hurt from being crossed.

The end of the S6CasA "run" and thus the absence of "locality" work for a few weeks gave us the opportunity to move the active master DB to AEI (Hannover), where we got three newer and much more powerful DB servers as part of our "fallback infrastructure" that is meant to take over when something really bad happens to the UWM side.

The actual move, however, was still a bit rushed in order to avoid foreseeable difficulties next week (team challenge, vacations). Given the circumstances, all in all it went pretty smoothly and within our plans.

For reliability reasons the "scheduler" had to be moved with the DB, which is why the scheduler URL changed. We knew that the clients should automatically adjust to that change, but we were not aware of how long it would take them. So currently we still have less than half the request rate on the new scheduler that we were used to from the old one. It will probably take until next week before we see a remotely comparable load on the AEI machines to what we saw at UWM.

BM

Elektra*
Joined: 4 Sep 05
Posts: 948
Credit: 1124049
RAC: 0

Will there be some "grace" time for tasks that are finished just before their deadline but cannot be reported in time because of the delay in picking up the new scheduler URL? I think a lot of guys will have bunkered a lot of work for the forthcoming team challenge and won't be able to report their finished tasks in time if they enable network communication for the first time after the URL change, just at challenge start AND just before hitting the deadline.

Love, Michi
