Upload problems?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117850628301
RAC: 34824032

RE: Does the WUs that have

Message 59181 in response to message 59177

Quote:
Do the WUs that have been sent out, but are not being crunched because the machines are in a 1 week coma (some machines have 300+ waiting in the cache), have an effect on the servers/database?

A couple of points to note:-

  • Even if a machine is in a 1 week backoff, it will still be crunching and uploading (when it can) all work previously sent to it.
  • It's just the reporting stage that is not being done or even attempted.
  • The machine will only truly be idle once its cache is fully crunched and uploaded.
  • There shouldn't be any great impact on the servers once they are back on line as these stuck reports out in limbo-land are just like any other slow reporting work.
  • One impact on the server will come from all the clients out there waking up after the week and all clamouring to report results and refill their caches pretty much at the same general time. This could well be days after the scheduler is back on line and shouldn't generally conflict with the other, bigger load of manually updated clients hammering away just after the scheduler has first come back on line.
  • There is also a backoff for uploading but this never seems to grow to more than a few hours. As soon as the upload server is back on line, the stuck uploads will relatively quickly clear all on their own.
  • It would be really nice if the BOINC client was smart enough to notice the resumption of uploads and decide to give the scheduler (the download server) a call, just in case it was back on line as well and thereby possibly break out early from its 1 week coma.
  • There is probably a very good reason why the BOINC Devs haven't implemented something along these lines :).

So if anyone has a machine still in a one week coma, you should be manually updating the project ASAP.
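To make the "notice the resumption of uploads" idea concrete, here is a minimal sketch of what such a rule could look like. It is purely illustrative: `ProjectState`, `on_upload_finished` and `may_contact_scheduler` are hypothetical names, and nothing here is taken from the BOINC client's actual code.

```python
from dataclasses import dataclass

WEEK = 7 * 24 * 3600  # the one-week deferral discussed above, in seconds


@dataclass
class ProjectState:
    """Hypothetical per-project bookkeeping; not BOINC's real data structure."""

    next_scheduler_contact: float  # earliest time (in seconds) for the next scheduler request

    def on_upload_finished(self, now: float) -> None:
        # A finished upload suggests the project is reachable again, so allow
        # a scheduler request right away instead of waiting out the deferral.
        if self.next_scheduler_contact > now:
            self.next_scheduler_contact = now

    def may_contact_scheduler(self, now: float) -> bool:
        return now >= self.next_scheduler_contact


state = ProjectState(next_scheduler_contact=WEEK)
print(state.may_contact_scheduler(3600))  # still deferred one hour in
state.on_upload_finished(3600)            # an upload goes through
print(state.may_contact_scheduler(3600))  # deferral has been cleared
```

A real client would probably want to rate-limit this, since every upload would otherwise trigger a scheduler call while the scheduler itself might still be down.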

Cheers,
Gary.

googloo
Joined: 11 Feb 05
Posts: 43
Credit: 13396842
RAC: 995

Uploads seem ok. Otherwise,

Uploads seem ok. Otherwise, something is not working. This has been going on for several days. These messages are for this evening.

1/12/2007 7:15:01 PM|Einstein@Home|Starting task h1_0373.5_S5R1__7463_S5R1a_0 using einstein_S5R1 version 424
1/12/2007 7:15:03 PM|Einstein@Home|Started upload of file h1_0379.5_S5R1__16208_S5R1a_1_0
1/12/2007 7:15:07 PM|Einstein@Home|Finished upload of file h1_0379.5_S5R1__16208_S5R1a_1_0
1/12/2007 7:15:07 PM|Einstein@Home|Throughput 33579 bytes/sec
1/12/2007 7:46:47 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:46:47 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:46:47 PM|Einstein@Home|Requesting 24 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:47:09 PM||Project communication failed: attempting access to reference site
1/12/2007 7:47:11 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 7:47:13 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 7:47:13 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:48:13 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:48:13 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:48:13 PM|Einstein@Home|Requesting 462 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:48:35 PM||Project communication failed: attempting access to reference site
1/12/2007 7:48:37 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 7:48:39 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 7:48:39 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:49:39 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:49:39 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:49:39 PM|Einstein@Home|Requesting 905 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:50:02 PM||Project communication failed: attempting access to reference site
1/12/2007 7:50:03 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 7:50:04 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 7:50:04 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:51:04 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:51:04 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:51:04 PM|Einstein@Home|Requesting 1342 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:52:44 PM|Einstein@Home|Scheduler request succeeded
1/12/2007 7:52:44 PM|Einstein@Home|Message from server: Project is temporarily shut down for maintenance
1/12/2007 7:52:44 PM|Einstein@Home|Project is down
1/12/2007 7:59:03 PM||Rescheduling CPU: application exited
1/12/2007 7:59:03 PM|Einstein@Home|Computation for task h1_0373.5_S5R1__7463_S5R1a_0 finished
1/12/2007 7:59:05 PM|Einstein@Home|Started upload of file h1_0373.5_S5R1__7463_S5R1a_0_0
1/12/2007 7:59:07 PM|Einstein@Home|Finished upload of file h1_0373.5_S5R1__7463_S5R1a_0_0
1/12/2007 7:59:07 PM|Einstein@Home|Throughput 20834 bytes/sec
1/12/2007 8:52:45 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 8:52:45 PM|Einstein@Home|Reason: To fetch work
1/12/2007 8:52:45 PM|Einstein@Home|Requesting 3644 seconds of new work, and reporting 2 completed tasks
1/12/2007 8:53:07 PM||Project communication failed: attempting access to reference site
1/12/2007 8:53:08 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 8:53:10 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 8:53:10 PM|Einstein@Home|Deferring scheduler requests for 2 minutes and 3 seconds
1/12/2007 8:55:16 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 8:55:16 PM|Einstein@Home|Reason: To fetch work
1/12/2007 8:55:16 PM|Einstein@Home|Requesting 3644 seconds of new work, and reporting 2 completed tasks
1/12/2007 8:56:11 PM|Einstein@Home|Scheduler request succeeded
1/12/2007 8:56:11 PM|Einstein@Home|Message from server: Project is temporarily shut down for maintenance
1/12/2007 8:56:11 PM|Einstein@Home|Project is down
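For anyone who wants to tally a log like this instead of reading it by eye, the following is a rough sketch. It assumes each line has the form `timestamp|project|message` as shown above and that the dates are US-style month/day/year; the function names `parse_line` and `summarize` are made up for this example.

```python
from datetime import datetime


def parse_line(line):
    """Split one 'timestamp|project|message' line into its three fields."""
    stamp, project, message = line.split("|", 2)
    # Assumption: month/day/year dates with a 12-hour clock.
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message


def summarize(lines):
    """Count scheduler requests, connection failures and 'Project is down' replies."""
    counts = {"requests": 0, "connect_failures": 0, "project_down": 0}
    for line in lines:
        _, _, message = parse_line(line)
        if message.startswith("Sending scheduler request"):
            counts["requests"] += 1
        elif message.startswith("Scheduler request failed"):
            counts["connect_failures"] += 1
        elif message == "Project is down":
            counts["project_down"] += 1
    return counts


sample = [
    "1/12/2007 7:46:47 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi",
    "1/12/2007 7:47:09 PM||Project communication failed: attempting access to reference site",
    "1/12/2007 7:47:13 PM|Einstein@Home|Scheduler request failed: couldn't connect to server",
    "1/12/2007 7:52:44 PM|Einstein@Home|Project is down",
]
print(summarize(sample))
```

Run over the full log above, this tally would come to six scheduler requests, four connection failures and two "Project is down" replies, which matches the pattern of uploads working while the scheduler stays unreachable or in maintenance.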

Aurora Borealis
Joined: 11 Feb 05
Posts: 19
Credit: 5657955
RAC: 0

RE: RE: My machine went

Message 59183 in response to message 59179

Quote:
Quote:
My machine went into a 7 day coma, but as soon as I saw that the project was running again, I pressed the Update button and it came out of the coma.

Install BOINC 5.8 and you won't have the problem with the one-week deferred scheduler anymore.


Sometime during the testing of 5.5.x, the maximum backoff was reduced to 24 hours to prevent BOINC from going idle for a week when a project goes down for 1 or 2 days. This will be the new standard.
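To illustrate why the cap matters, here is a generic capped exponential backoff with random jitter. This is only a sketch of the general technique, not BOINC's actual formula; `backoff_delay`, the 60-second base and the 0.5 to 1.0 jitter range are all assumptions chosen for the example.

```python
import random

HOUR = 3600
DAY = 24 * HOUR
WEEK = 7 * DAY


def backoff_delay(failures, max_delay, base=60, rng=random):
    """Seconds to wait after `failures` consecutive failed requests.

    The delay doubles from `base` and is clipped at `max_delay`; a random
    factor between 0.5 and 1.0 spreads out clients that failed together.
    Generic scheme for illustration only, not BOINC's real algorithm.
    """
    ceiling = min(max_delay, base * 2 ** failures)
    return ceiling * rng.uniform(0.5, 1.0)


# Compare a one-week cap with a 24-hour cap as failures pile up.
for n in (1, 5, 10, 20):
    week_cap = backoff_delay(n, WEEK)
    day_cap = backoff_delay(n, DAY)
    print(f"{n:2d} failures: week cap {week_cap / HOUR:7.2f} h, day cap {day_cap / HOUR:6.2f} h")
```

With the lower cap, a client that has failed many times in a row still retries at least once a day, so it notices a recovered project much sooner than one that can defer for a week.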

Questions? Answers are in the BOINC Wiki.

Boinc V6.10.6 Alpha Test
WinXP C2D 2.1G 3GB

hitsov
Joined: 2 Dec 06
Posts: 14
Credit: 38212
RAC: 0

Can anyone tell me how to

Message 59185 in response to message 59183

Can anyone tell me how to increase the cache of WUs downloaded?
I didn't see any option in the BOINC client (ver. 5.4.11 for Linux).
It's really annoying to see my PC left without work because of some server problems.

FalconFly
Joined: 16 Feb 05
Posts: 191
Credit: 15650710
RAC: 0

It can only be done in your

Message 59186 in response to message 59185

It can only be done in your General Preferences (in your Account Area) here on the Website.

http://einstein.phys.uwm.edu/prefs.php?subset=global

hitsov
Joined: 2 Dec 06
Posts: 14
Credit: 38212
RAC: 0

RE: It can only be done in

Message 59187 in response to message 59186

Quote:

It can only be done in your General Preferences (in your Account Area) here on the Website.

http://einstein.phys.uwm.edu/prefs.php?subset=global


Oh yeah, I figured it out. Thanks a lot!

BarryAZ
Joined: 8 May 05
Posts: 190
Credit: 325252848
RAC: 16182

By the way, that setting is

Message 59188 in response to message 59185

By the way, that setting is 'global' -- if you have multiple BOINC projects (and you need to seriously look into this in any event, but even more so these days if Einstein is your only BOINC project), the last modified setting for this affects the size of your download cache for all projects.

Quote:
Can anyone tell me how to increase the cache of WUs downloaded?
I didn't see any option in the BOINC client (ver. 5.4.11 for Linux).
It's really annoying to see my PC left without work because of some server problems.


RandyC
Joined: 18 Jan 05
Posts: 6625
Credit: 111139797
RAC: 0

RE: By the way, that

Message 59189 in response to message 59188

Quote:
By the way, that setting is 'global' -- if you have multiple BOINC projects (and you need to seriously look into this in any event, but even more so these days if Einstein is your only BOINC project), the last modified setting for this affects the size of your download cache for all projects.

The Connect setting IS global; however, you DO have a (limited) way around it.

Each venue (general/default, home, school, work) can have a different setting.

One of my projects is Malaria Control (MCN). It does a rather poor job of respecting the resource share allocation for itself and any other projects attached to a machine, so when it connects it downloads too many WUs at once to run both E@H and MCN at the same time, and immediately forces EDF (earliest deadline first) mode. Luckily, the BOINC client enforces resource share via Long Term Debt (LTD) values, and over time, E@H and MCN observe the resource share I specified. MCN crunches its short deadline WUs and then sits until LTD is satisfied to allow it to start up again.

In order to avoid the EDF mode whenever MCN does its downloads, I had to specify a smaller cache for the system I run MCN on. I did it by setting that machine (and will set any future systems I attach to MCN) to a specific venue which has a smaller "Connect to" setting than the rest of my systems.

That way, most of my systems have a reasonable size cache, and only systems I attached MCN to get the smaller cache.
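The venue trick can be pictured as a simple two-step lookup: each host belongs to one venue, and each venue carries its own "Connect to" value. The sketch below only models that idea; the host names and the day values are invented for illustration, and the real settings live in the preferences on the website, not in a script.

```python
# Hypothetical venue table: values are the "Connect to" interval in days.
CONNECT_DAYS_BY_VENUE = {
    "default": 3.0,
    "home": 3.0,
    "school": 3.0,
    "work": 0.5,  # smaller cache, used here for the machine that also runs MCN
}

# Hypothetical host names mapped to the venue chosen for each one.
HOST_VENUE = {"desktop": "home", "laptop": "default", "mcn-box": "work"}


def connect_days(host):
    """Look up the cache size a host gets through its venue."""
    venue = HOST_VENUE.get(host, "default")
    return CONNECT_DAYS_BY_VENUE[venue]


for host in HOST_VENUE:
    print(host, connect_days(host))
```

The limitation mentioned above shows up directly in this model: there are only four venues, so at most four different cache sizes can be in use across all machines.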

[edit - fixed error in quote brackets]

Seti Classic Final Total: 11446 WU.

KAMasud
Joined: 6 Oct 06
Posts: 14
Credit: 67317758
RAC: 2

:-) thats a very cute way

Message 59190 in response to message 59189


:-) That's a very cute way around it :-) Never thought along those lines, LoL. Maybe wicked to some :-) I like it.
Regards
Masud.


