Upload problems?

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117850628301

RAC: 34824032

RE: Does the WUs that have

12 Jan 2007 23:22:24 UTC

Message 59181 in response to message 59177

Quote:

Do the WUs that have been sent out, but are not being crunched because the machines are in a 1 week coma (some machines have 300+ waiting in the cache), have an effect on the servers/database?

A few points to note:

* Even if a machine is in a 1 week backoff, it will still crunch and upload (when it can) all work previously sent to it.
* Only the reporting stage is not being done, or even attempted.
* The machine will only truly be idle once its cache is fully crunched and uploaded.
* There shouldn't be any great impact on the servers once they are back online, as these stuck reports out in limbo-land are just like any other slow-reporting work.
* One impact on the server will come from all the clients waking up after the week and clamouring to report results and refill their caches at roughly the same time. This could well be days after the scheduler is back online, so it shouldn't generally conflict with the other, bigger load of manually updated clients hammering away just after the scheduler first comes back.
* There is also a backoff for uploading, but this never seems to grow to more than a few hours. As soon as the upload server is back online, the stuck uploads clear relatively quickly on their own.
* It would be really nice if the BOINC client were smart enough to notice the resumption of uploads and give the scheduler (the download server) a call, just in case it was back online as well, and thereby possibly break out early from its 1 week coma.
* There is probably a very good reason why the BOINC Devs haven't implemented something along these lines :).

So if you have a machine still in a one week coma, manually update the project ASAP.

Cheers,
Gary.

googloo

Joined: 11 Feb 05

Posts: 43

Credit: 13396842

RAC: 995

Uploads seem ok. Otherwise,

13 Jan 2007 2:11:29 UTC

Message 59182

Uploads seem ok. Otherwise, something is not working. This has been going on for several days. These messages are for this evening.

1/12/2007 7:15:01 PM|Einstein@Home|Starting task h1_0373.5_S5R1__7463_S5R1a_0 using einstein_S5R1 version 424
1/12/2007 7:15:03 PM|Einstein@Home|Started upload of file h1_0379.5_S5R1__16208_S5R1a_1_0
1/12/2007 7:15:07 PM|Einstein@Home|Finished upload of file h1_0379.5_S5R1__16208_S5R1a_1_0
1/12/2007 7:15:07 PM|Einstein@Home|Throughput 33579 bytes/sec
1/12/2007 7:46:47 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:46:47 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:46:47 PM|Einstein@Home|Requesting 24 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:47:09 PM||Project communication failed: attempting access to reference site
1/12/2007 7:47:11 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 7:47:13 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 7:47:13 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:48:13 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:48:13 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:48:13 PM|Einstein@Home|Requesting 462 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:48:35 PM||Project communication failed: attempting access to reference site
1/12/2007 7:48:37 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 7:48:39 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 7:48:39 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:49:39 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:49:39 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:49:39 PM|Einstein@Home|Requesting 905 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:50:02 PM||Project communication failed: attempting access to reference site
1/12/2007 7:50:03 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 7:50:04 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 7:50:04 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:51:04 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:51:04 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:51:04 PM|Einstein@Home|Requesting 1342 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:52:44 PM|Einstein@Home|Scheduler request succeeded
1/12/2007 7:52:44 PM|Einstein@Home|Message from server: Project is temporarily shut down for maintenance
1/12/2007 7:52:44 PM|Einstein@Home|Project is down
1/12/2007 7:59:03 PM||Rescheduling CPU: application exited
1/12/2007 7:59:03 PM|Einstein@Home|Computation for task h1_0373.5_S5R1__7463_S5R1a_0 finished
1/12/2007 7:59:05 PM|Einstein@Home|Started upload of file h1_0373.5_S5R1__7463_S5R1a_0_0
1/12/2007 7:59:07 PM|Einstein@Home|Finished upload of file h1_0373.5_S5R1__7463_S5R1a_0_0
1/12/2007 7:59:07 PM|Einstein@Home|Throughput 20834 bytes/sec
1/12/2007 8:52:45 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 8:52:45 PM|Einstein@Home|Reason: To fetch work
1/12/2007 8:52:45 PM|Einstein@Home|Requesting 3644 seconds of new work, and reporting 2 completed tasks
1/12/2007 8:53:07 PM||Project communication failed: attempting access to reference site
1/12/2007 8:53:08 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 8:53:10 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 8:53:10 PM|Einstein@Home|Deferring scheduler requests for 2 minutes and 3 seconds
1/12/2007 8:55:16 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 8:55:16 PM|Einstein@Home|Reason: To fetch work
1/12/2007 8:55:16 PM|Einstein@Home|Requesting 3644 seconds of new work, and reporting 2 completed tasks
1/12/2007 8:56:11 PM|Einstein@Home|Scheduler request succeeded
1/12/2007 8:56:11 PM|Einstein@Home|Message from server: Project is temporarily shut down for maintenance
1/12/2007 8:56:11 PM|Einstein@Home|Project is down
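
For anyone who wants to see the retry pattern at a glance, here is a minimal Python sketch that pulls the deferral intervals out of messages like the ones above. It assumes the pipe-delimited timestamp|project|message layout shown in this log and only the "N minutes and M seconds" wording seen here; the function name deferral_seconds is made up for illustration and is not part of BOINC.

```python
import re

# Matches e.g. "Deferring scheduler requests for 2 minutes and 3 seconds".
# Only the minutes/seconds wording seen in the log above is handled (assumption).
DEFER_RE = re.compile(
    r"Deferring scheduler requests for (\d+) minutes and (\d+) seconds"
)

def deferral_seconds(log_lines):
    """Return (timestamp, seconds) for every deferral message in log_lines."""
    results = []
    for line in log_lines:
        # Layout assumed from the log: timestamp|project|message
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue
        timestamp, _project, message = parts
        match = DEFER_RE.search(message)
        if match:
            minutes, seconds = (int(g) for g in match.groups())
            results.append((timestamp, minutes * 60 + seconds))
    return results

sample = [
    "1/12/2007 7:47:13 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds",
    "1/12/2007 8:53:10 PM|Einstein@Home|Deferring scheduler requests for 2 minutes and 3 seconds",
    "1/12/2007 8:56:11 PM|Einstein@Home|Project is down",
]
for timestamp, secs in deferral_seconds(sample):
    print(timestamp, secs)
```

Run against the full message log, this would list each deferral next to its timestamp, which makes it easier to see how quickly the client is retrying.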

Aurora Borealis

Joined: 11 Feb 05

Posts: 19

Credit: 5657955

RAC: 0

RE: RE: My machine went

13 Jan 2007 6:06:21 UTC

Message 59183 in response to message 59179

Quote:

Quote:
My machine went into a 7 day coma, but as soon as I saw that the project was running again, I pressed the Update button and it came out of the coma.

Install BOINC 5.8 and you no longer have the problem with the one week deferred scheduler.

Sometime during the testing of 5.5.x, the maximum backoff was reduced to 24 hours to prevent BOINC from going idle for a week when a project goes down for 1 or 2 days. This will be the new standard.
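
To illustrate why the cap matters, here is a minimal sketch of a capped exponential backoff. This is not BOINC's actual algorithm: the doubling rule, the 60 second starting delay, and the function name backoff_delay are assumptions chosen for illustration. Only the one week and 24 hour caps come from this thread.

```python
ONE_MINUTE = 60
ONE_DAY = 24 * 60 * 60
ONE_WEEK = 7 * ONE_DAY

def backoff_delay(failures, max_delay, base=ONE_MINUTE):
    """Delay in seconds after `failures` consecutive failed requests.

    Hypothetical rule: double the delay on each failure, never exceeding max_delay.
    """
    if failures < 1:
        return 0
    return min(base * 2 ** (failures - 1), max_delay)

# Compare the old one week cap with the new 24 hour cap.
for n in (1, 5, 11, 15, 20):
    print(n, backoff_delay(n, ONE_WEEK), backoff_delay(n, ONE_DAY))
```

Under these assumptions, a long outage leaves the client waiting at most a day before its next attempt instead of up to a week.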

Questions? Answers are in the BOINC Wiki.

Boinc V6.10.6 Alpha Test
WinXP C2D 2.1G 3GB

hitsov

Joined: 2 Dec 06

Posts: 14

Credit: 38212

RAC: 0

Can anyone tell me how to

14 Jan 2007 1:26:43 UTC

Message 59185 in response to message 59183

Can anyone tell me how to increase the cache of downloaded WUs?
I didn't see any option in the BOINC client (ver. 5.4.11 for Linux).
It's really annoying to see my PC left without work because of some server problems.

FalconFly

Joined: 16 Feb 05

Posts: 191

Credit: 15650710

RAC: 0

It can only be done in your

14 Jan 2007 1:30:44 UTC

Message 59186 in response to message 59185

It can only be done in your General Preferences (in your Account Area) here on the Website.

http://einstein.phys.uwm.edu/prefs.php?subset=global

hitsov

Joined: 2 Dec 06

Posts: 14

Credit: 38212

RAC: 0

RE: It can only be done in

14 Jan 2007 1:41:20 UTC

Message 59187 in response to message 59186

Quote:

It can only be done in your General Preferences (in your Account Area) here on the Website.

http://einstein.phys.uwm.edu/prefs.php?subset=global

Oh yeah, I figured it out. Thanks a lot!

BarryAZ

Joined: 8 May 05

Posts: 190

Credit: 325252848

RAC: 16182

By the way, that setting is

14 Jan 2007 1:56:14 UTC

Message 59188 in response to message 59185

By the way, that setting is 'global' -- if you have multiple BOINC projects (and you need to seriously look into this in any event, but even more so these days if Einstein is your only BOINC project), the last modified setting affects the size of your download cache for all projects.

Quote:

Can anyone tell me how to increase the cache of downloaded WUs?
I didn't see any option in the BOINC client (ver. 5.4.11 for Linux).
It's really annoying to see my PC left without work because of some server problems.

RandyC

Joined: 18 Jan 05

Posts: 6625

Credit: 111139797

RAC: 0

RE: By the way, that

14 Jan 2007 2:35:02 UTC

Message 59189 in response to message 59188

Quote:

By the way, that setting is 'global' -- if you have multiple BOINC projects (and you need to seriously look into this in any event, but even more so these days if Einstein is your only BOINC project), the last modified setting affects the size of your download cache for all projects.

The Connect setting IS global; however, you DO have a (limited) way around it.

Each venue (general/default, home, school, work) can have a different setting.

One of my projects is Malaria Control (MCN). It does a rather poor job of respecting the resource share allocation for itself and any other projects attached to a machine, so when it connects it downloads too many WUs at once to run both E@H and MCN at the same time, and immediately forces EDF mode. Luckily, the BOINC client enforces resource share via Long Term Debt (LTD) values, and over time E@H and MCN observe the resource share I specified. MCN crunches its short-deadline WUs and then sits until LTD is satisfied, which allows it to start up again.

To avoid EDF mode whenever MCN does its downloads, I had to specify a smaller cache for the system I run MCN on. I did it by setting that machine (and will set any future systems I attach to MCN) to a specific venue which has a smaller "Connect to" setting than the rest of my systems.

That way, most of my systems have a reasonably sized cache, and only the systems I attached to MCN get the smaller cache.
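
The venue trick can be pictured as a simple lookup. This is a sketch only: the venue names come from the description above, but the dictionary, the day values, the machine names, and the function name connect_interval_for are all made up for illustration and do not reflect how BOINC actually stores preferences.

```python
# Hypothetical "connect to network about every N days" values per venue.
CONNECT_INTERVAL_DAYS = {
    "default": 3.0,  # most machines: a reasonably sized cache
    "home": 3.0,
    "school": 3.0,
    "work": 0.5,     # venue reserved here for machines attached to MCN
}

def connect_interval_for(venue):
    """Return the cache setting for a venue, falling back to the default venue."""
    return CONNECT_INTERVAL_DAYS.get(venue, CONNECT_INTERVAL_DAYS["default"])

# Hypothetical machines and the venue each one is assigned to.
machines = {"desktop": "home", "mcn-box": "work", "laptop": ""}
for name, venue in machines.items():
    print(name, connect_interval_for(venue))
```

The point is that the setting is still one value per venue, so the workaround is limited to as many distinct cache sizes as there are venues.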

[edit - fixed error in quote brackets]

Seti Classic Final Total: 11446 WU.

KAMasud

Joined: 6 Oct 06

Posts: 14

Credit: 67317758

RAC: 2

:-) thats a very cute way

14 Jan 2007 11:58:31 UTC

Message 59190 in response to message 59189

:-) That's a very cute way around it :-) Never thought along those lines, LoL. Maybe wicked to some :-) I like it.
Regards
Masud.
