Do the WUs that have been sent out but are not being crunched, because the machines are in a 1-week coma (some machines have 300+ waiting in the cache), have an effect on the servers/database?
A couple of points to note:
*Even if a machine is in a 1-week backoff, it will still be crunching and uploading (when it can) all work previously sent to it.
*It's just the reporting stage that is not being done or even attempted.
*The machine will only truly be idle once its cache is fully crunched and uploaded.
*There shouldn't be any great impact on the servers once they are back online, as these stuck reports out in limbo-land are just like any other slowly reported work.
*One impact on the server will come from all the clients out there waking up after the week and clamouring to report results and refill their caches at roughly the same time. This could well be days after the scheduler is back online, so it generally shouldn't conflict with the other, bigger load of manually updated clients hammering away just after the scheduler first comes back online.
*There is also a backoff for uploading, but this never seems to grow beyond a few hours. As soon as the upload server is back online, the stuck uploads clear on their own fairly quickly.
*It would be really nice if the BOINC client were smart enough to notice that uploads had resumed and give the scheduler a call, just in case it was back online as well, thereby possibly breaking out of its 1-week coma early.
*There is probably a very good reason why the BOINC devs haven't implemented something along these lines :).
So if you have a machine still in a one-week coma, you should manually update the project ASAP.
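The wish-list item about noticing resumed uploads could be sketched as below. This is a minimal, hypothetical illustration. `ProjectBackoff`, `on_upload_succeeded` and the 60-second early-retry delay are invented for the example and are not taken from the BOINC client source.

```python
from dataclasses import dataclass

WEEK = 7 * 24 * 3600  # seconds


@dataclass
class ProjectBackoff:
    """Hypothetical per-project scheduler backoff state (not BOINC's real data structure)."""
    next_scheduler_rpc: float = 0.0  # earliest time a scheduler request is allowed
    early_retry_delay: float = 60.0  # retry this soon once the project looks reachable

    def scheduler_deferred(self, now: float) -> bool:
        return now < self.next_scheduler_rpc

    def on_upload_succeeded(self, now: float) -> None:
        # An upload just went through, so the project's servers may be back.
        # Pull a long scheduler deferral forward instead of waiting it out;
        # a deferral that already ends sooner is left alone.
        if self.next_scheduler_rpc > now + self.early_retry_delay:
            self.next_scheduler_rpc = now + self.early_retry_delay


now = 1_000_000.0  # fixed example clock value, in seconds
host = ProjectBackoff(next_scheduler_rpc=now + WEEK)  # a machine in a 1-week coma
host.on_upload_succeeded(now)
print(host.next_scheduler_rpc - now)
```

A successful upload does not prove the scheduler is up. The log in the thread shows uploads working while scheduler requests still fail. A client using this heuristic would therefore simply back off again if the early request failed.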
Uploads seem OK. Otherwise, something is not working. This has been going on for several days. These messages are from this evening:
1/12/2007 7:15:01 PM|Einstein@Home|Starting task h1_0373.5_S5R1__7463_S5R1a_0 using einstein_S5R1 version 424
1/12/2007 7:15:03 PM|Einstein@Home|Started upload of file h1_0379.5_S5R1__16208_S5R1a_1_0
1/12/2007 7:15:07 PM|Einstein@Home|Finished upload of file h1_0379.5_S5R1__16208_S5R1a_1_0
1/12/2007 7:15:07 PM|Einstein@Home|Throughput 33579 bytes/sec
1/12/2007 7:46:47 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:46:47 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:46:47 PM|Einstein@Home|Requesting 24 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:47:09 PM||Project communication failed: attempting access to reference site
1/12/2007 7:47:11 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 7:47:13 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 7:47:13 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:48:13 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:48:13 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:48:13 PM|Einstein@Home|Requesting 462 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:48:35 PM||Project communication failed: attempting access to reference site
1/12/2007 7:48:37 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 7:48:39 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 7:48:39 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:49:39 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:49:39 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:49:39 PM|Einstein@Home|Requesting 905 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:50:02 PM||Project communication failed: attempting access to reference site
1/12/2007 7:50:03 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 7:50:04 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 7:50:04 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:51:04 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 7:51:04 PM|Einstein@Home|Reason: To fetch work
1/12/2007 7:51:04 PM|Einstein@Home|Requesting 1342 seconds of new work, and reporting 1 completed tasks
1/12/2007 7:52:44 PM|Einstein@Home|Scheduler request succeeded
1/12/2007 7:52:44 PM|Einstein@Home|Message from server: Project is temporarily shut down for maintenance
1/12/2007 7:52:44 PM|Einstein@Home|Project is down
1/12/2007 7:59:03 PM||Rescheduling CPU: application exited
1/12/2007 7:59:03 PM|Einstein@Home|Computation for task h1_0373.5_S5R1__7463_S5R1a_0 finished
1/12/2007 7:59:05 PM|Einstein@Home|Started upload of file h1_0373.5_S5R1__7463_S5R1a_0_0
1/12/2007 7:59:07 PM|Einstein@Home|Finished upload of file h1_0373.5_S5R1__7463_S5R1a_0_0
1/12/2007 7:59:07 PM|Einstein@Home|Throughput 20834 bytes/sec
1/12/2007 8:52:45 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 8:52:45 PM|Einstein@Home|Reason: To fetch work
1/12/2007 8:52:45 PM|Einstein@Home|Requesting 3644 seconds of new work, and reporting 2 completed tasks
1/12/2007 8:53:07 PM||Project communication failed: attempting access to reference site
1/12/2007 8:53:08 PM||Access to reference site succeeded - project servers may be temporarily down.
1/12/2007 8:53:10 PM|Einstein@Home|Scheduler request failed: couldn't connect to server
1/12/2007 8:53:10 PM|Einstein@Home|Deferring scheduler requests for 2 minutes and 3 seconds
1/12/2007 8:55:16 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/12/2007 8:55:16 PM|Einstein@Home|Reason: To fetch work
1/12/2007 8:55:16 PM|Einstein@Home|Requesting 3644 seconds of new work, and reporting 2 completed tasks
1/12/2007 8:56:11 PM|Einstein@Home|Scheduler request succeeded
1/12/2007 8:56:11 PM|Einstein@Home|Message from server: Project is temporarily shut down for maintenance
1/12/2007 8:56:11 PM|Einstein@Home|Project is down
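To see the backoff pattern in a log like the one above, a short script can pull out the "Deferring scheduler requests" lines and convert them to seconds. This is a minimal sketch. It assumes the pipe-separated `timestamp|project|message` layout and the message wording shown in this log. `deferrals` is a hypothetical helper name.

```python
import re

# Matches deferral messages in the format seen in the log, e.g.
# "Deferring scheduler requests for 2 minutes and 3 seconds".
DEFER_RE = re.compile(
    r"Deferring scheduler requests for (\d+) minutes? and (\d+) seconds?"
)


def deferrals(log_text: str) -> list[tuple[str, int]]:
    """Return (timestamp, deferral in seconds) for each deferral line."""
    result = []
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|message line
        timestamp, _project, message = parts
        m = DEFER_RE.search(message)
        if m:
            minutes, seconds = int(m.group(1)), int(m.group(2))
            result.append((timestamp, minutes * 60 + seconds))
    return result


sample = """\
1/12/2007 7:47:13 PM|Einstein@Home|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 7:52:44 PM|Einstein@Home|Project is down
1/12/2007 8:53:10 PM|Einstein@Home|Deferring scheduler requests for 2 minutes and 3 seconds
"""
for ts, secs in deferrals(sample):
    print(ts, secs)
```

Note that the roughly one-hour gap after the "Project is down" reply (7:52:44 PM to 8:52:45 PM) is not announced by a "Deferring" line. This script would not see it.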
My machine went into a 7 day coma, but as soon as I saw that the project was running again, I pressed the Update button and it came out of the coma.
Install BOINC 5.8 and you will no longer have the problem of scheduler requests being deferred for a week.
At some point during the testing of 5.5.x, the maximum backoff was reduced to 24 hours to prevent BOINC from going idle for a week when a project goes down for 1 or 2 days. This will be the new standard.
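The difference between the old one-week ceiling and the new 24-hour one can be illustrated with a generic capped exponential backoff. This is a minimal sketch under stated assumptions. The one-minute base delay (the log above shows one-minute deferrals), the doubling, and the random jitter are illustrative choices. `backoff_delay` is a hypothetical helper, not the BOINC client's actual formula. Only the two caps come from this thread.

```python
import random
from typing import Optional

MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR
WEEK = 7 * DAY


def backoff_delay(failures: int, cap: float, base: float = MINUTE,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before the next scheduler request.

    Generic capped exponential backoff (an illustration, not BOINC's exact rule):
    the delay doubles from `base` with each consecutive failure, is randomly
    shortened by up to 50% so clients don't all retry at the same moment,
    and never exceeds `cap`.
    """
    if failures < 1:
        return 0.0
    if rng is None:
        rng = random.Random()
    exponent = min(failures - 1, 60)  # avoid building huge numbers for long outages
    delay = min(cap, base * 2 ** exponent)
    return delay * rng.uniform(0.5, 1.0)


rng = random.Random(42)  # seeded so the run is repeatable
for cap_name, cap in (("1 week", WEEK), ("24 hours", DAY)):
    delay = backoff_delay(20, cap, rng=rng)
    print(f"after 20 failures with a {cap_name} cap: wait {delay / HOUR:.1f} hours")
```

With either cap the client keeps retrying on its own. The cap only bounds how long a machine can sit without contacting the scheduler after a long outage, which is why a 24-hour value avoids the week-long coma.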
Can anyone tell me how to increase the size of the WU download cache?
I didn't see any option in the BOINC client (version 5.4.11 for Linux).
It's really annoying to see my PC left without work because of server problems.
By the way, that setting is 'global': if you have multiple BOINC projects (and you should seriously look into running several in any event, even more so these days if Einstein is your only BOINC project), the most recently modified value of this setting determines the size of your download cache for all projects.
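The "last modified wins" behaviour can be sketched as follows. This is a minimal sketch. It assumes, as the post describes, that each project holds a copy of the global preferences with a modification time and that the client applies the newest copy to every project. `ProjectPrefs` and `effective_cache_days` are hypothetical names, and the times and values are made up.

```python
from dataclasses import dataclass


@dataclass
class ProjectPrefs:
    """One project's copy of the global preferences (hypothetical structure)."""
    project: str
    mod_time: float            # when this copy was last edited (Unix time)
    connect_every_days: float  # the "Connect to" cache setting


def effective_cache_days(copies: list[ProjectPrefs]) -> float:
    """The cache setting the client applies to every project: the newest edit wins."""
    newest = max(copies, key=lambda p: p.mod_time)
    return newest.connect_every_days


copies = [
    ProjectPrefs("Einstein@Home", mod_time=1_168_000_000, connect_every_days=3.0),
    ProjectPrefs("Malaria Control", mod_time=1_168_500_000, connect_every_days=0.5),
]
print(effective_cache_days(copies))
```

In this example the later edit on the Malaria Control site sets the cache for Einstein@Home too. Re-editing the value on the Einstein@Home site would flip it back.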
The Connect setting IS global; however, you DO have a (limited) way around it.
Each venue (general/default, home, school, work) can have a different setting.
One of my projects is Malaria Control (MCN). It does a rather poor job of respecting the resource share allocation for itself and any other projects attached to a machine. So when it connects, it downloads too many WUs at once to run both E@H and MCN at the same time, which immediately forces EDF (earliest deadline first) mode. Luckily, the BOINC client enforces resource share via Long Term Debt (LTD) values, and over time E@H and MCN observe the resource share I specified. MCN crunches its short-deadline WUs and then sits idle until its LTD allows it to start up again.
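The debt bookkeeping mentioned above can be sketched roughly as follows. This is a simplified model, not the client's code. It assumes that each project's long-term debt grows by the CPU time its resource share entitles it to, minus the CPU time it actually received, and that debts are then shifted to average zero. `Project` and `update_ltd` are hypothetical names. The equal resource shares are also an assumption, since the post does not give the actual values.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    resource_share: float
    long_term_debt: float = 0.0  # seconds of CPU time the project is "owed"


def update_ltd(projects: list[Project], cpu_used: dict[str, float]) -> None:
    """Update long-term debts after an interval in which `cpu_used` seconds were spent."""
    total_share = sum(p.resource_share for p in projects)
    total_cpu = sum(cpu_used.get(p.name, 0.0) for p in projects)
    for p in projects:
        entitled = total_cpu * p.resource_share / total_share
        p.long_term_debt += entitled - cpu_used.get(p.name, 0.0)
    # Shift all debts so they average zero.
    mean = sum(p.long_term_debt for p in projects) / len(projects)
    for p in projects:
        p.long_term_debt -= mean


# Equal shares (an assumption); MCN ran alone for an hour in EDF mode.
projects = [Project("Einstein@Home", 100), Project("MCN", 100)]
update_ltd(projects, {"MCN": 3600.0})
for p in projects:
    print(p.name, p.long_term_debt)
```

MCN ends up with negative debt, meaning it has had more than its share. This is consistent with it sitting idle afterwards while E@H catches up.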
To avoid EDF mode whenever MCN downloads work, I had to specify a smaller cache for the system I run MCN on. I did this by assigning that machine (and will assign any future systems I attach to MCN) to a specific venue with a smaller "Connect to" setting than the rest of my systems.
That way, most of my systems have a reasonable size cache, and only systems I attached MCN to get the smaller cache.
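The venue trick amounts to a simple lookup. Each host is assigned a venue, and each venue carries its own "Connect to" value, which applies to every project on that host. This is a minimal sketch. The host names, venue assignments and numbers are hypothetical.

```python
# "Connect to" cache size in days for each venue (values are made-up examples).
VENUE_CONNECT_DAYS = {
    "general": 2.0,
    "home": 2.0,
    "school": 0.25,  # the smaller cache, used only for hosts attached to MCN
    "work": 2.0,
}

# Hypothetical hosts and the venue each one is assigned to.
HOST_VENUE = {
    "desktop": "home",
    "laptop": "home",
    "mcn-box": "school",
}


def cache_days(host: str) -> float:
    """Cache size a host uses for all its projects, decided by its venue."""
    venue = HOST_VENUE.get(host, "general")  # unassigned hosts use the general/default venue
    return VENUE_CONNECT_DAYS[venue]


for host in HOST_VENUE:
    print(host, cache_days(host))
```

Because the setting is still global within a venue, the small cache on `mcn-box` also applies to E@H on that host. That is the "limited" part of the workaround.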
:-) That's a very cute way around it :-) I never thought along those lines, LoL. Maybe wicked to some :-) I like it.
Regards
Masud.
Cheers,
Gary.
Increasing the cache can only be done in your General Preferences (in your Account Area) here on the website:
http://einstein.phys.uwm.edu/prefs.php?subset=global
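A simplified model of the work request shows why a larger "Connect" value gives a bigger cache: the client asks for enough work to fill the desired buffer, minus whatever is already queued. This is a sketch under stated assumptions. The real client also weighs resource shares, the fraction of time the computer is on, and other factors. `work_request_seconds` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def work_request_seconds(connect_every_days: float,
                         queued_seconds: list[float],
                         ncpus: int = 1) -> float:
    """Seconds of new work to request: desired buffer minus work already queued.

    Simplified model only; it ignores resource shares, on-fraction and the
    other factors a real BOINC client takes into account.
    """
    desired = connect_every_days * SECONDS_PER_DAY * ncpus
    return max(0.0, desired - sum(queued_seconds))


# Half a day of cache on a single CPU, with two tasks already queued.
print(work_request_seconds(0.5, [40_000.0, 2_000.0]))
```

Raising the setting raises `desired`, so the client keeps more work on hand to ride out server outages.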
Oh yeah, I figured it out. Thanks a lot!