Error condition after CPU Benchmark (S4 l1)

Kilcock
Joined: 1 Jun 05
Posts: 41
Credit: 2604
RAC: 0
Topic 189455

Before the benchmark I had these models:
#1 > Running
#2 > Preempted
#3 > Suspended

After benchmark:
#1 > Running (81.82 % and going up)
#2 > Running (02.77 % and going up)
#3 > Suspended

Using version 4.79 on Windows 2003 Server
Platform Intel PIII 1000MHz FPGA (single Processor)

NB1) Task Manager indeed shows three einstein_4.79... processes, using 50 %, 48 % and 0 % CPU respectively.
NB2) After an hour, the total time to run a model is still 17 hours (about the same as when only one process is running!).
NB3) For the sake of completeness, I should add that this computer participates in a mini-experiment as computer #2 (http://einsteinathome.org/node/189415)

Please advise, as to me it seems to be an undesired condition.

Kilcock
Joined: 1 Jun 05
Posts: 41
Credit: 2604
RAC: 0

Error condition after CPU Benchmark (S4 l1)

Before the benchmark I had these models:
#1 > Running
#2 > Preempted

After benchmark:
#1 > Running (47.51 % and going up)
#2 > Running (05.07 % and going up)

After suspending both models, it was possible to get back to the situation from before the benchmark.

Using version 4.79 on Windows 98
Platform Intel PIII 800MHz FPGA (single Processor)

NB1) After two hours, the total time to run the models is 24 and 42 hours respectively.
NB2) For the sake of completeness, I should add that this computer participates in a mini-experiment as computer #1 (http://einsteinathome.org/node/189415)

Please advise, as to me it seems to be an undesired condition.
____________
Eric.ie

The log is below (see the two 7:31:41 PM lines right after the CPU benchmarks):

6/28/05 2:34:35 PM||Starting BOINC client version 4.45 for windows_intelx86
6/28/05 2:34:35 PM||Data directory: C:\PROGRAM FILES\BOINC
6/28/05 2:34:35 PM|Einstein@Home|Computer ID: 265720; location: home; project prefs: default
6/28/05 2:34:35 PM||General prefs: from unknown project http://climateprediction.net/ (last modified 2005-06-12 02:21:32)
6/28/05 2:34:35 PM||General prefs: using your defaults
6/28/05 2:34:35 PM||Remote control not allowed; using loopback address
6/28/05 2:34:35 PM|Einstein@Home|Resuming computation for result l1_0778.0__0778.1_0.1_T00_S4lA_2 using einstein version 4.79
6/28/05 2:34:35 PM|Einstein@Home|Deferring computation for result l1_0778.0__0778.2_0.1_T00_S4lA_0
6/28/05 2:34:35 PM||Suspending network activity - user request
6/28/05 2:34:35 PM||Suspending work fetch because computer is overcommitted.
6/28/05 2:34:35 PM||Using earliest-deadline-first scheduling because computer is overcommitted.
6/28/05 2:34:35 PM|Einstein@Home|Pausing result l1_0778.0__0778.1_0.1_T00_S4lA_2 (left in memory)
6/28/05 2:35:05 PM||request_reschedule_cpus: result op
6/28/05 2:35:05 PM|Einstein@Home|Restarting result l1_0778.0__0778.2_0.1_T00_S4lA_0 using einstein version 4.79
6/28/05 5:35:05 PM||Allowing work fetch again.
6/28/05 5:35:05 PM||Resuming round-robin CPU scheduling.
6/28/05 8:04:52 PM||request_reschedule_cpus: result op
6/28/05 8:04:52 PM|Einstein@Home|Pausing result l1_0778.0__0778.2_0.1_T00_S4lA_0 (left in memory)
6/28/05 8:04:55 PM||Resuming network activity
6/28/05 8:04:56 PM|Einstein@Home|Deferring communication with project for 1 days, 3 hours, 45 minutes, and 15 seconds
6/28/05 8:04:56 PM||Insufficient work; requesting more
6/28/05 8:05:12 PM||request_reschedule_cpus: project op
6/28/05 8:05:13 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/28/05 8:05:13 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/28/05 8:05:21 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/28/05 8:05:21 PM|Einstein@Home|Message from server: No work sent
6/28/05 8:05:21 PM|Einstein@Home|Message from server: (won't finish in time) Computer on 26.3% of time, BOINC on 100.0% of that, this project gets 100.0% of that
6/28/05 8:05:21 PM|Einstein@Home|No work from project
6/28/05 8:05:22 PM|Einstein@Home|Deferring communication with project for 23 hours, 57 minutes, and 59 seconds
6/28/05 8:06:01 PM||request_reschedule_cpus: result op
6/28/05 8:06:01 PM|Einstein@Home|Resuming result l1_0778.0__0778.2_0.1_T00_S4lA_0 using einstein version 4.79
6/28/05 8:06:08 PM||Suspending network activity - user request
6/29/05 11:39:07 AM||request_reschedule_cpus: process exited
6/29/05 11:39:07 AM|Einstein@Home|Computation for result l1_0778.0__0778.2_0.1_T00_S4lA_0 finished
6/29/05 1:04:22 PM||Resuming network activity
6/29/05 1:04:23 PM|Einstein@Home|Deferring communication with project for 6 hours, 58 minutes, and 57 seconds
6/29/05 1:04:23 PM||Insufficient work; requesting more
6/29/05 1:04:27 PM|Einstein@Home|Started upload of l1_0778.0__0778.2_0.1_T00_S4lA_0_0
6/29/05 1:05:04 PM|Einstein@Home|Finished upload of l1_0778.0__0778.2_0.1_T00_S4lA_0_0
6/29/05 1:05:04 PM|Einstein@Home|Throughput 1328 bytes/sec
6/29/05 1:05:57 PM||request_reschedule_cpus: project op
6/29/05 1:05:58 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 1:05:58 PM|Einstein@Home|Requesting 0 seconds of work, returning 1 results
6/29/05 1:06:14 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 1:06:33 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 1:06:33 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/29/05 1:06:47 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 1:06:47 PM|Einstein@Home|Message from server: Not sending work - last RPC too recent: 34 sec
6/29/05 1:06:47 PM|Einstein@Home|No work from project
6/29/05 1:06:48 PM|Einstein@Home|Deferring communication with project for 59 seconds
6/29/05 1:07:48 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 1:07:48 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/29/05 1:08:11 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 1:08:12 PM||request_reschedule_cpus: files downloaded
6/29/05 1:08:12 PM|Einstein@Home|Starting result l1_0778.0__0778.0_0.1_T01_S4lA_0 using einstein version 4.79
6/29/05 1:08:15 PM||Suspending work fetch because computer is overcommitted.
6/29/05 1:08:15 PM||Using earliest-deadline-first scheduling because computer is overcommitted.
6/29/05 1:27:05 PM||Suspending network activity - user request
6/29/05 4:08:12 PM||Allowing work fetch again.
6/29/05 4:08:12 PM||Resuming round-robin CPU scheduling.
6/29/05 6:37:19 PM||Resuming network activity
6/29/05 6:38:14 PM||request_reschedule_cpus: project op
6/29/05 6:38:14 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 6:38:14 PM|Einstein@Home|Requesting 0 seconds of work, returning 0 results
6/29/05 6:38:21 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 6:39:52 PM|Einstein@Home|Unrecoverable error for result l1_0778.0__0778.1_0.1_T00_S4lA_2 (aborted via GUI RPC)
6/29/05 6:39:52 PM||request_reschedule_cpus: result op
6/29/05 6:39:53 PM||request_reschedule_cpus: process exited
6/29/05 6:39:53 PM|Einstein@Home|Deferring communication with project for 59 seconds
6/29/05 6:39:53 PM|Einstein@Home|Computation for result l1_0778.0__0778.1_0.1_T00_S4lA_2 finished
6/29/05 6:40:53 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 6:40:53 PM|Einstein@Home|Requesting 0 seconds of work, returning 1 results
6/29/05 6:40:59 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 6:42:06 PM||request_reschedule_cpus: result op
6/29/05 6:42:06 PM|Einstein@Home|Pausing result l1_0778.0__0778.0_0.1_T01_S4lA_0 (left in memory)
6/29/05 6:42:07 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 6:42:07 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/29/05 6:42:09 PM||request_reschedule_cpus: project op
6/29/05 6:42:19 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 6:42:20 PM||request_reschedule_cpus: files downloaded
6/29/05 6:42:20 PM|Einstein@Home|Starting result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/29/05 6:42:24 PM||Suspending work fetch because computer is overcommitted.
6/29/05 6:42:24 PM||Using earliest-deadline-first scheduling because computer is overcommitted.
6/29/05 6:42:41 PM||request_reschedule_cpus: result op
6/29/05 6:42:41 PM||Resuming round-robin CPU scheduling.
6/29/05 6:42:41 PM|Einstein@Home|Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (left in memory)
6/29/05 6:42:42 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 6:42:42 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/29/05 6:42:44 PM||request_reschedule_cpus: result op
6/29/05 6:42:44 PM||Using earliest-deadline-first scheduling because computer is overcommitted.
6/29/05 6:42:44 PM|Einstein@Home|Resuming result l1_0778.0__0778.0_0.1_T01_S4lA_0 using einstein version 4.79
6/29/05 6:42:47 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 6:42:47 PM|Einstein@Home|Message from server: Not sending work - last RPC too recent: 35 sec
6/29/05 6:42:47 PM|Einstein@Home|No work from project
6/29/05 6:42:48 PM||Suspending network activity - user request
6/29/05 6:44:08 PM||request_reschedule_cpus: result op
6/29/05 6:44:08 PM|Einstein@Home|Pausing result l1_0778.0__0778.0_0.1_T01_S4lA_0 (left in memory)
6/29/05 6:44:13 PM||request_reschedule_cpus: project op
6/29/05 6:44:35 PM||Resuming network activity
6/29/05 6:44:36 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 6:44:36 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/29/05 6:44:43 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 6:44:43 PM|Einstein@Home|Message from server: No work sent
6/29/05 6:44:43 PM|Einstein@Home|Message from server: (won't finish in time) Computer on 28.6% of time, BOINC on 100.0% of that, this project gets 100.0% of that
6/29/05 6:44:43 PM|Einstein@Home|No work from project
6/29/05 6:44:44 PM|Einstein@Home|Deferring communication with project for 1 days, 3 hours, 31 minutes, and 11 seconds
6/29/05 6:45:30 PM||Suspending network activity - user request
6/29/05 6:45:41 PM||request_reschedule_cpus: result op
6/29/05 6:45:41 PM|Einstein@Home|Resuming result l1_0778.0__0778.0_0.1_T01_S4lA_0 using einstein version 4.79
6/29/05 8:01:06 PM||request_reschedule_cpus: result op
6/29/05 8:01:06 PM||Resuming round-robin CPU scheduling.
6/29/05 9:39:52 PM||request_reschedule_cpus: result op
6/29/05 9:39:52 PM||Allowing work fetch again.
6/29/05 9:39:56 PM||request_reschedule_cpus: result op
6/29/05 9:39:56 PM|Einstein@Home|Pausing result l1_0778.0__0778.0_0.1_T01_S4lA_0 (left in memory)
6/29/05 9:40:00 PM||Resuming network activity
6/29/05 9:40:00 PM|Einstein@Home|Deferring communication with project for 1 days, 0 hours, 35 minutes, and 55 seconds
6/29/05 9:40:00 PM||Insufficient work; requesting more
6/29/05 9:40:04 PM||request_reschedule_cpus: project op
6/29/05 9:40:04 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 9:40:04 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/29/05 9:40:11 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 9:40:11 PM|Einstein@Home|Message from server: No work sent
6/29/05 9:40:11 PM|Einstein@Home|Message from server: (won't finish in time) Computer on 28.9% of time, BOINC on 100.0% of that, this project gets 100.0% of that
6/29/05 9:40:11 PM|Einstein@Home|No work from project
6/29/05 9:40:12 PM|Einstein@Home|Deferring communication with project for 1 days, 1 hours, 14 minutes, and 52 seconds
6/29/05 9:40:42 PM||request_reschedule_cpus: project op
6/29/05 9:40:42 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 9:40:42 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/29/05 9:40:49 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 9:40:49 PM|Einstein@Home|Message from server: Not sending work - last RPC too recent: 37 sec
6/29/05 9:40:49 PM|Einstein@Home|No work from project
6/29/05 9:40:50 PM|Einstein@Home|Deferring communication with project for 59 seconds
6/29/05 9:41:49 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/29/05 9:41:49 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/29/05 9:41:56 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/29/05 9:41:56 PM|Einstein@Home|Message from server: No work sent
6/29/05 9:41:56 PM|Einstein@Home|Message from server: (won't finish in time) Computer on 28.9% of time, BOINC on 100.0% of that, this project gets 100.0% of that
6/29/05 9:41:56 PM|Einstein@Home|No work from project
6/29/05 9:41:57 PM|Einstein@Home|Deferring communication with project for 1 days, 1 hours, 14 minutes, and 43 seconds
6/29/05 9:42:22 PM||Suspending network activity - user request
6/29/05 9:42:31 PM||request_reschedule_cpus: result op
6/29/05 9:42:31 PM|Einstein@Home|Resuming result l1_0778.0__0778.0_0.1_T01_S4lA_0 using einstein version 4.79
6/29/05 9:42:34 PM||request_reschedule_cpus: result op
6/30/05 12:07:17 AM||request_reschedule_cpus: result op
6/30/05 12:07:33 AM||request_reschedule_cpus: result op
6/30/05 12:07:33 AM|Einstein@Home|Pausing result l1_0778.0__0778.0_0.1_T01_S4lA_0 (left in memory)
6/30/05 12:07:37 AM||Resuming network activity
6/30/05 12:07:38 AM|Einstein@Home|Deferring communication with project for 22 hours, 49 minutes, and 2 seconds
6/30/05 12:07:38 AM||Insufficient work; requesting more
6/30/05 12:07:42 AM||request_reschedule_cpus: project op
6/30/05 12:07:43 AM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/30/05 12:07:43 AM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/30/05 12:07:49 AM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/30/05 12:07:49 AM|Einstein@Home|Message from server: No work sent
6/30/05 12:07:49 AM|Einstein@Home|Message from server: (won't finish in time) Computer on 29.1% of time, BOINC on 100.0% of that, this project gets 100.0% of that
6/30/05 12:07:49 AM|Einstein@Home|No work from project
6/30/05 12:07:50 AM|Einstein@Home|Deferring communication with project for 23 hours, 22 minutes, and 46 seconds
6/30/05 12:08:22 AM||Suspending network activity - user request
6/30/05 12:08:29 AM||request_reschedule_cpus: result op
6/30/05 12:08:29 AM|Einstein@Home|Resuming result l1_0778.0__0778.0_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 12:08:33 AM||request_reschedule_cpus: result op
6/30/05 7:18:31 AM||request_reschedule_cpus: result op
6/30/05 7:18:33 AM||request_reschedule_cpus: result op
6/30/05 7:18:33 AM|Einstein@Home|Pausing result l1_0778.0__0778.0_0.1_T01_S4lA_0 (left in memory)
6/30/05 7:18:38 AM||Resuming network activity
6/30/05 7:18:39 AM|Einstein@Home|Deferring communication with project for 16 hours, 11 minutes, and 57 seconds
6/30/05 7:18:39 AM||Insufficient work; requesting more
6/30/05 7:18:44 AM||request_reschedule_cpus: project op
6/30/05 7:18:45 AM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/30/05 7:18:45 AM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/30/05 7:18:59 AM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/30/05 7:19:00 AM||Suspending work fetch because computer is overcommitted.
6/30/05 7:19:00 AM||Using earliest-deadline-first scheduling because computer is overcommitted.
6/30/05 7:19:00 AM||request_reschedule_cpus: files downloaded
6/30/05 7:19:00 AM|Einstein@Home|Starting result l1_0778.0__0778.1_0.1_T02_S4lA_0 using einstein version 4.79
6/30/05 7:19:27 AM||request_reschedule_cpus: result op
6/30/05 7:19:27 AM|Einstein@Home|Pausing result l1_0778.0__0778.1_0.1_T02_S4lA_0 (left in memory)
6/30/05 7:19:28 AM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/30/05 7:19:28 AM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/30/05 7:19:30 AM||request_reschedule_cpus: result op
6/30/05 7:19:30 AM|Einstein@Home|Resuming result l1_0778.0__0778.0_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 7:19:33 AM||request_reschedule_cpus: result op
6/30/05 7:19:35 AM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/30/05 7:19:35 AM|Einstein@Home|Message from server: Not sending work - last RPC too recent: 44 sec
6/30/05 7:19:35 AM|Einstein@Home|No work from project
6/30/05 7:19:36 AM|Einstein@Home|Deferring communication with project for 59 seconds
6/30/05 7:19:44 AM||Suspending network activity - user request
6/30/05 10:20:21 AM||request_reschedule_cpus: process exited
6/30/05 10:20:21 AM|Einstein@Home|Computation for result l1_0778.0__0778.0_0.1_T01_S4lA_0 finished
6/30/05 10:20:21 AM|Einstein@Home|Resuming result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 2:48:44 PM||Resuming network activity
6/30/05 2:48:46 PM|Einstein@Home|Started upload of l1_0778.0__0778.0_0.1_T01_S4lA_0_0
6/30/05 2:49:03 PM|Einstein@Home|Finished upload of l1_0778.0__0778.0_0.1_T01_S4lA_0_0
6/30/05 2:49:03 PM|Einstein@Home|Throughput 2325 bytes/sec
6/30/05 2:49:05 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/30/05 2:49:05 PM|Einstein@Home|Requesting 1 seconds of work, returning 1 results
6/30/05 2:49:11 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/30/05 2:49:11 PM|Einstein@Home|Message from server: No work sent
6/30/05 2:49:11 PM|Einstein@Home|Message from server: (won't finish in time) Computer on 30.6% of time, BOINC on 100.0% of that, this project gets 100.0% of that
6/30/05 2:49:12 PM|Einstein@Home|Deferring communication with project for 1 days, 1 hours, 53 minutes, and 42 seconds
6/30/05 2:51:34 PM||request_reschedule_cpus: result op
6/30/05 2:51:34 PM|Einstein@Home|Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (left in memory)
6/30/05 2:51:34 PM||Insufficient work; requesting more
6/30/05 3:04:31 PM||request_reschedule_cpus: result op
6/30/05 3:04:31 PM|Einstein@Home|Resuming result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 3:04:37 PM||Suspending network activity - user request
6/30/05 3:04:42 PM||request_reschedule_cpus: result op
6/30/05 6:04:42 PM||Allowing work fetch again.
6/30/05 6:04:42 PM||Resuming round-robin CPU scheduling.
6/30/05 7:30:41 PM||Suspending computation and network activity - running CPU benchmarks
6/30/05 7:30:41 PM|Einstein@Home|Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (removed from memory)
6/30/05 7:30:43 PM||request_reschedule_cpus: process exited
6/30/05 7:30:43 PM||Running CPU benchmarks
6/30/05 7:31:41 PM||Benchmark results:
6/30/05 7:31:41 PM|| Number of CPUs: 1
6/30/05 7:31:41 PM|| 708 double precision MIPS (Whetstone) per CPU
6/30/05 7:31:41 PM|| 1237 integer MIPS (Dhrystone) per CPU
6/30/05 7:31:41 PM||Finished CPU benchmarks
6/30/05 7:31:41 PM||Resuming computation and network activity
6/30/05 7:31:41 PM||request_reschedule_cpus: Resuming activities
6/30/05 7:31:41 PM|Einstein@Home|Restarting result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 7:31:41 PM|Einstein@Home|Resuming result l1_0778.0__0778.1_0.1_T02_S4lA_0 using einstein version 4.79

6/30/05 8:12:14 PM||Resuming network activity
6/30/05 8:12:14 PM||Suspending work fetch because computer is overcommitted.
6/30/05 8:12:14 PM||Using earliest-deadline-first scheduling because computer is overcommitted.
6/30/05 8:12:14 PM|Einstein@Home|Deferring communication with project for 20 hours, 30 minutes, and 40 seconds
6/30/05 9:12:15 PM|Einstein@Home|Deferring communication with project for 19 hours, 30 minutes, and 39 seconds
6/30/05 9:40:40 PM||request_reschedule_cpus: result op
6/30/05 9:40:40 PM|Einstein@Home|Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (left in memory)
6/30/05 9:40:42 PM||request_reschedule_cpus: result op
6/30/05 9:40:42 PM|Einstein@Home|Pausing result l1_0778.0__0778.1_0.1_T02_S4lA_0 (left in memory)
6/30/05 9:40:43 PM||Insufficient work; requesting more
6/30/05 9:42:13 PM||request_reschedule_cpus: project op
6/30/05 9:42:13 PM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
6/30/05 9:42:13 PM|Einstein@Home|Requesting 1 seconds of work, returning 0 results
6/30/05 9:42:19 PM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
6/30/05 9:42:19 PM|Einstein@Home|General preferences have been updated
6/30/05 9:42:19 PM|Einstein@Home|Message from server: No work sent
6/30/05 9:42:19 PM|Einstein@Home|Message from server: (won't finish in time) Computer on 31.2% of time, BOINC on 100.0% of that, this project gets 100.0% of that
6/30/05 9:42:19 PM||General prefs: from Einstein@Home (last modified 2005-06-30 19:23:00)
6/30/05 9:42:19 PM||General prefs: using your defaults
6/30/05 9:42:19 PM|Einstein@Home|No work from project
6/30/05 9:42:21 PM|Einstein@Home|Deferring communication with project for 1 days, 10 hours, 3 minutes, and 46 seconds
6/30/05 9:43:42 PM||request_reschedule_cpus: result op
6/30/05 9:43:42 PM|Einstein@Home|Resuming result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 9:44:26 PM||request_reschedule_cpus: result op

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

RE: Before benchmark I had

Message 13661 in response to message 13660

Quote:

Before the benchmark I had these models:
#1 > Running
#2 > Preempted

After benchmark:
#1 > Running (47.51 % and going up)
#2 > Running (05.07 % and going up)

After suspending both models, it was possible to get back to the situation from before the benchmark.

Using version 4.79 on Windows 98
Platform Intel PIII 800MHz FPGA (single Processor)

NB1) After two hours, the total time to run the models is 24 and 42 hours respectively.
NB2) For the sake of completeness, I should add that this computer participates in a mini-experiment as computer #1 (http://einsteinathome.org/node/189415)

Please advise, as to me it seems to be an undesired condition.
____________
Eric.ie

Stop BOINC, wait a minute, then start BOINC again. The wait will ensure any science apps end.

Kilcock
Joined: 1 Jun 05
Posts: 41
Credit: 2604
RAC: 0

RE: Stop BOINC, wait a

Message 13662 in response to message 13661

Quote:
Stop BOINC, wait a minute, then start BOINC again. The wait will ensure any science apps end.

Done so, for W'95, but I don't see any difference!
Later, on the W2k3 Server, which I didn't restart, an 'Allowing work fetch again' paused the second work thread again and returned it to preempted at 9.10%.

I assume there is a flaw in the new config file, or something like that!

6/30/05 7:31:41 PM|Einstein@Home| Restarting result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 7:31:41 PM|Einstein@Home| Resuming result l1_0778.0__0778.1_0.1_T02_S4lA_0 using einstein version 4.79

Those seem to be generated by different procedures !

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

RE: RE: Stop BOINC, wait

Message 13663 in response to message 13662

Quote:
Quote:
Stop BOINC, wait a minute, then start BOINC again. The wait will ensure any science apps end.

Done so, for W'95, but I don't see any difference!
Later, on the W2k3 Server, which I didn't restart, an 'Allowing work fetch again' paused the second work thread again and returned it to preempted at 9.10%.

I assume there is a flaw in the new config file, or something like that!

6/30/05 7:31:41 PM|Einstein@Home| Restarting result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 7:31:41 PM|Einstein@Home| Resuming result l1_0778.0__0778.1_0.1_T02_S4lA_0 using einstein version 4.79

Those seem to be generated by different procedures !

Yes, one part handles deciding when additional work is needed or there's too much, another part requests it, and yet another decides which WU to run.

Grab Process Explorer from SysInternals; it's good for both Win98 and Win3k. Run it and check how many of the science apps are actually running.

I've seen a couple of instances where BOINC gets the status wrong for WUs, particularly after benchmarks ran. The culprit seems to be that BOINC tries to stop the science apps (even removing them from memory) before running the benchmarks, then starts a WU afterwards. The problem occurs when one or all of the science apps keep running and BOINC starts a different WU for the same project. I think it somehow loses track of the WU that didn't stop properly. Actually, the benchmarks are supposed to be bypassed when that happens.

Since you say restarting BOINC didn't fix the WU status, you need to verify that only one science app is actually running, that is, actively working on the WU and consuming CPU cycles. On single-processor systems like yours, only one should show near 99% CPU busy. Any others should be idle.

That's probably the case with your Win98 system, but you could still have multiple apps running on your Win3k system. If so, stop BOINC, wait until all the science apps stop (watch them with the Task Manager or Process Explorer), then restart BOINC.
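
As a rough illustration of the check described above, here is a minimal Python sketch. It assumes you copy the CPU percentages by hand from Task Manager or Process Explorer; the process labels and the 5 % cut-off are made up for the example, and the figures are the ones reported at the top of this thread.

```python
# Hypothetical snapshot of per-process CPU usage, typed in by hand from
# Task Manager or Process Explorer. The figures are the ones reported
# earlier in this thread (50 %, 48 % and 0 %); the labels are invented.
samples = [
    ("einstein_4.79 (first)", 50.0),
    ("einstein_4.79 (second)", 48.0),
    ("einstein_4.79 (third)", 0.0),
]


def active_apps(samples, threshold=5.0):
    """Return the names of processes using at least `threshold` percent CPU.

    The 5 % cut-off is an arbitrary assumption used to ignore idle processes.
    """
    return [name for name, cpu in samples if cpu >= threshold]


busy = active_apps(samples)
print(f"{len(busy)} science app(s) actively computing: {busy}")
if len(busy) > 1:
    print("More than one app is crunching on a single-CPU machine.")
```

On a healthy single-processor host the list should contain exactly one entry.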

I don't think there's much you can do about the status shown until the WUs finish. One problem seems to be that more than one got started for a project, which shouldn't be possible.

Walt

Kilcock
Joined: 1 Jun 05
Posts: 41
Credit: 2604
RAC: 0

RE: RE: 6/30/05 7:31:41

Message 13664 in response to message 13663

Quote:
Quote:

6/30/05 7:31:41 PM|Einstein@Home| Restarting result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 7:31:41 PM|Einstein@Home| Resuming result l1_0778.0__0778.1_0.1_T02_S4lA_0 using einstein version 4.79

Those seem to be generated by different procedures !

Yes, one part handles deciding when additional work is needed or there's too much, another part requests it, and yet another decides which WU to run.

Grab Process Explorer from SysInternals; it's good for both Win98 and Win3k. Run it and check how many of the science apps are actually running.

I've seen a couple of instances where BOINC gets the status wrong for WUs, particularly after benchmarks ran. The culprit seems to be that BOINC tries to stop the science apps (even removing them from memory) before running the benchmarks, then starts a WU afterwards. The problem occurs when one or all of the science apps keep running and BOINC starts a different WU for the same project. I think it somehow loses track of the WU that didn't stop properly. Actually, the benchmarks are supposed to be bypassed when that happens.


Suggestion:
from log below:
6/30/05 7:30:41 PM|Einstein@Home|Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (removed from memory)
6/30/05 7:30:43 PM||request_reschedule_cpus: process exited
6/30/05 7:30:43 PM||Running CPU benchmarks

Note that the log reports that WU #1 is stopped and removed from memory!
Then there are 2 seconds before the CPU reschedule reports 'process exited', which might give a sea of time for the preempted WU #2 to be flagged as running.

After the benchmark, WU #1 is restarted and WU #2 goes ahead and resumes.

That seems to me the most logical explanation.
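
The 2-second gap mentioned above can be read straight off the timestamps. The following is only a sketch for measuring the time between consecutive log lines; it assumes the pipe-separated layout shown in the log and an English-style AM/PM locale, and the function names are invented for the example.

```python
from datetime import datetime

# Timestamp layout as it appears in the log, e.g. "6/30/05 7:30:41 PM".
# %p relies on an English-style AM/PM locale, which is an assumption here.
STAMP_FORMAT = "%m/%d/%y %I:%M:%S %p"


def parse_stamp(line):
    """Parse the timestamp that precedes the first '|' of a log line."""
    return datetime.strptime(line.split("|", 1)[0].strip(), STAMP_FORMAT)


def gaps_in_seconds(lines):
    """Return the time difference between each pair of consecutive lines."""
    stamps = [parse_stamp(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


log_excerpt = [
    "6/30/05 7:30:41 PM|Einstein@Home|Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (removed from memory)",
    "6/30/05 7:30:43 PM||request_reschedule_cpus: process exited",
    "6/30/05 7:30:43 PM||Running CPU benchmarks",
]

print(gaps_in_seconds(log_excerpt))
```

The log only has one-second resolution, so this shows where a window exists, not what happened inside it.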

Quote:
Since you say restarting BOINC didn't** fix the WU status, you need to verify that only one science app is actually running, that is, actively working on the WU and consuming CPU cycles. On single-processor systems like yours, only one should show near 99% CPU busy. Any others should be idle.

**Wrong! Sorry for not reporting that clearly!
With a resume operation on both WU threads I had already left the error condition, and restarting did not change anything after that.

Quote:

That's probably the case with your Win98 system, but you could still have multiple apps running on your Win3k system. **NO** If so, stop BOINC, wait until all the science apps stop (watch them with the Task Manager or Process Explorer), then restart BOINC.

I don't think there's much you can do about the status shown until the WUs finish. One problem seems to be that more than one got started for a project, which shouldn't be possible.

Walt


Both WU #1s got status Over/Succeeded/Done, and one got credit straight away, but I doubt that the WU #2s will be OK.

Walt, thanks for taking up the issue, and please have a look at my suggestion, assuming you are an E@H admin and code literate.

Eric.ie

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

RE: RE: Since you say

Message 13665 in response to message 13664

Quote:
Quote:
Since you say restarting BOINC didn't** fix the WU status, you need to verify that only one science app is actually running, that is, actively working on the WU and consuming CPU cycles. On single-processor systems like yours, only one should show near 99% CPU busy. Any others should be idle.
**Wrong! Sorry for not reporting that clearly!
With a resume operation on both WU threads I had already left the error condition, and restarting did not change anything after that.

Perhaps we're both misreading the messages. And I see an additional "problem" I didn't see before, there are two E@H apps running. More on that below.

You said (quoting my reply):

Quote:
Quote:
Stop BOINC, wait a minute, then start BOINC again. The wait will ensure any science apps end.

Done so, for W'95, but I don't see any difference!

Which I take as it didn't work. That is, you originally reported having two science apps running at the same time, each one taking some portion of the CPU. And restarting BOINC still had two science apps running, each one still taking some portion of the CPU.

By "fix" I meant that after the restart, BOINC will show one process running, the others either "ready to run" or "preempted" if they had previously been started. Process Explorer would show only one science app consuming CPU resources, up around 90-99%. Is this the case? Or did BOINC still start, and run, two science apps?

Details on that, which maybe I didn't make clear.

The messages from the log do show one result "restarting"; that's OK. And they show one result "resuming". That's not OK. With only one processor, BOINC should only let one "result" run. The other one was preempted before the benchmark ran, and it should have stayed preempted after the benchmark finished.

That is, the sequence is (from the log):

Quote:


6/30/05 7:19:00 AM Einstein@Home Starting result l1_0778.0__0778.1_0.1_T02_S4lA_0 using einstein version 4.79
6/30/05 7:19:27 AM Einstein@Home Pausing result l1_0778.0__0778.1_0.1_T02_S4lA_0 (left in memory)

6/30/05 10:20:21 AM Einstein@Home Resuming result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 2:51:34 PM Einstein@Home Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (left in memory)
6/30/05 3:04:31 PM Einstein@Home Resuming result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79

6/30/05 7:30:41 PM Suspending computation and network activity - running CPU benchmarks
6/30/05 7:30:41 PM Einstein@Home Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (removed from memory)
6/30/05 7:30:43 PM request_reschedule_cpus: process exited
6/30/05 7:30:43 PM Running CPU benchmarks
6/30/05 7:31:41 PM Benchmark results:
6/30/05 7:31:41 PM Number of CPUs: 1
6/30/05 7:31:41 PM 708 double precision MIPS (Whetstone) per CPU
6/30/05 7:31:41 PM 1237 integer MIPS (Dhrystone) per CPU
6/30/05 7:31:41 PM Finished CPU benchmarks
6/30/05 7:31:41 PM Resuming computation and network activity
6/30/05 7:31:41 PM request_reschedule_cpus: Resuming activities
6/30/05 7:31:41 PM Einstein@Home Restarting result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 7:31:41 PM Einstein@Home Resuming result l1_0778.0__0778.1_0.1_T02_S4lA_0 using einstein version 4.79

I put some spaces in to make it easier to see the two WU's and show where I removed messages.

So BOINC should have either restarted ...778.2... (the one running before the benchmark) or resumed ...0778.1... (the one paused at 7:19:27) but not both.
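For anyone who wants to check their own log for the same pattern, here is a rough Python sketch. It assumes only the message wording seen in the quote above ("Starting", "Restarting", "Resuming", "Pausing result ..."); the function name `find_overcommit` is made up for this example and is not part of BOINC. The sample lines are copied from the quoted log.

```python
import re

# Matches the state-change messages seen in the quoted log.
# "Resuming computation ..." is ignored because it lacks the word "result".
EVENT = re.compile(r"(Starting|Restarting|Resuming|Pausing) result (\S+)")

def find_overcommit(lines, ncpus=1):
    """Return (line, running) pairs where more results run than there are CPUs."""
    running = set()
    problems = []
    for line in lines:
        m = EVENT.search(line)
        if not m:
            continue
        action, result = m.groups()
        if action == "Pausing":
            running.discard(result)
        else:
            running.add(result)
            if len(running) > ncpus:
                problems.append((line.strip(), sorted(running)))
    return problems

log = """\
6/30/05 7:30:41 PM Einstein@Home|Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (removed from memory)
6/30/05 7:31:41 PM Resuming computation and network activity
6/30/05 7:31:41 PM Einstein@Home Restarting result l1_0778.0__0778.2_0.1_T01_S4lA_0 using einstein version 4.79
6/30/05 7:31:41 PM Einstein@Home Resuming result l1_0778.0__0778.1_0.1_T02_S4lA_0 using einstein version 4.79
""".splitlines()

problems = find_overcommit(log)
for line, running in problems:
    print("More than one result running after:", line)
    print("  running:", running)
```

On these four lines the check flags the last message, the "Resuming" of ...0778.1..., because ...0778.2... was restarted in the same second and never paused.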

You continued with (again, quoting my reply):

Quote:
Quote:

That's probably the case with your Win98 system, but you could still have multiple apps running on your Win3k system. **NO** If so, stop BOINC, wait until all the science apps stop (watch them with the task manager or Process Explorer), then restart BOINC.

I don't think there's much you can do about the status shown until the WU's finish. One problem seems to be that more than one got started for a project, which shouldn't be possible.

Walt


Both wu #1 got status Over/Succeeded/Done, one got credit straight away, but I doubt that the wu #2's will be OK.

Both WU's should complete OK and get credit. WU#2 only had two results returned so give it time.

Seems I misunderstood your setup. Apologies for that; I try to figure out all the details before replying, and this time I didn't. I made an unwarranted assumption. Actually, two of them.

Thought you were having the same problem with your Win3k system. Apparently not. EDIT: That should read, Thought you were having problems with BOINC on your Win2k system also.

And that you had three projects "attached". Again apparently not. But it does look like you have Einstein@Home attached twice. Is this so?

If not, there's a problem in that BOINC is running two results for one project.

If so, and you see Einstein@Home listed twice in the "projects" tab, that might be the cause of the problems. Internally to BOINC, each project has its own data area, kept separate from the other projects. Externally, that's not the case: the account and project files are all named after the project site, pulled out of the URL for the home page. For Einstein@Home that's einstein.phys.uwm.edu. So the account file is account_einstein.phys.uwm.edu.xml and the project files are kept in projects/einstein.phys.uwm.edu.
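As an illustration of that naming, here is a small Python sketch. It assumes the names are built from just the host part of the home page URL, as in the Einstein@Home example; the helper `project_files` is hypothetical, and BOINC may treat URLs with a path component differently.

```python
from urllib.parse import urlparse

def project_files(master_url):
    """Guess the account file and project directory from the home page URL."""
    host = urlparse(master_url).hostname  # e.g. "einstein.phys.uwm.edu"
    return f"account_{host}.xml", f"projects/{host}"

first = project_files("http://einstein.phys.uwm.edu/")
second = project_files("http://einstein.phys.uwm.edu/")
print(first)
print("same files for both attachments:", first == second)
```

The point of the comparison is that two attachments to the same site would map to the same account file and project directory.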

From the beginning of your message:

Quote:


Suggestion:
from log below:
6/30/05 7:30:41 PM|Einstein@Home|Pausing result l1_0778.0__0778.2_0.1_T01_S4lA_0 (removed from memory)
6/30/05 7:30:43 PM||request_reschedule_cpus: process exited
6/30/05 7:30:43 PM||Running CPU benchmarks

Note that the log reports that wu #1 is stopped and removed from memory!
Then there are 2 seconds before the cpu-resched process exits, which might leave plenty of time for the preempted wu #2 to be flagged as running.

After bench, wu #1 is restarted and wu #2 goes ahead and resumes.

That's certainly a good explanation, but it doesn't explain why/how the second one got started. Like I said above, that shouldn't happen. It certainly could be that having one WU "paused" tells the request_scheduler to start the other, which can't happen until after the benchmarks complete. Something to look into.

And I don't always believe BOINC when it says "process exited" and "removed from memory". It's supposed to, but that message seems to get issued based on what BOINC requests from the app, not as an indication that it actually happened. Only using Task Manager or Process Explorer while the benchmarks were running would tell for sure.

Quote:


Walt, thanks for taking the issue and please have a look at my suggestion, assuming you are an E@H Admin and code literate.

Eric.ie

Not an admin, just another Einstein@Home user and BOINC tester. Besides testing, I'm trying to research some of the bigger bugs to get them fixed. No such luck yet; they either happen when I'm not looking or on someone else's machine :)

Walt

Kilcock
Joined: 1 Jun 05
Posts: 41
Credit: 2604
RAC: 0

Hi Walt,

First let me show appreciation for your intention to solve this (I can't), but things are getting a bit messy now.

To get you back on track, please take the following advice.

1) Focus on the creation of the error condition. (how I got out of the error condition is not relevant)
2) Create a simulation environment, so you can reproduce the error condition yourself.

-By doing a manual benchmark I'm able to reproduce the error condition, so it is not an intermittent issue (on my machines).
-As initial condition (pre benchmark) all you need is one wu running and a second one in the preempted state. Other wu's, if present, are irrelevant.
-Don't simulate it on a machine while it is connected to the project servers. After all, scientific data is involved.
-Discard the simulation environment after your testing.

Let me give a hint on one remark:

Quote:
That's certainly a good explanation, but it doesn't explain why/how the second one got started. Like I said above, that shouldn't happen. It certainly could be that having one WU "paused" tells the request_scheduler to start the other, which can't happen until after the benchmarks complete. Something to look into.


It is the normal behaviour that a preempted wu is started automatically upon suspension of a running wu.
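To make the suspected sequence concrete, here is a toy model in Python. It is not BOINC code: the class `ToyClient` and its methods are invented for illustration, and it only encodes the guess discussed in this thread, namely that pausing the running wu for the benchmark also gets the preempted wu scheduled, and that everything scheduled is started once the benchmark ends.

```python
class ToyClient:
    """Toy model of the suspected sequence; NOT the real BOINC scheduler."""

    def __init__(self, ncpus=1):
        self.ncpus = ncpus
        self.state = {}        # wu name -> "running" | "preempted" | "paused"
        self.resume_list = []  # wus to start once the benchmark has finished

    def add(self, wu, state):
        self.state[wu] = state

    def start_benchmark(self):
        # Step 1: pause every running wu and remember it for later.
        for wu, st in self.state.items():
            if st == "running":
                self.state[wu] = "paused"
                self.resume_list.append(wu)
        # Step 2 (the suspected flaw): the CPU now looks free, so the
        # preempted wu is scheduled too, without counting the paused one.
        for wu, st in self.state.items():
            if st == "preempted":
                self.resume_list.append(wu)

    def finish_benchmark(self):
        # Everything that was scheduled is started, with no CPU-count check.
        for wu in self.resume_list:
            self.state[wu] = "running"
        self.resume_list.clear()

    def running(self):
        return [wu for wu, st in self.state.items() if st == "running"]


client = ToyClient(ncpus=1)
client.add("wu1", "running")
client.add("wu2", "preempted")
client.start_benchmark()
client.finish_benchmark()
print(client.running())  # ['wu1', 'wu2'] on a 1-CPU model
```

If the real client behaves anything like step 2, a check against the number of CPUs in the resume step would be the obvious place to look, but that is a guess until someone reads the actual source.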

Do the simulation, get a clear understanding of the issue, and then go into the code to analyse why the behaviour is not as intended.

Good Luck and let us know what you find.

Eric.ie

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 13667 in response to message 13666

Quote:

Hi Walt,

First let me show appreciation for your intention to solve this (I can't), but things are getting a bit messy now.

To get you back on track, please take the following advice.

Eric, that's some interesting "advice". I was under the impression that I was trying to fix your problem. So don't misunderstand me here: I was trying to fix your problem, the one you had with BOINC on Win98. Not trying to fix BOINC; that's a different matter altogether.

Instead of offering suggestions, perhaps you can answer the questions I asked in my last message. Here they are again, although I can infer part of the info from your last reply.

Q1 - Where did the second E@H WU come from? Three possibilities: you have two projects named E@H, you suspended an E@H WU in the Work tab so another started, or it got started by BOINC. Actually, I'm not expecting you to look thru your log, nor offering to check it for you. So the question really is, did you do something to start a second WU (and if so, what) or did BOINC?

Q2 - After restarting BOINC, you should show one WU running and the other ones preempted. Is that the case? Or do you have two that show running?

And here's something you can try. It might "fix" the problem where you have two E@H WU's active:

-Set the project to "no new work" and "suspend" all the "ready to run" E@H WU's. You should have one "running", one "preempted" and the rest "suspended by user". (Ignoring any that have already finished)

-Keep it that way until there's only one E@H WU running and none preempted. That should happen after one of the active WU's finishes.

-Resume the ones you suspended and "allow new work". That should get rid of the extra one and bring things back to normal. This assumes you're attached to just one "Einstein@home" project.

Walt

Kilcock
Joined: 1 Jun 05
Posts: 41
Credit: 2604
RAC: 0

Message 13668 in response to message 13667

Quote:
Eric, that's some interesting "advice". I was under the impression that I was trying to fix your problem. So don't misunderstand me here: I was trying to fix your problem, the one you had with BOINC on Win98. Not trying to fix BOINC; that's a different matter altogether.


Walt, if we are working under the header Problems and Bug Reports, then that would be the right place to find and report issues like this one. By doing so, we, or at least I, hope to get Admin/(volunteer) Developer attention to get a reported issue solved or analysed. Search did not work, so report or wait till somebody else reports. Maybe I should have? Yes, I have misunderstood you here or have set my expectations (to get it fixed in the source) too high.

Quote:
Instead of offering suggestions, perhaps you can answer the questions I asked in my last message. Here they are again, although I can infer part of the info from your last reply.

Yes, you deserve that.

Quote:
Q1 - Where did the second E@H WU come from? Three possibilities: you have two projects named E@H, you suspended an E@H WU in the Work tab so another started, or it got started by BOINC. Actually, I'm not expecting you to look thru your log, nor offering to check it for you. So the question really is, did you do something to start a second WU (and if so, what) or did BOINC?

No, I did not start the 2nd wu; the BOINC client must have done this automatically.

Quote:
Q2 - After restarting BOINC, you should show one WU running and the other ones preempted. Is that the case? Or do you have two that show running?

It most likely would, I don't know, but the issue is that the BOINC client has created an unwanted condition which could possibly damage the results of one or both wu's per machine.

Quote:

And here's something you can try. It might "fix" the problem where you have two E@H WU's active:

-Set the project to "no new work" and "suspend" all the "ready to run" E@H WU's. You should have one "running", one "preempted" and the rest "suspended by user". (Ignoring any that have already finished)

-Keep it that way until there's only one E@H WU running and none preempted. That should happen after one of the active WU's finishes.

-Resume the ones you suspended and "allow new work". That should get rid of the extra one and bring things back to normal. This assumes you're attached to just one "Einstein@home" project.

Walt


I'm attached to only one E@H project per machine but, as indicated before, recovering from the error condition is not the issue! The possibility of data corruption is.

NB: Based on the completion of the wu's and their successful validation and crediting, I think or assume 'now' that there was no data corruption in either case, but I'm not sure, as I lack insight into the Data and Validation Process.
