Stuck in endless loop

Th. Walter
Joined: 10 Jan 13
Posts: 6
Credit: 294648
RAC: 0

Quote:
Could the units be 'suspending' due to the pc doing other things?

No, the task is running at the moment of the reset. The other three tasks running in parallel behave normally, so there is no suspension, whether from high CPU load or any other cause.

Quote:
Just to recap, how is the setting for "Leave tasks in memory while suspended" set?

I have checked the setting in the advanced view of the BOINC Manager. The checkbox "Leave applications in memory while suspended" is ticked.

Quote:
Or fully restart Boinc and then, after some minutes, when the task has reset a few times, post all of the messages here.

I have restarted the Boinc client as suggested. Then I waited until the task had reset three times. Here is the event log up to that time. The task reset at 23:22, 23:25 and 23:28. The suspension at 23:29 was caused by my copying the event log and continuing this post.

Tue 25 Nov 23:19:27 2014 | | cc_config.xml not found - using defaults
Tue 25 Nov 23:19:27 2014 | | Starting BOINC client version 7.4.26 for x86_64-apple-darwin
Tue 25 Nov 23:19:27 2014 | | log flags: file_xfer, sched_ops, task
Tue 25 Nov 23:19:27 2014 | | Libraries: libcurl/7.35.0 OpenSSL/1.0.1h zlib/1.2.5 c-ares/1.10.0
Tue 25 Nov 23:19:27 2014 | | Data directory: /Library/Application Support/BOINC Data
Tue 25 Nov 23:19:27 2014 | | OpenCL: Intel GPU 0: Iris (driver version 1.2(Aug 17 2014 20:29:25), device version OpenCL 1.2, 1536MB, 1536MB available, 2688 GFLOPS peak)
Tue 25 Nov 23:19:27 2014 | | OpenCL CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
Tue 25 Nov 23:19:27 2014 | | Host name: xxxxxx
Tue 25 Nov 23:19:27 2014 | | Processor: 4 GenuineIntel Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz [x86 Family 6 Model 69 Stepping 1]
Tue 25 Nov 23:19:27 2014 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni pclmulqdq dtes64 mon dscpl vmx est tm2 ssse3 fma cx16 tpr pdcm sse4_1 sse4_2 x2apic movbe popcnt aes pcid xsave osxsave seglim64 tsctmr avx rdrand f16c
Tue 25 Nov 23:19:27 2014 | | OS: Mac OS X 10.9.5 (Darwin 13.4.0)
Tue 25 Nov 23:19:27 2014 | | Memory: 8.00 GB physical, 105.44 GB virtual
Tue 25 Nov 23:19:27 2014 | | Disk: 232.96 GB total, 105.20 GB free
Tue 25 Nov 23:19:27 2014 | | Local time is UTC +1 hours
Tue 25 Nov 23:19:27 2014 | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 11684846; resource share 100
Tue 25 Nov 23:19:27 2014 | Einstein@Home | General prefs: from Einstein@Home (last modified 18-Nov-2014 22:39:43)
Tue 25 Nov 23:19:27 2014 | Einstein@Home | Host location: none
Tue 25 Nov 23:19:27 2014 | Einstein@Home | General prefs: using your defaults
Tue 25 Nov 23:19:27 2014 | | Reading preferences override file
Tue 25 Nov 23:19:27 2014 | | Preferences:
Tue 25 Nov 23:19:27 2014 | | max memory usage when active: 5734.40MB
Tue 25 Nov 23:19:27 2014 | | max memory usage when idle: 7372.80MB
Tue 25 Nov 23:19:27 2014 | | max disk usage: 100.00GB
Tue 25 Nov 23:19:27 2014 | | suspend work if non-BOINC CPU load exceeds 90%
Tue 25 Nov 23:19:27 2014 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Tue 25 Nov 23:19:27 2014 | | Not using a proxy
Tue 25 Nov 23:29:40 2014 | | Suspending computation - CPU is busy

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117688419208
RAC: 35092902

Quote:
Tue 25 Nov 23:19:27 2014 | | Preferences:
Tue 25 Nov 23:19:27 2014 | | max memory usage when active: 5734.40MB
Tue 25 Nov 23:19:27 2014 | | max memory usage when idle: 7372.80MB
Tue 25 Nov 23:19:27 2014 | | max disk usage: 100.00GB
Tue 25 Nov 23:19:27 2014 | | suspend work if non-BOINC CPU load exceeds 90%
Tue 25 Nov 23:19:27 2014 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Tue 25 Nov 23:19:27 2014 | | Not using a proxy
Tue 25 Nov 23:29:40 2014 | | Suspending computation - CPU is busy


I've highlighted the two bits from your log snippet that tell you why one task is not making progress: the "suspend work if non-BOINC CPU load exceeds 90%" preference and the final "Suspending computation - CPU is busy" message. That 4th task is always going to push the CPU usage over 90% so it will always get suspended like you describe. If you change your preferences to stop continually suspending that task, it will go to completion. Just change the 90% value to zero (which means no restriction) and all will be well.

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2959099505
RAC: 707721

Quote:
Quote:
Tue 25 Nov 23:19:27 2014 | | Preferences:
Tue 25 Nov 23:19:27 2014 | | max memory usage when active: 5734.40MB
Tue 25 Nov 23:19:27 2014 | | max memory usage when idle: 7372.80MB
Tue 25 Nov 23:19:27 2014 | | max disk usage: 100.00GB
Tue 25 Nov 23:19:27 2014 | | suspend work if non-BOINC CPU load exceeds 90%
Tue 25 Nov 23:19:27 2014 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Tue 25 Nov 23:19:27 2014 | | Not using a proxy
Tue 25 Nov 23:29:40 2014 | | Suspending computation - CPU is busy

I've highlighted the two bits from your log snippet that tell you why one task is not making progress. That 4th task is always going to push the CPU usage over 90% so it will always get suspended like you describe. If you change your preferences to stop continually suspending that task, it will go to completion. Just change the 90% value to zero (which means no restriction) and all will be well.


If true, that would be a bug in the client. Einstein tasks should be counted as a BOINC load. The question would be: what else do you use the computer for that regularly takes CPU usage above 90%?
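
To make the distinction concrete, here is a minimal Python sketch of how a "non-BOINC CPU load" check could behave. It is a simplified model for illustration only, not the client's actual code; the function names are made up, and treating a threshold of zero as "no restriction" follows the description given earlier in this thread.

```python
def non_boinc_load(total_cpu_pct: float, boinc_cpu_pct: float) -> float:
    """CPU load caused by everything except BOINC tasks (never below 0)."""
    return max(0.0, total_cpu_pct - boinc_cpu_pct)


def should_suspend(total_cpu_pct: float, boinc_cpu_pct: float,
                   threshold_pct: float) -> bool:
    """Return True if work would be suspended under this simplified model.

    Assumption: a threshold of 0 means "no restriction".
    """
    if threshold_pct <= 0:
        return False
    return non_boinc_load(total_cpu_pct, boinc_cpu_pct) > threshold_pct


# Four science tasks saturating the CPU: the load is BOINC's own.
print(should_suspend(total_cpu_pct=100.0, boinc_cpu_pct=98.0, threshold_pct=90.0))

# Some other program taking almost all of the CPU.
print(should_suspend(total_cpu_pct=100.0, boinc_cpu_pct=5.0, threshold_pct=90.0))
```

In this model the science tasks themselves can never trip the limit, because their own usage is subtracted before the comparison is made.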

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
Quote:
Quote:
Tue 25 Nov 23:19:27 2014 | | Preferences:
Tue 25 Nov 23:19:27 2014 | | max memory usage when active: 5734.40MB
Tue 25 Nov 23:19:27 2014 | | max memory usage when idle: 7372.80MB
Tue 25 Nov 23:19:27 2014 | | max disk usage: 100.00GB
Tue 25 Nov 23:19:27 2014 | | suspend work if non-BOINC CPU load exceeds 90%
Tue 25 Nov 23:19:27 2014 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Tue 25 Nov 23:19:27 2014 | | Not using a proxy
Tue 25 Nov 23:29:40 2014 | | Suspending computation - CPU is busy

I've highlighted the two bits from your log snippet that tell you why one task is not making progress. That 4th task is always going to push the CPU usage over 90% so it will always get suspended like you describe. If you change your preferences to stop continually suspending that task, it will go to completion. Just change the 90% value to zero (which means no restriction) and all will be well.

If true, that would be a bug in the client. Einstein tasks should be counted as a BOINC load. The question would be: what else do you use the computer for that regularly takes CPU usage above 90%?


But this should not matter, as the OP has "Leave applications in memory while suspended" set to yes, so the app should not have to restart from the checkpoint when suspended. See message #135518 for confirmation.

Quote:
Tue 25 Nov 23:19:27 2014 | | cc_config.xml not found - using defaults
Tue 25 Nov 23:19:27 2014 | | Starting BOINC client version 7.4.26 for x86_64-apple-darwin
Tue 25 Nov 23:19:27 2014 | | log flags: file_xfer, sched_ops, task
Tue 25 Nov 23:19:27 2014 | | Libraries: libcurl/7.35.0 OpenSSL/1.0.1h zlib/1.2.5 c-ares/1.10.0
Tue 25 Nov 23:19:27 2014 | | Data directory: /Library/Application Support/BOINC Data
Tue 25 Nov 23:19:27 2014 | | OpenCL: Intel GPU 0: Iris (driver version 1.2(Aug 17 2014 20:29:25), device version OpenCL 1.2, 1536MB, 1536MB available, 2688 GFLOPS peak)
Tue 25 Nov 23:19:27 2014 | | OpenCL CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
Tue 25 Nov 23:19:27 2014 | | Host name: xxxxxx
Tue 25 Nov 23:19:27 2014 | | Processor: 4 GenuineIntel Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz [x86 Family 6 Model 69 Stepping 1]
Tue 25 Nov 23:19:27 2014 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni pclmulqdq dtes64 mon dscpl vmx est tm2 ssse3 fma cx16 tpr pdcm sse4_1 sse4_2 x2apic movbe popcnt aes pcid xsave osxsave seglim64 tsctmr avx rdrand f16c
Tue 25 Nov 23:19:27 2014 | | OS: Mac OS X 10.9.5 (Darwin 13.4.0)
Tue 25 Nov 23:19:27 2014 | | Memory: 8.00 GB physical, 105.44 GB virtual
Tue 25 Nov 23:19:27 2014 | | Disk: 232.96 GB total, 105.20 GB free
Tue 25 Nov 23:19:27 2014 | | Local time is UTC +1 hours
Tue 25 Nov 23:19:27 2014 | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 11684846; resource share 100
Tue 25 Nov 23:19:27 2014 | Einstein@Home | General prefs: from Einstein@Home (last modified 18-Nov-2014 22:39:43)
Tue 25 Nov 23:19:27 2014 | Einstein@Home | Host location: none
Tue 25 Nov 23:19:27 2014 | Einstein@Home | General prefs: using your defaults
Tue 25 Nov 23:19:27 2014 | | Reading preferences override file
Tue 25 Nov 23:19:27 2014 | | Preferences:
Tue 25 Nov 23:19:27 2014 | | max memory usage when active: 5734.40MB
Tue 25 Nov 23:19:27 2014 | | max memory usage when idle: 7372.80MB
Tue 25 Nov 23:19:27 2014 | | max disk usage: 100.00GB
Tue 25 Nov 23:19:27 2014 | | suspend work if non-BOINC CPU load exceeds 90%
Tue 25 Nov 23:19:27 2014 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Tue 25 Nov 23:19:27 2014 | | Not using a proxy
Tue 25 Nov 23:29:40 2014 | | Suspending computation - CPU is busy


Are these the only messages in the log? No lines edited out?
I expected to see some messages about tasks starting between the last two lines, and maybe even some hints as to why the one task keeps restarting.
But maybe we need to set some extra log flags to see this? (<-- Richard, Gary or anyone else, do we need to set extra flags to get a hint and if so which ones?)

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2959099505
RAC: 707721

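Quote:
But maybe we need to set some extra log flags to see this? (<-- Richard, Gary or anyone else, do we need to set extra flags to get a hint and if so which ones?)

cpu_sched and sched_op_debug, which between them produce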

Quote:
25-Nov-2014 16:10:23 [boincsimap] Sending scheduler request: To fetch work.
25-Nov-2014 16:10:23 [boincsimap] Reporting 2 completed tasks
25-Nov-2014 16:10:23 [boincsimap] Requesting new tasks for CPU
25-Nov-2014 16:10:23 [boincsimap] [sched_op] CPU work request: 3466.19 seconds; 1.00 devices
25-Nov-2014 16:10:23 [boincsimap] [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
25-Nov-2014 16:10:26 [boincsimap] Scheduler request completed: got 1 new tasks
25-Nov-2014 16:10:26 [boincsimap] [sched_op] Server version 703
25-Nov-2014 16:10:26 [boincsimap] Project requested delay of 7 seconds
25-Nov-2014 16:10:26 [boincsimap] [sched_op] estimated total CPU task duration: 4289 seconds
25-Nov-2014 16:10:26 [boincsimap] [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
25-Nov-2014 16:10:26 [boincsimap] [sched_op] handle_scheduler_reply(): got ack for task 141103.603623_1
25-Nov-2014 16:10:26 [boincsimap] [sched_op] handle_scheduler_reply(): got ack for task 141103.603604_1
25-Nov-2014 16:10:26 [boincsimap] [sched_op] Deferring communication for 00:00:07
25-Nov-2014 16:10:26 [boincsimap] [sched_op] Reason: requested by project
25-Nov-2014 16:10:28 [boincsimap] Started download of 141103.621971
25-Nov-2014 16:10:35 [boincsimap] Finished download of 141103.621971
25-Nov-2014 16:10:35 [NumberFields@home] [cpu_sched] Preempting wu_sf3_DS-10x271_Grp33510of682667_0 (left in memory)
25-Nov-2014 16:10:35 [boincsimap] Starting task 141103.621971_1
25-Nov-2014 16:10:35 [boincsimap] [cpu_sched] Starting task 141103.621971_1 using simap version 512 in slot 5

In particular, we want to be able to see 'Preempting' and '(left in memory)' from cpu_sched.

I'd also recommend (in general terms, not specifically for this problem) upgrading to BOINC v7.4.27, which has a neat little tool for managing event log diagnostic flags.

The questioner's v7.4.26 for Mac should have the same facility available, though the appearance may be slightly different, and I don't know the keyboard shortcut for Mac users; it's Ctrl-Shift-F on Windows.
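
For anyone who would rather set the flags in a file, the two flags can also go into cc_config.xml (the file the first log reports as "not found"). The Python sketch below only prints the XML. Saving it as cc_config.xml in the BOINC data directory and restarting the client is left to the reader, and the helper name build_cc_config is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a cc_config document with the given log flags switched on."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each enabled flag is an element whose text is "1".
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


xml_text = build_cc_config(["cpu_sched", "sched_op_debug"])
print(xml_text)
```

After a restart, the "log flags:" line near the top of the event log should list the extra flags alongside the defaults.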

Th. Walter
Joined: 10 Jan 13
Posts: 6
Credit: 294648
RAC: 0

I switched on the recommended log options cpu_sched and sched_op_debug. Then I closed the Boinc client and restarted it. Like yesterday, I waited until the task had reset three times (between 22:26 and 22:33) and then copied the log.

The task with the problem is LATeah0035E_80.0_4290_-2.84e-10_0 in slot 6. This task restarted at 22:26, 22:29 and 22:33, as you assumed.

Mi 26 Nov 22:23:49 2014 | | Starting BOINC client version 7.4.26 for x86_64-apple-darwin
Mi 26 Nov 22:23:49 2014 | | log flags: file_xfer, sched_ops, task, cpu_sched, sched_op_debug
Mi 26 Nov 22:23:49 2014 | | Libraries: libcurl/7.35.0 OpenSSL/1.0.1h zlib/1.2.5 c-ares/1.10.0
Mi 26 Nov 22:23:49 2014 | | Data directory: /Library/Application Support/BOINC Data
Mi 26 Nov 22:23:49 2014 | | OpenCL: Intel GPU 0: Iris (driver version 1.2(Aug 17 2014 20:29:25), device version OpenCL 1.2, 1536MB, 1536MB available, 2688 GFLOPS peak)
Mi 26 Nov 22:23:49 2014 | | OpenCL CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
Mi 26 Nov 22:23:49 2014 | | Host name: xxxxx
Mi 26 Nov 22:23:49 2014 | | Processor: 4 GenuineIntel Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz [x86 Family 6 Model 69 Stepping 1]
Mi 26 Nov 22:23:49 2014 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni pclmulqdq dtes64 mon dscpl vmx est tm2 ssse3 fma cx16 tpr pdcm sse4_1 sse4_2 x2apic movbe popcnt aes pcid xsave osxsave seglim64 tsctmr avx rdrand f16c
Mi 26 Nov 22:23:49 2014 | | OS: Mac OS X 10.9.5 (Darwin 13.4.0)
Mi 26 Nov 22:23:49 2014 | | Memory: 8.00 GB physical, 105.63 GB virtual
Mi 26 Nov 22:23:49 2014 | | Disk: 232.96 GB total, 105.38 GB free
Mi 26 Nov 22:23:49 2014 | | Local time is UTC +1 hours
Mi 26 Nov 22:23:49 2014 | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 11684846; resource share 100
Mi 26 Nov 22:23:49 2014 | Einstein@Home | General prefs: from Einstein@Home (last modified 18-Nov-2014 22:39:43)
Mi 26 Nov 22:23:49 2014 | Einstein@Home | Host location: none
Mi 26 Nov 22:23:49 2014 | Einstein@Home | General prefs: using your defaults
Mi 26 Nov 22:23:49 2014 | | Reading preferences override file
Mi 26 Nov 22:23:49 2014 | | Preferences:
Mi 26 Nov 22:23:49 2014 | | max memory usage when active: 5734.40MB
Mi 26 Nov 22:23:49 2014 | | max memory usage when idle: 7372.80MB
Mi 26 Nov 22:23:49 2014 | | max disk usage: 100.00GB
Mi 26 Nov 22:23:49 2014 | | suspend work if non-BOINC CPU load exceeds 90%
Mi 26 Nov 22:23:49 2014 | | (to change preferences, visit a project web site or select Preferences in the Manager)
Mi 26 Nov 22:23:49 2014 | | Not using a proxy
Mi 26 Nov 22:23:49 2014 | Einstein@Home | [cpu_sched] Restarting task LATeah0035E_80.0_4290_-2.84e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6
Mi 26 Nov 22:23:49 2014 | Einstein@Home | [cpu_sched] Restarting task LATeah0038E_48.0_434_-8.85e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 1
Mi 26 Nov 22:23:49 2014 | Einstein@Home | [cpu_sched] Restarting task LATeah0038E_48.0_620_-7.96e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 2
Mi 26 Nov 22:23:49 2014 | Einstein@Home | [cpu_sched] Restarting task LATeah0038E_48.0_806_-2.59e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 3
Mi 26 Nov 22:24:49 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0035E_80.0_4290_-2.84e-10_0
Mi 26 Nov 22:24:49 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0035E_80.0_4290_-2.84e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6
Mi 26 Nov 22:24:49 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_434_-8.85e-10_0
Mi 26 Nov 22:24:49 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_434_-8.85e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 1
Mi 26 Nov 22:24:49 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_620_-7.96e-10_0
Mi 26 Nov 22:24:49 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_620_-7.96e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 2
Mi 26 Nov 22:24:49 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_806_-2.59e-10_0
Mi 26 Nov 22:24:49 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_806_-2.59e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 3
Mi 26 Nov 22:25:50 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0035E_80.0_4290_-2.84e-10_0
Mi 26 Nov 22:25:50 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0035E_80.0_4290_-2.84e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6
Mi 26 Nov 22:25:50 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_434_-8.85e-10_0
Mi 26 Nov 22:25:50 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_434_-8.85e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 1
Mi 26 Nov 22:25:50 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_620_-7.96e-10_0
Mi 26 Nov 22:25:50 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_620_-7.96e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 2
Mi 26 Nov 22:25:50 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_806_-2.59e-10_0
Mi 26 Nov 22:25:50 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_806_-2.59e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 3
Mi 26 Nov 22:26:50 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0035E_80.0_4290_-2.84e-10_0
Mi 26 Nov 22:26:50 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0035E_80.0_4290_-2.84e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6
Mi 26 Nov 22:26:50 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_434_-8.85e-10_0
Mi 26 Nov 22:26:50 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_434_-8.85e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 1
Mi 26 Nov 22:26:50 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_620_-7.96e-10_0
Mi 26 Nov 22:26:50 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_620_-7.96e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 2
Mi 26 Nov 22:26:50 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_806_-2.59e-10_0
Mi 26 Nov 22:26:50 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_806_-2.59e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 3
Mi 26 Nov 22:26:53 2014 | Einstein@Home | [cpu_sched] Restarting task LATeah0035E_80.0_4290_-2.84e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6
Mi 26 Nov 22:28:04 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0035E_80.0_4290_-2.84e-10_0
Mi 26 Nov 22:28:04 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0035E_80.0_4290_-2.84e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6
Mi 26 Nov 22:28:04 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_434_-8.85e-10_0
Mi 26 Nov 22:28:04 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_434_-8.85e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 1
Mi 26 Nov 22:28:04 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_620_-7.96e-10_0
Mi 26 Nov 22:28:04 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_620_-7.96e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 2
Mi 26 Nov 22:28:04 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_806_-2.59e-10_0
Mi 26 Nov 22:28:04 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_806_-2.59e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 3
Mi 26 Nov 22:29:04 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0035E_80.0_4290_-2.84e-10_0
Mi 26 Nov 22:29:04 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0035E_80.0_4290_-2.84e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6
Mi 26 Nov 22:29:04 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_434_-8.85e-10_0
Mi 26 Nov 22:29:04 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_434_-8.85e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 1
Mi 26 Nov 22:29:04 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_620_-7.96e-10_0
Mi 26 Nov 22:29:04 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_620_-7.96e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 2
Mi 26 Nov 22:29:04 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_806_-2.59e-10_0
Mi 26 Nov 22:29:04 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_806_-2.59e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 3
Mi 26 Nov 22:29:57 2014 | Einstein@Home | [cpu_sched] Restarting task LATeah0035E_80.0_4290_-2.84e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6
Mi 26 Nov 22:29:57 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_434_-8.85e-10_0
Mi 26 Nov 22:29:57 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_434_-8.85e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 1
Mi 26 Nov 22:29:57 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_620_-7.96e-10_0
Mi 26 Nov 22:29:57 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_620_-7.96e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 2
Mi 26 Nov 22:29:57 2014 | Einstein@Home | [cpu_sched] Resuming LATeah0038E_48.0_806_-2.59e-10_0
Mi 26 Nov 22:29:57 2014 | Einstein@Home | [cpu_sched] Resuming task LATeah0038E_48.0_806_-2.59e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 3
Mi 26 Nov 22:33:01 2014 | Einstein@Home | [cpu_sched] Restarting task LATeah0035E_80.0_4290_-2.84e-10_0 using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117688419208
RAC: 35092902

Quote:
Quote:

I've highlighted the two bits from your log snippet that tell you why one task is not making progress. That 4th task is always going to push the CPU usage over 90% so it will always get suspended like you describe. If you change your preferences to stop continually suspending that task, it will go to completion. Just change the 90% value to zero (which means no restriction) and all will be well.

If true, that would be a bug in the client. Einstein tasks should be counted as a BOINC load. The question would be: what else do you use the computer for that regularly takes CPU usage above 90%?


As always, Richard is absolutely correct - it shouldn't be a task (but rather something else) that is taking the usage beyond 90%. I should have thought more carefully before responding.

I was reminded of an experience many years ago (and therefore possibly not relevant to current BOINC behaviour) where stupid little things like moving a mouse, or typing, or launching some other innocuous app could momentarily trigger the exceeding of some relatively high limit. You would think that such a limit couldn't possibly have been exceeded. It was probably at the time that feature was first added to BOINC. It was so annoying that I disabled it completely as being useless. I've always found that tasks running on all cores do not impinge on the usability of a machine for normal 'office' type work so I've never felt the need to use that feature.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117688419208
RAC: 35092902

Quote:

I switched on the recommended log options cpu_sched and sched_op_debug. Then I closed the Boinc client and restarted it. Like yesterday, I waited until the task had reset three times (between 22:26 and 22:33) and then copied the log.

The task with the problem is LATeah0035E_80.0_4290_-2.84e-10_0 in slot 6. This task restarted at 22:26, 22:29 and 22:33, as you assumed.
....


There seem to be some really weird things in that log - the slot 6 task is restarting and the other three are resuming for no apparent reason, it would seem.

It would be useful to change the 90% limit to zero (temporarily) and then re-run the exact same sequence to see what is different. Is there anything else also running that's likely to be consuming lots of CPU cycles? I'm not talking about the OS or any of the standard background system processes that are normally running.

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2959099505
RAC: 707721

Quote:
Quote:

I switched on the recommended log options cpu_sched and sched_op_debug. Then I closed the Boinc client and restarted it. Like yesterday, I waited until the task had reset three times (between 22:26 and 22:33) and then copied the log.

The task with the problem is LATeah0035E_80.0_4290_-2.84e-10_0 in slot 6. This task restarted at 22:26, 22:29 and 22:33, as you assumed.
....


There seem to be some really weird things in that log - the slot 6 task is restarting and the other three are resuming for no apparent reason, it would seem.

It would be useful to change the 90% limit to zero (temporarily) and then re-run the exact same sequence to see what is different. Is there anything else also running that's likely to be consuming lots of CPU cycles? I'm not talking about the OS or any of the standard background system processes that are normally running.


'Resuming' is the normal word for re-activating a task which has been kept in memory: 'Restarting' would be right for a task which exits completely and tries to restart from the checkpoint file.

The slot 6 task 'resumes' a few times, then starts 'restarting'. I've sometimes seen Einstein GW tasks take a couple of attempts to get running, especially after a cold restart like this when all four tasks are competing for bus and memory bandwidth at the same time. I'm beginning to think that the slot 6 task may be getting starved of memory and dying (crashing): there's no evidence from this log that BOINC is even attempting to ask any task to preempt.
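
To tell the two kinds of event apart in a long log, a short Python sketch along these lines could count them per task. It assumes the line layout shown in the log above ("... HH:MM:SS YYYY | project | [cpu_sched] Restarting task NAME ...") and that all lines fall on the same day. The names summarise and gaps_in_seconds are invented for the example, and the sample lines are rebuilt from the slot 6 entries in the posted log.

```python
import re
from collections import defaultdict
from datetime import datetime

# Only the "... task NAME using ..." form is matched, so the shorter
# "Resuming NAME" line that accompanies each resume is not counted twice.
EVENT_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}) \d{4} \| [^|]* \| \[cpu_sched\] "
    r"(Restarting|Resuming) task (\S+)"
)


def summarise(lines):
    """Collect the times of 'Restarting' and 'Resuming' events per task."""
    events = defaultdict(lambda: {"Restarting": [], "Resuming": []})
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            clock, kind, task = match.groups()
            events[task][kind].append(datetime.strptime(clock, "%H:%M:%S"))
    return events


def gaps_in_seconds(times):
    """Seconds between consecutive events (assumes the same calendar day)."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


TASK = "LATeah0035E_80.0_4290_-2.84e-10_0"
TAIL = " using hsgamma_FGRP4 version 104 (FGRP4-SSE2) in slot 6"
PREFIX = "Mi 26 Nov {} 2014 | Einstein@Home | [cpu_sched] "

sample = [
    PREFIX.format("22:23:49") + f"Restarting task {TASK}{TAIL}",
    PREFIX.format("22:24:49") + f"Resuming {TASK}",
    PREFIX.format("22:24:49") + f"Resuming task {TASK}{TAIL}",
    PREFIX.format("22:26:53") + f"Restarting task {TASK}{TAIL}",
    PREFIX.format("22:29:57") + f"Restarting task {TASK}{TAIL}",
    PREFIX.format("22:33:01") + f"Restarting task {TASK}{TAIL}",
]

for task, kinds in summarise(sample).items():
    print(task,
          "restarts:", len(kinds["Restarting"]),
          "resumes:", len(kinds["Resuming"]),
          "restart gaps (s):", gaps_in_seconds(kinds["Restarting"]))
```

On these sample lines the gaps between the 'Restarting' entries come out at 184 seconds each. The first entry is the ordinary restart at client startup, which all four tasks show.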

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

So if it's not Boinc restarting the task, then there might be a hint in the stderr file for the task. If things on the Mac side work as they do on Windows, you could open /Library/Application Support/BOINC Data/slots/6, where there should be a file called stderr.txt that might contain some useful info as to why the task restarts. I would start at the end of the file, see if there is anything obvious, and post it here.

A normally running task would have a file looking something like this around a checkpoint:

Quote:
.
.
.
% checkpoint 4
% Sky point 5/31
% Starting semicoherent search over F0 and F1.
% nf1dots: 175 df1dot: 5.7656946e-015 f1dot_start: -1.1e-011 f1dot_band: 1e-012
.
.
.
.
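
If opening the file in an editor is awkward, a small Python sketch like the one below would print its last lines. The path is simply the data directory from the event log plus slots/6, so adjust it if your installation differs. Reading that directory may need administrator rights, which is why errors are caught rather than assumed away. The helper name tail is made up for the example.

```python
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file, replacing undecodable bytes."""
    with open(path, "r", errors="replace") as fh:
        # A bounded deque keeps only the final n lines while reading.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


# Assumed location: data directory from the event log, slot 6.
stderr_path = Path("/Library/Application Support/BOINC Data/slots/6/stderr.txt")

try:
    for line in tail(stderr_path):
        print(line)
except OSError as exc:
    # Covers a missing file as well as insufficient permissions.
    print(f"Could not read {stderr_path}: {exc}")
```

Posting the last twenty or so lines is usually enough to show whether the task logged an error just before it restarted.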
