Stuck in endless loop

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

MacBook Pro 2011 dual core with hyperthreading? Correct?

If you are worried that something else is hogging the system, or that the application is short of memory, go to Spotlight and type in Activity Monitor.

Launch it, then open the View menu, go to Columns, and make sure % CPU, CPU Time and Real Memory are ticked.

This lets you see which applications are running on your computer and what percentage of its resources each one is using.
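
For anyone who prefers the Terminal, roughly the same columns can be pulled from `ps`. What follows is only a minimal Python sketch, not an official tool: it assumes the stock `ps` accepts `-A -o pid,pcpu,rss,comm` and reports `rss` in 1024-byte units, and the helper names `parse_ps` and `top_by_cpu` are made up for illustration.

```python
import shutil
import subprocess

def parse_ps(text):
    """Parse the output of `ps -A -o pid,pcpu,rss,comm` into a list of dicts."""
    rows = []
    for line in text.splitlines()[1:]:   # skip the header row
        parts = line.split(None, 3)      # the command itself may contain spaces
        if len(parts) < 4:
            continue
        pid, pcpu, rss, comm = parts
        try:
            rows.append({"pid": int(pid), "cpu": float(pcpu),
                         "rss_mb": int(rss) / 1024.0, "command": comm.strip()})
        except ValueError:
            continue  # ignore lines that do not look like process rows
    return rows

def top_by_cpu(rows, n=10):
    """Return the n rows with the highest %CPU, busiest first."""
    return sorted(rows, key=lambda r: r["cpu"], reverse=True)[:n]

if __name__ == "__main__" and shutil.which("ps"):
    out = subprocess.run(["ps", "-A", "-o", "pid,pcpu,rss,comm"],
                         capture_output=True, text=True, check=False).stdout
    for r in top_by_cpu(parse_ps(out)):
        print(f'{r["pid"]:>7} {r["cpu"]:>6.1f}% {r["rss_mb"]:>9.1f} MB  {r["command"]}')
```

If the science application sits near the top with a high %CPU, it is at least getting processor time; if something else dominates the list, that is the process to look at first.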

Back in Spotlight, type Console and launch it.

This gives you a look at the system reports logged by the computer.

There are two areas: System Log Queries and Diagnostic and Usage Information.

If you know what time the application crashed, you can go back through the logs and see whether a message was recorded at that time.

Good luck.
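
Picking out the entries around a known time can also be done with a few lines of Python once the log has been saved as text. This is a rough sketch under stated assumptions: each relevant line carries an `HH:MM:SS` stamp somewhere in it (as the stderr.txt lines in this thread do), the log covers a single day, and the function name `lines_near` is invented for this example.

```python
import re
from datetime import datetime, timedelta

TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")

def lines_near(lines, when, minutes=5):
    """Return the lines whose first HH:MM:SS stamp is within +/- `minutes` of `when`.

    Dates are ignored, so this only makes sense for a log covering a single day.
    """
    target = datetime.strptime(when, "%H:%M:%S")
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        m = TIME_RE.search(line)
        if not m:
            continue  # lines without a time stamp are skipped
        try:
            stamp = datetime.strptime(m.group(0), "%H:%M:%S")
        except ValueError:
            continue  # something like "99:99:99" is not a real time
        if abs(stamp - target) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

For example, `lines_near(open("stderr.txt"), "22:42:16")` would return everything stamped between 22:37:16 and 22:47:16.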

Th. Walter
Joined: 10 Jan 13
Posts: 6
Credit: 294648
RAC: 0

These are the entries written to stderr.txt after the job restarted at 22:42:16:

22:42:16 (2646): [normal]: This Einstein@home App was built at: Aug 21 2014 22:41:54

22:42:16 (2646): [normal]: Start of BOINC application 'hsgamma_FGRP4_1.04_x86_64-apple-darwin__FGRP4-SSE2'.
22:42:16 (2646): [debug]: 2.1e+15 fp, 3.6e+09 fp/s, 588946 s, 163h35m45s95
command line: hsgamma_FGRP4_1.04_x86_64-apple-darwin__FGRP4-SSE2 --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0035E.dat --outputfile results.cand.out --alpha 2.88819004482 --delta -1.06283744662 --skyRadius 2.178171e-03 --ldiBins 15 --f0start 48 --f0Band 32 --firstSkyPoint 4290 --numSkyPoints 30 --f1dot -2.85e-10 --f1dotBand 1e-12 --df1dot 5.757173436e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 5 --cohFollow 5 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 55716 --f0orbit 0.005 --debug 1
output files: 'results.cand.out' '../../projects/einstein.phys.uwm.edu/LATeah0035E_80.0_4290_-2.84e-10_0_0' 'results.cand.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0035E_80.0_4290_-2.84e-10_0_1'
22:42:16 (2646): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
22:42:16 (2646): [normal]: WARNING: Resultfile '../../projects/einstein.phys.uwm.edu/LATeah0035E_80.0_4290_-2.84e-10_0_0' present - doing nothing
22:42:16 (2646): [debug]: Set up communication with graphics process.
mv: results.cand.out: No such file or directory
mv: results.cand.out: No such file or directory
mv: results.cand.out: No such file or directory

Quote:
MacBook Pro 2011 dual core with hyperthreading? Correct?

Almost correct. It is a MacBook Pro 2013, dual core with hyperthreading.

Quote:
It would be useful to change the 90% limit to zero (temporarily) and then re-run the exact same sequence to see what is different.

I tried it and set the limit to zero, but the behaviour did not change.

mikey
Joined: 22 Jan 05
Posts: 12692
Credit: 1839098411
RAC: 3714

RE: Almost correct.

Quote:


Almost correct. MacBook Pro 2013 dual core with hyperthreading

Quote:
It would be useful to change the 90% limit to zero (temporarily) and then re-run the exact same sequence to see what is different.

I tried it and set the limit to zero, but the behaviour did not change.

Why do you have 6 slots and only 4 CPU cores? Are you running two tasks on the GPU at the same time?

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

RE: These are the entries

Quote:

These are the entries written to stderr.txt after the job restarted at 22:42:16:

22:42:16 (2646): [normal]: This Einstein@home App was built at: Aug 21 2014 22:41:54

22:42:16 (2646): [normal]: Start of BOINC application 'hsgamma_FGRP4_1.04_x86_64-apple-darwin__FGRP4-SSE2'.
22:42:16 (2646): [debug]: 2.1e+15 fp, 3.6e+09 fp/s, 588946 s, 163h35m45s95
command line: hsgamma_FGRP4_1.04_x86_64-apple-darwin__FGRP4-SSE2 --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0035E.dat --outputfile results.cand.out --alpha 2.88819004482 --delta -1.06283744662 --skyRadius 2.178171e-03 --ldiBins 15 --f0start 48 --f0Band 32 --firstSkyPoint 4290 --numSkyPoints 30 --f1dot -2.85e-10 --f1dotBand 1e-12 --df1dot 5.757173436e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 5 --cohFollow 5 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 55716 --f0orbit 0.005 --debug 1
output files: 'results.cand.out' '../../projects/einstein.phys.uwm.edu/LATeah0035E_80.0_4290_-2.84e-10_0_0' 'results.cand.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0035E_80.0_4290_-2.84e-10_0_1'
22:42:16 (2646): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
22:42:16 (2646): [normal]: WARNING: Resultfile '../../projects/einstein.phys.uwm.edu/LATeah0035E_80.0_4290_-2.84e-10_0_0' present - doing nothing
22:42:16 (2646): [debug]: Set up communication with graphics process.
mv: results.cand.out: No such file or directory
mv: results.cand.out: No such file or directory
mv: results.cand.out: No such file or directory


Without actually knowing, I would say that the red line above looks wrong. It seems as though the task completed and wrote the output file but was shut down before it could finish the final cleanup, so when BOINC starts it again it gets confused and can't complete. I would say this task really is stuck in an endless loop, and I would probably abort it and hope it doesn't happen again.
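
The symptom described above can be checked for mechanically. Here is a rough Python sketch, assuming the two messages seen in the quoted stderr.txt (the "Resultfile ... present - doing nothing" warning and the failing `mv`) are the telltale signs. The names `summarize` and `looks_stuck` are invented for this example and are not part of BOINC or the Einstein@Home application.

```python
import re

START_RE = re.compile(r"Start of BOINC application")
PRESENT_RE = re.compile(r"WARNING: Resultfile '.*' present - doing nothing")
MV_RE = re.compile(r"^mv: .*: No such file or directory")

def summarize(text):
    """Count restarts and the two suspicious messages in a stderr.txt dump."""
    counts = {"starts": 0, "result_present": 0, "mv_failed": 0}
    for line in text.splitlines():
        if START_RE.search(line):
            counts["starts"] += 1
        elif PRESENT_RE.search(line):
            counts["result_present"] += 1
        elif MV_RE.search(line):
            counts["mv_failed"] += 1
    return counts

def looks_stuck(counts):
    """Heuristic only: a result file already existed and a later `mv` failed."""
    return counts["result_present"] > 0 and counts["mv_failed"] > 0
```

A log from a healthy restart, which resumes from a checkpoint instead, should come back with both of those counters at zero.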

For comparison, here is the output from a restart of an FGRP4 task on my machine:

Quote:

11:36:58 (5884): [normal]: This Einstein@home App was built at: Nov 25 2014 13:29:37

11:36:58 (5884): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.05_windows_intelx86__FGRP4-Beta.exe'.
11:36:58 (5884): [debug]: 2.1e+015 fp, 4.7e+009 fp/s, 449148 s, 124h45m48s06
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRP4_1.05_windows_intelx86__FGRP4-Beta.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0038E.dat --outputfile results.cand.out --alpha 2.93244635866 --delta -1.07484705721 --skyRadius 2.581342e-03 --ldiBins 15 --f0start 1232 --f0Band 32 --firstSkyPoint 1043817 --numSkyPoints 291 --f1dot -1.0e-13 --f1dotBand 1.0e-13 --df1dot 5.7656946e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 5 --cohFollow 5 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 55716 --f0orbit 0.005 --debug 1
output files: 'results.cand.out' '../../projects/einstein.phys.uwm.edu/LATeah0038E_1264.0_1043817_0.0_0_0' 'results.cand.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0038E_1264.0_1043817_0.0_0_1'
11:36:58 (5884): [debug]: Flags: i386 SSE GNUC X86 GNUX86
11:36:58 (5884): [debug]: Set up communication with graphics process.
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah0038E.dat
% Total amount of photon times: 30008
% Preparing toplist of length: 5
% checkpoint read: skypoint 210
% fft_size: 67108864 (0x4000000)
% Sky point 211/291
% Creating FFT plan.
% Starting semicoherent search over f0 and f1.
% nf1dots: 19 df1dot: 5.7656946e-015 f1dot_start: -1e-013 f1dot_band: 1e-013
.

Odysseus
Joined: 17 Dec 05
Posts: 372
Credit: 20569004
RAC: 5991

I noticed task Nº466453803 on my MacBook hadn’t progressed past 95.666% in a couple of days, then observed it looping through the same small range of times, both elapsed and estimated-remaining, over and over. The system having already been stopped and started a couple of times since I noticed the behaviour, making me doubt I could do anything more useful, I aborted the task. My quorum partner doesn’t seem to have had any problem.
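
The looping pattern described here (elapsed time and progress repeatedly falling back to earlier values) can be told apart from a task that is merely slow. The sketch below is purely illustrative: it assumes you have noted down a series of (elapsed seconds, fraction done) readings yourself, and the function names and the `min_rewinds` threshold are hypothetical.

```python
def count_rewinds(samples):
    """Count how often elapsed time or progress drops between consecutive samples.

    `samples` is a list of (elapsed_seconds, fraction_done) pairs in the order
    they were observed. A healthy task only moves forward, so a drop suggests
    the task restarted from an earlier point.
    """
    rewinds = 0
    for (t0, f0), (t1, f1) in zip(samples, samples[1:]):
        if t1 < t0 or f1 < f0:
            rewinds += 1
    return rewinds

def is_looping(samples, min_rewinds=2):
    """Heuristic: several rewinds, and progress never beats its first peak."""
    if count_rewinds(samples) < min_rewinds:
        return False
    # Highest progress seen before the first rewind.
    peak_before = samples[0][1]
    for (t0, f0), (t1, f1) in zip(samples, samples[1:]):
        if t1 < t0 or f1 < f0:
            break
        peak_before = max(peak_before, f1)
    return max(f for _, f in samples) <= peak_before
```

A task that restarts now and then but still reaches new progress values would not be flagged, which is the distinction that matters before deciding to abort.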

P.S. Mods, feel free to move this post to the other thread, which I didn’t notice before because of all the topics pinned above it—sorry.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117688212549
RAC: 35083893

RE: P.S. Mods, feel free to

Quote:
P.S. Mods, feel free to move this post to the other thread ...


Done!

I wonder if this behaviour is specific to OS X. That's what the OP is also using.

Cheers,
Gary.

Odysseus
Joined: 17 Dec 05
Posts: 372
Credit: 20569004
RAC: 5991

Possibly; the QP I mentioned above is on Win v7 … in my case I’m pretty sure the cause was not contention for CPU time, as this laptop is set to run BOINC on one core, and only half the time at that. It therefore has just one task active at a time (with S@h out of the picture, that means alternating between E@h & MW@h) and was not being used for anything else over the majority of the time it spent ‘spinning its wheels’.
