exited with zero status but no 'finished' file

MacRonin
Joined: 6 Nov 08
Posts: 4
Credit: 242207
RAC: 0
Topic 194101

Sun Dec 28 18:43:27 2008||Suspending computation - user is active
Sun Dec 28 18:43:38 2008|Einstein@Home|Task h1_0805.75_S5R4__410_S5R4a_2: no shared memory segment
Sun Dec 28 18:43:38 2008|Einstein@Home|Task h1_0805.75_S5R4__410_S5R4a_2 exited with zero status but no 'finished' file
Sun Dec 28 18:43:38 2008|Einstein@Home|If this happens repeatedly you may need to reset the project.

I received the above msgs in my msg log. I was wondering whether the "no 'finished' file" error may have happened because computation was suspended just before the error msg. I am running Mac OS X 10.5.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

exited with zero status but no 'finished' file

Quote:
...I was wondering whether the "no 'finished' file" error may have happened because computation was suspended just before the error msg...


I'm quite sure that the suspending was not the cause. In any case don't reset the project! A simple restart of BOINC should be enough, if needed at all.

If that message appears only sporadically (2-3 times a day), just ignore it. It is often caused by time servers automatically adjusting the system clock.
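
To judge whether the message is sporadic or frequent, one could count how often it shows up per task and per day in a saved copy of the message log. The following is only a sketch: it assumes the "timestamp|project|message" layout seen in the log excerpt at the top of this thread, and the function names and the limit of 3 per day (taken from the "2-3 times a day" rule of thumb) are made up for illustration.

```python
from collections import Counter
from datetime import datetime
import re

# Message text as it appears in the log excerpt in this thread.
PATTERN = re.compile(r"^Task (\S+) exited with zero status but no 'finished' file$")


def count_no_finished(lines):
    """Count the "no 'finished' file" message per (task name, calendar day)."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not a "timestamp|project|message" line
        stamp, _project, message = parts
        match = PATTERN.match(message.strip())
        if not match:
            continue
        try:
            day = datetime.strptime(stamp.strip(), "%a %b %d %H:%M:%S %Y").date()
        except ValueError:
            continue  # timestamp in an unexpected format
        counts[(match.group(1), day)] += 1
    return counts


def worth_investigating(counts, per_day_limit=3):
    """Return the (task, day) pairs that exceed the assumed per-day limit."""
    return sorted(key for key, n in counts.items() if n > per_day_limit)


SAMPLE = [
    "Sun Dec 28 18:43:27 2008||Suspending computation - user is active",
    "Sun Dec 28 18:43:38 2008|Einstein@Home|Task h1_0805.75_S5R4__410_S5R4a_2: no shared memory segment",
    "Sun Dec 28 18:43:38 2008|Einstein@Home|Task h1_0805.75_S5R4__410_S5R4a_2 exited with zero status but no 'finished' file",
    "Sun Dec 28 18:43:38 2008|Einstein@Home|If this happens repeatedly you may need to reset the project.",
]

if __name__ == "__main__":
    for (task, day), n in count_no_finished(SAMPLE).items():
        print(task, day, n)
```

For real use, pass the lines of a saved message log instead of SAMPLE; anything returned by worth_investigating() would be a candidate for a closer look.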

If it appears often (about every few minutes), you have to check the log file stderr.txt in the appropriate slots directory for the cause of the error, because after 100 retries, the Task will be aborted.
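
As a rough illustration of that check, the sketch below prints the last few lines of every stderr.txt found under the slots directory. It assumes the usual layout of numbered subdirectories under "slots" in the BOINC data directory; the default path is the Mac data directory reported in a startup log later in this thread, and the function names are hypothetical. On other systems, pass your own data directory.

```python
from pathlib import Path

# Data directory as reported in the Mac startup log in this thread.
# This is an assumption for any other machine; pass your own path instead.
DEFAULT_DATA_DIR = "/Library/Application Support/BOINC Data"


def tail(text, n=5):
    """Return the last n non-empty lines of text."""
    lines = [line for line in text.splitlines() if line.strip()]
    return lines[-n:] if n > 0 else []


def slot_stderr_tails(data_dir=DEFAULT_DATA_DIR, n=5):
    """Map each slot directory name to the last n lines of its stderr.txt."""
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return {}  # no slots directory at this location
    tails = {}
    for path in sorted(slots.glob("*/stderr.txt")):
        text = path.read_text(errors="replace")
        tails[path.parent.name] = tail(text, n)
    return tails


if __name__ == "__main__":
    for slot, lines in slot_stderr_tails().items():
        print("slot", slot)
        for line in lines:
            print("   ", line)
```

Reading the files may require the same permissions as the BOINC data directory itself.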

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 737296842
RAC: 1303364

RE: Sun Dec 28 18:43:27

Each time this message appears, the work spent on the WU so far is abandoned/wasted, so this is something to worry about.

The problem might be a shared memory configuration that is too small under Mac OS X. This is seen more frequently on systems with more than two cores, but you never know; it might be worthwhile to try the fix quoted here.
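
The linked fix itself is not reproduced in this thread, so the sketch below only shows how one might inspect the current System V shared memory limits before changing anything. It assumes the "sysctl" command and the kern.sysv.shm* keys available on Mac OS X; the helper names are hypothetical and the numbers in the usage comment are placeholders, not recommended values.

```python
import subprocess


def parse_sysctl(text):
    """Parse "name: value" lines as printed by sysctl into a dict."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not in "name: value" form
        value = value.strip()
        settings[name.strip()] = int(value) if value.isdigit() else value
    return settings


def read_shm_settings():
    """Return the kern.sysv.shm* values reported by sysctl (Mac OS X only)."""
    result = subprocess.run(
        ["sysctl", "kern.sysv"], capture_output=True, text=True, check=True
    )
    return {
        name: value
        for name, value in parse_sysctl(result.stdout).items()
        if name.startswith("kern.sysv.shm")
    }


# Usage on a Mac (placeholder output, values will differ per system):
#   read_shm_settings()
#   {'kern.sysv.shmmax': ..., 'kern.sysv.shmmin': ..., 'kern.sysv.shmmni': ...,
#    'kern.sysv.shmseg': ..., 'kern.sysv.shmall': ...}
```

Comparing these values with whatever the quoted fix suggests would show whether the limits are actually the bottleneck.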

Happy Crunching
Bikeman

MacRonin
Joined: 6 Nov 08
Posts: 4
Credit: 242207
RAC: 0

RE: RE: Sun Dec 28

Message 89274 in response to message 89273

I'm running it on a new unibody MacBook Pro, so it only has the two processors. One additional detail I thought I'd mention: the thread you pointed to said that BOINC doesn't always do a complete job of cleaning up the shared memory segments when it crashes or gets shut down. While I haven't noticed any crashes recently, I have been stopping and restarting BOINC between reboots to disable it when playing back video (BOINC takes up too much CPU and lowers the frame rate). I stop it instead of suspending it so that, if I forget, the worst case is that it gets reactivated the next time I reboot.

I also have 8 projects installed on my system, although one is suspended because it currently has no work for my processor type. At the time of the failure, I believe only 3 had tasks in the local queue.

Also, I only mentioned the "Suspending computation - user is active" msg because it appeared (without a corresponding reactivation) 11 seconds before the task completed and produced the error. I would have thought work would have stopped and therefore would not have completed until things were started back up again. But maybe there is a lag between the "Suspending computation - user is active" msg and work actually stopping.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: Also I only mentioned

Message 89275 in response to message 89274

Quote:
Also, I only mentioned the "Suspending computation - user is active" msg because it appeared (without a corresponding reactivation) 11 seconds before the task completed and produced the error. I would have thought work would have stopped and therefore would not have completed until things were started back up again. But maybe there is a lag between the "Suspending computation - user is active" msg and work actually stopping.


If you have "Leave applications in memory while suspended?" set to yes in your preferences, then the aforementioned system-time reset will cause every task in memory to produce that message (without losing work done), whether the task is active or not.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

MacRonin
Joined: 6 Nov 08
Posts: 4
Credit: 242207
RAC: 0

RE: RE: Also I only

Message 89276 in response to message 89275

In my case, "Leave applications in memory while suspended?" is NOT checked. But here is the startup status info from my system.

Tue Dec 30 03:52:20 2008||Starting BOINC client version 6.2.18 for x86_64-apple-darwin
Tue Dec 30 03:52:20 2008||log flags: task, file_xfer, sched_ops
Tue Dec 30 03:52:20 2008||Libraries: libcurl/7.18.0 OpenSSL/0.9.7l zlib/1.2.3 c-ares/1.5.1
Tue Dec 30 03:52:20 2008||Data directory: /Library/Application Support/BOINC Data
Tue Dec 30 03:52:20 2008||Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz [x86 Family 6 Model 23 Stepping 6]
Tue Dec 30 03:52:20 2008||Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM SSE3 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1
Tue Dec 30 03:52:20 2008||OS: Darwin: 9.5.1
Tue Dec 30 03:52:20 2008||Memory: 4.00 GB physical, 238.13 GB virtual
Tue Dec 30 03:52:20 2008||Disk: 297.77 GB total, 237.89 GB free
Tue Dec 30 03:52:20 2008||Local time is UTC -5 hours
Tue Dec 30 03:52:20 2008||No coprocessors
Tue Dec 30 03:52:20 2008|rosetta@home|URL: http://boinc.bakerlab.org/rosetta/; Computer ID: 000000(number blocked); location: (none); project prefs: default
Tue Dec 30 03:52:20 2008|boincsimap|URL: http://boinc.bio.wzw.tum.de/boincsimap/; Computer ID: 000000(number blocked);; location: (none); project prefs: default
Tue Dec 30 03:52:20 2008|The Lattice Project|URL: http://boinc.umiacs.umd.edu/; Computer ID: 000000(number blocked);; location: home; project prefs: default
Tue Dec 30 03:52:20 2008|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 000000(number blocked);; location: (none); project prefs: default
Tue Dec 30 03:52:20 2008|Docking@Home|URL: http://docking.cis.udel.edu/; Computer ID: 000000(number blocked);; location: (none); project prefs: default
Tue Dec 30 03:52:20 2008|Einstein@Home|URL: http://einstein.phys.uwm.edu/; Computer ID: 000000(number blocked);; location: (none); project prefs: default
Tue Dec 30 03:52:20 2008|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 000000(number blocked);; location: home; project prefs: default
Tue Dec 30 03:52:20 2008|uFluids|URL: http://www.ufluids.net/; Computer ID: 91301; location: (none); project prefs: default
Tue Dec 30 03:52:20 2008||General prefs: from Einstein@Home (last modified 18-Feb-2006 10:48:31)
Tue Dec 30 03:52:20 2008||Host location: none
Tue Dec 30 03:52:20 2008||General prefs: using your defaults
Tue Dec 30 03:52:20 2008||Reading preferences override file
Tue Dec 30 03:52:20 2008||Preferences limit memory usage when active to 2048.00MB
Tue Dec 30 03:52:20 2008||Preferences limit memory usage when idle to 3072.00MB
Tue Dec 30 03:52:20 2008||Preferences limit disk usage to 1.12GB

mikey
Joined: 22 Jan 05
Posts: 12712
Credit: 1839116161
RAC: 3610

RE: RE: RE: Also I only

Message 89277 in response to message 89276

Given your statement "Also I do have 8 projects installed in my system", you should DEFINITELY have the "Leave applications in memory while suspended?" box checked. You are depending on your system drive to keep all that data while it is doing everything else too, which is not a good scenario. You might also try stopping crunching for some projects and focusing on an elite few. With that many, your PC must be switching from project to project constantly!

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: With this statement by

Message 89278 in response to message 89277

Quote:
With this statement by you too "Also I do have 8 projects installed in my system" you should DEFINITELY have the box checked.


That's only true for projects that don't do checkpointing.

Quote:
You are depending on your system drive to keep all that data while doing everything else too, not a good scenario.


?????????????

Quote:
You also might try stopping crunching for some projects and try to focus on an elite few. With that many your pc must be switching from project to project constantly!


If by 'constantly' you mean every hour (the default preference), then you are right, but I don't see a problem with that.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

mikey
Joined: 22 Jan 05
Posts: 12712
Credit: 1839116161
RAC: 3610

RE: RE: You also might

Message 89279 in response to message 89278

Quote:
Quote:
You also might try stopping crunching for some projects and try to focus on an elite few. With that many your pc must be switching from project to project constantly!

If by 'constantly' you mean every hour (the default preference), then you are right, but I don't see a problem with that.
Regards, Gundolf

Actually, I was thinking of the LTD (long-term debt) you must be accumulating because the many projects all have different individual deadlines. But I guess as long as it is working for you, keep up the crunching!

MacRonin
Joined: 6 Nov 08
Posts: 4
Credit: 242207
RAC: 0

RE: RE: RE: You also

Message 89280 in response to message 89279

While there are 8 projects installed, they are not all active at the same time. One currently has nothing for the Mac. Another comes in spurts, with new data to crunch at the start of each month. The climate one has a 12-month deadline. Other than the originally mentioned error, things are running fine.

BTW, the next work block from E@H seems to have run OK.

ritterm
Joined: 18 Jun 08
Posts: 23
Credit: 46657826
RAC: 0

RE: If it appears often

Message 89281 in response to message 89272

Quote:
If it appears often (about every few minutes), you have to check the log file stderr.txt in the appropriate slots directory for the cause of the error, because after 100 retries, the Task will be aborted.

Is there anything that can be done then if it keeps happening? Here are the kinds of messages that keep showing up in the stderr.txt file when one of my WUs tries to run:

2009-01-15 13:58:39.2820 [normal]: This program is published under the GNU General Public License, version 2
2009-01-15 13:58:39.2820 [normal]: For details see http://einstein.phys.uwm.edu/license.php
2009-01-15 13:58:39.2820 [normal]: This Einstein@home App was built at: Dec 31 2008 04:02:27

2009-01-15 13:58:39.2820 [normal]: Start of BOINC application '..\..\projects\einstein.phys.uwm.edu\einstein_S5R4_6.10_windows_intelx86_2.exe'.
Activated exception handling...
Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No such file or directory

I've suspended this task and am hoping that I haven't lost several hours of work.

Thanks for any ideas! :-)

MarkR
