Noticed the following in the BOINC Event Log.
03/07/2013 03:03:53 | Einstein@Home | Task PA0057_01191_213_5 exited with zero status but no 'finished' file
03/07/2013 03:03:53 | Einstein@Home | If this happens repeatedly you may need to reset the project.
This is repeated several times for this task, which has not finished yet.
What does this actually mean? Is it important? Do I need to reset E@H?
If this is important, the message is well hidden with no highlighted indication there is anything wrong.
BobM
Task xxx exited with zero status but no 'finished' file
I don't know why the task exits, but don't reset the project; the message is very misleading!
Since your computers are hidden, I can't tell much, but perhaps a reboot might help.
Does the progress of the task advance between exits?
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
Apart from these messages, the task appears to progress normally.
BobM
Look here http://einsteinathome.org/node/197031
Check if you have a status.cpt file in the slot subfolder. I suppose you don't.
Depending on your settings, the task will not restart until the calculation is done. Then it can't create the finished file.
There is no status.cpt in the slot sub-folder. However in the stderr.txt file see the highlighted line. (I've deleted non-timed messages for clarity and size).
stderr.txt
Activated exception handling...
[03:11:42][5448][INFO ] Starting data processing...
[03:11:42][5448][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 134 MB (1915 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[03:11:42][5448][INFO ] Using CUDA device #0 "GeForce GT 640" (0 CUDA cores / 0.00 GFLOPS)
[03:11:42][5448][INFO ] Version of installed CUDA driver: 5050
[03:11:42][5448][INFO ] Version of CUDA driver API used: 3020
[03:11:43][5448][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[03:11:43][5448][INFO ] Header contents:
[03:11:44][5448][INFO ] Seed for random number generator is 1081711002.
[03:11:51][5448][INFO ] Derived global search parameters:
[03:11:51][5448][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 255 MB (1794 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 121 MB
[03:16:45][5448][INFO ] Checkpoint committed!
[03:21:46][5448][INFO ] Checkpoint committed!
[03:26:49][5448][INFO ] Checkpoint committed!
[03:31:50][5448][INFO ] Checkpoint committed!
[03:36:52][5448][INFO ] Checkpoint committed!
[03:41:53][5448][INFO ] Checkpoint committed!
Activated exception handling...
[03:46:16][10540][INFO ] Starting data processing...
[03:46:16][10540][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 134 MB (1915 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[03:46:16][10540][INFO ] Using CUDA device #0 "GeForce GT 640" (0 CUDA cores / 0.00 GFLOPS)
[03:46:16][10540][INFO ] Version of installed CUDA driver: 5050
[03:46:16][10540][INFO ] Version of CUDA driver API used: 3020
[03:46:16][10540][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/PA0079_015A1_285.bin4 at template no. 42047
[03:46:17][10540][INFO ] Header contents:
[03:46:17][10540][INFO ] Seed for random number generator is 1081711002.
[03:46:26][10540][INFO ] Derived global search parameters:
[03:46:26][10540][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 255 MB (1794 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 121 MB
[03:51:17][10540][INFO ] Checkpoint committed!
[03:56:20][10540][INFO ] Checkpoint committed!
[04:01:20][10540][INFO ] Checkpoint committed!
[04:06:23][10540][INFO ] Checkpoint committed!
[04:11:24][10540][INFO ] Checkpoint committed!
[04:16:25][10540][INFO ] Checkpoint committed!
[04:21:29][10540][INFO ] Checkpoint committed!
[04:26:33][10540][INFO ] Checkpoint committed!
[04:31:33][10540][INFO ] Checkpoint committed!
[04:36:36][10540][INFO ] Checkpoint committed!
[04:41:36][10540][INFO ] Checkpoint committed!
[04:46:37][10540][INFO ] Checkpoint committed!
[04:51:37][10540][INFO ] Checkpoint committed!
[04:56:41][10540][INFO ] Checkpoint committed!
[05:01:41][10540][INFO ] Checkpoint committed!
[05:06:44][10540][INFO ] Checkpoint committed!
[05:11:44][10540][INFO ] Checkpoint committed!
[05:16:47][10540][INFO ] Checkpoint committed!
[05:21:47][10540][INFO ] Checkpoint committed!
[05:26:50][10540][INFO ] Checkpoint committed!
[05:31:50][10540][INFO ] Checkpoint committed!
[05:36:53][10540][INFO ] Checkpoint committed!
[05:41:53][10540][INFO ] Checkpoint committed!
[05:46:56][10540][INFO ] Checkpoint committed!
[05:51:57][10540][INFO ] Checkpoint committed!
[05:56:59][10540][INFO ] Checkpoint committed!
[06:12:32][10540][INFO ] Statistics: count dirty SumSpec pages 1061 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 1100505
[06:12:32][10540][INFO ] Data processing finished successfully!
[06:12:32][10540][INFO ] Starting data processing...
[06:12:32][10540][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 134 MB (1915 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[06:12:32][10540][INFO ] Using CUDA device #0 "GeForce GT 640" (0 CUDA cores / 0.00 GFLOPS)
[06:12:32][10540][INFO ] Version of installed CUDA driver: 5050
[06:12:32][10540][INFO ] Version of CUDA driver API used: 3020
[06:12:32][10540][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[06:12:32][10540][INFO ] Header contents:
[06:12:33][10540][INFO ] Seed for random number generator is 1084647014.
[06:12:40][10540][INFO ] Derived global search parameters:
[06:12:40][10540][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 256 MB (1793 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 122 MB
BobM
This is usually a symptom of failed communication between the BOINC Client and the project application. If it happens a few times during a task, it usually isn't a problem. If it happens so frequently that tasks are aborted with a "too many exits" error, you should do something about it. However, the message itself is indeed obsolete and misleading; resetting the project rarely helps.
Most such errors these days arise from problems in the "throttling" mechanism. If you told your BOINC Client to "use less than 100% CPU time" in your settings, switch it back to 100%. On today's multi-core machines it's much cleaner and more effective to leave, say, half of the cores unused by BOINC (setting "on multiprocessor systems use at most ...") than to use all cores 50% of the time.
The second most common cause, with older BOINC Clients and older apps, is networking problems on the Client side: while the Client waits for a response from the network (DNS server), it delays communication with the application, leading the application to 'think' that the Client has terminated. You usually find a message in stderr.txt that reads "No heartbeat from core client - exiting". If this happens, you need to fix your network problems.
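To judge whether the exits are "a few times" or "rather frequently", one could count them per task. The following is only a rough sketch, not part of BOINC: it assumes the Event Log has been saved as plain text in the format quoted at the top of this thread, and the function names are made up for illustration. The heartbeat check simply searches stderr.txt for the message text quoted above, whose exact wording may differ between app versions.

```python
import re
from collections import Counter

# Pattern taken from the Event Log lines quoted in this thread.
EXIT_RE = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")
# Message text as quoted in this thread; wording may vary between versions.
HEARTBEAT_MSG = "No heartbeat from core client"


def count_silent_exits(event_log_text):
    """Return a Counter mapping task name -> number of 'no finished file' exits."""
    counts = Counter()
    for line in event_log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def has_heartbeat_loss(stderr_text):
    """Return True if the stderr.txt text contains the heartbeat message."""
    return HEARTBEAT_MSG in stderr_text


# The two Event Log lines from the first post.
sample_log = (
    "03/07/2013 03:03:53 | Einstein@Home | Task PA0057_01191_213_5 "
    "exited with zero status but no 'finished' file\n"
    "03/07/2013 03:03:53 | Einstein@Home | If this happens repeatedly "
    "you may need to reset the project.\n"
)

for task, exits in count_silent_exits(sample_log).items():
    print(task, exits)
```

A task that shows only a handful of such exits is probably fine; a count that keeps climbing is the case worth investigating.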
BM
CPU setting is 100%
Set to use 75% of cores (3 out of 4)
No network problems.
BOINC Client 7.0.64
Thanks for the assistance. I'll just leave everything well alone for the moment.
BobM
I am new to Einstein, but have run World Community Grid for many years. I too am having this problem, but only with Einstein. Running on i7 x64 with 8 threads. Seems as if we are wasting a huge amount of processing time.
7/27/2013 4:35:02 AM | Einstein@Home | Restarting task PA0011_00581_374_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 0
7/27/2013 5:35:30 AM | Einstein@Home | Task PA0011_00581_374_0 exited with zero status but no 'finished' file
7/27/2013 5:35:30 AM | Einstein@Home | If this happens repeatedly you may need to reset the project.
7/27/2013 5:35:30 AM | Einstein@Home | Restarting task PA0011_00581_374_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 0
7/27/2013 5:55:52 AM | Einstein@Home | Task PA0011_00581_374_0 exited with zero status but no 'finished' file
7/27/2013 5:55:52 AM | Einstein@Home | If this happens repeatedly you may need to reset the project.
7/27/2013 5:55:52 AM | Einstein@Home | Restarting task PA0011_00581_374_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 0
7/27/2013 9:33:35 AM | Einstein@Home | Task PA0011_00581_374_0 exited with zero status but no 'finished' file
7/27/2013 9:33:35 AM | Einstein@Home | If this happens repeatedly you may need to reset the project.
7/27/2013 9:33:35 AM | Einstein@Home | Restarting task PA0011_00581_374_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 0
Other forum posts indicate that this is not a problem, and will eventually work itself out, but I hate to waste all this processing time.
Thoughts?
Fred
Broadway, VA
Are you running any software on that host that influences the system clock?
How often does the host connect to a time server?
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
The app should be capable of checkpointing rather frequently, so not much computing time should be wasted that way.
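As a rough way to check this on your own host, the gaps between "Checkpoint committed!" lines in stderr.txt give an upper bound on the work lost per exit. This is only a sketch under assumptions: it relies on the line format shown in the stderr.txt excerpt earlier in this thread, the function name is hypothetical, and it does not distinguish between separate runs of the app, so a gap that spans a restart would be counted too.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[03:16:45][5448][INFO ] Checkpoint committed!"
CHECKPOINT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\[\d+\]\[INFO \] Checkpoint committed!"
)


def checkpoint_gaps(stderr_text):
    """Return the gaps, in seconds, between consecutive checkpoint lines."""
    times = []
    for line in stderr_text.splitlines():
        match = CHECKPOINT_RE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    gaps = []
    for earlier, later in zip(times, times[1:]):
        delta = later - earlier
        if delta < timedelta(0):
            # Assumption: the log crossed midnight between the two lines.
            delta += timedelta(days=1)
        gaps.append(int(delta.total_seconds()))
    return gaps


# Three checkpoint lines from the stderr.txt excerpt posted in this thread.
sample_stderr = (
    "[03:16:45][5448][INFO ] Checkpoint committed!\n"
    "[03:21:46][5448][INFO ] Checkpoint committed!\n"
    "[03:26:49][5448][INFO ] Checkpoint committed!\n"
)

print(checkpoint_gaps(sample_stderr))  # [301, 303]
```

With checkpoints about five minutes apart, as in that excerpt, each exit should cost at most roughly five minutes of computing plus the app's start-up time.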
I suspect your particular problem is related to a bug in the BOINC code that was apparently fixed recently. We are currently testing an app version that incorporates this fix over at Albert@Home. This version should appear here on Einstein in the next few days.
If you really want to avoid this problem altogether, try an older BOINC Client (pre-7.0.40 IIRC). But I think that would not really be worth the effort.
BM
To reply to Gundolf: No, nothing affects the system clock. The time is synchronized with a network time server at least once per day. The system is idle about 22 hours per day, working only on BOINC tasks.
Bernd: Thanks, I'll watch for the fix. Where will the update be made available?
Thanks and Great Day,
fa