Hi,
My problem seems to have started around Jan 23rd 2012, since that is the last time I received any WU credits.
Every one of my WUs seems to be reporting a "no finished file" status. I've looked back through my log and can't find a WU that doesn't show this. For instance:
Tue Feb 7 15:29:21 2012 | Einstein@Home | Restarting task p2030.20111006.G36.59-00.68.S.b6s0g0.00000_80_1 using einsteinbinary_BRP4 version 101
Tue Feb 7 15:29:26 2012 | Einstein@Home | Task p2030.20111006.G36.59-00.68.S.b6s0g0.00000_80_1 exited with zero status but no 'finished' file
Tue Feb 7 15:29:26 2012 | Einstein@Home | If this happens repeatedly you may need to reset the project.
Before this period, I was successfully finishing WUs and getting credits. I've reset the E@H project several times, but these errors still occur. I'm running the Mac version 6.12.35 and using CUDA with my Nvidia graphics card.
Thanks for any advice.
Michael
____________________________________
Since Jan 23rd 2012, no WU credits due to "no finished file" WU
Reboot your system. Your GPU memory is completely filled.
As per http://einsteinathome.org/task/271807687, which constantly says:
Hi,
Thanks for trying to help, but the messages still persist after rebooting and updating to the current CUDA driver version 4.1.28. I also reset the E@H project (again):
Michael
____________________________________
I'm going to try fixing my problem by uninstalling BOINC completely from my iMac, then reinstalling v35 and running E@H without GPU-only options. I can go back to v33, but I've never run GPU-only with it, so I'm not sure how well it would work.
Michael
____________________________________
I don't think there would be any benefit from going with anything other than the recommended version of BOINC. I would also suggest that resetting the project is likely to be a waste of time, as would be uninstalling and reinstalling BOINC itself. The
... exited with zero status but no 'finished' file
also known as
has been around for a very long time and quite often it's not at all clear as to what is causing it. It has happened to me over the years on CPU tasks on various machines and I've always been able to stop it by reducing the overclock and/or improving the CPU cooling efficiency. Never has resetting the project proved to be useful.
My understanding of (a very simplified version of) what happens is this. The BOINC client starts a task by invoking the appropriate science app. The necessary command line for this is extracted from the state file. BOINC follows the progress of the science app by monitoring the exit code of the app when it finishes. When an error occurs, the app will emit a non-zero exit code which BOINC can pass back to the project to help with diagnosis of the problem. BOINC knows there should be an output file of results from the app (the 'finished' file) when the app terminates normally. So if BOINC sees an exit code of zero but can't find the output (finished file), it issues the above first complaint and restarts the app. If the app continues to terminate with a zero exit status, after a set number of these, BOINC will quit, and the above second message will be included in what is sent back to the project.
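The decision logic described above can be sketched roughly as follows. This is only an illustration of the simplified description in this post, not BOINC's actual code: the names `Outcome`, `classify_exit` and `MAX_ZERO_EXITS` are hypothetical, and the limit of 100 is the figure guessed at in this thread, not a confirmed value.

```python
from enum import Enum


class Outcome(Enum):
    FINISHED = "finished"   # zero exit code and the 'finished' file is present
    RESTART = "restart"     # zero exit code, no 'finished' file: try again
    ERROR = "error"         # non-zero exit code: report it to the project
    GIVE_UP = "give_up"     # too many zero exits without a 'finished' file


# Assumption: the thread guesses the limit is 100; the real value may differ.
MAX_ZERO_EXITS = 100


def classify_exit(exit_code, finished_file_exists, zero_exits_so_far,
                  limit=MAX_ZERO_EXITS):
    """Classify one termination of the science app (hypothetical helper).

    zero_exits_so_far counts earlier exits with status zero but no
    'finished' file, not including the current one.
    """
    if exit_code != 0:
        return Outcome.ERROR
    if finished_file_exists:
        return Outcome.FINISHED
    if zero_exits_so_far + 1 >= limit:
        return Outcome.GIVE_UP
    return Outcome.RESTART


if __name__ == "__main__":
    # The situation in the quoted log: exit status zero, no 'finished' file.
    print(classify_exit(0, False, zero_exits_so_far=0))
```

Under these assumptions, a task that keeps exiting with status zero is restarted again and again until the counter reaches the limit, which matches the repeated "Restarting task" entries in the log.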
Looking through your list of tasks, you appear to have two separate situations going on. A number of tasks are exiting and being restarted by BOINC (with the above messages in evidence), but you are eventually aborting them before the set number of exit(0)s is reached - I think it's 100. There are other tasks that error out with a definite non-zero exit code - for example, with the following type of output:
eForce GT 130" (64 CUDA cores / 288.00 GFLOPS)
[13:00:46][79793][INFO ] Version of installed CUDA driver: 4000
[13:00:46][79793][INFO ] Version of CUDA driver API used: 3020
[13:00:46][79793][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
....
[13:00:50][79793][INFO ] Seed for random number generator is -993699923.
[13:00:51][79793][ERROR] Error allocating modulated time offsets device memory: 16777216 bytes (error: 2)
[13:00:51][79793][ERROR] Demodulation failed (error: 1006)!
[13:00:51][79793][WARN ] CUDA memory allocation problem encountered!
------> Returning control to BOINC, delaying restart for at least five minutes...
------> If this problem persists you should consider aborting this task.
[13:07:13][79887][INFO ] Application startup - thank you for supporting Einstein@Home!
[13:07:13][79887][INFO ] Starting data processing...
[13:07:13][79887][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 500 MB (12 MB free / 512 MB total) -> Used by this application (assuming a single GPU task): 0 MB
....
These are quite clearly problems with available GPU memory. In the above example there is only 12 MB free - so the task is put on hold for 5 minutes to see if more memory will become available - and it never seems to.
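To make the arithmetic explicit: the failed allocation in the log asked for 16777216 bytes, which is 16 MB, while the later status line reports only 12 MB free. Here is a minimal sketch for pulling those numbers out of a status line. The regular expression is written against the single line quoted above and the function names are hypothetical; other application versions may format the line differently.

```python
import re

# Matches e.g. "Used in total: 500 MB (12 MB free / 512 MB total)"
MEM_RE = re.compile(
    r"Used in total:\s*(\d+) MB \((\d+) MB free / (\d+) MB total\)"
)


def parse_gpu_memory(line):
    """Return used/free/total MB from a status line, or None if no match."""
    match = MEM_RE.search(line)
    if match is None:
        return None
    used, free, total = (int(group) for group in match.groups())
    return {"used_mb": used, "free_mb": free, "total_mb": total}


def fits_in_free_memory(line, needed_bytes):
    """True if the reported free memory covers needed_bytes, None if unparsable."""
    mem = parse_gpu_memory(line)
    if mem is None:
        return None
    return mem["free_mb"] * 1024 * 1024 >= needed_bytes


if __name__ == "__main__":
    status = ("------> Used in total: 500 MB (12 MB free / 512 MB total) "
              "-> Used by this application (assuming a single GPU task): 0 MB")
    print(parse_gpu_memory(status))
    print(fits_in_free_memory(status, 16777216))
```

Note that the status line and the failed allocation come from two different start attempts a few minutes apart, so this comparison only holds if the free memory was about the same at both times.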
So it would seem that you should set your preferences to not attempt GPU crunching and to do CPU tasks instead - if you wish to.
Cheers,
Gary.
This message appears at the end of every 60-minute session of my Test4Theory@home project, with no ill effect:
11-Feb-2012 09:19:26 [Test4Theory@Home] Task uc_1328182110_36196_0: no shared memory segment
11-Feb-2012 09:19:26 [Test4Theory@Home] Task uc_1328182110_36196_0 exited with zero status but no 'finished' file
11-Feb-2012 09:19:26 [Test4Theory@Home] If this happens repeatedly you may need to reset the project.
Tullio
Gary,
Thanks for the detailed reply. Unfortunately I read your post too late; I completely wiped my installation of BOINC and installed v35 again. Oh well.
I changed my E@H prefs to use both the CPU and GPU, and I should see shortly if I can process successfully with the defaults. If that works, I'll try GPU-only like I had before.
Michael
____________________________________
Well, I succeeded in getting everything back to square 1. I can run the Gamma Ray Pulsar search just fine, but the Binary Radio Pulsar search is back to its stalled state, so - as Gary said - uninstalling and reinstalling the BOINC client did nothing.
The only other thing I can think of is CUDA. Maybe the current version is the problem. I'm not sure whether downgrading to an earlier version would help. I would really like to do GPU work if I can (and I know I can, since those WUs worked before).
Michael
____________________________________
After I left BOINC running undisturbed all night, the GPU WUs ended up aborting with only 3 or 4 seconds used.
To test my CUDA theory, I downgraded CUDA from 4.1.28 to 4.0.21, which I had run before in July 2011. Hopefully this version will free me of my GPU memory problem. I should know in a couple of days.
Michael
____________________________________
Hmmmm. It looks like I won't get any work for E@H until tomorrow, since the log shows that I've already processed 8 tasks for the day. I can't understand how that number is calculated, since as far as I can tell I've processed only 4 tasks today (2 complete, 2 aborted).
Michael
____________________________________
Belay that last remark. Lots of tasks just showed up in my queue. Then what does that log entry mean?
____________________________________