I've been watching the similar threads, but none seem to be like this. BOINC starts the job OK. About 7 minutes in, it creates a "cpt" (checkpoint) file, which is fine. Anywhere from 30 minutes to 9 hours later I get an "output file nnn_0 for nnn absent" error (nnn being the current, now dead, job). SETI runs just fine. It happens at any time of day. I don't suspect a conflict with the anti-virus (Comodo) or Windows Defender, as both exclude the whole BOINC sub-directory. Active-X is version 10. There are no errors, other than the above, in stderrdae, stdoutgui, and stdoutdae, except for SETI comm fails, when they compress.
I forgot to mention: when it runs OK, it runs for days or weeks, until it doesn't! Then it won't complete, no matter what I do, for days or weeks, until it starts working again!! AAARGH! I've reset the project and tested the memory. It appears to write to the disk just fine. I don't have a temperature problem. For 23 hours a day it's the only thing running, and it usually croaks when I'm not on. Is there possibly a data-pattern-specific bug in the code?
I'm running:
OS Name Microsoft Windows XP Professional
Version 5.1.2600 Service Pack 3 Build 2600
System Manufacturer P4i6G
System Model P4i65G
System Type X86-based PC (Single processor, 2 stacks)
Processor x86 Family 15 Model 3 Stepping 3 GenuineIntel ~2999 Mhz
BOINC has been 3 different versions, now 6.4.7. The screensaver is turned off.
I have snapshots of the std.. and cpt files and directory listings of the slot if required.
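For anyone wanting to line up the failures against time of day or other events, a minimal Python sketch for pulling the "absent" messages out of a saved log might look like the following. The message shape is assumed from the error quoted above, and the function names (`find_absent_outputs`, `scan_file`) are made up for illustration; the real log lines may carry extra fields or different wording, so the pattern may need adjusting.

```python
import re

# Assumed message shape, taken from the error quoted in this post:
#   "output file <name> for <task> absent"
# Real BOINC log lines may differ; adjust the pattern if so.
ABSENT_RE = re.compile(r"[Oo]utput file (\S+) for (\S+) absent")


def find_absent_outputs(lines):
    """Return (line_number, output_file, task) for each matching line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ABSENT_RE.search(line)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits


def scan_file(path):
    """Scan a saved log file (hypothetical helper) and return the hits."""
    with open(path, errors="replace") as handle:
        return find_absent_outputs(handle)
```

Calling `scan_file("stdoutdae.txt")` (assuming the daemon log was saved under that name) returns one tuple per failed task, which can then be compared by hand against the timestamps on the same log lines.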
Yet another "output file absent" problem
I run about 170 machines and I've seen numerous examples of problems very similar to this, over quite a considerable time. The science app is highly optimised and is particularly sensitive to hardware instabilities, so it's not surprising that it affects E@H and not Seti in your case. I have a number of machines that run both projects and, in my case, it's always the E@H task that fails rather than Seti. Seti still seems to be able to run on quite dodgy hardware.
Here is a list of causes of this type of problem that I've identified, roughly in order of frequency:-
* CPU fan not running at full speed (dry bearings)
* Flaky motherboard (check for swollen capacitors)
* Flaky PSU (check for swollen capacitors)
* Unstable overclock
* Flaky RAM
* Other random hardware issue
Perhaps, in much earlier days, some of these failures were due to software bugs. That may even still be possible but whenever I now see the problem, I can always find a hardware issue to explain it. The problem invariably disappears when I fix the hardware. As an example I have done about 20 separate motherboard repairs (replacing one or more obviously swollen caps) and in all cases the machines were put back into successful production. As I run a lot of 2001 - 2004 vintage machines, the swollen caps issue is not at all surprising.
Cheers,
Gary.
Thanks Gary
This motherboard is actually a replacement for one I had that had the capacitor problem. I guess I'll have to have a closer look at the capacitors and the fans. Maybe I'll just change them out on spec. I have found cases where they don't look bad but are. Won't do any harm.
Thanks for the quick reply.