3/22/2010 10:14:45 AM Einstein@Home Output file p2030_53611_01579_0009_G36.03+00.54.N_2.dm_20_1_0 for task p2030_53611_01579_0009_G36.03+00.54.N_2.dm_20_1 absent
3/22/2010 10:14:45 AM Einstein@Home Output file p2030_53611_01579_0009_G36.03+00.54.N_2.dm_20_1_1 for task p2030_53611_01579_0009_G36.03+00.54.N_2.dm_20_1 absent
3/22/2010 10:14:45 AM Einstein@Home Output file p2030_53611_01579_0009_G36.03+00.54.N_2.dm_20_1_2 for task p2030_53611_01579_0009_G36.03+00.54.N_2.dm_20_1 absent
3/22/2010 10:14:45 AM Einstein@Home Output file p2030_53611_01579_0009_G36.03+00.54.N_2.dm_20_1_3 for task p2030_53611_01579_0009_G36.03+00.54.N_2.dm_20_1 absent
I have been getting this result consistently for the ABP2 work units. I have reset the project, downloaded the BOINC software again, and nothing has helped. I have no idea why I'm having this problem with only these WUs and not the others. Is there some sort of fix I'm unaware of? I am not a BOINC expert, so if anyone has a solution or suggestion, please phrase it so I can understand it.
THE MOTHER OF FOOLS IS ALWAYS PREGNANT
What's the Cure?
Hmmm. 167296238
How do you get 'too many exit(0)s' in 0 seconds?
[Sorry, jacklass1, that's a question for other potential helpers - indicating that you've set us an "interesting", i.e. tough, question. Hopefully the answer will be easier to understand, but it may take us a while to find it.]
OK, I'm game ....
- exit() is a language call for program termination with an error code.
- exit(0) is a terminate returning a code of zero.
- traditionally zero means 'no problem' or 'success' that will be read ( probably ) by whatever called the program in the first place.
- it looks like the BOINC client ( version 6.10.18 in this case ) was the program invoking the one that exited ( evidently an E@H application - STSP )
- so this is reported as happening too many times in no time at all !?!?
- there must be a counter reflecting that ( number of times that is excessive )
- someone has used/nominated an integer type for that count
- but has mixed up a signed and an unsigned comparison. E.g. what is 255 as an unsigned integer byte is -1 as a signed integer byte.
- and/or hasn't initialised the counter prior to use, hence it didn't start at zero but rather any old value ( depending on memory contents prior to load ).
- tested that value ( prior to application program invocation actually ) in a conditional construct ( test before body/block is executed ), so that it errors out quick slick.
Thus I hypothesise a programming boner in the BOINC client of that version, possibly also an issue with compiler switches for a given target system. In C/C++ for instance ( my guess at the BOINC source code language ) the type 'char' without other qualification can be signed or unsigned, depending on the compiler and target ( plain 'int' is always signed ).
[ Always initialise your variables. If you want a certain data type then say so, don't assume. ]
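To make the signed/unsigned point concrete, here is a small sketch in Python rather than C. It only illustrates how one bit pattern reads two ways and how that flips a limit test; `MAX_EXITS` and `too_many` are made-up names for the illustration, not anything taken from the BOINC source.

```python
# Illustration only: read the same 8-bit pattern as unsigned and as signed.
raw = bytes([0xFF])  # one byte with every bit set

as_unsigned = int.from_bytes(raw, byteorder="big", signed=False)
as_signed = int.from_bytes(raw, byteorder="big", signed=True)

MAX_EXITS = 100  # hypothetical limit, not taken from the BOINC source


def too_many(count: int) -> bool:
    """Return True when the counter exceeds the (hypothetical) limit."""
    return count > MAX_EXITS


print(as_unsigned, too_many(as_unsigned))  # 255 True
print(as_signed, too_many(as_signed))      # -1 False
```

So a counter holding that bit pattern trips the limit at once if it is read as unsigned, and never trips it if it is read as signed.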
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
You have to see it in the context of error -226.
If the app can't even reach its first checkpoint, and that happens 99 times before we exit the application, the CPU time is effectively zero.
This error usually happens when something external is locking the BOINC Data directory and sub-directories, like an anti-virus program or anti-spyware program. Advice is to exclude the BOINC Data directory completely, or only do active scans on the system when BOINC isn't running.
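The restart behaviour described above can be sketched roughly as follows. This is a toy model, not the BOINC client's code: `run_until_done`, `blocked_app`, and the limit of 100 are assumptions for illustration, as is the rule that CPU time is only credited once a checkpoint is written.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

MAX_ZERO_EXITS = 100  # hypothetical limit for this sketch


@dataclass
class TaskResult:
    status: str
    restarts: int
    cpu_seconds: float


# A run returns (exit_code, finished, cpu_seconds_saved_by_checkpoints).
AppRun = Callable[[], Tuple[int, bool, float]]


def run_until_done(app: AppRun, max_zero_exits: int = MAX_ZERO_EXITS) -> TaskResult:
    """Restart `app` each time it exits with 0 without having finished."""
    restarts = 0
    cpu_seconds = 0.0
    while True:
        exit_code, finished, checkpointed_cpu = app()
        cpu_seconds += checkpointed_cpu  # only checkpointed work is credited
        if finished:
            return TaskResult("success", restarts, cpu_seconds)
        if exit_code != 0:
            return TaskResult("app error", restarts, cpu_seconds)
        restarts += 1
        if restarts >= max_zero_exits:
            return TaskResult("too many exit(0)s", restarts, cpu_seconds)


def blocked_app() -> Tuple[int, bool, float]:
    """Pretend the app quits cleanly before reaching its first checkpoint."""
    return 0, False, 0.0


result = run_until_done(blocked_app)
print(result)
```

In this model a task that keeps quitting before its first checkpoint ends up with the "too many exit(0)s" status and zero recorded CPU time, which matches what the task page shows.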
The inevitable Windows OS answer is to shut down the PC and then reboot. Probably after dealing with the point Jord made.
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!
Well done! I had a nice theory for the ten minutes it lasted! :-) :-)
Cheers, Mike.
Yes, but the only problem I see with my answer is that it only happens with his ABP2, not with GCE/S5R6.
I didn't follow everything here, but does ABP2 come in ATI flavor as well?
I don't think that's a problem at all. His security software is upset only with something in ABP2. I don't know how AV/Security stuff works but could it be that an ABP2 file is being permanently locked, rather than just drive-by scanning inserting a temp lock? The file can still be seen but never can be opened?
As for an ATI flavor of ABP2: no.
@ Jack Lass - can you temporarily disable your security software to see if ABP2 tasks can then complete? If so, and they do complete, you will need to investigate how to get your security software to stop interfering with ABP2 stuff. It might be to reconfigure your security software to advise it that ABP2 files are OK. There should be logs somewhere in your security system that tell you what particular file it is unhappy with. So rather than disabling anything, first see if you can find that log information (try reading the docs that came with your software) and once you find the offending file listed in the logs, read up on how to tell your security software that the 'problem' file is actually OK.
EDIT: On re-reading your original post, the error is that the output file is missing. Perhaps your security software is deleting the output file as soon as it is created and so the science app is continually being restarted right from the beginning until there are too many of these restarts.
Cheers,
Gary.
Unlikely. If you look at host 2262468, where I got the example task from, the time interval between tasks isn't enough to iterate the full run 100 times with a file deletion between each run.....
It's a problem - like many others - with recent BOINC versions: they report the consequences of an error (the expected output files didn't exist), but they're too coy to actually say there was an error in the first place. Trac ticket #985 relates.
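As a sketch of the reporting I would rather see: state the underlying failure first, then list the missing files. The function name `report_task` and the wording of the first message are invented for illustration; only the "Output file ... absent" line copies the format from the log at the top of this thread, and -226 is the error number mentioned earlier.

```python
import tempfile
from pathlib import Path
from typing import List


def report_task(task: str, expected: List[str], slot_dir: Path, exit_status: int) -> List[str]:
    """Build log lines: the cause first, then each missing output file."""
    lines = []
    if exit_status != 0:
        # Hypothetical message wording, not what the BOINC client prints.
        lines.append(f"Task {task} failed with status {exit_status}")
    for name in expected:
        if not (slot_dir / name).exists():
            lines.append(f"Output file {name} for task {task} absent")
    return lines


task_name = "p2030_53611_01579_0009_G36.03+00.54.N_2.dm_20_1"
expected_files = [f"{task_name}_{i}" for i in range(4)]

with tempfile.TemporaryDirectory() as tmp:
    # Empty directory, so every expected output file is missing.
    log_lines = report_task(task_name, expected_files, Path(tmp), exit_status=-226)

for line in log_lines:
    print(line)
```

With the cause printed first, the four "absent" lines read as a consequence rather than as the whole story.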
Just got a nudge from ZZUBYTTIHS in the next thread. Could this be our old friend the clunky thermal throttling back again? To be serious for a moment, the OP's actual problem seems to be that the task started 100 times, but never got far enough into the task to (a) start the CPU time counter, or (b) post any application startup stderr_out messages. That could be AV locking - though I would be surprised, and worried, if that came back as an exit(0) - or it could be BOINC's own stop/start.
jacklass1, if you look at your Computing preferences page, what does it say for the very bottom item in the first section: Use at most (Can be used to reduce CPU heat)?
If it's anything less than 100 percent of CPU time, try turning it up to 100 and see if that makes any difference.
And yet exit(0) ought to mean a happy exit .... or is the general 'non-zero values are true' boolean rule equating to a 'false' message here? Are we sure of the exit(0) semantics in this case? Maybe it's just too many exits per se, regardless of the return code.
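For what it's worth, the usual convention is easy to check from a parent process. The sketch below uses Python's `subprocess` to launch a child that calls `sys.exit()`; the helper name `exit_code_of` is made up, and this says nothing about how the BOINC client itself interprets the code.

```python
import subprocess
import sys


def exit_code_of(code: int) -> int:
    """Run a child Python process that calls sys.exit(code); return its exit status."""
    completed = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return completed.returncode


ok = exit_code_of(0)
failed = exit_code_of(3)

# Success means "exit status equals zero", the opposite of the usual
# "non-zero is true" boolean reading.
print(ok, ok == 0)             # 0 True
print(failed, failed == 0)     # 3 False
print(bool(ok), bool(failed))  # False True
```

So a caller has to test for equality with zero; treating the status as a boolean gives the reverse answer.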
Cheers, Mike.