Computation Error

Jaranth
Joined: 25 Jun 08
Posts: 2
Credit: 356443
RAC: 0
Topic 193865

Hi, just thought I'd report this, for whatever good it might do.

Computation error in BOINC for:
Hierarchical all-sky pulsar search 6.04
h1_1092.85_S5R4__1058_S5R4a_1

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

Computation Error

Can you report it so we can see the stderr.txt?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5887
Credit: 119456912058
RAC: 25973233

By "report it" Jord means "hit the update button in BOINC Manager because at the moment the result has not been reported back to the server so we can't see any details yet about the error message that will have been sent when the result was uploaded."

Cheers,
Gary.

Jaranth
Joined: 25 Jun 08
Posts: 2
Credit: 356443
RAC: 0

RE: By "report it" Jord

Message 84696 in response to message 84695

Quote:
By "report it" Jord means "hit the update button in BOINC Manager because at the moment the result has not been reported back to the server so we can't see any details yet about the error message that will have been sent when the result was uploaded."

Thanks for the explanation... unfortunately, I'm a Boinc noob and didn't know that clicking update sent the damaged unit back to you, and now it's gone. :( Perhaps Boinc sent it back automatically?

Well... live and learn, I'll remember next time!

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

RE: Perhaps Boinc sent it

Message 84697 in response to message 84696

Quote:
Perhaps Boinc sent it back automatically?


BOINC reports automatically upon the next contact with the server.

Now the problem here is that the task has already been purged from the database, so we can't check it in your list. So... let's wait for the next one.

When it happens again, please post the messages from your Messages tab in BOINC Manager, as we can learn something from those as well.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5887
Credit: 119456912058
RAC: 25973233

RE: RE: Perhaps Boinc

Message 84698 in response to message 84697

Quote:
Quote:
Perhaps Boinc sent it back automatically?

BOINC reports automatically upon the next contact with the server.

Now the problem here is that the task has already been purged from the database ...

No it hasn't. It's still there, showing exit code -226 and the message "Too many harmless exits". Further down, near the bottom of stderr.out, you can see where the program was repeatedly starting and stopping, unable to acquire a lock file.

Maybe the OP did some sort of cleanup in the BOINC folder and removed files that shouldn't be removed or perhaps something like an antivirus program did something that upset BOINC.

@Jaranth - returning of results is a two-stage process. Firstly, results are uploaded as soon as crunching of a task finishes. At a later stage (usually the next time BOINC needs to request work) the fact that the upload has occurred is "reported" to the server. This happens automatically, but you can intervene manually (via the update button) when you need to report urgently - as in this case, since we needed to see the error message.
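To make the two stages concrete, here is a minimal sketch in Python. It is a toy model only: the `Task` class and the `finish` and `contact_scheduler` functions are hypothetical names made up for illustration, not BOINC's real data structures or API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Toy client-side task state (hypothetical, not BOINC's real structure)."""
    name: str
    uploaded: bool = False
    reported: bool = False


def finish(task):
    # Stage 1: output files are uploaded as soon as crunching ends.
    task.uploaded = True


def contact_scheduler(tasks):
    # Stage 2: on the next scheduler contact (a work request or a manual
    # "Update"), every uploaded-but-unreported task gets reported.
    newly_reported = []
    for t in tasks:
        if t.uploaded and not t.reported:
            t.reported = True
            newly_reported.append(t.name)
    return newly_reported


task = Task("h1_1092.85_S5R4__1058_S5R4a_1")
finish(task)
visible_before = task.reported            # uploaded, but server shows no details yet
reported_now = contact_scheduler([task])  # the "Update" button forces this step
visible_after = task.reported
print(visible_before, reported_now, visible_after)
```

Between `finish` and `contact_scheduler` the task is uploaded but not yet reported, which is the window in which nobody else can see the error details on the website.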

Cheers,
Gary.

Byron S Goodgame
Joined: 16 Jan 06
Posts: 187
Credit: 56581
RAC: 0

Looks like similar messages I had just the other day when my WU did a client error on me. http://einsteinathome.org/task/104127185

It hasn't repeated itself yet.
And here's the one Jaranth reported.

http://einsteinathome.org/task/103605812

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

RE: RE: RE: Perhaps

Message 84700 in response to message 84698

Quote:
Quote:
Quote:
Perhaps Boinc sent it back automatically?

BOINC reports automatically upon the next contact with the server.

Now the problem here is that the task has already been purged from the database ...

No it hasn't.


OK, I swear that at the time I looked he had 2 results there that were still busy. Neither showed as an error in any way. (Which is correct, as I posted at 21:10:03 UTC and it was reported at 21:13:28 UTC.)

Quote:
Maybe the OP did some sort of cleanup in the BOINC folder and removed files that shouldn't be removed or perhaps something like an antivirus program did something that upset BOINC.


No, then you run into a different error altogether.
This is the infamous "exited with zero status" problem, where the checkpoint cannot be written. After 100 times of this happening, BOINC will now exit-kill the application. Go count the number of times it happens in there, I bet it's 100. ;-)
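If you'd rather not count by hand, a small Python sketch along these lines would do it. The function names, the default search phrase and the sample text are all assumptions for illustration; the exact wording in a real stderr may differ, so adjust `phrase` to match what you actually see.

```python
MAX_HARMLESS_EXITS = 100  # the limit quoted in the post above, taken as given


def count_occurrences(log_text, phrase="exited with zero status"):
    """Count log lines that contain `phrase`, ignoring case."""
    needle = phrase.lower()
    return sum(1 for line in log_text.splitlines() if needle in line.lower())


def hit_exit_limit(log_text, limit=MAX_HARMLESS_EXITS):
    """True if the phrase shows up at least `limit` times."""
    return count_occurrences(log_text) >= limit


# Made-up sample text, NOT a real BOINC log.
sample = "\n".join(["task exited with zero status"] * 100 + ["some unrelated line"])
print(count_occurrences(sample), hit_exit_limit(sample))
```

To use it on a real task, paste the stderr text from the task page into a string (or read it from a saved file) and pass it in place of `sample`.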

This problem can happen when you have multiple CPUs/cores or use Hyperthreading and you allow BOINC to throttle the CPUs. It can be overcome by not using the BOINC throttle (leaving it set at 100%), or by telling BOINC to use one CPU only. No, it's not fixed in BOINC 6, as it's a bug with the throttling, which is unchanged in 6.

Thunder
Joined: 18 Jan 05
Posts: 138
Credit: 46754541
RAC: 0

Jord,

Is there any chance that selecting "Leave applications in memory while suspended?" would make a difference with this problem?

I ask only because I have used the CPU throttling on just one machine, but I never had this problem (that I know of), and the only non-standard setting that I can think of was the "leave in memory" thing.

Of course I have no idea if the client actually exits the application at any point with throttling (though I tend to doubt it), so this may be completely unrelated and I was simply lucky in the past.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

Message 84702 in response to message 84701

When BOINC uses the CPU throttling, it won't exit the applications in use, just pause them, run them, pause them, run them, etc. I don't think it can read the applications back into memory within the two seconds of pausing/running; it's not that fast.

Would the setting make a difference? I don't know. A group of people using the throttling option should test that.
