Credit granted for client error and zero work seconds??? Why?

Brian D from Georgia
Joined: 17 Apr 05
Posts: 4
Credit: 369354
RAC: 0
Topic 190364

Interesting result here that dragged down my granted credit due to a zero score being used in the formula. Here is the work unit link:
2848118

Granted credit should have been 117.71, not 43.92. That is a significant difference. Specific error message on the zero time work unit shows:

stderr out

5.2.7

Can't set up shared mem: -1
: bytecount 1736125 checksum 83163978
: bytecount 1988941 checksum 95315179

Is this a logic fault in the BOINC client that would give credit on an errored result?

Michael Roycraft
Joined: 10 Mar 05
Posts: 846
Credit: 157718
RAC: 0

Brian,

I think this is an example of a validator error - very rare, but it does happen occasionally (I think Dr. Allen said something about either 0.1% or 0.01%).

Good thing that optimized client on your machines helps make up for that, huh? :-)

Regards,

Michael

microcraft
"The arc of history is long, but it bends toward justice" - MLK

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Not a validator error.

The validator ran on the first three results that arrived. One was invalid. The other two results were valid. Credit assigned to canonical result was the smaller of the credits claimed by the two valid results (43). Then, when the fourth result arrived, it was found valid and given the same credit (43) as the previous two valid results.

[Edited one day later]
As has been pointed out, none of the results was in fact 'invalid'. However, due to a BOINC client bug, one of them claimed zero credit. The remainder of what I have written above is correct. The validator did its job correctly.

Director, Einstein@Home

Michael Roycraft
Joined: 10 Mar 05
Posts: 846
Credit: 157718
RAC: 0

Message 21378 in response to message 21377

Dr. Allen,

Thank you for the explanation. Maybe I'm a bit thick, but I still can't understand that WU. There was NO invalid result! By all appearances there should have been, because I can't wrap my mind around how in heck a host returns a result with zero time and zero credit and yet is declared "valid". Unless there was maybe an error in the reported time only? There seems to have been enough turn-around time on that host. Could that be the case? I'm curious about this.

Regards,

Michael

(edit for phrasing and clarity)

microcraft
"The arc of history is long, but it bends toward justice" - MLK

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

The zero credit comes from the zero CPU time. It doesn't mean that Einstein didn't spend any time crunching the workunit, it means that it wasn't able to report what it did use.

If you look at the result you'll see the message "Can't set up shared mem: -1". And the output goes on to show the bytecounts and checksums, which match the other results. So it's apparent that it did do the work.

Michael Roycraft
Joined: 10 Mar 05
Posts: 846
Credit: 157718
RAC: 0

Message 21380 in response to message 21379

Walt,

Thank you. It sounds just about as I'd imagined, faulty time reported.

microcraft
"The arc of history is long, but it bends toward justice" - MLK

Brian D from Georgia
Joined: 17 Apr 05
Posts: 4
Credit: 369354
RAC: 0

I understand everyone's explanation on what caused the granted credit. The validator did its job since it had 3 "good" results to arrive at its average value and hence give credit. Common sense would tell me though that the zero credit work unit return should be thrown out even if the result is "valid". It makes no logical sense to me why that result would be used in the averaging. I think the client should not return a result as "valid" if there is an error involved regardless of what caused the error. Why would you want to use any part of an errored work unit? That just doesn't make sense to me.

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 21382 in response to message 21381

I think it would be very unfair if the result were tossed just because of the zero credit. It's not an "errored work unit".

There wasn't an error in crunching the workunit; it completed successfully and returned a valid result. The error was in communications between BOINC and the Einstein application, which is completely separate from the crunching (two different threads in the process, even). It's the piece that tells BOINC how much work is done so far and how much CPU time it used to get there.

Normally, BOINC would detect the lack of communication and restart the workunit. But for some reason that didn't happen, and the application went on and produced a result but wasn't able to report the CPU time used.

Some things to think about:

If the validator didn't include "zero requested credit" results in calculating the "average", it would have two numbers to pick from. And that would still be the lower of the two remaining results, or 43.92.

If the host had been another Mac or any other host with low "requested credit" (like a Linux based system), it still would have granted around 40 credits.

Some hosts request high credit; is that a problem? Because of the disparity, the high and low numbers are tossed, which is what happened here. But for some workunits there aren't any Macs or Linux systems to drag the average down. Instead there are hosts that drag the average up. My view is this: it shows the system is working. Sometimes you gain a little and sometimes you lose a little. And it's the science that's important.
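
For anyone who wants to see the rule in code, here is a rough sketch of the credit grant as it is described in this thread: with two valid results the lower claim is granted, and with three or more the high and low claims are tossed and the rest averaged. The function name `granted_credit` is made up for illustration, and the 117.71 claim is only assumed to have been among the first three results; this is not the actual validator source.

```python
def granted_credit(claimed):
    """Credit granted to every valid result, per the rule described in this thread.

    `claimed` is a list of the credits claimed by the valid results.
    Hypothetical helper for illustration only, not BOINC validator code.
    """
    if not claimed:
        raise ValueError("need at least one valid result")
    ordered = sorted(claimed)
    if len(ordered) <= 2:
        # One result: nothing to compare. Two results: grant the lower claim.
        return ordered[0]
    # Three or more: toss the highest and lowest claims, average what is left.
    trimmed = ordered[1:-1]
    return sum(trimmed) / len(trimmed)


# A zero claim, a low claim and a high claim: the middle one is granted.
print(granted_credit([0.0, 43.92, 117.71]))
# Ignoring the zero claim still leaves the lower of the remaining two.
print(granted_credit([43.92, 117.71]))
```

Either way the grant comes out the same here, which is the point of the first item above.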

Walt

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

There have been some optimized BOINC versions that sometimes show 0 Cobblestones and Whetstones, accidentally. The data gets crunched correctly, but all the numbers * 0 = 0, no matter what. Why penalize them for doing the work? Yes, less credit may be given because of it, but at least the people get their credit for the work done.
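
To illustrate the "times zero" point, here is a minimal sketch that assumes the benchmark-based claim works roughly like this: 100 credits per day of CPU time on a host whose Whetstone and Dhrystone benchmarks average 1000. The function name, the constants and the sample numbers are assumptions for illustration, not taken from the BOINC source.

```python
SECONDS_PER_DAY = 86400.0


def claimed_credit(cpu_seconds, whetstone_mflops, dhrystone_mips):
    """Claimed credit as a product of CPU time and benchmark speed.

    Assumed formula: 100 credits per CPU-day on a host whose two
    benchmarks average 1000. Hypothetical helper, not BOINC client code.
    """
    benchmark_avg = (whetstone_mflops + dhrystone_mips) / 2.0
    return (cpu_seconds / SECONDS_PER_DAY) * 100.0 * (benchmark_avg / 1000.0)


# Real work was done, but zero CPU time was reported: the claim is zero.
print(claimed_credit(0, 1500, 2500))
# CPU time was reported, but both benchmarks read zero: the claim is still zero.
print(claimed_credit(36000, 0, 0))
```

Because the claim is a product, a zero in either the reported CPU time or the benchmarks zeroes the whole claim, even though the result itself can still validate.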

Brian D from Georgia
Joined: 17 Apr 05
Posts: 4
Credit: 369354
RAC: 0

To Pooh and Walt: Your explanations clear things up for me and I am now in agreement with your logic. I now understand how the situation developed. The host computer did return a valid result but the error caused a request for zero credit. Since E@H uses the average of 3 validated results, the high and low are tossed (just like Olympic diving to digress for a moment). It is fair for them to claim their share of the credit pie.

I agree that science is the important thing being done here with the credit just a flashy reward for our cobblestone egos... Live long and crunch!
