Interesting result here that dragged down my granted credit, because a zero score was used in the credit formula. Here is the work unit: 2848118
Granted credit should have been 117.71, not 43.92; that is a significant difference. The specific error message on the zero-time result shows:
stderr out
5.2.7
Can't set up shared mem: -1
: bytecount 1736125 checksum 83163978
: bytecount 1988941 checksum 95315179
Is this a logic fault in the BOINC client that would give credit on an errored result?
Credit granted for client error and zero work seconds??? Why?
Brian,
I think this is an example of a validator error - very rare, but it does happen (I think Dr. Allen once quoted a rate of either 0.1% or 0.01%).
Good thing that optimized client on your machines helps make up for that, huh? :-)
Regards,
Michael
microcraft
"The arc of history is long, but it bends toward justice" - MLK
Not a validator error. The
Not a validator error.
The validator ran on the first three results that arrived. One was invalid; the other two were valid. The credit assigned to the canonical result was the smaller of the credits claimed by the two valid results (43). Then, when the fourth result arrived, it was found valid and given the same credit (43) as the previous two valid results.
[Edited one day later]
As has been pointed out, indeed none of the results was 'invalid'. However, due to a BOINC client bug, one of them claimed zero credit. The remainder of what I have written above is correct. The validator did its job correctly.
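For anyone curious, here is a minimal sketch of the grant rule described above, assuming a simple quorum structure. The types and function are hypothetical illustrations, not the actual BOINC validator code:

// Hypothetical sketch, not the real BOINC validator: among the
// results the validator accepts, the canonical result is granted the
// smallest claimed credit, and any result validated later receives
// that same grant.
#include <algorithm>
#include <vector>

struct Result {
    bool valid;            // did it pass validation?
    double claimed_credit; // credit the host asked for
};

double grant_for_quorum(const std::vector<Result>& quorum) {
    double grant = 0.0;
    bool seen_valid = false;
    for (const Result& r : quorum) {
        if (!r.valid) continue;  // invalid results never influence credit
        grant = seen_valid ? std::min(grant, r.claimed_credit)
                           : r.claimed_credit;
        seen_valid = true;
    }
    return grant;  // a late-arriving valid result gets this same value
}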
Director, Einstein@Home
RE: Not a validator
Dr. Allen,
Thank you for the explanation. Maybe I'm a bit thick, but I still can't understand that WU. There was NO invalid result! By all appearances there should have been, because I can't wrap my mind around how in the heck a host returns a result with zero time and zero credit and is still declared "valid" - unless maybe the error was in the reported time only? There seems to have been enough turn-around time on that host. Could that be the case? I'm curious about this.
Regards,
Michael
(edit for phrasing and clarity)
microcraft
"The arc of history is long, but it bends toward justice" - MLK
The zero credit comes from
The zero credit comes from the zero CPU time. It doesn't mean that Einstein didn't spend any time crunching the workunit; it means that it wasn't able to report how much time it did use.
If you look at the result you'll see the message "Can't set up shared mem: -1". The output goes on to show the bytecounts and checksums, which match the other results, so it's apparent that the host did do the work.
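To make that concrete, here is a rough sketch of why a zero CPU time forces a zero claim under the old benchmark-based scheme. The function name and the constant are illustrative assumptions, not the real BOINC formula:

// Illustrative sketch, not the actual BOINC source: the claim scales
// with reported CPU time, so a reported time of zero yields a zero
// claim no matter how much work the science application really did.
double claimed_credit(double cpu_time_sec,        // time reported to the core client
                      double benchmark_gflops) {  // host benchmark speed
    const double kCreditPerGflopsDay = 100.0;     // illustrative constant only
    return (cpu_time_sec / 86400.0) * benchmark_gflops * kCreditPerGflopsDay;
}
// claimed_credit(0.0, 2.5) == 0.0 -- the zero claim seen in this workunit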
RE: The zero credit comes
Walt,
Thank you. It sounds just about as I'd imagined: faulty time reporting.
microcraft
"The arc of history is long, but it bends toward justice" - MLK
I understand everyone's
I understand everyone's explanations of what caused the granted credit. The validator did its job, since it had three "good" results from which to arrive at its average value and hence grant credit. Common sense tells me, though, that the zero-credit return should be thrown out even if the result is "valid". It makes no logical sense to me why that result would be used in the averaging. I think the client should not return a result as "valid" if there is an error involved, regardless of what caused the error. Why would you want to use any part of an errored work unit? That just doesn't make sense to me.
RE: I understand everyone's
I think it would be very unfair if the result were tossed just because of the zero credit. It's not an "errored work unit".
There wasn't an error in crunching the workunit; it completed successfully and returned a valid result. The error was in communication between BOINC and the Einstein application, which is completely separate from the crunching - two different threads in the process, even. That channel is the piece that tells BOINC how much work is done so far and how much CPU time was used to get there.
Normally, BOINC would detect the lack of communication and restart the workunit. But for some reason that didn't happen; the application went on and produced a result, but wasn't able to report the CPU time used.
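To illustrate that failure mode, here is a hedged sketch; every name in it is a stand-in, not the real BOINC or Einstein API:

// Sketch only: the science code and the status reporting are
// independent, so if the shared-memory channel to the core client
// can't be set up, the computation still finishes, but the CPU time
// never gets reported.
#include <cstdio>

// Stand-in for the real shared-memory attach; returning false here
// mimics the "Can't set up shared mem: -1" failure in the log.
static bool attach_shared_mem() { return false; }

// Stand-in for the science computation, which is unaffected.
static void crunch_workunit() { /* ... hours of real work ... */ }

int main() {
    if (!attach_shared_mem()) {
        std::fprintf(stderr, "Can't set up shared mem: -1\n");
    }
    crunch_workunit();  // completes and produces a valid result
    // With no shared-memory channel, the CPU time is never passed to
    // the core client, so the result comes back with cpu_time = 0 and
    // therefore a zero credit claim.
    return 0;
}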
Some things to think about:
If the validator didn't include "zero requested credit" results in calculating the "average", it would have two numbers to pick from, and the grant would still be the lower of the two remaining results, or 43.92 (see the quick check below).
If the host had been another Mac, or any other host with a low "requested credit" (like a Linux-based system), it still would have granted around 40 credits.
Some hosts request high credit; is that a problem? Because of the disparity, the high and low numbers are tossed, which is what happened here. But for some workunits there aren't any Macs or Linux systems to drag the average down; instead there are hosts that drag the average up. My view is this: it shows the system is working. Sometimes you gain a little and sometimes you lose a little, and it's the science that's important.
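Here is a quick arithmetic check of that first point, using the claims from this thread (0, 43.92, and 117.71); the policy labels are mine:

// Checks that with or without the zero claim the grant is 43.92:
// with all three claims, tossing high and low leaves the middle one;
// with the zero claim dropped, the lower of the remaining two wins.
#include <algorithm>
#include <cstdio>

int main() {
    double claims[3] = {0.0, 117.71, 43.92};
    std::sort(claims, claims + 3);  // -> 0.0, 43.92, 117.71

    double with_zero    = claims[1];                       // 43.92
    double without_zero = std::min(claims[1], claims[2]);  // 43.92

    std::printf("grant with zero claim:    %.2f\n", with_zero);
    std::printf("grant without zero claim: %.2f\n", without_zero);
    return 0;
}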
Walt
There have been some
There have been some optimized BOINC versions that sometimes, accidentally, show 0 Cobblestones and Whetstones. The data gets crunched correctly, but any number times 0 is 0, no matter what. Why penalize those hosts for doing the work? Yes, less credit may be granted because of it, but at least people get credit for the work done.
To Pooh and Walt: Your
To Pooh and Walt: Your explanations clear things up for me, and I am now in agreement with your logic. I now understand how the situation developed: the host computer did return a valid result, but the error caused a request for zero credit. Since E@H uses the average of three validated results, the high and low are tossed (just like Olympic diving, to digress for a moment). It is fair for them to claim their share of the credit pie.
I agree that the science is the important thing being done here, with the credit just a flashy reward for our cobblestone egos... Live long and crunch!