Credit granted for client error and zero work seconds??? Why?

Brian D from Georgia

Joined: 17 Apr 05

Posts: 4

Credit: 369354

RAC: 0

13 Dec 2005 14:34:24 UTC

Topic 190364

Interesting result here that dragged down my granted credit due to a zero score being used in the formula. Here is the work unit link:
2848118

Granted credit should have been 117.71, not 43.92. That is a significant difference. Specific error message on the zero time work unit shows:

stderr out

5.2.7

Can't set up shared mem: -1
: bytecount 1736125 checksum 83163978
: bytecount 1988941 checksum 95315179

Is this a logic fault in the BOINC client that would give credit on an errored result?

Michael Roycraft

Joined: 10 Mar 05

Posts: 846

Credit: 157718

RAC: 0

Credit granted for client error and zero work seconds??? Why?

13 Dec 2005 14:54:54 UTC

Message 21376

Quote:

Interesting result here that dragged down my granted credit due to a zero score being used in the formula. Here is the work unit link:
2848118

Granted credit should have been 117.71, not 43.92. That is a significant difference. Specific error message on the zero time work unit shows:

stderr out

5.2.7

Can't set up shared mem: -1
: bytecount 1736125 checksum 83163978
: bytecount 1988941 checksum 95315179

Is this a logic fault in the BOINC client that would give credit on an errored result?

Brian,

I think this is an example of a validator error - very rare, but it does happen occasionally (I think Dr. Allen said something about either 0.1% or 0.01%).

Good thing that optimized client on your machines helps make up for that, huh? :-)

Regards,

Michael

microcraft
"The arc of history is long, but it bends toward justice" - MLK

Bruce Allen

Moderator

Joined: 15 Oct 04

Posts: 1119

Credit: 172127663

RAC: 0

Not a validator error. The

15 Dec 2005 22:12:26 UTC

Message 21377

Not a validator error.

The validator ran on the first three results that arrived. One was invalid. The other two results were valid. Credit assigned to canonical result was the smaller of the credits claimed by the two valid results (43). Then, when the fourth result arrived, it was found valid and given the same credit (43) as the previous two valid results.

[Edited one day later]
As has been pointed out, none of the results was in fact 'invalid'. However, due to a BOINC client bug, one of them claimed zero credit. The remainder of what I have written above is correct. The validator did its job correctly.

Director, Einstein@Home

Michael Roycraft

Joined: 10 Mar 05

Posts: 846

Credit: 157718

RAC: 0

RE: Not a validator

15 Dec 2005 22:27:08 UTC

Message 21378 in response to message 21377

Quote:

Not a validator error.

The validator ran on the first three results that arrived. One was invalid. The other two results were valid. Credit assigned to canonical result was the smaller of the credits claimed by the two valid results (43). Then, when the fourth result arrived, it was found valid and given the same credit (43) as the previous two valid results.

Dr. Allen,

Thank you for the explanation. Maybe I'm a bit thick, but I still can't understand that WU. There was NO invalid result! By all appearances there should have been, because I can't wrap my mind around how in heck a host returns a result with zero time and zero credit and yet is declared "valid". Unless maybe the error was in the reported time only? There seems to have been enough turn-around time on that host. Could that be the case? I'm curious about this.

Regards,

Michael

(edit for phrasing and clarity)

microcraft
"The arc of history is long, but it bends toward justice" - MLK

Walt Gribben

Joined: 20 Feb 05

Posts: 219

Credit: 1645393

RAC: 0

The zero credit comes from

15 Dec 2005 23:05:14 UTC

Message 21379

The zero credit comes from the zero CPU time. It doesn't mean that Einstein didn't spend any time crunching the workunit; it means that it wasn't able to report the time it did use.

If you look at the result you'll see the message "Can't set up shared mem: -1". And the output goes on to show the bytecounts and checksums, which match the other results. So it's apparent that it did do the work.

Michael Roycraft

Joined: 10 Mar 05

Posts: 846

Credit: 157718

RAC: 0

RE: The zero credit comes

15 Dec 2005 23:55:19 UTC

Message 21380 in response to message 21379

Quote:

The zero credit comes from the zero CPU time. It doesn't mean that Einstein didn't spend any time crunching the workunit; it means that it wasn't able to report the time it did use.

If you look at the result you'll see the message "Can't set up shared mem: -1". And the output goes on to show the bytecounts and checksums, which match the other results. So it's apparent that it did do the work.

Walt,

Thank you. It sounds just about as I'd imagined, faulty time reported.

microcraft
"The arc of history is long, but it bends toward justice" - MLK

Brian D from Georgia

Joined: 17 Apr 05

Posts: 4

Credit: 369354

RAC: 0

I understand everyone's

17 Dec 2005 18:04:04 UTC

Message 21381

I understand everyone's explanation on what caused the granted credit. The validator did its job since it had 3 "good" results to arrive at its average value and hence give credit. Common sense would tell me though that the zero credit work unit return should be thrown out even if the result is "valid". It makes no logical sense to me why that result would be used in the averaging. I think the client should not return a result as "valid" if there is an error involved regardless of what caused the error. Why would you want to use any part of an errored work unit? That just doesn't make sense to me.

Walt Gribben

Joined: 20 Feb 05

Posts: 219

Credit: 1645393

RAC: 0

RE: I understand everyone's

17 Dec 2005 19:32:40 UTC

Message 21382 in response to message 21381

Quote:

I understand everyone's explanation on what caused the granted credit. The validator did its job since it had 3 "good" results to arrive at its average value and hence give credit. Common sense would tell me though that the zero credit work unit return should be thrown out even if the result is "valid". It makes no logical sense to me why that result would be used in the averaging. I think the client should not return a result as "valid" if there is an error involved regardless of what caused the error. Why would you want to use any part of an errored work unit? That just doesn't make sense to me.

I think it would be very unfair if the result were tossed just because of the zero credit. It's not an "errored work unit".

There wasn't an error in crunching the workunit; it completed successfully and returned a valid result. The error was in communications between BOINC and the Einstein application, which is completely separate from the crunching (two different threads in the process, even). It's the piece that tells BOINC how much work is done so far and how much CPU time it used to get there.

Normally, BOINC would detect the lack of communication and restart the workunit. But for some reason that didn't happen, and the application went on and produced a result but wasn't able to report the CPU time used.

Some things to think about:

If the validator didn't include "zero requested credit" results in calculating the "average", it would have two numbers to pick from. And that would still be the lower of the two remaining results, or 43.92.

If the host had been another Mac or any other host with low "requested credit" (like a Linux based system), it still would have granted around 40 credits.

Some hosts request high credit; is that a problem? Because of the disparity, the high and low numbers are tossed, which is what happened here. But for some workunits there aren't any Macs or Linux systems to drag the average down. Instead there are hosts that drag the average up. My view is this: it shows the system is working. Sometimes you gain a little and sometimes you lose a little. And it's the science that's important.
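To make the arithmetic concrete, here is a minimal Python sketch of the granting rule as it has been described in this thread: with two valid claims the lower one is granted, and with three or more the high and low are tossed and the rest averaged. This is not the actual validator code. The function name is made up for illustration, and 117.71 (the figure from the first post) is assumed here to be the highest claim.

```python
def granted_credit(claims):
    """Return the credit granted for a list of claimed credits.

    Illustrative rule only (as described in this thread):
      - one or two claims: grant the lowest claim
      - three or more: drop the highest and lowest, average the rest
    """
    if not claims:
        raise ValueError("need at least one claimed credit")
    ordered = sorted(claims)
    if len(ordered) <= 2:
        return ordered[0]
    trimmed = ordered[1:-1]  # toss the high and the low
    return sum(trimmed) / len(trimmed)


# Three valid results, one of which claimed zero because of the client bug.
with_zero = granted_credit([0.0, 43.92, 117.71])

# Same workunit if the zero claim were simply ignored.
without_zero = granted_credit([43.92, 117.71])

print(with_zero, without_zero)
```

Under these assumptions both calls come out the same, which is the point above: leaving out the zero claim would not have changed what was granted.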

Walt

Pooh Bear 27

Joined: 20 Mar 05

Posts: 1376

Credit: 20312671

RAC: 0

There have been some

17 Dec 2005 20:19:27 UTC

Message 21383

There have been some optimized BOINC versions that sometimes show 0 Cobblestones and Wheatstones, accidentally. The data gets crunched correctly, but all the numbers * 0 = 0, no matter what. Why penalize them for doing the work? Yes, less credit may be given because of it, but at least the people get their credit for the work done.
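A rough Python sketch of why any zero wipes out the claim, assuming claimed credit is just a product of CPU time and an average benchmark score. The function name and the `scale` constant are placeholders for illustration, not BOINC's real code or its real conversion factor.

```python
def claimed_credit(cpu_seconds, whetstone, dhrystone, scale=1.0):
    """Simplified claimed-credit model: CPU time times average benchmark.

    `scale` is a hypothetical placeholder constant, not the value
    the real client uses.
    """
    benchmark_avg = (whetstone + dhrystone) / 2
    return cpu_seconds * benchmark_avg * scale


# Normal host: non-zero time and non-zero benchmarks give a non-zero claim.
normal = claimed_credit(36000, 1500, 2500)

# CPU time never reported (as with the "Can't set up shared mem" result).
no_time = claimed_credit(0, 1500, 2500)

# Benchmarks accidentally recorded as zero.
no_benchmarks = claimed_credit(36000, 0, 0)

print(normal, no_time, no_benchmarks)
```

In this model a zero in either factor gives a zero claim, even though the science result itself is untouched.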

Brian D from Georgia

Joined: 17 Apr 05

Posts: 4

Credit: 369354

RAC: 0

To Pooh and Walt: Your

18 Dec 2005 4:54:23 UTC

Message 21384

To Pooh and Walt: Your explanations clear things up for me and I am now in agreement with your logic. I now understand how the situation developed. The host computer did return a valid result but the error caused a request for zero credit. Since E@H uses the average of 3 validated results, the high and low are tossed (just like Olympic diving to digress for a moment). It is fair for them to claim their share of the credit pie.

I agree that science is the important thing being done here, with the credit just a flashy reward for our cobblestone egos... Live long and crunch!
