Hi,
> On this particular wu, what would have happened had the 2
>linux systems returned their results first? It hardly seems
>that simply crossing the finish line first should either
>validate or invalidate science.
Actually, take a look at this WU: #366758.
In this case the two Linux systems reported first and effectively locked out the Windows boxes - by the look of it anyway.
I wonder what it is about the results that causes this problem?
> All that said, I'm really not overly concerned with points,
>but I really hate it when my contribution is useless.
I'm with you on that. I don't think our contributions are completely useless in cases like this, but it is somewhat frustrating - seems like such a waste of time, IYKWIM?
TTFN - Pete.
Well, I guess there are some differences in computation between the Linux and Windows versions.
I guess it's time for developers to say something here. Our work (and CPU power and energy power) is wasted this way.
> I guess it's time for developers to say something here. Our work (and CPU
> power and energy power) is wasted this way.
>
Developers, please do something. This is an obvious bug.
I examined some results from the Merlin cluster running Linux.
I found this one very ???
machine ID 3701 WU 1217945
It reported first, but with checksums different from those of the 2 WinXP machines, which had identical ones.
On the other hand, I also found late reports from a Merlin machine that was still credited together with the XP machines.
????????????????
John,
> > I guess it's time for developers to say something here. Our work (and CPU
> > power and energy power) is wasted this way.
> >
> developers please do something. this is an obvious bug.
We are looking into this: it appears that our validator may be setting the agreement threshold slightly too tight in some cases. This is hard to 'tune in advance' without having access to the actual results. So please be patient: one of our developers is now working on this.
Bruce
Director, Einstein@Home
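
As a rough illustration of what an agreement threshold that is "too tight" can mean, here is a minimal Python sketch. It is not the Einstein@Home validator: the function `results_agree`, the sample values, and both tolerances are made up for illustration. It only shows that two results agreeing to many digits can still be rejected if the allowed relative difference is set too small.

```python
import math


def results_agree(a, b, rel_tol):
    """Hypothetical check: every pair of values must match within rel_tol."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


# Made-up numbers standing in for the same work unit computed on two platforms.
windows_result = [1.0, 2.5, 1000.0]
linux_result = [1.0 + 1e-9, 2.5, 1000.0 - 1e-6]

# A very tight threshold rejects the pair; a looser one accepts it.
too_tight = results_agree(windows_result, linux_result, rel_tol=1e-12)
looser = results_agree(windows_result, linux_result, rel_tol=1e-6)

print("tight threshold agrees: ", too_tight)
print("looser threshold agrees:", looser)
```

Choosing the right tolerance needs real results to look at, which is the difficulty described above.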
> I examined some result from the Merlin cluster running linux;
>
> I found this one very ???
> machine ID 3701 WU 1217945
>
> It reported first but with different checksums with the 2 other WinXP's which
> had identical ones.
>
> Found on the other hand also late reports for a Merlin machine but was
> credited also together with XP's.
>
> ????????????????
>
> Sorry, work ID = 342166 (and granted credit)
>
John,
> We are looking into this: it appears that our validator may be setting the
> agreement threshold slightly too tight in some cases. This is hard to 'tune
> in advance' without having access to the actual results. So please be
> patient: one of our developers is now working on this.
thanks in advance.
I'll be patient.
But it's hard to understand for a newbie like me. Two computers running the same software receive the same input, do the same computations, should get the same results, and apply the same checksum algorithm to these results. Why can there be a difference at all?
Peter
Bruce Allen, note that this could be a bigger problem than simply tuning the validator.
The history shows that there were cases where the first two Windows machines returned similar results, and they were OK, as they should be. Then two Linux machines returned identical results (hence different from the Windows machines) and they were marked invalid.
In another thread I read that the core has a different version on Linux than on Windows (4.80 vs 4.79).
Maybe they just do different computations? Maybe the difference is too big?
Greetings from Poland!
> > We are looking into this: it appears that our validator may be setting the
> > agreement threshold slightly too tight in some cases. This is hard to 'tune
> > in advance' without having access to the actual results. So please be
> > patient: one of our developers is now working on this.
>
> thanks in advance.
> I'll be patient.
>
> But hard to understand for an newbie like me. two computers running the same
> software - receive the same input - do the same computational stuff - should
> get the same results - apply the same checksum algorithm on these results -
> why can there be a difference at all?
>
> Peter
>
In two words: rounding errors.
The best example of this is 1/2=4.999999999 with as many 9's as your default floating type has significant digits. Different OSes and CPUs handle this differently. After doing this repeatedly and using the output as input to the next calculation, the differences can become significant.
BOINC WIKI
BOINCing since 2002/12/8
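
To make the rounding point concrete, here is a small Python sketch. It adds the same three numbers in two different orders, which is the kind of difference a different compiler or CPU can introduce. The `checksum` helper is hypothetical and only stands in for any checksum taken over the exact bits of a result; whether this is what happened with the work units in this thread is an assumption.

```python
import hashlib
import math
import struct


def checksum(value: float) -> str:
    """Hypothetical bitwise checksum: hash the exact 64-bit pattern of a float."""
    return hashlib.sha256(struct.pack("<d", value)).hexdigest()


# Same three inputs, same operation, different evaluation order.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Floating-point addition is not associative, so the last bit can differ.
bitwise_equal = left_to_right == right_to_left
checksums_equal = checksum(left_to_right) == checksum(right_to_left)

# Compared with a small relative tolerance, the two results still agree.
close_enough = math.isclose(left_to_right, right_to_left, rel_tol=1e-12)

print(repr(left_to_right), repr(right_to_left))
print("bitwise equal:", bitwise_equal)
print("checksums equal:", checksums_equal)
print("equal within tolerance:", close_enough)
```

A single-bit difference is enough to change a checksum completely, even though the two numbers are the same for any practical purpose.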
Not sure what calculator you used to come to 1/2=4.9999999999999 John, but I would throw it in the bin and use some old fashioned paper and a pencil. ;)