The Clean Energy Project, a subproject of the World Community Grid (it searches for next-generation organic photovoltaics), is essentially going from a quorum of 2 to a quorum of 1: no redundancy checking for results from hosts deemed reliable, but with occasional random workunits selected for double-checking. The thread is here: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,31985 (unfortunately you need a WCG account to read it).
Is there a reason why a similar approach wouldn't work for Einstein@home or one of the pulsar-searching spinoff projects? It must be a very rare event that two reliable hosts report the same workunit as valid but the results are in fact different.
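The scheme described above (accept a single result from a trusted host, but still randomly replicate a small fraction for spot checks) can be sketched roughly like this. This is a toy illustration only: the streak threshold, the spot-check rate, and the function name are invented and do not reflect the actual WCG or BOINC validator code.

```python
import random

# Invented parameters for illustration -- not WCG's real values.
RELIABLE_STREAK = 20      # consecutive valid results before a host is trusted
SPOT_CHECK_RATE = 0.05    # fraction of trusted results still double-checked

def needs_replication(consecutive_valid: int) -> bool:
    """Decide whether a workunit must be sent to a second (wingman) host."""
    if consecutive_valid < RELIABLE_STREAK:
        return True                            # untrusted host: always replicate
    return random.random() < SPOT_CHECK_RATE   # trusted host: random spot check
```

The occasional spot checks are what make the scheme safe in practice: a host that starts returning bad results will eventually fail a random comparison and can be demoted back to full replication.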
Quorum of one?
AFAIK by now most projects use what is known as "adaptive replication", which means accepting results from "reliable" hosts without further validation.
For the GW search this was discussed in the LVC (LIGO-Virgo collaboration, the scientific community behind Einstein@home) at least twice that I remember, and each time it was strongly voted against.
The BRP search does a lot of computation on GPUs, which are numerically less reliable than CPUs (note that its invalid-result rate is about 20x that of the other searches). Because the results from Einstein@home are used directly for targeting re-observations, the requirements on the correctness of the results are somewhat higher.
Finally, our youngest application, for the FGRP search, hasn't yet reached a level of reliability at which we would dare to accept results without comparison.
BM
RE: AFAIK by now most
Is this still not viable?
Are we still continuing with a high rate of invalids from GPUs?
RE: Are we still
I would say yes, very high at times.
See http://einstein6.aei.uni-hannover.de/EinsteinAtHome/download/BRP6-progress/ for one example. BRP4 is similar.
And for CPU dedicated
And for CPU-dedicated searches, as is currently the case for S6GW and FGRP?
Would it be possible?
RE: And for CPU dedicated
http://einstein.phys.uwm.edu/server_status.html
The server status page certainly shows that invalid rates are high even for CPU searches: S6BucketFU1UB is around 20%, which is higher than I expected.
Insofar as I can remember, I
As far as I can remember, I have never had an S6Bucket Follow-up #2 or a Parkes PMPS XT v1.52 (BRP6-cuda32-nv301) task fail.
http://einsteinathome.org/host/11368189/tasks
http://einsteinathome.org/host/11671653/tasks
The possible exceptions might be if the machine crashes for other reasons, but that is quite rare now. So it would seem to me that some machines are more susceptible to failures than others.
In the case of GPUs, this is easy to understand: gamers frequently overclock their cards and assume that because their games don't crash, the cards are good for Einstein. It happens on every project; they don't realize that scientific calculations are a different story, and it takes some education to get the majority up to speed.
In the case of CPUs, overclocking problems are also possible, but I would suspect it is more likely chip or OS incompatibilities. Some projects just prefer one type over another.
But isn't the real question whether the machines that DO complete get it right often enough? The outright failures are easy to spot and are eliminated anyway. If the successful runs are always "good" at the scientific level, then the scheme mentioned by svincent above should work here too. It might be worth a study to compare the machines giving successful results to see if the quorum is really necessary.
Therefore, some machines may be more reliable than others and can be "trusted" if their results are good often enough, however you define that. I believe that on CEP2 there are periodic re-evaluations too, to ensure that machines are still providing good results.
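A minimal sketch of that kind of trust tracking, where a host's reliability streak is reset by any invalid result so that trust has to be re-earned (the class, names, and threshold here are hypothetical, not CEP2's or BOINC's actual mechanism):

```python
class HostRecord:
    """Toy model of per-host trust: count consecutive valid results,
    revoke trust on any invalid one. Purely illustrative."""

    def __init__(self) -> None:
        self.consecutive_valid = 0

    def report(self, valid: bool) -> None:
        if valid:
            self.consecutive_valid += 1
        else:
            self.consecutive_valid = 0   # one bad result revokes trust

    def is_trusted(self, threshold: int = 20) -> bool:
        return self.consecutive_valid >= threshold
```

Under a policy like this, the periodic random double checks mentioned above are what feed `report(False)` for a trusted host that has silently gone bad, demoting it back to full replication.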