Three weeks ago I posted some info in the Unlucky Validation error thread concerning the rate of validation errors being experienced by my Linux boxes. Here is a quote:-
I have now had time to do a survey of a number of my Linux boxes to get an idea of how bad the validation problem really is (for me anyway). I've been through the results lists of about 15 Linux boxes picked at random. I've examined 103 total results from those boxes and found 15 marked as "invalid" or "checked but no consensus yet" (which almost invariably eventually become invalid).
With the announcement of the new validator, I was interested in gathering some fresh numbers for the period immediately before the new validator was introduced. I examined a total of 113 results from 18 different boxes. Of these only 4 were either "invalid" or "checked but no consensus yet" compared with 15 from the last time I did this, as reported in the above quote.
The period examined covered the range from about June 20 to July 9. Interestingly, not only was the rate of invalids much lower than in the earlier survey, but there were also zero invalids where the Windows box was running the version 4.24 science app. The large majority of observed results came after the introduction of 4.24, so there seems to have been quite an improvement in validation performance just from the introduction of 4.24. There were approximately 25 of my results where the Windows app was 4.17, of which 4 were eventually marked invalid.
I realise that the sample size may not be statistically significant but the trend seems interesting and encouraging nevertheless.
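For what it's worth, here is the back-of-the-envelope arithmetic behind that remark, written up as a little Python snippet. The two-proportion z-test is just my own rough way of checking whether the drop could be pure chance; it's nothing official from the project.

# Rough comparison of the two surveys quoted above.
from math import sqrt

old_invalid, old_total = 15, 103   # first survey (three weeks ago)
new_invalid, new_total = 4, 113    # second survey (June 20 - July 9)

p_old = old_invalid / old_total    # about 14.6%
p_new = new_invalid / new_total    # about 3.5%

# Pooled two-proportion z-test; with counts this small it is only approximate.
p_pool = (old_invalid + new_invalid) / (old_total + new_total)
se = sqrt(p_pool * (1 - p_pool) * (1 / old_total + 1 / new_total))
z = (p_old - p_new) / se

print(f"old rate {p_old:.1%}, new rate {p_new:.1%}, z = {z:.2f}")
# z comes out around 2.9, so the improvement is probably not just luck,
# although I wouldn't bet the farm on samples this small.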
This particular WU is one of the 4 potential invalids just mentioned, and it is particularly interesting as it has not yet been subjected to final validation. My Linux result was originally paired with one crunched by 4.17, giving the "no consensus" outcome. The WU has recently been reissued to a box running 4.24 and that result has not yet been returned. When it does get returned it will be subjected to the new validator, which will have to decide between Linux, Win 4.17 and Win 4.24. Normally you would put your money on the two Win boxes, but ...
My thinking is that if the validator is better now, perhaps all three will pass the test.
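Just to spell out what I mean by "better": my mental model of a cross-platform validator is that it has to decide whether the floating-point summaries reported by different apps agree within some tolerance, along these lines. The function names, the 1% tolerance and the sample numbers below are all made up for illustration; the real Einstein@Home code will certainly differ.

# Toy model of a tolerance-based quorum check (illustration only).
def results_agree(a, b, rel_tol=0.01):
    # Two result summaries "agree" if every value matches within rel_tol.
    return all(abs(x - y) <= rel_tol * max(abs(x), abs(y), 1e-30)
               for x, y in zip(a, b))

def pick_consensus(results, rel_tol=0.01):
    # Return the indices of results that form a mutually agreeing majority.
    for candidate in results:
        matches = [j for j, other in enumerate(results)
                   if results_agree(candidate, other, rel_tol)]
        if len(matches) * 2 > len(results):   # strict majority agrees
            return matches
    return []

# Hypothetical summaries for the three results in this WU (invented numbers):
linux   = [1.00012, 42.7]
win_417 = [1.00034, 42.9]
win_424 = [1.00020, 42.8]

print(pick_consensus([linux, win_417, win_424]))   # -> [0, 1, 2]
# With a tolerant enough comparison all three land in the quorum, which is
# exactly the outcome I'm hoping for from the new validator.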
What do you think?? :).
Cheers,
Gary.
Some Observations on Cross Platform Validation
Gary,
did you take into account that the servers delete invalid results much quicker now? In the past you could scroll back through your results for at least 4-6 weeks before they were deleted. Nowadays you only see the last 2-3 weeks (and many invalid results get deleted almost immediately), which makes monitoring invalid results much more difficult. So, if you don't monitor your result pages at least twice a day you have a good chance of missing some invalid results, which will bias your stats towards an improving situation.
RE: did you take into
Actually, I don't think they do. All invalids go through the CBNC (checked but no consensus) stage before being declared invalid. This should almost inevitably delay proceedings while the decider result gets done. Sure, once the jury comes back, the result will get deleted quickly, because it has already been hanging around as a pending for a relatively long period. I actually looked at quite a large number of pendings and was surprised to see so few CBNCs amongst them.
However, you make a good point about possible shortcomings in the experimental technique :). I didn't spend much time on this, so I didn't observe results throughout their life. I was just interested in getting a quick and dirty estimate of how things are going now compared to how they were the last time I looked. So I just used exactly the same technique as last time, expecting to get a similar picture, and was surprised to see the apparent difference. My intention was to establish a baseline so that I could repeat the procedure in a week or two in order to gauge the performance of the new validator. I'll still do that, but I think it might be difficult to see another similar improvement :).
Not at all. Potential invalids are very visible by looking at all the pendings and noting how many of them are "double pendings" :).
Cheers,
Gary.
Talking about "double
Talking about "double pendings", here is a further example of exactly the sort of thing I'm talking about. This is the exact reverse of the double pending example I gave in my original message. There my machine was the Linux box whereas this time it's the windows box. The Linux box is in the Merlin/Morgane dual cluster at AEI. I seem to be in rather top notch company :).
I'm guessing that these two pendings became CBNCs just before the new validator was installed, judging by the dates. They will hang about until the decider returns, and at that stage I'll probably have to be paying attention if I want to see how the new validator handles it. Once again it would be nice to see all three get the nod :).
Cheers,
Gary.
RE: RE: did you take
Gary,
this is the problem: not all pending work units appear in the "Pending" list. I currently see 2 work units in my pending list, but checking the results page I see 5 work units showing the status "pending".
Example: the following work units do NOT appear in the pending list but do show as pending in the result pages (these are just the first 3). They also do NOT show the title "checked but no consensus":
http://einsteinathome.org/task/85345232
http://einsteinathome.org/task/85534343
http://einsteinathome.org/task/85493212
It is these work units that disappear very quickly after they are marked "Invalid", and they are therefore very hard to track.
RE: With the announcement
Gary, I think that a grave statistical barrier is the extreme non-randomness of one's selection of quorum partners. This of course arises from the method of result assignment. I notice it as major shifts in my pending situation, depending on whether my fastest machine has recently been in a pool with fast or slow responders. I imagine the validation-error issue is similarly systematically biased. This would stop those of us with very small fleets from concluding much from our own results. Your 18 boxes may be enough for your result to mean something.
My big hope rests on the fact that Bernd has clearly indicated that the validation problem is of high concern, that it is being worked on, and that a new validator has been started. My smaller hope is that the portion of the problem that stems, in some sense, from "bad hosts" may get better as their owners take them off the project. This will help whether they are motivated by public virtue or by pique at lost credit.
RE: this is the problem:
I'm not 100% sure, but I believe this is normal behaviour. I know that the results lists for individual computers are always up to date. However, I believe that because it takes a lot of server resources to construct the full per-participant pending list, that operation is done relatively infrequently. Therefore your full pending list is always going to be missing the most recent pendings.
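In other words (and this is purely my guess at how the page is generated, not anything I've seen in the server code), the full pending list behaves like a cached view that is rebuilt on a schedule, so anything newer than the last rebuild simply isn't in it. A toy sketch of that idea, with an invented rebuild interval:

import time

class CachedPendingList:
    # Toy model of a summary page that is only rebuilt occasionally.
    def __init__(self, rebuild_every=6 * 3600):   # assumed 6 h; the real interval is unknown to me
        self.rebuild_every = rebuild_every
        self.built_at = 0.0
        self.snapshot = []

    def view(self, live_pendings):
        # Serve the cached snapshot; rebuild it only when it has gone stale.
        now = time.time()
        if now - self.built_at > self.rebuild_every:
            self.snapshot = list(live_pendings)   # the expensive query, done rarely
            self.built_at = now
        return self.snapshot                      # recent pendings may be missing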
In my case, with well in excess of 100 boxes running, I never consult my full pending list as it's just too large. As mentioned in my original post, I checked the results lists of 18 separate Linux boxes and examined every pending that I found in those lists. I could only find 4 out of 113 results that were either already invalid or CBNC, which means that someone (most likely the Linux box) is going to miss out.
Here I'm not quite understanding you. You say they are not CBNCs, but the first one definitely is, so it will become invalid for somebody. The second and third ones are not CBNCs simply because the validator has not looked at them yet, as only one result has been returned in each case. There is no indication at this point that they will ever become CBNCs and hence eventually invalid for somebody.
When I did my survey, I deliberately excluded all pendings where only one result had been returned, simply because you can't predict the final outcome. So in my case I actually looked at quite a lot more than 113 in total.
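Just to spell out the counting rule I applied (by eye on the web pages, not with a script; the record layout below is invented purely to make the rule explicit):

def survey(pendings):
    # Count examined results and the "suspect" ones among them, skipping
    # any work unit where only one result has been returned so far.
    examined = 0
    suspect = 0
    for r in pendings:
        if r["returned"] < 2:          # outcome not yet predictable
            continue
        examined += 1
        if r["status"] in ("invalid", "checked but no consensus yet"):
            suspect += 1
    return examined, suspect

# Applied to my 18 Linux boxes this gave (113, 4), the numbers quoted above.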
Using the data you listed above, I've had a look at the full results lists of those three machines and have noted your continuing problem with invalids. Compared to you, I seem to be having a lucky run at the moment.
Cheers,
Gary.
RE: Gary, I think that a
Thanks for the comments. I think you have identified the significant factor that contributes to the systematic bias you refer to. That's why I kept adding extra hosts to the list I was surveying in order to try to cancel out this effect.
Hopefully the fact that there really is a "bad host syndrome" will become more evident to the owners of "bad hosts" when the "good hosts" stop being so regularly "invalidated". :).
Cheers,
Gary.
Keep in mind that a while
Keep in mind that a while back EAH rolled back from using the latest builds of the Server backend, when it turned out it was causing trouble with the MSD.
As a result, I'm going to assume they don't have the fix implemented which keeps CBNCs on the pendings summary page. This can make it easy to miss some, since they don't hang around for too long as pending (or even on the host summary anymore) once the third one comes back. I think that was the point Martin was driving at in his comment about being careful not to miss them when looking for invalidation rates.
In fact, the only reason I noticed the quirk was that I started doing detailed logging of results for my hosts a while back and discovered the CBNCs weren't showing up when I went to reconcile the pendings for the hosts.
Interestingly, the fix they implemented (which is in place over at SAH currently) has a quirk. Normally the pendings there are listed in ascending RID order by default, but once a result gets transitioned to CBNC, it ends up relisted after the ones marked pending (in ascending RID order).
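The visible effect is the same as sorting on a two-part key, CBNC flag first and RID second. That's just my description of the behaviour, not the actual SAH code, and the RIDs below are invented:

# Reproduces the listing order described above.
pendings = [
    {"rid": 101, "cbnc": False},
    {"rid": 102, "cbnc": True},
    {"rid": 103, "cbnc": False},
]
for r in sorted(pendings, key=lambda p: (p["cbnc"], p["rid"])):
    print(r["rid"], "CBNC" if r["cbnc"] else "pending")
# -> 101 pending, 103 pending, 102 CBNC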
Alinator
Regarding the frequency of observation needed to be reasonably sure of not missing them: given the increased runtimes of the results, once a day is still more than adequate for EAH, even with later model CPUs for the most part. SAH, with their ultra-aggressive purging, is a different story though. ;-)
In the opening message of
In the opening message of this thread I wrote:-
My thinking is that if the validator is better now, perhaps all three will pass the test.
Well, the decider for this particular WU has now been returned and the new validator has been called to do its thing. I'm very pleased to report that all three results have been declared valid and been assigned their full credit.
Congratulations to the team for this significant advance in the solving of the cross-platform validation issue.
Cheers,
Gary.
It may be just luck, but all
It may be just luck, but all my results on SETI, Einstein and QMC obtained with a PII running Linux have been validated. It may be a slow CPU, but its floating-point arithmetic must be OK.
Tullio