Validate error - What this really means!

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

RE: :-( still makes

Quote:

:-( It still makes errors, so maybe this card is somehow defective.
Are there any other known features that could be enabled/disabled to try to narrow this down?

Is this one of your tasks?

Looks ok to me, there are several like it.

Tron
Joined: 5 Nov 12
Posts: 8
Credit: 49207
RAC: 0

AgentB wrote:Is this one of

AgentB wrote:

Is this one of your tasks?

Looks ok to me, there are several like it.

According to the validator, it won't be, though.

That specific task is still waiting for its wingman to report; it too will likely be rejected before it is ever compared with the other result.

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

RE: I've taken on the job

Quote:
I've taken on the job of running the queries and trying to keep participants informed.

I don't get many invalid results so this one made me curious.
Usually my invalid results are BRP4 units where 2 CPU wingmen gang up on my GPU. This time, however, 2 NVIDIA GPUs (newer and shinier, but still essentially the same) have ganged up on my poor GPU. The wingmen even run Windows, same as me.

Any chance to get to know why my GPU failed this one?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5870
Credit: 116969901529
RAC: 36800182

RE: RE: I've taken on the

Quote:
Quote:
I've taken on the job of running the queries and trying to keep participants informed.

Any chance to get to know why my GPU failed this one?


Unfortunately, no. This thread is about 'Validate errors' which are situations where the validator can determine that there is something wrong with a particular result prior to doing any comparisons with a second result. In those cases, the validator will document the problem that caused it to reject the result.

In your case, your result passed the validator's 'sanity check' and went on to be compared with other results. There is nothing available to me to indicate why your result failed the comparison with the other two. For some unknown reason, it just wasn't 'close enough'.
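To make the two stages concrete, here is a minimal Python sketch of a validator that first sanity-checks a result on its own and only then compares it with another result within a tolerance. The function names, the "all values finite" check and the `REL_TOL` value are hypothetical illustrations; nothing in this thread says what the real validator checks or how close "close enough" has to be.

```python
import math

# Hypothetical relative tolerance; the real validator's threshold is not known here.
REL_TOL = 1e-5


def sanity_check(values):
    """Stage 1: judge a single result on its own (here: non-empty, all values finite)."""
    return len(values) > 0 and all(math.isfinite(v) for v in values)


def close_enough(a, b, rel_tol=REL_TOL):
    """Stage 2: compare two results element by element within a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def classify(result, reference):
    """Status a result would get against a reference that has already been accepted."""
    if not sanity_check(result):
        return "validate error"  # rejected before any comparison is made
    return "valid" if close_enough(result, reference) else "invalid"


print(classify([1.0, 2.0], [1.0, 2.0]))           # valid
print(classify([1.0, float("nan")], [1.0, 2.0]))  # validate error
print(classify([1.0, 2.1], [1.0, 2.0]))           # invalid
```

In this simplified model, a result like the one asked about passes `sanity_check` and only fails inside `close_enough`, so all that gets recorded is the "invalid" outcome, with no message saying which values differed.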

I had a look through all the results for your host that are in the online database. There is currently just one 'validate error' and one 'invalid', the latter being the one you posted about. That's a pretty low error rate and probably pretty much in line with the overall error rate for GPU tasks that would be expected for the project as a whole. In other words, it's probably just 'one of those little mysteries of life, the universe, and everything ... ' :-).

If you started getting quite a few in quick succession, you would then need to pursue the matter further.

Cheers,
Gary.

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

RE: If you started getting

Quote:
If you started getting quite a few in quick succession, you would then need to pursue the matter further.


As I wrote, just curious :)
Thanks for the reply

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5870
Credit: 116969901529
RAC: 36800182

I started this thread more

I started this thread more than a year ago, not long after the FGRP1 run had started, to draw attention to the steady stream of validate errors that was plaguing the owners of Linux and Mac OS X hosts when doing FGRP1 tasks.

The problem was eventually solved when a bug in the FGRP1 app was finally found by Heinz Bernd (Bikeman). A fixed app was released and (at least for the FGRP1 run) the pesky validate errors pretty much became a thing of the past.

People doing BRP4 tasks also see these and in that case there must be other explanations - the BRP4 app would be very unlikely to have the same sort of bug that affected FGRP1.

Quite a while ago, Bernd said that most validate errors (in general) were a symptom of hardware problems on particular hosts. People often are unwilling to accept that their host might have a hardware problem and so tend to blame things like bugs in the app or bad data, etc. I was also tempted to think that way earlier today when I noticed that one of my hosts had a surprisingly low RAC. The computer was running, seemingly OK, and processing tasks normally but a very big percentage of them were ending up as validate errors. Of those that went past this sanity check, most were ending up as invalid and the rest were validating.

Applying a bit of logic, the first thing I considered was temperature. Today is a very hot summer's day and the machine is not in aircon. I ruled out temperature because the problem didn't start today and the preceding days had been relatively cool. The machine had actually been cleaned and rebuilt about 3 weeks ago. The other bit of evidence was that the machine seemed to be running quite happily - no lockups, crashes or other strange behaviour. In my experience, things crash when the temp is too high.

The machine is a Pentium dual core E6300 2.8GHz overclocked to 3.4GHz. I have a few of these and I know from experience that these will run all day at 3.5GHz with all voltages on auto. So 3.4GHz should be a safe overclock. I figured the problem had to be RAM.

I checked the BIOS settings. I'm using DDR3 RAM 2x2GB 1333MHz, and, because of the overclock, I had chosen a lower mem speed setting (1066MHz) which, when combined with the higher bus speed, was actually giving a mem speed slightly below 1333MHz. The timings were on auto, and the budget boards I use tend not to have decent timing settings anyway. I noticed that this particular board did have some basic settings, so I took them off auto and set some values suitable for 1333MHz RAM. I figured that the auto settings might actually be causing the more aggressive timings for 1066MHz to be used.
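The memory-speed arithmetic can be checked with a short Python calculation. It assumes the overclock was done purely by raising the bus (FSB) clock with the CPU multiplier unchanged, so the memory clock rises by the same factor as the CPU clock (3.4/2.8). That assumption and the function name are illustrative; the post does not spell out how the overclock was set.

```python
# Assumption: only the bus clock was raised (CPU multiplier unchanged), so the
# memory clock is scaled by the same factor as the CPU clock.
STOCK_CPU_GHZ = 2.8     # E6300 stock speed, from the post
OC_CPU_GHZ = 3.4        # overclocked speed, from the post
MEM_SETTING_MHZ = 1066  # lower BIOS memory setting chosen because of the overclock
RAM_RATED_MHZ = 1333    # rating of the installed DDR3 modules


def effective_mem_speed(setting_mhz, stock_ghz, oc_ghz):
    """Memory speed after scaling the BIOS setting by the overclock ratio."""
    return setting_mhz * oc_ghz / stock_ghz


speed = effective_mem_speed(MEM_SETTING_MHZ, STOCK_CPU_GHZ, OC_CPU_GHZ)
print(f"effective memory speed: {speed:.0f} MHz (rated {RAM_RATED_MHZ} MHz)")
# effective memory speed: 1294 MHz (rated 1333 MHz)
```

About 3% below the modules' 1333MHz rating, which matches "slightly below 1333MHz" in the post.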

This seems to have completely solved the problem. 12 tasks have completed since I relaxed the timings and the result is 10 validated and 2 pending. No more validate errors - for the time being at least :-).

I'm posting this little saga just to remind people that there are lots of little hardware details that need to be checked if a machine starts producing bad results, before you start blaming the app or the data or anything else project related.

Cheers,
Gary.

Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0

http://einstein.phys.uwm.edu/

http://einsteinathome.org/workunit/144505444 is the only one I have so far, and it looks like four of us have failed it so far while the other two are still working on it... My GPU is not overclocked, and I'm running Windows 7 x64.

PC setup MSI-970A-G46 AMD FX-8350 8 core OC'd 4.45GHz 16GB ram PC3-10700 Geforce GTX 650Ti Windows 7 x64 Einstein@Home

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 504288296
RAC: 14477

After weeks of crunching

After weeks of crunching without errors or invalids, I found 4 invalids today.
http://einsteinathome.org/workunit/144507334
The status of this wu says: 2 returned wu's are marked as invalid, one returned is marked as in progress and one is marked as waiting for validation.

So my question is: with one waiting, how can two others be marked as invalid?

Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0

RE: After weeks of

Quote:

After weeks of crunching without errors or invalids, I found 4 invalids today.
http://einsteinathome.org/workunit/144507334
The status of this wu says: 2 returned wu's are marked as invalid, one returned is marked as in progress and one is marked as waiting for validation.

So my question is: with one waiting, how can two others be marked as invalid?

My guess is that they both failed to match, but I'm not sure...


Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

RE: After weeks of

Quote:

After weeks of crunching without errors or invalids, I found 4 invalids today.
http://einsteinathome.org/workunit/144507334
The status of this wu says: 2 returned wu's are marked as invalid, one returned is marked as in progress and one is marked as waiting for validation.

So my question is: with one waiting, how can two others be marked as invalid?

Because the term "Validate Error" is given to a task that doesn't pass the first sanity check the validator performs, the task is never compared with another.
The sanity check is performed to make sure the result has a chance of passing validation and doesn't contain data that's just garbage.

Two tasks that passed the sanity check and then don't match are given the status "Invalid".
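The distinction can be illustrated with a simplified Python model of how the tasks of one workunit might end up with different statuses. It assumes a quorum of two (a result is accepted once it agrees with one other sane result) and uses hypothetical helpers (`sanity_check`, `agree`, `assign_statuses`) with made-up checks; it is not the project's actual validator, and the real quorum rules may differ.

```python
import math
from itertools import combinations


def sanity_check(values):
    """Hypothetical per-result check done before any comparison."""
    return len(values) > 0 and all(math.isfinite(v) for v in values)


def agree(a, b, rel_tol=1e-5):
    """Hypothetical comparison of two sane results."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def assign_statuses(results):
    """Map task id -> status; a value of None means the task has not been returned."""
    statuses, sane = {}, {}
    for task_id, values in results.items():
        if values is None:
            statuses[task_id] = "in progress"
        elif not sanity_check(values):
            statuses[task_id] = "validate error"  # never compared with anything
        else:
            sane[task_id] = values

    # Assumed quorum of two: the first agreeing pair of sane results is accepted.
    canonical = next(
        (a for a, b in combinations(sane, 2) if agree(sane[a], sane[b])), None
    )
    for task_id, values in sane.items():
        if canonical is None:
            statuses[task_id] = "waiting for validation"
        elif agree(values, sane[canonical]):
            statuses[task_id] = "valid"
        else:
            statuses[task_id] = "invalid"  # passed the sanity check but did not match
    return statuses


example = {
    "A": [1.0, 2.0],
    "B": [1.0, 2.0],
    "C": [5.0, 2.0],
    "D": [float("nan")],
    "E": None,
}
print(assign_statuses(example))
```

In this example D is a validate error (it never reaches a comparison), C passed the sanity check but is invalid because it disagrees with A and B, A and B are valid, and E is still in progress.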
