RE: :-( still makes
Is this one of your tasks ?
Looks ok to me, there are several like it.
AgentB wrote:Is this one of
According to the validator it won't be, though.
That specific task is still waiting on the wingmate to report, and it too will likely be rejected before being compared to the other result.
RE: I've taken on the job
I don't get many invalid results so this one made me curious.
Usually my invalid results are with BRP4 units when 2 CPU wing men gang up on my GPU. This time, however, there are 2 NVIDIA GPUs (newer and shinier, but still essentially the same) that have ganged up on my poor GPU. The wing men even run Windows, same as me.
Any chance to get to know why my GPU failed this one?
RE: RE: I've taken on the
Unfortunately, no. This thread is about 'Validate errors' which are situations where the validator can determine that there is something wrong with a particular result prior to doing any comparisons with a second result. In those cases, the validator will document the problem that caused it to reject the result.
In your case, your result passed the validator's 'sanity check' and went on to be compared with other results. There is nothing available to me to indicate why your result failed the comparison with the other two. For some unknown reason, it just wasn't 'close enough'.
I had a look through all the results for your host that are in the online database. There is currently just one 'validate error' and one 'invalid', the latter being the one you posted about. That's a pretty low error rate and probably pretty much in line with the overall error rate for GPU tasks that would be expected for the project as a whole. In other words, it's probably just 'one of those little mysteries of life, the universe, and everything ... ' :-).
If you started getting quite a few in quick succession, you would then need to pursue the matter further.
Cheers,
Gary.
RE: If you started getting
As I wrote, just curious :)
Thanks for the reply
I started this thread more
I started this thread more than a year ago, not long after the FGRP1 run had started, to draw attention to the steady stream of validate errors that was plaguing the owners of Linux and MacOSX hosts when doing FGRP1 tasks.
The problem was eventually solved when a bug in the FGRP1 app was finally found by Heinz Bernd (Bikeman). A fixed app was released and (at least for the FGRP1 run) the pesky validate errors pretty much became a thing of the past.
People doing BRP4 tasks also see these and in that case there must be other explanations - the BRP4 app would be very unlikely to have the same sort of bug that affected FGRP1.
Quite a while ago, Bernd said that most validate errors (in general) were a symptom of hardware problems on particular hosts. People often are unwilling to accept that their host might have a hardware problem and so tend to blame things like bugs in the app or bad data, etc. I was also tempted to think that way earlier today when I noticed that one of my hosts had a surprisingly low RAC. The computer was running, seemingly OK, and processing tasks normally but a very big percentage of them were ending up as validate errors. Of those that went past this sanity check, most were ending up as invalid and the rest were validating.
Applying a bit of logic, the first thing I considered was temperature. Today is a very hot summer's day and the machine is not in aircon. I ruled out temperature because the problem didn't start today and the preceding days had been relatively cool. The machine had actually been cleaned and rebuilt about 3 weeks ago. The other bit of evidence was that the machine seemed to be running quite happily - no lockups, crashes or other strange behaviour. In my experience, things crash when the temp is too high.
The machine is a Pentium dual core E6300 2.8GHz overclocked to 3.4GHz. I have a few of these and I know from experience that these will run all day at 3.5GHz with all voltages on auto. So 3.4GHz should be a safe overclock. I figured the problem had to be RAM.
I checked the BIOS settings. I'm using DDR3 RAM 2x2GB 1333MHz, and, because of the overclock, I had chosen a lower mem speed setting (1066MHz) which when combined with the higher bus speed was actually giving a mem speed slightly below 1333MHz. The timings were on auto and the budget boards I use tend not to have decent timings settings anyway. I noticed that this particular board did have some basic settings so I took them off auto and set some values suitable for 1333MHz RAM. I figured that the auto settings might actually be causing the more aggressive timings for 1066MHz to be used.
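For anyone wanting to check the arithmetic behind "slightly below 1333MHz", here is a rough sketch in Python. The stock bus clock (266.67MHz) and CPU multiplier (10.5) are assumptions about the E6300 that are not stated in the post, and the function name is made up for illustration. It also assumes the BIOS memory setting simply fixes a memory-to-bus ratio, which may not match every board.

```python
# Assumed stock values for a Pentium E6300 (not stated in the post).
STOCK_BUS_MHZ = 266.67   # base bus clock; assumption
CPU_MULTIPLIER = 10.5    # 10.5 x 266.67 is roughly 2800 MHz; assumption


def overclocked_memory_speed(target_cpu_mhz, memory_setting_mhz):
    """Return (bus_mhz, effective_memory_mhz) for a bus-clock overclock.

    Assumes the BIOS memory setting fixes a memory:bus ratio, so the
    memory clock rises by the same factor as the bus clock.
    """
    bus_mhz = target_cpu_mhz / CPU_MULTIPLIER
    ratio = memory_setting_mhz / STOCK_BUS_MHZ
    return bus_mhz, bus_mhz * ratio


bus, mem = overclocked_memory_speed(3400, 1066)
print(f"bus {bus:.0f} MHz, effective memory speed {mem:.0f} MHz")
```

Under those assumptions the 1066MHz setting works out to about 1294MHz at a 3.4GHz overclock, which is consistent with the "slightly below 1333MHz" figure. Leaving the setting at 1333MHz would have pushed the RAM well past its rated speed.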
This seems to have completely solved the problem. 12 tasks have completed since I relaxed the timings and the result is 10 validated and 2 pending. No more validate errors - for the time being at least :-).
I'm posting this little saga just to remind people that there are lots of little hardware details that need to be checked if a machine starts producing bad results, before you start blaming the app or the data or anything else project related.
Cheers,
Gary.
http://einstein.phys.uwm.edu/
http://einsteinathome.org/workunit/144505444 is the only one I have so far, and it looks like 4 of us have failed so far and the other 2 are still working on it... My GPU is not overclocked and I'm running Windows 7 x64.
PC setup MSI-970A-G46 AMD FX-8350 8 core OC'd 4.45GHz 16GB ram PC3-10700 Geforce GTX 650Ti Windows 7 x64 Einstein@Home
After weeks of crunching
After weeks of crunching without errors or invalids I found 4 invalids today.
http://einsteinathome.org/workunit/144507334
The status of this WU says: 2 returned tasks are marked as invalid, one is marked as in progress and one is marked as waiting for validation.
So my question is: with one waiting, how can two others be marked as invalid?
RE: After weeks of
My guess is that both did not match, but I'm not sure...
RE: After weeks of
Because the term "Validate Error" is given to a task that doesn't pass the first sanity check the validator does. So the task is not even compared to another.
The sanity check is performed to make sure the result has a chance to pass validation and doesn't contain data that's just garbage.
Two tasks that passed the sanity check and then don't match are given the status "Invalid".
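To make the difference between the two statuses concrete, here is a toy sketch in Python of the two-stage idea described above. It is not the project's actual validator: the sanity check, the tolerance, the quorum size and all names here are hypothetical, chosen only to illustrate "rejected before comparison" versus "compared and didn't match".

```python
import math
from enum import Enum


class Status(Enum):
    VALIDATE_ERROR = "validate error"  # rejected by the sanity check
    INVALID = "invalid"                # passed the check, lost the comparison
    VALID = "valid"
    PENDING = "pending"                # no agreeing group of results yet


def passes_sanity_check(values):
    """Hypothetical check: a non-empty list of finite numbers."""
    return len(values) > 0 and all(math.isfinite(v) for v in values)


def close_enough(a, b, tolerance):
    """Hypothetical comparison: same length, element-wise within tolerance."""
    return len(a) == len(b) and all(
        abs(x - y) <= tolerance for x, y in zip(a, b)
    )


def classify(results, tolerance=1e-6, quorum=2):
    """Map each host name to a Status. `results` is {host: [values]}."""
    status = {}
    candidates = {}
    # Stage 1: the sanity check looks at each result on its own.
    for host, values in results.items():
        if passes_sanity_check(values):
            candidates[host] = values
        else:
            status[host] = Status.VALIDATE_ERROR
    # Stage 2: find the largest group of results close to one reference.
    agreeing = set()
    for values in candidates.values():
        group = {
            h for h, v in candidates.items()
            if close_enough(values, v, tolerance)
        }
        if len(group) >= quorum and len(group) > len(agreeing):
            agreeing = group
    for host in candidates:
        if not agreeing:
            status[host] = Status.PENDING
        elif host in agreeing:
            status[host] = Status.VALID
        else:
            status[host] = Status.INVALID
    return status


example = {
    "host_a": [1.0, 2.0],
    "host_b": [1.0, 2.0000001],
    "host_c": [1.0, 2.5],        # sane, but not 'close enough'
    "host_d": [float("nan")],    # garbage, fails the sanity check
}
for host, outcome in classify(example).items():
    print(host, outcome.value)
```

In this toy run host_a and host_b agree with each other, host_c is marked invalid because it was compared and didn't match, and host_d gets a validate error without ever being compared.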