Can't you just request the result ID directly, e.g.
http://einsteinathome.org/task/294264077, regardless of my computers being hidden? Just replace the ID with each of the ones referenced in my post.
I would have assumed that the validator results/logs could be checked for those IDs as well.
Indeed, I missed that in your post.
The failure rate is too high to be just a statistical fluke; the card really does give numerically wrong results.
Have you checked the cooling? Is it overclocked?
Cheers
HB
Hi, I'm trying to figure out what is wrong with one of my GPUs, which is throwing several validate errors or invalid results... I know it's the GPU: I've tested everything else, changed the PSU, the cables, the PCIe slots, the drivers, etc., but this particular GPU keeps throwing errors (around 25-30% of the crunched WUs, and the ratio is the same even when underclocked).
I've tested it with other software (from games to tech apps) and it works OK with all of those, so I have no way to RMA it. The GPU is in this host: http://einsteinathome.org/host/4232972 and
the failing GPU has been #0 since Jun 24; it was #1 before.
The outcome is: Validate error (58:00111010), for example: http://einsteinathome.org/task/294326368, but it also returns some results that complete successfully and are then marked invalid.
I want to see if I can find a way to test the specific part of the GPU that is suspected of causing the failures... Can someone tell me what this error is and/or offer any insight into which component of the GPU I should suspect?
I've been getting quite a few validate errors since I started running this project on an old Xeon computer. Here's a link to the computer info and a few examples.
http://einsteinathome.org/host/5407322
Task ID
294665306
294521044
293647466
They all seem to be Gamma-ray pulsar search #1 v0.23 tasks with output stating "PRECISION ZERO_DIVIDE DENORMALIZED INVALID"
I'm disabling the gamma-ray search to avoid wasting CPU time.
Nothing travels faster than the speed of light with the possible exception of bad news, which obeys its own special laws.
Douglas Adams (1952 - 2001)
Here is one that has had 4 so far.
http://einsteinathome.org/workunit/125707933
I've had the same error a couple of times.
I'm not sure what it means or how to overcome it.
Two LAT validate errors in a row on my Linux box, but the tasks themselves seem OK:
FPU flags: PRECISION DENORMALIZED
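For anyone wondering what those flag names stand for: they look like the exception bits of the x87 FPU status word, printed from the highest bit down. The sketch below decodes such a word under that assumption. The bit layout is the standard x87 one, but the names `OVERFLOW` and `UNDERFLOW` and the helper `decode_fpu_flags` are hypothetical, since the thread only shows the other four names.

```python
# Assumed mapping of x87 status-word exception bits to the names seen in
# stderr, listed from bit 5 down to bit 0. OVERFLOW and UNDERFLOW are guesses
# at how the app would spell the two flags that do not appear in this thread.
X87_FLAGS = [
    (0x20, "PRECISION"),     # bit 5: result had to be rounded (inexact)
    (0x10, "UNDERFLOW"),     # bit 4
    (0x08, "OVERFLOW"),      # bit 3
    (0x04, "ZERO_DIVIDE"),   # bit 2: division by zero
    (0x02, "DENORMALIZED"),  # bit 1: a denormal operand was used
    (0x01, "INVALID"),       # bit 0: invalid operation, e.g. 0/0 or sqrt(-1)
]


def decode_fpu_flags(status_word: int) -> str:
    """Return the names of the exception bits set in status_word."""
    return " ".join(name for mask, name in X87_FLAGS if status_word & mask)


print(decode_fpu_flags(0x22))  # bits 5 and 1 set
print(decode_fpu_flags(0x27))  # bits 5, 2, 1 and 0 set
```

PRECISION and DENORMALIZED on their own are routine in floating-point work, which fits with these tasks looking OK. ZERO_DIVIDE and INVALID are the ones that would point at a genuinely bad intermediate value.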
Tullio
BRP validate error on task number 300474836
I find it interesting that the original and secondary computers run Windows 7 while mine runs Linux. The task was originally marked 'no consensus' but was marked invalid when the resend was returned. No errors were reported in stderr.
Nothing travels faster than the speed of light with the possible exception of bad news, which obeys its own special laws.
Douglas Adams (1952 - 2001)
This isn't a 'validate error'. Before the validator actually compares two results, it does a preliminary check of each one to ensure the results don't contain gross errors like non-numeric data or values out of range, etc. If a result fails this preliminary check, it is permanently labeled as 'validate error' and no actual comparison with other results is done. Your result was given a clean bill of health in that regard. The problem arose at the next step when the comparison was done.
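To make the two stages concrete, here is a minimal sketch of the flow described above. It is not the project's validator: the function names, the value range, the tolerance, and the 'pending' label are all hypothetical, and a result is assumed to be a plain list of floats.

```python
import math

# Hypothetical limits, for illustration only.
VALUE_RANGE = (0.0, 1.0e6)
REL_TOL = 1.0e-5


def passes_sanity_check(result):
    """Stage 1: reject gross errors such as non-numeric or out-of-range values."""
    lo, hi = VALUE_RANGE
    for value in result:
        if not isinstance(value, float) or not math.isfinite(value):
            return False
        if not lo <= value <= hi:
            return False
    return True


def check_pair(a, b):
    """Label two results: stage 1 on each, then stage 2 only if both pass."""
    ok = {"a": passes_sanity_check(a), "b": passes_sanity_check(b)}
    if not all(ok.values()):
        # A result failing stage 1 is labeled permanently; no comparison is done.
        return {k: ("pending" if v else "validate error") for k, v in ok.items()}
    # Stage 2: the actual comparison between the two results.
    agree = len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=REL_TOL) for x, y in zip(a, b)
    )
    outcome = "valid" if agree else "no consensus"
    return {"a": outcome, "b": outcome}


print(check_pair([1.0, 2.0], [1.0, 2.0]))
print(check_pair([1.0, float("nan")], [1.0, 2.0]))
print(check_pair([1.0, 2.0], [1.0, 2.5]))
```

The third call corresponds to your task: both results pass the preliminary check, and the disagreement only shows up in the comparison.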
If you look through all your validations for that host, I'm sure you will find plenty of Linux/Windows examples that didn't fail the comparison. Unfortunately, comparisons (even between results from similar OSes) do fail very occasionally, but the rate is very low. By contrast, the incidence of true 'Validate Errors', particularly for hosts running Mac OS X, is very much higher. Some stats are in the opening posts of this thread.
Cheers,
Gary.
I thought I'd mention that the validation strategy - as applied to S5 data - has been described/discussed in the LIGO S5 report. Look on page 10 under B. Validation of returned candidates which gives the full rationale.
What I found of particular interest were the criteria for rejection based upon the significance values (denoted as CR1 and CR2). For example, if you set CR1 = 100 then CR2 would have to be less than ~78.5 or greater than ~127.3 for that veto to trigger. As discussed, that's quite a generous margin to 'forgive'.
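As a quick check of those numbers, here is a small sketch. It assumes the veto is a relative-difference test, |CR1 - CR2| / (CR1 + CR2) > 0.12. I picked that form and the 0.12 threshold because they reproduce the bounds quoted above, so treat both as assumptions and see the report for the exact statement. The function names are hypothetical.

```python
# Assumed threshold; chosen because it reproduces the ~78.5 / ~127.3 bounds.
THRESHOLD = 0.12


def veto(cr1, cr2, threshold=THRESHOLD):
    """True if the two significance values differ too much to be accepted."""
    return abs(cr1 - cr2) / (cr1 + cr2) > threshold


def accepted_range(cr1, threshold=THRESHOLD):
    """Range of CR2 values that do NOT trigger the veto for a given CR1."""
    lower = cr1 * (1 - threshold) / (1 + threshold)
    upper = cr1 * (1 + threshold) / (1 - threshold)
    return lower, upper


lower, upper = accepted_range(100.0)
print(round(lower, 1), round(upper, 1))
print(veto(100.0, 80.0), veto(100.0, 78.0), veto(100.0, 128.0))
```

Under that assumption, a partner result can sit roughly 21% below or 27% above CR1 = 100 and still be accepted, which is the generous margin mentioned above.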
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal