23 Validate Errors in last 2 days -- New O/S installation on a refurbished Mobo

CElliott
Joined: 9 Feb 05
Posts: 28
Credit: 1046853711
RAC: 1792655
Topic 198430

In the last 2 days computer 12193582 has had 23 GPU WUs with validate errors. It is mostly the WUs from one video card that are failing. I can't find anything wrong with the computer setup or the StdErr file. Could someone look and see why all these WUs won't validate?

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

You're getting "Validate errors" on all the failed tasks.
When a result is uploaded to the servers, the validator does a preliminary check on the file to make sure it's formatted correctly and contains valid data. If it doesn't, the validator throws a validate error.
A "validate error" is usually a good indicator of a problem with the hardware or the settings of the affected machine.

First of all, reboot the machine, preferably powering it down completely to clear the RAM and GPU RAM. Then check temps and voltages under load to make sure they are within specs. If it's overclocked, go back to stock settings. If you're running multiple GPU units at once, go back to one at a time.
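As a rough sketch of the "check temps under load" step: if the cards are NVIDIA (an assumption, the thread doesn't say), `nvidia-smi` can report temperature, clock and power draw per GPU. The helper names and the temperature limit below are made up for illustration; use the limit from your card's specs.

```python
import subprocess

# Fields requested from nvidia-smi (assumes NVIDIA cards and drivers).
QUERY = "index,temperature.gpu,clocks.sm,power.draw"

def _to_float(field):
    """Convert a CSV field to float, or None if the card reports e.g. [N/A]."""
    try:
        return float(field)
    except ValueError:
        return None

def parse_gpu_csv(text):
    """Parse 'index, temp, sm clock, power' CSV lines into a list of dicts."""
    readings = []
    for line in text.splitlines():
        if not line.strip():
            continue
        index, temp, clock, power = [f.strip() for f in line.split(",")]
        readings.append({
            "index": int(index),
            "temp_c": _to_float(temp),
            "sm_mhz": _to_float(clock),
            "power_w": _to_float(power),
        })
    return readings

def read_gpus():
    """Run nvidia-smi once and return the parsed readings."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)

def over_temperature(readings, limit_c):
    """Return the indexes of GPUs whose temperature exceeds limit_c."""
    return [r["index"] for r in readings
            if r["temp_c"] is not None and r["temp_c"] > limit_c]
```

Calling `over_temperature(read_gpus(), limit_c)` a few times while tasks are crunching would show whether one card runs hotter than the others.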

For further help post more info about temps, voltages, frequencies, number of tasks run at a time and so on.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118371278849
RAC: 25615764

Quote:
... It is mostly the WUs from one video card that are failing.


I had a quick look at some of the stderr.txt outputs for some of these errors. There's a line that gives device number and I saw regular errors for both device #0 and #1 but not #2. I looked at some valid results and saw examples of all three devices giving acceptable answers.
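If you want to repeat that check across all your results rather than a handful, something like the sketch below would tally valid and failed results per device. It assumes you have saved each task's stderr text yourself and that the device line looks like "device #N"; that pattern and the function name are my guesses, so adjust the regex to the actual line.

```python
import re
from collections import Counter

# Hypothetical pattern: assumes the stderr text names the GPU as "device #N"
# or "device N". Adjust it to match the real line in your stderr.txt.
DEVICE_RE = re.compile(r"device\s*#?\s*(\d+)", re.IGNORECASE)

def tally_by_device(results):
    """Count valid and failed results per GPU device number.

    `results` is an iterable of (stderr_text, validated) pairs, where
    `validated` is True for a valid result and False for a validate error.
    Results with no recognisable device line are counted under None.
    """
    valid, failed = Counter(), Counter()
    for stderr_text, validated in results:
        match = DEVICE_RE.search(stderr_text)
        device = int(match.group(1)) if match else None
        (valid if validated else failed)[device] += 1
    return valid, failed
```

A device that shows up only in `failed`, or far more often there than in `valid`, is the one to look at first.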

When you say "refurbished", what does that actually mean? If you have replaced failed components (e.g. swollen capacitors), maybe you have missed one. If it's just a 'cleanup and dustoff', I wouldn't imagine that would be of concern.

Quote:
Could someone look and see why all these WUs won't validate?


That's not really possible. As Holmis explains, all you would see is a result that doesn't comply with the proper format, so it can't be tested against another properly formatted result. There's no way to diagnose why your result can't pass a basic sanity check. All that can really be said is that there is most likely something not quite right with your machine. Something is a little outside its comfort zone.

The classic things to check are operating frequencies, temperatures, sufficiency and quality of power and quality of RAM. Swapping components (if possible) is a good way to narrow things down. Since you are running three similar GPUs, try removing one temporarily and see if it makes any difference.

Good luck with hunting the problem down.

Cheers,
Gary.
