I plugged in a CUDA card a few days ago and have been getting a lot of validation errors since. Yesterday 11 WUs were invalid and only 3 valid. :(
The screensaver and everything else is off. The system is openSUSE 11.3, 64-bit. Invalid tasks.
[pre]==============NVSMI LOG==============
Timestamp : Sat Feb 12 10:38:06 2011
Driver Version : 260.19.36
GPU 0:
Product Name : GeForce GTX 460
PCI Device/Vendor ID : e2210de
PCI Location ID : 0:4:0
Board Serial : 629154929
Display : Connected
Temperature : 46 C
Fan Speed : 40%
Utilization
GPU : 52%
Memory : 14%[/pre]
Any ideas what could be the reason?
Linux CUDA validation errors
Validate errors are usually server-side problems; nothing is wrong on your machine(s).
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
So there is hope that this problem will be solved soon?
I'd really appreciate it. :)
All your CUDA tasks are being reported straight after the result has been uploaded,
which is too soon after the upload. Try to find a way of getting BOINC to report them later,
or set NNT (No New Tasks) until you have a batch of them to report.
Claggy
Hi!
There are two ways that results can end up as being invalid:
1) after being sent to the server, they go straight to invalid (validation error). This means that the server rejects the result just by looking at the individual file (e.g. the file is corrupted as a whole or some values in the file fail a basic sanity range check).
2) the result survives the initial sanity check but fails to agree with the result of another cruncher ("wingman"). This is the "inconclusive validation" scenario. Once one or more additional results come in and the quorum of two agreeing results is finally reached, all other non-matching results become "invalid".
This second case can happen because of cross-platform validation problems (CPU calculates a bit differently than GPU), see this discussion http://einsteinathome.org/node/195567&nowrap=true#109649.
But I think your results were failing as in the first scenario? That would indicate that the results are a complete mess.
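To make the two scenarios concrete, here is a minimal Python sketch. It is not the project's actual validator: the `Result` type, the range limits and the tolerance are all made up for illustration. It only shows the general idea of a per-result sanity check followed by a quorum of two agreeing results.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Result:
    host: str            # hypothetical label for the machine that crunched it
    values: List[float]  # hypothetical stand-in for the numbers in a result file


def passes_sanity_check(result: Result, lo: float = 0.0, hi: float = 1.0e6) -> bool:
    """Scenario 1: judge a result on its own (empty or out-of-range fails).
    The limits lo/hi are placeholders, not the project's real ranges."""
    return bool(result.values) and all(lo <= v <= hi for v in result.values)


def agree(a: Result, b: Result, rel_tol: float = 1e-5) -> bool:
    """Two results agree if they match value by value within a tolerance
    (rel_tol is an assumed value)."""
    return len(a.values) == len(b.values) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a.values, b.values)
    )


def validate(results: List[Result], quorum: int = 2) -> Dict[str, str]:
    """Return a status per host for one work unit."""
    status: Dict[str, str] = {}
    candidates = []
    for r in results:
        if passes_sanity_check(r):
            candidates.append(r)
        else:
            status[r.host] = "invalid (validation error)"  # scenario 1

    # Scenario 2: look for a group of at least `quorum` agreeing results.
    for ref in candidates:
        group_size = sum(1 for r in candidates if agree(ref, r))
        if group_size >= quorum:
            for r in candidates:
                status[r.host] = "valid" if agree(ref, r) else "invalid"
            return status

    # No quorum yet: everything that passed the sanity check stays open.
    for r in candidates:
        status[r.host] = "inconclusive"
    return status
```

With only two disagreeing results, both stay "inconclusive" in this sketch; a third result that matches one of them completes the quorum and turns the odd one out into "invalid".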
I had a similar string of "bad" results recently, and after a reboot of the machine, all was back to normal with just a few results failing validation as in scenario 2).
I have no idea why it was OK after the reboot, or even whether the reboot had anything to do with it. It could be that there was a sequence of tasks that are somehow more sensitive to cross-validation problems than others, and the reboot just coincided with the end of these results.
So my advice would be to
-reboot,
-watch the temperature of the GPU (if it's always like the one in your thread-starting message, it's OK of course. Actually I think it's surprisingly low for a 50% loaded card with the fan at 40%??)
-maybe upgrade the driver. NVIDIA's new 270.* Linux driver fixes a certain problem that kept the E@H app from yielding the CPU to other tasks during GPU computations, and Oliver plans to release a beta app that would no longer need one full CPU core. So you will want to go to the 270 driver anyway, I guess.
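For watching the temperature, a small Python sketch along these lines could pull the numbers out of a log in the same format as the one in the thread-starting message. How the log text is obtained is left open here, and the 80 °C limit is just an arbitrary placeholder, not a recommendation for any particular card.

```python
import re


def parse_nvsmi(log_text):
    """Extract temperature (C) and fan speed (%) from log text shaped like
    the 'Temperature : 46 C' / 'Fan Speed : 40%' lines in the first post.
    Fields that are not found come back as None."""
    temp = re.search(r"Temperature\s*:\s*(\d+)\s*C", log_text)
    fan = re.search(r"Fan Speed\s*:\s*(\d+)\s*%", log_text)
    return {
        "temperature_c": int(temp.group(1)) if temp else None,
        "fan_percent": int(fan.group(1)) if fan else None,
    }


def too_hot(reading, limit_c=80):
    """True if a temperature was found and exceeds limit_c.
    limit_c is an assumed threshold for illustration only."""
    t = reading["temperature_c"]
    return t is not None and t > limit_c


# Two lines copied from the log in the first post.
SAMPLE = """Temperature : 46 C
Fan Speed : 40%"""

reading = parse_nvsmi(SAMPLE)
print(reading, "too hot" if too_hot(reading) else "ok")
```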
Happy crunching
HB
Thanks for the help @you all.
The temperature is always in the range of 47 to 48 °C, and the box doesn't run 24/7. It's a Gigabyte card with two fans, slightly factory-overclocked by Gigabyte. I will try the new driver and see if it helps. My results are always marked invalid as soon as they arrive. I have never seen one of my own results as 'inconclusive', only the wingmen's when mine was invalid. At the moment I'm waiting for a wingman who is running Linux too, to see what happens. Today I got 3 valid and 1 invalid result so far; the rest are pending. So there is hope they will not all error out. ;)
I deleted 'return_results_immediately' in my cc_config.xml but the first result reported was invalid.
In a few days I will check what happens when I boot Win 7. That might tell us something about the Linux driver.
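For anyone who wants to script the cc_config.xml change described above, here is a rough Python sketch. It assumes the option sits inside an `<options>` element. The post calls the option 'return_results_immediately'; as far as I know the BOINC client spells it `report_results_immediately`, so the tag name is a parameter here rather than hard-wired. The sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET


def drop_option(config_xml, tag="report_results_immediately"):
    """Remove every <tag> element found directly under an <options> element.
    Returns the new XML text and the number of elements removed."""
    root = ET.fromstring(config_xml)
    removed = 0
    for options in root.iter("options"):
        for el in list(options.findall(tag)):
            options.remove(el)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical file content, not copied from a real machine.
SAMPLE = (
    "<cc_config><options>"
    "<report_results_immediately>1</report_results_immediately>"
    "</options></cc_config>"
)

new_xml, count = drop_option(SAMPLE)
print(count, new_xml)
```

The client only picks up the change after it re-reads the config file or is restarted.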
RE: Thanks for the help
Did you do a 'Read config file' after deleting 'return_results_immediately' from your cc_config.xml?
Claggy
Sure :)
I restarted BOINC.
3 invalid so far. 3 valid, rest pending.
I tested CUDA with Win 7 today and haven't had any invalid results so far. Over the last few days the success rate with Linux was really frustrating. So two facts arise:
1) My card is ok.
2) The Linux app is buggy.
I will keep on testing with Win 7 over the next few days. I started to run 2 tasks in parallel today without any problems. Linux is my main system, so I might be forced to run the Win app in a VM, like some years ago.
RE: So two facts arise: 1)
I can't confirm that. I'm running Ubuntu 10.04 with driver 270.18, without any major problems besides the usual validation problems between CUDA and non-CUDA for a very few WUs.
Greetings from Sänger
When I start Linux again to finish my tasks, I will give that 270.xx beta driver a chance. At the moment I get invalid results against both SSE2 and Windows CUDA wingmen, so there is almost no chance of getting valid results.