Ni!
I have recently had a real problem with my results being invalid. In the last screen on my account summary I have had 304 Valid results and 208 Invalid. I understand errors, but that is almost 915,000 points lost. The cards are two factory-overclocked GTX 980s and a 780 Ti running together. They had been running a higher daily total of 350,000-380,000 points per day but have recently dropped. I have not changed any settings. Is there a new project that utilizes resources differently?
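For scale, here is a quick back-of-the-envelope sketch of those numbers in Python. The per-task figure is only derived from the totals quoted above ("almost 915,000" is approximate), so treat it as an estimate and not an official credit value.

```python
# Figures quoted in the post above.
valid = 304
invalid = 208
points_lost = 915_000  # "almost 915,000", so this is approximate

total = valid + invalid
invalid_rate = invalid / total              # share of results marked invalid
points_per_invalid = points_lost / invalid  # rough credit lost per invalid task

print(f"{total} results, {invalid_rate:.1%} invalid")
print(f"roughly {points_per_invalid:,.0f} points per invalid task")
```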
I am running these at x3 per WU and the loads are 88%, 89% and 91%. They had been running higher loads, but I noticed recently they have dropped from 94%.
Any suggestions from the peanut gallery?
The only thing I can think of is the new Nvidia driver for Windows 10. I am running 352.84.
Is there anything to glean from the Stderr file of why they failed?
My computers are viewable from the main page.
http://einsteinathome.org/host/11747462/tasks&offset=0&show_names=1&state=0&appid=0
Thanks in advance for any insight...
KWSN-SpongeBob SquarePants
Brent
Multi NVIDIA GPUs with almost half Invalids
Hey, I run 2 GTX 970s here and run x3 tasks per GPU. I am almost certain that what you are experiencing is a driver-related issue. I run 350.12 and get a fluctuation anywhere from 88% load on my cards to 95% load, depending on the mix of work units being processed.
From my observations, different drivers affect GPU computation loads differently. Why, you may ask? That I do not know. However, back when I ran 344.60 I had higher average GPU utilization, and subsequent driver releases made it fall from the ~94-96% GPU loads I saw under 344.60. It wasn't until 350.12 came around that my work unit completion times averaged out to what I was seeing under 344.60 and my GPU utilization started averaging in the low-to-mid 90s again.
Finally, I have found that stopping BOINC and restarting it can mess up the CUDA tasks on my 970's, and maybe a mix of this is what you are seeing on your PC?
Have you checked to see which cards are throwing out the invalids? In the stderr, you can see which device # that WU went to, which would then represent a specific graphics card in your PC.
That may be helpful as the Maxwell 2 architecture cards (GTX9xx) seem to show strange behavior under BOINC at times.
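If you want to tally that per card instead of eyeballing it, here is a rough Python sketch. The stderr line format it matches (`CUDA device #N`) and the sample text are assumptions for illustration only; check a real stderr from your own task pages and adjust the pattern to whatever it actually prints.

```python
import re
from collections import Counter

# Assumed line format; adjust to match what your stderr really shows.
DEVICE_RE = re.compile(r"CUDA device #(\d+)")

def device_of(stderr_text):
    """Return the device number found in one task's stderr, or None."""
    match = DEVICE_RE.search(stderr_text)
    return int(match.group(1)) if match else None

def tally(tasks):
    """tasks: iterable of (outcome, stderr_text) pairs.

    Returns a dict mapping device number to a Counter of outcomes.
    """
    counts = {}
    for outcome, text in tasks:
        counts.setdefault(device_of(text), Counter())[outcome] += 1
    return counts

# Made-up sample data, not taken from the host in this thread.
sample = [
    ("valid", "Using CUDA device #0 ..."),
    ("invalid", "Using CUDA device #2 ..."),
    ("invalid", "Using CUDA device #2 ..."),
    ("valid", "Using CUDA device #1 ..."),
]

# Sort by str() so a None key (no device found) does not break sorting.
for device, outcomes in sorted(tally(sample).items(), key=lambda kv: str(kv[0])):
    print(device, dict(outcomes))
```

If one device number collects nearly all of the invalids, that is the card to look at first.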
Best of luck!
Here are my peanuts.
Valid tasks seem to take longer (around 10k secs) while invalids are faster (around 6-7k). The invalid tasks seem to be produced by the 780Ti.
I have seen factory-overclocked cards producing invalids. I suggest you try to underclock (restore to stock) the 780Ti and see if that helps. Pushing a card slightly over the limit can produce errors. Errors causing artifacts in a game are no big deal; errors while computing a task are a big deal.
Backing down to the previous version should be an easy test.
I tried that and I lost all GPU knowledge from BOINC. I think it is a combo of Windows 10 and new drivers. I can try an older version than the most recent, i.e. go back two versions.
Thanks for the insight! @MP and @Log
SBSP
Life on the bleeding edge :)
This is why I save my peanuts and let others test the Win10 update before I switch any of mine. I am anti-error with all my OC'd cards, and they don't get to take a break yet, since on my DSL it will take hours to do that upgrade on each one.
By then it will be peanut butter here
Well, thanks all who replied.
I reinstalled the NVIDIA drivers for Win 10 again and backed off the speed of the 780Ti from 1137 to 1098, and I have not had a failure in 24 hours. Let's hope that was it. I have gone from a rate of 200,000 to almost 500,000 per day in E@H.
I also dialed down CPU usage in BOINC to 80% to give the GPUs some CPU headroom. I am running a ton of ASICs on the machine, so I am sure that was overloading the system even though it is screaming fast.
i7 5820 running at 4.0 GHz liquid cooled, 2 980s and a 780Ti, and 8 x 125 GH/s ASICs all running at full blast!
(also a NVIDIA GTX 770 running on another machine.)
Ni!
SpongeBob SquarePants
Yes, it sounds like you have it fixed, and the card drivers are that way sometimes.
I usually wait before I update the drivers even though you get the message every time you look.
And yes they always run faster and better when you leave a free core for them.
I mess around with mine all the time doing different things with my OC'd cards
I just upgraded to a GTX 970. How do I run more than one task at a time? Thx, Zim
Go to your account page and click on the Einstein preferences. You need to set the GPU utilization factor - 0.5 for 2x, 0.33 for 3x, 0.25 for 4x, etc. Multiple tasks will be started after your host does the next work fetch. You can speed this up with a small increase in your work cache settings (which can be reverted after the work fetch occurs).
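As a quick sketch of how that factor relates to concurrency: the factor is the fraction of one GPU each task claims, so N tasks per GPU means a factor of about 1/N. The helper names below are made up for illustration; the real setting is simply a number you enter on the preferences page.

```python
import math

def utilization_factor(tasks_per_gpu):
    """Fraction of one GPU each task claims when running N tasks at once."""
    if tasks_per_gpu < 1:
        raise ValueError("need at least one task per GPU")
    return 1.0 / tasks_per_gpu

def concurrency_for(factor):
    """Whole number of tasks that fit on one GPU at a given factor."""
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    # Tiny epsilon guards against floating-point noise in 1/factor.
    return math.floor(1.0 / factor + 1e-9)

for n in (2, 3, 4):
    print(f"{n}x -> factor {utilization_factor(n):.2f}")
```

Note that a rounded entry such as 0.33 still gives 3x, while 0.34 would only fit two tasks.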
Be aware that some people report adverse effects when running multiple tasks from different science runs. You are currently running both BRP4G and BRP6. You may find better performance if you select just one of these when running multiple concurrent tasks.
At the moment a BRP6 beta app is being tested which is using cuda55 libs. The old app uses cuda32. For most people using Maxwell cards, this seems to be giving ~20% performance improvement. If it were my card, I'd let the current work finish and select (through preference settings) the BRP6 science run only. I'd then change the preference setting to allow beta test apps to run on the machine. You should try 2x, 3x, 4x to see what the optimum concurrency is for your setup. It will be a case of diminishing returns but seeing as you aren't running CPU tasks, I would think each higher setting will give an improvement. The only way to know for certain is to do experiments.
Just be aware that your GPU will be under higher load so you need to pay attention to proper cooling. That's why there is a warning attached to the settings.
Cheers,
Gary.
Thank you for the information. I will have to give it a try.