Lots of invalid results?
Are you leaving a CPU core free for each GPU task you are running? If not, try it and see if that helps. The GPU can do ten times the work a CPU core can in the same amount of time, so if it helps it can be very beneficial. A GPU can need a dedicated CPU core just to keep it fed and running optimally while it is doing its thing. Since your invalid units are GPU units, I would start there.
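As a sketch only: reserving a full CPU core per GPU task can be done with a per-project app_config.xml in the BOINC data directory. The app name and usage values below are placeholders; use the app names from your own client_state.xml and adjust the fractions for your setup:

```xml
<app_config>
  <app>
    <!-- Placeholder app name: substitute the GPU app name
         listed in client_state.xml for your project. -->
    <name>example_gpu_app</name>
    <gpu_versions>
      <!-- 1.0 = one task per GPU; 0.33 would run three at once. -->
      <gpu_usage>1.0</gpu_usage>
      <!-- Reserve a full CPU core to feed each GPU task. -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After saving the file, "Options → Read config files" in BOINC Manager (or a client restart) picks up the change.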
I have a 5930K CPU. I have allowed BOINC to use 4 cores (threads) and leave the rest free; I run SETI on those 4 cores and 3 instances of Einstein on the GPU, so there is plenty of CPU resource left over.
I'll keep my eye on it when I start getting units for the machine again.
Interestingly, I was having similar issues with SETI on this GPU. I may have found a fix for that; I just have to wait now to see how it pans out.
Think it could be a GPU issue?
Yes, or the PSU, heat, too much overclocking, etc.
Most of the failed GPU tasks are "Validate errors" with a few invalids mixed in.
Validate errors are thrown when a task doesn't pass the first sanity check of the result, meaning there is something wrong with the result file produced: it might not be formatted correctly, or there might be values way out of bounds. This sanity check is performed before your result is compared to that of your wingman.
Start by checking temps on both the GPU and CPU.
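To picture what that "first sanity check" stage amounts to: as an illustration only (this is not the project's actual validator code, and the function, bounds, and file layout are all invented for the example), a result file fails at this stage if it is malformed or contains wildly out-of-bounds values, before any wingman comparison happens:

```python
def sanity_check(lines, lo=-1.0e6, hi=1.0e6):
    """Illustrative pre-comparison check: every line must parse as a
    float and fall within [lo, hi]. Real validators are app-specific."""
    try:
        values = [float(line) for line in lines]
    except ValueError:
        return False  # malformed result file -> "validate error"
    return all(lo <= v <= hi for v in values)

print(sanity_check(["1.5", "2.0"]))           # well-formed, in bounds -> True
print(sanity_check(["1.5", "not-a-number"]))  # malformed -> False
print(sanity_check(["1.5", "9e9"]))           # out of bounds -> False
```

Only results that pass a check like this go on to be compared against a wingman's copy; a hardware fault that corrupts the output file shows up here first.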
Both are well within limits, and both the GPU and CPU pass hours of Prime95 / FurMark.
Ryan, despite lots of (false) advice to the contrary, there exist no perfect "most strenuous" tests which can tell you whether the operating condition of a computing device is suitable for a particular body of code running on a particular set of data.
Climbing off my pedestal, my practical suggestion is that you try lowering both the primary GPU clock rate and the GPU memory clock rate and observe whether the behavior changes. If it gets better, then your unit thinks your rates are too fast for this application. No other opinion matters as much as that of the unit on the work in question.
Obviously higher temperatures and greater power supply noise alter the answers.
Disclosure: I worked most of my career in the semiconductor business, mostly in microprocessor design, testing, reliability, and manufacturing roles. Precious few of my colleagues and managers understood just how imperfect testing necessarily is.
OK, I did try dropping the GPU clocks and ran some Milkyway units (can't get Einstein work at the moment), and they gave the same result (validation inconclusive). I'll try lowering the memory clocks as well.
Do you really check things properly?
Do you realise that when you first start (or restart after an absence) crunching at Milkyway, the first several results are always listed as 'inconclusive' until compared with duplicate tasks from other machines? After a relatively short period, the vast bulk of these turn into validated results.
Here is the results list for that machine at Milkyway. At the time I looked, there were 41 valid, 6 inconclusive, 1 invalid and 2 error. The 2 errors were "aborted by user", and I would expect the 6 inconclusive to get validated in due course. There doesn't appear to be a problem at all.
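For context, the counts quoted above can be turned into a rough invalid rate; this is just arithmetic on the figures in this post (setting aside the user-aborted errors and the still-pending inconclusives):

```python
# Task counts quoted for the host at Milkyway.
valid, inconclusive, invalid, error = 41, 6, 1, 2

# Invalid fraction among tasks that have actually been compared
# (valid + invalid); aborted and pending tasks are excluded.
invalid_rate = invalid / (valid + invalid)
print(f"{invalid_rate:.1%}")  # about 2.4%
```

A low single invalid out of ~40 compared tasks is consistent with "no problem at all", since the occasional invalid can come from a bad wingman result or a questionable data batch rather than this host.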
The other thing to understand about Milkyway is that the original 'flagship' application (Milkyway@Home) seems quite stable but the newer app versions (like Modified Fit) seem to have problems from time to time as versions change. There also seem to be occasional questionable data batches so it's hard to properly test hardware using Milkyway. You can never really be sure of the exact cause (hardware, software or data) of any observed problems.
Cheers,
Gary.
Bump, getting invalid results again :(
https://einsteinathome.org/host/11700141/tasks&offset=0&show_names=1&state=4&appid=0
Recent changes are upgrade to Windows 10 and change of GPU to a Radeon Fury.
Thoughts?