Dear fellow crunchers,
I stumbled upon a problem with my Galax GTX 970 running GPU-Grid which seems to affect all cards and all GP-GPU crunching, i.e. CUDA and OpenCL. I have also seen it reported by the Bitcoin guys, but am not aware of any solution yet.
The problem:
As soon as a CUDA or OpenCL task is started, the card is limited to "P2" as the highest performance state. This state is meant for less demanding loads and limits the GPU memory clock to 1.5 / 3.0 / 6.0 GHz on GM204 cards, depending on how you want to count DDR clock speeds.
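(For reference, those three figures are the same physical speed counted differently: GDDR5 transfers four bits per pin per command-clock cycle, so a 1.5 GHz command clock equals a 3.0 GHz write clock and a 6.0 GT/s effective data rate, i.e. 1.5 x 4 = 6.0.)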
At GPU-Grid the memory controller load is ~50% on my GTX 970, and loads >30% are known to reduce performance measurably. I expect the GTX 980, with more shaders to feed, to be hit harder, and Einstein, with its generally higher bandwidth demand, to suffer even more.
Diagnosing the problem:
This is difficult, because at first glance everything seems OK. Under regular 3D loads the card switches into "P0" and runs the memory at 1.75 / 3.5 / 7.0 GHz. This speed is also what's reported by the Einstein and GPU-Grid apps. However, tools like GPU-Z and nVidia Inspector show that the actual clock speed is lower.
Inspector also shows the performance state. You can use this to check for yourself (or script the check; see the sketch after this list):
- launch a benchmark like Heaven in windowed mode, halt BOINC GPU computations -> "P0" and full memory clocks are reported
- resume BOINC GPU crunching -> the card immediately switches to "P2" and the memory clock speed drops
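If you prefer to script that check, NVML (the monitoring library shipped with the driver) can report both values. A minimal sketch in C, error handling omitted:

#include <stdio.h>
#include <nvml.h>   /* header from the CUDA toolkit; nvml.dll ships with the driver */

int main(void)
{
    nvmlInit();

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);   /* first GPU; adjust the index if needed */

    nvmlPstates_t pstate;                  /* NVML_PSTATE_0 = 0, NVML_PSTATE_2 = 2, ... */
    nvmlDeviceGetPerformanceState(dev, &pstate);

    unsigned int memMHz;
    nvmlDeviceGetClockInfo(dev, NVML_CLOCK_MEM, &memMHz);

    printf("P-state: P%d, memory clock: %u MHz\n", (int)pstate, memMHz);

    nvmlShutdown();
    return 0;
}

Run it once with BOINC halted and once with crunching resumed and you should see the P0 -> P2 flip and the memory clock drop.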
Increasing memory clock speed:
Regular tools like EVGA Precision, Afterburner etc. only set values for the P0 state. If you increase the memory clock there, it applies just fine to regular 3D loads like Heaven, but has no effect on the memory clock in P2. Strangely, a GPU (core) overclock set in those tools applies to both P0 and P2, so that part works just as expected.
But you can use nVidia Inspector to set the memory clock speed for P2 separately.
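For what it's worth, Inspector can also apply such an offset from the command line, which is handy for reapplying it at every boot. I'm quoting the syntax from memory (parameters being GPU index, P-state and offset in MHz), so please double-check it against your version before relying on it:

nvidiaInspector.exe -setMemoryClockOffset:0,2,200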
The bigger problem:
My card runs 1.75 / 3.5 / 7.0 GHz fine in P0 and should reach almost 2 / 4 / 8 GHz according to the usual reviews. However, when I increase the memory clock in P2 I cannot even get 3.15 GHz stable!
This makes me suspect the memory voltage is lower in P2, but I have no way to verify or change this.
What I already tried:
- contacted the card manufacturer but got no answer at all
- tried to force the card into P0 using various methods found on the net, but the card simply ignores the commands
- searched the card's BIOS using Maxwell BIOS editor for any helpful setting, but couldn't identify any
- neither the system power settings nor the nVidia control panel power setting has any effect
- all 344-series WHQL drivers show this behaviour
What's left:
- are any of your cards not crunching in P2?
- which memory speeds do you achieve in P2?
- contact nVidia?
- try working with NvAPI directly?
I'm grateful for any help.. and if I'm right and we can correct this, your Einstein throughput could see a significant boost!
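Regarding the NvAPI idea: reading the state is straightforward with the public NvAPI SDK; whether the P-state can also be forced from there is exactly the open question. A minimal read-only sketch in C, error handling mostly omitted:

#include <stdio.h>
#include "nvapi.h"   /* public NvAPI SDK; link against nvapi.lib or nvapi64.lib */

int main(void)
{
    if (NvAPI_Initialize() != NVAPI_OK)
        return 1;

    NvPhysicalGpuHandle gpus[NVAPI_MAX_PHYSICAL_GPUS];
    NvU32 count = 0;
    NvAPI_EnumPhysicalGPUs(gpus, &count);

    for (NvU32 i = 0; i < count; i++) {
        NV_GPU_PERF_PSTATE_ID pstate;
        if (NvAPI_GPU_GetCurrentPstate(gpus[i], &pstate) == NVAPI_OK)
            printf("GPU %u: P%d\n", i, (int)pstate);
    }

    NvAPI_Unload();
    return 0;
}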
MrS
Scanning for our furry friends since Jan 2002
What do you consider to be a reliable download source for Nvidia Inspector? I looked around a bit and found offers from some of the usual shareware sites, and may have found the author's site, but my security software blocked what appeared to be the author's own download location.
Also what do you consider to be an adequately current version?
I have a GTX 970 and would be happy to report, though I'm not very interested in tinkering with it much (it is on my wife's primary use PC).
Good question.. I'd certainly trust Guru3D.
Update: sent this to nVidia as a bug report.
MrS
Scanning for our furry friends since Jan 2002
I have a dual-core HT Haswell host which has been running 3x Perseus jobs for over a week on an EVGA superclocked GTX 970.
I tried the Guru3D US download link, and my security software did not like it either, showing the same warning I saw for the author's download page, but it made no complaint about the European download site.
I'm not fully clear on what I see in the various Nvidia Inspector fields, and as I don't run other graphics applications it is not instantly convenient to make comparisons, but I do see this when running my 3x Perseus load:
1. less than double my GTX 750 output on the same box, which seems low
2. NV I reports I am in state P2
3. NV I's Mem entry on the third row up from the bottom reads 3005
4. the load readings from NV I (which appear broadly consistent with what I see from GPU-Z) include GPU load 94%, MCU 65%, VPU 0%.
I took a look at NV I's overclocking page, but did not tamper. While I said I'm uneager to tinker much, I might consider trying a modest Memory clock bump, with the intent of observing impact on reported GPU load and on elapsed time of Perseus jobs.
Partly as a consequence of Process Lasso-imposed affinity and priority settings, my Perseus elapsed times are in a very tight distribution on this box, so I could detect quite a small change in compute performance within a few hours of run time.
Any suggestions?
Thanks so far! What you report sounds expected: your card is also in P2 and running the memory at 3.0 GHz. And a memory controller load of 65% is indeed impressively high.
What I'd try:
- open the OC tab in NV I
- switch the top selection to P2
- increase memory clock speed to 3.05 or 3.1 GHz
Judging from my not-so-well-behaved card this should be safe. You should see an instant drop in MCU load of 1-2%. If this works out you could try higher clocks; in my case 3.5 GHz led to blue screens and lost WUs within minutes, though.
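If you want a log of the readings while you test, NVML (a monitoring library that ships with the driver) can be polled in a loop. A minimal sketch in C, error handling omitted; stop it with Ctrl+C:

#include <stdio.h>
#include <windows.h>   /* Sleep(); on Linux use unistd.h and sleep(1) */
#include <nvml.h>

int main(void)
{
    nvmlInit();

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);   /* first GPU; adjust if needed */

    for (;;) {
        nvmlUtilization_t util;            /* .gpu and .memory, both in percent */
        nvmlDeviceGetUtilizationRates(dev, &util);
        printf("GPU %3u%%  MCU %3u%%\n", util.gpu, util.memory);
        Sleep(1000);                       /* poll once per second */
    }
}

util.memory is the fraction of time the memory controller was busy, which should match the MCU value NV I shows.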
MrS
Scanning for our furry friends since Jan 2002
Thanks for the advice--it will probably be at least tomorrow before I try anything. I have urgent non-BOINC matters in my lap today.
At least I was able to supply you information from one additional machine.
It seems alarming that your memory clock rate cliff is so close by. Perhaps it is not a simple speed/voltage problem but something else?
I assume these issues are on Windows boxes??? Any feedback from a Linux perspective?
RE: "It seems alarming that your memory clock rate cliff is so close by."
I agree. I'm not 100% certain yet where exactly the cliff is.. mostly because I want to avoid the troublesome range, and because it's difficult to test with a non-BOINC workload.
It could also be that nVidia decided GP-GPU needs tighter memory timings rather than high bandwidth, so they use a different timing profile in P2. That would fit a hard memory clock speed wall. However, it would be strange to make such a blanket decision, especially at bandwidth loads of 50 - 65%.
@robl: yes, this is all Windows. If I'm right and it's intentional behaviour set by the card's BIOS, Linux will be affected as well. I can't diagnose it there myself, though.
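If any Linux user wants to check: nvidia-smi ships with the driver there, and on reasonably recent drivers its query interface can report the relevant values directly, e.g.:

nvidia-smi --query-gpu=pstate,clocks.mem,utilization.memory --format=csv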
MrS
Scanning for our furry friends since Jan 2002
Just what is possibly a wildly optimistic thought. I wonder if applications compiled with a sufficiently high CUDA version might present themselves to the card in a way which would not give this behavior.
Or, possibly, there is a directive choice available in coding which, if used, might avoid it.
GPU-Grid has compiled a CUDA 6.5 app in order to support Maxwell. If any such option exists, it doesn't take effect automatically with the new version. Manual compiler switches could be possible, though.
MrS
Scanning for our furry friends since Jan 2002
I am seeing the same behaviour on an Asus GTX 970 OC Edition. Currently it is running 6 Seti MultiBeam tasks concurrently (0.34 C + 0.165 NV). GPU usage 95...99 %, MCU usage 80...85 %, P-state is P2, current clock 1316 MHz and memory 3005 MHz. Driver is 344.65 WHQL.
Have you checked if any of the Driver Profile settings make any difference? You can access them in Nvidia Inspector by clicking the button next to Driver Version line.