Nvidia Pascal and AMD Polaris, starting with GTX 1080/1070, and the AMD 480

archae86
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225641597
RAC: 1050827

RE: Archae, if i'm

Quote:
Archae, if I'm understanding correctly, the Pascal architecture cards are underclocking the memory by 500 MHz in P2 state. For instance, on 970s the P0 memory clock rate is the rated 7000 MHz, which NVI reports as 3505 MHz. In P2 state, the 970s run the memory at 3005 MHz by default.


No, the difference between my 1070 in P0 vs. P2 state is not 500 MHz on any of the three competing MHz scales.

Scale 1: as reported by GPU-Z, nominal P0 is about 2000
Scale 2: as reported by Nvidia Inspector, nominal P0 is about 4000
Scale 3: as quoted by Nvidia marketing, the card memory clock is 8000

As I like to use GPU-Z for averaging in particular, and for monitoring in general, I've generally been using GPU-Z "scale 1" numbers here. On that scale, all of my results published here before I began overclock work (just yesterday) consistently reported a 1901.2 MHz memory clock rate. In round numbers that is 100 below gaming nominal by scale 1, 200 by scale 2, and 400 by scale 3. The Pascal difference is smaller than the Maxwell2 difference by any reasonable method of comparison. Much smaller. On a percentage basis, regardless of the scale chosen, it is just 5%. More than half of the gain I have already reported came from going well above the card's claimed memory clock rate, NOT from getting back the 5%.
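For anyone trying to reconcile numbers across the tools, the three scales differ only by constant factors; here is a minimal sketch of the conversions in Python (the factor-of-2 and factor-of-4 relationships, and the "about 2000" gaming nominal, are my reading of the numbers above):
[pre]
# Convert a GDDR5 memory clock between the three reporting scales above.
# Assumed relationships: NVI = 2 x GPU-Z (double data rate),
# marketing = 4 x GPU-Z (effective "quad-pumped" rate).

def to_all_scales(gpuz_mhz):
    """Return the same memory clock expressed on all three scales."""
    return {"GPU-Z": gpuz_mhz, "NVI": 2 * gpuz_mhz, "marketing": 4 * gpuz_mhz}

stock_p2 = to_all_scales(1901.2)   # what the card reports while running Einstein
gaming_p0 = to_all_scales(2002.0)  # approximate gaming nominal ("about 2000")

# The P0-vs-P2 gap is about 5% regardless of which scale you pick:
print(1 - stock_p2["GPU-Z"] / gaming_p0["GPU-Z"])  # ~0.05
[/pre]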

Quote:
It would seem from your findings that 4x concurrency on the 1070 produces optimal credit/day configuration, correct?


I have not looked above 4X, but would not be surprised if 5X were to give a bit more, and perhaps further gains above that. I judged it more interesting in the short term to pursue the overclock opportunity. I also think that answer is probably far more configuration and application dependent than is the overclocking answer.

Manuel Palacios
Manuel Palacios
Joined: 18 Jan 05
Posts: 40
Credit: 224259334
RAC: 0

RE: RE: Archae, if i'm

Quote:
No, the difference between my 1070 in P0 vs. P2 state is not 500 MHz on any of the three competing MHz scales. [snip]

Thank you for the clarification; I seem to have been getting mixed up by the different scales used by each reporting tool. Nonetheless, I now see how much more significant your point is concerning the application's thirst for memory clock. It is quite surprising just how sensitive it is.

archae86
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225641597
RAC: 1050827

For the last couple of days I

For the last couple of days I have repeatedly had trouble when starting Google Chrome: the web page displayed in every tab was completely black. Often just shutting down that copy and doing a fresh start would give me a working copy, but this morning I had no joy after at least five tries.

This is nothing new; a brief web search shows people voicing this complaint and being directed to similar solutions for some years now. The general theme of the solutions is a problem in Chrome's use of the GPU.

I mention it here because the standard solution is to disable hardware acceleration, so the problem in some sense must involve a coordination failure between Google Chrome and the installed GPU or its driver.

Of the many solutions on offer, the one I adopted was to open a command prompt window, change to the directory where Chrome is installed, and start a fresh copy of Chrome with this command:
chrome.exe --disable-gpu
Then, in that (working) copy of Chrome, I went to Settings and unchecked the option to use hardware acceleration.
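On a 64-bit Windows system the whole sequence looks something like this (the install path shown is the common default; yours may differ):
[pre]cd "C:\Program Files (x86)\Google\Chrome\Application"
chrome.exe --disable-gpu[/pre]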

I don't know whether my problem had anything specific to do with either the current Nvidia driver or the GTX 1070 at all, but in case it did I thought I should mention the matter here.

Anonymous

RE: For the last couple of

Quote:

For the last couple of days I have repeatedly had trouble when starting Google Chrome: the web page displayed in every tab was completely black. [snip]

Is Google Chrome the only browser affected?

archae86
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225641597
RAC: 1050827

RE: Is Google Chrome the

Quote:
Is Google Chrome the only browser affected?


On my system Firefox was fine throughout the episodes. I believe I had a previous episode a couple of months ago, though I don't remember it; a Google search informed me I had visited one of the solution pages earlier this year. That makes it less likely to be something specifically wrong with either the 1070 or, narrowly, with the latest driver.

I did not try any other browser. Posts on the subject did not seem generally to describe a multi-browser condition.

My guess is that more than one particular cause can give this result, else I'd expect it to have been fixed long ago.

archae86
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225641597
RAC: 1050827

My commissioning work with my

My commissioning work with my MSI GTX 1070 FE card in the last few days has taken the form of overclocking exploration. While other methods might work, I stopped looking when I found one that seemed adequate to my short-term purpose. All my efforts have used Nvidia Inspector. Nearly all of my clock rate commands have used the command line interface, providing P0 state offsets for the GPU clock rate and the memory clock rate. These P0 state offsets propagate into the P2 state in which the card actually runs Einstein BRP6/CUDA55 work.
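For concreteness, a typical Nvidia Inspector invocation takes the GPU index, the performance state, and the offset; as I understand the switch syntax, requesting the sort of offsets discussed below would look roughly like this:
[pre]nvidiaInspector.exe -setBaseClockOffset:0,0,240 -setMemoryClockOffset:0,0,950[/pre]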

While I have seen by experiment that running at multiplicities of 3X and 4X gives progressively, if modestly, more output, I chose to standardize my overclocking testing at 2X, which allows me to cycle through test points more quickly. I think it likely that 2X overclock success rates will closely mimic those at 3X and above.

I inched up the memory clock rate in steps of 50 MHz, on the NVI scale for which the full-speed nominal gaming P0 rate is reported as 4000. As the starting condition at which the card naturally runs Einstein was 3800 on this scale, I was quite surprised that it ran correctly at a +950 offset, giving 2376.0 as reported by GPU-Z, or 4752 as reported by NVI, or 9504 in Nvidia marketing terms. The fastest observed success was a memory clock 25% faster than the rate at which the card runs Einstein in stock condition, and 18.8% faster than the marketing nominal memory clock rate for the card. As the Einstein BRP6 work responds very strongly to memory clock rate, the card's credit production rate improved by 19.6%, reaching a calculated rate for the GPU itself of 192,632/day, for a system total of 196,214.
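To make the percentages explicit, here is the arithmetic behind those figures, all on the NVI scale (a sketch; the 19.6% credit gain is a measured result, not something derivable from the clocks):
[pre]
# Memory overclock arithmetic, all on the NVI reporting scale.
stock_einstein = 3800     # rate at which the card runs Einstein unmodified
marketing_nominal = 4000  # full-speed nominal gaming P0 rate
best_observed = 4752      # result of the +950 offset, as reported by NVI

print(best_observed / stock_einstein - 1)     # ~0.250 -> 25% over stock Einstein
print(best_observed / marketing_nominal - 1)  # 0.188 -> 18.8% over marketing nominal
[/pre]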

While I expect my long-term running point will be a couple of steps down in memory clock rate from this observed peak, there seems likely to be more than enough combined opportunity from GPU core clock rate, GPU task multiplicity, and number of allowed CPU tasks to get the total system output over 200,000/day, and perhaps that from the GPU alone.

My "ceiling error" for maximum memory overclock took an undesired form. The PC spontaneously rebooted after about 30 minutes of running at that test condition. It is possible this had nothing to do with the overclock, and it may not be the consistent symptom. But +950 is a lot, so I don't plan to look harder for the memory clock peak real soon. I've backed down to +850 and have started inching my way up the core clock overclock. I also saw PC rebooting as the failure symptom in some of my GTX 750 overclocking work, and currently suspect this was a real ceiling symptom, and not a coincidence from an unrelated problem in my system.

It appears that P2 state core clock overclocking using the NVI P0 state offset input may genuinely apply the prescribed offset to the time- and temperature-dependent value the clock would have assumed without the command. This has the odd result that one initially sees a core clock rate reflecting the cooling of the GPU during the rate-setting downtime, which over the first few minutes descends to a reasonably stable value. This effect would be stronger yet had I left the fans at default. As I mentioned earlier, I've imposed a significantly more aggressive fan speed curve (using MSI Afterburner) than the card default, with the practical result that at my room ambient and my test conditions the fan speed has been running at about 70%, rather than about 53%.

As always with overclocking, there is no reason to suppose the overclocks tolerated by my particular sample of this card will be the same seen by other samples. Real errors may be generated, and given the spontaneous reboot seen, file corruption may be a possibility. In other words, proceed at your own risk and expect to get different results.

Manuel Palacios
Manuel Palacios
Joined: 18 Jan 05
Posts: 40
Credit: 224259334
RAC: 0

RE: My commissioning work

Quote:

My commissioning work with my MSI GTX 1070 FE card in the last few days has taken the form of overclocking exploration. [snip]

This is rather impressive considering just how far you were able to take those chips before a hard reboot. Personally, I would not take those GDDR5 chips 250-300 MHz past their rated P0 state clock values. You mention you use an FE version; although NVIDIA's engineering on those coolers is quite good, I still believe them to be suboptimal compared to a properly designed heatsink/fan from an AIB partner. Of course, you then have to weigh the price premiums in deciding which route to go as far as cooling is concerned. It's unfortunate that while we know the GPU temperature, I'm not aware of a tool that informs the user of the GDDR5 chip temperatures or of the effect (other than the obvious excess heating) that overclocking has on them. Does the FE design at least have heatsinks for the memory chips?

I can imagine a more modest overclock on the memory, in conjunction with an updated BRP6 app, could yield a 1070 setup that reaches or exceeds that 200,000 credit/day per GPU mark.

archae86
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225641597
RAC: 1050827

RE: It's unfortunate that

Quote:
It's unfortunate that while we know the GPU temperature, I'm not aware of a tool that informs the user of the GDDR5 chip temperatures or of the effect (other than the obvious excess heating) that overclocking has on them. Does the FE design at least have heatsinks for the memory chips?


The power consumption increase at the wall across the full range of memory clock rates is rather modest. As the added power in the memory chips themselves can only be a fraction of that, I doubt they are getting seriously hotter. I think the card designs concern themselves with cooling all three of the GPU chip, the power conditioning circuitry, and the RAM chips.

On power efficiency grounds, a quick look at my in-process results suggests that for Einstein BRP6/CUDA55 work, a performance gain from a memory clock increase is far cheaper in added power cost than the same gain obtained by a core clock increase.

The term "silicon lottery" is a useful reminder that a given sample of a card may by luck have appreciably higher overclocking potential than another. If Micron in the future finds a big home for the next higher slice of GDDR5 RAM speed it may turn out that cards built a few months from now all have less overclocking headroom on the memory than mine, as a result of factory binning skimming the cream and sending it elsewhere. But I've seen a few reports of early 1070 users who tried finding they could use considerable memory overclocks on their cards. By contrast, across the whole Pascal universe under normal cooling conditions (that is not paying attention to the liquid nitrogen people) people are seeing a surprisingly tight range of maximum core clock rates, with not generally an especially large additional benefit obtained above the sort of "free overclock" the card provides at default operating conditions.

archae86
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225641597
RAC: 1050827

My commissioning work with my

My commissioning work with my MSI GTX 1070 FE card in the last few days has taken the form of overclocking exploration.

Having previously observed BRP6/CUDA55 success when running the memory clock offset at +950, I backed it down to +850 and inched my way up in core clock offset, using increments of 20 in my requests. The highest observed success was at a requested core clock offset of +240, while a trial at +260 generated two failures. Conveniently, these did not crash the computer, but just stopped the tasks, with the monitoring software showing their status as computation error.

The reported results as shown on my task list:
[pre]Outcome: Computation error
Client state: Compute error
Exit status: 1015 (0x000003F7) Unknown error code[/pre]
I have now started a run with both the memory clock and the core clock set one of my increments lower than the fastest observed to work. In P0 state offset terms this means I've requested a +220 core clock and a +900 memory clock. If this stays up and running, generating nothing but valid results for a day, I'll hazard a guess that yet one more tick down in core clock and memory clock might be a reasonable operating point, and I'll move on to trials of higher multiplicity, stopping briefly at 3X, longer at 4X; if 4X gives more work than 3X, I'll work up further.
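For what it's worth, the procedure I've been following amounts to a simple step search with a safety margin; a sketch of the idea in Python (the trial step is manual in practice, and the helper names are purely illustrative):
[pre]
# Sketch of the overclock search: raise an offset by a fixed increment
# until a trial fails, then settle below the last success by a margin.
# run_trial() stands in for a manual test run validated for correct results.

def find_operating_offset(start, step, run_trial, margin_steps=1):
    """Return a conservative offset: (last success) minus a safety margin."""
    offset = start
    while run_trial(offset):      # e.g. a 2X run completing with valid results
        offset += step
    last_success = offset - step  # the trial at `offset` failed
    return last_success - margin_steps * step

# With step=20 and the first failure at +260, the last success was +240
# and the chosen operating point +220 (one increment of margin).
[/pre]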

On the GPU-Z reporting scale, this newly started run has a GPU core clock of 2075 MHz and a memory clock of 2352.4 MHz (which is 4704.8 on the NVI scale, or 9.4 GHz on the Nvidia marketing scale). This core clock maximum is pretty much in the range of maximum core clock success speeds reported rather broadly by game-oriented reviewers and users.

System power consumption at these conditions is about 201 watts. I'll hazard a guess that the GPU card itself is using about 110 watts at this condition. Credit/day output is just under 200,000, and I think it likely to go slightly over at 4X multiplicity.

Gamboleer
Gamboleer
Joined: 5 Dec 10
Posts: 173
Credit: 168389195
RAC: 0

Still waiting on a viable

Still waiting on a viable pre-order for the RX 480. Only two product pages have surfaced on Amazon, and neither has had orders enabled. Newegg is on auto-notify, and I won't purchase anything time-sensitive from B&H. I suspect the release has been held to the 29th to minimize scalping.

If I can get one overnighted from Amazon, I'll be able to post preliminary results before the weekend; otherwise I'll be gone until the 5th.
