I read that Einstein@Home is bottlenecked by GPU memory bandwidth. However, I discovered that FGRPB1G downclocks the memory when it runs.
The default memory clock of the GTX 1080 Ti is 5600 MHz. When FGRPB1G starts, it downclocks to 5100 MHz. If I overclock the memory by 1000 MHz in MSI Afterburner, the memory clock becomes 6600 MHz, but as soon as FGRPB1G starts it drops to 6100 MHz.
https://drive.google.com/file/d/1ZR00QFjqqFtzTb0_4Fjp_lj6Q2NNsD2u/view
Another example: the default memory clock of the GTX 980 Ti is 3500 MHz. When FGRPB1G runs, it downclocks to 3300 MHz. If I overclock the memory by 400 MHz, the clock reaches 3900 MHz in a gaming benchmark, but when FGRPB1G runs it stays at 3300 MHz.
Is this a bug? Einstein@Home is definitely memory-bandwidth dependent, so I don't know why it downclocks the memory. Below are the test results of my GTX 1080 Ti on LATeah2008L tasks; the numbers are averaged over a couple of tasks. Overclocking the memory by 20% increases power consumption by 6% but shortens the run time by 4.5%.
Concurrency | Memory clock (MHz) | Power (W) | Temperature (°C) | Total time (s) | Time per WU (s) |
3 | 6100 | 212 | 47 | 966 | 322 |
3 | 5100 | 200 | 46 | 1010 | 337 |
2 | 6100 | 201 | 45 | 689 | 345 |
2 | 5100 | 191 | 44 | 713 | 357 |
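If anyone wants to confirm the downclock on their own card, the sketch below logs the performance state, memory clock and power draw while FGRPB1G tasks run. It assumes the nvidia-ml-py (pynvml) Python bindings and an Nvidia driver are installed; adjust the GPU index and sampling interval to taste.

```python
# Minimal monitoring sketch using the nvidia-ml-py (pynvml) bindings.
# Assumption: "pip install nvidia-ml-py" and an Nvidia driver are present.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU in the system

try:
    for _ in range(12):  # sample for about a minute at 5 s intervals
        pstate = pynvml.nvmlDeviceGetPerformanceState(handle)
        mem_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_MEM)
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
        print(f"P{pstate}  mem {mem_clock} MHz  {power_w:.0f} W")
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()
```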
BTW, the Vega 64 has a memory bandwidth of 484 GB/s and a RAC between 110k and 150k. Since the Radeon VII has a memory bandwidth of 1024 GB/s, should we expect a RAC of about 250k on a Radeon VII?
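As a sanity check on that guess, here is the naive linear-scaling arithmetic behind it. This is only an illustrative upper-bound style estimate; nothing says RAC actually scales linearly with bandwidth.

```python
# Purely illustrative: scale the Vega 64 RAC range linearly by the bandwidth
# ratio. Real scaling will be worse, since the core and driver also matter.
vega64_bw, radeon7_bw = 484, 1024              # GB/s
vega64_rac = (110_000, 150_000)                # observed RAC range

scale = radeon7_bw / vega64_bw                 # about 2.1x
print([round(r * scale) for r in vega64_rac])  # roughly [232727, 317355]
```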
Added 15 Jan 2019 18:54:45 UTC
Apparently a stable overclock for gaming is not a stable overclock for crunching. Since Jan 14th I have gotten 169 invalid results.
It's not the work units that downclock the memory, it's Nvidia. Nvidia has stated on their website that when the GPU recognizes a scientific (compute) workload, it is moved to the P2 state, which by default uses lower GPU and memory clocks so that the results don't get corrupted the way they could at the normal P0 clock levels. AMD doesn't have this issue.
I'm not sure if this works with GTX 10xx series cards, but with the 9xx series you can use a program called Nvidia Inspector to adjust the P2 memory clock to the maximum. https://www.guru3d.com/files-details/nvidia-inspector-download.html
I'm still using version 1.9.7.8, as I recall there was something strange with installing or using the newer version, but that might have just been user error on my part.
Show Overclocking --> under Overclocking, select Performance Level (2) - (P2) --> move the Memory Clock Offset slider to max --> Apply Clocks & Voltage --> the memory bandwidth (Bus Width GB/s) shown on the left side will change --> Exit.
Thank you for pointing it out. Could you elaborate on "get corrupted by the normal P0 state levels"?
Now that I have gotten 169 invalid results since yesterday, I think the P2 state has a point. A memory overclock to 6100 MHz should be stable for games, but obviously Einstein@Home doesn't think so. I need to find a sweet spot between shorter completion time and failure rate.
Thanks. It seems the P2 state can be disabled.
https://www.reddit.com/r/RenderToken/comments/9w2rd9/how_to_use_maximum_p0_power_state_with_nvidia/
However, I got lots of invalid results by going 500 MHz above the base memory clock (1000 MHz above P2). Do you have any recommendations for a safe overclock relative to the base clock?
Without going into too much detail, P0 states are fine for gaming. No one really cares if you drop small bits of data here and there; it gets overshadowed quickly as the screen changes. When doing scientific work, however, any error in calculation corrupts the entire work unit. Nvidia knows this, so to prevent errors from occurring they restrict scientific processes to the P2 state with lower clocks, so that no corruption gets incorporated into the analysis. Remember these are gaming cards, not scientific cards like the Tesla line. If you want more info you can google P0 vs P2 states and find miners talking about the difference, etc. This is just a quick explanation in a nutshell.
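To make that concrete, here is a small, generic sketch (not Einstein-specific) of how a single flipped bit in a double wrecks a numerical result, whereas one wrong pixel in a rendered frame is invisible:

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit in the IEEE-754 representation of a 64-bit float."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y

value = 1.2345678901234567
corrupted = flip_bit(value, 51)  # flip the most significant mantissa bit
print(value, corrupted)
print(f"relative error: {abs(corrupted - value) / abs(value):.1%}")  # ~40%
```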
Thanks. I do understand that once the memory clock crosses a certain point, the error rate increases exponentially with clock speed. So it seems to me the P2 memory clock is the absolutely safe clock with zero errors, while the P0 memory clock is already "officially overclocked": a few errors, but fine for gaming.
The only way to find the failure threshold for a particular sample of a particular card running a particular application is large-scale testing.
I've done this. The answer varies from card to card of the same make and model. So don't trust anyone who gives you a number.
The other side of the coin is "how much benefit?". Several generations ago, when it became known that Maxwell2-generation cards downclocked memory a lot, there was an appreciable Einstein performance gain available by tampering. I had sworn off CPU overclocking several years before, but got into GPU overclocking for the first time on that occasion.
I think you may find that the current Einstein application gives less performance improvement with memory overclocking than you might suppose, making it a bit questionable whether it is worth the time and effort to find a safe overclock.
Also, as not all the data sets are the same, there is no guarantee that a carefully found just barely safe operating point will stay safe into the future.
Warnings aside, I personally do overclock, but I do it by slowly creeping up the clock rate until I find errors, then backing down until I find a rate that gives zero errors in 24 hours, then backing down two more increments.
Your preferred method will vary, naturally.
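A rough sketch of that creep-up-then-back-off procedure is below; set_memory_offset and count_invalid_results are hypothetical placeholders for whatever overclocking tool and validation check you actually use.

```python
STEP_MHZ = 50                 # size of one clock increment
SOAK_SECONDS = 24 * 3600      # crunch a full day at each candidate setting

def set_memory_offset(offset_mhz: int) -> None:
    """Hypothetical: apply a memory clock offset via your overclocking tool."""
    raise NotImplementedError

def count_invalid_results(duration_s: int) -> int:
    """Hypothetical: crunch for duration_s seconds, return the invalid/errored task count."""
    raise NotImplementedError

def find_safe_offset(start: int = 0, backoff_steps: int = 2) -> int:
    offset = start
    set_memory_offset(offset)
    # Creep upward until errors appear.
    while count_invalid_results(SOAK_SECONDS) == 0:
        offset += STEP_MHZ
        set_memory_offset(offset)
    # Back down until a full soak passes with zero errors.
    while count_invalid_results(SOAK_SECONDS) > 0:
        offset -= STEP_MHZ
        set_memory_offset(offset)
    # Then back off a couple more increments as a safety margin.
    offset -= backoff_steps * STEP_MHZ
    set_memory_offset(offset)
    return offset
```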
I was going to take your method. I do hope there is an application that tests VRAM fidelity, like MemTest for RAM, so I don't need to screw up E@H tasks.
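For a very crude check along those lines, here is a sketch assuming the CuPy package and a CUDA-capable driver are available. It only does a naive write/read-back comparison, nothing like the patterned stress tests a dedicated VRAM tester would run.

```python
import cupy as cp
import numpy as np

def vram_pattern_check(mbytes: int = 1024, passes: int = 10) -> int:
    """Fill GPU memory with a byte pattern, read it back, count mismatched bytes."""
    n = mbytes * 1024 * 1024
    errors = 0
    for p in range(passes):
        pattern = 0xAA if p % 2 == 0 else 0x55     # alternate bit patterns
        buf = cp.full(n, pattern, dtype=cp.uint8)  # allocate and fill on the GPU
        host = cp.asnumpy(buf)                     # copy back to system RAM
        errors += int(np.count_nonzero(host != pattern))
        del buf
    return errors

if __name__ == "__main__":
    print("mismatched bytes:", vram_pattern_check())
```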
Did you see performance improvement with core clock overclocking?
Yes, but again somewhat less than one might suppose; though more than from the memory clock, for recent cards and the current Einstein application, if memory serves.
Do you know if there is any app_config option to pass a command-line argument to the application to disable the throttling?
Do you think this might be related to the Nvidia Series 20 problems?
I took a GPU-Z log at the point of the EAH screen blank, and it seems the GPU and memory frequencies drop by more than 1 GHz. I was not aware that EAH messed around with the frequency. If EAH lowers the frequency too much, it might be the source of the timeout.
Thoughts?