Tom, you never answered the question of whether you looked at the ECC errors on the VRAM when you are so heavily overclocking the memory.
The GPU will happily keep correcting errors, but since that takes several retries on every memory transfer, it causes a slowdown in performance, which is exactly what you are experiencing.
Yes, it returns to processing full speed after a reboot.
It is a dedicated BOINC machine running WCG on the CPU and e@h/grp#1 on the GPU.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Hi, I have an RTX 3080 Ti FE that seems to slow down its processing after being up for several days.
It goes from 2.5+ minutes up to 3.5-4 minutes per task. It is running 2x on the GPU with petri's optimized app.
This has happened across both the current 525 and the 470 Ubuntu/Nvidia drivers, and with an MTM OC of +1700.
This GPU routinely used to do a little over 3M on the e@h grp#1 diet it normally gets.
It is clearly stalling at 2.5M.
I have rebooted and am now running an MTM OC of +900.
What else should I be testing/looking at?
Thank you.
Tom M
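To put rough numbers on the task times quoted above, a quick calculation can help. This is only a sketch: the helper names `slowdown_pct` and `throughput_pct` are made up for illustration, and the only tool assumed is `awk`.

```shell
# Hypothetical helper: percent increase in per-task run time.
# usage: slowdown_pct <normal_minutes> <slow_minutes>
slowdown_pct() {
    awk -v base="$1" -v slow="$2" 'BEGIN { printf "%.0f\n", (slow / base - 1) * 100 }'
}

# Hypothetical helper: throughput while slowed down, as a percentage of normal.
# usage: throughput_pct <normal_minutes> <slow_minutes>
throughput_pct() {
    awk -v base="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", base / slow * 100 }'
}
```

With the figures from the post, `slowdown_pct 2.5 3.5` prints 40 and `slowdown_pct 2.5 4` prints 60, so each task takes roughly 40-60% longer once the slowdown sets in.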
Be honest with yourself. Is this a 'daily driver', or do you do anything else besides BOINC with it?
You may have some other program stealing memory and GPU usage. Have you run 'nvidia-smi'?
Also, did you buy the GPU in question used? Do you know how old it is? I had a couple of EVGA GPUs do that, and I ended up applying new thermal pads to them and they sped up again. Could be the same with yours.
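One way to follow up on the thermal and "something else is using the GPU" suggestions is to log a few readings over the days it takes for the slowdown to appear. The sketch below is an assumption about how that could be done, not something anyone in the thread ran. The function names and the log path are invented, and the field names should be checked against `nvidia-smi --help-query-gpu` for the installed driver (newer drivers may call the throttle field `clocks_event_reasons.active`).

```shell
# Sketch: print one CSV sample of the GPU state (no header, no units).
gpu_health_sample() {
    nvidia-smi \
        --query-gpu=timestamp,temperature.gpu,clocks.sm,clocks.mem,power.draw,utilization.gpu,clocks_throttle_reasons.active \
        --format=csv,noheader,nounits
}

# Sketch: append a sample to a log file every N seconds (default 60).
# The default log path is a made-up example.
# usage: gpu_health_log [interval_seconds] [logfile]
gpu_health_log() {
    interval="${1:-60}"
    logfile="${2:-$HOME/gpu_health.csv}"
    while gpu_health_sample >> "$logfile"; do
        sleep "$interval"
    done
}
```

Running `gpu_health_log 60` in a spare terminal and comparing the rows from a fast day with the rows from a slow day should show whether temperature, clocks, or power draw change when the task times do.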
Keith Myers wrote: Tom, you never answered the question of whether you looked at the ECC errors on the VRAM when you are so heavily overclocking the memory.
The GPU will happily keep correcting errors, but since that takes several retries on every memory transfer, it causes a slowdown in performance, which is exactly what you are experiencing.
I am pretty sure I looked the last time you asked. I didn't see anything. Let me research how and add that to my permanent notes.
Tom M
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1143 G /usr/lib/xorg/Xorg 17MiB |
| 0 N/A N/A 1758 G /usr/lib/xorg/Xorg 70MiB |
| 0 N/A N/A 1886 G /usr/bin/gnome-shell 71MiB |
| 0 N/A N/A 2214 G /usr/bin/nvidia-settings 0MiB |
| 0 N/A N/A 2683 G /usr/lib/firefox/firefox 28MiB |
| 0 N/A N/A 4146 C ...-pc-linux-gnu-opencl_v1.0 1874MiB |
| 0 N/A N/A 4172 C ...-pc-linux-gnu-opencl_v1.0 1874MiB |
+-----------------------------------------------------------------------------+
tlgalenson@Ryzen-OneHorseShay:~$ nvidia-smi -g 0 --ecc-config=0
ECC features not supported for GPU 00000000:09:00.0.
Treating as warning and moving on.
All done.
tlgalenson@Ryzen-OneHorseShay:~$
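Note that `--ecc-config=0` asks the driver to change the ECC mode (0 turns it off) rather than to report error counts. A read-only check might look like the sketch below. The function names are invented for illustration. On a GeForce card the result is expected to be "not supported", in line with the message above, so this mostly confirms that `nvidia-smi` has no ECC counters to show for this GPU.

```shell
# Read-only ECC report for one GPU (does not change any setting).
# usage: ecc_report [gpu_index]
ecc_report() {
    nvidia-smi -i "${1:-0}" -q -d ECC
}

# Sketch: print "yes" if the driver reports an ECC mode for the GPU, "no" otherwise.
# Unsupported fields are typically printed as [N/A] (or [Not Supported] on older drivers).
# usage: ecc_supported [gpu_index]
ecc_supported() {
    mode=$(nvidia-smi -i "${1:-0}" --query-gpu=ecc.mode.current --format=csv,noheader)
    case "$mode" in
        *N/A*|*"Not Supported"*|"") echo "no" ;;
        *) echo "yes" ;;
    esac
}
```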
Is there any correlation of what WCG projects are running when the slowdown happens? What happens if you temporarily suspend WCG work when you notice the slowdown? If it speeds back up, then there could be a bottleneck because of the WCG work.
Also, the WCG Open Pandemics project has GPU tasks that get sent out sometimes (a lot of them today). Could those be running at some point?
GWGeorge007 wrote: Be honest with yourself. Is this a 'daily driver', or do you do anything else besides BOINC with it?
You may have some other program stealing memory and GPU usage. Have you run 'nvidia-smi'?
Also, did you buy the GPU in question used? Do you know how old it is? I had a couple of EVGA GPUs do that, and I ended up applying new thermal pads to them and they sped up again. Could be the same with yours.
The daily driver runs Windows, not Linux/Ubuntu. I do use the Firefox web browser on my BOINC boxes, but everything else is Linux utilities.
I got one RTX 3080 Ti FE in a swap from Ian&SteveC after buying the original one used.
I will jack up the GPU fan speeds and keep an eye on the thermal readings.
Boca Raton Community HS wrote: Is there any correlation of what WCG projects are running when the slowdown happens? What happens if you temporarily suspend WCG work when you notice the slowdown? If it speeds back up, then there could be a bottleneck because of the WCG work.
Also, the WCG Open Pandemics project has GPU tasks that get sent out sometimes (a lot of them today). Could those be running at some point?
Good question(s). I think I am running a CPU-only profile on WCG. I have not gotten GPU tasks from WCG in a very long time, even when I set a profile that only asks for them.
Presuming it slows down again in a couple of days, I will try "suspending" the WCG project and observe what happens to the GPU processing.
Tom M
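The suspend test can be done from the BOINC Manager, but for the record it can also be scripted. The sketch below assumes `boinccmd` is installed and that the client is attached to WCG under the URL shown. That URL is an assumption, and `boinccmd --get_project_status` lists the one actually in use. The function names are made up.

```shell
# Assumption: this is the project URL the client is attached to;
# `boinccmd --get_project_status` shows the real one.
WCG_URL="${WCG_URL:-https://www.worldcommunitygrid.org/}"

# Suspend or resume all WCG tasks on the local BOINC client.
wcg_suspend() { boinccmd --project "$WCG_URL" suspend; }
wcg_resume()  { boinccmd --project "$WCG_URL" resume; }
```

Depending on the install, `boinccmd` may need to be run from the BOINC data directory or be given `--passwd` so it can authenticate to the client. If the GPU task times drop back to normal a few tasks after `wcg_suspend`, that points at CPU contention rather than the memory overclock.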
Tom M wrote:
tlgalenson@Ryzen-OneHorseShay:~$ nvidia-smi -g 0 --ecc-config=0
ECC features not supported for GPU 00000000:09:00.0.
That’s not what Keith is referring to. He’s talking about memory errors related to a heavy memory overclock. I’m not sure how you check those errors in Linux, but I know you can see them in Windows.
Are you still overclocking the memory by 1500 or more? Maybe try reducing it to stock speeds to see if the slowdown behavior stops.
Slap Forehead.
I was memory overclocking at +1700. I am now memory overclocking at +900. That means the next test, if/when the processing slowdown occurs again, will be dropping the OC entirely.
If I can figure out the right Google search... or if Keith reminds me what the command line is that he had me test...
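For stepping the memory offset down between tests, something like the sketch below might work. It assumes the offset is applied through `nvidia-settings` in an X session with Coolbits enabled, which the thread does not confirm. If the overclock is set by another tool or script, this does not apply. The function name is invented.

```shell
# Sketch: set the memory transfer rate offset on one GPU.
# Assumes an X session with Coolbits enabled so nvidia-settings accepts clock offsets.
# usage: set_mem_offset <offset> [gpu_index]
set_mem_offset() {
    nvidia-settings -a "[gpu:${2:-0}]/GPUMemoryTransferRateOffsetAllPerformanceLevels=$1"
}
```

`set_mem_offset 900` would match the current setting and `set_mem_offset 0` returns the memory to stock for the next round of testing.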
Tom M wrote:
tlgalenson@Ryzen-OneHorseShay:~$ nvidia-smi
Fri Apr 28 20:48:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:09:00.0 On | N/A |
| 69% 70C P2 314W / 350W | 3953MiB / 12288MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1143 G /usr/lib/xorg/Xorg 17MiB |
| 0 N/A N/A 1758 G /usr/lib/xorg/Xorg 70MiB |
| 0 N/A N/A 1886 G /usr/bin/gnome-shell 71MiB |
| 0 N/A N/A 2214 G /usr/bin/nvidia-settings 0MiB |
| 0 N/A N/A 2683 G /usr/lib/firefox/firefox 28MiB |
| 0 N/A N/A 4146 C ...-pc-linux-gnu-opencl_v1.0 1874MiB |
| 0 N/A N/A 4172 C ...-pc-linux-gnu-opencl_v1.0 1874MiB |
+-----------------------------------------------------------------------------+
tlgalenson@Ryzen-OneHorseShay:~$ nvidia-smi -g 0 --ecc-config=0
ECC features not supported for GPU 00000000:09:00.0.
Treating as warning and moving on.
All done.
tlgalenson@Ryzen-OneHorseShay:~$