I have a pair of RTX 3080 Ti Founders Edition GPUs.
The one that is not driving the monitor is drawing 100+ watts less power than the other.
OC doesn't seem to make a difference, and I am seeing wider variability in processing times than I used to: some tasks take 156 s, others 163 s.
Any ideas?
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
need more information.
are they both power limited to the same value? what value?
when you say one is "100W more" than the other, what specific power values are you seeing, and how are they being measured?
what clocks is each card running? please give the clocks together with the power draw for each card.
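One quick way to answer the power-limit question is to ask nvidia-smi for each card's configured and enforced limit next to its current draw. A minimal Python sketch, assuming nvidia-smi is on the PATH and that your driver supports the `power.limit` and `enforced.power.limit` query fields (`nvidia-smi --help-query-gpu` lists what yours supports); the function names are made up for illustration.

```python
import subprocess

# Fields to query; check `nvidia-smi --help-query-gpu` on your driver.
FIELDS = "index,pci.bus_id,power.limit,enforced.power.limit,power.draw"


def query_power():
    """Run nvidia-smi once and return its raw CSV output (no header, no units)."""
    return subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_power_rows(text):
    """Turn the CSV lines into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        index, bus_id, limit, enforced, draw = [f.strip() for f in line.split(",")]
        rows.append({
            "index": int(index),
            "bus_id": bus_id,
            "limit_w": float(limit),
            "enforced_w": float(enforced),
            "draw_w": float(draw),
        })
    return rows


def limits_match(rows, tolerance_w=1.0):
    """True if every card enforces (roughly) the same power limit."""
    enforced = [r["enforced_w"] for r in rows]
    return max(enforced) - min(enforced) <= tolerance_w
```

For example, `limits_match(parse_power_rows(query_power()))` returns True when both cards enforce the same limit. A card that reports `[N/A]` for one of these fields would need extra handling, which this sketch leaves out.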
Tried to post a reply before I did a cold boot. Apparently I didn't save the message.
After the cold boot both GPUs were drawing close to 400 watts on a power limit (PL) of 400.
I have just OCed the memory transfer rate back up to +900.
Just ran nvidia-smi again and one GPU is drawing 100+ watts less than the other.
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1212      G   /usr/lib/xorg/Xorg                  5MiB |
|    0   N/A  N/A      1752      G   /usr/lib/xorg/Xorg                  6MiB |
|    0   N/A  N/A      3354      C   ...-pc-linux-gnu-opencl_v1.0     1034MiB |
|    0   N/A  N/A      3416      G   /usr/bin/nvidia-settings            0MiB |
|    0   N/A  N/A      3426      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
|    1   N/A  N/A      1212      G   /usr/lib/xorg/Xorg                 17MiB |
|    1   N/A  N/A      1752      G   /usr/lib/xorg/Xorg                 60MiB |
|    1   N/A  N/A      1956      G   /usr/bin/gnome-shell               79MiB |
|    1   N/A  N/A      2942      G   /usr/lib/firefox/firefox          119MiB |
|    1   N/A  N/A      3370      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
|    1   N/A  N/A      3396      C   ...-pc-linux-gnu-opencl_v1.0     1874MiB |
|    1   N/A  N/A      3416      G   /usr/bin/nvidia-settings            0MiB |
+-----------------------------------------------------------------------------+
tommiller@Ryzen-Charon:~$
That does not tell me what clock speeds the cards are running at.
but I do see that the lower-power GPU is only at 89% GPU utilization, while the full-power card is at 99%.
run this command instead and post the output. run it a few times so the output is representative of the cards running at steady state (not an outlier reading, like one taken at the exact moment a task stopped). this is all one command on a single line, not multiple lines/commands:
nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
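Running the query several times by hand works; a small script can also take the samples and report the median per card, which smooths over readings taken while a task is starting or finishing. A Python sketch, assuming nvidia-smi is on the PATH; the field list is the one from the command above, while the sample count, interval, and function names are arbitrary choices for illustration. It parses the default `--format=csv` output (header row plus units), so it can also read output pasted from a terminal.

```python
import csv
import io
import statistics
import subprocess
import time

QUERY = ("name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,"
         "clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,"
         "pcie.link.width.current")


def _to_number(value):
    """Drop a trailing unit such as ' %', ' MHz', ' W' or ' MiB' and convert."""
    head = value.strip().split(" ")[0]
    try:
        return float(head)
    except ValueError:
        return value.strip()  # non-numeric fields (name, bus id) stay strings


def parse_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv` output into dicts."""
    reader = csv.reader(io.StringIO(text.strip()), skipinitialspace=True)
    header = [h.split(" [")[0].strip() for h in next(reader)]  # strip "[MHz]" etc.
    return [dict(zip(header, (_to_number(v) for v in row))) for row in reader if row]


def sample(n=5, interval_s=10.0):
    """Take n readings, interval_s seconds apart; return {bus_id: [row, ...]}."""
    readings = {}
    for i in range(n):
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv"],
            capture_output=True, text=True, check=True,
        ).stdout
        for row in parse_csv(out):
            readings.setdefault(row["pci.bus_id"], []).append(row)
        if i < n - 1:
            time.sleep(interval_s)
    return readings


def summarize(readings):
    """Median SM clock and power draw per card."""
    return {
        bus: {
            "sm_mhz": statistics.median(r["clocks.current.sm"] for r in rows),
            "power_w": statistics.median(r["power.draw"] for r in rows),
        }
        for bus, rows in readings.items()
    }
```

For example, `summarize(sample(n=6, interval_s=10))` takes six readings ten seconds apart and returns one median clock and power figure per PCI bus ID.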
It looks like it was a connectivity issue. I got an error saying nvidia-smi could not talk to one of the cards.
So I blew the cards out, took off the problem child, did the Keith M cleaning routine, and re-seated it.
After I booted and started it up, the OTHER GPU quit completely (lights out).
Wiggled that card's power wiring and the card itself.
Booted again, and both were drawing right around 400 watts.
Nope. The card that also has a label on its HDMI port saying it doesn't work has dropped back to a lower draw.
Going to run the requested command-line diagnostic next.
Test results follow:
tommiller@Ryzen-Charon:~$ nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], clocks.current.memory [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 3080 Ti, 00000000:08:00.0, 100 %, 93 %, 1530 MHz, 9801 MHz, 265.14 W, 3767 MiB, 4, 8
NVIDIA GeForce RTX 3080 Ti, 00000000:09:00.0, 90 %, 97 %, 1875 MHz, 9801 MHz, 364.05 W, 2885 MiB, 4, 8
tommiller@Ryzen-Charon:~$ nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], clocks.current.memory [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 3080 Ti, 00000000:08:00.0, 99 %, 93 %, 1530 MHz, 9801 MHz, 262.40 W, 3767 MiB, 4, 8
NVIDIA GeForce RTX 3080 Ti, 00000000:09:00.0, 100 %, 100 %, 1875 MHz, 9801 MHz, 372.43 W, 4053 MiB, 4, 8
tommiller@Ryzen-Charon:~$ nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,clocks.current.memory,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv
name, pci.bus_id, utilization.gpu [%], utilization.memory [%], clocks.current.sm [MHz], clocks.current.memory [MHz], power.draw [W], memory.used [MiB], pcie.link.gen.current, pcie.link.width.current
NVIDIA GeForce RTX 3080 Ti, 00000000:08:00.0, 100 %, 93 %, 1530 MHz, 9801 MHz, 264.97 W, 3767 MiB, 4, 8
NVIDIA GeForce RTX 3080 Ti, 00000000:09:00.0, 100 %, 100 %, 1875 MHz, 9801 MHz, 373.02 W, 4056 MiB, 4, 8
tommiller@Ryzen-Charon:~$
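Averaging the three readings above makes the gap concrete: roughly 264 W at 1530 MHz on 08:00.0 against roughly 370 W at 1875 MHz on 09:00.0. This is plain arithmetic on the posted numbers, not a diagnosis.

```python
from statistics import mean

# (SM clock in MHz, power draw in W) from each of the three nvidia-smi runs above
readings = {
    "00000000:08:00.0": [(1530, 265.14), (1530, 262.40), (1530, 264.97)],
    "00000000:09:00.0": [(1875, 364.05), (1875, 372.43), (1875, 373.02)],
}

avg_power = {bus: mean(p for _, p in rows) for bus, rows in readings.items()}
avg_clock = {bus: mean(c for c, _ in rows) for bus, rows in readings.items()}

slow, fast = "00000000:08:00.0", "00000000:09:00.0"
power_gap_w = avg_power[fast] - avg_power[slow]
clock_ratio = avg_clock[slow] / avg_clock[fast]

print(f"{slow}: {avg_clock[slow]:.0f} MHz, {avg_power[slow]:.1f} W")
print(f"{fast}: {avg_clock[fast]:.0f} MHz, {avg_power[fast]:.1f} W")
print(f"power gap: {power_gap_w:.1f} W; slow card at {clock_ratio:.0%} of the fast card's SM clock")
```

The gap works out to about 106 W, with the 08:00.0 card running at about 82% of the other card's SM clock, which matches the "100+ watts" observation.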
If nothing else, I am going to swap the cards between slots and see whether the problem follows the card or stays with the slot.
===edit=== Even though the temperature readout doesn't show it hitting the temperature limit, the way it slows down as the temperature goes up makes me wonder.
Tom M
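If heat is the suspect, the driver can report directly why it is holding the clocks down. nvidia-smi's `clocks_throttle_reasons.active` query field returns a bitmask; the sketch below decodes it using the bit values of NVML's `nvmlClocksThrottleReason*` constants. Assumptions: the field name is accepted by your driver (newer drivers also expose these as `clocks_event_reasons`; check `nvidia-smi --help-query-gpu`), and the function names are hypothetical.

```python
import subprocess

# Bit values of NVML's nvmlClocksThrottleReason* constants (nvml.h).
THROTTLE_BITS = {
    0x001: "gpu_idle",
    0x002: "applications_clocks_setting",
    0x004: "sw_power_cap",
    0x008: "hw_slowdown",
    0x010: "sync_boost",
    0x020: "sw_thermal_slowdown",
    0x040: "hw_thermal_slowdown",
    0x080: "hw_power_brake_slowdown",
    0x100: "display_clock_setting",
}


def decode_reasons(mask_text):
    """Turn a value like '0x0000000000000004' into a list of reason names."""
    mask = int(mask_text.strip(), 16)
    return [name for bit, name in THROTTLE_BITS.items() if mask & bit]


def query_throttle():
    """Return (index, temperature in C, SM clock in MHz, reasons) for each GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu,clocks.current.sm,clocks_throttle_reasons.active",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    results = []
    for line in out.strip().splitlines():
        index, temp, sm, mask = [f.strip() for f in line.split(",")]
        results.append((int(index), int(temp), int(sm), decode_reasons(mask)))
    return results
```

If the 1530 MHz card lists one of the thermal reasons while the other card does not, that would support the temperature theory; if it lists none while the clock stays pinned, the cause is more likely elsewhere.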
Looks like GPU0 is locked at 1530 MHz. That’s probably why it’s drawing less power.
Two questions. Should I use some kind of reset? And why does it draw full power when it starts up, then apparently heat up and slow down?
I will look for command-line GPU reset options.
Tom M
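On the reset question: nvidia-smi has `-rgc` (`--reset-gpu-clocks`), which undoes a clock lock set with `-lgc`, and `-rac` (`--reset-applications-clocks`). A hedged Python sketch that runs both against one card and reports what the driver says. It needs root, either command may be reported as unsupported on a GeForce card, and whether a lock is what holds this card at 1530 MHz is an assumption the readings above do not prove. The helper names are hypothetical. Selecting the card by PCI bus ID with `-i` avoids any doubt about which card is "GPU0".

```python
import subprocess

RESET_FLAGS = ["-rgc", "-rac"]  # reset locked GPU clocks, reset application clocks


def reset_commands(gpu_id):
    """Build the nvidia-smi reset commands for one GPU (index, UUID or PCI bus ID)."""
    return [["nvidia-smi", "-i", str(gpu_id), flag] for flag in RESET_FLAGS]


def run_resets(gpu_id):
    """Run each reset; return (command, return code, combined output) per command."""
    results = []
    for cmd in reset_commands(gpu_id):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((" ".join(cmd), proc.returncode, (proc.stdout + proc.stderr).strip()))
    return results
```

For example, run `run_resets("00000000:08:00.0")` as root, then repeat the clock and power query to see whether the SM clock moves off 1530 MHz.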
What is the PowerMizer performance setting for this GPU in Nvidia Settings: “Auto”, “Adaptive”, or “Prefer Maximum Performance”?
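To check or change that setting from a terminal, nvidia-settings exposes it as the `GPUPowerMizerMode` attribute. A Python sketch, assuming a running X session (nvidia-settings needs a display) and the commonly cited mapping 0 = Adaptive, 1 = Prefer Maximum Performance, 2 = Auto; verify it on your driver by changing the setting in the GUI and re-running the query. `[gpu:0]` in nvidia-settings is not guaranteed to be the card nvidia-smi calls GPU 0, and the function names are hypothetical.

```python
import subprocess

# Assumed GPUPowerMizerMode values, as labelled in the nvidia-settings GUI.
POWERMIZER_MODES = {0: "Adaptive", 1: "Prefer Maximum Performance", 2: "Auto"}


def mode_name(value):
    """Map a raw attribute value (str or int) to its GUI label."""
    return POWERMIZER_MODES.get(int(value), f"unknown ({value})")


def query_mode(gpu=0):
    """Read the current mode for one GPU using terse (-t) query output."""
    out = subprocess.run(
        ["nvidia-settings", "-t", "-q", f"[gpu:{gpu}]/GPUPowerMizerMode"],
        capture_output=True, text=True, check=True,
    ).stdout
    return mode_name(out.strip())


def set_max_performance(gpu=0):
    """Assign mode 1 (Prefer Maximum Performance, under the mapping above)."""
    subprocess.run(
        ["nvidia-settings", "-a", f"[gpu:{gpu}]/GPUPowerMizerMode=1"],
        check=True,
    )
```

Comparing `query_mode(0)` and `query_mode(1)` shows whether the two cards are on different settings.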