The benefit of lower gpu utilization is lower power/heat.
That's a very inefficient way to throttle power usage, because you're still running the GPU in "full throttle" mode (it doesn't know you want it throttled, and assumes you want the task finished as quickly as possible). And the calculation breaks down when the pauses where the single WU has to wait for something are too short for the GPU to power down.
If you load your GPU higher with multiple WUs, but compensate by reducing the power target (this should also be possible for AMDs), it will reduce clock speed and voltage. The latter gives you better energy efficiency.
This is independent of "single vs. triple GPU". And I wrote it because I think your config with just 1 WU per GPU exaggerates the PCIe differences (due to the worse load balancing / micro pauses already mentioned).
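For anyone who hasn't set this up: running multiple WUs per GPU is done with an app_config.xml in the project directory. A minimal sketch is below — the app name is my assumption (the Perseus Arm search is the BRP5 app, I believe); check the `<app>` entries in client_state.xml for the exact name on your host.

```xml
<!-- app_config.xml, placed in the project directory, e.g. projects/einstein.phys.uwm.edu/ -->
<!-- app name is an assumption: verify it against client_state.xml -->
<app_config>
  <app>
    <name>einsteinbinary_BRP5</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>  <!-- 0.5 = two WUs share one GPU; use 0.33 for three -->
      <cpu_usage>0.5</cpu_usage>  <!-- CPU fraction reserved per WU -->
    </gpu_versions>
  </app>
</app_config>
```

The client picks the file up on a "Read config files" command or a restart.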
BTW: the top hosts with 2 Tahitis (smaller than your Hawaiis) achieve about 120k RAC per GPU, using i7 4770K as hosts, i.e. with 8x PCIe 3 or with a PLX.
MrS
How does one achieve those outputs for the AMD cards? I just bought a Gigabyte 7970 OC; the GPU clock is 1000 MHz. The host is a 2500K running at 4.3 GHz. The slot is 16x PCIe 2. I am running 2 Perseus Arm WUs and each finishes in 90 minutes. Is there anything I can do to bring up the output from 87k (calculated based on elapsed time and credit)? I ran Arecibo tasks before and was getting even lower RAC (78k). Is PCIe 2 limiting me, or running Windows? I rolled back the latest driver to an earlier version.
http://einsteinathome.org/host/11685226
If the goal was efficiency, I don't think I would have got the most power hungry gpus around. Heat is somewhat important as too much causes failures. But let's look at some numbers:
with one wu my system is using 540w, the 295x2 runs at 52c and the 290x runs at 68c
with two wu the system uses 585w, the 295x2 runs at 53c and the 290x runs at 70c. the power increase isn't very large, and some of that would be due to the cpu working about 20% harder to feed double the gpu wus.
so i'm not one to tell anyone how to run their stuff, but this is my first time running a video card cooled by a fish pump and reading horror stories about pump failures, leaks, and automatic throttling down due to overheating has led me to be just a little bit conservative considering the coin i just dropped. none of the top computers run more than two gpus on a 16 lane platform so let's just say i was trying to see how much blood i could squeeze from a rock. my 295x2 only has 85% gpu utilization with two wus so it could be pushed harder as 53c is not a lot of heat.
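For what it's worth, the wattages above make the efficiency point from earlier concrete. A quick energy-per-WU comparison — the 1.9x throughput factor is my assumption (running 2 WUs won't quite double output, since the GPUs weren't fully loaded either way):

```python
# system wall power (W) from the post above; throughput in WUs completed per hour
def joules_per_wu(watts, wus_per_hour):
    """Energy cost of one WU at a given completion rate."""
    return watts * 3600 / wus_per_hour

# hypothetical rates: 1.0 WU/h at 1-up, ~1.9 WU/h at 2-up (assumed, not measured)
one_up = joules_per_wu(540, 1.0)
two_up = joules_per_wu(585, 1.9)
print(f"energy per WU drops by about {1 - two_up / one_up:.0%}")  # → about 43%
```

So even with conservative assumptions, the 45 W increase buys a large drop in energy spent per result.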
my lazy math says you should be getting more than 100k on that. pcie2 is slower than pcie3 but you could try going to 3 or 4 wu
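The lazy math can be written out. Using the numbers from the question above (2 concurrent WUs, 90 minutes each), the per-WU credit is back-calculated from the reported 87k/day — treat it as illustrative, not the official credit value:

```python
def daily_credit(concurrent_wus, minutes_per_wu, credit_per_wu):
    """Estimated credit/day: a batch of concurrent_wus finishes every minutes_per_wu."""
    batches_per_day = 24 * 60 / minutes_per_wu
    return batches_per_day * concurrent_wus * credit_per_wu

# per-WU credit implied by the reported 87k/day at 2x90min (assumption)
credit = 87_000 / daily_credit(2, 90, 1)
print(round(credit))  # → 2719

# going 2->3 WUs only pays off if runtime grows by less than 50%;
# 3 WUs at 135 min each is exactly break-even with 2 at 90 min
print(daily_credit(3, 135, credit))
```

So whether 3 or 4 at a time helps depends entirely on how much the per-WU runtime stretches.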
I will try that now. My version of the card is a terrible overclocker. After raising the mem clock to 1025 and the gpu clock to 1050, it spit out nothing but 0-second workunits that failed with a compute error. My Nvidia cards behave differently: they crunch the whole WU and then come up with errors.
If it were me, I wouldn't worry so much about trying to overclock. Some will even downclock to stock if it reduces errors. I would just use GPU-Z to check average temperature and GPU usage.
so on my 290x it was 81% gpu usage on one wu and 97% on two wu so going to three wu might not help much more
but on my 295x2 which is bottlenecked more on the pcie it's 75% on one wu and 85% on two wu so going to three or four wu might help, but at the same time i only have four cpu cores so i have to keep an eye on that too
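A rough way to guess whether another concurrent WU is worth it, sketched from the utilization numbers above — this assumes throughput scales roughly with GPU usage, which ignores PCIe and CPU feeding effects, so treat the results as upper bounds:

```python
def est_gain(util_now, util_ceiling=0.97):
    """Upper-bound throughput gain if an extra WU pushed GPU usage to util_ceiling."""
    return util_ceiling / util_now - 1

print(f"290x at 97% with 2 WUs: at most ~{est_gain(0.97):.0%} from a 3rd")   # → ~0%
print(f"295x2 at 85% with 2 WUs: at most ~{est_gain(0.85):.0%} from a 3rd")  # → ~14%
```

The 0.97 ceiling is an assumption (cards rarely report a sustained 100%); the point is just that the 295x2 has headroom and the 290x basically doesn't.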
I am going to go with 3 Perseus Arm WUs at a time. Because of the performance penalty of PCIe 2, the GPU was only loaded to 84% with 2 WUs. This increased to 93% with 3, and gave me 9% more calculated RAC. That's not a lot, but it puts me right at 100k RAC. I have two CPU processes running; reducing to 1 did not seem to have any effect on the times.
I'm going to run just one wu at a time, because for some reason I have a lot of invalids and I never had that problem before. Or maybe the driver is the problem.
Have you tried turning the clock rate down? (core or memory or both?)
I noticed that on both your hosts the majority of the Perseus jobs listed in the task list with "validate error" show on the task page outcome: Validate error (58:00111010)
That specific outcome first showed up on my GTX 970 during core clock overclocking experiments today, never having shown up at all in months of operation of five cards on three hosts. Of course the similarity could be a coincidence, but turning down the clock(s) would be quickly diagnostic.
Even if you think yourself not overclocked this might be worth a try. I currently have two of my five cards (both are GTX 660s, as it happens) slightly underclocked.
Using Nvidia cards in the past, I've always been able to use Precision X to change clocks, but Catalyst isn't allowing my changes to stick, so I will stay with one WU per GPU to see if the invalids go away.