Hmmm, let me ask a question about 'How to interpret benchmarks':
currently my i3 runs with a borrowed GTX550Ti (192 shaders @ 900MHz).
The benchmark you're referring to shows the GTX650 (which is currently my favorite for a final solution) ~20% faster than the GTX550Ti, but the new card has twice as many shaders running at higher speed and it is a new technology. All in all the GTX650 should be nearly twice as fast ...
Actually, the 550's shaders are clocked substantially higher than the 650's. Compute on an nVidia card happens on the part of the chip called the shaders; in the 5xx and earlier series of cards the shaders ran twice as fast as the rest of the chip. The stock speed for the 550Ti's shaders is 1800 MHz (vs. 900 MHz for the 'core' clock, which everything else on the chip ran at). For the new Kepler architecture (used in some 640 and all 650 or higher cards), nVidia lowered the shader clock to match the core clock but added a lot more shaders. As a result the performance gap between the two cards is much smaller than the shader count alone suggests; just looking at shader clocks and counts you'd expect a 17% speedup, so the 20% you're seeing is about right.
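For anyone who wants to redo that estimate, here is a rough sketch in Python. The 550Ti figures (192 shaders at 1800 MHz) come from the posts above; the GTX650 figures (384 shaders at 1058 MHz) are an assumption based on the reference spec, so swap in the numbers for your own card. It only multiplies shader count by shader clock and ignores memory bandwidth and architectural differences.

```python
def shader_throughput(shaders: int, shader_clock_mhz: float) -> float:
    """Crude throughput proxy: shader count times shader clock."""
    return shaders * shader_clock_mhz


# GTX550Ti: 192 shaders, shader clock 1800 MHz (twice the 900 MHz core clock).
gtx550ti = shader_throughput(192, 1800.0)

# GTX650: ASSUMED reference values (384 shaders at 1058 MHz, shader clock = core clock).
gtx650 = shader_throughput(384, 1058.0)

speedup = gtx650 / gtx550ti
print(f"Estimated GTX650 vs GTX550Ti speedup: {(speedup - 1) * 100:.1f}%")
```

With those assumed numbers the ratio comes out at a bit under 1.18, which is in line with the roughly 17% quoted above.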
WOW, what an explanation! THX for your effort!
So the new calculation is: ~40% less power consumption @ 20% higher speed for the same money.
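Put as performance per watt, that calculation can be sketched in a few lines of Python. Both inputs are the approximate figures from the post (about 20% faster, about 40% less power); nothing here is measured.

```python
# Relative figures for the GTX650 against the GTX550Ti, taken from the post above.
relative_speed = 1.20  # ~20% faster
relative_power = 0.60  # ~40% less power consumption

# Work done per unit of energy scales as speed divided by power.
perf_per_watt_gain = relative_speed / relative_power
print(f"Performance per watt: about {perf_per_watt_gain:.1f}x the 550Ti")
```

So on those rough numbers the newer card does about twice the work per watt.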
Thank you for the explanation.
One of my 560 Ti cards has passed on to electronic heaven, so I'm trying to figure out which is better to buy: a new 660 Ti or another 560 Ti. The 560 Ti is much cheaper here, and if the performance of the 660 Ti is almost the same, it's not easy to decide.
Sat 15 Sep 2012 10:21:25 WST | | NVIDIA GPU 0: GeForce 9800 GTX+ (driver version unknown, CUDA version 5.0, compute capability 1.1, 512MB, 134214097MB available, 710 GFLOPS peak)
Sat 15 Sep 2012 10:21:25 WST | | OpenCL: NVIDIA GPU 0: GeForce 9800 GTX+ (driver version 304.43, device version OpenCL 1.0 CUDA, 512MB, 134214097MB available)
1 x 4500 - sec (Linux 64)
Sat 15 Sep 2012 10:22:22 WST NVIDIA GPU 0: GeForce GTX 260 (driver version unknown, CUDA version 5000, compute capability 1.3, 896MB, 586 GFLOPS peak)
Nvidia driver 304.43
1 x 3540 - sec (Linux 64)
As one would expect, the GTX260 is faster than the 9800GTX+, although the peak
GFLOPS figure is higher on the 9800GTX+ than on the GTX260.
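To put numbers on that mismatch, here is a small Python sketch that compares the peak GFLOPS ratio with the observed task-time ratio. All four figures come from the BOINC log lines and run times posted above; the dictionary layout and names are just for illustration.

```python
# Peak GFLOPS as reported by BOINC and observed seconds per task (1 task at a time).
cards = {
    "9800 GTX+": {"peak_gflops": 710.0, "task_seconds": 4500.0},
    "GTX 260": {"peak_gflops": 586.0, "task_seconds": 3540.0},
}

old, new = cards["9800 GTX+"], cards["GTX 260"]

# Below 1.0 means the GTX 260 has the lower peak rating on paper.
gflops_ratio = new["peak_gflops"] / old["peak_gflops"]

# Above 1.0 means the GTX 260 finishes a task faster in practice.
observed_speedup = old["task_seconds"] / new["task_seconds"]

print(f"Peak GFLOPS ratio (260 / 9800): {gflops_ratio:.2f}")
print(f"Observed speedup  (260 / 9800): {observed_speedup:.2f}")
```

The paper rating puts the GTX260 at roughly 0.83 of the 9800GTX+, while the run times make it roughly 1.27 times as fast, so peak GFLOPS alone is a poor predictor for this application.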
(I retired my 9800GTX+ when it started to sound like a vacuum cleaner, made errors and got way too hot).
(An HD5870 has 5440 GFLOPS single precision; I don't know the double-precision GFLOPS value.)
(A GTX470: CUDA 4000, 448 CUDA cores / 14 compute units, compute capability 2.0, driver 280.12, 1025 GFLOPS.)
(Info provided by BOINC 7.0.28 x86 & x64).
On a GTX480 running 2 WUs at a time: ~2,117.00 run time, 437.39 CPU time, credit 500.00, Binary Radio Pulsar Search (Arecibo) v1.28 (BRP4cuda32).
HD5870, also 2 per GPU: 3,683.10 run time, 273.05 CPU time, credit 500.00, Binary Radio Pulsar Search (Arecibo) v1.28 (opencl-ati). A second task (20 Sep 2012 8:57:51 UTC, 28 Sep 2012 7:31:02 UTC, completed and validated): 2,986.69 run time, 266.17 CPU time, credit 500.00, Binary Radio Pulsar Search (Arecibo) v1.28 (opencl-ati).
(CPU=i7-2600 @3.53GHz. 8GByte 1652MHz DDR3).
I have some numbers, but as usual making sense of them is non-trivial.
I'm starting to look at 3 systems and 4 nVidia cards. The idea is that with some mix and match I can hopefully separate the aggravating and mitigating factors. I'm just getting started with that so comments, criticisms, and suggestions are more than welcome.
[pre]
                | George                | Jack              |
================|=======================|===================|
mobo            | ASUS P8Z68-V PRO      | Intel DZ77RD-75K  |
cpu             | I7 - 2600K 3.4 GHz    | I7-3770K          |
slot            | PCIe3 8-bit           | PCIe4 16 bit      |
gpu             | GTX 560               | GTX 550ti         |
OS              | Sci Linux 6.3         | Sci Linux 6.3     |
nVidia driver   | 304.43                | 295.71            |
bandwidth test  | 5518/5284             | 6761/5834         |
3 task time     | 2307 s                | 4841 s            |
3 tasks/day     | 112.3                 | 52.5              |
2 task time     | 2232 s                | 2975 s            |
2 tasks/day     | 77.4                  | 58.1              |
----------------|-----------------------|-------------------|
                | Holly                 | Jack              |
================|=======================|===================|
mobo            | GByte GA-Z68MA-D2h-B3 | Intel DZ77RD-75K  |
cpu             | I7 - 2600K 3.4 GHz    | I7-3770K          |
slot            | PCIe3 16-bit          | PCIe4 16 bit      |
gpu             | GTX 560ti             | GTX 670           |
OS              | Ubuntu 12.04 lts      | Sci Linux 6.3     |
nVidia driver   | 296.40                | 295.71            |
bandwidth test  | 6102/5240             | 6761/5834         |
3 task time     | 2649 s                | 2866 s            |
3 tasks/day     | 97.8                  | 90.8              |
----------------|-----------------------|-------------------|
[/pre]
As I said, I'm trying to isolate the different factors.
The bandwidth test is a sample program from the CUDA SDK that measures memory transfer rates from host to GPU and from GPU to host, in MB/s.
The n task time is the time to run one task when running n at a time. This is the mean of 10 completed tasks.
The n tasks/day is how many tasks get completed in a 24 hr period running n at a time.
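For reference, the relation between the two rows can be sketched in Python as below. This assumes tasks/day is simply derived from the mean task time and the number of tasks running at once; the function name is made up for illustration, and the inputs are values from the table.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tasks_per_day(task_seconds: float, concurrent: int) -> float:
    """Tasks completed in 24 h when `concurrent` tasks run at once,
    each taking `task_seconds` on average (ASSUMED relation)."""
    return concurrent * SECONDS_PER_DAY / task_seconds


# Mean task times from the table, in seconds.
george_3 = tasks_per_day(2307, 3)
george_2 = tasks_per_day(2232, 2)
holly_3 = tasks_per_day(2649, 3)

print(f"George, 3 at a time: {george_3:.1f} tasks/day")
print(f"George, 2 at a time: {george_2:.1f} tasks/day")
print(f"Holly,  3 at a time: {holly_3:.1f} tasks/day")
```

This reproduces most of the tasks/day entries to within rounding. For Jack's 4841 s and 2866 s rows it gives about 53.5 and 90.4 rather than the listed 52.5 and 90.8, so those two may have been measured or rounded differently.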
ASUS (stupidly) put the 16 bit PCIe 3 slot on that motherboard at the edge of the board, so you can't put a double-width card in it because of the edge connectors.
I'm still trying to make sense of these numbers so I won't offer any conclusions. I will continue to mix and match when I get frustrated enough by real work to want to swap hardware around and deal with the resulting OS weirdness.
I have no theory on why the system with the slowest CPU-to-GPU bandwidth and a mid-range GPU (560) is outperforming the 560 ti and 670 in systems with higher bandwidth. Must be magic.
Joe
Oh, one more thing (I've exceeded the time I can edit the above post).
I collected a lot of that info as I wrote the post and just noticed Jack was running a different version of the nVidia drivers than George.
I use the drivers from the OS distribution, not the ones from the nVidia site, because of the nasty issues I had with Ubuntu when mixing and matching them.
I updated Jack and we'll see if that makes a difference.
Joe
Well I figured out the reason the 560 seemed to do so well.
I screwed up. No surprise there.
It was in a different location and was only doing 2 GPU tasks not 3 like I thought.
I can't edit that table but will make a new one soon.
Sorry for posting bad info.
I can add results from crunching on a gt430:
4860 - 4960 sec, one wu, one (hyperthreading)core free on i3 @3.2GHz, pcie x16 slot. win7-64, driver 301.42
any update on this? i just went from 301.42 to 306.23 on my experimental box, but i'm only able to test DistrRTgen for changes in efficiency/run times right now. i won't have a chance to test Einstein@Home anytime soon...
I did d/l the new drivers on both the 660Ti and the 550Ti and didn't notice a difference either way running CUDA tasks here.