CUDA and openCL Benchmarks

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 507064930
RAC: 87789

RE: RE: Hmmm, let me ask

Quote:
Quote:
Hmmm, let me ask a question about 'How to interpret benchmarks':
currently my i3 runs with a borrowed GTX550Ti (192 shaders @ 900MHz).
The benchmark you're referring to shows the GTX650 (which is currently my favorite for a final solution) ~20% faster than the GTX550Ti, but the new card has twice as many shaders running at higher speed and it is a new technology. All in all the GTX650 should be nearly twice as fast ...

Actually the 550's shaders are clocked substantially faster than the 650's. Compute on an nVidia card occurs on the part of the chip called a shader; in the 5xx and earlier series of cards, the shaders ran twice as fast as the rest of the chip. The stock speed for the 550 Ti's shaders is 1800 MHz (vs 900 MHz for the 'core' clock, which everything else on the chip ran at). For the new Kepler architecture (used in some 640 and all 650 or higher cards), nVidia lowered the shader clock to be the same as the core clock but added a lot more shaders. As a result, the performance gap between the two cards is much smaller; just looking at shader clocks and counts, you'd expect a 17% speedup, so the 20% you're seeing is about right.
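The 17% estimate can be reproduced by multiplying shader count by shader clock. The GTX 650 figures used below (384 shaders at 1058 MHz) are stock specifications assumed for this sketch, not numbers from the thread, and the peak GFLOPS helper further assumes 2 FLOPs (one fused multiply-add) per shader per clock. It ignores any per-clock efficiency differences between the Fermi and Kepler designs.

```python
# Rough shader-throughput comparison: shader count x shader clock.
# Assumed stock specs (not from the thread):
#   GTX 550 Ti: 192 shaders @ 1800 MHz (shader clock = 2x the 900 MHz core clock)
#   GTX 650:    384 shaders @ 1058 MHz (Kepler: shader clock = core clock)

def shader_throughput(shaders: int, shader_clock_mhz: float) -> float:
    """Total shader cycles per microsecond across all shaders (relative units)."""
    return shaders * shader_clock_mhz


def peak_gflops(shaders: int, shader_clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Single-precision peak, assuming one FMA (2 FLOPs) per shader per clock."""
    return shader_throughput(shaders, shader_clock_mhz) * flops_per_clock / 1000.0


gtx550ti = shader_throughput(192, 1800)
gtx650 = shader_throughput(384, 1058)
speedup = gtx650 / gtx550ti

print(f"GTX 550 Ti peak: {peak_gflops(192, 1800):.1f} GFLOPS")
print(f"GTX 650 peak:    {peak_gflops(384, 1058):.1f} GFLOPS")
print(f"Expected GTX 650 speedup: {(speedup - 1) * 100:.1f}%")
```

Twice the shaders at a bit more than half the clock is why the doubled shader count buys only a modest gain.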

WOW, what an explanation! THX for your effort!
So the new calculation is: ~40% less power consumption @ 20% higher speed for the same money.
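Taken at face value, the two approximate ratios in that line combine into a performance-per-watt figure. This sketch simply treats "~20% faster" and "~40% less power" as exact, which they are not.

```python
# Combine the two approximate ratios from the post into performance per watt.
speed_ratio = 1.20        # new card's speed relative to the old one (~20% faster)
power_ratio = 1.0 - 0.40  # new card's power draw relative to the old one (~40% less)

perf_per_watt = speed_ratio / power_ratio    # work done per joule, relative to the old card
energy_per_task = power_ratio / speed_ratio  # joules per task, relative to the old card

print(f"{perf_per_watt:.1f}x work per watt, {energy_per_task:.0%} of the energy per task")
```

In other words, roughly twice the work per unit of energy, or about half the energy per completed task.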

Sid
Joined: 17 Oct 10
Posts: 164
Credit: 970867431
RAC: 430084

RE: RE: RE: Hmmm, let

Quote:
Quote:
Quote:
Hmmm, let me ask a question about 'How to interpret benchmarks':
currently my i3 runs with a borrowed GTX550Ti (192 shaders @ 900MHz).
The benchmark you're referring to shows the GTX650 (which is currently my favorite for a final solution) ~20% faster than the GTX550Ti, but the new card has twice as many shaders running at higher speed and it is a new technology. All in all the GTX650 should be nearly twice as fast ...

Actually the 550's shaders are clocked substantially faster than the 650's. Compute on an nVidia card occurs on the part of the chip called a shader; in the 5xx and earlier series of cards, the shaders ran twice as fast as the rest of the chip. The stock speed for the 550 Ti's shaders is 1800 MHz (vs 900 MHz for the 'core' clock, which everything else on the chip ran at). For the new Kepler architecture (used in some 640 and all 650 or higher cards), nVidia lowered the shader clock to be the same as the core clock but added a lot more shaders. As a result, the performance gap between the two cards is much smaller; just looking at shader clocks and counts, you'd expect a 17% speedup, so the 20% you're seeing is about right.

WOW, what an explanation! THX for your effort!
So the new calculation is: ~40% less power consumption @ 20% higher speed for the same money.


Thank you for the explanation.
One of my 560 Tis has passed on to electronic heaven, so I'm trying to figure out which would be better to buy: a new 660 Ti or another 560 Ti. The 560 Ti is much cheaper here, and if the 660 Ti's performance is almost the same, it's not an easy decision.

RAMen
Joined: 18 Jan 09
Posts: 10
Credit: 13945382
RAC: 0

Sat 15 Sep 2012 10:21:25 WST

Sat 15 Sep 2012 10:21:25 WST | | NVIDIA GPU 0: GeForce 9800 GTX+ (driver version unknown, CUDA version 5.0, compute capability 1.1, 512MB, 134214097MB available, 710 GFLOPS peak)
Sat 15 Sep 2012 10:21:25 WST | | OpenCL: NVIDIA GPU 0: GeForce 9800 GTX+ (driver version 304.43, device version OpenCL 1.0 CUDA, 512MB, 134214097MB available)

1 task at a time: 4500 s (Linux 64-bit)

Sat 15 Sep 2012 10:22:22 WST NVIDIA GPU 0: GeForce GTX 260 (driver version unknown, CUDA version 5000, compute capability 1.3, 896MB, 586 GFLOPS peak)
Nvidia driver 304.43

1 task at a time: 3540 s (Linux 64-bit)

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22451438
RAC: 0

RE: Sat 15 Sep 2012

Quote:

Sat 15 Sep 2012 10:21:25 WST | | NVIDIA GPU 0: GeForce 9800 GTX+ (driver version unknown, CUDA version 5.0, compute capability 1.1, 512MB, 134214097MB available, 710 GFLOPS peak)
Sat 15 Sep 2012 10:21:25 WST | | OpenCL: NVIDIA GPU 0: GeForce 9800 GTX+ (driver version 304.43, device version OpenCL 1.0 CUDA, 512MB, 134214097MB available)

1 task at a time: 4500 s (Linux 64-bit)

Sat 15 Sep 2012 10:22:22 WST NVIDIA GPU 0: GeForce GTX 260 (driver version unknown, CUDA version 5000, compute capability 1.3, 896MB, 586 GFLOPS peak)
Nvidia driver 304.43

1 task at a time: 3540 s (Linux 64-bit)

As one would expect, the GTX260 is faster than the 9800GTX+, even though the reported peak GFLOPS figure is higher for the 9800GTX+ than for the GTX260.
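That gap can be put in numbers using the runtimes and BOINC-reported peak GFLOPS quoted above. This sketch assumes one task at a time and a GPU kept continuously busy; it does not try to explain why the peak figure misleads here.

```python
# Compare measured throughput (1 task at a time, runtimes quoted above)
# with the peak GFLOPS figure BOINC reports for each card.
SECONDS_PER_DAY = 86_400

cards = {
    # name: (runtime in seconds, BOINC-reported peak GFLOPS)
    "9800 GTX+": (4500, 710),
    "GTX 260": (3540, 586),
}

for name, (runtime_s, gflops) in cards.items():
    tasks_per_day = SECONDS_PER_DAY / runtime_s
    print(f"{name}: {tasks_per_day:.1f} tasks/day, "
          f"{tasks_per_day / gflops * 100:.2f} tasks/day per 100 peak GFLOPS")

# Shorter runtime = faster card, so the speed ratio is the inverse runtime ratio.
measured = cards["9800 GTX+"][0] / cards["GTX 260"][0]
reported = cards["GTX 260"][1] / cards["9800 GTX+"][1]
print(f"GTX 260 vs 9800 GTX+: {measured:.2f}x measured, {reported:.2f}x by peak GFLOPS")
```

The GTX 260 does roughly a quarter more work per second while its reported peak is lower, so peak GFLOPS alone is a poor predictor for this application.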
(I retired my 9800GTX+ when it started to sound like a vacuum cleaner, made errors and got way too hot).
(An HD5870 has 5440 GFLOPS single precision; I don't know the double-precision value.)
(A GTX470: CUDA 4000, 448 CUDA cores / 14 compute units, compute capability 2.0, driver 280.12, 1025 GFLOPS.)
(Info provided by BOINC 7.0.28 x86 & x64).

GTX480 running 2 WUs at a time: ~2,117.00 s run time, 437.39 s CPU time, 500.00 credits, Binary Radio Pulsar Search (Arecibo) v1.28 (BRP4cuda32).
HD5870, also 2 per GPU: 3,683.10 s run time, 273.05 s CPU time, 500.00 credits, Binary Radio Pulsar Search (Arecibo) v1.28 (opencl-ati). Another task (20 Sep 2012 8:57:51 UTC, 28 Sep 2012 7:31:02 UTC, completed and validated): 2,986.69 s run time, 266.17 s CPU time, 500.00 credits, Binary Radio Pulsar Search (Arecibo) v1.28 (opencl-ati).
(CPU: i7-2600 @ 3.53 GHz, 8 GB DDR3 at 1652 MHz.)
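Run times like these convert to credits per day per GPU once you account for running 2 tasks at a time. A minimal sketch, assuming the quoted run times are typical and the GPU is never idle between tasks:

```python
# Credits per day per GPU from one task's run time when n tasks run concurrently.
SECONDS_PER_DAY = 86_400


def credits_per_day(run_time_s: float, tasks_in_parallel: int, credit_per_task: float) -> float:
    """Each of the n concurrent tasks completes once every run_time_s seconds."""
    return tasks_in_parallel * credit_per_task * SECONDS_PER_DAY / run_time_s


gtx480 = credits_per_day(2117.00, 2, 500.0)
hd5870 = credits_per_day(3683.10, 2, 500.0)       # slower of the two HD5870 tasks
hd5870_fast = credits_per_day(2986.69, 2, 500.0)  # faster of the two HD5870 tasks

print(f"GTX480: {gtx480:.0f} credits/day")
print(f"HD5870: {hd5870:.0f} - {hd5870_fast:.0f} credits/day")
```

The same formula works for any card as long as the run time is measured at the concurrency level you actually use.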

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

I have some numbers but as

I have some numbers, but as usual making sense of them is non-trivial.

I'm starting to look at 3 systems and 4 nVidia cards. The idea is that with some mix and match I can hopefully separate the aggravating and mitigating factors. I'm just getting started with that so comments, criticisms, and suggestions are more than welcome.

[pre]
               | George                | Jack             |
===============|=======================|==================|
mobo           | ASUS P8Z68-V PRO      | Intel DZ77RD-75K |
cpu            | i7-2600K 3.4 GHz      | i7-3770K         |
slot           | PCIe3 x8              | PCIe3 x16        |
gpu            | GTX 560               | GTX 550 Ti       |
OS             | Sci Linux 6.3         | Sci Linux 6.3    |
nVidia driver  | 304.43                | 295.71           |
bandwidth test | 5518/5284             | 6761/5834        |
3 task time    | 2307 s                | 4841 s           |
3 tasks/day    | 112.3                 | 52.5             |
2 task time    | 2232 s                | 2975 s           |
2 tasks/day    | 77.4                  | 58.1             |
---------------------------------------------------------|
               | Holly                 | Jack             |
===============|=======================|==================|
mobo           | GByte GA-Z68MA-D2h-B3 | Intel DZ77RD-75K |
cpu            | i7-2600K 3.4 GHz      | i7-3770K         |
slot           | PCIe3 x16             | PCIe3 x16        |
gpu            | GTX 560 Ti            | GTX 670          |
OS             | Ubuntu 12.04 LTS      | Sci Linux 6.3    |
nVidia driver  | 296.40                | 295.71           |
bandwidth test | 6102/5240             | 6761/5834        |
3 task time    | 2649 s                | 2866 s           |
3 tasks/day    | 97.8                  | 90.8             |
---------------------------------------------------------|
[/pre]
As I said, I'm trying to isolate the different factors.

The bandwidth test is the bandwidthTest sample program from the CUDA SDK; it measures memory transfer rates between the host and the GPU, in both directions, in MB/s.
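One way to judge whether these bandwidth differences could matter is to convert them into transfer times. The buffer size below is a hypothetical value chosen for illustration; the thread does not say how much data a task moves, and the sketch simply takes the slower of each machine's two measured directions.

```python
# Time to copy a buffer at a measured host<->GPU bandwidth.
# BUFFER_MB is hypothetical (illustration only); the thread does not say
# how much data an Einstein@Home task actually transfers.
BUFFER_MB = 100.0


def transfer_ms(size_mb: float, bandwidth_mb_s: float) -> float:
    """Milliseconds to move size_mb megabytes at bandwidth_mb_s MB/s."""
    return size_mb / bandwidth_mb_s * 1000.0


# Bandwidth test results from the tables above (two directions per machine, MB/s).
results = {
    "George": (5518, 5284),
    "Holly": (6102, 5240),
    "Jack": (6761, 5834),
}

for host, rates in results.items():
    slowest = min(rates)
    print(f"{host}: {transfer_ms(BUFFER_MB, slowest):.1f} ms per {BUFFER_MB:.0f} MB "
          f"at {slowest} MB/s")
```

A 10-20% bandwidth gap only shows up in task times in proportion to how much of each task is spent on transfers, which is one reason the bandwidth column may not line up with the task times.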

The "n task time" is the time to complete one task when running n tasks at a time; it is the mean of 10 completed tasks.

The "n tasks/day" figure is how many tasks get completed in a 24-hour period when running n at a time.
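From those two definitions, tasks/day follows directly from the n task time: n tasks finish every "task time" seconds. A short sketch that recomputes a few table cells:

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(n_concurrent: int, task_time_s: float) -> float:
    """n tasks complete every task_time_s seconds when running n at a time."""
    return n_concurrent * SECONDS_PER_DAY / task_time_s


# Recompute some entries from the tables above.
print(f"George, 3 at a time: {tasks_per_day(3, 2307):.1f}")
print(f"George, 2 at a time: {tasks_per_day(2, 2232):.1f}")
print(f"Jack (550 Ti), 2 at a time: {tasks_per_day(2, 2975):.1f}")
print(f"Jack (550 Ti), 3 at a time: {tasks_per_day(3, 4841):.1f}")
```

Recomputing every cell this way is a quick consistency check: 3 tasks at 4841 s works out to about 53.5 tasks/day rather than the 52.5 in the table. It also shows the effect of the correction later in the thread: if George was really running 2 tasks at 2307 s, that cell would be about 74.9 tasks/day instead of 112.3.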

ASUS (stupidly) put the x16 PCIe 3 slot on this motherboard at the edge of the board, so you can't put a double-width card in it because of the edge connectors.

I'm still trying to make sense of these numbers so I won't offer any conclusions. I will continue to mix and match when I get frustrated enough by real work to want to swap hardware around and deal with the resulting OS weirdness.

I have no theory on why the system with the slowest CPU-to-GPU bandwidth and a mid-range GPU (560) is outperforming the 560 Ti and 670 in systems with higher bandwidth. Must be magic.

Joe

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

Oh one more thing (I've

Oh, one more thing (I've exceeded the time limit for editing the post above).

I collected a lot of that info as I wrote the post and just noticed Jack was running a different version of the nVidia drivers than George.

I use the drivers from the OS distribution, not the ones from the nVidia site, because of the nasty issues I had with Ubuntu when mixing and matching them.

I updated Jack and we'll see if that makes a difference.

Joe

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

Well I figured out the reason

Well I figured out the reason the 560 seemed to do so well.

I screwed up. No surprise there.

It was in a different location and was only doing 2 GPU tasks, not 3 as I thought.

I can't edit that table but will make a new one soon.

Sorry for posting bad info.

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 507064930
RAC: 87789

I can add results from

I can add results from crunching on a GT 430:
4860 - 4960 s, one WU at a time, one (hyperthreaded) core left free on an i3 @ 3.2 GHz, PCIe x16 slot, Win7 64-bit, driver 301.42.

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

RE: Just thought I would

Quote:

Just thought I would toss this in before I go to sleep.

New GeForce drivers that I will do later today for the GeForce GTX 550 Ti OC and GeForce GTX 660 Ti SC

Version 306.23 - WHQL Release Date Thu Sep 13, 2012



Any update on this? I just went from 301.42 to 306.23 on my experimental box, but I'm only able to test DistrRTgen for changes in efficiency/run times right now. I won't have a chance to test Einstein@Home anytime soon...

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1886
Credit: 1408644557
RAC: 1157450

RE: RE: Just thought I

Quote:
Quote:

Just thought I would toss this in before I go to sleep.

New GeForce drivers that I will do later today for the GeForce GTX 550 Ti OC and GeForce GTX 660 Ti SC

Version 306.23 - WHQL Release Date Thu Sep 13, 2012



Any update on this? I just went from 301.42 to 306.23 on my experimental box, but I'm only able to test DistrRTgen for changes in efficiency/run times right now. I won't have a chance to test Einstein@Home anytime soon...

I did download the new drivers for both the 660 Ti and the 550 Ti and didn't notice a difference either way running CUDA tasks here.
