CUDA and openCL Benchmarks

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

RE: RE: Could you please

Quote:
Quote:
Could you please try 6 WUs at a time?

Don't think this card will like 6 at a time, but I plan on increasing the number of parallel tasks over the next few days and will report back in due time.


sure you can. your GTX 660 Ti is a 2GB card, so you can run 6 WUs in parallel and not exceed the GPU's memory capacity. i can only run up to 3 tasks in parallel on my particular GTX 560 Ti, but Sid can run up to 6 in parallel b/c he has the 2GB version of my card.
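The arithmetic behind "3 on one card, 6 on the 2GB version" is just usable video memory divided by each task's footprint. A minimal Python sketch, assuming a hypothetical footprint of about 300 MB per BRP4 task (not a measured figure; check your own usage with a tool like GPU-Z) and assuming the smaller card is the standard 1GB GTX 560 Ti:

```python
def max_parallel_tasks(vram_mb: int, per_task_mb: int, reserve_mb: int = 0) -> int:
    """Largest number of tasks whose combined memory fits in the card's VRAM."""
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    usable_mb = max(vram_mb - reserve_mb, 0)  # memory left after driver/desktop reserve
    return usable_mb // per_task_mb

# Hypothetical per-task footprint; replace with a value observed on your own host.
PER_TASK_MB = 300

for card, vram_mb in [("1GB GTX 560 Ti", 1024), ("2GB GTX 560 Ti", 2048), ("2GB GTX 660 Ti", 2048)]:
    print(f"{card}: up to {max_parallel_tasks(vram_mb, PER_TASK_MB)} tasks")
```

Fitting in memory only means the tasks can run side by side; it says nothing about whether 6 at once is actually faster than fewer.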

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

RE: RE: that's

Quote:
Quote:

that's interesting to say the least ...

i wonder if that's fairly indicative of the performance increase expected when going from a GTX 560 Ti to a GTX 660 Ti...


Might be. Some of the reviews I read before purchasing this card claimed that the 192-bit memory bus would slow it down a bit. If I were to guess, the BRP4 app does a fair bit of memory transfer, both on the card and to and from the main system.


on second thought, we should probably take into consideration that these documented BRP4 run times can only be a true apples-to-apples comparison if they were all measured on the same hardware and OS. not only do the hardware and OSes vary wildly across the chart of documented run times above, but so do the DC loads put on those CPUs and GPUs.

for example, the first GTX 560 Ti run times to show up in this thread were very much in line with my own, particularly for 3 tasks at a time (which takes right around ~5,000s on my machine). but in Petrion's most recent update of the chart, that figure dropped to ~4,000s. at first i couldn't figure out how that was possible. then i remembered that when i run Einstein@Home, i typically run 6 BRP4 CUDA tasks in parallel (3 per GPU) alongside Test4Theory@Home (the multithreaded version, which consumes just under 2 CPU cores), and allocate the remaining CPU resources to either LHC@Home Classic (SixTrack) CPU tasks or Einstein@Home Gravitational Wave CPU tasks.

perhaps the original figure of approx. 5,000s for 3 simultaneous BRP4 tasks on a GTX 560 Ti is similar to mine b/c that host was also giving the rest of its CPU resources to other projects, while the other figure of approx. 4,000s came from a host loaded only with Einstein@Home BRP4 CUDA work and not crunching for any other project at the time. perhaps the user who provided those GTX 560 Ti run times to Petrion could speak up so we could compare the rest of the hardware (CPU, memory quantity, etc.) and maybe make more sense of the difference between our run times...

...and with all that said, i'm thinking that we shouldn't deem the difference in run times between my GTX 560 Ti and your GTX 660 Ti definitive just yet...especially if you did your BRP4 CUDA testing with no other projects crunching in the background.
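Holmis's point about the 192-bit bus can be made concrete: theoretical peak memory bandwidth is the bus width in bytes times the effective memory data rate. The sketch below assumes reference (non-overclocked) specs of 256-bit at 4008 MT/s for the GTX 560 Ti and 192-bit at 6008 MT/s for the GTX 660 Ti; factory-overclocked cards will differ, and PCIe transfers to and from the host are a separate limit not captured here.

```python
def peak_bandwidth_gbs(bus_width_bits: int, effective_rate_mts: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mts * 1e6 / 1e9

# Assumed reference specs: (bus width in bits, effective data rate in MT/s).
cards = {
    "GTX 560 Ti": (256, 4008),
    "GTX 660 Ti": (192, 6008),
}
for name, (width, rate) in cards.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

By this measure the 660 Ti's faster GDDR5 more than makes up for the narrower bus relative to a 560 Ti (roughly 144 vs 128 GB/s), so raw bandwidth alone would not explain it being slower than a 560 Ti on BRP4.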

Sid
Joined: 17 Oct 10
Posts: 164
Credit: 970676992
RAC: 429442

RE: ...and with all that

Quote:
...and with all that said, i'm thinking that we shouldn't deem the difference in run times between my GTX 560 Ti and your GTX 660 Ti definitive just yet...especially if you did your BRP4 CUDA testing with no other projects crunching in the background.


When I run CPU tasks alongside BRP4 tasks, I can clearly see that GPU load stays at or below 75%, compared to about 95% otherwise.
So run times can differ considerably.
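One crude way to estimate how much this matters is to assume run time scales inversely with GPU load. That is a simplifying assumption (the app's CPU-side work and memory transfers don't scale this way), but it gives a rough magnitude. The 3000 s baseline below is hypothetical:

```python
def scaled_runtime(base_runtime_s: float, base_load: float, new_load: float) -> float:
    """Estimate run time at a different GPU load, assuming time is inversely proportional to load."""
    for load in (base_load, new_load):
        if not 0 < load <= 1:
            raise ValueError("loads must be fractions in (0, 1]")
    return base_runtime_s * base_load / new_load

# Hypothetical baseline: a task that takes 3000 s at 95% GPU load.
print(f"{scaled_runtime(3000, 0.95, 0.75):.0f} s at 75% load")  # roughly 27% longer
```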

Maciek
Joined: 21 Mar 05
Posts: 1
Credit: 319915
RAC: 0

Hello I have a problem. Which

Hello
I have a problem. Which of these cards, a GT 430 or an HD 6670 (96 CUDA cores vs 480 stream processors; 268.8 vs 768 GFLOPS, all figures from the wiki comparison table), should give more completed WUs per day? I don't know exactly what to expect from a Radeon with the latest OpenCL app.
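The GFLOPS figures quoted are theoretical single-precision peaks: shader count × shader clock × 2 (a multiply-add counts as 2 FLOPs). The sketch assumes reference shader clocks of 1400 MHz for the GT 430 and 800 MHz for the HD 6670, which reproduce the quoted numbers. Peak FLOPS are not directly comparable across NVIDIA and AMD architectures (the HD 6670's VLIW5 shaders depend on the compiler packing work into 5-wide bundles and often fall well short of peak), and WU/day also depends on how well each app is optimized, so treat the ratio as an upper bound rather than a prediction.

```python
def peak_gflops(shader_cores: int, shader_clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical single-precision peak: cores x clock (MHz) x FLOPs per core per clock, in GFLOPS."""
    return shader_cores * shader_clock_mhz * flops_per_clock / 1000.0

gt430 = peak_gflops(96, 1400)   # assumed 1400 MHz shader clock
hd6670 = peak_gflops(480, 800)  # assumed 800 MHz engine clock
print(f"GT 430: {gt430:.1f} GFLOPS, HD 6670: {hd6670:.1f} GFLOPS, ratio {hd6670 / gt430:.2f}")
```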

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

RE: ...and with all that

Quote:
...and with all that said, i'm thinking that we shouldn't deem the difference in run times between my GTX 560 Ti and your GTX 660 Ti definitive just yet...especially if you did your BRP4 CUDA testing with no other projects crunching in the background.


I've done some more testing and here are my setup and the run times:

CPU: Core i7 GHz
GPU: GTX660Ti PCI-E 3.0 x16, MHz as reported by GPU-Z
RAM: 16 GB of Corsair PC3-12800 (800 MHz, dual channel)
OS: Win7 x64

[pre]All times in seconds, with the CPU fully loaded with Einstein CPU tasks.
At once  Mean     Median  Range          Tasks completed
x1       ~1700    1697    1685 - 1728     5
x2       ~2900    2824    2493 - 3494    35
x3       ~4360    4491    3393 - 4999    35
x4       ~6030    6105    4403 - 6802    20
x5       ~8660    8867    5920 - 9741     7
x6       ~12760   13198   11448 - 14066   5[/pre]
The times vary quite a lot, probably because this is my only computer and I use it a fair bit. It seems that running 2 at a time is the most efficient without further tweaking and testing.
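Whether "2 at a time" wins is easiest to see by converting each row into throughput: a batch of n tasks finishing in t seconds yields n tasks every t seconds. A short Python sketch using the approximate means from the table above (it assumes the GPU is kept busy around the clock and ignores the wide ranges):

```python
# Approximate mean wall-clock seconds per batch, keyed by tasks running at once.
MEAN_RUNTIME_S = {1: 1700, 2: 2900, 3: 4360, 4: 6030, 5: 8660, 6: 12760}

def seconds_per_task(n: int, mean_s: float) -> float:
    """Effective GPU time per task when n tasks share the card."""
    return mean_s / n

def tasks_per_day(n: int, mean_s: float) -> float:
    """Tasks completed per 24 h if batches of n run back to back."""
    return 86400 * n / mean_s

for n, mean_s in MEAN_RUNTIME_S.items():
    print(f"x{n}: {seconds_per_task(n, mean_s):7.1f} s/task, "
          f"{tasks_per_day(n, mean_s):5.1f} tasks/day")

best = max(MEAN_RUNTIME_S, key=lambda n: tasks_per_day(n, MEAN_RUNTIME_S[n]))
print("highest throughput at", best, "at a time")
```

With these means, x2 comes out on top at about 59.6 tasks/day, but x3 (about 59.4) is within roughly 0.2%, well inside the run-to-run spread in the table, so 2 and 3 at a time are effectively tied on this host.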

Edit: I used Process Lasso to raise the priority of the CPU part of the BRP app to "above normal" to improve the GPU load.
Furthermore, when running 3 or more parallel tasks I noticed some minor lag while using the computer, especially while watching video.

Vladimir Lukovic
Joined: 10 Sep 08
Posts: 10
Credit: 681501
RAC: 0

Could you run a single

Could you run a single workunit on Albert@Home with the new, improved 1.28 CUDA app?
My GTX 560 does 1 WU in about 1700-1800 s.
It would be great to see how much of an improvement the 660 Ti makes.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Already done that, check this

Already done that, check this post.

Times were about 1180s.

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

This thread is very

This thread is very interesting, thank you all for contributing.

Here's some info on my GPUs

AMD Phenom II X6, NVIDIA GTX 560 Ti: 6 CPU + 2 GPU
2654 s/WU, Ubuntu 11.04 x86_64, 26,209 avg credit

Intel i7-2600K, NVIDIA GTX 560: 8 CPU + 2 GPU
2300 s/WU, Scientific Linux 6.3 x86_64, 34,968 avg credit

Intel i7-3770K, NVIDIA GTX 550 Ti: 8 CPU + 2 GPU
2961 s/WU, SL 6.3 x86_64, 33,793 avg credit
I'm not sure average credit is stable yet (it's been a month).

I'm thinking of swapping these cards around so I can get a better feel for what exactly the card is doing and how the different CPU and motherboard affect it.

Now my dilemma is that I'm building another system and am ready to order another GPU but what the heck do I buy?

I'm going to stick with CUDA capable (nVidia) for now. Maybe I will have to double my options with OpenCL cards next year.

It seems the 600 series doesn't outperform the 500 series by very much. I'm pretty much down to a 560 Ti (2GB) for $270, a 660 Ti (2GB) for $300, or a 670 (2GB) for $400.

These machines are not dedicated to E@H but will be running jobs that are somewhat similar in nature and E@H will be the backfill for any idle time.

Anybody have any opinions or comments that can break the 3-way tie I seem to have? Do those of you who bought 600 series feel they are worth the extra money?

thanks
Joe

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

RE: Anybody have any

Quote:

Anybody have any opinions or comments that can break the 3-way tie I seem to have? Do those of you who bought 600 series feel they are worth the extra money?

thanks
Joe


I'd choose something in the Kepler series (warning: not all 600-series GPUs are Kepler, but the 600-series cards you named are).
If not for performance, then for the lower power requirements, which will save you more money on electricity than the extra you pay for the card... and also for the lower noise levels...
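Whether the power savings actually cover the higher price is a simple break-even calculation. A sketch under loudly stated assumptions: the $30 gap is Joe's 560 Ti vs 660 Ti quote, the 20 W saving is the difference between reference board-power ratings (170 W vs 150 W), not measured draw under BRP4, and $0.12/kWh is a hypothetical electricity price.

```python
def breakeven_years(extra_cost: float, watts_saved: float, price_per_kwh: float,
                    hours_per_day: float = 24.0) -> float:
    """Years of running until electricity savings repay the extra purchase price."""
    kwh_saved_per_year = watts_saved * hours_per_day * 365 / 1000
    savings_per_year = kwh_saved_per_year * price_per_kwh
    if savings_per_year <= 0:
        return float("inf")  # no savings, so it never breaks even
    return extra_cost / savings_per_year

# All inputs are assumptions; plug in your own prices and measured wattage.
years = breakeven_years(extra_cost=30.0, watts_saved=20.0, price_per_kwh=0.12)
print(f"break-even after about {years:.1f} years of 24/7 crunching")
```

Measured wall-socket draw while crunching is the number that really matters here; board-power ratings are only a rough guide.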

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7221924931
RAC: 956394

It seems the question of

It seems the question of whether the GTX 660 Ti is a great buy or just an honorable mention for Einstein use comes down to whether the application here (I hope plural applications in the future) is more nearly limited by shader performance or by memory. It apparently has the shader count and performance of its more expensive cousins, but a significant memory performance deficit.

Another question is whether the 660 Ti's rather oddly asymmetric memory implementation (2 GB on a 192-bit bus) might have some additional harmful effect.

Were I buying right now, I, personally, would be strongly tempted to get a Gigabyte 660 ti, and try to compare it to other Einstein hosts with Tesla and Fermi chips to help others decide whether it is a good choice.
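The shader-vs-memory question is what the roofline model captures: attainable throughput is the lower of the compute peak and memory bandwidth × arithmetic intensity (FLOPs performed per byte moved). The sketch assumes reference specs, where the 660 Ti and 670 share 1344 CUDA cores at 915 MHz (about 2460 GFLOPS) but have about 144.2 vs 192.3 GB/s of bandwidth. The arithmetic-intensity values are hypothetical; the BRP4 app's real figure isn't known from this thread, and the asymmetric memory layout is not modeled.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline model: throughput is capped by either compute or memory traffic.

    intensity is arithmetic intensity in FLOPs per byte of memory traffic.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)

# Assumed reference specs: (peak single-precision GFLOPS, memory bandwidth GB/s).
CARDS = {"GTX 660 Ti": (2460.0, 144.2), "GTX 670": (2460.0, 192.3)}

for intensity in (1.0, 4.0, 16.0, 32.0):  # hypothetical FLOP/byte values
    perf = {name: attainable_gflops(p, bw, intensity) for name, (p, bw) in CARDS.items()}
    ratio = perf["GTX 670"] / perf["GTX 660 Ti"]
    print(f"{intensity:5.1f} FLOP/B: 660 Ti {perf['GTX 660 Ti']:7.1f}, "
          f"670 {perf['GTX 670']:7.1f} GFLOPS (670/660 Ti = {ratio:.2f})")
```

If the app sits at the low-intensity, memory-bound end, a 670 would be up to about a third faster than a 660 Ti; if it is compute-bound, the two should perform about the same.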
