Thanks Horacio, I will give that a try as soon as I get a few minutes.
Of course, setting the preferences back to 0.5 was the first thing I did last night. It didn't fix the one host yet, but the other 550Ti did switch back to 0.5 when I sent in its finished tasks.
short of taking the client_state.xml shortcut, we can no longer expect the GPU utilization factor to change instantaneously. that same "instantaneous" shortcut used to exist in the app_info.xml file (as the <count> parameter), but ever since E@H got rid of the need for an app_info.xml file several months ago, it can only be changed through the client_state.xml file or through your web preferences. if you don't use the client_state.xml trick, then any tasks that get downloaded after you change your GPU utilization factor to 0.25 via your web preferences will run 4 at a time. so even though you changed the GPU utilization factor back to 0.5, any BRP tasks already in the queue before you changed it back will continue to run 4 at a time (unless you change the parameter manually in client_state.xml, as Horacio suggested).
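For anyone trying the manual route, here's roughly where that parameter lives - a minimal sketch only, assuming BOINC's usual client_state.xml layout (the app name and plan class shown are illustrative examples, not necessarily what your host will show), and note that you should stop the BOINC client before editing, since it rewrites this file while running:

    <app_version>
        <app_name>einsteinbinary_BRP4</app_name>
        <plan_class>BRP3cuda32nv270</plan_class>
        ...
        <coproc>
            <type>NVIDIA</type>
            <!-- per-task GPU share: 0.5 = two tasks per GPU, 0.25 = four -->
            <count>0.5</count>
        </coproc>
    </app_version>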
that's why i don't do cpu crunching at all: gpu time is worth much, much more in "flops"
This post (and a few others getting good results from other Fermis) started me thinking - would my puny i3/GTX460/GTX460 results improve if I removed the 100% CPU crunching load entirely?
So I tried....
before (PCIe x16, 768MB) - run times in seconds:
GTX 460 -> 1x 3,000, 2x 4,800
after:
GTX 460 -> 1x 1,600, 2x 2,900

before (PCIe x4, 768MB):
GTX 460 -> 1x 4,700, 2x 8,400
after:
GTX 460 -> 1x 2,870, 2x 5,750
... and the answer is - oh yes.
Now I'm thinking about what to try next...
since you already know what your GPU task run times are when 100% and 0% of the CPU is allocated to CPU crunching, you should run the same test at 75%, 50%, and 25% CPU just to make sure you aren't leaving any compute performance on the table. going from 100% CPU crunching to 0% CPU crunching doesn't give you the whole picture. your GPU task run times may be just as good or only marginally worse w/ fewer free CPU cores available, but you'll never know without testing. who knows, you might be able to run a CPU task or two without sacrificing GPU efficiency.
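If stepping through those percentages via the web preferences feels slow, the same knob can be set locally - a sketch assuming BOINC's standard global_prefs_override.xml mechanism in the BOINC data directory (it's the same setting as 'On multiprocessors, use at most X% of the processors', and BOINC Manager can re-read it without a restart via 'Read local prefs file'):

    <global_preferences>
        <!-- try 75, then 50, then 25 while timing your GPU tasks -->
        <max_ncpus_pct>75.0</max_ncpus_pct>
    </global_preferences>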
Sorry Sunny but that's not correct. See Richard's post over here:
Richard wrote:
The number of WUs to run at once is specified via a setting in the <app_version> segment of client_state.xml, exactly as it would be with an app_info.xml file - check client_state and sched_reply for confirmation.
Where Bikeman is right is in saying that the new data following a change is only transferred from the server to your host when new work is being allocated. Once received, however, it applies to all tasks - including tasks previously cached - assigned to the same plan_class.
At least this is how it's worked when I've changed that setting.
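To make that "check client_state and sched_reply" step concrete: after a scheduler contact that actually assigns new work, the reply file (BOINC names it sched_reply_<project URL>.xml in the data directory) should carry the new factor in the same kind of element - the layout below is assumed to match the client_state sketch earlier in the thread:

    <coproc>
        <type>NVIDIA</type>
        <!-- new utilization factor arriving with the newly allocated work -->
        <count>0.25</count>
    </coproc>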
my apologies for posting inaccurate information.
things appeared to be working exactly as i described them on my hosts, but i think i now know why. you see, i've been swapping GPUs between all my hosts lately in an effort to find the most efficient combination of hardware, and i've been fiddling w/ the GPU utilization factors as well. but instead of leaving my work buffer size alone, i would typically reduce it to 0.1 days before switching GPUs and downloading new work. b/c i probably had more than 0.1 days of work in the queue each time i reduced the work buffer size to 0.1 days, new work was not getting downloaded, let alone being scheduled and allocated to my hosts. so if things actually work the way Richard described them, then even if i manually updated the project to transfer the new settings to my host(s), any tasks that were in the queue before i updated the GPU utilization factor would not run at the new factor - they would continue to run at the old factor until the queue dwindled down to less than 0.1 days' worth of work. in other words, had i not reduced my work buffer size prior to each time i swapped GPUs and changed the GPU utilization factor, i would have been allocated new work right away, and would have seen any existing tasks in the queue start crunching at the new factor as soon as that newly allocated work got downloaded to my host(s)...
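For reference, the work buffer setting involved in that story can also be pinned locally - a sketch assuming the standard global_prefs_override.xml fields (newer BOINC clients split the cache into a minimum plus an additional buffer):

    <global_preferences>
        <!-- the 0.1-day buffer described above -->
        <work_buf_min_days>0.1</work_buf_min_days>
        <work_buf_additional_days>0.0</work_buf_additional_days>
    </global_preferences>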
AMD/ATI (run times in seconds; entries shown in color in the original post are optimized >=1.28 app values, as defined by Petrion):
HD 7970 ------> 1x ~650, 2x ~950, 4x ~1,800, 5x ~2,200
HD 7950 ------> 3x ~1,860
HD 7950 ------> 1x 1,145
HD 7870
HD 7850
HD 7770 ------> 1x ~1,960, 2x ~3,600
HD 7750 ------> 2x ~11,000
HD 5870 ------> 2x ~3,105
HD 5850 ------> 1x 1,800, 2x 6,085
HD 5830 ------> 1x 2,916
HD 6970
HD 6950 (1536) -> 2x 6,700
HD 6950 ------> 2x 3,500
HD 6990
HD 6870
HD 5970
HD 6850 ------> 1x ~2,300
HD 6850 ------> 1x ~2,359
HD 6790
HD 5770 ------> 1x 7,750+
HD 6770
HD 5670 ------> 1x 11,100
HD 5570 ------> 1x ~15,000
HD 5450 ------> 1x ~36,500!
AMD A8 3870 --> 1x 6,489
NVIDIA (run times in seconds; entries shown in color in the original post are optimized >=1.28 app values, as defined by Petrion):
GTX 690 ------> 3x 2,800
GTX 590
GTX 680 ------> 1x ~750
GTX 680 ------> 3x 3,100 (Win7)
GTX 680 ------> 2x 1,945 (Linux)
GTX 580 ------> 1x 834, 3x ~2,500
GTX 580 ------> 3x 3,350 (Windows)
GTX 580 ------> 3x 3,050 (Linux)
GTX 670 ------> 3x ~4,300 (Vista)
GTX 660 Ti ---> 1x ~1,180, 2x ~2,170
GTX 660 Ti ---> 1x ~1,700, 2x ~2,900, 3x ~4,500, 4x ~6,030, 5x ~8,660, 6x ~12,760
GTX 650 ------> 1x 2,630, 2x 4,340
GTX 650 Ti ---> 3x ~5,900 (Linux, PCIe 2)
GTX 570
GTX 670
GTX 480 ------> 2x ~2,200
GTX 470 ------> 2x ~3,000, 3x 3,800
GTX 560 [448] -> 1x 1,550, 2x 2,500
GTX 560 Ti ---> 2x 2,030
GTX 560 Ti ---> 1x ~1,100, 2x 2,654, 6x 6,400
GTX 560 Ti ---> 1x ~1,100, 2x 2,000, 4x 4,100, 5x 5,200
GTX 560 ------> 2x 2,300
GTX 560 ------> 1x 3,300, 2x 4,800
GTX 460 ------> 1x 1,600, 2x 2,900
GTX 465
GTX 550 Ti ---> 1x 1,793, 2x 2,961
GT 640 -------> 1x ~5,700
GT 440
GTS 450 ------> 1x ~2,200, 2x 4,200
GF 610M ------> 1x ~7,800
GT 430 -------> 2x 9,100
GT 430 -------> 1x 4,860
GT 520 -------> 1x ~9,600 (Linux)
FirePro V4800 -> 1x 10,620
Older cards (not OpenCL 1.1 capable), but still an interesting comparison:
GTX 295 ------> 1x 2,000 (Linux)
GTX 285 ------> 2x 3,000
GTX 260 ------> 1x 2,200
8800 GT (G92) -> 1x 2,940 (Linux)
8800 GT (G92) -> 1x 3,600 (Linux)
8800 GTS (G80) -> 1x 4,020 (Linux)
GTS 250 ------> 2x ~5,484
GT 240 -------> 1x 4,035 (OC'd)
GT 240 -------> 1x ~4,500
GT 240 -------> 1x ~5,400, 2x 10,500
GT 240 -------> 1x ~3,460 (Linux)
GT 220 -------> 2x 19,400
DSKAG Austria Research Team: http://www.research.dskag.at
HD 7990 available: 4096 stream processors, three 8-pin power connectors!
Theoretical performance: 12 WUs/hour, i.e. 144,000 credits/day with 6 free cores (at 500 credits per BRP4 task, that's 12 x 24 x 500 = 144,000).
Two 7970s are faster and cheaper.
So that's done. I've now made an Excel sheet (visible as a PDF here), based on all the values I posted before.
I deleted all the old entries from before 1.28, so anyone who wants to buy a new (or eBay ;)) card can check this sheet for the near-real values they will get :)
http://www.dskag.at/images/Research/EinsteinGPUperformancelist.pdf
Have fun ^^
DSKAG Austria Research Team: http://www.research.dskag.at
Don't forget the Nvidia GPUs are running a CUDA app, not an OpenCL app, so using the 'No OpenCL 1.1' description for legacy Nvidia GPUs is pointless.
Which also makes talk about Nvidia completion times off topic for this thread - perhaps the mods can change the thread title to 'Cuda and OpenCL Benchmarks' ;-)
But then again, Cruncher's Corner's description is 'Credit, leaderboards, CPU performance', making GPU performance off topic too - perhaps the admins can make the description 'Credit, leaderboards, CPU performance and GPU performance' ;-)
Claggy
Done :-). I hope the thread starter doesn't mind, but it's really a more appropriate title.
Cheers
HB