Times (Elapsed / CPU) for BRP5/6/6-Beta on various CPU/GPU combos - DISCUSSION Thread

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 579080201

RAC: 203723

Yep, same stepping says it

12 Apr 2015 19:38:43 UTC

Message 130818 in response to message 130817

Yep, same stepping says it all. The Refresh K models are the only ones with known differences (the soldered heat spreader).

MrS

Scanning for our furry friends since Jan 2002

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117759378734

RAC: 34779131

RE: I suspect there is some

13 Apr 2015 7:07:02 UTC

Message 130819 in response to message 130816

Quote:

I suspect there is some other reason lurking deep in the system.

Perhaps, and perhaps not very deep either :-).

The Haswell is an i3-4130 (2 cores / 4 threads) @ 3.4 GHz with 2 free threads.
The Haswell Refresh is a G3258 (2 cores / 2 threads) @ 3.9 GHz with 1 free thread.

I had expected that supporting 4 GPU tasks with 1 free core would be a penalty (even though the 3.9 GHz is an obvious bonus) such that there might be a detrimental effect on the CPU component of the crunch time.

When I saw the G3258 averaging 618 secs of CPU time and the i3-4130 more than 50% higher at 960 secs, I wondered if there was something beneficial with Haswell Refresh. Perhaps the benefit is coming from 3.9 GHz compared to 3.4 GHz, although that seems too big a difference for such a modest frequency increase. Perhaps part of the difference is not having the use of HT.
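
As a rough check on these figures, the arithmetic can be sketched in Python. The variable names are illustrative, and the idea that CPU time scales inversely with clock speed is a simplifying assumption, not something measured on either host.

```python
# Average CPU times per task (seconds) and clock speeds (GHz) quoted above.
i3_4130_cpu_s = 960.0
g3258_cpu_s = 618.0
i3_4130_ghz = 3.4
g3258_ghz = 3.9

# How much more CPU time the i3-4130 needs per task.
cpu_time_ratio = i3_4130_cpu_s / g3258_cpu_s

# Clock advantage of the G3258.
clock_ratio = g3258_ghz / i3_4130_ghz

# Assumption: CPU time scales as 1/clock. Dividing out the clock ratio
# leaves the part of the gap that frequency alone does not explain.
unexplained_ratio = cpu_time_ratio / clock_ratio

print(f"CPU time ratio:    {cpu_time_ratio:.2f}")
print(f"Clock ratio:       {clock_ratio:.2f}")
print(f"Unexplained ratio: {unexplained_ratio:.2f}")
```

Under that assumption the clock difference covers roughly 15% of the gap, which leaves a sizeable remainder to be explained by something else.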

Cheers,
Gary.

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

RE: Perhaps part of the

13 Apr 2015 11:21:07 UTC

Message 130820 in response to message 130819

Quote:

Perhaps part of the difference is not having the use of HT.

It depends on the project of course, but I run the WCG/CEP2 work units on both an Ivy Bridge i5-3550 (4 cores, non-hyperthreaded) and on an i7-3770 (4 cores / 8 threads, hyperthreaded). I see only about a 12% to 15% improvement in overall throughput using hyperthreading, accounting for the difference in clock rates. With other projects, it is a bit more, but probably not over 25% in most cases.
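
One way such a clock-adjusted comparison might be computed is sketched below. The function name `ht_gain` is hypothetical, and the input numbers are made-up placeholders chosen only to show the calculation; they are not measurements from either host.

```python
def ht_gain(tasks_ht: float, ghz_ht: float,
            tasks_no_ht: float, ghz_no_ht: float) -> float:
    """Fractional throughput gain of the HT host over the non-HT host,
    after dividing each host's throughput by its clock speed."""
    per_ghz_ht = tasks_ht / ghz_ht
    per_ghz_no_ht = tasks_no_ht / ghz_no_ht
    return per_ghz_ht / per_ghz_no_ht - 1.0


# Placeholder inputs (NOT real data): tasks per day and clock in GHz.
raw_gain = 138.0 / 100.0 - 1.0
adjusted_gain = ht_gain(tasks_ht=138.0, ghz_ht=3.0,
                        tasks_no_ht=100.0, ghz_no_ht=2.5)

print(f"Raw gain:            {raw_gain:.0%}")
print(f"Clock-adjusted gain: {adjusted_gain:.0%}")
```

With these placeholder values, much of the raw advantage comes from the higher clock, which is why the adjustment matters when comparing two different CPUs.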

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 732984242

RAC: 1263236

We now have gathered enough

13 Apr 2015 13:54:04 UTC

Message 130821

We have now gathered enough confidence in the Beta app to release it as the official one. All your work to summarize the performance characteristics of the new app helped a great deal, so thank you very much indeed. Special thanks to Gary, who suggested and initiated this very structured and focused discussion in its current form.

Next steps:

As promised earlier, we will try to put up an additional CUDA app version that will use a newer CUDA version, at least 5.5.

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7228964896

RAC: 1134335

RE: a newer CUDA version,

13 Apr 2015 20:46:13 UTC

Message 130822 in response to message 130821

Quote:

a newer CUDA version, at least 5.5.

Great news. I hope you find your effort to try this rewarded by higher throughput with not too painful a set of difficulties. In my dreams I hope you will try Cuda7, which seems more likely to find better ways to use Maxwell GPUs than earlier ones, but I'll loyally test whatever you find good enough to have us try it. I've started shortening my queues in anticipation.

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

RE: In my dreams I hope you

13 Apr 2015 21:40:08 UTC

Message 130823 in response to message 130822

Quote:

In my dreams I hope you will try Cuda7, which seems more likely to find better ways to use Maxwell GPUs than earlier ones, but I'll loyally test whatever you find good enough to have us try it. I've started shortening my queues in anticipation.

Me too. It should be pointed out, though it is probably obvious, that Crunchers tend to go where their hardware can best be used, other things being equal. Therefore, in deciding on versions, it is not just the present user population that should be considered, but those that will be attracted by new applications, and alternatively might leave if the grass is greener elsewhere. Therefore, you need to lead your target a little bit.

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

I generally don't pay much

19 Apr 2015 11:41:36 UTC

Message 130824

I generally don't pay much attention to CPU times, as they are fairly meaningless outside of the context of the system in question.

Elapsed times are more comparable between systems, and in some sense more easily verified (stopwatch for example!).

However, this thread has piqued my interest, and I noticed the following.

If I run only BRP6 1.52 (x2) for both GPUs the CPU times average around 980s.

If I then run additional CPU tasks (on the i3 CPU 530) namely, GWS S6 Bucket 1.06 (X64) followups, the CPU times for the BRP6 tasks DROP to around 720s! Over 20%.

Elapsed times do not appear to change either way.

The system also feels more responsive when browsing while these extra tasks are running.

Not what I expected. Has anyone else noticed this, or can anyone explain it?

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 579080201

RAC: 203723

Gary wrote:The Haswell is an

19 Apr 2015 21:06:09 UTC

Message 130825 in response to message 130819

Gary wrote:

The Haswell is an i3-4130 (2 cores / 4 threads) @ 3.4 GHz with 2 free threads.
The Haswell Refresh is a G3258 (2 cores / 2 threads) @ 3.9 GHz with 1 free thread.

Perhaps part of the difference is not having the use of HT.

Ahh, that's quite a difference! The 2 CPU threads on the i3 each run on a separate physical core (that's how the OS schedulers handle HT CPUs). If the Einstein app joins those 2 threads every now and then, it's guaranteed to share a core with one of them. That's OK, but makes it take longer. On the Pentium, on the other hand, there's always a physical core free, so the CPU portion of the Einstein tasks completes quicker.

@AgentB: I've also got an explanation for you. The Einstein tasks of the optimized 1.52 app use little CPU time. If you're not running any CPU tasks along with them, the CPU will be at idle / base frequency (1600 MHz I think) when the Einstein tasks start. Ramping it up to full speed takes some time. If Einstein is already finished by then, or at least most of it, the average CPU clock speed will be well below the maximum clock speed.

If you run CPU tasks along with the GPU tasks, the continuous load will keep the CPU clock up and reduce execution times. This effect is probably amplified by your CPU being a bit older, so it doesn't switch power states as quickly as newer hardware. This doesn't matter, though, as either way is fast enough to support your GPU (same elapsed times) :)
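
A minimal sketch of this reasoning, using the BRP6 CPU times AgentB reported earlier in the thread. It assumes the CPU work per task is a fixed number of cycles, so that CPU time scales inversely with the average clock speed. The function name is hypothetical, and the model ignores memory and driver wait effects.

```python
def implied_clock_fraction(cpu_s_loaded: float, cpu_s_idle: float) -> float:
    """Average clock on an otherwise idle CPU, as a fraction of the
    average clock when CPU tasks keep it loaded.

    Assumes a fixed cycle count: cycles = cpu_seconds * average_clock.
    """
    return cpu_s_loaded / cpu_s_idle


# AgentB's reported BRP6 CPU times (seconds).
cpu_s_gpu_only = 980.0        # no CPU tasks running
cpu_s_with_cpu_tasks = 720.0  # CPU tasks running alongside

fraction = implied_clock_fraction(cpu_s_with_cpu_tasks, cpu_s_gpu_only)
drop = 1.0 - cpu_s_with_cpu_tasks / cpu_s_gpu_only

print(f"Implied average clock, GPU-only: {fraction:.0%} of loaded clock")
print(f"Reduction in CPU time:           {drop:.1%}")
```

If the assumption holds, the GPU-only case would correspond to the CPU averaging somewhat under three quarters of its loaded clock speed while servicing the Einstein tasks.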

MrS

Scanning for our furry friends since Jan 2002

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117759378734

RAC: 34779131

RE: ... If you're not

20 Apr 2015 2:42:13 UTC

Message 130826 in response to message 130825

Quote:

... If you're not running any CPU tasks along with them, the CPU will be at idle / base frequency (1600 MHz I think) when the Einstein tasks start. Ramping it up to full speed takes some time. If Einstein is already finished, or at least most of it, the average CPU clock speed will be well below the maximum clock speed.

Thanks very much for pointing this out! I've sometimes seen people say that they run multiple GPU tasks and leave ALL the cores free. I'm sure that helps with both power consumption and temperature but may hinder GPU performance if all the CPU cores are likely to be running at idle frequency most of the time. I remember looking at such a host some time ago and expecting to find the fastest crunch times but actually seeing what seemed to be slightly worse performance. That all makes sense now. Thanks for the explanation.

Cheers,
Gary.

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

RE: I've sometimes seen

20 Apr 2015 19:03:37 UTC

Message 130827 in response to message 130826

Quote:

I've sometimes seen people say that they run multiple GPU tasks and leave ALL the cores free. I'm sure that helps with both power consumption and temperature but may hinder GPU performance if all the CPU cores are likely to be running at idle frequency most of the time.

Thanks also, MrS. Every day at E@H is a school day. So the CPU is not doing more, it's just doing it slower.

With the old BRP4 tasks, the CPU load on my system was much higher, and running two GPUs would keep all four CPU threads busy running 6 tasks, around the 25% mark, so that explains why I did not see this before. With BRP4 I would notice that any CPU load had a negative effect on GPU elapsed time, but thinking about it now, a single card with a better processor would probably benefit from some CPU load to keep it lit up and feeding the GPU.

BRP6 is a totally different GPU app, so much so that I'm toying with the idea of running a third GPU in a PCIe x1 slot. Second-hand GTX 460s are getting cheap on eBay, and I have power and cooling capacity.
