P4: to hyperthread or not to hyperthread

d4rkm47r

Joined: 8 Oct 06

Posts: 2

Credit: 22758242

RAC: 0

17 Oct 2006 9:58:49 UTC

Topic 191952

what's the common wisdom on P4 hyperthreading? i seem
to be getting mixed results!

i enabled HT on a dual socket/single core P4 xeon 2.8GHz/1MB/800MHz FSB
and saw no throughput improvement running 4 WUs
with HT vs. 2 WUs with HT off:

http://einsteinathome.org/host/760773

WUs ran just a hair over 2x as long, and 2x as many
were produced... basically, no performance impact.
i'm disabling HT on this system!

i ran the same experiment on a dual socket/single core
P4 xeon 2.4GHz/512KB/400MHz FSB and saw something like a 25 - 30% boost in throughput.

http://einsteinathome.org/host/761261

WUs take ~ 60% longer but 2x as many get done in that
time.
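The throughput arithmetic behind both results can be made explicit. This is a minimal sketch. It assumes each logical CPU crunches one WU at a time and that the WUs being compared are of similar size. Throughput is WUs in flight divided by time per WU, so the HT gain is the ratio of WU counts divided by the ratio of run times. The function name `ht_gain` is illustrative only.

```python
def ht_gain(tasks_ratio: float, runtime_ratio: float) -> float:
    """Throughput with HT on relative to HT off.

    tasks_ratio:   WUs running at once with HT on / with HT off (4 / 2 = 2)
    runtime_ratio: per-WU run time with HT on / with HT off
    """
    # Throughput = WUs in flight / time per WU, so the ratio of
    # throughputs is the ratio of WU counts over the ratio of run times.
    return tasks_ratio / runtime_ratio


# 2.8 GHz Xeon: WUs took ~2x as long, 2x as many ran at once.
print(f"2.8 GHz: {ht_gain(2, 2.0):.2f}x")  # prints "2.8 GHz: 1.00x"
# 2.4 GHz Xeon: WUs took ~60% longer, 2x as many ran at once.
print(f"2.4 GHz: {ht_gain(2, 1.6):.2f}x")  # prints "2.4 GHz: 1.25x"
```

A run-time ratio a hair over 2, as on the 2.8 GHz box, gives a gain just under 1.0, which is a slight loss. The ~1.6 ratio on the 2.4 GHz box gives about 1.25, consistent with the 25-30% boost reported.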

with HT off, relative throughput is pretty much
proportional to the clock speed ratio of the two
systems - that's the expected result. however, note that with HT enabled, the 2.4GHz system mops up the 2.8GHz system! HUH???

it seems i'm better off enabling HT on the 2.4GHz
system and disabling it on the 2.8GHz system... and
this way, the 2.4GHz system has better throughput!

can anyone make sense of this???
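The crossover follows from multiplying the two effects together. This is a rough sketch. It assumes, as the post does, that HT-off throughput scales linearly with clock speed. It also uses the approximate HT gains from the two experiments above: about 1.0 for the 2.8 GHz box and about 1.25 for the 2.4 GHz box. The results are relative units, not WUs per day, and `relative_throughput` is a hypothetical helper.

```python
def relative_throughput(clock_ghz: float, ht_gain: float = 1.0) -> float:
    # Assumption: with HT off, throughput is proportional to clock speed;
    # enabling HT multiplies it by the gain measured on that machine.
    return clock_ghz * ht_gain


systems = {
    "2.4 GHz, HT off": relative_throughput(2.4),
    "2.4 GHz, HT on": relative_throughput(2.4, 1.25),
    "2.8 GHz, HT off": relative_throughput(2.8),
    "2.8 GHz, HT on": relative_throughput(2.8, 1.0),
}

# List configurations from most to least productive.
for name, value in sorted(systems.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the 2.4 GHz box with HT on comes out at about 3.0, against 2.8 for the 2.8 GHz box either way. A 2.8 GHz machine would need an HT gain of roughly 3.0 / 2.8 ≈ 1.07 just to break even with it.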

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7387041687

RAC: 2018717

P4: to hyperthread or not to hyperthread

17 Oct 2006 13:26:01 UTC

Message 48444

Quote:

what's the common wisdom on P4 hyperthreading?

You may wish to review this thread, where I posted my hyperthreading observations for reasonably current software. Others chimed in with comments. For other (mostly older) results, you can search the forum archive.

previous hyperthreading thread

At a fairly late stage in akosf's S4 application development, one key change meant that subsequent versions actually ran slower when paired on HT machines than the same app on the same machines running non-HT. However, we are now on S5, and those versions are part of history, not the present.

Other than that case, we've not had consistent reports of failure to improve in HT, so far as I recall. I think reported throughput improvement for Einstein has clustered pretty near 20%.

Certainly not all HT machines are created equal, but your 2.8 GHz Xeon would be expected to improve. Possibly you had an unmatched sample of WUs, or possibly there was unusual system activity from other sources that differed between the two cases. Reported HT run times are particularly easy for other apps to distort (i.e. falsify). I've seen both cases: the other app can make the reported time either higher or lower than normal.

If Einstein productivity is your key issue, I'd turn on HT and leave it on. I suspect a more careful or better controlled experiment will confirm this on your own rig if you feel the need.

Semmel

Joined: 16 Oct 06

Posts: 4

Credit: 124159

RAC: 0

RE: with HT off, relative

21 Oct 2006 22:52:39 UTC

Message 48445

Quote:

with HT off, relative throughput is pretty much
proportional to the clock speed ratio of the two
systems - that's the expected result. however, note that with HT enabled, the 2.4GHz system mops up the 2.8GHz system! HUH???

it seems i'm better off enabling HT on the 2.4GHz
system and disabling it on the 2.8GHz system... and
this way, the 2.4GHz system has better throughput!

can anyone make sense of this???

P4 is not P4

There are different cores.
Your P4 @ 2.4 GHz is a Northwood and the P4 @ 2.8 GHz is a Prescott core!

Although the Prescott is newer, the Northwood gets more work done per clock cycle. That's why the Northwood is faster.

keyboards

Joined: 2 Mar 06

Posts: 3

Credit: 80519

RAC: 0

RE: RE: with HT off,

23 Oct 2006 14:57:50 UTC

Message 48446 in response to message 48445

Quote:

Quote:

with HT off, relative throughput is pretty much
proportional to the clock speed ratio of the two
systems - that's the expected result. however, note that with HT enabled, the 2.4GHz system mops up the 2.8GHz system! HUH???

it seems i'm better off enabling HT on the 2.4GHz
system and disabling it on the 2.8GHz system... and
this way, the 2.4GHz system has better throughput!

can anyone make sense of this???

P4 is not P4

There are different cores.
Your P4 @ 2.4 GHz is a Northwood and the P4 @ 2.8 GHz is a Prescott core!

Although the Prescott is newer, the Northwood gets more work done per clock cycle. That's why the Northwood is faster.

I'm running a P4 Prescott 2.8GHz and found that with HT on the WUs take about 60% longer, but twice as many get done. The net result is an improvement with HT on.

!!Stupidity should be PAINFUL!!

mray

Joined: 23 Dec 05

Posts: 5

Credit: 327530410

RAC: 37502

There is one other issue with

24 Oct 2006 0:49:29 UTC

Message 48447

There is one other issue with hyperthreading and BOINC that can be quite annoying. The OS seems to schedule priorities by CPU. So even if BOINC has a low priority, another single-thread app with normal priority will not cause BOINC to release both CPUs. BOINC keeps one and lets the other app have the other. This can significantly impact the performance of your applications, since the two logical processors share one physical core and are not completely independent of each other.

Hope that makes sense, it's what I appear to be seeing on my HT machines.

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7387041687

RAC: 2018717

RE: The OS seems to

24 Oct 2006 5:02:49 UTC

Message 48448 in response to message 48447

Quote:

The OS seems to schedule priorities by CPU. So even if BOINC has a low priority, another single-thread app with normal priority will not cause BOINC to release both CPUs.

Yes, that is an OS issue, not a BOINC issue. It is the obvious consequence of dealing with HT by having the OS treat the chip as two independent CPUs. That allowed reuse of lots of existing code, but it slips wherever that model fails to represent reality.

Your priority application gets _all_ of the one (of two virtual) CPUs it is running on. If that is not enough for you, don't run HT.

ML1

Joined: 20 Feb 05

Posts: 347

Credit: 86563414

RAC: 0

RE: RE: The OS seems to

14 Nov 2006 12:14:40 UTC

Message 48449 in response to message 48448

Quote:

Quote:
The OS seems to schedule priorities by CPU. So even if BOINC has a low priority, another single-thread app with normal priority will not cause BOINC to release both CPUs.

Yes, that is an OS issue, not a BOINC issue. It is the obvious consequence of dealing with HT by having the OS treat the chip as two independent CPUs. That allowed reuse of lots of existing code, but it slips wherever that model fails to represent reality.

Your priority application gets _all_ of the one (of two virtual) CPUs it is running on. If that is not enough for you, don't run HT.

The Linux OS scheduler has special programming to deal sensibly with the Intel HT shenanigans.

Give one of the recent Linux distros a try to see what performance you get?
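Linux's HT-aware scheduling depends on the kernel knowing which logical CPUs share a physical core, and that topology can be inspected from user space. This is a minimal sketch. It assumes a Linux system that exposes the standard sysfs file `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, whose content is a CPU list such as `0,2` or `0-1`. The helper names are made up for illustration.

```python
import glob
import os


def parse_cpu_list(text: str) -> frozenset[int]:
    """Parse a kernel CPU list such as '0,2' or '0-1,4-5' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def physical_cores(sibling_lists: list[str]) -> set[frozenset[int]]:
    """Group logical CPUs into physical cores; identical lists collapse to one core."""
    return {parse_cpu_list(s) for s in sibling_lists}


def read_sibling_lists() -> list[str]:
    """Read thread_siblings_list for every logical CPU (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            lists.append(f.read())
    return lists


if __name__ == "__main__":
    if os.path.isdir("/sys/devices/system/cpu"):
        cores = [c for c in physical_cores(read_sibling_lists()) if c]
        for core in sorted(cores, key=min):
            # Two or more CPUs in one group means they are HT siblings.
            print(sorted(core))
```

On an HT machine each printed group should contain two logical CPUs. Knowing the pairs lets you check whether, for example, a normal-priority app and a BOINC task are sharing one physical core.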

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

Metod, S56RKO

Joined: 11 Feb 05

Posts: 135

Credit: 833609621

RAC: 90780

RE: Give one of the recent

16 Nov 2006 10:50:42 UTC

Message 48450 in response to message 48449

Quote:

Give one of the recent Linux distros a try to see what performance you get?

I'm running BOINC on a dual-Xeon (Prestonia) box under Debian Sarge.
What I observe is when there's a normal-priority memory-intensive app running on one half of a processor, the other half of that physical processor is mostly idle (like 95% idle) even though there are a couple of BOINC science apps waiting for CPU attention.

Which, IMHO, shows that the Linux CPU scheduler works like a charm.

Metod ...

ML1

Joined: 20 Feb 05

Posts: 347

Credit: 86563414

RAC: 0

RE: RE: Give one of the

17 Nov 2006 0:12:11 UTC

Message 48451 in response to message 48450

Quote:

Quote:
Give one of the recent Linux distros a try to see what performance you get?

I'm running BOINC on a dual-Xeon (Prestonia) box under Debian Sarge.
What I observe is when there's a normal-priority memory-intensive app running on one half of a processor, the other half of that physical processor is mostly idle (like 95% idle) even though there are a couple of BOINC science apps waiting for CPU attention.

Which, IMHO, shows that the Linux CPU scheduler works like a charm.

And that also shows that your system is memory bandwidth limited! :-(

But yes, good to see that the priorities work as they should.

Happy crunchin',
Martin

Metod, S56RKO

Joined: 11 Feb 05

Posts: 135

Credit: 833609621

RAC: 90780

RE: RE: RE: Give one of

17 Nov 2006 10:49:31 UTC

Message 48452 in response to message 48451

Quote:

Quote:
Quote:
Give one of the recent Linux distros a try to see what performance you get?

I'm running BOINC on a dual-Xeon (Prestonia) box under Debian Sarge.
What I observe is when there's a normal-priority memory-intensive app running on one half of a processor, the other half of that physical processor is mostly idle (like 95% idle) even though there are a couple of BOINC science apps waiting for CPU attention.

Which, IMHO, shows that the Linux CPU scheduler works like a charm.

And that also shows that your system is memory bandwidth limited! :-(

But yes, good to see that the priorities work as they should.

I'm sorely aware of this, and for new systems I'm currently trying to buy AMD. These don't show the same phenomenon. A quick test of an Intel Core 2 Duo system also showed much improvement over the Xeons in this area.

Metod ...
