what's the common wisdom on P4 hyperthreading? i seem
to be getting mixed results!
i enabled HT on a dual socket/single core P4 xeon 2.8GHz/1MB/800MHz FSB
and saw no throughput improvement running 4 WUs
with HT vs. 2 WU with HT off:
http://einsteinathome.org/host/760773
WUs ran just a hair over 2x as long, and 2x as many
were produced... basically, no performance impact.
i'm disabling HT on this system!
i ran the same experiment on a dual socket/single core
P4 xeon 2.4GHz/512KB/400MHz FSB and saw something like a 25 - 30% boost in throughput.
http://einsteinathome.org/host/761261
WUs take ~ 60% longer but 2x as many get done in that
time.
with HT off, relative throughput is pretty much
proportional to the clock speed ratio of the two
systems - that's the expected result. however, note that with HT enabled, the 2.4GHz system mops up the 2.8GHz system! HUH???
it seems i'm better off enabling HT on the 2.4GHz
system and disabling it on the 2.8GHz system... and
this way, the 2.4GHz system has better throughput!
can anyone make sense of this???
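For anyone who wants to check the arithmetic behind the numbers above, here is a minimal sketch. It assumes the rounded figures from the post (WUs take about 2x as long on the 2.8GHz box and about 1.6x as long on the 2.4GHz box with HT on) and assumes that HT-off throughput scales with clock speed, as observed. The function and variable names are made up for illustration.

```python
def ht_gain(wu_ht_on, wu_ht_off, runtime_ratio):
    """Throughput with HT on divided by throughput with HT off.

    wu_ht_on / wu_ht_off: WUs running at once in each mode.
    runtime_ratio: per-WU run time with HT on divided by run time with HT off.
    """
    return (wu_ht_on / wu_ht_off) / runtime_ratio


# Rounded figures taken from the post; real runs vary from WU to WU.
gain_28 = ht_gain(4, 2, 2.0)  # 2.8GHz Xeon: WUs take ~2x as long
gain_24 = ht_gain(4, 2, 1.6)  # 2.4GHz Xeon: WUs take ~60% longer

# Assumption: with HT off, throughput is proportional to clock speed.
ht_on_28 = 2.8 * gain_28
ht_on_24 = 2.4 * gain_24

print(f"HT gain, 2.8GHz box: {gain_28:.2f}x")
print(f"HT gain, 2.4GHz box: {gain_24:.2f}x")
print(f"2.4GHz vs 2.8GHz, both HT on: {ht_on_24 / ht_on_28:.2f}x")
```

With these inputs the 2.4GHz system gains about 25% from HT (the low end of the 25 - 30% reported) and ends up slightly ahead of the 2.8GHz system, which gains nothing.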
P4: to hyperthread or not to hyperthread
You may wish to review this thread, in which I posted my hyperthreading observations for reasonably current software. Others engaged in some comment. For other (mostly older) results you can search the forum archive.
previous hyperthreading thread
At a fairly late stage in akosf's S4 application development, one key change meant that subsequent versions actually ran slower when paired on HT machines than the same app on the same machines run non-HT. However, we are now on S5 and those versions are part of history, not present.
Other than that case, we've not had consistent reports of failure to improve in HT, so far as I recall. I think reported throughput improvement for Einstein has clustered pretty near 20%.
Certainly not all HT machines are created equal, but your 2.8 GHz Xeon would be expected to improve. Possibly you had a non-matched sample of WUs, or possibly there was non-comparable, non-normal system activity from other sources in the two cases. HT reported times are particularly easy to influence (i.e. falsify) by the behavior of other apps. I've seen both cases: the other app can make the reported time either higher or lower than the norm.
If Einstein productivity is your key issue, I'd turn on HT and leave it on. I suspect a more careful or better controlled experiment will confirm this on your own rig if you feel the need.
RE: with HT off, relative
Not every P4 is the same P4.
There are different cores.
Your P4 @ 2.4 GHz is a Northwood and the P4 @ 2.8 GHz is a Prescott core!
In spite of the fact that the Prescott is newer, the Northwood is more efficient. That's why the Northwood is faster.
RE: RE: with HT off,
Running a P4 Prescott 2.8GHz and found that the WUs take about 60% longer but produce twice as many results. Net result is an improvement with HT on.
!!Stupidity should be PAINFUL!!
There is one other issue with hyperthreading and BOINC that can be quite annoying. The OS seems to schedule priorities per CPU. So even if BOINC has a low priority, another single-threaded app with normal priority will not cause BOINC to release both CPUs. BOINC keeps one and lets the other app have the other one. This can significantly impact the performance of your applications, since the other logical processor is not completely independent of the first one.
Hope that makes sense, it's what I appear to be seeing on my HT machines.
RE: The OS seems to
Yes, that is an OS issue, not a BOINC issue. It is the obvious consequence of dealing with HT by having the OS treat the chip as two independent CPUs. That allowed reuse of lots of existing code, but it slips where it fails to represent reality.
Your priority application gets _all_ of the one (of two virtual) CPUs it is running on. If that is not enough for you, don't run HT.
RE: RE: The OS seems to
The Linux OS scheduler has special programming to deal sensibly with the Intel HT shenanigans.
Give one of the recent Linux distros a try to see what performance you get?
Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
RE: Give one of the recent
I'm running BOINC on a dual-Xeon (Prestonia) machine under Debian Sarge.
What I observe is that when there's a normal-priority, memory-intensive app running on one half of a processor, the other half of that physical processor is mostly idle (like 95% idle) even though there are a couple of BOINC science apps waiting for CPU attention.
Which, IMHO, shows that the Linux CPU scheduler works like a charm.
Metod ...
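To see which logical CPUs are the two "halves" of one physical processor, a rough sketch along these lines may help. It assumes a Linux kernel that exposes `thread_siblings_list` under `/sys/devices/system/cpu/cpuN/topology/`; that is the case on reasonably recent kernels but may not be on kernels from the Sarge era. The helper names `parse_cpu_list` and `sibling_groups` are invented for this example.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,4' or '0-1,8-9' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct groups of logical CPUs that share one core."""
    groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    # Each printed tuple is one physical core and its HT siblings.
    for group in sibling_groups():
        print(group)
```

A group with two entries is a hyperthreaded pair; comparing those IDs with per-CPU idle figures from a tool such as top shows whether the idle logical CPU really is the sibling of the busy one.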
RE: RE: Give one of the
And that also shows that your system is memory bandwidth limited! :-(
But yes, good to see that the priorities work as they should.
Happy crunchin',
Martin
RE: RE: RE: Give one of
I'm sorely aware of this fact, and for new systems I currently try to buy AMD. Those don't show the same phenomenon. A quick test of an Intel Core2 Duo system also showed much improvement over the Xeons in this area.
Metod ...