So I see S41.06 as much faster than the unfortunate S40.12 on this machine, but still not so fast as the S40.04 it currently runs. I'll try S41.06 on my other machines, and if it looks more promising on them, retry on this machine with another work unit and more carefully controlled conditions.
RE: Continuing my practice
Same observation here. S40.04 is still the fastest cruncher on my Prescott 3.0 GHz running HT'ed (so far ;-) ).
As said earlier in this thread, I wonder where the penalty on these P4s with large caches running HT'd lies?
so far 1 case of 0 credit
So far, 1 case of 0 credit: Intel Xeon/S41.06 versus Apple/4.37 and AthlonXP/4.37.
The others (a lot) on the Xeons and AthlonMP/XPs are either pending or validated without a hitch.
I'll post some times later, when I get home.
B52,
the hyperthreading penalty on P4/Xeon seems related to the small L1 data and instruction caches on these chips, as DanNeely posted earlier.
RE: B52, Hyperthreading
Thx m8, that post must have slipped through my reading.
RE: Doesn't setting the
No, this will set BOINC to run only one project at a time.
RE: B52, Hyperthreading
And the high latency of the L2 cache, plus the slow FSB that has to feed the cores and carry all the memory reads and writes.
RE: RE: Doesn't setting
My mistake, it's under general preferences, so it should be obvious that it has nothing to do with how projects are being run.
Edited for typos.
RE: Posted 42 days ago by
Posted 42 days ago by Zap:
"AMD64 XP 3000, Newcastle core, 10% overclock. Went from 14k-plus secs with the original app, through an average of somewhat less than 6000 with A36, to now my first result with S38 in 4235 secs. Quite impressive, Akosf."

Now with S41.06 it's about 2510 secs (average of 3 z1 results). That's 5.6 times faster!!
This is so very impressive. No one here is ever gonna forget Akosf, I guess.
Validation will take some time because I'm teamed up with 30k-plus-sec crunchers.
RE: RE: B52, Hyperthreading
Thx for the answers, guys.
Please correct me if I'm wrong on this one:
the only thing, then, that will give an HT'ed P4 Prescott a boost is SSE2- or SSE3-optimized code? Or will even that not increase the speed?
Cheers
RE: Thx for the answers
Only to a very limited extent, in my opinion. The only thing that, to the best of my knowledge, would make a big impact is a decrease in the "important" dataset, as was done for S39L. But even that dataset (~11KB) doesn't fit in the 8KB L1 data cache of my Prestonias (Northwood-based Xeons), so with HT enabled there are two threads, both "wanting" 11KB of L1 at the same time, while the CPU only has 8KB to offer.
This means cache misses, flushes, reloads and fetches from the L2 cache, or even from main system RAM (worst case), all of which adds latency to the memory handling alone.
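To make the working-set point concrete, here is a minimal sketch in C (not Einstein@Home code; the 8KB and 11KB sizes come from the post above, everything else is illustrative) that strides cache-line by cache-line through two buffers of those sizes and times the accesses. On a chip with an 8KB L1 data cache, the 11KB walk should come out measurably slower:

/* Minimal sketch: time strided reads over a working set that fits an
 * 8 KB L1 data cache versus one (~11 KB) that spills out of it.
 * Build with e.g.: gcc -O2 l1demo.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double walk(volatile unsigned char *buf, size_t size, long iters)
{
    clock_t t0 = clock();
    unsigned long sum = 0;
    size_t off = 0;
    for (long i = 0; i < iters; i++) {
        sum += buf[off];
        off += 64;            /* advance one 64-byte cache line per step */
        if (off >= size)
            off -= size;      /* wrap around inside the working set */
    }
    (void)sum;                /* keep the loop from being optimized away */
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    const long iters = 200 * 1000 * 1000;
    const size_t sizes[] = { 8 * 1024, 11 * 1024 };  /* fits L1 vs. spills */
    for (int s = 0; s < 2; s++) {
        unsigned char *buf = calloc(sizes[s], 1);
        if (!buf) return 1;
        printf("%2zu KB working set: %.2f s\n",
               (size_t)(sizes[s] / 1024), walk(buf, sizes[s], iters));
        free(buf);
    }
    return 0;
}

Running two copies of this at once on an HT-enabled P4 exaggerates the gap, since both logical CPUs then compete for the same 8KB of L1, which is exactly the situation described above.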
Another issue with HT is that the two Einstein threads are basically doing the same type of work, both claiming resources of a similar nature from a CPU that only has so many ALU and FPU execution units available.
Under ideal circumstances for HyperThreading, you would run two different threads, one claiming ALU and the other FPU execution units, with their combined datasets fitting together in L1 and/or L2 cache.
From my own "experiments" with HyperThreading, I have found combinations like running SETI + SIMAP or SETI + Distributed.net RC5 at the same time to make the most of my Xeons, resulting in crunch times for both projects that were very, very close to the times I got with HT disabled on those systems.
(hope that made sense)
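For what it's worth, the pairing idea sketches out in a few lines of Linux C: pin one integer-heavy worker and one floating-point-heavy worker onto the two logical CPUs of the same physical core. The CPU numbers, loop bodies, and iteration counts are all assumptions for illustration, not anyone's actual project code:

/* Sketch of the "mix ALU and FPU work" idea: pin an integer-heavy worker
 * and an FP-heavy worker to the two logical CPUs of one physical core.
 * CPUs 0 and 1 being HT siblings is an assumption; check
 * /sys/devices/system/cpu/cpu0/topology/thread_siblings_list on your box.
 * Build with: gcc -O2 -pthread htmix.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static volatile unsigned long int_sink;  /* keep the compiler honest */
static volatile double fp_sink;

static void pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

static void *alu_worker(void *arg)   /* stand-in for e.g. RC5 key crunching */
{
    (void)arg;
    pin_to_cpu(0);
    unsigned long x = 1;
    for (long i = 0; i < 500000000L; i++)
        x = x * 2654435761UL + 1;    /* integer multiply-add chain */
    int_sink = x;
    return NULL;
}

static void *fpu_worker(void *arg)   /* stand-in for an E@H-style FP loop */
{
    (void)arg;
    pin_to_cpu(1);
    double x = 1.0;
    for (long i = 0; i < 500000000L; i++)
        x = x * 1.0000001 + 1e-9;    /* floating-point multiply-add chain */
    fp_sink = x;
    return NULL;
}

int main(void)
{
    pthread_t a, f;
    pthread_create(&a, NULL, alu_worker, NULL);
    pthread_create(&f, NULL, fpu_worker, NULL);
    pthread_join(a, NULL);
    pthread_join(f, NULL);
    puts("both workers finished");
    return 0;
}

Timing this mixed pair against two copies of the same worker gives a rough feel for how much contention for the shared execution units costs.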
RE: Another issue with HT
That makes a lot of sense. Mixing an adequate choice of DC applications on an HT system can improve the overall performance a _lot_.
Example: take my Northwood P4 2.6 GHz.
Calibrate S41.06 without HT to "100%E@H" performance.
Calibrate a GIMPS (Prime95) trial-factoring workload (up to 63 bit) without HT to "100%G" performance (enabling HT does not improve it).
Running two HT instances of S41.06 decreases performance to 75%E@H combined, so disabling HT for E@H is the usual choice for maximum throughput.
Running one GIMPS process and S41.06 hyperthreaded together yields 75%E@H _and_ 64%G throughput.
So if you have two identical (for the sake of this argument) machines and run both clients on both machines, you get 150%E@H throughput and 128%G throughput. That's a combined throughput of 278% across both projects, compared to the 200%E@H or 200%G you get running only one project.
If you are only interested in doing E@H, you need to find another user who is only interested in GIMPS-TF work. Then you can team up (each user running both workloads) and both parties benefit ;)
Tau
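Tau's bookkeeping is easy to check; here is a tiny C sketch (the 0.75 and 0.64 figures are taken from the post, the variable names are made up) that reproduces the 278%-versus-200% comparison:

/* Reproduces the throughput arithmetic from Tau's post: two identical
 * machines, each running one E@H and one GIMPS-TF process hyperthreaded. */
#include <stdio.h>

int main(void)
{
    const double eah_mixed   = 0.75;  /* per-machine E@H throughput, mixed mode */
    const double gimps_mixed = 0.64;  /* per-machine GIMPS-TF throughput, mixed */
    const int machines = 2;

    double eah_total   = machines * eah_mixed   * 100.0;  /* 150% E@H */
    double gimps_total = machines * gimps_mixed * 100.0;  /* 128% G   */

    printf("E@H: %.0f%%  GIMPS: %.0f%%  combined: %.0f%% (vs. 200%% single-project)\n",
           eah_total, gimps_total, eah_total + gimps_total);  /* 278% */
    return 0;
}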