Wow, those results are really consistent in time. Even my dedicated crunching box, a 1.5 GHz Athlon, has noise in the ±5 minute range.
It is my impression from observation on my own boxes that some sections of some datafiles are more consistent in required CPU from result to result than others. Even my Win98SE machines, with their notoriously inaccurate CPU time reporting, sometimes get a string of remarkably close CPU times when I'm not using them myself at all, and they get a lucky string of results.
And, yes, Ziegenmelker's latest report is again quite persuasive for an S40.12 time improvement on that specific machine. It starts to appear that S40.12 and the more modern big-cache hyper-threaded Intel P4s (both Gallatin and Prescott) are not a good match for some reason.
I did not do a controlled rerun, but the first five results returned running S40.12 on my Gallatin (Northwood-descended P4 EE 8k L1, 512k L2, 2M L3 cache) are definitely slower than most recent results from the same two major datafiles.
During the outage, I did do a controlled run of a short Einstein result (r1_0265.5_2133_S4R2a_0) on my Gallatin (Northwood-descended 2M L3 cache--the first P4 EE).
It ran hyperthreaded, with the other BOINC job also an Einstein job:
S40.12 2345 seconds
S40.04 1933 seconds
So S40.12 took 1.21 times the execution time! A pretty high price.
(Though I did not rerun it, I had previously measured 1954 seconds for this result using S-40.)
To recapitulate, my two Pentium IIIs showed a slight improvement with S40.12 compared to S40.04, my Banias Pentium M a slight degradation, and my Gallatin Pentium 4 a large degradation. While it may be that the amount of degradation depends highly on the Work Unit, it seems more likely to me that it depends highly on the processor architecture, and perhaps memory speed. (My Gallatin is served by slower FSB memory than most; it would be good to hear from other Gallatin owners.) I've reverted the Banias and Gallatin machines to S40.04.
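For anyone checking the Gallatin figure, the 1.21 is simply the ratio of the two timings quoted above. A minimal Python sketch, using only those two numbers (the variable names are mine, not from any BOINC tool):

```python
# CPU times in seconds for r1_0265.5_2133_S4R2a_0 on the Gallatin, as quoted above.
t_s4012 = 2345
t_s4004 = 1933

ratio = t_s4012 / t_s4004               # how many times longer S40.12 took
slowdown_pct = (ratio - 1.0) * 100.0    # the same comparison as a percentage

print(f"S40.12 / S40.04 = {ratio:.2f}")
print(f"slowdown = {slowdown_pct:.1f}%")
```

This is a single controlled pair, so it says nothing about run-to-run noise on that machine.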
My Athlon 64 3200+ is faster with S40.12 than with S40.03.
In short:
4044 sec to 4047 sec with S-39L and WUs r1_1220
3706 sec to 3877 sec with S40.03 and WUs z1_1174
now 3503 sec to 3521 sec with S40.12 and WUs z1_1174
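One rough way to read those ranges is to check whether the S40.03 and S40.12 ranges overlap at all, then compare their midpoints. The sketch below does that in Python. Taking the midpoint as a stand-in for a typical time is my assumption, and the S-39L range was measured on a different datafile, so it is listed but left out of the comparison.

```python
# (low, high) CPU-time ranges in seconds, as listed above.
ranges = {
    "S-39L":  (4044, 4047),   # measured on r1_1220 WUs
    "S40.03": (3706, 3877),   # measured on z1_1174 WUs
    "S40.12": (3503, 3521),   # measured on z1_1174 WUs
}

def midpoint(r):
    """Middle of a (low, high) range; an assumed proxy for the typical time."""
    low, high = r
    return (low + high) / 2

old, new = ranges["S40.03"], ranges["S40.12"]

# The ranges do not overlap if the slowest S40.12 time beats the fastest S40.03 time.
no_overlap = new[1] < old[0]
gain_pct = (1 - midpoint(new) / midpoint(old)) * 100

print(f"ranges separate: {no_overlap}, midpoint gain: {gain_pct:.1f}%")
```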
Could it be that although the double-size lookup table comes with a speed penalty, the reduced cache load more than outweighs this on some CPUs?
I was thinking the same thing, but it seems to also be dependent on the chip design. My AMD 3500+ has sped up nicely, as have a number of related chips it seems.
On my XP 1800+, the time dropped from 92 min (S40.04) to 71 min (S40.12), a very great improvement.
My A64 X2 has done enough work that I'm seeing a ~10-20% speedup: 10% on the big WUs, 20% on the short ones.
My system also seems to crunch S40.12 faster than S40.04
S40.04 ran about 4100 sec.
S40.12 is about 3900 sec.
The CPU is an Athlon XP 2200+.
I've had 2 out of 4 WUs error out, though.
The CPU is OC'ed a bit, but only via the multiplier, so there is no memory OC.
It hasn't been a problem with any BOINC project before.
Is S40.12 more stressful than S40.04?
- Knorr
I've now got a couple of sequences, which both show good improvements for S40.12 over S40.04 (I'll leave you to work out the percentages...).
Both machines are stock Dell motherboards, PowerEdge / Dimension respectively.
W2K Server 475735 - P4 Northwood SSE2, 1.8 GHz, 512KB L2 cache
r1_1255.0__344_S4R2a_2 --- 7602 --- S40.04
r1_1255.0__343_S4R2a_2 --- 7592 --- S40.04
r1_1255.0__342_S4R2a_2 --- 7592 --- S40.04
r1_1255.0__341_S4R2a_2 --- 7596 --- S40.04
r1_1255.0__278_S4R2a_1 --- 7149 --- S40.04
r1_1255.0__266_S4R2a_2 --- 7136 --- S40.04
r1_1255.0__247_S4R2a_2 --- 7137 --- S40.04
r1_1255.0__228_S4R2a_2 --- 6796 --- S40.04/12 mixed
r1_1390.0__1417_S4R2a_3 --- 6563 --- S40.12
r1_1390.0__1401_S4R2a_1 --- 6570 --- S40.12
r1_1390.0__1397_S4R2a_0 --- 6568 --- S40.12
r1_1390.0__1393_S4R2a_1 --- 6569 --- S40.12
r1_1390.0__1389_S4R2a_1 --- 6570 --- S40.12
r1_1390.0__1386_S4R2a_3 --- 6567 --- S40.12
r1_1390.0__1381_S4R2a_1 --- 6570 --- S40.12
r1_1390.0__1378_S4R2a_2 --- 6568 --- S40.12
r1_1390.0__1377_S4R2a_2 --- 6568 --- S40.12
XP SP2 Workstation 475717 - P4 Northwood SSE2, 2.0 GHz, 512KB L2 cache
r1_1200.0__394_S4R2a_1 --- 6549 --- S40.04
r1_1200.0__393_S4R2a_1 --- 6738 --- S40.04
r1_1200.0__392_S4R2a_1 --- 6750 --- S40.04
r1_1200.0__391_S4R2a_0 --- 6614 --- S40.04
r1_1200.0__390_S4R2a_0 --- 6637 --- S40.04
r1_1200.0__389_S4R2a_0 --- 6649 --- S40.04
r1_1200.0__388_S4R2a_0 --- 6610 --- S40.04
r1_1200.0__387_S4R2a_0 --- 6671 --- S40.04/12 mixed
z1_0874.0__23_S4R2a_2 --- 5566 --- S40.12
z1_0874.0__20_S4R2a_1 --- 5466 --- S40.12
z1_0874.0__17_S4R2a_2 --- 5488 --- S40.12
r1_1200.0__324_S4R2a_1 --- 6129 --- S40.12
z1_0874.0__11_S4R2a_0 --- 5468 --- S40.12
z1_0874.0__9_S4R2a_0 --- 5530 --- S40.12
z1_0874.0__8_S4R2a_0 --- 5445 --- S40.12
r1_1200.0__284_S4R2a_0 --- 5762 --- S40.12
r1_1200.0__277_S4R2a_1 --- 5764 --- S40.12
So many thanks and congratulations to Akosf, yet again - keep up the good work!
(P.S. Ignore the timings shown on the results pages - I'm using Trux's tx36 calibrating client, and it 'tweaks' the timings.)
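For anyone who does want the percentages worked out, here is a rough Python sketch. It averages the timings listed above per application version and reports the drop in mean time. The figures are copied from the two lists; treating them as seconds and skipping the "mixed" results are my assumptions. The S40.12 results partly come from different datafiles than the S40.04 ones, so this is not a like-for-like comparison.

```python
from statistics import mean

# Timings copied from the two lists above (assumed to be CPU seconds).
# The "S40.04/12 mixed" results are deliberately left out.
timings = {
    "W2K Server 475735": {
        "S40.04": [7602, 7592, 7592, 7596, 7149, 7136, 7137],
        "S40.12": [6563, 6570, 6568, 6569, 6570, 6567, 6570, 6568, 6568],
    },
    "XP SP2 Workstation 475717": {
        "S40.04": [6549, 6738, 6750, 6614, 6637, 6649, 6610],
        "S40.12": [5566, 5466, 5488, 6129, 5468, 5530, 5445, 5762, 5764],
    },
}

def reduction_pct(old, new):
    """Percentage drop in mean time going from `old` to `new`."""
    return (1.0 - mean(new) / mean(old)) * 100.0

results = {
    host: reduction_pct(t["S40.04"], t["S40.12"])
    for host, t in timings.items()
}

for host, pct in results.items():
    print(f"{host}: {pct:.1f}% less time with S40.12")
```

Restricting the second host to its r1_1200 results only would give a stricter comparison, at the cost of just three S40.12 samples.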
It doesn't seem to work properly on my system.
The CPU is overclocked a bit. Only by multiplier so no tweak of FSB/memory.
Out of a batch of 7 WUs, these 3 errored:
26295992
26344584
26445945
1 turned out invalid:
26295998
And the last 3 are currently pending:
26416342
26416347
26423016
The speed is great though... ;)
But I'm gonna switch to D40 for a while.