Could somebody try out this version of S41.07 on an HT-enabled system?
Yes, but why should the HT be used? At least for Prescott 3.4 GHz - if you turn the HT off, you will be significantly faster with S40.07 than with HT_enabled_S39L.
akos, any specifics about what you are looking for?
Run errors?
Einstein-with-Einstein run time?
I'm currently running one Einstein with one SETI (regular).
http://einsteinathome.org/host/479518/tasks
Process time. I hope this version will be faster on HT systems.
OK, here are some. These results are from r1_0253.5 (280 down to 271, without 279). They are short WUs, estimated at around 35 min (I didn't take note). They range from 35:51 to 36:59 with S40.04 (5 WUs), which I have found to be the fastest version so far for my 3.0 GHz Prescott with HT. Two mixed results, crunched partly with S40.04 and partly with S41.07HT: 37:54 (about 80% S40.04) and 39:03 (about 15 to 30% S40.04; again, I didn't take note). The last two results, 100% S41.07HT: 36:13 and 35:21.
Thanks. I think these times are really good, so I don't understand how S39L was faster on some HT systems.
I checked the top 200 computers to see if it shed any light on the question of the fastest optimized application. All of those listed below are Pentium 4s, and presumably "2cpu" means hyperthreading is on.
#101 rac 975 3.6 GHz 2cpu S39L
#131 rac 898 3.2 GHz 1cpu S41.07
#149 rac 849 3.4 GHz 1cpu S41.07
#153 rac 845 3.0 GHz 1cpu S41.07
#157 rac 841 2.8 GHz 1cpu S41.07
#159 rac 837 3.0 GHz 2cpu S39L
#167 rac 825 3.0 GHz 2cpu S39L
#173 rac 816 2.8 GHz 2cpu S39L
#181 rac 808 3.0 GHz 2cpu S39L
#183 rac 808 3.06 GHz 2cpu S40.04
#189 rac 797 3.0 GHz 2cpu S39L
#194 rac 792 3.0 GHz 2cpu U41.04
#196 rac 788 3.2 GHz 2cpu S40.04
#200 rac 777 3.4 GHz 2cpu S40
Some PIIIs are in the top 200 list as well.
My fastest Thoroughbred 1800+ now averages 44 min/result using S41.06, which is quite a good speed for a single-CPU box :-)
I see one MP2600+ (Thoroughbred core too) in the Top100 using D41.12, but its results are a little slower than those of my (not overclocked) MP2600+ with S41.06.
Edit: I don't think that RAC is good for comparison. You never know how many projects a box runs, how many hours per day it runs, or what else the user does with it. It also takes time for the RAC to adjust to the new result speed.
I happened to have spent my afternoon on controlled trials of the HT advantage/disadvantage question when I saw your request, so I added your S41.07HT case to the trial.
Summary result
S41.07HT is a dramatic improvement over S41.07 for my Gallatin in HT mode, reducing the test unit CPU time from 2367 seconds to 1948 seconds. However, the non-HT time increased greatly, from 870 to 1145 seconds. So while this version restores the hyperthreading benefit to its former value, unfortunately that is due more to the slowing of the non-HT case than to the speeding of the HT case. S41.07HT matches S40.04 on both HT and nHT CPU time to within the likely error of my measurements.
Details
As in my previous postings, all results were taken on my Gallatin (P4 EE 3.2 GHz, Northwood-descended with 2 MB of on-chip L3 cache added, WinXP Pro SP2). The test work unit was the same, r1_0265.5_2113_S4R2a_0. CPU times are in seconds.[pre]
Version    HT     nHT    HT/nHT productivity ratio
dist       7584   4630   1.22
S40        1954   1159   1.19
S40.04     1928   1141   1.18
S41.07     2367    870   0.73
S41.07HT   1948   1145   1.18[/pre]
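For anyone who wants to redo the last column with their own timings, here is a minimal Python sketch. The formula is my assumption, not something stated in the post: with HT on, two work units finish in the HT time, so the ratio is taken as 2 × nHT / HT. That reproduces the posted values to within about 0.01. The function name is made up for illustration.

```python
# CPU times in seconds copied from the table above: version -> (HT, nHT).
times = {
    "dist": (7584, 4630),
    "S40": (1954, 1159),
    "S40.04": (1928, 1141),
    "S41.07": (2367, 870),
    "S41.07HT": (1948, 1145),
}


def ht_productivity_ratio(ht_seconds, nht_seconds):
    """Assumed definition: with HT on, two work units complete in ht_seconds;
    with HT off, one completes in nht_seconds. Ratio = 2 * nHT / HT."""
    return 2 * nht_seconds / ht_seconds


ratios = {name: ht_productivity_ratio(ht, nht) for name, (ht, nht) in times.items()}

for name, ratio in ratios.items():
    verdict = "HT helps" if ratio > 1 else "HT hurts"
    print(f"{name:9s} {ratio:.3f}  {verdict}")
```

A ratio above 1 means the box gets more work done per day with HT on; below 1 means it would do better with HT off.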
Comments
A handful of cases where I've rerun these test cases suggest the timing repeatability is on the order of 1%.
My Gallatin has a slower FSB motherboard than is probably common (133 MHz FSB instead of 200).
Edit: after the original post I ran the new S41.07HT code on the non-hyperthreaded Gallatin, and revised my text according to the surprising result.
The figures posted by archae86 indicate that S41.07 can crank out a short WU in 870 seconds non-HT on the same machine on which S40.04 takes 1928 seconds in HT mode (for two WUs at once). Since 870 is less than half of 1928, theoretically it can crank out more WUs per day using S41.07 in non-HT mode. It isn't a huge difference: 1928 / 2 = 964 and 964 - 870 = 94 seconds saved per WU. But with these crunch rates, we rapidly exceed Einstein's quota of 32 results per day per box, so we have to run other projects to keep our boxes busy. Those projects enjoy a significant HT advantage that more than offsets HT's slight disadvantage on Einstein.
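The arithmetic above can be checked with a short Python sketch. It assumes the box crunches nothing but short WUs like the test unit for a full 24 hours, and that the HT time covers two WUs running side by side; the variable names are only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_QUOTA = 32  # results per day per box, as quoted in the post

ht_pair_seconds = 1928    # S40.04 with HT on: two WUs run side by side
nht_single_seconds = 870  # S41.07 with HT off: one WU at a time

ht_per_wu = ht_pair_seconds / 2                # effective seconds per WU with HT
saved_per_wu = ht_per_wu - nht_single_seconds  # seconds saved per WU without HT

# Assumption: nothing but short WUs like the test unit, around the clock.
wus_per_day_ht = SECONDS_PER_DAY / ht_per_wu
wus_per_day_nht = SECONDS_PER_DAY / nht_single_seconds

# How long the daily quota lasts at the non-HT rate.
hours_to_quota_nht = DAILY_QUOTA * nht_single_seconds / 3600

print(f"effective HT time per WU: {ht_per_wu:.0f} s")
print(f"saved per WU without HT:  {saved_per_wu:.0f} s")
print(f"WUs per day, HT on:       {wus_per_day_ht:.1f}")
print(f"WUs per day, HT off:      {wus_per_day_nht:.1f}")
print(f"quota used up after:      {hours_to_quota_nht:.1f} h (HT off)")
```

Under those assumptions both rates are well above the quota of 32, which is why the quota rather than the crunch speed becomes the limit.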
Archae - thanks for your work. It's very useful and is consistent with my recent experience with a P4 3.4 GHz Prescott (although I haven't tried all the possibilities you have).
Well, 94 seconds out of 964 represents roughly a 10% improvement. This means that with HT I would waste about 8600 seconds of CPU time each day. That's not negligible for me.
What about detaching and re-attaching the project? The daily quota should be reset too. If you set the cache to, e.g., 10 days, you would need to re-attach maybe once per week. It is not convenient, but it is better than running dry of WUs, isn't it?
Moreover, we spend a lot of time dealing with optimizations, so a few additional seconds spent re-attaching are negligible...
Or am I missing something?
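For reference, the 10% and 8600-second figures can be reproduced with a few lines of Python. This is only a sketch that assumes the box crunches Einstein around the clock; the unrounded fraction comes out slightly under 10%, so the daily figure lands somewhat below the rounded estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60

saved_per_wu = 94  # seconds saved per WU without HT (964 - 870)
ht_per_wu = 964    # effective seconds per WU with HT (1928 / 2)

# Fraction of each WU's CPU time that HT costs on this machine.
saved_fraction = saved_per_wu / ht_per_wu

# Assumption: the box crunches Einstein for the full 24 hours.
wasted_exact = saved_fraction * SECONDS_PER_DAY
wasted_rounded = 0.10 * SECONDS_PER_DAY  # using the rounded 10% figure

print(f"fraction saved:      {saved_fraction:.1%}")
print(f"wasted per day:      {wasted_exact:.0f} s")
print(f"with 10% rounded up: {wasted_rounded:.0f} s")
```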