> 3.2 GHz P4 HT, 2GB DDR2 SDRAM @ 533MHz, Win XP SP2
>
> CPU's: 2
> Whet: 1377
> Dhry: 2613
>
> My times are skewed because I typically play Battle for Middle Earth while
> SETI and Einstein are running (database work keeping me from activating my
> transferred account for SETI). SETI has been finishing under 4 hours on this
> comp. With my interruptions, Einstein is finishing in about 10-12 hours.
>
> I'm no expert so I must ask: how does it stack up and could it do better?
>
I have a similar setup but only half the RAM. Times are fairly consistent though, so RAM seems to have only a minor effect on processing power. Or at least 1GB is no constraint to processing speed.
team.
Catch your own wave...
> The Athlon numeric co-processor is twice as fast as the P4 co-processor, clock
> cycle for clock cycle. Also, the higher speed P4s will clock throttle under
> 100% CPU load unless they have very good cooling.
>
I'm not so sure about that, Evan. I've optimized the seti@home client, so I have some knowledge here.
It seems to me that the design of the Intel FPU makes it practically impossible to write fully optimal code in C, and only partly possible even in assembly language.
However, as has been shown, Intel gets more performance with Hyperthreading (i.e. it can output more work units in a 24-hour period). It is still only one CPU core, but pretending to be two. The reason is all of the various rules the Intel FPU imposes to keep it chugging along.
Two problems with Intel are latency and throughput.
Example:
Throughput - You can't execute a floating point addition more often than every other CPU cycle. Same for a multiply, or most other operations.
Latency - A Pentium 4 floating point multiply will finish computing 5 cycles after it begins. If the next instruction in the program tries to use this multiply result, it has to twiddle its thumbs for 4 cycles before it can act on the result.
This causes problems because many code sequences involve maybe 5 multiplies followed by 4 additions on the same set of numbers. This isn't so easy to interleave (multiply | add | multiply | add | etc.).
In truth, if everything could be sequenced correctly, the FPU in an Intel can begin one FPU instruction and complete another every CPU cycle. Even with the pipelining of AMD and its multiple execution units, I think one completion every cycle isn't possible there.
Hyperthreading comes closer to achieving this on Intel because the CPU scheduler has two different program threads to work with, so it can find those alternating instructions and work around the latency-delayed ones. While thread #1 is waiting for the result of a multiply, thread #2 can begin an addition or two.
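To put rough numbers on the latency argument, here is a toy cycle-count model in Python. It is only a sketch: it assumes a single in-order FPU, takes the 5-cycle latency and the "every other cycle" issue limit from the figures quoted above, and treats each op as depending on the previous op with the same label. The function name `fpu_cycles` and the chain labels are made up for illustration, and a real P4 schedules instructions in far more complicated ways.

```python
def fpu_cycles(ops, latency=5, issue_interval=2):
    """Count cycles for a toy in-order FPU.

    ops is a sequence of chain labels. Each op depends on the result of the
    previous op carrying the same label (a dependent chain). latency and
    issue_interval default to the figures quoted in the post; they are
    assumptions of this model, not measured values.
    """
    ready = {}        # chain label -> cycle at which its latest result is available
    next_issue = 0    # earliest cycle at which the FPU accepts another op
    finish = 0        # cycle at which the last result becomes available
    for chain in ops:
        # An op must wait both for a free issue slot and for its input.
        start = max(next_issue, ready.get(chain, 0))
        ready[chain] = start + latency
        next_issue = start + issue_interval
        finish = max(finish, ready[chain])
    return finish


one_thread = ["a"] * 8                # 8 dependent ops from a single thread
back_to_back = ["a"] * 8 + ["b"] * 8  # two threads run one after the other
interleaved = ["a", "b"] * 8          # two threads interleaved op by op

print(fpu_cycles(one_thread))    # 40
print(fpu_cycles(back_to_back))  # 77
print(fpu_cycles(interleaved))   # 42
```

In this model the interleaved case finishes 16 ops in about the time one thread needs for 8, because the second chain fills the cycles the first one spends waiting on its own results. That is the effect being attributed to Hyperthreading here.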
What you say seems to support my contention, and the real-world benchmarks support it as well. I haven't programmed the P4 in assembler, so I will take your word for it. The P4 does have a very long pipeline, and if the branch prediction fails the flush takes a long time. Of course, that is where hand optimization comes in.
The clock throttling problem is very real. I sell computers for a living and I see machines from various name brand manufacturers that have suboptimal cooling. When pushed it is amazing how slow a P4 can be. It will cut the clock rate in half, at least, in order to maintain safe temperatures.
When I started in 1999 I got a Proteva with the PIII 500 when it first came out, and started running SETI classic in 2000. After a year of the slow times (14 to 18 hrs) I bought a Systemax with an AMD 1700+ XP with 512.
Then for some reason, when the wife made me go store shopping, I picked up an HP with XP and a 2.5 P4 w/512 RAM.
It ran SETI classic at an average of 4:20 and Einstein at an average of 9.5 hrs.
Then last year about this time I got a Systemax with XP Pro, the AMD 64 3200+, 1 GB RAM and a 200 GB HD.
It has a SETI classic average of 2:20 and an Einstein average of 5:45.
AMD is the one for me.
And XP Pro (Bill Gates is my neighbor : )
-Samson Ben Yoseph-
JAHMAGIC,
Just wondering, do you work at Fermilab?
such things just should not be writ so please destroy this if you wish to live 'tis better in ignorance to dwell than to go screaming into the abyss worse than hell
> > as I suspected... even the cheapest AMD, the sempron 2200+ I have is kicking
> > the butts off most Intel P4 systems.. now how funny is that ?
> >
> > Are the semprons so good or are the P4's so bad ?
> >
> > now that I oc'd the 2200+ to 1800mhz, my time is 6:34 hrs per WU for einstein
> > and 3.75 for S@H
> >
>
> I was always under the assumption that Semprons were worse than an Athlon XP.
> But my Athlon XP 2400+ takes almost 12 hours for a WU.
>
Something else is holding your system back. My Athlon XP 1900+ takes about 8 hours for an E@H WU and typically about 3 hr 45 min to 3 hr 50 min for a SETI WU.
Are you running Linux on it or something?
BTW, my CPU wasn't OCed, so is running at 1.6 GHz
Duron 1100/ASUS A7V333/128 RAM PC2700 SAMSUNG
E@H WU - little less than 12 hours.
SETI@home Classic - 5 hours 30 minutes per WU (average).
Athlon Mobile 2600+ @ 2400 MHz / Abit NF7-S rev 2
E@H = 5hrs 20min
[oops]
Ah, this might work:
[pre]
AppName      Min            Max            Avg               WU Count
mfoldB125    4059.515625    9951.40625     7238.0598221144   956
mfoldB120    4.2589850      15765.953125   7100.4262898567   382
setiathome   22.90625       30884.96875    13010.96382848    2641
einstein     28794.828125   51789.125      39240.910247093   43
sixtrack     2.8125         63085.390625   2671.8234310456   3115
hadsm3       1793074.375    2325737.25     2041320.1964286   14
[/pre]
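For comparing the table with the hours-and-minutes times quoted elsewhere in the thread, a small Python helper can do the conversion. The table does not state its units, so treating the values as seconds is an assumption; the helper name `to_hms` is just for illustration.

```python
def to_hms(seconds):
    """Format a duration in seconds as h:mm:ss, rounded to the nearest second."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Avg values copied from the table; reading them as seconds is an assumption.
averages = {
    "setiathome": 13010.96382848,
    "einstein": 39240.910247093,
}

for app, avg in averages.items():
    print(app, to_hms(avg))
```

If the values really are seconds, the Einstein average comes out just under 11 hours, which is in the same range as the 10-12 hours reported earlier in the thread.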
> I have no in-depth knowledge about windows/intel but could imagine a mechanism
> where at installation time code and libs are just better selected and matched
> to the specific arch
I don't believe this. The DLLs are the same for all x86 archs.
> than is done with "general" linux distros as most
> people run.
As on Windows, the general Linux distros ship generic 586/686-optimized packages and libs. On Gentoo, however, the system is compiled completely at installation time, so it can be, and generally is, tuned at least for the processor, usually with more optimization flags turned on.
regards
martin