> This still seems odd though, because if the code is well-written, it should
> perform roughly the same as it does on Windows, since both programs are
> basically spending most of their time doing floating-point calculations, which
> is a CPU problem, not a compiler problem.
Well, I did find some optimized BOINC clients at the following, via another thread:
http://boinc.us.tt/
http://www.pperry.f2s.com/downloads.htm
Using these new clients almost doubled all my benchmarks (notice that my P4 3.06 GHz HT matches very closely with my AMD Athlon 1.2 GHz):
*Intel P4 3.06 GHz HT
567.51 million ops/sec
1331.32 million ops/sec
1512.76 million ops/sec
2344.25 million ops/sec
*Intel P4 2.0 GHz
543.37 million ops/sec
1565.7 million ops/sec
1096.71 million ops/sec
2570 million ops/sec
*Intel P4 1.5 GHz
407.42 million ops/sec
1158.59 million ops/sec
829.67 million ops/sec
1915.24 million ops/sec
*Intel Celeron 2.70 GHz
705.88 million ops/sec
2246.1 million ops/sec
1551.89 million ops/sec
3232.92 million ops/sec
*AMD XP 2400+
1069 million ops/sec
2462.66 million ops/sec
2108.88 million ops/sec
4195.66 million ops/sec
*AMD XP 2000+
891.09 million ops/sec
2050.3 million ops/sec
1754.39 million ops/sec
3491.33 million ops/sec
*AMD Athlon 1.3 GHz
694.1 million ops/sec
1541.09 million ops/sec
1354.62 million ops/sec
2706.22 million ops/sec
*AMD Athlon 1.2 GHz
608.25 million ops/sec
1209.16 million ops/sec
1242.12 million ops/sec
2480.67 million ops/sec
The only thing is, I doubt the optimized BOINC client has much effect on the actual Einstein work, since that is a separate program. But it does suggest that an optimized Einstein client for Linux would, more than likely, greatly improve results.
such things just should not be writ so please destroy this if you wish to live 'tis better in ignorance to dwell than to go screaming into the abyss worse than hell
>
> This still seems odd though, because if the code is well-written, it should
> perform roughly the same as it does on Windows, since both programs are
> basically spending most of their time doing floating-point calculations, which
> is a CPU problem, not a compiler problem.
>
I checked the GCC compiler manual about this; there are several compiler flags available for floating-point calculations, and with no flags set you will lose speed for sure.
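As a rough illustration of how much the flags matter - the flags below are all real GCC options, but the little test program is just a made-up example, not Einstein code:

    /* fpdemo.c - a trivial FP-heavy loop for comparing compiler flags.
       Example builds (all real GCC options):
         gcc -O0 fpdemo.c -o fp_noopt -lm
         gcc -O2 fpdemo.c -o fp_default -lm
         gcc -O2 -march=pentium4 -mfpmath=sse -ffast-math fpdemo.c -o fp_tuned -lm
    */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double s = 0.0;
        long i;

        /* Every iteration does an int-to-double conversion, a sqrt and an add. */
        for (i = 0; i < 100000000L; i++)
            s += sqrt((double)i);

        /* Print the result so the compiler cannot delete the loop entirely. */
        printf("%f\n", s);
        return 0;
    }

Timing the three binaries (e.g. with the shell's time command) shows the gap directly.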
John,
> I checked the GCC compiler manual about this; there are several compiler flags
> available for floating-point calculations, and with no flags set you will lose speed for sure.
Well, when you compile with gcc, you would think it would optimize as much as it can for the 686 arch. I realize none of us run that arch anymore, and therefore using the athlon-xp or pentium4 arch would give better FP results.
The only thing is, for Windows to work on different archs, there is no way it can be optimized for an athlon-xp or pentium4 either.
So I wonder what is really going on.
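For reference, the choice being described is just the -march flag - these are real values from the GCC manual, applied to a hypothetical source file:

    /* One and the same source ("app.c"), three different targets:
         gcc -O2 -march=i686 app.c -o app_generic -lm
           (runs on any 686-class CPU, but uses no SSE/SSE2/3DNow!)
         gcc -O2 -march=athlon-xp app.c -o app_athlonxp -lm
         gcc -O2 -march=pentium4 -mfpmath=sse app.c -o app_p4 -lm
           (better scheduling and vector FP, but the binary can die with
            "illegal instruction" on CPUs older than the target)
    */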
such things just should not be writ so please destroy this if you wish to live 'tis better in ignorance to dwell than to go screaming into the abyss worse than hell
> The only thing is though,
)
> The only thing is, for Windows to work on different archs, there is no
> way it can be optimized for an athlon-xp or pentium4 either.
I have no in-depth knowledge of Windows/Intel, but I could imagine a mechanism whereby, at installation time, code and libraries are selected and matched to the specific arch better than is done with the "general" Linux distros most people run.
Furthermore, the tight relationship between Microsoft and Intel might have given them an advantage anyhow.
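For what it's worth, the general technique is real and easy to sketch: ship more than one version of a hot routine and pick one at startup based on what the CPU reports. Below is a minimal, hypothetical illustration in C; note that __builtin_cpu_supports is a modern GCC builtin used here purely for clarity - 2005-era code would read the CPUID instruction by hand - and the routine names are invented:

    #include <stdio.h>

    /* Two hypothetical builds of the same hot routine. In a real dispatching
       setup, sum_sse2 would be compiled with SSE2 code generation enabled. */
    static double sum_generic(const double *a, int n)
    {
        double s = 0.0;
        int i;
        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    static double sum_sse2(const double *a, int n)
    {
        /* Stand-in with identical logic; imagine an SSE2-vectorised body. */
        double s = 0.0;
        int i;
        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Function pointer chosen once, at startup, to match the actual CPU. */
    static double (*sum_impl)(const double *, int);

    int main(void)
    {
        double data[4] = { 1.0, 2.0, 3.0, 4.0 };

        __builtin_cpu_init();              /* must be called before the check */
        sum_impl = __builtin_cpu_supports("sse2") ? sum_sse2 : sum_generic;

        printf("sum = %f\n", sum_impl(data, 4));
        return 0;
    }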
Any specialists out there?
John,
Running at my default speed
)
Running at my default speed for a P4 2.8 (Prescott) with HT running,
I complete:
a SETI unit in about 3 hr 10 min
Einstein in about 12-13 hr
Protein Predictor in about 2 hr
a small LHC unit in about 1 hr 10 min (w/ full # of turns)
a large LHC unit in about 10-11 hr (w/ full # of turns)
and Climate, well, 4-5 weeks :Þ
With HT off, SETI takes about 1 hr 50 min, or 3 hr 40 min for 2 units.
I have not tried disabling HT and running the others.
> This still seems odd though, because if the code is well-written, it should
> perform roughly the same as it does on Windows, since both programs are
> basically spending most of their time doing floating-point calculations, which
> is a CPU problem, not a compiler problem.
Even the best-written code will turn into garbage if compiled with a poor compiler. Most of the calculations will be handled by code in the maths libraries, so optimising those would provide the greatest improvement.
Be lucky,
Neil
HT P4 3.2GHz, 1Gig RAM, WinXP - SP2, BOINC 4.24 :
-Whetstone 1359
-Dhrystone 1720
E@H: v4.79, 11 WU's, avg 12.3 hours
LHC: v4.64, 9 WU's, avg 1.5 hours
PP@H: v4.22, 74 WU's, avg 1.9 hours
S@H: v4.09, 31 WU's, avg 2.9 hours
Extra notes:
E@H avg time per WU with v4.72 was 11.8 hours
Also when comparing chips do not underestimate the advantage of HT.
Catch your own wave...
> Well, when you compile with gcc, you would think it would optimize as much as it
> can for the 686 arch. I realize none of us run that arch anymore, and therefore
> using the athlon-xp or pentium4 arch would give better FP results.
>
> The only thing is, for Windows to work on different archs, there is no
> way it can be optimized for an athlon-xp or pentium4 either.
What you are forgetting is that those are only sub-categories of the x86 architecture. GCC is designed to produce code for PowerPC, SPARC and many other architectures besides.
There are also trade-offs to consider. Sometimes, optimising for speed will produce larger executables - not always desirable. The default optimisations in GCC are a reasonable compromise to ensure stable executables of acceptable size and with acceptable speed in most situations. It is the programmer's prerogative to adjust those compromises for a given situation. A rich set of compiler flags is available for that purpose.
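To make that concrete - the flags are real, though the actual size and speed you get depend entirely on the code being compiled:

    /* The speed/size compromise on any C source (file names invented):
         gcc -Os app.c -o app_small              (optimise for size)
         gcc -O2 app.c -o app_normal             (the usual balanced level)
         gcc -O3 -funroll-loops app.c -o app_big (often fastest, often largest)
       Compare the resulting code sizes with binutils:
         size app_small app_normal app_big
    */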
However, as I touched upon previously, another factor is the optimisation of the library routines with which the executable is linked. Linking can be done either statically or dynamically.
With dynamic linking, library routines are provided by the end-user's system and loaded into memory as and when required, so the programmer has no control over them. The advantages of dynamic linking are a smaller executable, and that bug-fixes/enhancements to the libraries will automatically be available to the executable. It does mean, though, that the end-user will often need to have a particular version of the libraries available.
With static linking, the library routines are built into the executable at compile time. This does produce much larger executables and requires a re-compile if anything changes in the libraries. In return, it gives the programmer more control over the end result and removes any dependency requirements on the end-user's system.
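A minimal sketch of the two styles, assuming a trivial program that pulls sqrt() from the maths library (the file name is invented):

    /* linkdemo.c */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        printf("sqrt(2) = %f\n", sqrt(2.0));
        return 0;
    }

    /* Dynamic linking (the default) - small binary, libm resolved at load time:
         gcc -O2 linkdemo.c -o demo_dyn -lm
         ldd demo_dyn        (lists the libm.so dependency)
       Static linking - the libm code is copied into the binary at link time:
         gcc -O2 -static linkdemo.c -o demo_static -lm
         ldd demo_static     (reports "not a dynamic executable")
    */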
To achieve optimum performance in the Einstein app, it would be necessary to re-compile the libraries used with optimisation for each of the platforms supported and to also compile the app with similar optimisations and then to link statically. Finding the best optimisations for each supported architecture could, in itself, take a great deal of time and experimentation.
Releasing the source code would allow users to do this experimentation and produce optimised apps for a number of architectures as has been the case with the BOINC client. In fact, I always compile my own BOINC client and will, when I get around to it, have another go at the SETI app. My earlier attempts failed to get that to compile at all due to errors in the source code tarball.
Be lucky,
Neil
> Well, I'm running an OC'd Sempron 2200+ @ 1710MHz with 256MB PC2700 DDR Kingston
> on a Asrock k7s41gx...
>
> Just tried to crank it up to 210FSB, but it keeps freezing up just as windows
> is starting.. damn..
>
> I got it stable at 1800MHz, CPU running a bit hot.. 53-55 C.. CpuIdle keeps it
> in check, but this costs cycles..
>
> benchmark at 1800MHz
>
> --- - 2005-02-27 13:50:47 - Benchmark results:
> --- - 2005-02-27 13:50:47 - Number of CPUs: 1
> --- - 2005-02-27 13:50:47 - 1679 double precision MIPS (Whetstone) per CPU
> --- - 2005-02-27 13:50:47 - 4044 integer MIPS (Dhrystone) per CPU
> --- - 2005-02-27 13:50:47 - Finished CPU benchmarks
>
> might even set it a bit lower, I`m running it at the limit it seems..
>
I don't think the CPU is the limiting factor. However, your PC2700 RAM might be, unless you are able to run it non-synced with the FSB. I can't look at it right now, but I've got a Sempron 2200 box at home with an Asrock K7S41 - the non-GX version. The K7S41GX is rated to DDR333 whilst the K7S41 is rated to DDR400. The non-GX version is slightly dearer than the GX but still very much a budget board. I'm absolutely sure that I've got the home system running at 200MHz FSB, but using PC3200 RAM.
The only difference between the K7S41GX and the K7S41 seems to be the onboard graphics. Are you using that or an external graphics card? In any case, I think the problem you are having is more likely related to the lack of a PCI/AGP lock on these boards. It's not really documented in the motherboard manual (same for both MBs), but at certain FSBs the PCI and AGP buses will be in spec at 33 and 66MHz, and at other values of FSB they will be quite out of spec. When I was playing around with mine, I'm sure it was locking up at 180-190MHz and OK at 200MHz, or something like that. It was explained to me at the time that choosing an FSB of 166 or 200 would have the PCI/AGP buses close to spec, whilst mid-range values like 180-190 would be quite out of spec and give problems. Have you tried 200MHz?
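The arithmetic behind that is straightforward. Assuming the chipset derives the PCI clock by dividing the FSB by a small fixed integer (the divider values below are an assumption for illustration - the real table for these boards isn't documented), only certain FSB settings land near the 33MHz PCI spec:

    #include <stdio.h>

    int main(void)
    {
        /* Assumed FSB-to-PCI dividers; real chipsets select these per FSB
           setting and the exact table for these boards is not documented. */
        struct { int fsb_mhz; int divider; } cfg[] = {
            { 166, 5 },  /* 166/5 = 33.2 MHz - in spec      */
            { 180, 5 },  /* 180/5 = 36.0 MHz - out of spec  */
            { 190, 5 },  /* 190/5 = 38.0 MHz - worse        */
            { 200, 6 },  /* 200/6 = 33.3 MHz - in spec      */
        };
        int i;

        for (i = 0; i < 4; i++)
            printf("FSB %3d MHz / %d -> PCI %.1f MHz\n",
                   cfg[i].fsb_mhz, cfg[i].divider,
                   (double)cfg[i].fsb_mhz / cfg[i].divider);
        return 0;
    }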
Also, using improved air cooling is a lot easier than water cooling. Just lapping the heatsink, using Arctic Silver and putting on a bigger, faster fan can give you much better cooling. My CPU temp as measured by Motherboard Monitor is around 55C (summer - no aircon) and seems to be quite OK. It has been into the low 60s in a heatwave without locking up whilst running Seti. I try to keep below 55C and I'm sure it's OK at that level.
Cheers,
Gary.
Intel Prescott 3.2GHz HT on. 1GB PC3200 RAM. Running XP SP2.
Benchmarks using BOINC Manager V4.24 (real benchmarking)
Measured floating point speed 1339.55 million ops/sec
Measured integer speed 1709.98 million ops/sec
Einstein WU's take between 38,000 and 42,000 seconds each.
LHC million rotation WU's 45,000 to 47,000 seconds
Protein Pred 5,700 to 8,000 seconds
Seti 8,000 to 16,000 seconds
Climate 2.5 to 2.8 sec /time step
Paul