Bruce,
Can you give us the latest on the possibility of getting the Albert application in optimized forms? With the AltiVec version I see super performance, and I know (based on SETI@Home experience) that this is potentially possible with PC-type CPUs as well. I know that to have decent coverage there would have to be about 7 different "flavors":
1) Standard
2) AMD SSE2
3) AMD SSE3
4) Intel SSE2
5) Intel SSE3
6) I forget
7) I forget #2
Is it this complexity and the difficulty of ensuring the download brings the correct version down?
Or something else?
Or, the check is in the mail?
Enquiring minds want to know! :)
Copyright © 2024 Einstein@Home. All rights reserved.
Bruce, a question about An Optimized Application
I think I read somewhere that Albert was basically automatically optimized. When it detects that SSE3 or whatever is available it automatically runs code better suited for that instruction set.
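For readers wondering what "automatically runs code better suited for that instruction set" could look like, here is a minimal sketch in C. It is not taken from the Albert source. It assumes a GCC or Clang compiler recent enough to provide the `__builtin_cpu_supports` builtin (far newer than the compilers of this thread's era), and `kernel_generic`, `kernel_sse3` and `select_kernel` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: sum of squares over n doubles,
   standing in for the real inner loop of an application. */
typedef double (*kernel_fn)(const double *x, size_t n);

/* Portable fallback that runs on any CPU. */
static double kernel_generic(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i] * x[i];
    return s;
}

/* Assumption: in a real build this function would live in a file compiled
   with SSE3 enabled. Here it simply reuses the generic code. */
static double kernel_sse3(const double *x, size_t n) {
    return kernel_generic(x, n);
}

/* Ask the CPU at run time whether SSE3 is available.
   Returns 0 on non-x86 targets or unsupported compilers. */
static int cpu_has_sse3(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse3") != 0;
#else
    return 0;
#endif
}

/* Pick the kernel once at startup; callers then use the returned pointer. */
static kernel_fn select_kernel(void) {
    return cpu_has_sse3() ? kernel_sse3 : kernel_generic;
}
```

A caller would run `kernel_fn k = select_kernel();` once and then call `k(data, n)` in the hot loop, so a single binary covers CPUs with and without the extension.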
BOINC WIKI
BOINCing since 2002/12/8
Hmmm,
I don't think it is doing a very good job then. If it was, I would expect closer concurrence between the G5 and the Xeons and I am not seeing that at all ...
Even if it is too difficult to have BOINC d/l the appropriate app, it could be left as is. Then have a separate d/l page where we can d/l the one we need and manually install the app. A lot of us are quite familiar with this procedure because we have done so with our SETI apps.
98SE XP2500+ @ 2.1 GHz Boinc v5.8.8
@Paul:
Make that
1) Standard
2) AMD SSE2
3) AMD SSE3
4) Intel SSE
5) Intel SSE2
6) Intel SSE3
7) I forget
Nothing travels faster than the speed of light with the possible exception of bad news, which obeys its own special laws.
Douglas Adams (1952 - 2001)
If you're at it, make it:
1) Standard
2) MMX
3) MMX + 3Dnow
4) MMX + SSE
5) MMX + 3Dnow2 + iSSE
6) MMX + SSE + SSE2
7) MMX + 3Dnow2 + SSE
8) MMX + 3Dnow2 + SSE + SSE2
9) MMX + SSE + SSE2 + SSE3
10) MMX + SSE + SSE2 + SSE3 + iA64
11) MMX + SSE + SSE2 + SSE3 + VT
...
... you see a complexity in this pattern? ;)
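One way to keep such a list manageable is to describe each build by the set of extensions it requires and let the client pick the most specific build the CPU can run. The following is a rough sketch of that idea in C, not how BOINC or Einstein@Home actually select applications. The `FEAT_*` flags, the `VARIANTS` table and `pick_variant` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature flags, one bit per instruction-set extension. */
enum {
    FEAT_MMX   = 1u << 0,
    FEAT_3DNOW = 1u << 1,
    FEAT_SSE   = 1u << 2,
    FEAT_SSE2  = 1u << 3,
    FEAT_SSE3  = 1u << 4
};

/* A build variant and the features it needs in order to run. */
typedef struct {
    const char *name;
    unsigned required;
} build_variant;

/* Hypothetical list of builds actually shipped; entry 0 is the fallback. */
static const build_variant VARIANTS[] = {
    {"standard",          0},
    {"mmx",               FEAT_MMX},
    {"mmx+sse",           FEAT_MMX | FEAT_SSE},
    {"mmx+sse+sse2",      FEAT_MMX | FEAT_SSE | FEAT_SSE2},
    {"mmx+sse+sse2+sse3", FEAT_MMX | FEAT_SSE | FEAT_SSE2 | FEAT_SSE3}
};

/* Count the set bits in v. */
static int count_bits(unsigned v) {
    int c = 0;
    while (v) { c += (int)(v & 1u); v >>= 1; }
    return c;
}

/* Return the runnable variant that uses the most CPU features. */
static const build_variant *pick_variant(unsigned cpu_features) {
    assert(VARIANTS[0].required == 0); /* the fallback must run everywhere */
    const build_variant *best = &VARIANTS[0];
    for (size_t i = 1; i < sizeof VARIANTS / sizeof VARIANTS[0]; i++) {
        const build_variant *v = &VARIANTS[i];
        if ((v->required & ~cpu_features) != 0) continue; /* needs a missing feature */
        if (count_bits(v->required) > count_bits(best->required)) best = v;
    }
    return best;
}
```

With this layout the number of shipped builds is a choice rather than a consequence of every possible feature combination: a CPU with MMX and 3DNow! but no SSE simply falls back to the "mmx" build.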
Aloha, Uli
During the last weeks and months we have been mainly busy with getting the Albert setup working, so I have not had much time to spend on further optimization.
- The AltiVec version of the code is hand-coded, explicitly using vector instructions where possible (at least in the very core of the program).
- On Linux, if SSE is detected the App switches to a part of the program that has been optimized for SSE by the compiler (gcc 3.4 or 4.0).
- On Windows we use the stock MSC compiler (7.1) on the generic version of the code.
I played with compiler options, compiler versions and modifications to the code for quite some time, but found that the following measures did not give any significant improvement in calculation times compared to the Apps we currently deliver:
- prefer SSE2 over SSE when available (Linux)
- use hand-coded vector code (for SSE2) instead of leaving the optimization to the compiler (Linux)
- use SSE(2) optimization of the MSC compiler (Windows)
- use icc (the Intel compiler, version 8) instead of gcc or MSC
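To make the "hand-coded vector code versus compiler" comparison concrete, here is a small sketch in C. It is not the Albert kernel. The operation (`y[i] += a * x[i]`) and the function names `axpy_scalar` and `axpy_sse2` are assumptions chosen only for illustration. The SSE2 path uses the standard intrinsics from `<emmintrin.h>` and is compiled only when the compiler defines `__SSE2__`.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Plain loop: the compiler is free to vectorize this on its own. */
static void axpy_scalar(double a, const double *x, double *y, size_t n) {
    assert((x != NULL && y != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Hand-coded variant: processes two doubles per step with SSE2 when
   available, then finishes any leftover element with scalar code. */
static void axpy_sse2(double a, const double *x, double *y, size_t n) {
    assert((x != NULL && y != NULL) || n == 0);
    size_t i = 0;
#if defined(__SSE2__)
    const __m128d va = _mm_set1_pd(a);          /* broadcast a into both lanes */
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);       /* unaligned load of x[i], x[i+1] */
        __m128d vy = _mm_loadu_pd(y + i);
        vy = _mm_add_pd(vy, _mm_mul_pd(va, vx));
        _mm_storeu_pd(y + i, vy);
    }
#endif
    for (; i < n; i++) y[i] += a * x[i];        /* scalar tail */
}
```

Timing both functions on the same data with the same compiler flags is the kind of comparison described in the list above. For a loop this simple, an optimizing compiler may well generate code that is about as fast as the hand-written version.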
So my preliminary conclusions are that
- The MSC compiler does a surprisingly good job, at least on our code
- The SSE optimization of gcc seems to give results that are (nearly) as good as hand-written code
- The AltiVec unit is simply better (and somewhat easier to program) than the SSE stuff; that's why I desperately regret Apple's decision regarding CPUs.
I began to play with the auto-vectorization of gcc-4 and icc-9, but without a usable result yet. It's something I'm still working on.
BM
RE: - The AltiVec Unit is
Jobs did it to me with the Lisa; now that I have a G5 he is at it again. Sorry, it is all my fault. I was thinking of going all PowerMac instead of Windows.
I guess I will have to rethink that one. Though, I would like to get a Quad this year.
Hello Bernd,
thx for sharing that information! It's good to hear that devs are looking into this. When I compare this to the optimization process of the seti application, several things come to my mind:
- I think the largest single contribution in s@h was the caching of FFT results.. anything like that possible here?
- 2nd was the usage of a special FFT library, can't remember the name but it was hand coded for different CPUs and instruction sets
-> since e@h searches for periodic signals I suspect you're using FFT as the main algorithm as well?
- 3rd was the impact of using the icc 8 or 9, with different flags for p3, p4, p-m and with some tricks the p3 version worked for AXP as well and they made a A64 version
-> would it be useful to talk with the seti guys about their optimization experiences with the icc? (thinking of TMR, crunch3r, Harold Naparst)
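The first point, caching results so that an expensive transform is not recomputed for the same input, can be sketched roughly as follows in C. This is only an illustration of the general idea, not the SETI@Home or Einstein@Home code. `expensive_transform` is a hypothetical stand-in for a costly computation such as an FFT, and the small direct-mapped table is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 16

/* One cache slot: the key it holds and the result computed for it. */
typedef struct {
    int valid;
    int key;
    double value;
} cache_entry;

static cache_entry cache[CACHE_SLOTS];
static unsigned long compute_calls = 0; /* counts real computations */

/* Hypothetical stand-in for an expensive computation keyed by `key`. */
static double expensive_transform(int key) {
    compute_calls++;
    return (double)key * 0.5;
}

/* Return the cached result if present, otherwise compute and store it.
   Direct-mapped: a new key evicts whatever shared its slot. */
static double cached_transform(int key) {
    assert(key >= 0);
    cache_entry *e = &cache[(unsigned)key % CACHE_SLOTS];
    if (e->valid && e->key == key) return e->value; /* cache hit */
    e->valid = 1;
    e->key = key;
    e->value = expensive_transform(key);            /* cache miss */
    return e->value;
}
```

Whether this pays off depends on how often the same input recurs, which is a property of the search algorithm rather than of the cache itself.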
MrS
Scanning for our furry friends since Jan 2002
RE: Hello Bernd, thx for
Here is what I use on my Pentium II, SuSE Linux 9.3:
Optimized SETI client V4.07.3a for i686 with FFTW3 by Ned Slider
Tollio
Hi!
I did a hand-optimized version of the Albert code (Windows, no SSE).
It produces absolutely correct results, but runs at least twice as fast.
Can I use it without any kickback?