If you look at Crunch3r's optimised Seti app pages, you will see the SSE etc. versions available for Seti. Others that you might think should be there, like AMD SSE3, are not released because the AMD version of SSE3 is not the same as Intel's and causes crashes or errors.
In truth as seen by Seti, SSE3 optimised apps don't work any faster than SSE2.
Seti itself doesn't offer different versions because of the extra workload involved and the problems of testing each optimised application on all platforms. And the problem will only get worse when Intel produces SSEn, because then there will be different generations of Macs as well.
Another "full disclosure" statement is that the reason that there are AMD vs. Intel SSE and SSE2 applications is not because of the implementation of SSEx, but because of the Intel Compiler generating code targeted for Intel-specific architecture vs. "generic" for AMD. I had always translated that to be more buffering tricks and other architecture-specific optimizations, not differences in SSEx. If that is not the case, someone please speak up and clarify... :-)
From the Wikipedia SSE3 article:
Quote:
In April 2005, AMD introduced a subset of SSE3 in revision E (Venice and San Diego) of their Athlon 64 CPUs.
What difference this makes I have no idea, and whether they have included full SSE3 in subsequent CPUs I have no idea either.
Keep in mind that what you quoted and what I'm quoting now is Wiki, which can be edited by anyone, but according to this entry about x86, the only instructions left out were instructions specific to HyperThreading. I suppose one could argue that this means that they're not equivalent, and that would be true, but the difference is minor. Even if they had included the HT-related instructions, one could still claim that AMD didn't "follow the SSE3 standard" because the functions did not work (since no HT, they never would work)...
Quote:
... the only instructions left out were instructions specific to [Intel] HyperThreading. I suppose one could argue that this means that they're not equivalent, and that would be true, but the difference is minor. Even if they had included the HT-related instructions, one could still claim that AMD didn't "follow the SSE3 standard" because the functions did not work (since no HT, they never would work)...
It's a very old anti-competitive game to 'extend' or deliberately break 'standards'.
I still consider the Intel use of the CPUID checks to be incredibly brazen and very costly for those stung by the consequences. I guess we'll find out in a few years time as the court cases rumble along after the event...
Out of interest, what would an AMD SSE2 CPU do with one of the HyperThreading instructions? NOP or cause a trap event?
3) Put different versions of the same science algorithm into a single program, which is distributed automatically to all users. The first thing the program will do is to detect the CPU's capabilities and then select the appropriate code-path.
There are some drawbacks as well, for example you have to generate a new app version which will be updated automatically on ALL clients even if only the code for one of the code-paths was changed.
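For what it's worth, here is a rough sketch in C of what option 3 could look like. It is not the project's code, just an illustration. It assumes GCC or Clang on x86 (for `<cpuid.h>` and `__get_cpuid`), and the names `code_path`, `select_path`, `detect_path` and `path_name` are made up for this example. The feature bits are the documented CPUID leaf 1 bits (EDX bit 25 = SSE, EDX bit 26 = SSE2, ECX bit 0 = SSE3). A real app would also have to confirm that the OS saves the SSE registers, which this sketch skips.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>              /* GCC/Clang helper for the CPUID instruction */
#endif

/* CPUID leaf 1 feature bits */
#define EDX_SSE   (1u << 25)
#define EDX_SSE2  (1u << 26)
#define ECX_SSE3  (1u << 0)

/* Hypothetical set of code-paths built into the one binary */
typedef enum { PATH_GENERIC, PATH_SSE, PATH_SSE2, PATH_SSE3 } code_path;

/* Pick the most capable path from the raw leaf 1 register values. */
code_path select_path(uint32_t ecx, uint32_t edx)
{
    if ((ecx & ECX_SSE3) && (edx & EDX_SSE2)) return PATH_SSE3;
    if (edx & EDX_SSE2) return PATH_SSE2;
    if (edx & EDX_SSE)  return PATH_SSE;
    return PATH_GENERIC;
}

/* Query the running CPU; fall back to the generic path if CPUID is unavailable. */
code_path detect_path(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return select_path(ecx, edx);
#endif
    return PATH_GENERIC;
}

/* Human-readable name, e.g. for a line in the app's log output. */
const char *path_name(code_path p)
{
    switch (p) {
    case PATH_SSE3: return "sse3";
    case PATH_SSE2: return "sse2";
    case PATH_SSE:  return "sse";
    default:        return "generic";
    }
}
```

The app would call `detect_path()` once at startup and then branch to (or set a function pointer for) the matching routine. Note that the choice is driven by the feature bits alone, not by who made the CPU.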
Well... That option doesn't sound too bad to me. I really don't see the problem with an (automatic) download of a new version from time to time (the science app is like 3 MB...).
What happened was that I was reading up in Wikipedia what SSE2 did. There I then misunderstood what it does for the AMDs. On a 64bit AMD CPU, SSE2 adds 8 XMM registers that only work in 64bit mode. Hence my confusion and why I took Brian's advice and went to bed. The aspirins weren't working anyway. :-(
Oh well, you can't have a good night all nights and when you make one mistake, more are prone to crawl up on you in quick succession. ;-)
PAUSE executes as a NOP on older processors and on AMD processors.
But CLFLUSH raises an "illegal instruction" exception.
OK, so how often is the CLFLUSH instruction used, if ever?
Just as the "AuthenticAMD" string test generated by ICC can be overwritten so that the 'CPUID test' then always runs the (SSEn) optimised code, can the CLFLUSH instruction be overwritten by, for example, a NOP to avoid the "illegal instruction" trap?
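To make the "vendor string test" a bit more concrete, here is a small sketch in C. It is my own illustration and not ICC's actual dispatcher code. What is certain is that CPUID leaf 0 returns the 12-character vendor string in EBX, EDX, ECX (in that order) and that leaf 1 EDX bit 26 is the SSE2 flag. The function names `vendor_from_regs`, `use_sse2_vendor_gated` and `use_sse2_feature_only` are hypothetical, and the vendor-gated policy is only my reading of the behaviour described in this thread.

```c
#include <stdint.h>
#include <string.h>

#define EDX_SSE2 (1u << 26)     /* CPUID leaf 1, EDX bit 26 */

/* Rebuild the vendor string from the CPUID leaf 0 registers.
 * Bytes are extracted with shifts so the result does not depend
 * on the byte order of the machine running this code. */
void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
    out[12] = '\0';
}

/* Hypothetical policy A: optimised path only on "GenuineIntel" CPUs. */
int use_sse2_vendor_gated(const char *vendor, uint32_t leaf1_edx)
{
    return strcmp(vendor, "GenuineIntel") == 0 && (leaf1_edx & EDX_SSE2) != 0;
}

/* Hypothetical policy B: trust the feature flag, whoever made the CPU. */
int use_sse2_feature_only(uint32_t leaf1_edx)
{
    return (leaf1_edx & EDX_SSE2) != 0;
}
```

With the register values an AMD CPU reports for leaf 0 (0x68747541, 0x69746e65, 0x444d4163) the string comes out as "AuthenticAMD", so policy A declines the SSE2 path even when the SSE2 bit is set, while policy B takes it.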
Quote:
OK, so how often is the CLFLUSH instruction used, if ever?
It depends on the application.
But I was wrong.
CLFLUSH is not part of SSE2 instructions, but it was introduced with them.
It has its own CPUID feature flag, so it can appear in non-SSE2 CPUs too.
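A quick sketch in C of what "its own feature flag" means in practice: each capability is a separate bit in CPUID leaf 1, so a program can test for exactly the instruction it wants to use. The bit positions are the documented ones (EDX bit 19 = CLFLUSH, EDX bit 26 = SSE2, ECX bit 0 = SSE3, ECX bit 3 = MONITOR/MWAIT). The struct and function names are made up for this example, and the register values would come from a real CPUID call in practice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the flags discussed in this thread */
typedef struct {
    bool sse2;     /* leaf 1, EDX bit 26 */
    bool clflush;  /* leaf 1, EDX bit 19 */
    bool sse3;     /* leaf 1, ECX bit 0  */
    bool monitor;  /* leaf 1, ECX bit 3 (MONITOR/MWAIT) */
} cpu_features;

/* Decode the individual feature bits from raw leaf 1 register values. */
cpu_features decode_leaf1(uint32_t ecx, uint32_t edx)
{
    cpu_features f;
    f.clflush = ((edx >> 19) & 1u) != 0;
    f.sse2    = ((edx >> 26) & 1u) != 0;
    f.sse3    = ((ecx >> 0)  & 1u) != 0;
    f.monitor = ((ecx >> 3)  & 1u) != 0;
    return f;
}
```

Because the bits are independent, code that wants CLFLUSH should check `clflush` and not infer it from `sse2`, and likewise code that wants MONITOR/MWAIT should check `monitor` and not infer it from `sse3`.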
Which instruction on which AMD CPU?
Any "unknown" instruction generates an "illegal instruction" exception.
For any code compiled for an Intel SSE2 CPU target that is instead run on an AMD SSE2 capable CPU (and with any "AuthenticAMD" Intel check removed).
Are the SSE2 HT-specific codes effectively ignored or does everything come to a halt with the "illegal instruction" exception?
Regards,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
As far as I know, SSE2 doesn't contain any HT-specific instructions.
So the AMD processors support all of the SSE2 instructions except two:
CLFLUSH (flush cache line) and PAUSE (delay pre-decoding).
I think MONITOR and MWAIT are part of SSE3 (not SSE2), and they are the ones not supported by AMD.
CU
Bikeman