Does anybody still have a copy of the S5R1 or S5RI science app for Windows??
Of course!
Quote:
Is the string "AuthenticAMD" also appearing in those apps?
Yes. I checked. ( S5RI 4.24 windows )
Couldn't it be that they now link against the math library from ICC, whereas before they linked against the standard Microsoft VC math library?
Will something be done about the AMD penalty under Windows? What are the project devs planning? It can't be in their interest that a significant percentage of boxes are running at 70% of their potential or less. Do you plan to change this part of the app in the next release of the Einstein science app?
RE: RE: Does anybody
Do you have a test S5RI datapak so you could run an offline comparison of the patched 4.24 app - on an AMD SSE2, of course?
RE: RE: Does anybody
Hmmm, so maybe the modf function wasn't used as heavily in the old app. Bernd mentioned that the old app used an alternative to modf which was later found to be numerically suboptimal in the context of the new run.
Akos, you will know what I mean, something along the lines of
frac = x - (UINT4) x instead of frac = modf(x, &dummy)
This explains why the old app didn't suffer.
As to the compiler, I guess Microsoft may have licensed Intel's math library, or maybe they use the Intel compiler to build their math lib (just kidding, don't sue me, MS ...).
CU
BRM
RE: As to the compiler, I
Afaik you can download math libs from Intel and AMD for free. Don't know about the licence though.
cu,
Michael
Update from the Opteron... a full 50 percent increase! The WU isn't finished yet but it's more than half crunched so the estimate should be quite okay. Looks like that kind of box uses SSE2 a lot normally, therefore the huge 70% penalty and now the big performance increase. I think this kind of box will benefit most if a patch is applied on the large scale... okay, dunno how many people combine a server CPU and Windows, but still, it's a significant difference and 70% on a fast machine can have quite an effect even if there are not that many boxes of this kind around.
RE: Update from the
Hi all!
So cool!
I looked at that suspect code in the math lib again, and I think it is not "evil", just incorrect. It could be that whoever wrote it didn't want to exclude all AMDs from SSE2, only a certain processor family. After the comparison with the string "AuthenticAMD", the code does some more arithmetic with the CPU model and extended CPU model info (my assembly language knowledge isn't that good anymore). It's possible that the intention was to exclude only the first generation of 130nm "Newcastle" K8s. I think what it actually does might be the opposite: enabling SSE2 on the Newcastles and disabling it for all others.
Was there something wrong with the Newcastle SSE2 implementation? I didn't find anything by googling. Maybe it was just plain slow??
CU
BRM
RE: (...) I looked at that
look here
Not as far as I know, but I'll go searching.
RE: RE: (...) I looked at
Clawhammer and Newcastle, that is. Everything that would report Family 15, extended family 0.
CU
BRM