The SSE implementation does not differ between them, but OS system calls sometimes behave differently with respect to the XMM registers.
For example, I have a (fortunately graphics) algorithm that uses the XMM5, XMM6 and XMM7 registers to hold predefined constants for the calculation. During this lengthy calculation it calls the Windows MsgWaitForMultipleObjects system call to get the next piece of source data (produced by another thread). On AMD machines this system call clears the XMM0-XMM5 registers but leaves XMM6 and XMM7 untouched (on XP, XP x64 and Windows 7 x64), so the resulting image is corrupted. The solution is for the program to reload the constant into XMM5 after every call. This behaviour has been seen on K7, K8 and K10 CPUs, but not on any Intel machine (P3, P4, Core2 and newer).
I don't know whether you use hand-written asm code, but maybe this helps.
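For what it's worth, the 64-bit Windows calling convention treats XMM0-XMM5 as volatile across any call and only XMM6-XMM15 as preserved, which would be consistent with the x64 observations above. Below is a minimal sketch of the reload workaround in C with SSE intrinsics. The names `next_chunk`, `scale_chunks` and `k_gain` are hypothetical, and `next_chunk` merely stands in for the blocking MsgWaitForMultipleObjects wait. The idea is that the constant lives in memory and is loaded again after each call, rather than being assumed to survive in a register.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics; x86 / x86-64 only */

/* The constant is kept in memory, not only in a register. */
static const float k_gain[4] = {0.5f, 0.5f, 0.5f, 0.5f};

/* Hypothetical stand-in for the blocking wait that delivers the next piece
   of source data. Returns a pointer to 4 floats, or NULL when finished. */
static const float *next_chunk(const float *src, size_t n_chunks, size_t i)
{
    return (i < n_chunks) ? src + 4 * i : NULL;
}

/* Scales every 4-float chunk of src into dst; returns chunks processed. */
size_t scale_chunks(const float *src, float *dst, size_t n_chunks)
{
    size_t i = 0;
    const float *chunk;

    assert(src != NULL && dst != NULL);

    while ((chunk = next_chunk(src, n_chunks, i)) != NULL) {
        /* Reload after every call: treat the register as clobbered. */
        __m128 gain = _mm_loadu_ps(k_gain);
        __m128 px = _mm_loadu_ps(chunk);
        _mm_storeu_ps(dst + 4 * i, _mm_mul_ps(px, gain));
        ++i;
    }
    return i;
}
```

With intrinsics the compiler does this bookkeeping itself; in hand-written asm the equivalent would be an explicit load from memory right after the call.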
Yeah, I got a few of those on one of my AMDs, and I guess I'm glad I had switched back over to GRPs before I got more of them.
Don't see any of that on my Intels but haven't checked them all yet.
http://einsteinathome.org/host/4519028/tasks&offset=0&show_names=1&state=4&appid=0
This raises another question: which of these results is correct?
Is there a need to redo all WUs validated on 2 AMD processors?
BTW, it's not only AMD against Intel,
http://einsteinathome.org/workunit/146229594
this one is Intel against Intel.
My AMD hosts seem to be absolutely fine, though they are all Opterons (servers). There are some invalid tasks, like this one, which appears to be a tough nut to crack(!). One interesting task that failed (very rare for the host) was validated between an FX8120 and an E3200.
There have always been invalid tasks, and there always will be, for example due to a failed CPU overclock; I think the project can recognize and filter them correctly (it's a joke to crunch a CPU task in 242 seconds :) ).
I think we should wait for the result of the investigation (the validator has been stopped again) and for the solution, and not stress the project developers with this.
The first link is an example of bad data that has slipped through the screening process. Unless pulled manually by the Devs, that quorum will grow to 20 before it is terminated automatically.
The second link is probably an example of what Bernd is trying to solve at the moment. I may have missed it but I don't think Bernd has said that it's necessarily to do with AMD vs Intel. BTW, the E3200 (a Wolfdale Celeron dual core) is one of my hosts and it's been crunching 24/7 for the last 4+ years. Currently it has no errors or invalids in its tasks list. It is overclocked and it's currently running in an environment where the ambient is around 35 C. It runs reliably and the crunch times aren't too shabby for a relatively old architecture either :-).
Cheers,
Gary.
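The quorum behaviour described above can be pictured with a toy model in C. This is only a sketch under stated assumptions, not the project's validator: it assumes each result reduces to a single number, that two results within a tolerance count as agreeing, and that two agreeing results are enough. The limit of 20 comes from the post; `check_quorum`, `results_agree`, `MIN_QUORUM` and `MAX_TOTAL_RESULTS` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_QUORUM 2         /* assumed: two agreeing results validate */
#define MAX_TOTAL_RESULTS 20 /* limit mentioned in the post */

/* Assumed comparison: two results agree if they differ by less than tol. */
static int results_agree(double a, double b, double tol)
{
    double d = a - b;
    if (d < 0.0)
        d = -d;
    return d < tol;
}

/* Returns the index of a result that agrees with enough others,
   -1 if there is no agreement yet (another copy should be sent out),
   -2 if the limit is reached and the workunit should be terminated. */
int check_quorum(const double *results, size_t n, double tol)
{
    assert(results != NULL || n == 0);

    for (size_t i = 0; i < n; ++i) {
        int agreeing = 1; /* a result agrees with itself */
        for (size_t j = 0; j < n; ++j) {
            if (j != i && results_agree(results[i], results[j], tol))
                ++agreeing;
        }
        if (agreeing >= MIN_QUORUM)
            return (int)i;
    }
    return (n >= MAX_TOTAL_RESULTS) ? -2 : -1;
}
```

In this model, a workunit whose results never agree keeps returning -1 and collecting further copies until the twentieth result arrives, at which point it returns -2.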
You are right, 242 sec is abnormal, but this particular PC is standalone (no keyboard, mouse or monitor attached, only power and network) and is absolutely not overclocked. It's a live backup for a critical system and does nothing else.
It just happened.
Yes, I cited the 2nd one as a counter-example to the suggestion that it's a problem with AMD/Intel cross-validation; here your Intel host and another AMD host agreed and cast out my AMD host. Thanks for looking at it.
Having said that, I've looked at all my hosts now, and FWIW the only two displaying errors like the 2nd link are also the only AMD units I have online at present (the other 6 are mixed Intel). They are both quad Opteron Supermicro servers: ECC RAM, well cooled, known reliable systems, no overclock, yadda yadda. These are the 4 validation errors I found (including the one cited above):
Host 6123309 Error 1
Host 6123309 Error 2
Host 6119246 Error 1
Host 6119246 Error 2
Happy to make these hosts available if the devs want to see whether the failure can be reproduced.
The problem didn't have anything to do with platforms or vendors; it was essentially a bug in the validator (for the curious: it had already been there since S6LV1, but went unnoticed there).
It has been found and fixed, and the new validator is crunching through the backlog. All should be back to normal in a few days.
BM
Hello BM!
Congrats!
During this time I got very few S6LVE tasks. I believe you reduced the output of these tasks considerably to avoid overflowing the database, and are bringing it back to normal now. What will be the mean ratio of tasks between FRGP2 and S6LVE, and what determines it?
Kind regards and happy crunching
Martin
Great to hear, thanks for letting us know.
Onwards and upwards!