S6BucketLVE validation

MAGIC Quantum M...

Joined: 18 Jan 05

Posts: 1886

Credit: 1403654658

RAC: 1096625

Yeah I got a few of those on

1 Feb 2013 20:51:16 UTC

Message 114655

Yeah, I got a few of those on one of my AMDs, and I guess I'm glad I had switched back over to GRPs before I got more of them.

Don't see any of that on my Intels but haven't checked them all yet.

http://einsteinathome.org/host/4519028/tasks&offset=0&show_names=1&state=4&appid=0

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 507021598

RAC: 115600

RE: The SSE implementation

1 Feb 2013 21:33:08 UTC

Message 114656 in response to message 114652

Quote:

The SSE implementation is not different between them, but sometimes the OS system calls have different behaviour related to XMM registers.
For example, I have a (fortunately graphics) algorithm which uses the XMM5, XMM6 and XMM7 registers to store predefined constants for a calculation. During this lengthy calculation it calls the Windows MsgWaitForMultipleObjects system call to get the next piece of source data (produced by another thread). On AMD machines this system call clears the XMM0-XMM5 registers but leaves the XMM6 and XMM7 registers untouched (on XP, XP x64, Windows7 x64), so the resulting image is corrupted. The solution is that after every call the program must reload the constant into the XMM5 register. This behaviour has been seen on K7, K8 and K10 CPUs, but not on any Intel machines (P3, P4, Core2 and newer).

I don't know whether you use hand-written asm code, but maybe this helps.

This raises another question: which of these results is correct?
Is there a need to redo all WUs validated on 2 AMD processors?

BTW, it's not only AMD against Intel,
http://einsteinathome.org/workunit/146229594
this one is Intel against Intel.

Neil Newell

Joined: 20 Nov 12

Posts: 176

Credit: 169699457

RAC: 0

My AMD hosts seem to be

1 Feb 2013 21:42:04 UTC

Message 114657 in response to message 114655

My AMD hosts seem to be absolutely fine - all Opteron (servers) though. Some invalid tasks like this one which appears to be a tough nut to crack(!), one interesting one that failed (very rare for the host) was validated between an FX8120 and an E3200.

ph2000

Joined: 17 Mar 05

Posts: 7

Credit: 936499973

RAC: 0

Invalid tasks have always

1 Feb 2013 21:49:34 UTC

Message 114658 in response to message 114656

Invalid tasks have always existed and always will, for example due to a failed CPU overclock; I think they can recognize and filter them correctly (it's a joke to crunch a CPU task in 242 seconds :) ).

I think we should wait for the result of the investigation (the validator has been stopped again) and for the solution; we shouldn't stress the project developers with this.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117471490567

RAC: 35498709

RE: My AMD hosts seem to be

2 Feb 2013 4:42:36 UTC

Message 114659 in response to message 114657

Quote:

My AMD hosts seem to be absolutely fine - all Opteron (servers) though. Some invalid tasks like this one which appears to be a tough nut to crack(!), one interesting one that failed (very rare for the host) was validated between an FX8120 and an E3200.

The first link is an example of bad data that has slipped through the screening process. Unless pulled manually by the Devs, that quorum will grow to 20 before it is terminated automatically.
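As a rough illustration of how a quorum like this can keep growing, here is a minimal sketch in Python. It is not the actual Einstein@Home or BOINC validator: the function names, the scalar "result" values, the relative tolerance and the minimum quorum of 2 are all assumptions made for the example. Only the cap of 20 results comes from the post above.

```python
# Hypothetical sketch of quorum-based validation; all names are illustrative.

def agree(a, b, rel_tol=1e-5):
    """Treat two scalar results as matching if they differ by a small
    relative amount (the tolerance value is an assumption)."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b), 1.0)


def check_workunit(results, min_quorum=2, max_total=20):
    """Decide what to do with a workunit given the results returned so far.

    Returns one of:
      ("valid", canonical_result)  enough results agree with each other
      ("need_more", None)          no agreement yet, send out another task
      ("terminated", None)         the cap was reached without agreement
    """
    for candidate in results:
        # A candidate always matches itself, so min_quorum=2 means
        # "this result plus at least one other agreeing result".
        matches = [r for r in results if agree(candidate, r)]
        if len(matches) >= min_quorum:
            return "valid", candidate
    if len(results) >= max_total:
        return "terminated", None
    return "need_more", None


if __name__ == "__main__":
    print(check_workunit([1.0, 1.0000001]))   # two hosts agree
    print(check_workunit([1.0, 2.0]))         # no agreement yet
    print(check_workunit([float(i) for i in range(20)]))  # cap reached
```

With bad input data every host returns something different, so no pair ever agrees and the workunit keeps asking for more results until the cap stops it.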

The second link is probably an example of what Bernd is trying to solve at the moment. I may have missed it but I don't think Bernd has said that it's necessarily to do with AMD vs Intel. BTW, the E3200 (a Wolfdale Celeron dual core) is one of my hosts and it's been crunching 24/7 for the last 4+ years. Currently it has no errors or invalids in its tasks list. It is overclocked and it's currently running in an environment where the ambient is around 35 C. It runs reliably and the crunch times aren't too shabby for a relatively old architecture either :-).

Cheers,
Gary.

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 507021598

RAC: 115600

RE: Invalid tasks have

2 Feb 2013 8:01:31 UTC

Message 114660 in response to message 114658

Quote:

Invalid tasks have always existed and always will, for example due to a failed CPU overclock; I think they can recognize and filter them correctly (it's a joke to crunch a CPU task in 242 seconds :) ).

You are right, 242 sec is abnormal, but this particular PC is a standalone machine (no keyboard, mouse or monitor attached, only power and network) and it is absolutely not overclocked. It's a live backup for a critical system and does nothing else.
It just happened.

Neil Newell

Joined: 20 Nov 12

Posts: 176

Credit: 169699457

RAC: 0

RE: RE: My AMD hosts seem

2 Feb 2013 9:02:32 UTC

Message 114661 in response to message 114659

Quote:

Quote:
My AMD hosts seem to be absolutely fine - all Opteron (servers) though. Some invalid tasks like this one which appears to be a tough nut to crack(!), one interesting one that failed (very rare for the host) was validated between an FX8120 and an E3200.

The second link is probably an example of what Bernd is trying to solve at the moment. I may have missed it but I don't think Bernd has said that it's necessarily to do with AMD vs Intel. BTW, the E3200 (a Wolfdale Celeron dual core) is one of my hosts and it's been crunching 24/7 for the last 4+ years. Currently it has no errors or invalids in its tasks list.

Yes, I cited the 2nd one as a counter-example to the suggestion that it's a problem with AMD/Intel cross-validation; here your Intel host and another AMD host agreed, and cast out my AMD host - thanks for looking at it.

Having said that, I've looked at all my hosts now, and FWIW the only two displaying errors like the 2nd link are also the only AMD units I have online at present (the other 6 are mixed Intel). They are both quad Opteron Supermicro servers, ECC RAM, well cooled, known reliable systems, no overclock, yadda yadda. These are the 4 validation errors I found (including the one cited above):

Host 6123309 Error 1
Host 6123309 Error 2
Host 6119246 Error 1
Host 6119246 Error 2

Happy to make these hosts available if the devs want to see whether the failure can be reproduced.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250367308

RAC: 35280

The problem didn't have

5 Feb 2013 13:07:03 UTC

Message 114662

The problem didn't have anything to do with platforms or vendors; it was essentially a bug in the validator (for the curious: it had been there since S6LV1, but went unnoticed there).

It has been found and fixed, and the new validator is crunching through the backlog. All should be back to normal in a few days.

astro-marwil

Joined: 28 May 05

Posts: 531

Credit: 642106543

RAC: 1106487

Hallo BM! Congrats! Within

5 Feb 2013 15:08:37 UTC

Message 114663 in response to message 114662

Hello BM!
Congrats!
During this time I got very few S6LVE tasks. I believe you greatly reduced the output of these tasks to avoid an overflow of the database, and are bringing it back to normal now. What will be the mean ratio of tasks between FRGP2 and S6LVE, and what determines it?

Kind regards and happy crunching
Martin

Neil Newell

Joined: 20 Nov 12

Posts: 176

Credit: 169699457

RAC: 0

Great to hear, thanks for

5 Feb 2013 16:06:37 UTC

Message 114664 in response to message 114662

Great to hear, thanks for letting us know.

Onwards and upwards!
