Information about the new S5 workunits

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 736687268

RAC: 1291049

RE: This sounds good. I

21 May 2007 23:40:09 UTC

Message 37867 in response to message 37864

Quote:

This sounds good. I mean, personally, I still won't be using Windows much for crunching, but it's great we figured this out :-) and if it could be made available to everyone, e.g., all the 80% who can't or don't want to use Linux, the project could benefit quite a lot...

Yes! Just think about it: if only 10% (a conservative estimate) of E@H's computing power is currently contributed by modern AMD PCs under Windows, a 30% performance increase just for those boxes would mean about 2 additional teraflops for the project, just by changing a single byte.
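
For anyone who wants to check that arithmetic, here is a minimal sketch. The 10% share, the 30% speedup and the 2 teraflops gain come from the estimate above; the project total is not stated anywhere in this thread, so it is only back-derived from those three numbers, and the function names are made up for illustration.

```python
def implied_gain_tflops(total_tflops, affected_share, speedup):
    """Extra throughput when `affected_share` of the project gets `speedup` faster."""
    return total_tflops * affected_share * speedup


def implied_total_tflops(gain_tflops, affected_share, speedup):
    """Project total that the quoted gain would imply (an inference, not a measurement)."""
    return gain_tflops / (affected_share * speedup)


# Numbers from the post: 10% share, 30% speedup, about 2 TFLOPS gained.
total = implied_total_tflops(2.0, 0.10, 0.30)
print(round(total, 1))  # project total implied by the estimate
print(round(implied_gain_tflops(total, 0.10, 0.30), 1))  # recovers the 2 TFLOPS
```

In other words, a 30% speedup on 10% of the hosts is a 3% gain overall, which only amounts to 2 teraflops if the whole project is assumed to run at roughly 67 teraflops.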

Hey, this was fun, wasn't it, and well worth sacrificing some sleep!

BRM

Annika

Joined: 8 Aug 06

Posts: 720

Credit: 494410

RAC: 0

I can only say I 100% agree

21 May 2007 23:44:10 UTC

Message 37868

I can only say I 100% agree with your post :-D

RAMA

Joined: 5 May 05

Posts: 18

Credit: 657880205

RAC: 0

RE: In the meantime, I

22 May 2007 1:10:28 UTC

Message 37869 in response to message 37855

Quote:

In the meantime, I think it's a matter of courtesy to keep the number of modified clients to a minimum until Bernd OK's the change. It was essential to verify our hypothesis to try out the change, but let's wait until the official OK before everybody is patching the app. If 1000 people are patching and one of them makes a mistake, it can mess up quite a few results. As a software engineer I'd prefer that the new version is formally tested, approved, and only then released with a new version number before it's widely used so any negative effects are traceable.

CU

BRM

Hi, I'm one of the guys who patched his AMD farm, with great results.
I like your statement as a software engineer; I just wish the guys responsible for this project held themselves to the same standard!
What a waste of CPU power right now, not using the optimized instruction sets of the newer CPUs that we mainly use, plus the time wasted on crashed units in the first 2 weeks of this new run. So much for your nice standards in this project.
I hope they give us an Akosf-optimized version sooooooon. It feels like watching $300 a month of my electricity bill being wasted right now, or like driving on the Autobahn in 1st gear only.
On the other hand, THANKS for your great insight, which helped improve this run a little bit,
Pete

Muetze das Original

Joined: 13 Sep 06

Posts: 3

Credit: 11068477

RAC: 0

RE: Goodness, huge farms

22 May 2007 5:20:29 UTC

Message 37870 in response to message 37866

Quote:

Goodness, huge farms :-D makes my single AMD box (and even my friend's three, two of which are SSE2 capable) pale in comparison. Of course with so many boxes the effect will be much more noticeable; I'm sure it will pay off for both the team and the project. Still, what about the "not so many patched clients" policy?

Hey, that policy sounds funny :-)
But does it pay the electricity bill over here?

I'm the guy with the 19 A64 + 6 A64 X2 mentioned in Ziegenmelker's post above.

Have a look at one of my hosts, host ID 752545.

Do I really need to say any more than what the latest results show?

Result 84292629 was the last without the patch:

The next two were partially affected by it.

And as of result 84292635 you can see the full impact of the patch:

So please keep on exploring, and let's hope that we can soon expect a really optimized Einstein binary.

Muetze

Winterknight

Joined: 4 Jun 05

Posts: 1456

Credit: 377434812

RAC: 141928

Glad that the problem has

22 May 2007 8:23:36 UTC

Message 37871

Glad that the problem has been found for the AMD CPUs.

But I do think that the programmers of all projects need to get together to iron out these problems. I am pretty sure that the SETI optimisers have known about this Intel compiler problem for a couple of years. Their main site is owned by KWSN - Chicken of Angnor, AKA Simon: http://lunatics.at/index.php

Andy

Brian Silvers

Joined: 26 Aug 05

Posts: 772

Credit: 282700

RAC: 0

Just as a word of

22 May 2007 8:28:28 UTC

Message 37872

Just as a word of caution:

The last time someone was modding / patching apps on this project, the project team / scientists intervened and said "no".

While this could have a positive benefit for my system, I am not going to make a change to a closed-source application. If it was open-source, I'd be a little more willing...

FWIW, IMO, YMMV, etc, etc, etc...

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 736687268

RAC: 1291049

RE: Just as a word of

22 May 2007 8:50:45 UTC

Message 37873 in response to message 37872

Quote:

Just as a word of caution:

The last time someone was modding / patching apps on this project, the project team / scientists intervened and said "no".

While this could have a positive benefit for my system, I am not going to make a change to a closed-source application. If it was open-source, I'd be a little more willing...

FWIW, IMO, YMMV, etc, etc, etc...

Yup, I second that 100%. Let's be a bit patient.

I did inform Bernd about our findings so he's aware of it and investigating "clean" and legal ways to deal with it. Let's not forget that this is not the only issue, there are some client errors still happening which need to be addressed, too.

CU
BRM

Muetze das Original

Joined: 13 Sep 06

Posts: 3

Credit: 11068477

RAC: 0

RE: I did inform Bernd

22 May 2007 10:29:02 UTC

Message 37874 in response to message 37873

Quote:

I did inform Bernd about our findings so he's aware of it and investigating "clean" and legal ways to deal with it. Let's not forget that this is not the only issue, there are some client errors still happening which need to be addressed, too.

When I read about "some client errors still happening", I thought it might be good to point out that all the "Client error" / "Compute error" pairs in my results from the last 10 days are not faults of the application. They all come from my efforts to avoid wasting energy on work units I would never have had a chance to deliver in time. Unfortunately I had not looked at my computers for a while, so some of them were getting too close to the deadlines of a bunch of WUs; I then decided to abort those, plus a few extra. (Not all of my computers can crunch 24/7. It depends heavily on holidays or illness of colleagues: if all of them are healthy and not on holiday, most of the computers have to pause for 8 to 12 hours a day, 5 days a week.)

All the computers on which I decided to try the patch have been running rock-solid for a while now.

Muetze

roadrunner_gs

Joined: 7 Mar 06

Posts: 94

Credit: 3369656

RAC: 0

http://einstein.phys.uwm.edu/

22 May 2007 11:10:09 UTC

Message 37875

http://einsteinathome.org/task/84250668
http://einsteinathome.org/task/84196371
Outcome marked as success, however the validate state for both is marked as invalid.
I don't get it; this whole run seems to me like an utter waste of energy.
Go get the apps fixed to run faster on AMD@Win boxes and also get rid of the debug info. I am currently pulling all my remaining boxes that I can reach (not that many now, either) off Einstein.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 736687268

RAC: 1291049

RE: http://einstein.phys.uw

22 May 2007 11:20:12 UTC

Message 37876 in response to message 37875

Quote:

http://einsteinathome.org/task/84250668
http://einsteinathome.org/task/84196371
Outcome marked as success, however the validate state for both is marked as invalid.
I don't get it; this whole run seems to me like an utter waste of energy.
Go get the apps fixed to run faster on AMD@Win boxes and also get rid of the debug info. I am currently pulling all my remaining boxes that I can reach (not that many now, either) off Einstein.

Actually the debug info can be most helpful in resolving any problems that might (or actually do) occur, so it would be counterproductive to remove it. Only a few bytes per second of debug info are produced, so it doesn't matter performance-wise.

The validation errors are frequent when the initial replication is 3 (3 workunits delivered) and 2 of the hosts have the same OS but the third has a different one (e.g. 2 x Darwin vs 1 x Linux, or 2 x Windows vs 1 x Linux). I guess the validation problem will be reduced once the app uses hand-crafted assembly code for most of the computation and the numerical differences introduced by different compilers are minimized.
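
To illustrate the effect, here is a toy sketch of how a lone result can end up invalid when two others agree with each other. This is not the project's actual validator: the comparison rule, the tolerance `rel_tol` and the sample values are all assumptions made up for the example.

```python
def validate_quorum(results, rel_tol=1e-5):
    """Return one validity flag per result.

    A result counts as valid when it agrees, within rel_tol, with at
    least one other result, so an agreeing pair outvotes a lone outlier.
    """
    flags = []
    for i, r in enumerate(results):
        agrees = any(
            abs(r - other) <= rel_tol * max(abs(r), abs(other))
            for j, other in enumerate(results)
            if j != i
        )
        flags.append(agrees)
    return flags


# Hypothetical values: two hosts on the same OS agree closely, while the
# third, built with a different compiler, drifts slightly further.
same_os_a = 1.0000000
same_os_b = 1.0000001
other_os = 1.0000500
print(validate_quorum([same_os_a, same_os_b, other_os]))  # [True, True, False]
```

All three results may be scientifically fine; the odd one out only loses because its rounding differs a bit more than the tolerance allows.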

BRM
