You're right, it doesn't affect these cpus, and I don't know at the moment whether Athlon XPs are also affected by the performance gap.
Athlon XPs fall into the same category wrt. this problem as Intel Pentium IIIs: yes, the Windows version will run significantly slower (probably roughly 30%) compared to the Linux app. The reason is that the non-SSE2 variant of the "modf" function used by the math lib in the Windows app is very slow indeed. And no, the experimental fix mentioned above won't help, because Athlon XPs (AFAIK) don't support SSE2.
Yeah, I was talking about my Venice box. I haven't done anything about the app on my Core so far, mainly because I don't have a clue what's wrong there. But if it's an individual issue, it doesn't really matter - I don't mind running Linux (was more or less planning it anyway, just hadn't gotten around to making the effort).
I can't say anything about my AMD's performance after our "quick-tuning" yet, since the WU hasn't been sent to anyone else yet and I have no idea what it's worth (it's a 400 MHz one, does that tell you something?). The WU is at 36.6% after about 8.5 hours, which hints at a completion time of around 24 hours or so.
Good morning Annika!
The WU should be in the 300-350 credit range, I guess. The "fix" doesn't seem to level the playing field between the Windows and the Linux app completely, but it should narrow the gap.
It's 400 Hz btw, not MHz (it's somehow related to the spin rate of the pulsars we are looking for, and a pulsar spinning a few hundred million times per second would probably mean a Nobel Prize for its discoverer).
Okay, thanks for explaining. When I get home from Uni (around 7 pm), the WU should be more than half crunched, so I'll be able to get some fairly good estimates. Any idea how big the Win penalty for this kind of box usually is, so I have something to compare to? I've seen everything from people on the board writing about a 20% difference all the way to a friend's Opteron, which is a good 70% (!!!) faster under Linux.
No idea about the penalty for your Venice; I guess Michael and his database of statistics will be helpful there, but he's probably still taking a well-deserved nap. Can hardly wait to see your results!!!
CU
BRM
Hi,
my first result is finished and uploaded; also, some other members of our team have patched their app and successfully finished WUs.
My c/h rose from ~14 to ~19!
This eliminates the AMD/Win penalty. :-)
Some stats from my data:
[pre]
A64: 8.2 - 8.8 [c/(h·GHz)] Linux
A64 X2: 8.2 - 8.8 [c/(h·GHz)] Linux
A64: 4.6 - 5.2 [c/(h·GHz)] Windows
A64 X2: 4.7 - 5.2 [c/(h·GHz)] Windows
[/pre]
Because the Einstein app scales with CPU clock, there is no need to look at the different clock rates; cache size is also pretty uninteresting. The former S5R1 and S5R2 apps ran in L1 cache, and even the smaller cache of Intel cpus was big enough. In my data there are certainly some hosts which are oc'd and therefore influence the results above, but I suppose one can find them in both OS groups.
My first result equals 7.25 [c/(h·GHz)].
I should say that running BOINC natively with only one Einstein app, without CPU affinity, and one VMware Linux cruncher dedicated to one core ended up with about a 50% resource share for each task. But Task Manager showed more than 105,000,000 page faults for the Einstein Win app. VMware, in contrast, only produced 430,000 page faults after running for a couple of days. So maybe running BOINC without another full-load process dedicated to one core alongside it will improve the speed even further. Also, the example imho shows that page faults don't really bother the app and do not dramatically reduce its speed.
When we get other results, we can draw conclusions about this.
The Intel Core cpus show really big differences in my data, and therefore it's impossible to get good stats without knowing the exact clock rate.
When will one of the developers give a statement about this ugly lib issue? ;-)
cu,
Michael
Thanks for the stats, this looks really promising, doesn't it!!! I expected a 30% rise in performance.
Unless you've already done so, I'll drop Bernd an email just in case he has missed the whole discussion.
CU
BRM
Yes, looks very good. :-)
I haven't mailed Bernd, so go ahead.
Btw, I don't think this patch harms any cpus that are not SSE2 capable. There must be another switch in the code to filter out Intel SSE1 and non-SSE cpus. This will probably work on AMD too; it should be something like what's described on that web page about Intel compilers. But if there is some place where SSE1 instructions are used, this might accelerate AMD Athlon XPs too.
But this is just a guess.
cu
Michael
Yes, the detection mechanism seems to me to be as described in the article: first, detect the feature bits to check for SSE2, then check the vendor, and if it is "AuthenticAMD", reset the results just obtained from CPUID to a bare minimum. Not a nice thing to do, IMHO.
I didn't see any SSE instructions, and I doubt very much that Athlon XPs or P IIIs will see any performance increase whatsoever from changing the CPU detection code. For those platforms to reach the same levels of performance as under Linux/gcc, a better implementation of the modf function is needed.
In the meantime, I think it's a matter of courtesy to keep the number of modified clients to a minimum until Bernd OKs the change. Trying out the change was essential to verify our hypothesis, but let's wait for the official OK before everybody patches the app. If 1000 people are patching and one of them makes a mistake, it can mess up quite a few results. As a software engineer, I'd prefer that the new version be formally tested, approved, and only then released with a new version number before it's widely used, so any negative effects are traceable.
Hey guys, just a quick update. My WU has about 2 hours left; total crunching time should amount to between 20.5 and 21 hours. I still don't know the exact credit value, though. Btw, I'm getting a friend from Uni to check this with one or two of his AMD boxes, so we'll get some more results. Mailing Bernd is a great idea imo.