Intel performance question: 3.0 GHz P4 versus 3.2 GHz P4

Blank Reg
Joined: 18 Jan 05
Posts: 228
Credit: 40599
RAC: 0

P4 3.0 GHz, 512 KB L2, 1 GB memory
3/26/2005 10:53:12 PM|| Number of CPUs: 2
3/26/2005 10:53:12 PM|| 1322 double precision MIPS (Whetstone) per CPU
3/26/2005 10:53:12 PM|| 1127 integer MIPS (Dhrystone) per CPU
3/26/2005 10:53:12 PM||Finished CPU benchmarks

P4 3.0 GHz, 2 MB L2, 512 MB memory
3/26/2005 10:57:29 PM|| Number of CPUs: 2
3/26/2005 10:57:29 PM|| 1325 double precision MIPS (Whetstone) per CPU
3/26/2005 10:57:29 PM|| 1621 integer MIPS (Dhrystone) per CPU
3/26/2005 10:57:29 PM||Finished CPU benchmarks

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118567787581
RAC: 19587863

Message 9505 in response to message 9504

> P4 3.0 GHz, 512 KB L2, 1 GB memory
> 3/26/2005 10:53:12 PM|| 1127 integer MIPS (Dhrystone) per CPU
>
>
> P4 3.0 GHz, 2 MB L2, 512 MB memory
> 3/26/2005 10:57:29 PM|| 1621 integer MIPS (Dhrystone) per CPU

OK, so the L2 cache is very important for the benchmark routine as far as integer MIPS is concerned. But how about publishing the actual WU crunch times? What's the difference there??

Cheers,
Gary.

PCZ
Joined: 8 Nov 04
Posts: 11
Credit: 4203
RAC: 0

> I have Einstein@home running on two computers.
> Here are the benchmarks for these two computers taken from the Einstein@home
> webpage.
>
> 3.0 GHZ P4 running XP Pro SP1
> ----------------------------------------
> Number of CPUs 1
> Measured floating point speed 1546.38 million ops/sec
> Measured integer speed 3088.95 million ops/sec
>
> 3.2 GHZ P4 running XP Pro SP1
> -----------------------------------------
> Number of CPUs 2
> Measured floating point speed 876.72 million ops/sec
> Measured integer speed 791.93 million ops/sec
>
> An Einstein@home work unit typically completes on the 3.0 GHZ computer at
> around 28,000 CPU seconds while the 2nd presumably faster 3.2 GHZ CPU takes
> approximately 69,000 seconds.
>
> I decided to experiment and I changed the BOINC account options to process
> only one work unit at a time on the 3.2 GHZ hyperthreaded computer and the
> time went down to 50,000 seconds.
>
> Now I can see that the per-CPU floating point speed on the 3.0 GHZ processor
> is nearly twice that of the 3.2 GHZ. Taking into account
> hyperthreading, I guess this would be expected.
>
> But why the huge discrepancy with integer speed? More to the point why is
> Einstein@home slower on the 3.2 GHZ machine? 28,000 * 2 = 56,000 is still less
> than 69,000.
>
> Maybe I am totally misinterpreting something here but I’m mystified. I would
> appreciate it if someone could explain this.
>
> --Gary
>
>
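
One way to compare the two boxes is by throughput rather than by CPU time per WU. A rough sketch in Python, assuming the two hyperthreaded WUs run side by side and that CPU seconds roughly track wall-clock seconds (both are assumptions, not something reported above; the function name is made up for the example):

```python
def seconds_per_wu(cpu_seconds_per_wu, concurrent_wus):
    """Effective seconds per finished WU when several WUs run side by side."""
    return cpu_seconds_per_wu / concurrent_wus

p4_30 = seconds_per_wu(28_000, 1)         # 3.0 GHz, one WU at a time
p4_32_ht = seconds_per_wu(69_000, 2)      # 3.2 GHz, two WUs via hyperthreading
p4_32_single = seconds_per_wu(50_000, 1)  # 3.2 GHz, one WU at a time

print(p4_30, p4_32_ht, p4_32_single)
```

On those assumptions, hyperthreading does help the 3.2 GHz box (34,500 versus 50,000 seconds per finished WU), but it still trails the 3.0 GHz box at 28,000.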

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118567787581
RAC: 19587863

Message 9507 in response to message 9506

PCZ posted:-

...snip (a whole lot of irrelevant information)

which the thread starter had originally posted.

However, the point is that my question was directed at the poster immediately prior to me (not the thread starter), who had posted information showing a correlation between L2 cache size and the integer benchmark number. I presume he was reporting on his own machines and not those of the original poster.

I'm very sorry if the context I included wasn't sufficient to make that clear.

Cheers,
Gary.

Blank Reg
Joined: 18 Jan 05
Posts: 228
Credit: 40599
RAC: 0

Message 9508 in response to message 9507

> PCZ posted:-
>
> ...snip (a whole lot of irrelevant information)
>
> which the thread starter had originally posted.
>
> However the point is that I was asking a question directed at the poster
> immediately prior to me, (and not the thread starter), who had posted
> information showing a correlation between L2 cache size and the integer
> benchmark number. I presume he was reporting on his own machines and not
> those of the original poster.
>
> I'm very sorry if the context I included wasn't sufficient to make that clear.
>

P4 3.0 GHz, 512 KB L2, 1 GB memory
3/26/2005 10:53:12 PM|| Number of CPUs: 2
3/26/2005 10:53:12 PM|| 1322 double precision MIPS (Whetstone) per CPU
3/26/2005 10:53:12 PM|| 1127 integer MIPS (Dhrystone) per CPU
3/26/2005 10:53:12 PM||Finished CPU benchmarks

Here are a couple of WU times for the above CPU:
38,157.50
38,814.52

P4 3.0 GHz, 2 MB L2, 512 MB memory
3/26/2005 10:57:29 PM|| Number of CPUs: 2
3/26/2005 10:57:29 PM|| 1325 double precision MIPS (Whetstone) per CPU
3/26/2005 10:57:29 PM|| 1621 integer MIPS (Dhrystone) per CPU
3/26/2005 10:57:29 PM||Finished CPU benchmarks

and here are the times for this one:

43,336.02
44,494.02

Running SETI the tables are turned: the 2 MB L2 machine runs a WU in around 11,000 and the 512 KB L2 machine takes around 13,000.
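
To put numbers on the reversal, here is a small Python sketch that averages the E@H times listed above and compares them with the approximate SETI figures. The variable names are just for illustration, and two WUs per machine is a very small sample.

```python
from statistics import mean

# E@H WU times posted above, per machine
eah_512k_l2 = [38157.50, 38814.52]  # 512 KB L2, 1 GB memory
eah_2m_l2 = [43336.02, 44494.02]    # 2 MB L2, 512 MB memory

# Approximate SETI WU times quoted above
seti_512k_l2 = 13000
seti_2m_l2 = 11000

# Ratio of slower machine to faster machine for each project
eah_ratio = mean(eah_2m_l2) / mean(eah_512k_l2)
seti_ratio = seti_512k_l2 / seti_2m_l2

print(f"E@H: 2 MB L2 box takes {eah_ratio:.2f}x the time of the 512 KB L2 box")
print(f"SETI: 512 KB L2 box takes {seti_ratio:.2f}x the time of the 2 MB L2 box")
```

So the 2 MB L2 box is roughly 14% slower on E@H and the 512 KB L2 box is roughly 18% slower on SETI.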

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118567787581
RAC: 19587863

Message 9509 in response to message 9508

So, for E@H it would appear that main memory is more important than L2 cache, since 1 GB main memory plus 512 KB L2 gives times of ~38,500 as compared to 512 MB main memory plus 2 MB L2, which gives longer times of ~44,000.

For SETI, however, it is the opposite, with L2 cache more important.

Are there any other differences that could account for this?? I might try doubling the memory in a couple of my E@H boxes and see what happens.

Thanks for the feedback.

Cheers,
Gary.

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

Message 9510 in response to message 9509

> Are there any other differences that could account for this?? I might try
> doubling the memory in a couple of my E@H boxes and see what happens.

Yes.

1) Memory bandwidth - Motherboard has dual channel memory on one, and not the other

2) Memory speed - e.g. 2.5-3-3-8 memory timings vs. 3-3-3-8

3) "faster" work units

4) What else is running at the time.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118567787581
RAC: 19587863

Message 9511 in response to message 9510

> Yes.
.....

But he said that the situation was reversed when he ran Seti. Ponder that for a moment....

We can also presume that he wouldn't load up one machine with tasks just to create the sort of difference he is reporting. Anyway, aren't the times actual CPU seconds rather than total elapsed seconds?? I don't really know...

In a whole month of crunching on a fair number of machines, I've noticed that the WU times have been fairly consistent and in line with expectations.

Cheers,
Gary.

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

Message 9512 in response to message 9511

> But he said that the situation was reversed when he ran Seti. Ponder
> that for a moment....

Never said he did ...

But in the case of EAH we have a potentially large data set with the 10M file. If the working set of the application is over 2 MB, then the cache experiences "thrashing": you get little of the benefit of the cache, whether it is 512 KB or 2 MB.
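
To illustrate the thrashing point, here is a toy Python simulation, not a model of the real P4 cache. It assumes a fully associative LRU cache, a program that sweeps its whole working set in order over and over, and 64 KB blocks as an arbitrary unit. The 160-block case takes the "10M file" to mean roughly 10 megabytes, and the function name is made up for the example.

```python
from collections import OrderedDict

def lru_hit_rate(working_set_blocks, cache_blocks, passes=4):
    """Hit rate of an LRU cache under repeated in-order sweeps of a working set."""
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for block in range(working_set_blocks):
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)   # mark as most recently used
            else:
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses

# Sizes in 64 KB blocks: 512 KB = 8, 2 MB = 32, ~10 MB = 160 (assumed)
print(lru_hit_rate(160, 8))   # working set far larger than a 512 KB cache
print(lru_hit_rate(160, 32))  # working set far larger than a 2 MB cache
print(lru_hit_rate(24, 8))    # 1.5 MB working set, 512 KB cache
print(lru_hit_rate(24, 32))   # 1.5 MB working set, 2 MB cache
```

In this toy model, once the working set is bigger than the cache, every access misses no matter which cache size is used; the larger cache only pays off when the working set fits inside it. Real access patterns are less regular, so real hit rates would not fall all the way to zero.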

If the 1 GB system is populated with two 512 MB modules and the motherboard has dual channel memory, then the peak memory bandwidth is twice that of the same motherboard with only one stick of 512 MB ...
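
As rough arithmetic for the bandwidth point: theoretical peak bandwidth is transfers per second times bytes per transfer times number of channels. A small Python sketch, assuming DDR400 modules (400 million transfers per second on a 64-bit channel), which nobody in this thread has confirmed; the function name is made up for the example.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak memory bandwidth in MB/s (64-bit channel = 8 bytes)."""
    return mega_transfers_per_s * bus_bytes * channels

single = peak_bandwidth_mb_s(400, channels=1)  # one stick, single channel
dual = peak_bandwidth_mb_s(400, channels=2)    # two sticks, dual channel
print(single, dual)
```

These are peak figures only; what an application actually sees depends on its access pattern and is usually less than double.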

This is why it is so hard to know what is really important and what is not. To know absolutely for sure what has an effect, you make a system and then change only ONE piece and note the results. I had two processors of the same size, speed (3.2 GHz), etc. and memory size, with the only difference being the motherboard. In the cheap MB, my run times were worse than an older and slower 3.0 GHz system.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118567787581
RAC: 19587863

Message 9513 in response to message 9512

> > But he said that the situation was reversed when he ran Seti. Ponder
> > that for a moment....
>
> Never said he did ...

Never said he did what??? Sorry, I don't understand??

> If, on the 1G system it is populated with 2 512MB modules and the motherboard
> has dual channel memory, then the memory bandwidth is twice that of the same
> motherboard with only one stick of 512MB ...

I do understand what dual channel memory is. Assuming that one machine has dual channel memory and this was speeding up E@H, why was it slowing down S@H?? That's what I was asking you to ponder....

> This is why it is so hard to know what is really important and what is not.
> To know absolutely for sure what has an effect is to make a system and then
> change only ONE piece and note results. I had two processors of the same size,
> speed (3.2 GHz), etc. and memory size, with the only difference being the
> motherboard. In the cheap MB, my run times were worse than an older and
> slower 3.0GHz system.

Precisely!! No argument from me on all of this paragraph. But I'm afraid it's simply not relevant to the case we are discussing. Here, there are two boxes and zero hardware changes are being made. The one change that is being reported on, in each case, is the software application, E@H vs S@H. You would think that whichever box runs E@H faster would also run S@H faster, wouldn't you?? But there is an unusually large reversal of performance and that is what intrigues me. Only the person who posted the relative times can really throw more light on this apparent conundrum.

Cheers,
Gary.
