Quad core vs dual core

ohiomike
Joined: 4 Nov 06
Posts: 80
Credit: 6453639
RAC: 0

RE: I also programmed from

Message 71273 in response to message 71272

Quote:

I also programmed from the late '70s. The thing then was that hardware was really expensive relative to staff, so yes, you squeezed everything to fit the machine and used an army of programmers to do it. Today, the situation is the reverse: if you need another GB of RAM, you put it in; if you need a new staff member, you have to think big money.

Today though, I program mostly deeply embedded systems, often with very small microcontrollers, so again it is a case of managing your resources efficiently.


1) The biggest nightmare I had in the '70s (outside of having to break things into overlays if the code got too big for memory) was communication between the CPUs in use. We should be very happy with the progress of Ethernet. Back then I had one "master controller" running 3 "slave processors", all of which had to talk via serial link with home-made semaphores, etc.
2) In what I thought was an interesting thread, "who?" over at SETI is trying to get the entire app to run in cache to get more performance.


DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

RE: What about the bigger

Message 71274 in response to message 71271

Quote:
What about the bigger L2 caches that are accompanying the Quad Cores? Can that be a feature that can be utilized?

Probably not. Akos was able to tune the S4/S5R1 apps to fit in the L1 caches of AMD processors, and almost managed to do the same with the significantly smaller ones on Intel chips. Unless the current app is much more data hungry, the only benefit a really large L2/L3 cache is likely to provide is a reduced chance of data being swapped to main memory when a second app is using the core.
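The "does it fit in cache" question above is just arithmetic on the size of the hot working set. Here is a minimal sketch; the cache sizes are illustrative assumptions (a typical 32 KiB per-core L1 data cache and a 4 MiB shared L2 like the early quads had), not measurements of any particular CPU.

```python
L1_BYTES = 32 * 1024        # assumed per-core L1 data cache
L2_BYTES = 4 * 1024 * 1024  # assumed "big" shared L2 on a quad core

def working_set_bytes(n_doubles: int) -> int:
    """Size of an array of n 8-byte floating-point values."""
    return n_doubles * 8

def fits(n_doubles: int, cache_bytes: int) -> bool:
    """True if the array's hot data can stay resident in the cache."""
    return working_set_bytes(n_doubles) <= cache_bytes

# A 4,000-element double array (32,000 B) squeezes into a 32 KiB L1;
# a 100,000-element array (800,000 B) spills into L2 or main memory.
print(fits(4_000, L1_BYTES), fits(100_000, L1_BYTES), fits(100_000, L2_BYTES))
```

This is why a bigger L2 mostly helps as a safety net: once the tuned inner loop's data already fits in L1, extra L2 capacity only reduces evictions caused by other apps sharing the chip.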

Quote:

Also, could calculations be broken up, whereby each core takes a part of the calculation? On the machine-language level, taking a 32-bit code and breaking it into four 8-bit codes, one for each core to work on. I don't know if we are talking parallel cores at this level.

What you're describing is impossible and wouldn't gain anything even if it could be done. Each of the 32 bits of data in an operation is already computed in parallel. In theory a work unit could be broken into several parallelizable parts and run concurrently on multiple cores, but even in the best case that wouldn't gain any throughput over running multiple WUs in parallel. In reality, the overhead needed to keep the threads synchronized would result in lower throughput, and the Einstein WUs aren't large enough for a faster turnaround to be a real benefit. CPDN, on the other hand, could benefit from this sort of parallelization, but their science app is much more complex, which would make the implementation far harder to do.
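The throughput argument can be seen in a toy model: whether you split one work unit across cores or run one unit per core, the total arithmetic is identical, so splitting buys no extra throughput and additionally pays for the combine/synchronization step. The function names and workload below are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def crunch(data):
    """Stand-in for one work unit: just sum the numbers."""
    return sum(data)

def split_across_cores(data, cores):
    """One WU chopped into per-core slices, plus a combine step."""
    chunk = (len(data) + cores - 1) // cores
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        partials = list(pool.map(crunch, slices))
    # The synchronization/combine overhead lives here; it is pure cost
    # compared with running 'cores' independent WUs, one per core.
    return sum(partials)

data = list(range(10_000))
print(split_across_cores(data, 4) == crunch(data))  # same answer either way
```

Same result, same total work; the split version can only finish a single unit sooner, which matters for CPDN-length models but not for short Einstein WUs.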

Quote:

Remember too, each core also has a Math-Coprocessor, which also could be doing calculations.

Einstein (and almost all other) science apps are primarily floating-point math. All floating-point math is done on the coprocessor. The main CPU is a purely integer unit, and while it is capable of doing floating point in software, doing so is extremely slow.
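To see why software floating point is so much slower, here is a sketch of the kind of thing an FPU-less CPU has to do: emulate fractional math with integer operations, e.g. Q16.16 fixed point, where every multiply becomes an integer multiply plus a shift (real FP emulation is far worse, since it also juggles exponents, rounding, and special values). The helper names are hypothetical.

```python
FRAC_BITS = 16  # Q16.16: 16 integer bits, 16 fractional bits

def to_fix(x: float) -> int:
    """Encode a real number as a scaled integer."""
    return int(round(x * (1 << FRAC_BITS)))

def from_fix(f: int) -> float:
    """Decode a scaled integer back to a real number."""
    return f / (1 << FRAC_BITS)

def fix_mul(a: int, b: int) -> int:
    # Wide integer product, then rescale back down to Q16.16.
    return (a * b) >> FRAC_BITS

# 1.5 * 2.25 done entirely with integer operations:
product = from_fix(fix_mul(to_fix(1.5), to_fix(2.25)))
print(product)  # 3.375
```

One hardware FP multiply does all of this in a single instruction, which is why FP-heavy science apps live or die by the coprocessor.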

GoHack
Joined: 2 Jun 05
Posts: 37
Credit: 20602963
RAC: 0

RE: RE: What about the

Message 71275 in response to message 71274

Quote:
Quote:
What about the bigger L2 caches that are accompanying the Quad Cores? Can that be a feature that can be utilized?

Probably not. Akos was able to tune the S4/S5R1 apps to fit in the L1 caches of AMD processors, and almost managed to do the same with the significantly smaller ones on Intel chips. Unless the current app is much more data hungry, the only benefit a really large L2/L3 cache is likely to provide is a reduced chance of data being swapped to main memory when a second app is using the core.

Quote:

Also, could calculations be broken up, whereby each core takes a part of the calculation? On the machine-language level, taking a 32-bit code and breaking it into four 8-bit codes, one for each core to work on. I don't know if we are talking parallel cores at this level.

What you're describing is impossible and wouldn't gain anything even if it could be done. Each of the 32 bits of data in an operation is already computed in parallel. In theory a work unit could be broken into several parallelizable parts and run concurrently on multiple cores, but even in the best case that wouldn't gain any throughput over running multiple WUs in parallel. In reality, the overhead needed to keep the threads synchronized would result in lower throughput, and the Einstein WUs aren't large enough for a faster turnaround to be a real benefit. CPDN, on the other hand, could benefit from this sort of parallelization, but their science app is much more complex, which would make the implementation far harder to do.

Quote:

Remember too, each core also has a Math-Coprocessor, which also could be doing calculations.

Einstein (and almost all other) science apps are primarily floating-point math. All floating-point math is done on the coprocessor. The main CPU is a purely integer unit, and while it is capable of doing floating point in software, doing so is extremely slow.

Thx.

They were just some crazy ideas. :)

peanut
Joined: 4 May 07
Posts: 162
Credit: 9644812
RAC: 0

I just ordered a Power Mac

I just ordered a Power Mac with eight 3 GHz Xeon cores. My wallet is a whole lot lighter now. I figure you only live once; might as well go out crunching. Not that I plan on leaving this world anytime soon. Soon I'll be able to compare dual vs quad cores, although my dual cores are relatively slow at 1.83 and 1.66 GHz. I have been impressed with the crunch times of some of the fast Xeons I have run up against.
