64-bit in Einstein crunching

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

RE: But are you sure the

Message 76625 in response to message 76624

Quote:
But are you sure the apps are using the same code basis and the only difference is the 64 bit mode?

Yes. I'm sure.

The key: ABC@Home uses lots of operations on 64 bit wide integers.
These operations are 2-3 times faster in 64 bit mode.
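As a rough illustration of why 64-bit integer work costs more in 32-bit mode, here is a minimal C sketch (not ABC@Home code; both function names are made up for this example). The second function spells out the kind of 32-bit-halves arithmetic that a 32-bit target has to perform for a single 64-bit multiply, whereas an x86-64 build can do the same work in one multiply instruction.

```c
#include <stdint.h>

/* Native 64-bit multiply: a single multiply instruction on x86-64. */
uint64_t mul64_native(uint64_t a, uint64_t b)
{
    return a * b;
}

/* Low 64 bits of a*b built from 32-bit halves, roughly the work a
 * 32-bit target must do for the same result (hypothetical helper). */
uint64_t mul64_from_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t low   = (uint64_t)a_lo * b_lo;      /* 32x32 -> 64-bit product */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;  /* only the low 32 bits survive */

    return low + ((uint64_t)cross << 32);
}
```

Both functions return the same value for any inputs; the difference is only in how many instructions the 32-bit formulation needs.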

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 771028403
RAC: 1163827

RE: RE: But are you sure

Message 76626 in response to message 76625

Quote:
Quote:
But are you sure the apps are using the same code basis and the only difference is the 64 bit mode?

Yes. I'm sure.

The key: ABC@Home uses lots of operations on 64 bit wide integers.
These operations are 2-3 times faster in 64 bit mode.

Ah, I see! Not something E@H would benefit from (mostly floating point ops), tho. Number theory projects are a different story, I admit!

CU
Bikeman

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 771028403
RAC: 1163827

I have to correct myself once

I have to correct myself once more: there is one area where even floating point calculations would benefit from the 64-bit instruction set.

Some compilers tend to use a pair of 32-bit integer move instructions to copy a 64-bit double precision floating point value. This is very bad: it mixes 32-bit write accesses and 64-bit read accesses on the same data, which causes a so-called "store forwarding stall", something which is quite expensive. I see this a lot in the code the MS compiler generated for the latest Windows beta app. With 64-bit instructions, even the dumbest compiler would copy the 64-bit floats in one instruction, preventing a store forwarding stall.

Even in 32 bit mode there would be ways around this effect, but in 64 bit mode it's kind of foolproof :-)
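To make the two copy patterns concrete, here is a minimal C sketch. The function names are hypothetical, an 8-byte `double` is assumed, and an optimizing compiler is free to merge or rewrite either version, so this only shows the access pattern and is not a benchmark.

```c
#include <stdint.h>
#include <string.h>

/* Copy a double as two separate 32-bit stores (assumes sizeof(double) == 8).
 * A later 64-bit load of the result spans both stores, which is the
 * situation that can defeat store forwarding. */
double copy_via_two_32bit_moves(double x)
{
    uint32_t halves[2];
    double out;
    unsigned char *dst = (unsigned char *)&out;

    memcpy(halves, &x, sizeof halves);
    memcpy(dst, &halves[0], sizeof halves[0]);                      /* first 4 bytes  */
    memcpy(dst + sizeof halves[0], &halves[1], sizeof halves[1]);   /* second 4 bytes */
    return out;
}

/* Copy the same value as one 64-bit quantity: one store, one matching load. */
double copy_via_one_64bit_move(double x)
{
    uint64_t bits;
    double out;

    memcpy(&bits, &x, sizeof bits);
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

Both functions return a bit-exact copy of their argument; only the width of the intermediate stores differs.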

CU

Bikeman

Jesse Viviano
Joined: 8 Jun 05
Posts: 33
Credit: 133045917
RAC: 0

The biggest advantages for

The biggest advantages of using 64-bit mode are that there are double the number of registers in the register files, that SSE2 support is guaranteed, and that function arguments can usually be passed in registers.

Doubling the number of registers reduces the number of cycles wasted for these reasons:

  * The low number of registers in x86 32-bit mode forces the compiler or assembly programmer to use more move instructions, wasting cycles that would otherwise be used for computation. This also forces the CPU to use up more cache throughput, which could be used more efficiently if there were not so many instructions competing with one another for the cache. Also, with more instructions competing for the cache, the chance of a cache miss rises. Cache misses usually stall the processor until the data is fetched from a larger and slower cache or from main memory, unless the processor has some form of hardware multithreading.
  * CPU pipelines are getting really deep, forcing the use of many techniques to keep the pipelines full and the execution units busy. Unfortunately, these techniques only get you so far. The low number of registers often creates bubbles (empty slots or groups of empty slots in the pipeline) which accomplish nothing but resolving dependencies. With more registers, dependent instructions can often be spread farther apart, with other work whose data sit in the extra registers placed between them. This can either reduce the number and size of the bubbles in the pipeline or eliminate them completely, resulting in greater efficiency.

The guarantee of SSE2 speeds things up because SSE2 can do anything that the x87 FPU can do but much more efficiently. With the old x87 FPU used in 16-bit and 32-bit modes, you must make sure that the data you want to work on are in the top two locations of the FPU stack, forcing the compiler or assembly programmer to emit register exchange instructions that waste time. With SSE2, you can work on any two registers you desire, and can perform the same operation on small arrays of data if desired, raising efficiency. You also do not have to write code that checks whether SSE2 is present, which invariably wastes time and program space.
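As a small illustration of "the same operation on small arrays of data", here is a C sketch using the SSE2 intrinsics from `<emmintrin.h>`. It assumes an x86 target with SSE2 enabled (always the case on x86-64); the function name `dot_sse2` is invented for this example and is not taken from any project's code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Dot product of two double arrays, processing two doubles per step.
 * Hypothetical example function; assumes an SSE2-capable x86 target. */
double dot_sse2(const double *a, const double *b, int n)
{
    __m128d acc = _mm_setzero_pd();   /* two running partial sums */
    double lanes[2];
    double sum;
    int i = 0;

    for (; i + 1 < n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }

    _mm_storeu_pd(lanes, acc);
    sum = lanes[0] + lanes[1];

    for (; i < n; ++i)                      /* leftover element when n is odd */
        sum += a[i] * b[i];

    return sum;
}
```

Each loop iteration multiplies and accumulates two pairs of doubles at once, operating directly on named registers instead of shuffling values to the top of an FPU stack.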

Passing values to functions in the registers is much faster than passing them on the stack (a structure in memory), as is done in 32-bit mode and 16-bit mode. When a function is called in 32-bit mode or 16-bit mode, the arguments being passed are usually written to memory, and then are read from memory by the function that needs the data. In 64-bit mode, the arguments are kept in the registers unless there are too many arguments to keep in the registers under the calling convention being used, forcing the compiler or assembly programmer to push the remaining arguments onto the stack like in 32-bit or 16-bit mode.

Remember, registers are much faster than memory and caches. However, registers require huge amounts of chip real estate per bit compared to caches and RAM, so CPU designers cannot put many of them in a CPU and hope that the CPU remains inexpensive. Therefore, CPU designers must choose a good compromise between speed and cost when designing an architecture. The compiler's and the assembly programmer's responsibility is to use all of them effectively.

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 432308872
RAC: 61931

And one more stone in this

And one more stone in this way: integer computations are significantly faster than floating ones. With large registers, why not replace some FP operations (where the precision is limited) with fixed-point integer computation? Or maybe it is better to use SSE2 instead (I don't know SSE2 well enough yet)?
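For readers unfamiliar with the idea, here is a minimal C sketch of fixed-point arithmetic in a Q16.16 format (16 integer bits, 16 fractional bits). The type and function names are invented for this example, and nothing here claims that this would actually be faster than floating point on a given CPU; it only shows what replacing an FP multiply with an integer one looks like.

```c
#include <stdint.h>

typedef int32_t q16_16;          /* hypothetical Q16.16 fixed-point type */
#define Q16_ONE 65536            /* the value 1.0 in Q16.16 */

/* Convert a double to Q16.16, rounding to nearest. */
q16_16 q16_from_double(double x)
{
    return (q16_16)(x * Q16_ONE + (x >= 0 ? 0.5 : -0.5));
}

/* Convert Q16.16 back to a double. */
double q16_to_double(q16_16 q)
{
    return (double)q / Q16_ONE;
}

/* Multiply two Q16.16 values. The 64-bit intermediate holds the full
 * product before it is scaled back down (truncating toward zero). */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q16_ONE);
}
```

The 64-bit intermediate in `q16_mul` is where wider integer registers help: the full product fits in one register instead of being assembled from 32-bit pieces.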

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

RE: And one more stone in

Message 76630 in response to message 76629

Quote:
And one more stone in this way: integer computations are significantly faster than floating ones.


It was true about 5-10 years ago.

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 432308872
RAC: 61931

Let it be so. I'm familiar

Let it be so. I'm familiar with the new processor architectures, especially with new instruction subsets like SSE, SSE2, SSE3, SSSE3 and so on. But the key of this thread is that we should ask Bernd to compile a 64-bit binary. Or may be, you can try it yourself, Akos?

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

RE: But the key of this

Message 76632 in response to message 76631

Quote:
But the key of this thread is that we should ask Bernd to compile a 64-bit binary. Or may be, you can try it yourself, Akos?


You should try to ask Bernd to compile a 64-bit binary.
I can't do it. I don't have access to the sources.
I would be glad to see x86-64 code...

Shawn
Joined: 3 Mar 05
Posts: 1
Credit: 2616558
RAC: 0

I think that unless you know

I think that unless you know the code inside and out, there isn't much of a way to be sure whether it would benefit from a 64-bit recompile except by trying it out. From what I understand, porting to x64 is pretty easy. The main challenge is that I do not believe there is a way to compile Win64 binaries with GCC (that is, there is no 64-bit Cygwin). Do you guys compile the Win32 binary with Visual Studio? Something else? How hard would it be to at least test it to see if it is worth bothering with?

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

The windows boinc code is all

The Windows BOINC code is all written for the MS compiler. Bernd has previously tried (and failed) to get it to compile with GCC.
