Compiling BRP for AARCH64-Linux

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 733975652
RAC: 1285711

RE: Because the

Quote:
Because the 1.47_neon_beta is compiled for ARMv6. And the NEON engine under ARMv8 is 128 bits wide instead of 64 bits wide as on ARMv6/v7, so the compiler may be able to optimize better.

The 1.47_neon_beta app is actually compiled for ARMv7.

N30dG
Joined: 29 Feb 16
Posts: 89
Credit: 4805610
RAC: 0

RE: The 1.47_neon_beta app

Quote:
The 1.47_neon_beta app is actually compiled for ARMv7.


Oops. -.- Yes, you are right, as always.
There is no NEON on ARMv6, so it must be compiled for ARMv7.

I had only looked at this page:
https://einstein.phys.uwm.edu/apps.php
Linux running on ARMv6 (hard float), e.g. Raspberry Pi 1.47 (NEON_Beta)

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 733975652
RAC: 1285711

RE: I only have take a

Quote:

I had only looked at this page:
https://einstein.phys.uwm.edu/apps.php
Linux running on ARMv6 (hard float), e.g. Raspberry Pi 1.47 (NEON_Beta)

Yup, the first part of that string is the configured "long name" for the ARM-Linux platform on E@H, which dates back to the time when we had only an ARMv6 app.

N30dG
Joined: 29 Feb 16
Posts: 89
Credit: 4805610
RAC: 0

I get it compiled for AARCH64

I got it compiled for AARCH64 with the --enable-neon flag. I built it natively on the device using gcc 5.3.
Performance is a little better, but not by much: 40-45 ks versus 48-52 ks without --enable-neon. That is less than I expected.

But there is another huge problem: nearly half of all tasks come back invalid. This happens regardless of --enable-neon, all other flags, and the gcc version.
Maybe you can help me with that.
I don't understand why only half of the tasks come back invalid.

Here is a link to the Device:
https://einsteinathome.org/host/12251605

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 733975652
RAC: 1285711

RE: I get it compiled for

Quote:

I got it compiled for AARCH64 with the --enable-neon flag. I built it natively on the device using gcc 5.3.
Performance is a little better, but not by much: 40-45 ks versus 48-52 ks without --enable-neon. That is less than I expected.

But there is another huge problem: nearly half of all tasks come back invalid. This happens regardless of --enable-neon, all other flags, and the gcc version.
Maybe you can help me with that.
I don't understand why only half of the tasks come back invalid.

Here is a link to the Device:
https://einsteinathome.org/host/12251605

Looks like it did not produce a single valid result yet :-( .

Playing around with the gcc optimization options too aggressively can cause results that differ so much from others that the validator will not validate them. E.g. -ffast-math is kind of problematic: it enables a whole set of optimizations which can result in a loss of accuracy.
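
As a minimal illustration of why such options matter (a sketch, not taken from the BRP code): floating-point addition is not associative, and -ffast-math allows the compiler to regroup sums as if it were. The Python snippet below only mimics such a regrouping by hand; the function names and values are made up for the example.

```python
# Floating-point addition is not associative. Options such as -ffast-math
# permit the compiler to regroup operations, which can change the last bits
# of a result. This mimics one such regrouping by hand; the values are
# illustrative only and have nothing to do with the real BRP data.

def sum_left_to_right(a, b, c):
    """Evaluate (a + b) + c, the order written in the source."""
    return (a + b) + c


def sum_regrouped(a, b, c):
    """Evaluate a + (b + c), a regrouping an optimizer might choose."""
    return a + (b + c)


x = sum_left_to_right(0.1, 0.2, 0.3)
y = sum_regrouped(0.1, 0.2, 0.3)

# The two results differ in the last bit even though the math is "the same".
print(x == y, repr(x), repr(y))
```

A single difference of this size is harmless, but over millions of operations in an FFT-heavy search such differences can accumulate.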

It is also weird that the performance is not somewhat better. I wonder how well tested the AARCH64 support for fftw is.

N30dG
Joined: 29 Feb 16
Posts: 89
Credit: 4805610
RAC: 0

RE: Playing around with the

Quote:
Playing around with the gcc optimization options too aggressively can cause results that differ so much from others that the validator will not validate them. E.g. -ffast-math is kind of problematic: it enables a whole set of optimizations which can result in a loss of accuracy.


I haven't changed any of the gcc options, except for the host type and enabling two errata bug fixes. All others are the defaults from build.sh (linux-armv7neon-xcomp).
I only toyed around with the FFTW options. My first attempt had the same problem: only some results were valid.

I think there is something odd with the floating-point handling. But that is curious, because the ARM developer guide says that even NEON under AARCH64 implements the full IEEE floating-point standard for float and double.

But I had to change the GSL library version (and libxml, but that shouldn't be the problem) to get it to compile; maybe that is the problem?

So I think I will try removing the -ffast-math option and run some WUs. If these turn out valid, I will switch the options enabled by -ffast-math back on one by one, run some WUs each time, and see whether they are still valid. That takes a lot of time, but it is the only way I can find the issue. Better suggestions are welcome ;)
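
The one-by-one procedure described above can be sketched as follows. This is only an outline under assumptions: the list contains the sub-options that the GCC manual documents as being set by -ffast-math, while the base flag list `["-O3"]` and the helper name `one_at_a_time` are hypothetical and not taken from build.sh.

```python
# Sub-options that the GCC manual lists as being set by -ffast-math.
FAST_MATH_PARTS = [
    "-fno-math-errno",
    "-funsafe-math-optimizations",
    "-ffinite-math-only",
    "-fno-rounding-math",
    "-fno-signaling-nans",
    "-fcx-limited-range",
    "-fexcess-precision=fast",
]


def one_at_a_time(base_flags, candidates):
    """Yield one flag list per test build.

    Each build keeps the base flags and adds one more candidate than the
    previous build, so the first build that starts producing invalid
    results points at the option added last.
    """
    enabled = []
    for flag in candidates:
        enabled.append(flag)
        yield base_flags + enabled


# Hypothetical base flags; the real ones would come from the build script.
builds = list(one_at_a_time(["-O3"], FAST_MATH_PARTS))
for number, flags in enumerate(builds, start=1):
    print(f"build {number}: {' '.join(flags)}")
```

Each printed line corresponds to one rebuild followed by a batch of WUs, which is why the approach is slow.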

Quote:
It is also weird that the performance is not somewhat better. I wonder how well tested the AARCH64 support for fftw is.


I don't think that's a problem with FFTW's AARCH64 support. I tried compiling it for armhf, and the run times for WUs are nearly the same.
I am not using wisdom files yet. But I don't think they make a large difference in speed?

N30dG
Joined: 29 Feb 16
Posts: 89
Credit: 4805610
RAC: 0

removing the -ffast-math

Removing the -ffast-math option doesn't solve the problem. It still produces invalid tasks -.-

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 188569457
RAC: 170522

I just took a closer look at

I just took a closer look at the validation log for two random invalids of your odroid64. They do indeed show slight mathematical differences that exceed our thresholds for validation. Those are not big discrepancies, but if I take two 32-bit results, they are always closer to each other than to your 64-bit result.
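
To make the idea of a validation threshold concrete, here is a minimal sketch of a tolerance-based comparison. It is not the actual E@H validator: the function names, the relative-difference metric, the threshold, and all numbers are hypothetical and chosen only to show how two results can agree with each other while a third falls outside the tolerance.

```python
def relative_difference(a, b):
    """Relative difference between two values, symmetric in a and b."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


def results_agree(result_a, result_b, threshold):
    """True if every pair of corresponding values differs by at most threshold."""
    if len(result_a) != len(result_b):
        return False
    return all(
        relative_difference(a, b) <= threshold
        for a, b in zip(result_a, result_b)
    )


# Hypothetical numbers, not real task output.
ref_32bit_a = [1.000000, 2.500000, 3.750000]
ref_32bit_b = [1.000001, 2.500001, 3.750002]
host_64bit = [1.000100, 2.500300, 3.750400]
THRESHOLD = 1e-5  # hypothetical tolerance

print(results_agree(ref_32bit_a, ref_32bit_b, THRESHOLD))  # the two 32-bit results
print(results_agree(ref_32bit_a, host_64bit, THRESHOLD))   # 32-bit vs 64-bit result
```

With these made-up values the two 32-bit results pass the check against each other, while the 64-bit result does not pass against either of them.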

N30dG
Joined: 29 Feb 16
Posts: 89
Credit: 4805610
RAC: 0

Okay, but how to fix

Okay, but how to fix that?
There are no "unsafe" GCC options enabled. I switched the GSL library back from 2.1 to 1.16 (the first version that knows the AARCH64 host type). But this doesn't solve the problem; it still produces invalid tasks. (It is running on a different host: https://einsteinathome.org/host/12251605)

I will take a closer look at the ARM documentation on how floating point is handled in the AARCH64 instruction set.

koschi
Joined: 17 Mar 05
Posts: 86
Credit: 1693700875
RAC: 824528

What flags etc. was the 32bit

What flags etc. was the 32-bit 1.47 app compiled with?
