Because the 1.47_neon_beta is compiled for ARMv6. And the NEON engine under ARMv8 is 128 bits wide instead of 64 bits wide as on ARMv6/v7, so the compiler may be able to optimize better.
The 1.47_neon_beta app is actually compiled for ARMv7.
Oops. -.- Yes, you are right, as always.
There is no NEON on ARMv6, so it must be compiled for ARMv7.
I had only taken a look at this page:
https://einstein.phys.uwm.edu/apps.php
Linux running on ARMv6 (hard float), e.g. Raspberry Pi 1.47 (NEON_Beta)
Yup, the first part of that string is the configured "long name" for the ARM-Linux platform on E@H, which dates back to the time when we had only an ARMv6 app.
I got it compiled for AARCH64 with the -enable-neon flag. I built it natively on the device using gcc 5.3.
Performance gets a little better, but not by much: 40-45 ks against 48-52 ks without enable-neon. That's less of a gain than I expected.
But there is another huge problem: nearly half of all tasks come back invalid. This happens regardless of enable-neon, all other flags, and the gcc version.
Maybe you can help me with that.
I don't understand why only half of the tasks are invalid.
Here is a link to the Device:
https://einsteinathome.org/host/12251605
Looks like it did not produce a single valid result yet :-( .
Playing around with the gcc optimization options too aggressively can cause results that differ so much from others that the validator will not validate them. E.g. -ffast-math is kind of problematic: it enables a whole set of optimizations which can result in a loss of accuracy.
It is also weird that the performance is not somewhat better. I wonder how well tested the AARCH64 support for FFTW is.
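To illustrate why such optimizations matter: -ffast-math lets the compiler regroup floating-point operations, and floating-point addition is not associative. A minimal sketch in Python (the values are arbitrary and have nothing to do with the E@H app itself):

```python
# Floating-point addition is not associative: regrouping the same three
# terms gives two results that differ in the last bit.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

print(left_to_right)
print(regrouped)
print(left_to_right == regrouped)  # False
```

A single last-bit difference like this is harmless, but over a long computation such differences can add up.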
I haven't changed any of the gcc options, except for the host type and enabling two errata bugfixes. Everything else is the standard from build.sh (linux-armv7neon-xcomp).
I only toyed around with the FFTW options. My first attempt had the same problem: only some results are valid.
I think there is something odd with the float handling. But that's curious, because the ARM developer guide says that even NEON under AARCH64 implements the full IEEE floating-point standard for float and double.
But I had to change the GSL library version (and libXML, but that shouldn't be the problem) in order to get it compiled. Maybe that is the problem?
So I think I will try removing the -ffast-math option and run some WUs. If those validate, I'll switch the options enabled by -ffast-math back on one by one, run some WUs each time, and see whether they still validate. That takes a lot of time, but it's the only way for me to find the issue. Better suggestions are welcome ;)
I don't think that's a problem with FFTW's AARCH64 support. I've tried compiling it for armhf, and the run times for WUs are nearly the same.
I don't use wisdom files so far. But I don't think there is a large speed difference from using them?
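The one-by-one re-enabling described above could be organized roughly like this. It is only a sketch: the list holds the main options that the GCC manual says -ffast-math turns on (the others listed there are left out for brevity), and the function name is made up for illustration. How the flags get passed to build.sh is not shown.

```python
# Main sub-options enabled by -ffast-math according to the GCC manual.
FAST_MATH_PARTS = [
    "-fno-math-errno",
    "-funsafe-math-optimizations",
    "-ffinite-math-only",
    "-fcx-limited-range",
]


def incremental_flag_sets(parts):
    """Yield flag lists that enable one more option at each step."""
    for count in range(len(parts) + 1):
        yield parts[:count]


# Build and test with each set in turn; the first step that starts
# producing invalid WUs points at the option added in that step.
for step, flags in enumerate(incremental_flag_sets(FAST_MATH_PARTS)):
    print(step, " ".join(flags) or "(no fast-math options)")
```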
Removing the -ffast-math option doesn't solve the problem. It still produces invalid tasks -.-
I just took a closer look at the validation log for two random invalids of your odroid64. They do indeed show slight numerical differences that exceed our thresholds for validation. Those are not big discrepancies, but if I take two 32-bit results, they are always closer to each other than to your 64-bit result.
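As a rough picture of what "exceeding the threshold" means, here is a sketch of a tolerance comparison. The threshold and all three numbers are invented for illustration; they are not taken from the validation log, and the real validator logic is not shown in this thread.

```python
import math

REL_TOL = 1e-5  # hypothetical threshold, not the real E@H value


def agree(a, b, rel_tol=REL_TOL):
    """Return True if two results match within the relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


# Invented example values for one quantity reported by three hosts.
result_32bit_a = 1.000000
result_32bit_b = 1.000004
result_64bit = 1.000020

print(agree(result_32bit_a, result_32bit_b))  # True
print(agree(result_32bit_a, result_64bit))    # False
print(agree(result_32bit_b, result_64bit))    # False
```

The two 32-bit results validate against each other, while the 64-bit result lies just outside the tolerance of both, even though the absolute difference is tiny.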
Okay, but how can that be fixed?
There are no "unsafe" GCC options enabled. I switched the GSL library back from 2.1 to 1.16 (the first version that knows the AARCH64 host type). But this doesn't solve the problem; it still produces invalid tasks. (Running on a different host: https://einsteinathome.org/host/12251605)
I will take a closer look at the ARM documentation on how floating point is handled under the AARCH64 instruction set.
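One thing that might be worth checking there, purely as a guess: AARCH64 always has fused multiply-add, and per the GCC manual -ffp-contract=fast is the default in GNU mode, so a*b+c may be rounded once instead of twice. Both variants are valid IEEE arithmetic, but they can give different results. Whether this is what happens in the E@H app is not verified. The sketch below emulates the fused result with exact fractions, since the values are chosen only to make the effect visible:

```python
from fractions import Fraction

# Values picked so that a*b is not exactly representable as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: a*b is rounded to 1.0 before c is added.
separate = a * b + c

# Emulated fused multiply-add: compute exactly, round only once at the end.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print(separate)  # 0.0
print(fused)     # equals -(2.0**-60)
```

If this turns out to be relevant, building with -ffp-contract=off would be one way to test it.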
What flags etc. was the 32bit 1.47 app compiled with?