What flags etc. were the 32-bit 1.47 app compiled with?
That would be nice to know.
I still haven't solved the "invalid" problem. I've tried so many options (cross-compile, native compile, GCC 5.3, GCC 4.9, for armhf or AARCH64), and nothing works -.-
Even if I only use build.sh without modifications, I get invalid tasks. It's so frustrating -.-
If no one else has a good idea, I think I'll stop at this point. There are no options left for me.
Okay, I've had some success :). I have located the "invalid" problem: it occurs when I switch the arch type to ARMv8.
I've made a 32-bit version that is faster than 1.47beta ;P. It took ~28000 s against ~30000 s for the 1.47neon. But this version is still compiled for ARMv7.
And I've made a 64-bit version that sadly still produces invalid tasks but gives me a really good time of ~24000 s.
If I can get the invalid problem handled, this would be brilliant.
Both versions are still not highly optimized; I think I can squeeze some seconds out of them. No wisdom is activated yet, and there are some "aggressive" flags I want to try.
But first I'll try to solve the invalid problem.
I was able to fix the "invalid" problem in my AARCH64 app. I had to modify the source. After days of dumping out data and comparing it with my working ARMv7 app (now I know what the BRP app really does ;)), I was able to locate the problem. It was mainly caused by sine and square root. It seems that sin(x) and sqrt(x) are less precise when compiled for ARMv8-A. I worked around that by switching sin to sinl and sqrt to sqrtl. That costs some seconds but delivers valid results.
It's still really fast: ~24 ksec without a wisdom.
Now I'm starting the fine-tuning: trying some compiler flags, creating a wisdom, and maybe rewriting some lines of code in order to achieve the best possible performance.
What's the status of your little project? I now also have my C2 and will build the BRP app for native AARCH64. In the end, which version of FFTW did you use? Were there any other changes besides the build.sh and sin/sqrt changes?
I've made some progress, as you can see from my latest tasks (4 in parallel): https://einsteinathome.org/host/12260646/tasks
My version runs below 15 ksec :)
I've rewritten some parts of the code (mainly the resampling) in order to get it vectorized and take advantage of NEON. And I'm not finished; there is still a lot of potential left. BTW: this should be possible for SSE too.
I think I'll finish it in one or two weeks. The goal is to get it always below 4 h run time on the ODROID-C2.
But my changes are too extensive to write down here.
Maybe you can simply use my version? My original plan was to upload it somewhere and make it public here in the forum (once it's finished). But if we can upload it to your BOINC app server, we will reach more devices :) And my version is really fast ;P
Besides that, here are the minimum changes to get a working copy:
build.sh:
- LIBXML-Version 2.9.3
einsteinbinary-Makefile:
- demod_binary_resamp_cpu.c: -ffp-contract=off
source:
- sqrt -> sqrtl
- sin -> sinl
But that seems to be a problem with my LibDC, and it is maybe not needed for you.
I'm running Odrobian Vanilla and not this creepy Ubuntu from Hardkernel.
FFTW-Version:
3.3.3 seems to be the fastest, but 3.3.2 works also.
Hopefully this list is complete; I have the feeling that I missed something ;)
We would be very interested in your changes. Right now I'm using the preinstalled Ubuntu Xenial 16.04 image and can't get FFTW3 compiled on the C2. I tried 3.3.2, 3.3.4 and the master branch from the GitHub repository. I'll try some more or use the FFTW build from Ubuntu (if available).
Have you tried adding NEON_CFLAGS="-D__ARM_NEON__" to FFTW's configure? No further changes are required. And use gcc >= 4.9.3. I got the best results with gcc 5.3.x.
I can provide you with my source files. But give me some time to add some comments ;)
I'm using special NEON instructions from .
This really produces unreadable code ;). It's close to ASM.
I do this in two steps:
1: Unroll the loop in a special way and separate the code into two parts: vectorizable and unvectorizable.
2: Use NEON instructions (I think they are available for SSE too).
I have saved both files so you can see how it's done.
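To make the two steps a bit more concrete, here is a small sketch of the general technique. It is not the code from the app: the kernel resample_sketch, the helper gather_one and the formula pos = shift + t * scale are invented for illustration, and the use of the arm_neon.h intrinsics is an assumption. The loop is unrolled by four, the arithmetic (vectorizable) is done with NEON where available, and the data-dependent lookup (unvectorizable) stays scalar. Without NEON it falls back to plain C, so it also builds on a PC.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Unvectorizable part: a data-dependent lookup, clamped to the source range.
   Assumes src_len > 0 and a finite pos of moderate size. */
static float gather_one(const float *src, size_t src_len, float pos)
{
    size_t j = (pos <= 0.0f) ? 0 : (size_t)pos;
    if (j >= src_len)
        j = src_len - 1;
    return src[j];
}

/* Hypothetical kernel: out[i] = src[floor(shift + t[i] * scale)]. */
static void resample_sketch(const float *t, const float *src, size_t src_len,
                            float *out, size_t n, float scale, float shift)
{
    size_t i = 0;
    /* Step 1: unroll by four so the arithmetic forms one vector operation. */
    for (; i + 4 <= n; i += 4) {
        float pos[4];
#if SKETCH_HAVE_NEON
        /* Step 2: vectorizable part with NEON: pos = shift + t * scale. */
        float32x4_t v = vmlaq_f32(vdupq_n_f32(shift), vld1q_f32(t + i),
                                  vdupq_n_f32(scale));
        vst1q_f32(pos, v);
#else
        for (size_t k = 0; k < 4; ++k)
            pos[k] = shift + t[i + k] * scale;
#endif
        /* Scalar part: the lookups depend on the computed positions. */
        for (size_t k = 0; k < 4; ++k)
            out[i + k] = gather_one(src, src_len, pos[k]);
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i)
        out[i] = gather_one(src, src_len, shift + t[i] * scale);
}
```

Keeping the lookup out of the vector part is the point of the split: the compiler or the intrinsics only ever see straight-line arithmetic on four values at a time.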
I found the problem I was running into: I forgot to remove a configure option (--enable-maintainer-mode) that is needed when building the master branch directly. I can now build FFTW 3.3.2 by patching the two files in this commit: Double precision Neon SIMD for aarch64 and patching the configure script accordingly.
I'm now also building the rest of the needed software directly on the C2 before I switch to the fast FFTW version you suggested.
It seems I also have to remove some of the packages that the Hardkernel guys preinstalled on the Ubuntu Mate image, because my 8 GB eMMC is almost full.
Source code will do, so I can compare it with the released code. If you feel more comfortable adding German comments, feel free to do so.
Quote:
I found the problem I was running into: I forgot to remove a configure option (--enable-maintainer-mode) that is needed when building the master branch directly. I can now build FFTW 3.3.2 by patching the two files in this commit: Double precision Neon SIMD for aarch64 and patching the configure script accordingly.
I don't recommend doing that; simply add NEON_CFLAGS="-D__ARM_NEON__" as an option to FFTW's configure and you're ready. Nothing more is needed. I've tried applying these patches too, and it's not worth the trouble. The result is the same as adding this flag.
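A small aside on what that flag does: -D__ARM_NEON__ only predefines a preprocessor macro. My guess, which is not confirmed anywhere in this thread, is that the FFTW sources test for this spelling while the AArch64 compiler defines the spelling without the trailing underscores. The sketch below (the function name neon_macro_mask is made up) reports which of the two spellings the compiler sets on a given board.

```c
/* Returns a bitmask: bit 0 is set if __ARM_NEON__ is defined,
   bit 1 is set if __ARM_NEON is defined. */
static int neon_macro_mask(void)
{
    int mask = 0;
#ifdef __ARM_NEON__
    mask |= 1;
#endif
#ifdef __ARM_NEON
    mask |= 2;
#endif
    return mask;
}
```

Compiling this once with and once without -D__ARM_NEON__ shows whether the flag changes anything for a given gcc version.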
I had no problem with binutils but had to add the build type to the configure commands for gsl and libxml because config.guess is too old.
Quote:
einsteinbinary-Makefile:
- demod_binary_resamp_cpu.c: -ffp-contract=off
I don't understand this change. Do you mean I should add the compiler option to the demod_binary_resamp_cpu.o target?
Quote:
source:
- sqrt -> sqrtl
- sin -> sinl
But that seems to be a problem with my LibDC, and it is maybe not needed for you.
I'm running Odrobian Vanilla and not this creepy Ubuntu from Hardkernel.
I didn't change this, as I want to see whether I run into the same problems as you did.
Quote:
FFTW-Version:
3.3.3 seems to be the fastest, but 3.3.2 works also.
I'm currently running a version (via app_info) that was built using 3.3.2 to get a baseline, but then I will try one using 3.3.3 next.