Looks like both sides flunked the bet. There is no sign of any software for the Raspberry Pi or the Parallella, and today, the 15th of September, is both the deadline and a birthday.
Hello, I have a Raspberry Pi B+ and I want to ask whether there is a way to build the EaH client for ARM.
I want to test whether the GPU FFT API could be used for the 3*2^22 FFT of EaH... :)
Thank you,
I would PM BikeMan; he was supposed to be working on this, but I suspect he got distracted. The source should be available from http://einstein.phys.uwm.edu/license.php
Hello,
The good news is that I managed to build the EaH client for the Raspberry Pi.
I downloaded a test workunit from here and ran the following command on both the Raspberry Pi and an x86_64 PC:
./einsteinbinary_XXXX-pc-linux-gnu -t ./test/templates_400Hz_2_short2.bank -l zaplist_232.txt -A 0.04 -P 3.0 -W -z -i ./test/J1907+0740_dm_482.binary -c status_profile.cpt -o results_profile.cand
with a short template bank (because of the Raspberry Pi) [templates_400Hz_2_short2.bank].
And now the bad news.
The Raspberry Pi @ 1GHz is almost 16 times slower than an AMD FX-8350 @ 4GHz.
The really bad news is that some numbers in the results [results_profile.cand] from the Raspberry Pi differ (at the 4th fractional digit) from the results of the FX-8350. I built the x86_64 client on two other PCs (AMD and Intel) and their results are the same as those of the FX-8350.
Is this normal for the Raspberry Pi client?
Is it possible that the compilation flags I added for the cross-compile produce these errors?
I use these flags for the compilation:
CFLAGS="-march=armv6zk -mcpu=arm1176jzf-s -mtune=arm1176jzf-s -mfpu=vfp -mfloat-abi=hard"
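For anyone who wants to quantify how far two result files are apart, here is a minimal sketch of a tolerance-based comparison. It assumes the candidate files are plain text with whitespace-separated tokens, which this thread does not confirm. The name compare_candidates and the tolerance values are hypothetical, and nothing here reflects how the project's own validator decides whether results agree.

```python
import math

def compare_candidates(text_a, text_b, rel_tol=1e-3, abs_tol=0.0):
    """Return (line_no, token_a, token_b) for tokens that differ beyond tolerance.

    Assumption: both inputs are text with whitespace-separated tokens.
    Numeric tokens are compared with a tolerance, everything else exactly.
    """
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    if len(lines_a) != len(lines_b):
        raise ValueError("inputs have different numbers of lines")
    mismatches = []
    for line_no, (la, lb) in enumerate(zip(lines_a, lines_b), start=1):
        tokens_a, tokens_b = la.split(), lb.split()
        if len(tokens_a) != len(tokens_b):
            # Different column counts: report the whole line.
            mismatches.append((line_no, la, lb))
            continue
        for a, b in zip(tokens_a, tokens_b):
            try:
                same = math.isclose(float(a), float(b),
                                    rel_tol=rel_tol, abs_tol=abs_tol)
            except ValueError:
                same = (a == b)  # non-numeric token
            if not same:
                mismatches.append((line_no, a, b))
    return mismatches

# Made-up sample values that differ only in the 4th fractional digit.
pi_run = "1.2345 10.0001\n"
pc_run = "1.2346 10.0002\n"
print(compare_candidates(pi_run, pc_run, rel_tol=1e-3))
print(compare_candidates(pi_run, pc_run, rel_tol=1e-6))
```

To use it on real files, read each file into a string (for example with open(path).read()) and pass both strings in. A loose tolerance shows whether the differences are only rounding-level, while a tight one lists every differing number.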
After running the same test workunit with several official EaH clients on different architectures (x86_64, ARM, CUDA), I get different results. I guess that this is normal for EaH (???).
The main problem now is that the ARM client I built produces different results than the official ARM client. This is most probably due to the different CFLAGS I used.
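One general reason why different builds can disagree in the last digits is that floating-point addition is not associative, so a compiler or library that reorders operations can change the rounding. Whether that is what happens in the EaH client is an assumption and is not confirmed in this thread. The sketch below only illustrates the effect in single precision; the helper names f32 and sum_f32 are hypothetical.

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def sum_f32(values):
    """Sum left to right, rounding to single precision after every addition."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc

# The same four numbers, added in two different orders.
order_a = [1.0e8, 1.0, -1.0e8, 1.0]
order_b = [1.0, 1.0, 1.0e8, -1.0e8]

print(sum_f32(order_a))  # 1.0: the first 1.0 is lost next to 1e8, the last one survives
print(sum_f32(order_b))  # 0.0: 2.0 is lost when it is added to 1e8
```

The exact answer is 2.0 in both cases, and neither order reaches it. Real workloads show much smaller discrepancies, but the mechanism is the same.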
Hope you have some success. I have two Pis running Einstein@Home 24/7 and would love somebody to exploit the FFT library so that they run at a more reasonable rate.
Hello, I am trying to measure the processing time for the FFT on the Raspberry Pi, and I see (file: demod_binary_fft_fftw.c) that the FFT inputs are nsamples==6M and fft_size==3M. The FFTW plan is an R2C FFT of 6M points (nsamples), which outputs 3M complex points (half of the FFT output).
My question is: is the R2C FFT size 3*2^22 or 3*2^21?
I am trying to implement this FFT using the GPU on the Raspberry Pi, which supports up to 1M-point C2C FFTs. I have implemented the Radix-2, Radix-3 and C2C-to-R2C stages, and I am trying to measure the potential speedup I can get with the GPU FFT, so the fft_size is crucial to the measurements...
Is the fft_size fixed, or does it vary with the WU?
Thank you,
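The stage structure described in the post above (a radix-3 stage plus a C2C-to-R2C stage on top of a fixed-size C2C FFT) can be sketched at toy scale as follows. This is not the poster's implementation. The function names dft, c2c_radix3 and r2c_from_c2c are hypothetical, and a naive O(n^2) DFT stands in for the GPU's fixed-size C2C FFT. The sketch uses 24 real points, so the C2C transform has 12 points and is built from three 4-point transforms.

```python
import cmath

def dft(z):
    """Naive DFT; a stand-in for the fixed-size hardware C2C FFT."""
    n = len(z)
    return [sum(z[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def c2c_radix3(z, base_fft=dft):
    """Length-3m C2C DFT assembled from three length-m C2C DFTs."""
    n = len(z)
    m = n // 3
    # Sub-transforms of the samples at indices 3j, 3j+1 and 3j+2.
    parts = [base_fft(z[r::3]) for r in range(3)]
    return [sum(cmath.exp(-2j * cmath.pi * r * k / n) * parts[r][k % m]
                for r in range(3))
            for k in range(n)]

def r2c_from_c2c(x, c2c=c2c_radix3):
    """Length-n R2C DFT (n even) from one length-n/2 C2C DFT.

    Returns the n/2 + 1 non-redundant output bins.
    """
    n = len(x)
    m = n // 2
    # Pack even samples into the real part, odd samples into the imaginary part.
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(m)]
    zf = c2c(z)
    out = []
    for k in range(m + 1):
        a = zf[k % m]
        b = zf[(m - k) % m].conjugate()
        even = (a + b) / 2    # DFT of the even-indexed samples
        odd = (a - b) / 2j    # DFT of the odd-indexed samples
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out

# Compare against a direct DFT of the same 24 real samples.
x = [((7 * i) % 11) - 5.0 for i in range(24)]
reference = dft(x)[:13]
result = r2c_from_c2c(x)
print(max(abs(a - b) for a, b in zip(reference, result)))
```

A radix-2 stage would follow the same pattern as c2c_radix3, with two sub-transforms instead of three. At full size, the twiddle multiplications and the extra passes over memory are part of the cost, so they belong in any speedup measurement.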
Check the following message http://einsteinathome.org/node/196560&nowrap=true#128680
It suggests to me that it computes a 3*2^22 real-to-complex DFT.
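To make the sizes in this discussion concrete, here is a small arithmetic sketch. It takes M to mean 2^20 points and assumes the real-to-complex transform is computed through one half-length C2C transform that is then split by radix-3 and radix-2 stages until each piece fits the hardware limit. The function r2c_plan is hypothetical and only counts sizes; it does not compute an FFT.

```python
def r2c_plan(n_real, max_c2c):
    """Size bookkeeping for an n_real-point R2C FFT built from small C2C FFTs.

    Returns (output_bins, radix_stages, sub_fft_len, sub_fft_count).
    """
    if n_real % 2:
        raise ValueError("n_real must be even for the half-length C2C trick")
    output_bins = n_real // 2 + 1   # non-redundant complex outputs
    c2c_len = n_real // 2           # one C2C transform of half the length
    stages = []
    count = 1
    for radix in (3, 2):
        while c2c_len > max_c2c and c2c_len % radix == 0:
            stages.append(radix)
            c2c_len //= radix
            count *= radix
    if c2c_len > max_c2c:
        raise ValueError("cannot reach max_c2c with radix-3 and radix-2 stages")
    return output_bins, stages, c2c_len, count

M = 2 ** 20
for n in (3 * 2 ** 21, 3 * 2 ** 22):
    print(n, n // M, r2c_plan(n, max_c2c=2 ** 20))
# 3*2^21 = 6291456  (6M):  one radix-3 stage over three 2^20-point C2C FFTs
# 3*2^22 = 12582912 (12M): radix-3 and radix-2 stages over six 2^20-point C2C FFTs
```

With a limit of 2^21 points instead of 2^20, the 3*2^22 case needs only the radix-3 stage: r2c_plan(3 * 2 ** 22, 2 ** 21) gives three 2^21-point transforms.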
I took a WU from a running system (ARM) and the fft_size is a 12M-point R2C.
Most probably there are units that need 12M-point FFTs and units that need 6M-point FFTs (or I had very old units for testing).
I took some timing measurements and I see that the FFT processing time (FFTW) is about 58% of the total template loop processing time on a Raspberry Pi @ 1GHz and about 63% on a Parallella board. This difference is most probably due to the NEON engine in the Parallella's ARM processor.
Thank you,
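The measured fractions put an upper bound on what a faster FFT alone can achieve. As a rough sketch using Amdahl's law, and assuming the rest of the template loop is unchanged and transfers to and from the GPU are free (neither is claimed in the post), the overall loop speedup can be estimated as below. The FFT speedup factors are hypothetical inputs, not measurements.

```python
def overall_speedup(fft_fraction, fft_speedup):
    """Amdahl's law: loop speedup when only the FFT part gets faster."""
    return 1.0 / ((1.0 - fft_fraction) + fft_fraction / fft_speedup)

for name, fraction in (("Raspberry Pi", 0.58), ("Parallella", 0.63)):
    limit = 1.0 / (1.0 - fraction)  # bound for an infinitely fast FFT
    print(name, "upper bound:", round(limit, 2))
    for s in (2, 4, 10):            # hypothetical FFT speedup factors
        print("  FFT x", s, "-> loop x", round(overall_speedup(fraction, s), 2))
```

With 58% of the loop time in the FFT, the loop cannot get more than about 2.4 times faster however fast the GPU FFT is, and with 63% the bound is about 2.7 times.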
I see the new version of the Pi's FFT library now supports 2^21. Does that mean it's now closer, but no cigar? See https://github.com/raspberrypi/firmware/tree/master/opt/vc/src/hello_pi/hello_fft