Hello fellow crunchers,
I created a faster version of the BRP4G application. I know that CPU apps do not generate much credit, but I did it anyway for personal satisfaction.
You can expect a speedup of up to 50% if you don't hit the memory speed limit. For example, the time to complete when running 4 tasks can go down from 8600 seconds each to 5600 seconds each. I did not check the benefits in all-cores mode, but it should still be a bit faster than the original.
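For anyone comparing their own numbers, here is a small sketch of the arithmetic behind the 8600 s vs 5600 s example. The function name and the two ways of expressing the gain are my own choice for illustration, not something the app reports.

```python
def speedup_summary(old_seconds: float, new_seconds: float) -> dict:
    """Express a runtime change as time saved and as extra throughput."""
    # Share of the original runtime that is no longer needed.
    time_saved = (old_seconds - new_seconds) / old_seconds
    # Extra tasks completed in the same wall-clock time.
    throughput_gain = old_seconds / new_seconds - 1.0
    return {
        "time_saved_pct": round(100 * time_saved, 1),
        "throughput_gain_pct": round(100 * throughput_gain, 1),
    }


summary = speedup_summary(8600, 5600)
print(summary)  # {'time_saved_pct': 34.9, 'throughput_gain_pct': 53.6}
```

So the "up to 50%" figure corresponds to throughput (tasks per day), while the runtime of a single task drops by roughly a third.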
Optimized version:
You need an SSE4-capable processor. Every mainstream AMD / Intel processor produced in the last 10 years has this instruction set.
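If you are unsure about your CPU, a quick check on Linux could look like the sketch below. It assumes the usual x86 `/proc/cpuinfo` layout with a `flags` line, and it assumes that the `sse4_1` / `sse4_2` flag names are what "SSE4-capable" refers to; the post does not say which SSE4 level the app needs.

```python
from pathlib import Path

# Assumption: these are the flag names Linux uses for SSE4.1 and SSE4.2.
SSE4_FLAGS = {"sse4_1", "sse4_2"}


def sse4_flags(cpuinfo_text: str) -> set:
    """Return the SSE4 flags listed on the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return SSE4_FLAGS & set(value.split())
    return set()


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.is_file():
    found = sse4_flags(cpuinfo.read_text())
    print("SSE4 flags reported:", sorted(found) or "none")
else:
    print("No /proc/cpuinfo here; use a CPU info tool for this OS instead.")
```

On Windows, a CPU information utility shows the same instruction set list.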
You need to set up an Anonymous platform.
- Set the project to "no new tasks" and wait for it to finish the cache of work. You can skip this step: the E@H scheduler will probably resend you the tasks, but they will start from scratch.
- Stop BOINC (close / kill program).
- Download https://drive.google.com/file/d/1dLO9oi7NRr5yl2xftYEmRjfbKewk1-1q/view?usp=sharing, choose the windows or linux subdirectory, and unpack the applications and app_info.xml into the einstein.phys.uwm.edu directory.
- Use provided app_info.xml as a starting point.
- If you want to use other applications, you need to add them to this file yourself. See the file sched_reply_einstein.phys.uwm.edu.xml for hints about names. You also need the original binaries for the other applications; copy them somewhere else temporarily so that BOINC doesn't delete them (just in case).
- Run the wisdom_generator application. It will take a long time (about 1 hour on a Ryzen 9 7950X; single-core performance matters). The application will generate a BRP4.wisdom (FFTW wisdom) file in the current directory. Ensure that the file is in the einstein.phys.uwm.edu directory, alongside the applications.
- FFTW wisdom is a kind of precalculated plan adjusted to your hardware for optimal performance. The default plan is not good enough, but it is generated in seconds.
- Start BOINC and check the logs to confirm that the anonymous platform was recognized without errors.
- If the wisdom file was loaded correctly, the task log should contain the line "Using FFTW wisdom from file ../../projects/einstein.phys.uwm.edu/BRP4.wisdom."
- ???
- Profit ;)
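The two checks from the steps above (files in place, wisdom line in the task log) can be sketched like this. The helper names are made up for illustration, and you have to supply your own BOINC project directory and task log text; only the file names and the log message come from the steps above.

```python
import tempfile
from pathlib import Path

# File names and log message as given in the steps above.
REQUIRED_FILES = ("app_info.xml", "BRP4.wisdom")
WISDOM_LOG_MARKER = "Using FFTW wisdom from file"


def missing_files(project_dir) -> list:
    """List required files that are not present in the project directory."""
    project_dir = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (project_dir / name).is_file()]


def wisdom_was_loaded(task_log_text: str) -> bool:
    """True if the task log contains the wisdom-loaded message."""
    return any(WISDOM_LOG_MARKER in line for line in task_log_text.splitlines())


# Demo in a throwaway directory; point missing_files() at your real
# einstein.phys.uwm.edu directory instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "einstein.phys.uwm.edu"
    demo_dir.mkdir()
    (demo_dir / "app_info.xml").write_text("<app_info/>")
    print(missing_files(demo_dir))  # ['BRP4.wisdom'] because it was not generated yet
```

An empty list from `missing_files()` plus `True` from `wisdom_was_loaded()` means the setup is working as described.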
Some speedup with original application (Linux only):
If you don't want to set up an anonymous platform, you can still get part of the speedup by generating a wisdom file. You can expect up to 30% speedup if you don't hit the memory speed limit (I did not measure the speedup thoroughly, but it was noticeable).
Windows users, you are out of luck. The original application can load a system-provided wisdom file, but only on Linux.
- Download the wisdom_generator app: https://drive.google.com/file/d/1gQPaJDCpUf2hOc7Qrrs6Id1oOk7z_0R9/view?usp=drive_link. This version of wisdom_generator uses the same compiler flags and FFTW version as the original application.
- Run the wisdom_generator application. It will take a long time (about 1 hour on a Ryzen 9 7950X; single-core performance matters). The application will generate a BRP4.wisdom (FFTW wisdom) file in the current directory. Rename it to wisdomf and place it in the default system location: /etc/fftw/wisdomf.
- FFTW wisdom is a kind of precalculated plan adjusted to your hardware for optimal performance. The default plan is not good enough, but it is generated in seconds.
- You don't have to restart anything. The wisdom file will be picked up by currently running tasks when they start processing the next input file (the app processes 8 input files sequentially). However, to get the benefits instantly you can restart BOINC.
- The original application does not log any information about loading the wisdom file, but if you watch the completion percentage in the BOINC Manager, you will see a difference.
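The rename-and-place step can also be scripted. This is only a sketch: `install_wisdom` is a made-up helper, and writing to the real /etc/fftw/wisdomf needs root, so the demo below copies into a temporary directory instead.

```python
import shutil
import tempfile
from pathlib import Path

# Default system location named in the steps above.
SYSTEM_WISDOM = Path("/etc/fftw/wisdomf")


def install_wisdom(source, destination=SYSTEM_WISDOM) -> Path:
    """Copy the generated wisdom file to the destination, creating the directory."""
    source, destination = Path(source), Path(destination)
    if not source.is_file():
        raise FileNotFoundError(f"wisdom file not found: {source}")
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    return destination


# Demo with placeholder content; for real use, run as root:
#   install_wisdom("BRP4.wisdom")
with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp) / "BRP4.wisdom"
    generated.write_text("placeholder, not real wisdom\n")
    target = install_wisdom(generated, Path(tmp) / "etc" / "fftw" / "wisdomf")
    print(target.name, target.is_file())  # wisdomf True
```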
On a side note, I think that some of the improvements could be incorporated into the official app to give everyone a few percent speedup.
I don't run CPU work here, but nice to see someone else out here optimizing apps :)
_________________________________________________________________________
Well done PERL,
I have a question for PERL: where is the source code for the Einstein applications?
I am also interested in optimizing the code, but I could not find a git repository for the Einstein applications.
Thanks
Karel Simecek
Okay, I think I got it running on my AMD 1600AF. Initial estimated time to completion was over 21 hours.
EDIT: I'm running all 12 threads. A (very) quick look seems to suggest they'll finish after 10 or 11 hours. Once they're done, I'll go back to the project app and compare.
The first 12 tasks running the optimized app have finished. I noticed that CPU time is higher than elapsed time. Having checked the properties tab for the tasks when they were around 90% completion, hourly progress was between 10.6-10.8%.
DRAM is at stock frequency (2133), as is the CPU (3.2 GHz, Turbo Core disabled).
I have 12 tasks now running the standard app. After 30 minutes, the hourly progress has dropped to around 7.2-7.56%.
The results are in. I only ran 12 tasks with each app, but I don't believe there is much variability between BRP4G tasks. It seems like a very decent speed increase: around 30% judging by elapsed time. I can't compare CPU time, since the reported CPU time with the optimized app was higher than the elapsed time. There was no increase in power consumption from what I could tell via HWiNFO (I did not plug in the wattmeter). I'll use this app whenever this computer is running BRP4G.
Make this app official!
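The roughly 30% figure can be cross-checked against the hourly progress rates quoted earlier in the thread (10.6-10.8% per hour with the optimized app, 7.2-7.56% per hour with the standard app). This is a rough sketch that assumes progress is linear and uses the midpoint of each range, which is a simplification.

```python
def hours_per_task(progress_pct_per_hour: float) -> float:
    """Estimated elapsed hours for one task at a constant progress rate."""
    return 100.0 / progress_pct_per_hour


optimized_rate = (10.6 + 10.8) / 2   # % per hour, midpoint of the quoted range
standard_rate = (7.2 + 7.56) / 2     # % per hour, midpoint of the quoted range

optimized_hours = round(hours_per_task(optimized_rate), 1)
standard_hours = round(hours_per_task(standard_rate), 1)
# Fraction of elapsed time saved by the optimized app.
time_saved_pct = round(100 * (1 - standard_rate / optimized_rate), 1)

print(optimized_hours, standard_hours, time_saved_pct)  # 9.3 13.6 31.0
```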
Thank you for looking into that!
If it's just a matter of compiling the app (actually FFTW, I think) with SSE4 support and allowing it to load a wisdom file, we could certainly build an official app that does this, also for Windows and Mac. We did provide for loading wisdom in the FGRP app; I just need to dig that out.
Whenever I find the time, though.
BM
Sounds great!
>>PERL:
The wisdom* executable can't find "__get_cpu_features". ldd -r wisdom_generator.original.133 complains that it is an undefined symbol. It doesn't seem to be in the libc.so.6 or the libm.so.6 libraries.
Is it in another system library that I, perhaps, have not installed? Or is it a version conflict between my system and the wisdom* compiled version?
My system is: Debian 12.4.0, kernel 6.1.52, libc6 version 2.36-9+deb12u3.
I have an off-line "old" Linux system and this symbol IS DEFINED in libc.so.6 there, but the libc6 package version is 2.19, from a Debian 8.8.0 release from long ago! Looks like __get_cpu_features got renamed or moved to some other library?
(The CPU is Ryzen 7 5700X, chugging away on BRP4G tasks at ~24,000 seconds cpu time.)
Gene...
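As a way to compare the two systems, the sketch below asks the dynamic loader whether a shared library exports a given symbol. `exports_symbol` is a made-up helper, and the lookup goes through dlopen/dlsym via ctypes, so it may not match exactly what `ldd -r` reports for versioned or private symbols.

```python
import ctypes
import ctypes.util


def exports_symbol(library: str, symbol: str) -> bool:
    """True if the loaded shared library resolves the given symbol name."""
    lib = ctypes.CDLL(library)
    try:
        lib[symbol]  # ctypes raises AttributeError when the lookup fails
    except AttributeError:
        return False
    return True


# Fall back to the usual glibc soname if find_library() returns nothing.
libc = ctypes.util.find_library("c") or "libc.so.6"
print("__get_cpu_features in", libc, "->", exports_symbol(libc, "__get_cpu_features"))
```

Running it on both the Debian 12 machine and the old Debian 8 machine would show whether the difference really is in libc itself.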
Eugene Stemple
Have you tried < sudo apt-get ... > (without the < >) for your CPU features? sudo runs the command as root. I don't really know much about Debian; I run Ubuntu 22.04.3 LTS, but it is based on Debian.
Proud member of the Old Farts Association
I also started running this app on my AMD 2500U laptop. Even running at 1.6 GHz, instead of the stock 2.0 GHz, on all 8 threads, the current tasks seem to be on track to finish after around 11-12 hours of CPU time, down from 20-22 hours with the standard app. Very large increase in computation speed.
The first task using PERL's optimized app vs the shortest task still on record using the stock app on my AMD 2500u laptop.
Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64