BRP4G enhanced (faster) app

Perl
Joined: 3 Oct 11
Posts: 4
Credit: 89445320
RAC: 0
Topic 230607

Hello fellow crunchers,

I created a faster version of the BRP4G application. I know that CPU apps do not generate much credit, but I did it anyway for personal satisfaction.
You can expect a speedup of up to 50% if you are not limited by memory bandwidth. For example, when running 4 tasks at once, the time per task can drop from 8600 seconds to 5600 seconds. I did not check the benefit when running on all cores, but it should still be a bit faster than the original.
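The same gain can be quoted as "more tasks per hour" or as "less time per task", and the two numbers differ. A minimal sketch (the `speedup` helper is purely illustrative) that turns the 8600 s and 5600 s figures above into both:

```python
def speedup(old_seconds: float, new_seconds: float) -> tuple[float, float]:
    """Return (throughput gain, time reduction) as fractions.

    throughput gain: how many more tasks per hour (old/new - 1)
    time reduction:  how much shorter each task is (1 - new/old)
    """
    gain = old_seconds / new_seconds - 1.0
    reduction = 1.0 - new_seconds / old_seconds
    return gain, reduction


gain, reduction = speedup(8600, 5600)
print(f"throughput gain: {gain:.1%}, time per task: -{reduction:.1%}")
```

With these inputs it reports roughly 54% more tasks per hour, or roughly 35% less time per task; the "up to 50%" figure presumably refers to the throughput view.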


Optimized version:

You need an SSE4-capable processor. Every mainstream AMD/Intel processor produced in the last 10 years has this instruction set.
You need to set up an anonymous platform.
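On Linux, one way to confirm the SSE4 requirement before going further is to look at the `flags` line of /proc/cpuinfo. A sketch, assuming that "SSE4" here means SSE4.1/SSE4.2 (reported by the Linux kernel as `sse4_1`/`sse4_2`); it does not cover Windows, and `sse4_flags` is a hypothetical helper:

```python
from pathlib import Path


def sse4_flags(cpuinfo_text: str) -> set[str]:
    """Return the SSE4.1/SSE4.2 flags found in /proc/cpuinfo-style text."""
    wanted = {"sse4_1", "sse4_2"}
    for line in cpuinfo_text.splitlines():
        # x86 entries contain a line like "flags\t\t: fpu vme ... sse4_1 sse4_2 ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return wanted & set(value.split())
    return set()  # no flags line (e.g. non-x86 CPU or empty input)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        found = sse4_flags(cpuinfo.read_text())
        print("SSE4 flags:", ", ".join(sorted(found)) or "none found")
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```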

  1. Set the project to "No new tasks" and wait for it to finish its cached work. This step is optional: if you skip it, the E@H scheduler will probably resend you the tasks, but they will start from scratch.
  2. Stop BOINC (close or kill the program).
  3. Download https://drive.google.com/file/d/1dLO9oi7NRr5yl2xftYEmRjfbKewk1-1q/view?usp=sharing, choose the windows or linux subdirectory, and unpack the applications and app_info.xml into the einstein.phys.uwm.edu project directory.
  4. Use the provided app_info.xml as a starting point.
    1. If you want to use other applications, you need to add them to this file yourself. See the file sched_reply_einstein.phys.uwm.edu.xml for hints about names. You also need the original binaries for the other applications; copy them temporarily somewhere else so that BOINC doesn't delete them (just in case).
  5. Run the wisdom_generator application. It takes a long time (about 1 hour on a Ryzen 9 7950X; single-core performance matters). It generates the BRP4.wisdom (FFTW wisdom) file in the current directory. Make sure the file ends up in the einstein.phys.uwm.edu directory, alongside the applications.
    1. FFTW wisdom is essentially a set of precalculated FFT plans tuned to your hardware for optimal performance. The default plan is generated in seconds, but it is not as good.
  6. Start BOINC and check the event log to confirm that the anonymous platform was recognized without errors.
    1. If the wisdom file was loaded correctly, the task's log should contain the line "Using FFTW wisdom from file ../../projects/einstein.phys.uwm.edu/BRP4.wisdom."
  7. ???
  8. Profit ;)
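Steps 3 to 6 can be double-checked with a short script. This is a sketch under assumptions: it reads the standard BOINC app_info.xml layout (`<file_info><name>` entries), expects BRP4.wisdom next to the applications, and searches a task log (for example the stderr.txt in the task's slot directory) for the wisdom message quoted in step 6. The helper names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Names taken from the steps above; adjust if your setup differs.
WISDOM_FILE = "BRP4.wisdom"
WISDOM_MESSAGE = "Using FFTW wisdom from file"


def files_named_in(app_info_xml: bytes) -> list[str]:
    """Return every <file_info><name> listed in an app_info.xml document."""
    root = ET.fromstring(app_info_xml)
    return [el.text.strip() for el in root.findall("./file_info/name") if el.text]


def missing_files(project_dir: Path) -> list[str]:
    """List files referenced by app_info.xml, plus the wisdom file, that are absent."""
    names = files_named_in((project_dir / "app_info.xml").read_bytes())
    names.append(WISDOM_FILE)
    return [name for name in names if not (project_dir / name).is_file()]


def wisdom_loaded(task_log: str) -> bool:
    """True if the optimized app's log reports that the wisdom file was loaded."""
    return WISDOM_MESSAGE in task_log


if __name__ == "__main__":
    # Hypothetical usage: python3 check_install.py /path/to/projects/einstein.phys.uwm.edu
    project = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    if (project / "app_info.xml").is_file():
        print("missing:", missing_files(project) or "nothing")
    else:
        print("no app_info.xml in", project)
```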


Some speedup with the original application (Linux only):

If you don't want to set up an anonymous platform, you can still get part of the speedup by generating a wisdom file. You can expect up to a 30% speedup if you are not limited by memory bandwidth (I did not measure the speedup thoroughly, but it was noticeable).

Windows users, you are out of luck. The original application can load a system-provided wisdom file, but only on Linux.

  1. Download the wisdom_generator app: https://drive.google.com/file/d/1gQPaJDCpUf2hOc7Qrrs6Id1oOk7z_0R9/view?usp=drive_link. This version of wisdom_generator uses the same compiler flags and FFTW version as the original application.
  2. Run the wisdom_generator application. It takes a long time (about 1 hour on a Ryzen 9 7950X; single-core performance matters). It generates the BRP4.wisdom (FFTW wisdom) file in the current directory. Rename it to wisdomf and place it in the default system location: /etc/fftw/wisdomf.
    1. FFTW wisdom is essentially a set of precalculated FFT plans tuned to your hardware for optimal performance. The default plan is generated in seconds, but it is not as good.
  3. You don't have to restart anything: running tasks will pick up the wisdom file when they start processing their next input file (the app processes 8 input files sequentially). To get the benefit immediately, however, you can restart BOINC.
    1. The original application does not log anything about loading the wisdom file, but if you watch the completion percentage in the BOINC Manager, you will see a difference.
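Step 2 of this procedure boils down to a copy under a new name. A minimal sketch (the `install_system_wisdom` helper is hypothetical); writing to /etc normally requires root, and it copies rather than moves so the generated BRP4.wisdom is kept:

```python
import shutil
from pathlib import Path

# Default single-precision system wisdom location named in step 2.
SYSTEM_WISDOM = Path("/etc/fftw/wisdomf")


def install_system_wisdom(generated: Path, target: Path = SYSTEM_WISDOM) -> Path:
    """Copy a generated BRP4.wisdom to the system wisdom path under its new name."""
    if not generated.is_file() or generated.stat().st_size == 0:
        raise FileNotFoundError(f"no usable wisdom file at {generated}")
    target.parent.mkdir(parents=True, exist_ok=True)  # /etc/fftw may not exist yet
    shutil.copyfile(generated, target)  # copy, so the original BRP4.wisdom is kept
    return target


if __name__ == "__main__":
    # Writing under /etc normally needs root, e.g. run this with sudo.
    source = Path("BRP4.wisdom")
    if source.is_file():
        print("installed:", install_system_wisdom(source))
    else:
        print("run wisdom_generator first; BRP4.wisdom not found")
```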


On a side note, I think that some of the improvements could be incorporated into the official app to give everyone a few percent speedup.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46588002642
RAC: 64200310

I don't run CPU work here, but nice to see someone else out here optimizing apps :)


KAREL SIMECEK
Joined: 24 Mar 08
Posts: 1
Credit: 829432532
RAC: 764244

Well done PERL


I have a question for PERL: where is the source code for the Einstein applications?

I am also interested in optimizing the code, but I have not found a git repository for the Einstein applications.

Thanks

Karel Simecek

Falconet
Joined: 9 Mar 09
Posts: 49
Credit: 15421049
RAC: 644

Okay, I think I got it running on my AMD 1600AF. Initial estimated time to completion was over 21 hours.

EDIT: I'm running all 12 threads. A (very) quick look seems to suggest they'll finish after 10 or 11 hours. Once they're done, I'll go back to the project app and compare.


Falconet
Joined: 9 Mar 09
Posts: 49
Credit: 15421049
RAC: 644

The first 12 tasks running the optimized app have finished. I noticed that CPU time is higher than elapsed time. Checking the properties tab for the tasks when they were at around 90% completion, hourly progress was between 10.6% and 10.8%.

DRAM is at stock frequency (2133), as is the CPU (3.2 GHz, Turbo Core disabled).

I have 12 tasks now running the standard app. After 30 minutes, the hourly progress has dropped to around 7.2%-7.56%.
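Assuming progress advances at a roughly constant rate, an hourly progress figure converts directly into a per-task time (100% divided by the rate). A small sketch using the ranges quoted above (`hours_per_task` is an illustrative helper):

```python
def hours_per_task(progress_pct_per_hour: float) -> float:
    """Hours to go from 0% to 100%, assuming progress advances at a constant rate."""
    return 100.0 / progress_pct_per_hour


# Hourly progress ranges reported above (% per hour).
rates = {"optimized": (10.6, 10.8), "standard": (7.2, 7.56)}

for app, (low, high) in rates.items():
    # A faster rate means a shorter task, so the high rate gives the lower bound.
    print(f"{app}: {hours_per_task(high):.1f} to {hours_per_task(low):.1f} h per task")
```

That gives roughly 9.3-9.4 h per task for the optimized app and 13.2-13.9 h for the standard one, in line with the run times of about 33,000-34,000 s and about 49,000 s in the task lists in this thread.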


p2030.20181214.G191.58-01.34.N.b0s0g0.00000_152_0 780070403 16 Jan 2024 17:05:02 UTC 17 Jan 2024 15:20:15 UTC Completed, waiting for validation 33,768 44,105 0 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.58-01.34.N.b0s0g0.00000_936_0 780071621 16 Jan 2024 17:07:07 UTC 17 Jan 2024 15:21:37 UTC Completed, waiting for validation 33,901 44,283 0 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.58-01.34.N.b0s0g0.00000_1248_0 780071661 16 Jan 2024 17:09:11 UTC 17 Jan 2024 15:17:08 UTC Completed, waiting for validation 33,635 44,005 0 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.58-01.34.N.b0s0g0.00000_1336_0 780071673 16 Jan 2024 17:08:09 UTC 17 Jan 2024 15:19:13 UTC Completed and validated 33,728 44,091 500 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.58-01.34.N.b0s0g0.00000_2488_0 780071824 16 Jan 2024 17:14:24 UTC 17 Jan 2024 15:25:27 UTC Completed, waiting for validation 33,552 43,974 0 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.58-01.34.N.b0s0g0.00000_2496_0 780071825 16 Jan 2024 17:12:19 UTC 17 Jan 2024 15:18:11 UTC Completed, waiting for validation 33,094 43,533 0 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.58-01.34.S.b0s0g0.00000_1144_0 780091323 16 Jan 2024 17:10:14 UTC 17 Jan 2024 15:15:05 UTC Completed, waiting for validation 33,511 43,921 0 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.70-01.11.C.b0s0g0.00000_1904_0 780116845 16 Jan 2024 17:13:22 UTC 17 Jan 2024 15:19:13 UTC Completed, waiting for validation 33,144 43,611 0 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.70-01.11.C.b0s0g0.00000_2224_0 780116887 16 Jan 2024 17:06:04 UTC 17 Jan 2024 15:22:39 UTC Completed, waiting for validation 33,936 44,254 0 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.70-01.11.C.b1s0g0.00000_976_0 780117262 16 Jan 2024 17:03:59 UTC 17 Jan 2024 15:20:15 UTC Completed and validated 33,756 44,163 500 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.70-01.11.C.b1s0g0.00000_1000_0 780117266 16 Jan 2024 17:02:57 UTC 17 Jan 2024 15:20:15 UTC Completed, waiting for validation 33,758 44,135 0 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
p2030.20181214.G191.70-01.11.C.b2s0g0.00000_3208_0 780118124 16 Jan 2024 17:11:17 UTC 17 Jan 2024 15:20:15 UTC Completed and validated 33,234 43,701 500 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform
Falconet
Joined: 9 Mar 09
Posts: 49
Credit: 15421049
RAC: 644

The results are in. I only ran 12 tasks with each app, but I don't believe there is much variability between BRP4G tasks. It seems like a very decent speed increase (around 30% judging by elapsed time; I can't compare CPU time, since the reported CPU time with the optimized app was higher than the elapsed time). No increase in power consumption from what I could tell via HWiNFO (I did not plug in the wattmeter). I'll use this app whenever this computer is running BRP4G.


Make this app official!


p2030.20181214.G192.46+00.27.N.b0s0g0.00000_3776_0 780373657 17 Jan 2024 15:49:20 UTC 18 Jan 2024 16:46:26 UTC Completed and validated 49,192 48,997 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181214.G192.46+00.27.N.b1s0g0.00000_2640_0 780374167 17 Jan 2024 15:51:26 UTC 18 Jan 2024 16:40:36 UTC Completed and validated 48,846 48,625 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181214.G192.46+00.27.N.b2s0g0.00000_2392_0 780374994 17 Jan 2024 15:51:26 UTC 18 Jan 2024 16:45:18 UTC Completed and validated 49,122 48,928 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181214.G192.46+00.27.N.b2s0g0.00000_2400_0 780375041 17 Jan 2024 15:51:26 UTC 18 Jan 2024 16:46:26 UTC Completed and validated 49,167 48,970 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181214.G192.46+00.27.N.b5s0g0.00000_2576_0 780381357 17 Jan 2024 15:49:20 UTC 18 Jan 2024 16:44:15 UTC Completed and validated 49,061 48,840 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181022.G189.25+00.54.C.b1s0g0.00000_1808_1 778550807 17 Jan 2024 15:49:20 UTC 18 Jan 2024 16:49:11 UTC Completed and validated 49,357 49,168 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181023.G188.75-00.81.N.b5s0g0.00000_3120_1 778614795 17 Jan 2024 15:48:17 UTC 18 Jan 2024 16:48:04 UTC Completed and validated 49,289 49,089 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181023.G188.75-00.81.N.b5s0g0.00000_3144_1 778614798 17 Jan 2024 15:48:17 UTC 18 Jan 2024 16:53:42 UTC Completed and validated 49,626 49,438 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181023.G188.75-00.81.N.b5s0g0.00000_3168_1 778614801 17 Jan 2024 15:48:17 UTC 18 Jan 2024 16:52:25 UTC Completed and validated 49,533 49,351 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181022.G189.25+00.54.C.b2s0g0.00000_1192_1 778551241 17 Jan 2024 15:50:23 UTC 18 Jan 2024 16:51:23 UTC Completed and validated 49,485 49,268 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181022.G189.25+00.54.C.b2s0g0.00000_1296_1 778551256 17 Jan 2024 15:50:23 UTC 18 Jan 2024 16:51:23 UTC Completed and validated 49,476 49,294 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
p2030.20181022.G189.25+00.54.C.b2s0g0.00000_1312_1 778551258 17 Jan 2024 15:50:23 UTC 18 Jan 2024 16:50:19 UTC Completed and validated 49,424 49,194 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
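To put a number on "around 30%", here is a sketch that averages the run times from the two task lists above. It assumes the first of the three numbers in each row is run (elapsed) time and the second is CPU time, which matches the remark that CPU time exceeded elapsed time for the optimized app.

```python
from statistics import mean

# Run times (s) copied from the two task lists above, 12 tasks each:
# optimized app (anonymous platform) and stock v1.33.
optimized = [33768, 33901, 33635, 33728, 33552, 33094,
             33511, 33144, 33936, 33756, 33758, 33234]
stock = [49192, 48846, 49122, 49167, 49061, 49357,
         49289, 49626, 49533, 49485, 49476, 49424]

opt_avg, stock_avg = mean(optimized), mean(stock)
print(f"mean run time: {opt_avg:.0f} s vs {stock_avg:.0f} s")
print(f"time per task: -{1 - opt_avg / stock_avg:.1%}")
print(f"tasks per hour: +{stock_avg / opt_avg - 1:.1%}")
```

On these 12 + 12 tasks the optimized app needs about 32% less elapsed time per task, which is roughly 47% more tasks per hour.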
Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250360259
RAC: 35898

Thank you for looking into that!

If it's just a matter of compiling the app (actually FFTW, I think) with SSE4 support and allowing it to load a wisdom file, we could certainly build an official app that does this, also for Windows and Mac. We did provide for loading wisdom in the FGRP app; we just need to dig that out.

Whenever I find the time, though.


BM

Falconet
Joined: 9 Mar 09
Posts: 49
Credit: 15421049
RAC: 644

Sounds great!

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 372836908
RAC: 542555

>>PERL:

The wisdom* executable can't find "__get_cpu_features".  ldd -r wisdom_generator.original.133 complains that it is an undefined symbol.  It doesn't seem to be in the libc.so.6 or the libm.so.6 libraries.

Is it in another system library that I, perhaps, have not installed?  Or is it a version conflict between my system and the wisdom* compiled version?

My system is:  Debian 12.4.0, kernel 6.1.52, libc6 version 2.36-9+deb12u3.

I have an off-line "old" Linux system and this symbol IS DEFINED in libc.so.6 there, but the libc6 package version is 2.19, from a Debian 8.8.0 release from long ago!  Looks like __get_cpu_features got renamed or moved to some other library?

(The CPU is Ryzen 7 5700X, chugging away on BRP4G tasks at ~24,000 seconds cpu time.)

Gene...
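For anyone reproducing the check above, here is a sketch that runs `ldd -r` and collects the unresolved symbol names. It assumes ldd's usual `undefined symbol: NAME ... (FILE)` line format; the helper names are hypothetical, and it only automates the diagnosis, it does not fix the missing symbol.

```python
import re
import subprocess
import sys

# Symbol name runs up to the first whitespace or comma (", version ..." may follow).
UNDEFINED = re.compile(r"undefined symbol: ([^\s,]+)")


def undefined_symbols(ldd_output: str) -> list[str]:
    """Extract unique symbol names from the 'undefined symbol:' lines of `ldd -r`."""
    return sorted(set(UNDEFINED.findall(ldd_output)))


def check_binary(path: str) -> list[str]:
    """Run `ldd -r` on a dynamically linked binary and return its unresolved symbols."""
    # ldd may report problems on stderr and exit non-zero, so capture both streams.
    result = subprocess.run(["ldd", "-r", path], capture_output=True, text=True)
    return undefined_symbols(result.stdout + result.stderr)


if __name__ == "__main__":
    for binary in sys.argv[1:]:
        print(binary, check_binary(binary) or "no undefined symbols")
```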


GWGeorge007
Joined: 8 Jan 18
Posts: 3060
Credit: 4961364353
RAC: 1396091

Eugene Stemple wrote:

The wisdom* executable can't find "__get_cpu_features".  ldd -r wisdom_generator.original.133 complains that it is an undefined symbol.  It doesn't seem to be in the libc.so.6 or the libm.so.6 libraries.

Have you tried:  < sudo apt-get ... >  ( without the < > ) for your cpu features?   sudo runs the command with root privileges.  I don't really know much about Debian; I run Ubuntu 22.04.3 LTS, but it is based on Debian.

George

Proud member of the Old Farts Association

Falconet
Joined: 9 Mar 09
Posts: 49
Credit: 15421049
RAC: 644

I also started running this app on my AMD 2500U laptop. Even running at 1.6 GHz, instead of the stock 2.0 GHz, on all 8 threads, the current tasks seem to be on track to finish after around 11-12 hours of CPU time, down from 20-22 hours with the standard app. Very large increase in computation speed.


The first task using PERL's optimized app vs. the shortest task still on record using the stock app on my AMD 2500U laptop.

p2030.20181216.G191.70-01.11.S.b3s0g0.00000_3016_0 780905442 19 Jan 2024 19:40:07 UTC 21 Jan 2024 2:11:21 UTC Completed and validated 44,658 37,154 500 Binary Radio Pulsar Search (Arecibo,GBT,long) Anonymous platform

p2030.20181213.G204.36-00.51.S.b1s0g0.00000_80_0 779900890 15 Jan 2024 21:25:24 UTC 19 Jan 2024 14:20:33 UTC Completed and validated 76,400 66,014 500 Binary Radio Pulsar Search (Arecibo,GBT,long) v1.33 () windows_x86_64
