Both my Linux and Win64 hosts are doing well, with no invalids after dozens of units. The Windows machine has about twice the runtime of the Fedora one (but now more consistently so). Specifically, there are no units over 25 hours.
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
In cleaning up the "tuning" run (which this thread is originally about) we just granted credit for the results of early app versions (before 1.04) that "lost" validation to the results of a 1.04 version.
Being able to do this was the only reason we kept these workunits in the DB; we'll now purge that run from the system.
BM
Thank you! It worked for three of the four tasks I crunched, but one did not get credit. Any reason for that?
So if the app uses FFTW, maybe it would be easy to use cuFFTW (the CUDA FFTW compatibility mode) to do some offloading.
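For readers who haven't met it: cuFFTW is NVIDIA's FFTW-compatible wrapper around cuFFT, so in the simplest case an FFTW-based program keeps its fftw_* calls and only swaps the header and the link line. A minimal sketch of the idea (not Einstein@Home code; real use would need error handling and can hit unsupported corner cases):

    /* build with e.g.:  nvcc cufftw_sketch.c -lcufftw -lcufft
       instead of:       gcc  fftw_version.c  -lfftw3 -lm      */
    #include <cufftw.h>   /* instead of <fftw3.h> */

    int main(void)
    {
        int n = 4096;     /* arbitrary example transform length */
        double       *in  = (double *)       fftw_malloc(sizeof(double) * n);
        fftw_complex *out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * (n / 2 + 1));

        fftw_plan plan = fftw_plan_dft_r2c_1d(n, in, out, FFTW_ESTIMATE);
        for (int i = 0; i < n; i++)
            in[i] = (double) i;
        fftw_execute(plan);   /* the transform now runs on the GPU via cuFFT */

        fftw_destroy_plan(plan);
        fftw_free(in);
        fftw_free(out);
        return 0;
    }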
Actually all our searches on Einstein@Home (Binary Pulsar Search, Fermi Gamma-Ray Pulsar Search and the GW search) use the FFT, and all of them would benefit from offloading the FFT to the GPU.
However, the Binary Radio Pulsar search code is by far the most optimized for GPU, we get a speed-up (with GPUs compared to CPU only) well greater than 10 (depending on the individual GPU and CPU of course). For the GW search, the FFT part of the computation takes only roughly half the computing time for CPUs, so offloading this to the GPU can at most speed up the computation by a factor of 2.
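(Aside, for intuition: that factor-of-2 limit is just Amdahl's law. If a fraction f of the runtime is spent in the FFT and the FFT alone is sped up by a factor s, the overall speedup is

    speedup = 1 / ((1 - f) + f / s)

which with f ~ 0.5 stays below 1 / (1 - f) = 2 even for an infinitely fast GPU FFT.)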
So currently, the best use for the GPUs on E@H is to do the Binary Radio Pulsar search, and that search only.
We may change this decision later depending on science priorities, though.
For those who experience surprisingly poor performance of the GW search on their hardware (say more than 14 hrs with a recent CPU), and who like to experiment a bit, there is a "hidden" way to force the app to try a bit harder to fine-tune the FFT computation to their particular hardware.
You can set two environment variables so that the E@H science app sees them (e.g. you could define them systemwide for Windows or in the startup options for BOINC on Linux):
env. variable               value
=====================================
LAL_FSTAT_FFT_PLAN_MODE     PATIENT
LAL_FSTAT_FFT_PLAN_TIMEOUT  120
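One possible way to set them (a sketch only; the exact place depends on how BOINC is started on your machine, since the science app simply inherits the client's environment):

On Windows, from a command prompt (then restart BOINC so the new values are picked up):

    setx LAL_FSTAT_FFT_PLAN_MODE PATIENT
    setx LAL_FSTAT_FFT_PLAN_TIMEOUT 120

On Linux, in the shell or startup script that launches the BOINC client:

    export LAL_FSTAT_FFT_PLAN_MODE=PATIENT
    export LAL_FSTAT_FFT_PLAN_TIMEOUT=120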
This will tell FFTW to spend (roughly) up to two minutes (120s) just on optimizing the FFT computation for your particular hardware. You can play around with even longer durations.
We do not expect this to have a dramatic effect on most hosts, and it can even lead to slightly worse runtimes in some cases, so we did not enable this by default. It might help, though, on some hosts where the default settings lead to very suboptimal runtimes.
HB
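For those curious what these two settings roughly translate to inside the app, here is a sketch in terms of the standard FFTW planner API (an illustration only, not the actual Einstein@Home code; the transform length n is made up):

    #include <fftw3.h>

    /* Illustration: FFTW_PATIENT planning with a ~120 s planning budget. */
    int main(void)
    {
        int n = 1 << 22;                          /* example transform length */
        double       *in  = fftw_alloc_real(n);
        fftw_complex *out = fftw_alloc_complex(n / 2 + 1);

        fftw_set_timelimit(120.0);                /* cf. LAL_FSTAT_FFT_PLAN_TIMEOUT */
        fftw_plan plan = fftw_plan_dft_r2c_1d(n, in, out,
                                              FFTW_PATIENT);  /* cf. ..._PLAN_MODE */

        for (int i = 0; i < n; i++)               /* fill input after planning;     */
            in[i] = 0.0;                          /* PATIENT overwrites the arrays  */
        fftw_execute(plan);

        fftw_destroy_plan(plan);
        fftw_free(in);
        fftw_free(out);
        return 0;
    }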
Does it mean you're currently not thinking about releasing a GW GPU application, because the current priority for GPUs is BRP?
Do you know how much of the GW code could be ported to GPUs, or a very approximate possible speed-up factor on GPUs?
Correct, there are no GPU plans for the O1 GW search. As I wrote, the speedup would be limited to roughly a factor of 2 for a rather straightforward offloading of the FFT (compared to a factor of >>10 for the BRP app, which is by now almost completely running on the GPU).
I'm quite sure the other parts of the computation (besides FFT) can also be ported to GPUs, but we have no plans to do that in the near future.
Thanks for the clarification.
Hoping you'll change your decision later ;-) BRP6 should be finished this year, depending on how much BRP4G work there is (which is currently quite a lot), so I suppose a new GPU app will be required...
....This will tell FFTW to spend (roughly) up to two minutes (120s) just on optimizing the FFT computation for your particular hardware. You can play around with even longer durations.
HB
Does FFTW perform this check for each work unit as it starts, or just once per hardware system, which it then remembers for that system in the future?
Thanks!