Weave Analysis Software for Follow-Up Searches

rbpeake
Joined: 18 Jan 05
Posts: 266
Credit: 1134027797
RAC: 765866
Topic 218512

An interesting article on the analysis methods used to detect continuous gravitational waves.

https://arxiv.org/pdf/1901.08998.pdf

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6589
Credit: 318367243
RAC: 392498

Interesting. Thanks for pointing this out. Of note for E@H is :

Quote:

B. Memory usage

When a search is run on the Einstein@Home computing project, the parameter space to search is split into cells, and each volunteer computer searches a cell corresponding to a specific work-unit (WU) of the global task. The cells are chosen such that a single WU will run for approximately 8 hours on a volunteer computer. The loudest candidates recovered from each cell are then returned to the Einstein@Home servers. While many modern computers have more than 10 GB of memory, Einstein@Home includes older computers with less memory. The searches run single-threaded, so typically one task is executed per available core, meaning the memory we can use is limited to what is available per CPU core. Therefore, to be able to run on Einstein@Home volunteer computers, we limit the memory a WU can use to 1 GB. Normally, the parameter space of an all-sky search is split into 0.05 Hz bands, and then the number of sky grid points is scaled to result in a search which runs for 8 hours. The GCT searches consume a manageable amount of memory with such WUs. The Weave search given in Table I, when split up to run for 8 hours over 0.05 Hz, reaches a memory consumption of 2-7 GB, depending on how you split the parameter space.

Many of us could easily accommodate that but very many couldn't. Memory is so cheap these days. Maybe users could opt-in their host(s) to such a search if they were confident of their memory status.
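
For the curious, a back-of-the-envelope check of one's own host ( a Python sketch of mine, assuming one single-threaded task per logical CPU and the paper's worst-case 7 GB figure, nothing official ) :

# Rough per-core memory check for a hypothetical high-memory search.
# Assumes one single-threaded task per logical CPU, as the paper describes.
# The 7 GB figure is the worst case quoted above; purely illustrative.
import os
import psutil  # third-party: pip install psutil

WU_MEMORY_GB = 7.0
total_gb = psutil.virtual_memory().total / 1e9
cores = os.cpu_count()          # logical CPUs
per_core_gb = total_gb / cores

print(f"{total_gb:.1f} GB over {cores} logical CPUs = {per_core_gb:.2f} GB per task slot")
print("could opt in" if per_core_gb >= WU_MEMORY_GB else "too tight")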

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Kavanagh
Joined: 29 Oct 06
Posts: 1860
Credit: 103041821
RAC: 12222

Does not Boinc ascertain how much memory is available?

Richard

DF1DX
Joined: 14 Aug 10
Posts: 105
Credit: 3875066854
RAC: 4986110

Or use command line switches for each parameter of the CPU or GPU applications:

    - maximum memory
    - number of CPU threads
    - workgroup/kernel size

and so on.

This works well for the optimized SETI applications, so you can adapt the settings optimally to your own hardware:

In my case, for AMD/OpenCL WUs, the file mb_cmdline-8.22_windows_intel__opencl_ati5_SoG.txt contains:

-high_perf -hp high_perf -period_iterations_num 1 -sbs 2048 -spike_fft_thresh 4096 -tune 1 60 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 60 -oclfft_tune_cw 60 -tt 1500


Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Mike Hewson wrote:
Many of us could easily accommodate that but very many couldn't. Memory is so cheap these days. Maybe users could opt-in their host(s) to such a search if they were confident of their memory status.

Yes!  I don't care how you do it, but use more rather than less.

And then, double that.

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

The high end of that memory usage would clobber all of my hosts.  My 3 desktops are all quad-core i7s with 32, 32, and 18 GB of RAM, running 5x CPU and 3x GPU tasks when running an all-E@H load.

The first two both have 4x 8 GB of DDR3, so I couldn't put more RAM in without having to buy expensive and slower 16 GB DIMMs to swap out the 8s.  One is a pure crunch box and could probably take ~5.5 GB/task without choking.  The other one is my daily driver, though, and a much larger share of its RAM is in use day to day (my browsers are currently pigging down on 7 GB, which isn't unusual for my usage).  I'd probably top out at being able to support ~4 GB/task here today (and presumably a fair amount less over the next 2-3 years before I replace it).

My third box is a decade-ish old relic, with 3x 4 GB and 3x 2 GB DIMMs in it.  Apparently it can support 8 GB DIMMs, but due to its age buying more RAM is out of the question.  In any event I'm considering standing its CPU down, putting both 1070s in it, and putting my AMD 560 in box 2.  At that point I'd be running 7 CPU tasks and be down to ~4 GB/CPU task again.

When I built the newer two boxes I really thought 32 GB of RAM would be enough to keep them running for 8-10 years with only GPU swaps needed.

With mainstream CPU core counts rising steadily, the RAM/core situation isn't likely to get any better, and potentially will get a lot worse.  Current mainstream CPUs are up to 8 cores/16 threads, with the rumor mill suggesting that Intel's next generation will inch up to 10/20, while AMD leapfrogs to 16/32.  With 32 GB DDR4 DIMMs stupidly expensive and mainstream sockets still limited to 2 channels/4 DIMMs, those systems are effectively topped out at 64 GB of RAM.


Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

DanNeely wrote:
With mainstream CPU core counts rising steadily, the RAM/core situation isn't likely to get any better, and potentially will get a lot worse.  Current mainstream CPUs are up to 8 cores/16 threads, with the rumor mill suggesting that Intel's next generation will inch up to 10/20, while AMD leapfrogs to 16/32.  With 32 GB DDR4 DIMMs stupidly expensive and mainstream sockets still limited to 2 channels/4 DIMMs, those systems are effectively topped out at 64 GB of RAM.

You don't have to use all your cores on Einstein.  If you ran 4 to 6 cores on Einstein, and the rest on something else, you could probably get away with 32 GB of memory, which is cheap enough with 2 x 16 GB modules.  My usual problem is finding enough good use for that memory.  But if Einstein won't use it, I can probably find something else that will.

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

Mike Hewson wrote:

Interesting. Thanks for pointing this out. Of note for E@H is :

Quote:

B. Memory usage

When a search is run on the Einstein@Home computing project, the parameter space to search is split into cells, and each volunteer computer searches a cell corresponding to a specific work-unit (WU) of the global task. The cells are chosen such that a single WU will run for approximately 8 hours on a volunteer computer. The loudest candidates recovered from each cell are then returned to the Einstein@Home servers. While many modern computers have more than 10 GB of memory, Einstein@Home includes older computers with less memory. The searches run single-threaded, so typically one task is executed per available core, meaning the memory we can use is limited to what is available per CPU core. Therefore, to be able to run on Einstein@Home volunteer computers, we limit the memory a WU can use to 1 GB. Normally, the parameter space of an all-sky search is split into 0.05 Hz bands, and then the number of sky grid points is scaled to result in a search which runs for 8 hours. The GCT searches consume a manageable amount of memory with such WUs. The Weave search given in Table I, when split up to run for 8 hours over 0.05 Hz, reaches a memory consumption of 2-7 GB, depending on how you split the parameter space.

Many of us could easily accommodate that but very many couldn't. Memory is so cheap these days. Maybe users could opt-in their host(s) to such a search if they were confident of their memory status.

Cheers, Mike.


The amount of memory used is mostly dependent on how the data is chunked up when creating the WUs, which I think would be a deal breaker: if I'm understanding it correctly, the one parameter that could be adjusted without changing how the WUs are generated would need twice as much run time to halve RAM usage.
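
To put numbers on that reading (a toy model of my own, not how the actual work generator behaves): if per-WU memory and per-WU run time trade off roughly inversely at fixed parameter-space coverage, getting down to the current 1 GB limit would blow the 8 hour target way out:

# Toy illustration of the memory-vs-runtime tradeoff described above.
# Pure assumption: memory_per_wu * runtime_per_wu ~ constant at fixed
# parameter-space coverage. Not the real Weave/E@H work generator.
BASE_HOURS = 8.0      # current WU runtime target
BASE_RAM_GB = 7.0     # Weave worst case quoted from the paper

for ram_target in (7.0, 3.5, 1.75, 1.0):
    hours = BASE_HOURS * BASE_RAM_GB / ram_target
    print(f"target {ram_target:4.2f} GB/WU -> roughly {hours:5.1f} h/WU")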

OTOH the Weave search looks like it would be doable as a multi-threaded app, and having all your cores pounding away at the same WU would allow much higher per-WU RAM usage.  The gotcha there is that it would greatly expand the runtime disparity between computers, because high-end desktops have both significantly more and significantly faster cores than mainstream laptops.


Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Might want to look at GPUGrid to see what happens there, as an example of a multi-threaded application.  Currently their QC tasks require 4 threads per work unit, large amounts of RAM, and a huge amount of scratch space on the HD.  This led to the discovery that the CPU runs HOT.. I mean REALLY hot.  With only 2 running, my 360 mm CPU rad hovers at 60-65 C; 3 work units push the temps to 85-90 C.  Their work units also ended up exceeding the disk space available.  A workaround (for me) was to install a 4 TB HDD; since high read/write speed was not needed, an HDD was cheaper than an SSD at 4 TB.  The ratio of run time to CPU time was about 1:4, i.e. while run time said 3,400.69 sec, CPU time was 13,034.05 (approx. 4x run).

Since the temp issue was a concern, people decided to run the work units on fewer than the required 4 threads. This had the desired effect of lowering CPU temps but proportionally raised the CPU time to complete.

As for RAM, I ended up increasing mine from 32 GB to 64 GB before I found out about the temps.  Since then I've found that 32 GB is sufficient, since I rarely go above 3 work units.

So I guess what I'm saying is things don't always go as we think they will.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6589
Credit: 318367243
RAC: 392498

Also of note is the science payback : as 57.9/50.8 ~ 1.14, with (1.14)^3 ~ 1.48, then that's how you get an extra ~ 50% of sky volume out there being covered c/w current methods ( all else being equal etc ). Isn't that neat ? Plus the follow up of candidates is two orders of magnitude faster. This is a great piece of work. I wish I understood it in detail ..... :-)))
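
Spelling out where the cube comes from ( my gloss, assuming those figures scale linearly with the distance out to which we can detect a source ) :

\[
\frac{d_{\mathrm{new}}}{d_{\mathrm{old}}} \approx \frac{57.9}{50.8} \approx 1.14 ,
\qquad
\frac{V_{\mathrm{new}}}{V_{\mathrm{old}}} = \left( \frac{d_{\mathrm{new}}}{d_{\mathrm{old}}} \right)^{3} \approx 1.48
\]

because the volume of a sphere grows as the cube of its radius, so a ~ 14% longer reach sweeps up ~ 48% more space in which to find sources.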

Cheers, Mike.

( edit ) Could be a good time to go over what 'coherent' vs 'incoherent' means in our setting :

An assumed waveform is a template, representing the strength of a gravitational wave ( spacetime strain ) varying with time. But a template is only one cycle of some wave, and we are looking for, or assuming, a waveform that repeats in time. Hence the term continuous waves. Imagine being at a beach with some continuous waves coming onto shore. Suppose you were a wizard & could dictate their shape : snap your fingers and tada ! All the waves coming in look the same, break the same etc.

Now keep a certain template ( wave shape ) but speed up or slow down how frequently the waves arrive. Same template, different frequencies. From one wave to the next you still see the same waveform go by.

Now just to be slightly more complicated : allow for some change in this arrival frequency while you are watching. Maybe back at the beach you'd find the interval b/w waves ( their period in time ) reaching shore is smaller than you had noted beforehand, hence the wave frequency has increased. Conversely if the interval were increasing as you watch then we'd say their frequency was decreasing. How quickly the frequency changes over time is the frequency derivative or 'f-dot'. If f-dot is positive then the frequency is increasing, if negative then the frequency is decreasing.
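
In symbols ( the same statement, nothing more ), with t_0 the moment we start watching :

\[
f(t) \approx f(t_0) + \dot{f} \, ( t - t_0 )
\]

with positive \dot{f} meaning the frequency is rising, negative meaning it is falling.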

Next pretend that we are on a circular island, neither too 'big' nor too 'small', such that we might go to different parts of the shoreline. We could see different wave behaviours ( waveform, frequency, f-dot ) depending upon where we went. We might set up a system whereby we designate which way we are facing, ie. which compass direction the distant horizon lies in as we look offshore from some spot. Northwards could be zero in this system, Eastwards could then be 90, Southwards is 180 etc in the usual fashion. So to fully describe an observation we have an angle as well as the aforementioned wave behaviour stuff.

But we are not on a circular island. We are on an ( oh so very close to ) spherical ball in space. We just need another angle to describe which way we are facing as the wave data comes in. That extra angle would be analogous to looking above or below the horizon if we were on that island.

In summary then we have : a waveform, its frequency, the frequency derivative, and lastly those two angles to designate our observation direction. If I'd spent a day watching the beach at some spot, with a single wave type coming in, I could broadly summarise my wave-watching experience to you in a letter with just these pieces of information. When you read the letter you'd have a very good idea what sort of a day I'd had, wave-wise.
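
If you prefer code to letters, that summary could be packaged up like so ( a Python sketch with made-up names and values, nothing to do with the actual E@H data formats ) :

# One point in the search parameter space, i.e. the contents of the 'letter'.
# Names and values are illustrative only.
from dataclasses import dataclass

@dataclass
class TemplatePoint:
    freq_hz: float         # wave frequency
    fdot_hz_per_s: float   # frequency derivative, f-dot
    alpha_rad: float       # first sky angle ( the 'compass' direction )
    delta_rad: float       # second sky angle ( above/below the 'horizon' )

letter = TemplatePoint(freq_hz=108.86, fdot_hz_per_s=-1.4e-17,
                       alpha_rad=4.88, delta_rad=-0.22)
print(letter)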

So much for definitions. Now what do we actually do at E@H ??? :-) LOL

In effect, please think of it in the following way : you have a record of what actually happened at the sea shore for some length of time. It is not some well defined wizard-induced train of identical waves. No way. In fact it's likely to :

(a) Have not much going on. Gravity is 'quiet'. It takes a zillion dollars to just make machines that can accurately listen to these very soft signals.

(b) At a first glance look chaotic. No discernible pattern at all. But there is one saving feature which turns out to be the lifeline for the whole wave-watching enterprise.

In the circumstances under which we listen the whole business is linear. Basically that means when adding two waves together, the height of the resultant wave is the sum of the heights of the two waves separately. Extend that to very many waves superposed upon one another. Maybe, just maybe, the wave record only looks chaotic because there are just so very many wave types coming in simultaneously: each with its own individual waveform, frequency, f-dot plus angles.

Thus in addition to a zillion dollar budget, we also want to make sense of what was recorded. We need a way to pick out individual waves in the presence of very many others. Then we might be able to say for a day of beach watching : out of all that happened that day I found a purely repetitive feature ( template ) of such-and-such frequency, f-dot and there are also two angles which indicate which way I was watching. Send that in a letter.

We do have such a 'pick-out machine'. It's called Einstein At Home, and it was actually invented by a wizard too. Bruce The Wizard to be exact.
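
Here's a toy of what such a pick-out machine does ( a Python sketch using a crude correlation, nothing like the real analysis machinery ) :

# Superpose many waves plus noise, then correlate against a few guessed
# templates. The matched guess should stand out above the rest.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 100, 0.01)          # 100 seconds, sampled at 100 Hz

# A chaotic-looking record: 30 waves at random frequencies/phases plus noise ...
data = sum(np.sin(2 * np.pi * rng.uniform(1, 40) * t + rng.uniform(0, 6.28))
           for _ in range(30)) + rng.normal(0, 5, t.size)
# ... plus one quiet wave we secretly care about.
data += 0.5 * np.sin(2 * np.pi * 12.345 * t)

for f_guess in (10.0, 12.345, 15.0):
    template = np.sin(2 * np.pi * f_guess * t)
    score = abs(np.dot(data, template)) / t.size
    print(f"template at {f_guess:7.3f} Hz -> score {score:.3f}")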

Alas the $Z machine has one fussy feature we can't escape. If there is a gap in the record ( say I tripped over the power cord going to the $Z machine and didn't notice for some time before plugging it back in ) then we can't exactly say what happened during the time when it wasn't recording. You are not allowed to 'make it up' ie. we are doing science. So what we had hoped was going to be a single uninterrupted day's worth of data is now two distinct data sets, and crucially we can't say how to match up the wave height at the last instant of the first set with the wave height at the first instant of the second set. Anything could have happened in between.

You can guess the punchlines now. Coherent is when there is no gap in the wave record. Incoherent is when there either is a gap in the record OR the analysis is done on one segment of the record without taking into account any other part(s).

Now Professor Mathematics sez that while a coherent analysis really is the best to pick up those quiet waves, you pay dearly in the difficulty of computation ( however measured ) the longer that coherence time is. Prof Math also says that if one is prepared to take a chance on maybe missing some signals ( making the search less sensitive ), then you can deliberately break up a coherent set into many also coherent but shorter ones. Effectively pretend that you do not know how to join one segment to the next. Analyse each segment separately. Sprinkle in some reasonable probability type thinking to incoherently combine results/findings of the separate segments. A-bing-a-bada-a-boom & a number pops out ( of some lucky E@H contributor machines ) which gauges how likely we are not being fooled when we reckon we have picked out a real wave [ you may need to read that again ].
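
And a matching cartoon of coherent vs semi-coherent combination ( again just a sketch, with contiguous segments and none of the real statistics ) :

# Split the record into segments, correlate each against the template,
# then either keep the phase across the whole record (coherent) or throw
# the cross-segment phase away and just add per-segment powers
# (semi-coherent). The coherent number stands out further above the noise
# floor, which is why it costs so much more to compute in a real search
# over many unknown parameters.
import numpy as np

rng = np.random.default_rng(1)
fs, seg_len, n_seg = 100, 1000, 10            # 10 segments of 10 s each
t = np.arange(n_seg * seg_len) / fs
data = 0.3 * np.sin(2 * np.pi * 7.0 * t) + rng.normal(0, 3, t.size)

phase = 2 * np.pi * 7.0 * t                   # template at the right frequency
coherent = abs(np.dot(data, np.exp(-1j * phase))) ** 2 / t.size
semi = sum(abs(np.dot(data[i*seg_len:(i+1)*seg_len],
                      np.exp(-1j * phase[i*seg_len:(i+1)*seg_len]))) ** 2
           for i in range(n_seg)) / t.size
print(f"coherent power ~ {coherent:.1f}, semi-coherent power ~ {semi:.1f}")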

For E@H one wants to serve the lowest common denominator first, since the commonest machines available are also the weakest computation-wise. That's the rationale that limits WU characteristics, and in turn that dictates how you cut up the parameter space ( frequency, f-dot, two angles ) for a given waveform guess ( template ).

Time for a lie down ..... :-))

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Zalster wrote:
Might want to look at GPUGrid to see what happens there.

I am glad you asked.  I have been running 10 (out of 12) cores of my i7-8700 on GPUGrid/Quantum Chemistry for several months (Ubuntu 18.04).  I run one work unit per core for maximum output.  Each work unit averages a little over 1 GB, which helps to use my 32 GB of memory.

That Coffee Lake chip is by far the hardest to cool that I have ever encountered.  The package temp is currently 55 C, and the individual cores are a few degrees less.  It is my Plan B for my Ryzen machines if Einstein is not interested.

EDIT: I should mention that to get the heat out, I had to use a video card (GTX 1070) with an external water cooler, a top-mounted 120 mm fan, and a good HSF for the CPU.  It took more work than I have ever had to do before, but it works now.  Also, it is cooler in the winter in that location, but even in the summer it should be in the 60s C range.  My point is that the QC tasks are no worse than anything else.  I originally built it for Rosetta, which was about the same for heat.  I am sure it could handle Einstein OK.
