Gravitational Wave search O1AS20-100 F and I FAQ

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 197136478
RAC: 105341
Topic 198498

This post tries to answer some questions regarding the O1AS20-100 F and I searches. Feel free to post more questions.

What are the criteria for deciding which search a host gets?

One result from the tuning run was a clustering of average runtime per host: 16% of hosts finished a tuning task in 8 to 10 hours on average, another 11% took 24 to 28 hours, and at the slow end 6% took more than 38 hours per task. This caused multiple problems with runtime estimation and work fetch for Einstein@home in general.
I analyzed the hosts participating in the tuning run to find out what might cause this. It became clear that cache size was a main factor, but so was whether the CPU used Hyper-Threading (which limits the available cache per thread). The latter is not detected by the BOINC Client. Since we can only decide to send work to a host based on what the host tells us, we had to find a common denominator for each class of hosts. With some statistics applied to the data, I came up with two classes of hosts: fast hosts, which showed an average runtime of less than 14 hours in the tuning run, and "not so fast" hosts, which took longer. It turned out that 14 hours was also the median of the host population at that time. With this 50:50 separation I looked at the CPU models in each category and selected the ones that were listed in only one category, then came up with a formula to decide whether a CPU model that had hosts in both categories should be assigned to the fast category or not. I also tried to build categories based on cache size alone, but this was not suitable, since a lot of hosts that should have been fast were in fact very slow (based on the tuning data).

This resulted in the criteria currently in use, which are based on the CPU model reported by the BOINC Client. The O1AS20-100F search contains all the data from the O1 run and is assigned to the fast hosts. The O1AS20-100I search contains only a subset of the O1 data and is assigned to all other hosts.
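
As a rough illustration of the classification described above, here is a minimal Python sketch. Only the 14-hour threshold comes from the post. The function name, the example data, and the majority rule used for CPU models that appear in both categories are assumptions, since the actual formula is not given here.

```python
from collections import defaultdict

FAST_THRESHOLD_HOURS = 14.0  # median average runtime in the tuning run, per the post


def classify_cpu_models(hosts, threshold=FAST_THRESHOLD_HOURS):
    """Map each CPU model to "fast" or "slow".

    hosts: iterable of (cpu_model, average_runtime_hours) pairs.
    Models seen in both categories are decided by simple majority.
    That rule is an assumption for illustration; ties go to "slow".
    """
    counts = defaultdict(lambda: [0, 0])  # model -> [fast hosts, slow hosts]
    for model, runtime in hosts:
        if runtime < threshold:
            counts[model][0] += 1
        else:
            counts[model][1] += 1
    return {
        model: "fast" if fast > slow else "slow"
        for model, (fast, slow) in counts.items()
    }


# Hypothetical example data, not taken from the tuning run.
hosts = [
    ("CPU model A", 9.0),
    ("CPU model A", 10.5),
    ("CPU model B", 26.0),
    ("CPU model C", 12.0),
    ("CPU model C", 30.0),
    ("CPU model C", 40.0),
]
categories = classify_cpu_models(hosts)
```

Under this assumed rule, models whose hosts all fall on one side of the threshold are assigned directly, and only mixed models need the tie-breaking step.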

Does the O1AS20-100F version use AVX?

Yes, the applications are the same for both searches. In fact they are the same as in the tuning run.

Why are there two separate user preference items for O1AS20-100F and O1AS20-100I?

This is because each search is a separate BOINC application and thus automatically gets a preference item. Those are still useful if you don't want to have Gravitational Wave tasks (I can't imagine why), but they don't change which app is assigned to a specific host. That is decided by the reported CPU model.

Will these criteria change over time?

Certainly. We're going to monitor the average runtimes per host for each search. If there is a specific CPU model that should be moved to the other category, we will do that. But we will need enough tasks returned for each search to make an informed decision.

Edit 2016/06/13: We removed the CPU model criteria today, see: O1AS20-100 search now open for all CPU models.

Is there a GPU version in the works to speed up things?

The Binary Radio Pulsar search code is by far the most optimized for GPU: we get a speed-up (with GPUs compared to CPU only) of well over 10 (depending on the individual GPU and CPU, of course). For the GW search, the FFT part of the computation takes only roughly half the computing time on CPUs, so offloading this to the GPU can speed up the computation by a factor of 2 at most. We are quite sure that the other parts of the computation (besides the FFT) can also be ported to GPUs, but we have no plans to do that in the near future. We may change this decision later, though, depending on science priorities.
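
The factor-of-2 limit follows from Amdahl's law. A minimal Python sketch, assuming the FFT is exactly half of the CPU runtime; the function name and the 10x FFT speed-up in the second call are illustrative assumptions, not measured values:

```python
def max_speedup(accelerated_fraction, accelerator_speedup=float("inf")):
    """Overall speed-up when only part of the runtime is accelerated (Amdahl's law)."""
    remaining = 1.0 - accelerated_fraction  # part that still runs on the CPU
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)


# FFT is roughly half the runtime; even an infinitely fast GPU FFT gives 2x overall.
limit = max_speedup(0.5)

# With a hypothetical 10x faster FFT on the GPU, the overall gain is about 1.8x.
realistic = max_speedup(0.5, accelerator_speedup=10.0)
```

So unless the non-FFT half is ported as well, the overall gain stays below 2x no matter how fast the GPU is.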

Bill592
Joined: 25 Feb 05
Posts: 786
Credit: 70825065
RAC: 0

Thanks, Christian!

Bill

Benva
Joined: 19 Jul 08
Posts: 4
Credit: 12838249
RAC: 0

Thanks!

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 416186370
RAC: 170675

Is a GPU version in the works to speed things up?

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:
It became clear that the cache size was a main factor but also if the CPU used Hyper Threading or not (which limits the available cache per thread).

Thanks, Christian, for posting this. Is HT halving the cache per thread (2 HT/core), or does it impose some other limit?

I guess fixing this requires a BOINC enhancement to report the cache size?

Michael H.W. Weber
Joined: 22 Jan 05
Posts: 10
Credit: 399175195
RAC: 0

Since the host specs are displayed for each client on the Einstein@home website, it should be possible to read out the CPU model from that data and derive that CPU's cache size from a simple specs table, shouldn't it?

Michael.

RNA World - A Distributed Supercomputer to Advance RNA Research

Mumak
Joined: 26 Feb 13
Posts: 335
Credit: 3580355668
RAC: 1308515

Quote:

Since the host specs are displayed for each client on the Einstein@home website, it should be possible to read out the CPU model from that data and derive that CPU's cache size from a simple specs table, shouldn't it?

Michael.

That would be a really HUGE table, and it would need to be updated constantly. Moreover, as already mentioned, not all CPUs report the final CPU model string, and some models can also have bugs in the string.

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 197136478
RAC: 105341

Quote:
Quote:
It became clear that the cache size was a main factor but also if the CPU used Hyper Threading or not (which limits the available cache per thread).

Thanks Christian for posting this. Is HT halving the cache per thread (2 HT/core), or does it put some other limit?

I guess to fix this requires a BOINC enhancement to report the cache size ?

I guess it depends on the chip architecture and how the L2/L3 cache is implemented (Intel sometimes calls it SmartCache). I'm currently gathering more data to see how the number of cores affects the runtime. We already get the cache size reported, or at least we get a number that the BOINC client obtains from the operating system and treats as the cache size. So far the values seem reasonable, but they also vary within a specific CPU model, so I'm not sure we can trust this value.

I also updated the first post with an answer to the GPU question.

archae86
Joined: 6 Dec 05
Posts: 3162
Credit: 7319838354
RAC: 2310865

Christian Beer wrote:

Why are there two separate user preference items for O1AS20-100F and O1AS20-100I?

This is because each search is a separate BOINC application and thus automatically gets a preference item. Those are still useful if you don't want to have Gravitational Wave tasks (I can't imagine why) but they don't change which app is assigned to a specific host. This is happening via the cpu model reported.


I have an i5-2500K host which received F work until I turned off the preference to receive CPU work, along with the specific work preferences for F and I.

About five days ago I turned the CPU work option back on, but (partly out of curiosity) enabled I without enabling F.

Ever since, the host has been steadily requesting CPU work and has consistently not gotten any. Usually both the message log and the most recent work request log have contained a message something like "No work is available for Gravitational Wave search O1 all-sky I". This has persisted long after the server problems, though I thought perhaps the server might be reluctant to send me work because of the locality system.

So perhaps the behavior is that one's selection of "Gravitational Wave search O1 all-sky F" vs "Gravitational Wave search O1 all-sky I" in the Einstein preferences will not influence which application type a given host receives, but can preclude a host from getting O1AS20-100 work at all if the user enables the "wrong" type for the host's capabilities? If that is the case, this specific "no work is available" message is unhelpful, as apparently the real meaning is "no work of this type is permitted to be sent to this host".
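
The behavior guessed at above can be sketched in a few lines of Python. This is only a model of the hypothesis, not the actual scheduler code; the function and dictionary names are made up for illustration.

```python
# Which search each host category is assigned, per the first post.
APP_FOR_CATEGORY = {"fast": "O1AS20-100F", "slow": "O1AS20-100I"}


def app_to_send(host_category, enabled_apps):
    """Return the app a host would get, or None if it gets no GW work.

    The host category alone picks the app; the user preferences can
    only veto it (hypothesized behavior, not confirmed scheduler logic).
    """
    assigned = APP_FOR_CATEGORY[host_category]
    return assigned if assigned in enabled_apps else None


# A host in the fast category with only the I search enabled gets nothing.
result = app_to_send("fast", {"O1AS20-100I"})
```

In this model, enabling the other search never switches a host to it; it only removes the one search the host is eligible for.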

In the broader scope of things, this is a minor matter, and might be more trouble to tidy up than it is worth.

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 197136478
RAC: 105341

Quote:
In the broader scope of things, this is a minor matter, and might be more trouble to tidy up than it is worth.


Exactly. We would need to build in more custom code to "combine" the preference selection, and that code would probably only be useful for this special case.

Steve Hawker*
Joined: 15 May 12
Posts: 6
Credit: 8775281
RAC: 0

Reading the explanations I'm guessing that my Phenom(tm) II X4 965 is too puny to be granted "F" work because I get the "no work is available for that application" message instead of an actually useful message like "your puny CPU hah hah hah" {yes, I get that BOINC controls the messages}

So my question is really about why the project feels the need for such task segregation. I personally don't have any issues with runtime estimation or work allocation so long as I can actually get work.

If I select "F" and deselect "I" in preferences, why can't you send me "F" tasks? By setting the preferences, I agree and accept any and all risks with regard to runtime estimation and task allocation.

I sometimes run certain PrimeGrid tasks that take weeks to complete. I'm OK with a task taking 3 days or more. Why should my puny CPU be punished like this?

:)
