The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117789018587
RAC: 34697147

cecht wrote:
... results I reported are for O2AS20-500 tasks (Continuous Gravitational Wave search O2 All-Sky v1.06 () x86_64-pc-linux-gnu; GW-opencl-ati)

So were Richie's :-).

cecht wrote:
... the Server Status page shows that workunits for that application are still being sent out

The O2AS search has both a CPU app and the current test version of the GPU app.  So, on the server status page, I presume the activity now applies to the CPU app.  Bernd's cryptic comment about validation hints at immediate action to stop the further flow of GPU tasks until he works out what is going on. People still may have GPU tasks on board but I'd be quite surprised if you could get more of them after that comment.  I suspect (when the problem is identified) that there could be a further app version needed so it doesn't make much sense to keep crunching any remaining tasks for the current app - at least until Bernd makes a further comment.

In the past during 'live' tests like this, people were granted credit (if possible) for the work done, even if an app failure caused the results to be 'junk'.  Hence I'm not at all surprised at Archae86's comment that he had credit for an invalid result.  At the moment, validation will have been suspended whilst the results are being looked at in an effort to characterise the true nature of the problem.  Unless the problem is sorted quickly, results may remain in limbo for quite a while.  Test results are never 'junk' since they always help to improve the app or other back end processes in the whole validation chain.

cecht wrote:
... the three work generators for the O1OD1 series of programs are disabled.

We are doing the Observation Run 2 All-Sky search.  The Observation Run 1 Open Data 1 search is a different beast so it's not surprising those work generators are listed as disabled.

Cheers,
Gary.

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915468630
RAC: 2115698

Gary Roberts wrote:
cecht wrote:
... results I reported are for O2AS20-500 tasks (Continuous Gravitational Wave search O2 All-Sky v1.06 () x86_64-pc-linux-gnu; GW-opencl-ati)

So were Richie's :-).

cecht wrote:
... the Server Status page shows that workunits for that application are still being sent out

The O2AS search has both a CPU app and the current test version of the GPU app.  So, on the server status page, I presume the activity now applies to the CPU app.  Bernd's cryptic comment about validation hints at immediate action to stop the further flow of GPU tasks until he works out what is going on. People still may have GPU tasks on board but I'd be quite surprised if you could get more of them after that comment.  I suspect (when the problem is identified) that there could be a further app version needed so it doesn't make much sense to keep crunching any remaining tasks for the current app - at least until Bernd makes a further comment.

In the past during 'live' tests like this, people were granted credit (if possible) for the work done, even if an app failure caused the results to be 'junk'.  Hence I'm not at all surprised at Archae86's comment that he had credit for an invalid result.  At the moment, validation will have been suspended whilst the results are being looked at in an effort to characterise the true nature of the problem.  Unless the problem is sorted quickly, results may remain in limbo for quite a while.  Test results are never 'junk' since they always help to improve the app or other back end processes in the whole validation chain.

cecht wrote:
... the three work generators for the O1OD1 series of programs are disabled.

We are doing the Observation Run 2 All-Sky search.  The Observation Run 1 Open Data 1 search is a different beast so it's not surprising those work generators are listed as disabled.

Thanks for straightening me out! Or as Gilda Radner's character, Emily Litella, used to say on Saturday Night Live, "Never mind".

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117789018587
RAC: 34697147

cecht wrote:
Thanks for straightening me out!

Sorry for the unintentional collateral damage!  I didn't realise you were bent! :-).

Cheers,
Gary.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

I couldn't resist checking whether GPU tasks are available at the moment. Yes, they are... but the same as earlier, v1.06.

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0

I also keep getting them, but not a single result has been awarded credit so far.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7230211504
RAC: 1158276

cecht wrote:

This Linux host has a modest 2-core(4-thread) Pentium G5600 CPU @ 3.90GHz.

app_config GPU\CPU   task time, s   CPU time, s
     1 \ 0.9            3750           2370
     0.5 \ 0.8          2435           1463
     0.333 \ 0.5        2153           1081
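The GPU\CPU pairs in this table look like the gpu_usage and cpu_usage settings of a BOINC app_config.xml file. A minimal sketch of generating such a file for one row is shown here. The app name is a hypothetical placeholder (the real name is not given in this thread), and the helper build_app_config is introduced only for illustration.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text for a single GPU application.

    gpu_usage is the fraction of a GPU reserved per task, so 0.5 allows
    two tasks per GPU and 0.333 allows three. cpu_usage is the fraction
    of a CPU core budgeted for each GPU task.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # PLACEHOLDER_APP_NAME is an assumption; use the app name your client reports.
    print(build_app_config("PLACEHOLDER_APP_NAME", 0.5, 0.8))
```

The 0.5 \ 0.8 row corresponds to the call in the example. The resulting file would normally go in the project's directory under the BOINC data folder.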

Cecht.  We share the same RX 570 GPU, but have very different CPUs, and I run Windows 10 vs. your Linux.

Somewhere in those or other differences there hides a wildly different CPU time relationship.  At all levels from 1X up through 4X, my system reports slightly more CPU time than elapsed time, implying at least a little bit of simultaneous activity on the 5 or 6 reported threads.  Mine thus reports much more CPU per task as one raises multiplicity.  You, on the other hand, report less CPU time per task as the multiplicity level rises.
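To put a number on that contrast, here is a small sketch that computes CPU time as a fraction of elapsed time for the three rows of cecht's table. It assumes both columns are per-task values in seconds, which is how the table reads but is not stated outright.

```python
# Rows from cecht's table: (GPU \ CPU setting, task time in s, CPU time in s).
CECHT_RUNS = [
    ("1 \\ 0.9", 3750, 2370),
    ("0.5 \\ 0.8", 2435, 1463),
    ("0.333 \\ 0.5", 2153, 1081),
]


def cpu_fraction(elapsed_s: float, cpu_s: float) -> float:
    """CPU seconds used per second of elapsed task time."""
    return cpu_s / elapsed_s


if __name__ == "__main__":
    for setting, elapsed_s, cpu_s in CECHT_RUNS:
        print(f"{setting:>12}: CPU/elapsed = {cpu_fraction(elapsed_s, cpu_s):.2f}")
```

A ratio below 1.0 means the task kept less than one core busy on average. On the Windows host described above the same ratio would come out slightly above 1.0 at every multiplicity.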

I speculate two things:

1. The difference in our behaviors seems most likely a difference between the Linux application and the Windows application, or possibly a difference in OS services as requested by the applications.

2. Much, or perhaps most of the CPU work on this application is not in fact directly required computation on the target problem, but some kind of data-shuffling overhead.  Either that, or for some reason when running at higher multiplicities the Windows version re-runs much of the work.

As others have reported, delivery of 1.06 GPU tasks did not in fact stop when Bernd said "I'm disabling the GPU versions for now" eighteen hours ago.  I downloaded just a little more solely in order to allow a 4X trial matched to my 1X, 2X, 3X trials.  The result was (seemingly) successful completions at somewhat improved productivity.

I'll summarize the apparent productivities using the metric of implied task completions per day:

1X 19.9
2X 32.5
3X 39.6
4X 42.9
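For anyone wanting to reproduce this kind of figure, a rough sketch follows. It assumes that "implied task completions per day" means the number of concurrent tasks times 86400 seconds divided by the elapsed time of one task; that formula is an assumption, not something spelled out in this post. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400

# Reported completions per day at each multiplicity (from the list above).
REPORTED = {1: 19.9, 2: 32.5, 3: 39.6, 4: 42.9}


def tasks_per_day(concurrency: int, elapsed_s: float) -> float:
    """Throughput if `concurrency` tasks run at once, each taking elapsed_s."""
    return concurrency * SECONDS_PER_DAY / elapsed_s


def implied_elapsed_s(concurrency: int, per_day: float) -> float:
    """Invert tasks_per_day: per-task elapsed time implied by a daily rate."""
    return concurrency * SECONDS_PER_DAY / per_day


def scaling_table(reported: dict) -> list:
    """Rows of (multiplicity, rate, gain relative to 1X, implied elapsed s)."""
    base = reported[1]
    return [
        (n, rate, rate / base, implied_elapsed_s(n, rate))
        for n, rate in sorted(reported.items())
    ]


if __name__ == "__main__":
    for n, rate, gain, elapsed in scaling_table(REPORTED):
        print(f"{n}X: {rate:5.1f}/day, {gain:.2f}x of 1X, ~{elapsed:.0f} s per task")
```

Under that assumption, going from 1X to 4X roughly doubles daily output while each individual task takes nearly twice as long to finish.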

cecht's system is far more productive than mine at 1X, probably for two reasons.  First, his 3.9 GHz Coffee Lake processor shrugs off the marketing slur of "Pentium" and delivers single-core performance mostly attributable to clock rate and processor generation (his chip is listed as a 2-core Coffee Lake with a Q2 2018 launch date).  Second, the Linux version of this application probably computes considerably more efficiently than the Windows version.  My i5-9400F is a bit more recent, with a Q1 2019 launch date, but has just a 2.9 GHz clock rate.  It is also Coffee Lake, so computational performance at a given clock rate is likely very, very similar.  Possibly the fact that mine has six physical cores compared to his two helps at higher multiplicity.

Of course, none of this matters unless an application is delivered that actually works.  I've aborted my remaining GW tasks, and disabled acceptance of Beta Test work until I see some favorable indication.

(edited to add OS difference to CPU clock rate comment)

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117789018587
RAC: 34697147

Gary Roberts wrote:
.... Bernd's cryptic comment about validation hints at immediate action to stop the further flow of GPU tasks until he works out what is going on. People still may have GPU tasks on board but I'd be quite surprised if you could get more of them after that comment.

All I can do is apologise for the incorrect assessment and reiterate that I'm genuinely 'quite surprised' :-).

However, it seems to be just a waste of your resources to continue downloading and running them.  Bernd should already have a big enough sample size to inspect whilst he tries to isolate the problem.  I guess we'll be doing them all again at some point once the problem is rectified.

People often comment about the lack of evidence that their completed tasks are being subjected to the validation process.  It seems that test tasks like these are deliberately held back from validation until some sort of 'inspection' is done as to the efficacy of the results.  If all looks good, they get passed to the validator.  If there are problems, some sort of manual intervention quarantines them and a credit is manually applied to 'compensate' the volunteer for their contribution.

Cheers,
Gary.

Rolf
Joined: 7 Aug 17
Posts: 27
Credit: 135377187
RAC: 0

About the validation, the pattern I have seen is that two CPUs mostly agree on the result (even if they happen to be AMD and Intel), and two GPUs also agree. The dispute starts when a CPU and a GPU compare their results; it always ends up with a referendum. Then the minority loses, so two GPUs will downvote a CPU and declare its result invalid, and vice versa.
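The "referendum" pattern described here can be sketched as a plain majority vote. This is a toy model under stated assumptions, not the project's actual validator: each result is reduced to a hypothetical signature string, two results "agree" only if their signatures are equal, and the function name referendum is made up for this example.

```python
from collections import Counter
from typing import List, Optional, Tuple


def referendum(results: List[Tuple[str, str]]) -> Optional[List[bool]]:
    """Decide validity by strict majority.

    results holds (host_kind, signature) pairs, e.g. ("GPU", "A").
    Returns one bool per result (True = valid), or None when no
    signature has a strict majority and another result is needed.
    """
    if not results:
        return None
    counts = Counter(signature for _, signature in results)
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(results):
        return None  # tie or no majority: the dispute stays open
    return [signature == winner for _, signature in results]


if __name__ == "__main__":
    # One CPU and one GPU disagree, so a third result breaks the tie.
    print(referendum([("CPU", "B"), ("GPU", "A")]))
    print(referendum([("CPU", "B"), ("GPU", "A"), ("GPU", "A")]))
```

In this model the outcome depends only on which kind of host supplies the tie-breaking result, which matches the behavior reported above.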

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250688475
RAC: 35189

There is now version 1.07 which should validate much better with the CPU versions.

BM

Arif Mert Kapicioglu
Joined: 16 Jul 09
Posts: 7
Credit: 823300983
RAC: 0

Bernd Machenschalk wrote:
There is now version 1.07 which should validate much better with the CPU versions.

Currently running one, though the GPU load is fluctuating between 21% and 27%. Win 10 x64, Vega 64, GPU temp 47 Celsius.
