Immediate Computation error with Gravitational Wave search O1 all-sky tuning v1.00

archae86

Joined: 6 Dec 05

Posts: 3163

Credit: 7338181687

RAC: 2298699

My hosts picked up one v1.02

14 Feb 2016 15:04:44 UTC

Message 137017

My hosts picked up one v1.02 task each about ten hours ago.

The first two of these have now completed and uploaded without obvious trouble.

Stoll8 first 1.02 task
Stoll7 first 1.02 task

The others should follow, in this order:
Stoll6 first 1.02 task
Acer2 first 1.02 task

The only oddity I see may relate to the work content estimate for these jobs, relative to Gamma-ray pulsar binary search #1 v1.00 jobs. The four Parkes PMPS GPU jobs on Stoll8 promptly went to "panic mode" high priority processing when the GW job finished. This host had been running CPU jobs of the relatively new FGRPB1 flavor, and the CPU job completion time for this "tuning" work moved the task duration correction factor (DCF) up to 2.17. This may suggest an underestimate of the work content. Similarly, on completion of the first Stoll7 tuning rev 1.02 task, the DCF on that host more than doubled to 2.14, kicking all jobs into panic mode.

AgentB reported in the technical news thread that he had a host with multiple results on Linux v1.02 pending validation. My hosts are Windows 7, so this is two variants of 1.02 that get past the returning results goal line successfully. Now if some validations turn up things will seem much better than before. I assume initial validation attempts will be run under supervision, possibly on Monday.

AgentB Linux host v1.02 pending

[edited to reflect second completion and to adjust my guesses on the meaning of the DCF jumps]

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

RE: My hosts are Windows 7,

14 Feb 2016 16:08:44 UTC

Message 137018 in response to message 137017

Quote:

My hosts are Windows 7, so this is two variants of 1.02 that get past the returning results goal line successfully. Now if some validations turn up things will seem much better than before. I assume initial validation attempts will be run under supervision, possibly on Monday.

AgentB Linux host v1.02 pending

Some of those wingmen are completing on various platforms, so looking promising for Monday, the status page confirms your suspicions. I did notice that host has picked up a "v1.02 (SSE2) i686-pc-linux-gnu" task, so that is interesting.

I haven't seen any 1.02 AVX tasks, although i'm not looking too hard.

Zalster

Joined: 26 Nov 13

Posts: 3117

Credit: 4050672230

RAC: 0

Running Windows 7 I have 3

14 Feb 2016 18:04:17 UTC

Message 137019 in response to message 137018

Running Windows 7

I have 3 v1.02 that have completed and awaiting validation.

I have 8 more that should finish in about 6 hours.

Of the 3 that completed, 2 are awaiting a wingman. For the 3rd, the work has not yet been sent out to anyone else, so I have to wonder if it will get validated.

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0

I'm seeing the same as

14 Feb 2016 20:05:13 UTC

Message 137020 in response to message 137017

I'm seeing the same as archae86 in that the v1.02 tasks' estimated time is about half of what it should be.
The estimate for 2 tasks with a DCF of 1.28 was 4h59m40s, and they took about 11h24m30s. That's almost 2.3 times as long, leading to a DCF of 2.9217 and blown-out estimates for the whole cache of work. This needs to be tuned before the beta test is over.
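
As a rough check of those numbers, here is a minimal sketch in Python. It assumes a simplified model in which the displayed estimate is the raw (uncorrected) estimate multiplied by the DCF, and in which the DCF jumps straight to the ratio of actual run time to raw estimate when a task overruns. The function names are made up for illustration and are not part of BOINC.

```python
def hms_to_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def implied_dcf(old_dcf, displayed_estimate_s, actual_s):
    """DCF implied by one overrunning task (simplified, assumed model).

    Assumes displayed_estimate = raw_estimate * old_dcf, and that the
    new DCF equals actual run time / raw estimate.
    """
    raw_estimate_s = displayed_estimate_s / old_dcf
    return actual_s / raw_estimate_s


estimate_s = hms_to_seconds(4, 59, 40)   # estimate shown at DCF 1.28
actual_s = hms_to_seconds(11, 24, 30)    # approximate actual run time

overrun_ratio = actual_s / estimate_s
new_dcf = implied_dcf(1.28, estimate_s, actual_s)

print(f"actual / estimate: {overrun_ratio:.2f}")
print(f"implied DCF:       {new_dcf:.2f}")
```

Under those assumptions the overrun comes out near 2.28 and the implied DCF near 2.92, in line with the reported 2.9217 given that the run time above is only approximate.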

On the bright side the tasks completed and uploaded without errors so we're one step further along in the beta testing! =)

archae86

Joined: 6 Dec 05

Posts: 3163

Credit: 7338181687

RAC: 2298699

While not an error, problem

15 Feb 2016 4:35:13 UTC

Message 137021

While not an error, problem or bug, it may be interesting to compare the relative performance of the SSE2 and AVX versions of the same application on the same host. While most of the first day of Windows v1.02 runs were sent out in SSE2 versions, two of my three "serious" hosts got one of each.

Comparing reported CPU time, in each case the AVX version was materially faster, requiring between 82% and 84% of the CPU time of the SSE2 version. I don't know whether O1AS20-100T has the nearly perfect similarity of required work from WU to WU that several of the other Einstein applications have. If it does (except perhaps for edge case units), then this is a nice improvement, but small compared to the greater than factor of two underestimate of required computing reflected in the DCF jumps observed.

The two hosts are respectively a dual-core Haswell running hyperthreaded, but with only one CPU job (though it does support four GPU GRP6 tasks), and a four-core non-hyperthreaded Sandy Bridge, carrying a similar load.

The AVX advantage fell into the same range when considering elapsed time.
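
To put the two effects side by side, here is a small Python sketch. The 0.82 and 0.84 time fractions come from the comparison above. The underestimate factor of 2.0 is only a stand-in for "greater than a factor of two", and the variable names are hypothetical.

```python
# Fraction of SSE2 CPU time needed by the AVX version (from the two hosts).
avx_time_fractions = (0.82, 0.84)

# Assumed placeholder for the "greater than factor of two" underestimate.
underestimate_factor = 2.0

for fraction in avx_time_fractions:
    speedup = 1.0 / fraction
    # How far off the estimate would still be if every task ran the AVX app.
    remaining = underestimate_factor * fraction
    print(f"AVX at {fraction:.0%} of SSE2 time: "
          f"speedup {speedup:.2f}x, estimate still off by {remaining:.2f}x")
```

Even at the better end of the range, the AVX gain is roughly a 1.2x speedup, so it would cover well under half of a 2x underestimate.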

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

RE: RE: AgentB Linux host

15 Feb 2016 8:34:17 UTC

Message 137022 in response to message 137018

Quote:

Quote:

AgentB Linux host v1.02 pending

Some of those wingmen are completing on various platforms, so looking promising for Monday, the status page confirms your suspicions. I did notice that host has picked up a "v1.02 (SSE2) i686-pc-linux-gnu" task, so that is interesting.

I haven't seen any 1.02 AVX tasks, although i'm not looking too hard.

The first result from the SSE2 version took about 40% longer. It's not quite clear why there is a difference.

Some more in the pipeline...

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 520316822

RAC: 327948

OK, not immediate error,

15 Feb 2016 9:43:34 UTC

Message 137023

OK, not immediate error, but:
Version 1.02 finishes on darwin and on win10, but the results do not validate against each other.
https://einsteinathome.org/workunit/239442534

Logforme

Joined: 13 Aug 10

Posts: 332

Credit: 1714373961

RAC: 0

RE: OK, not immediate

15 Feb 2016 11:04:02 UTC

Message 137024 in response to message 137023

Quote:

OK, not immediate error, but:
version 1.02 finishes on darwin and on win10, but do not validate against each other.

I think it's because the validator is not running. I have a task which is also waiting for validation with 2 completed.

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 520316822

RAC: 327948

Validated now.

15 Feb 2016 12:52:55 UTC

Message 137025

Validated now.

archae86

Joined: 6 Dec 05

Posts: 3163

Credit: 7338181687

RAC: 2298699

Four of my six successfully

15 Feb 2016 13:39:34 UTC

Message 137026

Four of my six successfully returned tasks validated, including some encouraging mixed quorums:
Windows, AVX vs. SSE2
Windows SSE2 vs. Linux SSE2
Windows SSE2 vs Linux without specified vector extension flavor

The other two are just waiting for quorum partners to return their tasks.

As I type, the Einstein server status page shows 96 valid tasks, up from zero 12 hours ago, but it also shows the validator as disabled. I'd guess the project did a test validation run and has not yet set it to automatic. The page also shows 800 total workunits, up from 400 last night, so I guess they made a new batch, created about 8:30 UTC on February 15.

Things seem much improved.
