RE: A pity resources are
Quote:
A pity resources are (need to be?) wasted like that though.
Well, the wasted resources would have been much greater had we not deployed the patch, which would have meant a full reanalysis of a lot more tasks. FYI, the results returned so far aren't wrong, but they're not as good as they could have been. Fixing the bug now, while trying to limit the negative effects of comparatively few technically "invalid" results, seemed the best compromise considering all facts and alternatives - for all stakeholders, and that includes our volunteers.
Thanks,
Oliver
Einstein@Home Project
That's what I was thinking:
That's what I was thinking: either it wasn't all that important and it would have been better to let the run finish, or it really needed to be done (which is what I assumed). In the latter case, however, I would suppose there is concern with the validity of 80%+ of this run already done. Did you have the opportunity to check how much impact it has? I mean, judging the impact from only a couple of weeks of completed work does not seem a good measure to me, and is rather worrying. I don't know about all results, but if every older WU is going to produce invalids when crunched with the newer 1.15 application, how reliable then are those 80%+ already done? Are there any older WUs left in pending state with the older application version that managed to get validated with 1.15? What about much older, already validated WUs? Wouldn't these, if checked, turn out to be invalids too? Your initial reaction Thursday looked rather like one of surprise to me: you sounded 100% sure that this could not happen: http://einsteinathome.org/node/198054&nowrap=true#144600
Quote:
Quote:
Another thing is that FGRP4 executables changed to version 1.15 after the short outage. Again, maybe related, maybe not.
Nope, that was just a scientific bug fix which also required letting the task pool run dry.
Oliver
Meanwhile I had two invalids (and I expect more to come):
- one was all 1.14: I don't remember ever seeing such a case; the last task of that WU was completed Thursday, October 1st, by someone else, on 1.14 as well;
- another one waited forever for a wingman's result and was then crunched again, twice, with 1.15.
I don't like to see that; I am just not used to it! I know I'm only running everything on a single, older iMac now, but still, each one means trashing half a day of work, which, put in perspective, really is quite a waste for me. Others will likely care a lot less about that. However, I was really happily crunching away on Einstein, but that feeling has got a little dent at this point.
Let's see... RE: I would
Let's see...
Quote:
I would suppose there is concern with the validity of 80%+ of this run already done.
The WUs crunched so far aren't invalid but incomplete. The current plan is to analyze the remaining data locally. This is also why we needed to make the cut on a dataset boundary.
Quote:
but if every older WU is going to produce invalids when crunched with the newer 1.15 application
The WUs themselves are fine and can be analyzed with both app versions. They just won't validate across the two app versions.
Quote:
What about much older, already validated WUs? Wouldn't these, if checked, turn out to be invalids too?
See above. Validated WUs are still valid.
Quote:
However, I was really happily crunching away on Einstein, but that feeling has got a little dent at this point.
As I tried to explain, we know that some of your tasks won't validate and thus there is a certain waste - and we're sorry about that. Please keep in mind that this affects only a very limited number of volunteers: only those WUs that started with 1.14 and had one of their tasks error out (or get marked invalid) after 1.15 was deployed are affected. The current weighted total error rate for FGRP is ~3.5% (which includes the validation errors discussed here), so that should give you an idea of the impact.
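To spell out that condition in code terms, here is a purely illustrative Python sketch - the record layout below is made up for this post and is not our actual scheduler or database structure:

# Minimal sketch of the 'affected WU' condition described above; the field
# names are invented for illustration, not real BOINC / Einstein@Home data.
def affected_by_cross_version_issue(wu_tasks, v115_deploy_time):
    """wu_tasks: list of dicts with keys
         'app'      - '1.14' or '1.15'
         'outcome'  - 'success', 'error', 'invalid' or 'pending'
         'finished' - completion time (comparable to v115_deploy_time), or None"""
    # The WU must have started out under 1.14 ...
    started_with_114 = any(t['app'] == '1.14' for t in wu_tasks)
    # ... and one of its tasks must have errored out or been found invalid
    # after 1.15 went live, so the resend gets generated with 1.15 and the
    # quorum ends up mixing app versions.
    failed_after_115 = any(
        t['outcome'] in ('error', 'invalid')
        and t['finished'] is not None
        and t['finished'] >= v115_deploy_time
        for t in wu_tasks
    )
    return started_with_114 and failed_after_115

Everything else - WUs issued entirely under 1.15, or 1.14 quorums that complete without a resend - is unaffected.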
Regarding the 80%: we still have more data to crunch for FGRP that hasn't yet been enqueued into the pipeline, so the 80% figure, while technically correct, is not the whole picture.
If you don't want to risk any potentially (!) wasted cycles you may of course opt out of FGRP for ten more days. By then all 1.14 tasks will have finished or timed out, so that only 1.15 tasks will be in flight.
Anyhow, we should have announced the expected validation issue alongside the app deployment so that everyone who cares about this could have reacted accordingly. We failed to do that and are ready to take the heat. Again, sorry - even at an otherwise rather rock-solid project, mishaps and miscommunication can happen.
HTH,
Oliver
Einstein@Home Project
...hm, so far 15 WU's for
...hm,
so far 15 WUs for the garbage can and counting! That sucks a lot, even if it's only a small part of that 3.5%. :-(
Greetings from the North
RE: That sucks me a
Please feel free to pause crunching FGRP until Oct. 15th. By then all 1.14 wingmen will have finished one way or another.
Best,
Oliver
Einstein@Home Project
...joking! That will not
...joking!
That will not resolve the problem for the 58 WUs finished and waiting for validation! In addition to the 15 WUs so far, that makes 73 WUs for /dev/null!
Greetings from the North
RE: That will not resolve
Quote:
That will not resolve the problem for the 58 WUs finished and waiting for validation!
All of them have 1.14 wingmen?
Anyhow, we're looking into re-validating such 1.15 tasks, which would get you the deserved credit once two 1.15 tasks validate later on and produce the canonical result.
Stay tuned,
Oliver
Einstein@Home Project
...it is mixed; I have
...it is mixed; I have pending 1.14 WUs with 1.15 wingmen and pending 1.15 WUs with 1.14 wingmen.
BTW, the credit is not the problem; I don't like wasting time and power. :-)
Greetings from the North
RE: RE: That will not
Quote:
That will not resolve the problem for the 58 WUs finished and waiting for validation!
All of them have 1.14 wingmen?
You guys shouldn't be wasting time on this.
The problem was for those 1.14 quorums where one task failed and there was a 1.15 resend. Two possibilities might arise. If the failed 1.14 'rises from the ashes' then the 1.15 resend will miss out. Otherwise, the 1.14/1.15 combo will fail validation and a further 1.15 resend will seal the fate of the poor old long suffering original 1.14 left standing.
There was never a problem for quorums with 1.15 original tasks (_0 or _1 extensions on the task name) since they cannot subsequently be paired with a 1.14 resend.
If DMMDL takes a look at the 1.15 tasks in the 58 'pendings' he mentions, he can assume that all of those with _0 or _1 (surely the majority) will validate at some point in the future. The only possible problem is for any that have a _2 or higher extension on the name. If there are any of these, they can ONLY fail validation if they actually are invalid, or if a previously missing 1.14 suddenly gets sent in now.
So of the 58, how many are 1.14s? Those are the ones likely to fail. Of the 1.15s, I would be very surprised if more than a couple miss out. Maybe DMMDL would like to count how many of the 1.15 pendings are _2 or above tasks with a 1.14 partner AND a further 1.14 task that hasn't actually failed and is just late. That is his maximum 'exposure' :-).
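If anyone wants to do that count mechanically rather than by eye, here's a rough sketch of the sorting I have in mind (plain Python, with made-up record fields - nothing official from the project, just an illustration of the rule above):

# Rough sketch of the 'maximum exposure' count described above; the
# pending-task records are invented for illustration only.
def classify_pending(task):
    """task: dict with keys
         'name'   - task name; the trailing _N is the issue index (resend if N >= 2)
         'app'    - '1.14' or '1.15' (the version this host ran)
         'others' - list of (app_version, state) for the other tasks in the
                    quorum, state in {'success', 'error', 'invalid', 'outstanding'}"""
    issue_index = int(task['name'].rsplit('_', 1)[1])
    if task['app'] == '1.14':
        return 'likely to miss out'          # may end up paired with 1.15 resends
    if issue_index <= 1:
        return 'should validate eventually'  # original 1.15 task (_0 or _1)
    # 1.15 resend (_2 or higher): only exposed if it sits in a 1.14 quorum
    # where a missing 1.14 could still turn up and complete the quorum without it.
    has_114_partner = any(app == '1.14' for app, _ in task['others'])
    late_114_still_out = any(app == '1.14' and state == 'outstanding'
                             for app, state in task['others'])
    if has_114_partner and late_114_still_out:
        return 'counts toward maximum exposure'
    return 'probably fine'

def maximum_exposure(pendings):
    return sum(1 for t in pendings
               if classify_pending(t) == 'counts toward maximum exposure')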
Of course, my thinking may be totally muddled so please correct me if I'm wrong.
Cheers,
Gary.
RE: Anyhow, we're looking
Quote:
Anyhow, we're looking into re-validating such 1.15 tasks, which would get you the deserved credit once two 1.15 tasks validate later on and produce the canonical result.
I'm not sure I fully understand this bit. A 1.15 task can only fail if it was paired with a 1.14 because of a timeout of the other 1.14. The 1.15 fails only if the timed-out 1.14 unexpectedly revives. Are you saying that any completed quorums that are 1.14, 1.14, 1.15, where the 1.15 has been excluded in this way, will be repeated and the failed 1.15 will then be matched against the new canonical result? Won't the failed 1.15 be gone before the new canonical result arrives? I guess you must be looking at retaining them all for however long it takes?
Seems like a lot of extra effort for you guys. Not wasting your time is more important than worrying about a few lost credits.
Cheers,
Gary.