Timed out - no response

alintope
Joined: 27 Jan 12
Posts: 52
Credit: 295551180
RAC: 0
Topic 196192

The page http://einsteinathome.org/account/tasks, which shows all my Einstein@Home tasks, has one recent entry I have a question about:

Quote:
Task ID    Computer  Sent                     Time reported or deadline  Status
271704040  4164013   7 Feb 2012 7:26:30 UTC   21 Feb 2012 12:05:58 UTC   Completed and validated
271704039  4242667   7 Feb 2012 7:29:36 UTC   9 Feb 2012 3:35:15 UTC     Completed and validated
274388917  4583083   21 Feb 2012 8:34:37 UTC  21 Feb 2012 12:46:52 UTC   Timed out - no response

On 21 Feb 2012 at 8:34:37 UTC, computer 4164013 had already been working on WU 116033127 for 14 days, 1 hour and 8 minutes. Since its result was overdue at that point, my computer 4583083 was sent the same WU as the third computer to process it. But about 3.5 hours later, computer 4164013 returned a result, and my task was then marked as timed out. Why was my task timed out, rather than the other one, whose result came in too late?

Regards,
Heinrich

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

Timed out - no response

It looks as if you either reset the project or lost all your WUs, and your WUs then slowly got resent. Your timed-out WU didn't get resent because it had already validated between two other hosts; it would have been a waste to send it to your host too.

Claggy

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117877264825
RAC: 34730148

Thanks Claggy for adding the links.

I think you are probably essentially right in your analysis but there still seem to be some oddities and perhaps other possibilities. For the benefit of anyone interested, I thought I'd add some detail about how it looks to me.

Firstly, consider the deadline-miss task that belonged to host 4164013. Now that it has validated, it's not obvious that it missed its deadline (by around 4 hours or so), but you can see that it did from the dates and times it was sent and reported. This deadline miss would have triggered the creation and sending of a replacement task (I call them 'resends' - I don't know if there is an official name). You can easily tell it's a 'resend' by checking its name - it has a _2 extension. _0 and _1 are the primary tasks and, if resends are necessary, they will have higher-numbered extensions.
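
The naming convention described above can be checked mechanically. This is only a minimal sketch, assuming a task name ends in an underscore followed by its sequence number; the function names and the example task names are hypothetical, not real Einstein@Home names.

```python
import re

def resend_index(task_name: str) -> int:
    """Return the numeric suffix of a task name (the N in '..._N')."""
    match = re.search(r"_(\d+)$", task_name)
    if match is None:
        raise ValueError(f"no numeric suffix in {task_name!r}")
    return int(match.group(1))

def is_resend(task_name: str) -> bool:
    """Suffixes _0 and _1 are the primary tasks; _2 and higher are resends."""
    return resend_index(task_name) >= 2

# Hypothetical task names, for illustration only.
for name in ("example_wu_0", "example_wu_1", "example_wu_2"):
    print(name, "resend" if is_resend(name) else "primary")
```

The regular expression is anchored at the end of the name, so underscores elsewhere in the name don't matter.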

A resend may not actually be sent out if the deadline-miss task gets reported before the scheduler can find a suitable host to give the resend to. If it's not sent, the status of the new task would show as 'not needed'. This does happen quite a bit with GW tasks (which tend to have longer delays in finding a host) and is a good thing, because it eliminates some redundant work. In this case, however, the scheduler found a suitable host and the resend was issued about an hour after the deadline miss occurred and more than 3 hours before the deadline-miss task was finally reported.
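
To put numbers on that timeline, here is a small sketch using the timestamps from the task list in the first post. The 14-day deadline is an assumption: it isn't stated in the task list and is only implied by the size of the deadline miss. The variable names are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%d %b %Y %H:%M:%S"

# Timestamps (UTC) taken from the task list in the first post.
late_sent = datetime.strptime("7 Feb 2012 7:26:30", FMT)
late_reported = datetime.strptime("21 Feb 2012 12:05:58", FMT)
resend_sent = datetime.strptime("21 Feb 2012 8:34:37", FMT)

# Assumption: a 14-day deadline, which is not stated in the task list.
deadline = late_sent + timedelta(days=14)

resend_delay = resend_sent - deadline     # deadline miss -> resend issued
lateness = late_reported - deadline       # how far past deadline the result was
head_start = late_reported - resend_sent  # resend issued -> late result reported

print("resend issued after deadline:", resend_delay)
print("late result past deadline:   ", lateness)
print("resend out before report:    ", head_start)

# The resend could only have ended up as 'not needed' if the late result
# had been reported before the resend was sent out.
print("resend avoidable:", late_reported < resend_sent)
```

Under that assumption, the resend went out 1:08:07 after the deadline, and the late result was reported 3:31:21 after the resend was issued, which is 4:39:28 past the deadline.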

Even when a resend has been issued, the scheduler doesn't ignore the deadline-miss task if it gets returned and reported before the resend does. This is reasonable because there is no guarantee that the resend itself will ever get returned, so why would you ignore a useful result at this point? If the resend does get completed within its deadline, it will also be credited, so the host crunching the resend won't be disadvantaged by this behaviour.

So why was the resend task in this case given a status of 'Timed out - no response'? This happened just over 4 hours after the task was recorded as being sent. It was also exactly at a time when a request for work was being processed by the scheduler. It could have been the scheduler finding a 'lost task' at the client end and then noticing that the quorum it belonged to was already complete so no point in re-sending it. If that was the case, the OP would have been able to see it recorded in the message log at that time. Lost tasks being resent are quite obvious in the log and even at this stage after the event, it should be possible to go back and check the log to confirm this.

Another possibility is that it was a server-initiated abort. It happened during a scheduler exchange and it looks like the client's request for work caused the scheduler to notice that the task was now redundant. The server code is customised on E@H and isn't often updated, so it's hard to know the full story. The status message recorded in the database is not helpful and this would seem to be some sort of bug. There's no way the task actually 'timed out' (i.e. missed its deadline). Whatever the full story, it was still a reasonable outcome since no time had been expended on the task and it truly was redundant. The only thing wasted was the bandwidth used in downloading it.

Cheers,
Gary.

alintope

Thank you, Gary, for your detailed and knowledgeable analysis. As a BOINC beginner, I didn't expect it to be so complicated, and I have to admit I didn't completely understand every detail. But it shows me how much "brain power" has to be spent on a DC project to get it running properly.

Regards from
Heinrich

Gary Roberts

Quote:
... I, as a BOINC beginner, didn't expect it to be so complicated ...


Behind the scenes, the BOINC framework is very complicated. It needs to be when you really think about the disparate needs and behaviours of all the diverse projects that now depend on it. The Devs often cop flak concerning BOINC's supposed 'failings' (both real and imaginary ;-) ). Much of this is unjustified. If you don't try to force it to do things it's not designed to do, and if you are prepared to be patient and ask questions or seek advice, it really is possible to have a pleasant, relatively trouble-free and essentially 'hands-off' experience, without needing a deep understanding or having to 'micromanage' things at all.

Quote:
... I didn't completely understand every detail ...


That's OK - there's plenty of stuff I don't understand.

Quote:
But it shows me how much "brain power" has to be spent on a DC project to get it running properly.


Well ... actually, not really. One of the design goals of BOINC is for projects to run properly (set and forget) without any special knowledge on the part of the user. Even the BOINC Devs would admit there's still a way to go, but what we have now is not too bad, all things considered. To achieve a satisfying experience, make sure you browse through and think carefully about all the computing and project-specific preferences. If it's not absolutely clear what a particular pref does (or how it interacts with other prefs), ask, ask, ask, until it is clear. A lot of grief can be caused by inappropriate pref settings. So, sure, you need a bit of understanding when setting up the prefs for your project mix, but that tends to be common sense rather than "brain power". The defaults are usually pretty good anyway.

Cheers,
Gary.
