Immediate timeout? Missing deadline?

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7305378356
RAC: 2289544

Ray Stone wrote:
Also, I've noticed that sometimes tasks are marked as "missed deadline" but not always. Is the presence of this message some indication that the WU has been sent to a 3rd computer? Is there any such indication for these situations?


If you find the task of interest in a task list on your web pages (either the one specific to one of your hosts or the general task list for your account), clicking on the link for that task in the "Work Unit ID" column will show you a list of all tasks (so far) created from that WU, with creation dates, deadlines, and status. That will answer your question as to whether a replacement has already been sent, but if a quorum-fulfilling return is currently overdue, it won't tell you how long it will be before the next issue.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:
Quote:
I am about to miss my deadline on 2 tasks that have been running for 10 and 7 hrs. They have 2 and 2.75 hrs left. Due in 20min. What happens now? Do I miss the credits and end up wasting 18+hrs of CPU time?

My question is "what is good form in this case?"

There are different views, each valid.

If a task has not started and is likely to miss its deadline, then I always abort it.

I seem to recall E@H will still credit you if you complete a task late, even if others report it before you complete (within some reasonable limit, not sure what that is, but several days at least) - other projects may not.

If I'm feeling "credit hungry", I let running tasks continue; the late task may not even have been allocated to another host by the time I finish. (Easy option)

If I'm late and the task has been re-assigned (and I notice on the web page!), then I can take a "pulsar hungry" position: abort it, sacrifice the credits and start a fresh task.

However, if this is a regular problem - missing deadlines - then this is usually a good sign something is wrong. The host is not configured correctly, wrong app, too many tasks, or giving insufficient time resources to the project etc. Leaving it running badly is not imho "good form".

To reinforce archae86's point, there is no feedback to the BOINC manager on the host about the task status; you need to manually query the web page.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:
I seem to recall E@H will still credit you if you complete a task late, even if others report it before you complete (within some reasonable limit, not sure what that is, but several days at least) - other projects may not.


My understanding is that if you return your late task before the quorum has been met you will get credit; I believe this is true for all projects.

Here at Einstein all workunits have a quorum of 2. So if you return a task late and only your initial wingman has returned their task, you will get credit, and so will the host working on the replacement task if it's returned before its deadline. On the other hand, if the replacement task is returned before your late task, then your work will be wasted and you will not get credit.

The replacement task for your missed deadline will be placed last in the scheduler's queue of tasks to send, so there will probably be some time between the task being generated and it being sent. Take a look at the details of any given task to get a feel for how long that time might be. At a quick glance I see times ranging from about 2 to 6 hours between creation and sending.

I believe that the feature to abort unneeded tasks is enabled here, which would mean that if the task has not been started and is deemed not needed, BOINC will abort it. For this to happen, the late task has to be reported, and then the host with the replacement task needs to contact the scheduler before it starts the replacement task.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:

Here at Einstein all workunits have a quorum of 2. So if you return a task late and only your initial wingman has returned their task, you will get credit, and so will the host working on the replacement task if it's returned before its deadline. On the other hand, if the replacement task is returned before your late task, then your work will be wasted and you will not get credit.

Thanks Holmis, I had not realized that if your replacement comes in before you (and gets validated), then you always lose the credit.

This host has a number of "too late to validate" tasks, and this task is an example of extra credit despite that host being late.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 777909582
RAC: 1203889

I think it makes a lot of sense the way it's implemented, fairness-wise:

- You will always get credit if your host is doing what it promised: return a valid result within the deadline

- You will also get credit if the host sends back a result that is 'useful' (= it could be used to validate another result), even if the deadline was missed.

- You will not get credit if you missed the deadline AND the result came in so late that it was no longer useful to the project
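
The three rules above can be written down as a small decision function. This is only a sketch of the policy as described in this thread, not the actual BOINC validator code; the name `credit_granted` and its three flags are made up for illustration.

```python
def credit_granted(valid: bool, on_time: bool, quorum_already_met: bool) -> bool:
    """Return True if a returned result would earn credit under the rules above.

    Hypothetical helper: the name and the flags are illustrative assumptions,
    not part of BOINC.
    """
    if not valid:
        # A result that fails validation earns nothing.
        return False
    if on_time:
        # Rule 1: a valid result returned within the deadline earns credit.
        return True
    # Rules 2 and 3: a late result earns credit only while it is still
    # useful, i.e. the quorum had not been completed without it.
    return not quorum_already_met
```

For example, `credit_granted(True, False, False)` corresponds to the late-but-still-useful case, and `credit_granted(True, False, True)` to the "too late to validate" case.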

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118612320771
RAC: 18255313

Quote:
Quote:
I am about to miss my deadline on 2 tasks that have been running for 10 and 7 hrs. They have 2 and 2.75 hrs left. Due in 20min. What happens now? Do I miss the credits and end up wasting 18+hrs of CPU time?

My question is "what is good form in this case?"


For the specific case you quoted - 20 mins to deadline, ~2 hrs to run - the tasks should probably be allowed to run for the following reasons:-

    * A lot of time has already been invested and the remaining time is short, so you save little by aborting now.
    * It's possible that the task could be returned before the extra task is issued, in which case there will be a saving when the server cancels the 'not needed' replacement task.
    * There is a chance that the other task in the original quorum may not have been returned, so your task is still 'needed'. You can check this when making your decision. If neither task is back yet, why would you abort yours?
    * Even if the extra task is sent, there is a risk (perhaps as high as 20%) that it won't be successfully returned (for various reasons), so the late task would be really useful after all.
    * There is also a smaller risk that the initial comparison of the first two results could be 'inconclusive', so that a third result is required.
    * It's impossible for the 'nearly completed' task not to be a part of the validation process. Even if the replacement task is sent out immediately, it is extremely unlikely to reach the top of the recipient's work cache, be crunched, uploaded and reported, all within 2-3 hours.

"Good form" is to not be making the above decision with just 2hrs to go :-). If a machine gets restarted after a break and there are tasks on board at deadline risk, just abort enough 'closest to deadline' unstarted ones to allow those further away to be done in time. Have a think about any 'in progress' ones. If the % done is small (eg. 10-30%) just abort them and suck up the loss. Checking quorum partners can often help in the decision about tasks that have a higher % done. In the end, it's your decision.
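
The "abort enough of the closest-to-deadline unstarted ones" advice can be sketched as a simple greedy pass over the work cache. This is a rough illustration only: it assumes a single CPU core, unstarted tasks, and reliable run-time estimates, and the `Task` fields and the `triage` function are hypothetical names, not anything the BOINC client provides.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_to_deadline: float  # time left until the deadline
    hours_of_work: float      # estimated run time still needed


def triage(tasks: list[Task]) -> tuple[list[str], list[str]]:
    """Split unstarted tasks into (keep, abort) for one CPU core.

    Tasks are visited in deadline order. A task that cannot finish in time
    behind the tasks already kept is aborted, so the ones with later
    deadlines still fit.
    """
    keep, abort = [], []
    elapsed = 0.0  # hours of work already committed to kept tasks
    for task in sorted(tasks, key=lambda t: t.hours_to_deadline):
        if elapsed + task.hours_of_work <= task.hours_to_deadline:
            keep.append(task.name)
            elapsed += task.hours_of_work
        else:
            abort.append(task.name)
    return keep, abort


# Made-up cache: three 5-hour tasks due in 2, 6 and 12 hours.
cache = [Task("a", 2, 5), Task("b", 6, 5), Task("c", 12, 5)]
print(triage(cache))
```

Tasks that are already partly done need the judgement described above (% done, state of the quorum partners), which this sketch deliberately leaves out.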

Quote:
Is there some kind of grace period before the ending of which this wu will not be sent to another computer?


No. If you're lucky, you may get a little (usually minutes rather than hours) until the scheduler gets around to sending out the replacement. It depends quite a lot on which science run is involved.

Quote:
Also, I've noticed that sometimes tasks are marked as "missed deadline" but not always. Is the presence of this message some indication that the WU has been sent to a 3rd computer? Is there any such indication for these situations?


No and no. The only way to know if a replacement task has actually been issued is to check the WUID link on the website. The BOINC client (on startup) will complain about tasks that have already exceeded deadline. The website will show any 'deadline misses' in red for tasks assigned to you but not reported. In neither case will you know (without further investigation) whether or not a replacement has been issued.

In his reply, Holmis mentioned

Quote:
I believe that the feature to abort unneeded tasks is enabled here, which would mean that if the task has not been started and is deemed not needed, BOINC will abort it. For this to happen, the late task has to be reported, and then the host with the replacement task needs to contact the scheduler before it starts the replacement task.


BOINC certainly has this feature but I don't think it is enabled here, certainly not for the science runs I use (BRP6 and FGRP4). Sometimes if I get a lot of 'resend' tasks (_2 or higher suffix on the task name) on a particular host, I will check the WUIDs to see what the problem seems to be. I often notice in some of these that a 'late' task has subsequently been returned so that my task (not yet started) is really not needed. I have several times 'updated' to see what the scheduler would do and my unneeded copy has NOT been deleted by the scheduler. I always abort manually in these cases.

Cheers,
Gary.

Grubix
Joined: 1 Jul 08
Posts: 19
Credit: 159690452
RAC: 0

My problem has nothing to do with the recent postings about missing a normal deadline. I am referring to the original topic of this thread.
In the last few days I got some tasks without a deadline. Many of my examples above are already gone from the database.

Here are two new examples:

515419446 (223919462)
514575327

As you can see in the first example, I returned the task in less than 12 hours. But someone else also got the task and was faster, so I got no credit.
The credits are not that important to me; I compute for science. I am more unhappy about the wasted computing time. And in almost every one of my examples, the task has been computed multiple times because of the missing deadline.

Bye, Grubix.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 252100586
RAC: 33945

We can confirm that there is a problem of "zero day" deadlines that occurs occasionally with GW / locality / S6BucketFU* tasks.

I browsed the DB and scheduler code a number of times now, and I must admit that I still don't have a clue how this can happen. The respective workunits are ok in the DB, and the scheduler simply adds the WU's "delay_bound" to the current date to set the task's deadline when sending it to the client. There isn't much that can go wrong there.
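
As a minimal sketch of the calculation described above (not the actual scheduler code), the deadline is just the send time plus the workunit's `delay_bound`, which BOINC stores in seconds. The function name `task_deadline`, the example date and the two-week value are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def task_deadline(sent_at: datetime, delay_bound_seconds: int) -> datetime:
    """Deadline = time the task is sent + the workunit's delay_bound."""
    return sent_at + timedelta(seconds=delay_bound_seconds)


# Made-up send time, used only to show the arithmetic.
sent = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)

# An assumed two-week delay_bound gives a deadline two weeks after sending.
print(task_deadline(sent, 14 * 24 * 3600))

# A delay_bound of zero would make the deadline equal to the send time,
# which is what a "zero day" deadline looks like from the client side.
print(task_deadline(sent, 0))
```

The second call only reproduces the symptom. As noted above, the stored workunit values look fine, so this is not a diagnosis of where the zero comes from.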

BM

Grubix
Joined: 1 Jul 08
Posts: 19
Credit: 159690452
RAC: 0

Hello Bernd, thanks for your reply.

Too bad that you couldn't find a bug despite your intensive search. I wish you success in tracking it down.

If I can help, please write to me. Grubix.
