Petition - Deadline Relief for Longest Results

Svenie25
Joined: 21 Mar 05
Posts: 139
Credit: 2436862
RAC: 0

RE: The deadline (actually

Message 70024 in response to message 70023

Quote:

The deadline (actually its "length") is a property of the workunit and thus inserted by the workunit generator at time of creating the workunit, nothing is known (and necessary to know) about the host they will later be assigned to. The workunit generator, however, knows about the "size" of a workunit that is reflected by the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any info about any host.

Would this concept of a variable deadline be desirable?

BM

I think so. I believe SETI uses the same system: there, too, the deadline varies with the size of the WU.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

RE: The deadline (actually

Message 70025 in response to message 70023

Quote:

The deadline (actually its "length") is a property of the workunit and thus inserted by the workunit generator at time of creating the workunit, nothing is known (and necessary to know) about the host they will later be assigned to. The workunit generator, however, knows about the "size" of a workunit that is reflected by the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any info about any host.

Would this concept of a variable deadline be desirable?

BM

Agreed, and when the scheduler decides whether or not to send a given WU to a host, it has the reported performance and other metrics for that host to use for that purpose.

However, keep in mind you would only 'need' to use variable deadlines if you intended to send the whole spectrum of template frequencies to all hosts so that the tightness factor was constant over that range.

If you were to stick with the 'slowhost/fasthost' method used in S5R1/I, then the main factor would be the trigger points for the ranges. Of course, the downside to that with a fixed two-week deadline is that you progressively raise the bar for who can participate as the work gets 'tougher'. IOW, slow hosts would only be able to run a smaller fraction of the work, all other things being equal.

So it seems to me that the choice of strategies boils down to how much the extra load from either method would impact the DB backend. Although I can't say for sure, I would think that just bumping the deadline by a week would have less effect than variable deadlines at a fixed tightness factor would, since my data indicates it could easily take a host at the low end of the speed spectrum a month to run the 'toughies'.

Thinking about it, since we're Beta right now anyway, wouldn't it be easier to test the 3 week deadline theory at this point than variable deadlines with regard to DB load?

Alinator

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 33451874
RAC: 13230

RE: RE: I guess if the

Message 70026 in response to message 70023

Quote:
Quote:
I guess if the project should go for variable deadlines, one would make the decision based on RAC instead of benchmark results? That should do the trick.

The deadline (actually its "length") is a property of the workunit and thus inserted by the workunit generator at time of creating the workunit, nothing is known (and necessary to know) about the host they will later be assigned to. The workunit generator, however, knows about the "size" of a workunit that is reflected by the number of credits that will finally be granted for it. A variable deadline would be derived from this "size" of the workunit, not from any info about any host.

Would this concept of a variable deadline be desirable?

BM

That seems to be the way that SETI does it.

BTW: I just got a SETI unit with a "To completion" time of a little more than 5 hours. Its deadline is 3 weeks away. The most current Einstein unit (with a 2 week deadline) on the same host will take about 50 hours. While I continue to think that longer deadlines are in order, I realize that BOINC "takes care" of the issue by making sure my deadlines are met and then, in the case of Einstein, putting off requests for new work until its debt is paid off.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

RE: BTW: I just got a SETI

Message 70027 in response to message 70026

Quote:

BTW: I just got a SETI unit with a "To completion" time of a little more than 5 hours. Its deadline is 3 weeks away. The most current Einstein unit (with a 2 week deadline) on the same host will take about 50 hours. While I continue to think that longer deadlines are in order, I realize that BOINC "takes care" of the issue by making sure my deadlines are met and then, in the case of Einstein, putting off requests for new work until its debt is paid off.

You've just observed what I'm talking about when I speak of 'tightness factor'.

FWIW, EAH has normally been a tighter project than SAH historically speaking.
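
A rough sketch of that comparison, using the two results quoted above. It assumes 'tightness factor' means estimated crunch time divided by the length of the deadline window; that definition is an assumption for illustration and is not spelled out in this thread, and the function name is hypothetical.

```python
# Assumption: tightness = estimated run time / time allowed by the deadline.
def tightness_factor(run_hours: float, deadline_days: float) -> float:
    """Fraction of the deadline window a result needs if crunched nonstop."""
    if deadline_days <= 0:
        raise ValueError("deadline_days must be positive")
    return run_hours / (deadline_days * 24.0)


# Figures from the quoted post: ~5 h with a 3 week deadline (SETI),
# ~50 h with a 2 week deadline (Einstein), both on the same host.
seti = tightness_factor(run_hours=5, deadline_days=21)
einstein = tightness_factor(run_hours=50, deadline_days=14)

print(f"SETI: {seti:.3f}  Einstein: {einstein:.3f}  ratio: {einstein / seti:.1f}x")
```

On those numbers the Einstein result uses a much larger share of its deadline window than the SETI one does, which is the sense in which EAH is the 'tighter' project here.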

Alinator

ohiomike
Joined: 4 Nov 06
Posts: 80
Credit: 6453639
RAC: 0

The other thing that SETI

The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WUs that don't scare people off.


Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

RE: The other thing that

Message 70029 in response to message 70028

Quote:
The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WUs that don't scare people off.

No offense meant, but issuing trailers by default is the last thing which should be considered, especially while we're in this beta phase and other possible scheduling issues have been observed.

My reasons:

1.) It means that any host not running a 5.10 client will end up wasting at least some, and up to most, of its time running scientifically useless results, and therefore wasting the participant's money spent on power.

2.) With a tight deadline project your host might find itself running a little late, but end up getting unconditionally aborted after having crunched most of the result, because the third result came in and validated. There are other twists to this scenario, and it applies to 5.5 CCs and up (IIRC).

3.) The amount of time a result stays pending has zero long term impact on any of your performance metrics, regardless of the reason for it.

4.) The large host cache scenario, where 221 functionality works to mitigate the wasted time issue for always connected fast hosts, is really intended for people who are not always connected (i.e. notebooks and DU participants). Issuing trailers by default just to placate instant gratification unduly penalizes them due to Items 1 and 2. One only needs to look at Dr. Anderson's comments regarding this to see how the 'head man' feels about it. The cache decoupling feature was only recently released, and the jury is still out regarding whether it's a good or bad thing in the context of being available to the whole spectrum of participants. My guess is 221 functionality was added as well in order to prevent wholesale deadline blowing in extreme cache, short CI scenarios when running multiple projects.

Alinator

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

RE: The other thing that

Message 70030 in response to message 70028

Quote:
The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WUs that don't scare people off.

But doesn't that take a backend update as well (which is needed here)? I don't run Seti and I haven't been paying very close attention to all of the server side stuff that's come up in the last few weeks. I've had enough trouble keeping up with client side stuff.

On the topic of the thread, I personally am not running into deadline issues. But variable deadlines do seem to fit the bill here.

Kathryn :o)

Einstein@Home Moderator

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118617384062
RAC: 18073593

RE: Doubling the deadline

Message 70031 in response to message 70018

Quote:
Doubling the deadline means basically doubling the size of our database, and it means that people have to wait for their results to be validated and thus credit granted potentially twice as long.

If I understand things correctly (and that's a big if) I don't think this statement is necessarily true.

I would think that many of the result pairs - perhaps even the majority - get fully completed in less than 10 days with a 14 day deadline. I base this on an observation of many of my own results over time. I ask this question: if the deadline had been 28 days instead of 14, would all those people who are taking 10 days or less suddenly start taking 20 days? I wouldn't have thought so. In fact, isn't it true to say that a simple increase in deadline would have no effect on those who are currently meeting the deadline unless they suddenly started running their machines fewer hours per day, or suddenly reduced the resource share that they were prepared to allocate to EAH, or suddenly did something silly like drastically increasing their cache size? My gut feeling is that whilst some may take some of these three actions, most wouldn't.

As far as waiting for validation and credits, I don't think there would be much change. The anecdotal evidence suggests that there is a significant drift of machines away from the project because the perception of the owners is that they can't abide the long crunch times and the strict deadlines. Many simply leave without completing what they have, which means that work has to be reissued. In other words, the results so affected are going to take a long time to validate anyway. A longer deadline would encourage many of those people to "stick it out", which may actually reduce the total time for validation on quite a few results. Someone sticking to the job for 20 days is going to be faster than two people successively failing a 14 day deadline.

Quote:
A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.

I think absolutely that this is the way to go. As the workunit generator knows the "size" and could calculate and insert an appropriate deadline, this course of action obeys the KISS principle. On your PDF graph you showed "size" in terms of crunch hours - 10, 20, 30, etc. It almost seems appropriate to change those into deadline days - 10 days, 20 days, 30 days. It wouldn't need to be a continuous function - you could put certain frequencies into "speed bins" and have a single deadline for each bin - whatever is easiest for the WU generator to do.
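
A minimal sketch of the "speed bin" idea, assuming 10-hour bins and reusing each bin's upper edge as the deadline in days (10 h -> 10 days, 20 h -> 20 days, and so on). The bin width, the hours-to-days mapping and the function name are illustrative assumptions; this is not how the real workunit generator works.

```python
import math

BIN_WIDTH_HOURS = 10  # assumed bin width, taken from the 10/20/30 example above


def binned_deadline_days(expected_crunch_hours: float) -> int:
    """Round the expected crunch time up to its bin edge; reuse that number as days."""
    if expected_crunch_hours <= 0:
        raise ValueError("expected_crunch_hours must be positive")
    bins = math.ceil(expected_crunch_hours / BIN_WIDTH_HOURS)
    return bins * BIN_WIDTH_HOURS


# Every workunit in the same bin gets the same deadline.
for hours in (7, 10, 23.5, 30):
    print(f"{hours} h -> {binned_deadline_days(hours)} day deadline")
```

A step function like this only needs one comparison per workunit, so it should be about as simple as a variable deadline can get.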

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7306358356
RAC: 2303919

RE: A deadline that depends

Message 70032 in response to message 70018

Quote:

A deadline that depends on the "size" (i.e. expected run-time, credit etc.) of the workunit would be an interesting idea. I'll discuss that with the team.

BM


One comment: I've observed an undesirable side-effect on the short end of the current variable SETI deadlines.

For users running more than one project, one with low resource share can very easily trip into EDF processing when a new result is downloaded with a low predicted runtime and thus a very early deadline. On my machines, this effect is annoying. I run some at 2% SETI share, but when SETI has a server hiccup, during recovery my machine overfetches (a long-known bug), and if among the overfetch are some short ones, I get into Earliest Deadline First. If I run a queue of more than trivial length, soon some Einstein units are in EDF. I won't go down the path of arguing whether anyone should care about this--the simple fact is that a fair number of people do.

I'd think most of us would like the behavior of the variable deadline with size approach so long as the low end did not dip below something like ten days.

On the long end, the biggest project risk I can see is that an unlucky WU which gets downloaded to a sequence of machines which quit or invalidate could take even longer to finally get resolved than now. So the tail at the end of the current campaign could take even longer and lead to even more massive multiple issuing in the end game.

On balance I think it a good idea. As a first guess I'd suggest the smallest units currently issued get ten days, and the largest currently issued get double the current deadline, with a linear scale between. No magic bullet this, but possibly a decent compromise among the considerations.
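
A minimal sketch of that first guess: ten days for the smallest unit, double the current 14 day deadline for the largest, and a straight line in between. The "size" inputs and the 10 to 50 hour range in the example are hypothetical placeholders; the real range would come from whatever the workunit generator knows.

```python
CURRENT_DEADLINE_DAYS = 14                    # fixed deadline discussed in this thread
MIN_DEADLINE_DAYS = 10                        # suggested floor for the smallest units
MAX_DEADLINE_DAYS = 2 * CURRENT_DEADLINE_DAYS # suggested ceiling for the largest units


def linear_deadline_days(size: float, smallest: float, largest: float) -> float:
    """Interpolate linearly between the floor and the ceiling by workunit size."""
    if largest <= smallest:
        raise ValueError("largest must be greater than smallest")
    # Clamp so out-of-range sizes never fall below the floor or above the ceiling.
    clamped = min(max(size, smallest), largest)
    fraction = (clamped - smallest) / (largest - smallest)
    return MIN_DEADLINE_DAYS + fraction * (MAX_DEADLINE_DAYS - MIN_DEADLINE_DAYS)


# Hypothetical size range of 10 to 50 (e.g. expected crunch hours).
for size in (10, 30, 50):
    print(f"size {size} -> {linear_deadline_days(size, 10, 50):.1f} days")
```

The clamp is what keeps the low end from dipping below ten days, which is the part that matters for avoiding EDF on short results.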

Winterknight
Joined: 4 Jun 05
Posts: 1491
Credit: 394302350
RAC: 544104

When you look at the basic

When you look at the basic assumptions for BOINC, the deadlines MUST be extended.
You cannot expect hosts to be on 24/7; most are probably only on during office hours, or at home outside sleeping hours, so don't expect more than 8 hrs/day as an absolute max.
You must assume the host is attached to more than one project, so divide time by 2 or 3.
The host probably runs Windows with the standard app and is, on average, two years old. Therefore crunch time per mid-range Einstein unit is 20+ hours.
BOINC is only expected to use spare cpu cycles.
And in reality a 14 day deadline is probably closer to 12 days of crunching, as each unit is probably downloaded a day before it starts and the scheduler tries to return it 24 hrs before the actual deadline.

From this I would guess the average attached computer could do one unit/cpu at most and probably at some point is in EDF for the Einstein units.
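
The arithmetic behind that guess can be checked with a quick sketch. It takes the figures above at face value (12 usable days, 8 hrs/day, time split across 2 or 3 projects, 20 hours per unit); the function and parameter names are made up for illustration.

```python
def units_per_deadline(crunch_days: float = 12,
                       hours_on_per_day: float = 8,
                       projects: int = 2,
                       hours_per_unit: float = 20) -> float:
    """Einstein units one CPU can finish inside one deadline window."""
    # Hours available to Einstein once host uptime is shared between projects.
    einstein_hours = crunch_days * hours_on_per_day / projects
    return einstein_hours / hours_per_unit


for n_projects in (2, 3):
    print(f"{n_projects} projects: {units_per_deadline(projects=n_projects):.1f} units per deadline")
```

Since 8 hrs/day is an absolute max and 20 hours is a lower bound on crunch time, these results are upper bounds; less uptime, spare-cycle-only use, or longer units push the figure down toward a single unit per deadline.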

Andy
