Task time estimates and sent tasks

grn
Joined: 6 Nov 14
Posts: 17
Credit: 30339583
RAC: 0

Gary,

First, I could not find any local settings to restrict work, but eventually found them under network usage, which I guess is logical in a way. I had also misread the site preference setting: I thought I'd restricted the workload to 2 days, but had entered that in the additional days field by mistake. The actual field for the number of days was blank when I finally corrected my mistake, so I assume it was defaulting to 10 days. Settings have now been changed in both locations.

However, the case still exists that regardless of this setting I am getting sets of tasks that do not fit into the fixed deadline period. I have no idea how it determines the amount of tasks other than basing it on the estimated run time. However, I have found why I have presented figures that you have disputed. I've noticed that the time estimates for waiting tasks change whenever a running task completes. For example, the GPU tasks started with an estimated run time of 15:49:31 but after the first task completed the estimated run time had reduced for waiting tasks to over 10 hours and it now stands at 9:25:34 hours. I had not noticed this behaviour before.

Is this the convergence you mentioned? I note it is also happening with the gamma-ray pulsar search tasks. This may explain why I thought they were significantly overrunning - I was comparing the estimates for waiting tasks with the current elapsed time for running tasks nearing completion, and finding them already well over the estimated times for the waiting tasks.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118617837392
RAC: 18064895

Quote:
... regardless of this setting I am getting sets of tasks that do not fit into the fixed deadline period.


I've no idea why you think this because it's really not true. Remember, the deadline is 14 days. I presume we are talking about your machine with the GTX670 - it's the one we've been discussing all along. If you click this link you can see all the 'in progress' GPU tasks you currently have (19). At 2.7hrs per task, this represents a little over 2 days work. You are not getting new GPU tasks - there haven't been any new ones since Dec 1st.

Next, if you click this link, you will see the 'in progress' CPU tasks. There are currently 21 of them, 7 of which were downloaded on Dec 1st and 14 on Dec 5th. You were almost out of CPU work so it's no surprise that you got a new batch. If you check your validated CPU tasks (currently 56 of them), you can see that they take about 15 hours each. You have a total of 8 threads available for crunching, 7 if you have freed up one of them for GPU support. Even if only 2 of those 7 were used to crunch Einstein, you would be able to clear the 21 tasks in around 7 days - so once again, you don't have a deadline problem. If the downloading of the new batch of 14 tasks on Dec 5th caused you concern, why don't you just reduce your work cache even further, just as archae86 suggested in his initial response to you? If you use those values, you'll hardly have any work at all.
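
If you want to check the arithmetic for yourself, here is a rough sketch in Python. The function name and the idea of counting whole "rounds" of tasks are just my own illustration, not anything BOINC does, and it assumes GPU tasks run one at a time and that every task takes the typical time quoted above.

```python
import math

def days_to_clear(tasks, hours_per_task, concurrent=1):
    """Days needed to finish `tasks` when `concurrent` of them run at once.

    Hypothetical helper for illustration only; assumes every task takes
    the same time and that tasks are processed in whole rounds.
    """
    rounds = math.ceil(tasks / concurrent)
    return rounds * hours_per_task / 24

DEADLINE_DAYS = 14  # the deadline mentioned above

gpu_days = days_to_clear(19, 2.7)    # 19 GPU tasks, assumed one at a time
cpu_days = days_to_clear(21, 15, 2)  # 21 CPU tasks on only 2 threads

print(f"GPU: {gpu_days:.1f} days, CPU: {cpu_days:.1f} days, "
      f"deadline: {DEADLINE_DAYS} days")
```

Both figures come out well inside the 14 day deadline, even with the pessimistic assumption of only 2 CPU threads.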

Quote:
I have no idea how it determines the amount of tasks other than basing it on the estimated run time.


As you would expect, that's the only yardstick available to BOINC so that's exactly how it calculates what to download. When you first join a project, the estimates can be quite ridiculous for a variety of complicated reasons, and this can persist for quite a while. So, you just have to set a small cache size and relax whilst BOINC sorts it out - which it will eventually do.

Quote:
However, I have found why I have presented figures that you have disputed. I've noticed that the time estimates for waiting tasks change whenever a running task completes. For example, the GPU tasks started with an estimated run time of 15:49:31 but after the first task completed the estimated run time had reduced for waiting tasks to over 10 hours and it now stands at 9:25:34 hours. I had not noticed this behaviour before.


Yes, I could see that this was troubling you which was why I tried to get you to forget about estimates and just think about the actual crunch times that were being recorded on the website. In desperation, I tried to explain how DCF worked to correct estimates and how this could be sabotaged by all the different science runs having to use the one, per project, DCF.

Let's take a very simple example. Assume BRP5 tasks are estimated at 20 hours but they really take 3 hours. A BRP5 task completes and BOINC sees the 17 hour discrepancy. BOINC uses 10% of that discrepancy (1.7 hours) to correct the estimate. It does this by adjusting the value of the DCF (Duration Correction Factor). So the new estimate (applied to all further BRP5 tasks) would be 18.3 hours. If you were only doing BRP5 and they all took 3 hours, you can see that the continuing '10% of the difference' method would eventually have the estimate converging on 3 hours. Let's say that after this a 'rogue' task comes along which takes 25 hours to crunch, possibly as a result of some hardware issue - perhaps some sort of GPU starvation. Whilst BOINC reduces estimates using the 10% formula, it panics when the crunch time increases a lot like this and immediately sets the estimate to the full inflated amount of 25 hours and we would be back to worse than square one. This is why you really need to sort out why very occasionally your GPU throws in a 15.5 hour result when the correct time is 2.7 hours.
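
For anyone who prefers to see this as code, here is a minimal Python sketch of the simplified rule described in this example. It is not BOINC's actual source: the function names are made up for illustration, and I've assumed that any completed task that runs longer than its estimate triggers the immediate jump upwards, whereas the real client only does so for a large overrun.

```python
def update_estimate(estimate, actual, down_fraction=0.10):
    """One update of the simplified rule, all times in hours.

    Assumption: any overrun raises the estimate straight to the actual
    time; otherwise only 10% of the discrepancy is corrected.
    """
    if actual > estimate:
        return actual
    return estimate - down_fraction * (estimate - actual)

def simulate(estimate, runtimes):
    """Return the estimate after each completed task in `runtimes`."""
    history = []
    for actual in runtimes:
        estimate = update_estimate(estimate, actual)
        history.append(estimate)
    return history

# Start at 20 hours, complete thirty 3 hour tasks, then one 25 hour rogue.
history = simulate(20.0, [3.0] * 30 + [25.0])
print(f"after 1 task:   {history[0]:.1f} h")
print(f"after 30 tasks: {history[29]:.1f} h")
print(f"after rogue:    {history[30]:.1f} h")
```

The slow drift down and the instant jump up are deliberately lopsided: under this rule a single bad result undoes the correction built up over many good ones.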

The above is a very simple scenario, easy to understand. The complication is that there is only one DCF per project and tasks for each different science run will cause BOINC to fiddle with the DCF. Let's imagine that after the first BRP5 task above, when the new estimate was 18.3 hours, a FGRP4 task was completed. Imagine it had an estimate of 5 hours but it actually took 15 hours. BOINC would immediately bump up the value of the DCF so that the new estimate for all further FGRP4 tasks would become 15 hours. Because there is only one DCF, the collateral damage would be that the BRP5 tasks estimated to take 18.3 hours would also be dramatically affected (through no fault of their own) by a factor of three. The new estimates for BRP5 would become a ridiculous 54.9 hours.
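
The collateral damage can be sketched the same way. Again, this is only an illustration of the simplified picture described here, with made-up names. It assumes that an overrun multiplies the single project-wide DCF by actual time / estimated time, so every search's estimate is scaled by the same factor.

```python
def rescale_after_overrun(estimates, finished, actual_hours):
    """Return new per-search estimates after `finished` overruns.

    `estimates` maps a search name to its current estimate in hours.
    Assumption: one shared DCF, so the factor that fixes the finished
    search's estimate is applied to every search alike.
    """
    factor = actual_hours / estimates[finished]
    if factor <= 1.0:
        return dict(estimates)  # no overrun, so this sketch changes nothing
    return {name: hours * factor for name, hours in estimates.items()}

current = {"BRP5": 18.3, "FGRP4": 5.0}
updated = rescale_after_overrun(current, "FGRP4", 15.0)

for name, hours in updated.items():
    print(f"{name}: {hours:.1f} h")
```

FGRP4 lands on the correct 15 hours, while BRP5 is dragged from 18.3 to 54.9 hours even though nothing about BRP5 changed.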

Stupid things like this can happen when you first add Einstein as a new project to your existing mix, particularly if you enable all the searches. If you keep your hardware clean, cool and healthy, and if you keep your work cache relatively small, you are unlikely to be adversely affected and BOINC will eventually get to a satisfactory working state where estimates (although still oscillating up and down a bit) will bear a reasonable resemblance to reality.

Quote:
Is this the convergence you mentioned? I note it is also happening with the gamma-ray pulsar search tasks. This may explain why I thought they were significantly overrunning - I was comparing the estimates for waiting tasks with the current elapsed time for running tasks nearing completion, and finding them already well over the estimated times for the waiting tasks.


Yep! :-). Sounds like you are beginning to get a handle on things :-).

Cheers,
Gary.
