Einstein Tasks' Time "To Completion" Incorrectly Going Up Instead of Down

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118653263858
RAC: 18921953

RE: I have noticed the same

Message 97835 in response to message 97833

Quote:
I have noticed the same "increasing time remaining" on my PIII 1.4 Netservers (W2K SP4 OS). So what is the solution? Just leave it all alone and monitor whether it completes by the deadline? Abort and hope for a WU that counts down in the correct direction? I thought maybe I had caused the problem by setting my server clocks after I had started E@H.


Thanks for taking the trouble to do a search on your problem and for finding this old thread, which probably covers exactly the same behaviour you are observing. I quickly skimmed the old posts and I don't think there was an entirely satisfactory answer then; nor do I think Richard's answer covers exactly what you are reporting. Richard is quite correct in what he says (as always), but I think you are referring to something different.

I used to run a *lot* of Tualatin PIIIs - they were quite a favourite of mine, and at one point I had about a dozen dual-CPU 1400s running. It must be costing you (or someone) a fortune in electricity now though :-). I think I know exactly what you are seeing, so I'll try to explain it by reference to the list of tasks for one host.

In that list there are examples of the three currently available CPU app tasks. The GW (Gravitational wave) tasks are taking around 80Ksecs, the FGRP (Fermi Gamma Ray Pulsar) tasks around 120Ksecs and the BRP (Binary Radio Pulsar) tasks around 260Ksecs.

Each task that is sent out has a time estimate attached to it. The estimates are hopefully reasonable but they have to be a compromise because of the high variability of all the different platforms that are out there. For more modern platforms, the estimates are not too bad - usually. For older platforms like yours - SSE only - there are usually problems with inaccurate estimates.

The real cause of the problem is that the mechanism for 'correcting' estimates over time (built into the BOINC client) works on a 'per project' basis and not a 'per app' basis. If the estimate for a particular app is too low (i.e. the task actually takes longer), BOINC will compensate by raising the DCF (duration correction factor) to improve the estimate. If a sizeable correction is needed, BOINC is likely to do this in one big hit.

If the estimate for a different app is too high (i.e. the task actually takes less time), BOINC will (over time) lower the DCF, but never in one big hit. So imagine what would happen if you had been doing a series of tasks (GW and FGRP) where BOINC had settled on a suitable DCF. Then along comes a BRP task which actually takes way longer than the estimate (probably mostly because you only have SSE). That task, while running, would have too short an estimate. The remaining time would continue to blow out until somewhere towards the end, when it would turn around and start going down to zero.

When the task finally finished, the DCF would be corrected upwards in one big hit, so that all future tasks for all apps would have much higher estimates until a series of 'good' tasks progressively lowered the DCF once again. If you repeat this scenario, you get the DCF see-sawing - quickly up and slowly down - with not much you can do about it short of manual intervention.
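To make the see-saw concrete, here's a rough sketch of an asymmetric DCF update of the kind described above. This is not BOINC's actual code - the constants, names and runtimes are just illustrative assumptions - but it captures the quick-up, slow-down behaviour:

```cpp
#include <cstdio>

// Illustrative sketch of a single per-project duration correction
// factor (DCF). The 0.9/0.1 smoothing constants are assumptions.
struct Project {
    double dcf = 1.0;

    // Called when a task finishes. 'estimated' is the uncorrected
    // runtime estimate, 'actual' is the measured runtime.
    void update_dcf(double estimated, double actual) {
        double ratio = actual / estimated;
        if (ratio > dcf) {
            // Underestimate: correct upwards in one big hit.
            dcf = ratio;
        } else {
            // Overestimate: ease downwards slowly, never in one hit.
            dcf = 0.9 * dcf + 0.1 * ratio;
        }
    }
};

int main() {
    Project p;
    // A run of GW/FGRP tasks that match their estimates...
    for (int i = 0; i < 5; i++) p.update_dcf(80000, 80000);
    printf("after well-estimated tasks: dcf = %.2f\n", p.dcf); // ~1.00
    // ...then one BRP task that takes ~3x its estimate:
    p.update_dcf(90000, 270000);
    printf("after one slow BRP task:    dcf = %.2f\n", p.dcf); // ~3.00
    // Estimates for *all* apps are now inflated until many 'good'
    // tasks slowly pull the DCF back down: the see-saw.
    for (int i = 0; i < 5; i++) p.update_dcf(80000, 80000);
    printf("five good tasks later:      dcf = %.2f\n", p.dcf); // ~2.18
    return 0;
}
```

While such a task is actually running, the client (as I understand it) also blends the static estimate with one based on measured progress, trusting the latter more as the task nears the end - which is why the remaining time blows out first and only turns around late in the run.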

There's no real problem with this - it's not a 'bug' in the normal sense, but rather expected behaviour due to the current limitations in BOINC. The only way to 'solve' this would be to extend the functionality of BOINC to be able to make these estimate corrections on a 'per app' basis. I don't have time to keep up with all that is going on in BOINC development but I reckon this functionality would be on somebody's 'to do' list.

If you keep your cache size reasonable (as you seem to have) there will be no issue for you. The estimates will fluctuate, but who cares! The one thing I would recommend that you do is to change your preferences so as NOT to receive BRP CPU tasks since these are not being crunched efficiently on your platform. GW and FGRP tasks seem to be about right so why not stick with those two only?

Of course you could consider shutting all those old clunkers down and using the electricity savings to build a single 'Sandy Bridge' host (2500K CPU) with something like a GTX560 GPU. That way you could support all three apps. It would only take a relatively small number of months (probably less than 12) to pay for the upgrade :-). And your 'rate of production' would skyrocket :-).

Cheers,
Gary.

JDBurch
Joined: 2 Sep 05
Posts: 190
Credit: 1584266
RAC: 0

Thanks to both of you for the

Message 97836 in response to message 97835

Thanks to both of you for the information. It seems to be a combination of things (besides running dinosaurs). I reduced my 'additional work' cache setting from 3 days to 0.1 and aborted many tasks that obviously were not going to make it. I understand there is no empathy in programming: one second late and no credit. I added more RAM to all the machines. And after reading your post I believe changing the E@H preferences to not accept the BRP tasks (the ones written for GPU processing) might help also. After just observing for several days I now understand what Richard was talking about with the 2% increments.
BTW, you are right about the power bill. I have had these 13 lp1000r Netservers, cords, rack, KVM, and switches lying around for years and always wanted to build my own server rack. It costs about $150 a month to run them. I will post the pic of my green-eyed creature and then start planning my multi-core behemoth. You can see its black 4U case in the rack above the 1U Netservers.
Thank you again for the input.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

I notice that I have WU

I notice that I have WUs showing 100% completed in Tasks, but there they stay while the time remaining is blank and the time elapsed continues to increase.

This is for S6GC WUs only.

After between 6 and 10 minutes the WU is uploaded OK.

I'm curious as to why these WUs overrun this way, even though the graphics show as 100% completed.

Regards,

Cliff,

Been there, Done that, Still no damm T Shirt.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118653263858
RAC: 18921953

RE: I'm curious as to why

Message 97838 in response to message 97837

Quote:
I'm curious as to why these WUs overrun this way, even though the graphics show as 100% completed.


This is normal, since the % done refers to crunching of the data; after that is finished, there is some post-processing of the results before the fully finished product is uploaded. My experience has been that this only takes 2 or 3 minutes and is therefore not really all that obvious to the casual observer.

If you want the full technical details, please read the technical news thread where the S6Bucket run was announced. In particular, read this message in that thread and look for Bernd's description of the "LineVeto" calculations.
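In case it helps to see the shape of it, here's a minimal sketch of the pattern using the standard BOINC API calls (boinc_fraction_done() and boinc_finish() are the real API; the crunching and post-processing routines are hypothetical stand-ins):

```cpp
#include "boinc_api.h"

// Hypothetical stand-ins for the app's real work:
static void crunch_chunk(int /*i*/) { /* search the data */ }
static void postprocess_results() { /* e.g. the "LineVeto" stage */ }

int main() {
    boinc_init();
    const int n_chunks = 1000;
    for (int i = 0; i < n_chunks; i++) {
        crunch_chunk(i);
        // Progress is reported against the crunching loop only,
        // so it reaches 1.0 as soon as the last chunk is done.
        boinc_fraction_done((double)(i + 1) / n_chunks);
    }
    // The manager now shows 100%, but the task is still running:
    // this stage does real work yet reports no further progress.
    postprocess_results();
    boinc_finish(0);  // only now does the task finish and upload
    return 0;
}
```

Nothing obliges the app to report progress during that final stage, so the manager sits at 100% until boinc_finish() is called and the upload happens.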

Cheers,
Gary.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi, Thanks for the

Message 97839 in response to message 97838

Hi,
Thanks for the info.

However, on my computer the overrun is on the order of 6 to 10 minutes.

So it is noticeable, particularly when one has just started a project for the first time and is therefore keeping an 'eye' on progress.

Regards.

Cliff,

Been there, Done that, Still no damm T Shirt.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118653263858
RAC: 18921953

RE: So it is noticeable

Message 97840 in response to message 97839

Quote:
So it is noticeable, particularly when one has just started a project for the first time and is therefore keeping an 'eye' on progress.


Yes, I must apologise for being too dumb to see that it was your first post and that you had only recently joined. I do welcome you here and I hope you find this an interesting project to participate in. My memory of the 2-3 mins might be faulty - it's a while since I first saw this and I don't tend to watch tasks crunching closely these days. The remark about "not all that obvious" was about how I see things myself and was not intended to give any offence, particularly to a newcomer.

This particular science run (S6Bucket) has been running since around May last year and is due to run out of primary work very shortly. You are soon to experience the transition to a new run for which the app is still under test. When the new app is released, there will be a period of weeks to months where the remnants of the old run will be cleaned up. Most hosts will transition automatically to the new app quite quickly. As cleanup tasks become needed, they will be sent to hosts having the appropriate data files so you will probably see a mixture of new and old for quite some time after the server status page declares the old run to be 100% complete. That figure applies to the creation of primary tasks and says nothing about when all tasks have eventually been returned and all WU quorums are fully completed.

Cheers,
Gary.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Gary, No offense

Message 97841 in response to message 97840

Hi Gary,
No offense taken :-). I simply wanted to highlight that my experience was different to yours.

Well, I hope that any transition is smooth and that the app doesn't fall over at the first sniff of a WU on my rig.

That's happened in the recent past with some BOINC-orientated software. I find that Murphy's 3rd law applies in spades in my neck of the woods :-)

So I have a tendency to listen for the drop of the other 'shoe' when something changes or appears out of the ordinary.

Thanks for your assistance.

Regards,
Cliff

Cliff,

Been there, Done that, Still no damm T Shirt.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118653263858
RAC: 18921953

RE: Well I hope that any

Message 97842 in response to message 97841

Quote:
Well, I hope that any transition is smooth and that the app doesn't fall over at the first sniff of a WU on my rig.


Since the project opened its doors just over 7 years ago, there have been many of these 'transitions' and, from a volunteer perspective, they have been well managed by the Devs and largely painless. I guess the Devs themselves would probably have a different opinion about the level of pain - if not the degree of good management :-). A few different approaches have been tried in an attempt to ensure a smooth transition. The latest is the Albert@Home test project. The next two apps to be released are being tested there. At least we know the apps will actually run a task to completion under some circumstances :-).

Cheers,
Gary.
