The case:
The WU was about 23% complete after 2 hours of calculation. Then the PC rebooted unexpectedly because of a buggy program (not BOINC).
After the reboot, the WU restarted from 0% complete, but with the 2 hours of calculation still counted. The "To Completion" column shows the wrong time (starting at ~380 hrs and counting down).
I think the more important problem is why the WU calculation restarted from zero.
(Computer: Pentium 4, WinXP SP2)
Regards,
Griffin
Copyright © 2024 Einstein@Home. All rights reserved.
"To Completion" time calculated incorrectly
If the checkpoint files (intermediate files) get corrupted in the crash, I bet BOINC detects it and starts all over again,
but the time stays stuck (bug?) where it left off.
I've seen it on my own rig (here), and I got credit for one WU I saved to see what happened.
(Of course, I didn't get the 214 claimed credit for it... ;)
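The idea that BOINC "detects" a corrupted checkpoint and starts over can be illustrated with a minimal sketch. This assumes a hypothetical checkpoint format (a SHA-256 digest line followed by a JSON payload); the function names and the `fraction_done` field are illustrative only and are not taken from BOINC or the Einstein@Home application.

```python
import hashlib
import json
from typing import Optional


def encode_checkpoint(state: dict) -> bytes:
    """Serialize state and prefix it with a SHA-256 digest (hypothetical format)."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    return digest + b"\n" + payload


def decode_checkpoint(blob: bytes) -> Optional[dict]:
    """Return the saved state, or None if the file is truncated or corrupted."""
    digest, sep, payload = blob.partition(b"\n")
    if not sep or hashlib.sha256(payload).hexdigest().encode("ascii") != digest:
        return None
    try:
        return json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None


def starting_state(blob: Optional[bytes]) -> dict:
    """Resume from a valid checkpoint, otherwise start the WU from zero."""
    state = decode_checkpoint(blob) if blob is not None else None
    return state if state is not None else {"fraction_done": 0.0}


good = encode_checkpoint({"fraction_done": 0.23})
print(starting_state(good))        # resumes at 23%
print(starting_state(good[:-3]))   # truncated by the crash: back to 0
```

Note that in this sketch only the science progress is reset; nothing here touches the CPU time the client has already recorded, which matches the "time stays stuck" symptom described above.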
I would suspect that the
I would suspect that the elapsed time for completing a WU is based on the start time on the specific computer (a snapshot of the CPU clock), which is then used to calculate both the completion percentage and the total time. If a WU is interrupted (as described in the other threads here) and restarted at 0.00%, then unless a new snapshot is taken, the estimated time to completion and the total time will be completely wrong. From what you describe, no new snapshot is taken.
If I've lived this long - I gotta be that old!
RE: I would suspect that
I was thinking the same, so I ran a new CPU benchmark, but nothing changed. Now I think the calculation goes something like this:
TimeToCompletion = (TimeComputing/PercentCompleted)*PercentLeft
This formula also explains why the wrong time keeps decreasing toward the right time: the 2 hours of computing time survived the restart while the percentage started again from zero.
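A minimal sketch of the linear extrapolation discussed here, written so that the time per unit of progress (computing time divided by fraction completed) is multiplied by the work left. The restart scenario is an assumption built from the first post: the client still counts the 7200 s from the lost run while progress grows from zero again at the same constant rate. Function names are hypothetical.

```python
from typing import List


def time_to_completion(cpu_time_s: float, fraction_done: float) -> float:
    """Remaining CPU seconds, extrapolated linearly from progress so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    # Seconds per unit of progress, times the fraction still to do.
    return cpu_time_s / fraction_done * (1.0 - fraction_done)


def estimates_after_restart(stale_cpu_s: float, rate_per_s: float,
                            extra_cpu_s: List[float]) -> List[float]:
    """Estimates when CPU time survived the restart but progress reset to 0.

    Hypothetical scenario: stale_cpu_s seconds belong to the lost run, and
    fraction_done grows from zero again at rate_per_s per CPU second.
    """
    return [time_to_completion(stale_cpu_s + t, rate_per_s * t) for t in extra_cpu_s]


# ~23% done after 2 hours (7200 s), as in the first post; constant rate assumed.
RATE = 0.23 / 7200
checkpoints = [600, 1800, 3600, 7200]
for t, est in zip(checkpoints, estimates_after_restart(7200, RATE, checkpoints)):
    print(f"{t:5d} s after restart: {est / 3600:6.1f} h left (estimate)")
print(f"estimate at 23% without stale time: {time_to_completion(7200, 0.23) / 3600:.1f} h")
```

The inflated estimate shrinks as real progress accumulates, because the stale 2 hours become a smaller share of the total computing time.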
RE: RE: I would suspect
The initial estimate, before the workunit starts running, is based on the benchmarks and estimated flops provided by the project. After the workunit starts then the formula above is used.
This will change slightly in the 5.xx versions:
Initial est = estimated flops / benchmark flops per second * project correction factor
TimeToCompletion = weighted average((initial est / percent complete), ((TimeComputing/PercentCompleted)*PercentLeft))
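A rough sketch of the 5.xx-style estimate described above, under loudly stated assumptions. The static term is written in the post as "initial est / percent complete"; this sketch instead assumes it is the not-yet-run share of the initial estimate, and it assumes the weight on the dynamic term equals the fraction done. Neither choice is taken from BOINC's source, and the workunit numbers are hypothetical.

```python
def initial_estimate(fpops_est: float, benchmark_flops_per_s: float,
                     correction_factor: float = 1.0) -> float:
    """Static estimate in CPU seconds: project flop count / host speed, scaled."""
    return fpops_est / benchmark_flops_per_s * correction_factor


def blended_time_to_completion(initial_est_s: float, cpu_time_s: float,
                               fraction_done: float) -> float:
    """Weighted average of a static and a dynamic remaining-time estimate.

    Assumptions: static term = unrun share of the initial estimate;
    weight on the dynamic term = fraction_done.
    """
    if fraction_done >= 1.0:
        return 0.0
    left = 1.0 - fraction_done
    static_left = initial_est_s * left
    if fraction_done <= 0.0:
        return static_left  # nothing reported yet: use the static estimate only
    dynamic_left = cpu_time_s / fraction_done * left
    weight = fraction_done
    return weight * dynamic_left + (1.0 - weight) * static_left


# Hypothetical WU: 3.6e13 flops on a host benchmarked at 1e9 flops/s.
est0 = initial_estimate(3.6e13, 1e9)  # 36000 s = 10 h
# Restart-like situation: 7260 s counted, but only 0.2% progress reported.
print(blended_time_to_completion(est0, 7260, 0.002) / 3600)
```

With this weighting, the early, unreliable dynamic term is damped by the static estimate, so a restart that resets progress produces a far smaller spike than the pure extrapolation would.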
Hopefully the 5.xx code will improve the initial estimates after a few workunits. The correction factor starts at 1.0 and increases quickly if a workunit takes longer to crunch than estimated; it decreases more slowly, so that a single bad workunit doesn't reduce it prematurely.
Using the weighted average for the time to completion should reduce the impact of work that processes quickly at the beginning and then more slowly at the end.
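The asymmetric behaviour of the correction factor (rise quickly, fall slowly) can be sketched as follows. The rule and the smoothing constant of 0.1 are hypothetical choices made for illustration; BOINC's actual update rule and constants may differ.

```python
def update_correction_factor(current: float, raw_estimate_s: float,
                             actual_s: float, smoothing: float = 0.1) -> float:
    """Fast-up, slow-down update of a duration correction factor.

    Hypothetical rule: if a WU ran longer than the factor predicts, adopt the
    observed ratio at once; if it ran shorter, move only `smoothing` of the
    way toward it, so one unusually quick WU cannot drag the factor down.
    """
    observed = actual_s / raw_estimate_s
    if observed > current:
        return observed
    return current + smoothing * (observed - current)


factor = 1.0  # starting value mentioned in the post
factor = update_correction_factor(factor, 36000, 72000)  # took twice as long
factor = update_correction_factor(factor, 36000, 3600)   # one suspiciously fast WU
print(round(factor, 2))  # 1.81
```

One slow workunit doubles the factor immediately, while one bad, very fast workunit only nudges it down.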
BOINC WIKI
BOINCing since 2002/12/8
RE: The initial estimate,
John, thanks for the information, but this doesn't address the original problem: the restart/pickup of a workunit after an unplanned computer restart. I guess what I'm trying to say is that there should be some provision in the code to take a new snapshot of the CPU clock if processing of the workunit is halted or aborted and restarted at time zero...
RE: John -- Thanks for the
It is not based on a snapshot of the CPU clock. The project code reports the CPU time spent and the % complete. So when a restart occurs, the project code must run to the next place where it would report CPU time and % complete for this information to be updated. Also if the project has not checkpointed, then the code will really start at 0 again.
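A minimal sketch of why a crash sends the science code back to its last checkpoint, or to 0 if none was written. It assumes a hypothetical application that checkpoints every fixed number of CPU seconds and progresses at a constant rate; the class and function names are illustrative and not part of the BOINC API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    fraction_done: float  # progress the science app saved
    cpu_time_s: float     # CPU time at which it was saved


def last_checkpoint(rate_per_s: float, crash_at_s: float,
                    checkpoint_every_s: Optional[float]) -> Optional[Checkpoint]:
    """Newest checkpoint written before a crash, or None if none was written.

    Hypothetical model: the app checkpoints every checkpoint_every_s CPU
    seconds and progresses at a constant rate_per_s.
    """
    if checkpoint_every_s is None or checkpoint_every_s > crash_at_s:
        return None
    saved_at = (crash_at_s // checkpoint_every_s) * checkpoint_every_s
    return Checkpoint(min(1.0, rate_per_s * saved_at), saved_at)


def resume(cp: Optional[Checkpoint]) -> Checkpoint:
    """Where the science code really restarts after the reboot."""
    return cp if cp is not None else Checkpoint(0.0, 0.0)


rate = 0.23 / 7200  # ~23% in 2 hours, as in the first post
print(resume(last_checkpoint(rate, 7000, 600)))   # work since 6600 s is lost
print(resume(last_checkpoint(rate, 7000, None)))  # never checkpointed: back to 0
```

Even after resuming, the displayed CPU time and percentage only change once the application reaches its next reporting point, as described in the post.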