"To Completion" time wrong calculated

Duncan Griffin
Joined: 7 Jun 05
Posts: 12
Credit: 62360
RAC: 0
Topic 189483

The case:
The WU was ~23% complete after 2 hours of calculation. The PC then rebooted unexpectedly because of a buggy program (not BOINC).

After the restart, the WU started again from 0% complete, but with the 2 hours of calculation still counted against it. The "To Completion" column now displays the wrong time (beginning at ~380 hrs and counting down).

I think the more important problem is why the WU calculation was restarted from zero.

(Computer: Pentium 4, WinXP SP2)

Regards,
Griffin

Sharky T
Joined: 19 Feb 05
Posts: 159
Credit: 1187722
RAC: 0

"To Completion" time wrong calculated

If the checkpoint files (intermediate files) get corrupted in the crash, I bet BOINC detects it and starts all over again,
but the time gets stuck (bug?) where it left off.
I've seen it on my own rig (here), and I got credit for 1 WU I saved to see what happened.
(Of course, I didn't get the 214 claimed credit for it... ;)


Ocean Archer
Joined: 18 Jan 05
Posts: 92
Credit: 368644
RAC: 0


I would suspect that the elapsed time for completion of a WU is based on the start time on the specific computer (snapshot of the CPU clock) and then uses that for the calculation both for completion percentage and total time. If a WU is interrupted (as described in the other threads here) and restarted at 0.00%, unless a new snapshot is taken, the estimated time to complete and the total time will be totally incorrect. From what you describe, no new snapshot is taken.


If I've lived this long - I gotta be that old!

Duncan Griffin
Joined: 7 Jun 05
Posts: 12
Credit: 62360
RAC: 0


Message 13794 in response to message 13793

Quote:
I would suspect that the elapsed time for completion of a WU is based on the start time on the specific computer (snapshot of the CPU clock) and then uses that for the calculation both for completion percentage and total time. If a WU is interrupted (as described in the other threads here) and restarted at 0.00%, unless a new snapshot is taken, the estimated time to complete and the total time will be totally incorrect. From what you describe, no new snapshot is taken.

I was thinking the same, so I ran a new CPU benchmark, but nothing changed. Now I think the calculation goes something like this:

TimeToCompletion = (TimeComputing/PercentCompleted)*PercentLeft

This formula also explains why the wrong time keeps decreasing toward the right time.
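To make the guess above concrete, here is a minimal sketch of that kind of linear extrapolation. It assumes the ratio is read as time per percent (TimeComputing divided by PercentCompleted), since that is the reading that yields a time. The function name and the post-restart figure of 0.52% are hypothetical and chosen only for illustration. This is not BOINC's code.

```python
def time_to_completion(hours_computing, percent_completed):
    """Linear extrapolation: hours spent per percent so far, times percent left."""
    if percent_completed <= 0:
        raise ValueError("no progress reported yet, cannot extrapolate")
    percent_left = 100.0 - percent_completed
    return hours_computing / percent_completed * percent_left


# Before the crash: 2 hours of computing, ~23% done.
before = time_to_completion(2.0, 23.0)

# After the restart (hypothetical progress value): the 2 hours are still
# counted, but progress has started over and is only at 0.52%.
after = time_to_completion(2.0, 0.52)

print(f"before crash: {before:.1f} h, after restart: {after:.1f} h")
```

With these numbers the estimate jumps from under 7 hours to about 380 hours, then shrinks as the percentage climbs again, which matches the countdown described in the first post.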

Keck_Komputers
Joined: 18 Jan 05
Posts: 376
Credit: 5744955
RAC: 0


Message 13795 in response to message 13794

Quote:
Quote:
I would suspect that the elapsed time for completion of a WU is based on the start time on the specific computer (snapshot of the CPU clock) and then uses that for the calculation both for completion percentage and total time. If a WU is interrupted (as described in the other threads here) and restarted at 0.00%, unless a new snapshot is taken, the estimated time to complete and the total time will be totally incorrect. From what you describe, no new snapshot is taken.

I was thinking the same, so I ran a new CPU benchmark, but nothing changed. Now I think the calculation goes something like this:

TimeToCompletion = (TimeComputing/PercentCompleted)*PercentLeft

This formula also explains why the wrong time keeps decreasing toward the right time.


The initial estimate, before the workunit starts running, is based on the benchmarks and the estimated flops provided by the project. Once the workunit starts, the formula above is used.

This will change slightly in the 5.xx versions:
Initial est = (estimated flops / benchmark) * project correction factor
TimeToCompletion = weighted average((initial est / percent complete), ((TimeComputing/PercentCompleted)*PercentLeft))

Hopefully the version 5.xx code will improve the initial estimates after a few workunits. The correction factor starts out at 1.0 and increases quickly if a workunit takes longer to crunch than estimated. It decreases more slowly, so that one unusually fast workunit doesn't reduce it prematurely.
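As a rough illustration of "up quickly, down slowly", here is a small sketch. The function name, the immediate jump on an overrun, and the 0.1 smoothing weight are all assumptions made for the example. They are not taken from the BOINC source.

```python
def update_correction_factor(factor, estimated_hours, actual_hours, decay=0.1):
    """Raise the factor at once after an overrun; lower it gradually otherwise.

    estimated_hours is the raw (uncorrected) estimate. decay is an assumed
    smoothing weight, not a value taken from BOINC.
    """
    ratio = actual_hours / estimated_hours
    if ratio > factor:
        # The workunit took longer than the corrected estimate: jump up.
        return ratio
    # The workunit was faster: move only a small step toward the new ratio.
    return (1 - decay) * factor + decay * ratio


factor = 1.0  # starting value mentioned in the post
factor = update_correction_factor(factor, 10.0, 20.0)       # overrun
after_fast_wu = update_correction_factor(factor, 10.0, 5.0)  # one quick workunit

print(factor, after_fast_wu)
```

In this sketch a single slow workunit doubles the factor, while a single fast one only nudges it down, so one odd result cannot drag the estimates too low.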

Using the weighted average for the time to completion should reduce the impact of work that processes quickly at the beginning and then more slowly at the end.
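A sketch of what such a weighted average could look like follows. The weights used here (trusting the initial estimate early and the measured progress later) and the scaling of the initial estimate by the fraction left are assumptions for illustration, and the function name is hypothetical. The post does not say which weights the 5.xx client actually uses.

```python
def blended_time_to_completion(initial_est_hours, hours_computing, fraction_done):
    """Blend the up-front estimate with the extrapolation from progress so far."""
    if fraction_done <= 0:
        return initial_est_hours      # nothing reported yet
    if fraction_done >= 1:
        return 0.0                    # finished
    fraction_left = 1.0 - fraction_done
    # Remaining time according to the initial estimate (assumed form).
    from_initial = initial_est_hours * fraction_left
    # Remaining time according to the progress made so far.
    from_progress = hours_computing / fraction_done * fraction_left
    # Assumed weights: lean on the initial estimate early, on progress later.
    return fraction_left * from_initial + fraction_done * from_progress


# Hypothetical restart case: 8 h initial estimate, 2 h already counted, 0.5% done.
estimate = blended_time_to_completion(8.0, 2.0, 0.005)
print(f"{estimate:.2f} h")
```

Under these assumptions the restarted workunit would show roughly 10 hours instead of several hundred, because the tiny fraction done gives the progress-based term very little weight.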

BOINC WIKI

BOINCing since 2002/12/8

Ocean Archer
Joined: 18 Jan 05
Posts: 92
Credit: 368644
RAC: 0


Message 13796 in response to message 13795


Quote:

The initial estimate, before the workunit starts running, is based on the benchmarks and the estimated flops provided by the project. Once the workunit starts, the formula above is used.

This will change slightly in the 5.xx versions:
Initial est = (estimated flops / benchmark) * project correction factor
TimeToCompletion = weighted average((initial est / percent complete), ((TimeComputing/PercentCompleted)*PercentLeft))

Hopefully the version 5.xx code will improve the initial estimates after a few workunits. The correction factor starts out at 1.0 and increases quickly if a workunit takes longer to crunch than estimated. It decreases more slowly, so that one unusually fast workunit doesn't reduce it prematurely.

Using the weighted average for the time to completion should reduce the impact of work that processes quickly at the beginning and then more slowly at the end.

John -- Thanks for the information, but this doesn't help with the original problem: the restart/pickup of a workunit after an unplanned computer restart. I guess what I'm trying to say is that there should be some provision in the programming to take a new snapshot of the CPU clock if the processing of the workunit is halted/aborted and restarted at time zero ...


If I've lived this long - I gotta be that old!

John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632255
RAC: 0


Message 13797 in response to message 13796

Quote:
John -- Thanks for the information, but this doesn't help with the original problem: the restart/pickup of a workunit after an unplanned computer restart. I guess what I'm trying to say is that there should be some provision in the programming to take a new snapshot of the CPU clock if the processing of the workunit is halted/aborted and restarted at time zero ...


It is not based on a snapshot of the CPU clock. The project code reports the CPU time spent and the % complete. So when a restart occurs, the project code must run to the next point where it would report CPU time and % complete before this information is updated. Also, if the project has not checkpointed, the code really will start at 0 again.
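The checkpoint point can be pictured with a toy model. Everything in it is hypothetical (the function name, and checkpoints written at fixed fractions of the workunit). It only illustrates that work done since the last checkpoint is lost on an unplanned restart, and that with no checkpoint at all the workunit resumes from zero.

```python
import math


def resume_fraction(fraction_done, checkpoint_interval):
    """Progress a task resumes from after an unplanned restart.

    checkpoint_interval is an assumed spacing between checkpoints, given as a
    fraction of the whole workunit. None means the app never checkpointed.
    """
    if checkpoint_interval is None:
        return 0.0
    # Only the work up to the last completed checkpoint survives the crash.
    # The small epsilon guards against float rounding at exact boundaries.
    completed = math.floor(fraction_done / checkpoint_interval + 1e-9)
    return completed * checkpoint_interval


no_checkpoint = resume_fraction(0.23, None)    # crashed at 23%, never checkpointed
with_checkpoint = resume_fraction(0.23, 0.05)  # assumed checkpoint every 5%

print(no_checkpoint, with_checkpoint)
```

In the first case the workunit restarts at 0%, as in the original report. In the second it would resume from the 20% checkpoint.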
