After error, provide option to try once from checkpoint.

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

20 Feb 2014 20:03:13 UTC

Topic 197397

I'm not sure if this is an E@H or BOINC wish...

Some tasks checkpoint many times then error out close to the end.

Restarting a task once from a checkpoint might be an option which could reduce errors.
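To make the wish concrete, here is a minimal sketch of a "retry once from checkpoint" policy. It is not BOINC code: `run_with_one_retry`, `run_task` and `load_checkpoint` are hypothetical names made up for illustration, and the sketch assumes a failed task raises `RuntimeError` instead of crashing the machine.

```python
from typing import Callable, Optional


def run_with_one_retry(
    run_task: Callable[[Optional[int]], str],
    load_checkpoint: Callable[[], Optional[int]],
) -> str:
    """Run a task; if it fails, retry exactly once from the last checkpoint.

    run_task(None) starts from scratch, run_task(n) resumes from checkpoint n.
    Both callables are hypothetical stand-ins, not part of any real client API.
    """
    try:
        return run_task(None)
    except RuntimeError:
        checkpoint = load_checkpoint()
        if checkpoint is None:
            # Nothing to resume from, so the error is reported as usual.
            raise
        # Single retry: if this fails too, the error propagates to the caller.
        return run_task(checkpoint)
```

The retry is deliberately limited to one attempt, so a task that is genuinely bad still ends up reported as an error.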

Jord

Joined: 26 Jan 05

Posts: 2952

Credit: 5893653

RAC: 112

20 Feb 2014 22:33:06 UTC

Message 120259

BOINC is set up for redundancy. If you run into too many errors for the task to finish in a normal fashion, you'll report it as an error and it'll be sent out to another computer. In the case of a bad batch of tasks, all computers that it gets sent to will give errors, and the administrators will be warned about this in the back-end.

There's really no need for your client to always finish all work correctly.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117519903578

RAC: 35381505

RE: I'm not sure if this

22 Feb 2014 13:48:26 UTC

Message 120260

Quote:

I'm not sure if this is an E@H or BOINC wish...

Definitely a BOINC thing.

Quote:

Some tasks checkpoint many times then error out close to the end.

Restarting a task once from a checkpoint might be an option which could reduce errors.

It would need to be a certain type of error, in particular one that does not cause the machine to lock up or crash. Two classic cases I can think of are files that are suddenly missing and files that suddenly fail a checksum. Both trash the currently running tasks and also cause the entire remaining cache of work that depends on those files to be trashed as well. It would seem to be a much better option for BOINC simply to stop all crunching temporarily, try to replace the missing or corrupt files by downloading fresh copies, and then try again from the last checkpoint.

I have had this situation quite a few times over the years. I've seen a number of cases where supposedly corrupt files are not actually corrupt at all. My impression is that quite a few of these are caused by heat and/or faulty power, again probably related to heat. At the onset of such a problem, it would be helpful if BOINC just stopped crunching rather than trashing the entire cache of work. Surely BOINC could try to replace a file and then stop if there were further problems.

Cheers,
Gary.
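The "stop, replace the file, then resume" idea from the post above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how the BOINC client is implemented: `find_bad_files`, `repair_then_resume`, `redownload` and `resume_from_checkpoint` are hypothetical names, and MD5 is used purely as an example digest.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List


def file_md5(path: Path) -> str:
    """Hex MD5 digest of a file's contents (example digest only)."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def find_bad_files(expected: Dict[Path, str]) -> List[Path]:
    """Return files that are missing or whose digest does not match."""
    return [
        path
        for path, digest in expected.items()
        if not path.is_file() or file_md5(path) != digest
    ]


def repair_then_resume(
    expected: Dict[Path, str],
    redownload: Callable[[Path], None],
    resume_from_checkpoint: Callable[[], None],
) -> bool:
    """Replace bad input files once, then resume; stop if they are still bad.

    redownload and resume_from_checkpoint are hypothetical callbacks.
    Returns True if work was resumed, False if crunching should stay paused.
    """
    for path in find_bad_files(expected):
        redownload(path)
    if find_bad_files(expected):
        # Still broken after one replacement attempt: stay paused and
        # leave the remaining cache of work untouched.
        return False
    resume_from_checkpoint()
    return True
```

Returning False instead of erroring out every dependent task is the point of the suggestion: a heat or power glitch then costs one pause rather than the whole cache.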
