Do not checkpoint every minute!

Risky64
Joined: 1 Feb 25
Posts: 30
Credit: 55670
RAC: 2633
Topic 232107

Hi, it is not necessary to checkpoint this often. It seems to be hard-coded. Every 200 seconds would be enough.

AndreyOR
Joined: 28 Jul 19
Posts: 88
Credit: 774765753
RAC: 1063986

Risky64 wrote:

Hi, it is not necessary to checkpoint this often. It seems to be hard-coded. Every 200 seconds would be enough.

I tried it, and it turns out that the BRP4 app does listen to BOINC checkpoint requests. It looks like at one point you had BOINC set to 10 minutes, since that's about how often one of your tasks checkpointed according to its log. It'd be interesting to see whether significantly reducing (or even eliminating) checkpoints would reduce runtime in a meaningful way.

Risky64
Joined: 1 Feb 25
Posts: 30
Credit: 55670
RAC: 2633

It is worse! The CPU seems to wait until the checkpoint is written. I want fewer checkpoints!!!

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 565
Credit: 10957572806
RAC: 13392171

Have a look at post 231731 (sorry, I don't know how to link to the referenced post). It might be of interest.

Here is a copy:

From MAD_MAX, 5 Jan 2025 20:27:44 UTC

But it is not BOINC that reports progress and writes this log; it's the app itself.

It's checkpointing after some regular time interval defined in BOINC, not because it reached some milestone in the app that triggers a checkpoint.

You're getting it wrong. That's not how it works at all. Applications write checkpoints only when they reach certain points in the calculation chosen by the application programmers, where it is possible and convenient to record the state (and later restore from it). The BOINC client simply cannot influence this. All the corresponding setting in the BOINC client does is tell the app: "please do not write checkpoints more often than once every xx minutes." But when, where, and how often to write them is up to the science application alone. The corresponding option is even worded accordingly:

"Request tasks to checkpoint at most every xxx seconds"

The app can follow this recommendation by skipping the next checkpoint if less than the specified interval has passed since the previous one was written, or it can ignore the recommendation. In either case, checkpoints are written only at points predefined by the app programmer, when the calculation reaches them. That is how it works in theory, and it has been tested repeatedly in practice by me and many other users. For example, see a recent case with an FGRP5 application in another topic: https://einsteinathome.org/content/strange-wus-names-and-checkpoint-issues-latest-fgrp5-batch

Go ahead and let the task "finish". It won't. It's hung or stuck in some kind of infinite loop. It might get to 99.999% but will never complete.

.... etc.
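The handshake MAD_MAX describes can be sketched as a small simulation. This is a toy model under stated assumptions, not BOINC or Einstein@Home code: the app can save its state only after each hypothetical fixed-length "chunk" of work, and the helper `time_to_checkpoint()` is a stand-in for the check a real BOINC app performs (in C/C++ that is `boinc_time_to_checkpoint()`, with `boinc_checkpoint_completed()` called after writing). Simulated time replaces a real clock.

```python
def run_task(n_chunks, chunk_seconds, checkpoint_interval):
    """Simulate one task and return the times (s) at which checkpoints are written.

    Hypothetical model: the app can save its state only after finishing a chunk
    of work (a point chosen by the app programmer), and it skips the save if less
    than `checkpoint_interval` seconds (the BOINC client setting) have passed
    since the previous checkpoint.
    """
    now = 0.0
    last_checkpoint = 0.0
    written = []

    def time_to_checkpoint():
        # Stand-in for the client's "at most every N seconds" permission.
        return now - last_checkpoint >= checkpoint_interval

    for _ in range(n_chunks):
        now += chunk_seconds  # compute one chunk; no save is possible mid-chunk
        if time_to_checkpoint():
            written.append(now)    # the app would write its checkpoint file here
            last_checkpoint = now  # report completion so the interval restarts
    return written


# Interval 200 s, but the app can only save every 70 s, so saves land 210 s apart.
print(run_task(n_chunks=10, chunk_seconds=70, checkpoint_interval=200))  # → [210.0, 420.0, 630.0]
# A 30-second task never reaches its first permitted checkpoint.
print(run_task(n_chunks=3, chunk_seconds=10, checkpoint_interval=60))    # → []
```

The BOINC setting only sets a minimum spacing; the actual spacing is rounded up to the app's own save points, which is why the checkpoints above are 210 s apart rather than 200 s.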

sfv

Risky64
Joined: 1 Feb 25
Posts: 30
Credit: 55670
RAC: 2633

Fast machines with a GPU never write a checkpoint, because the calculation finishes after 30 seconds. So why do we need 40 checkpoints on slow computers? This is just a waste of time.

AndreyOR
Joined: 28 Jul 19
Posts: 88
Credit: 774765753
RAC: 1063986

Risky64 wrote:

Fast machines with a GPU never write a checkpoint, because the calculation finishes after 30 seconds. So why do we need 40 checkpoints on slow computers? This is just a waste of time.

In the case of BRP4, if you want fewer checkpoints, increase the checkpoint interval in the BOINC Manager settings. I did a test, and BRP4 does adhere relatively closely to BOINC checkpoint requests. You can see the evidence for this in the task logs.

However, and unfortunately, reducing the number of checkpoints does not improve run times. I went from 22 checkpoints per task to just 1 and noticed no difference in run times. It appears that in the case of BRP4, the checkpoint process is very quick and uses very few resources. I was hoping this would be one way to improve run times, but it is not.

GPU tasks do checkpoint; I've seen it with both BRP7 and O3AS, which are strictly GPU tasks. In the case of BRP7, you can even see it in the logs. It depends on the app's coding, your BOINC settings, and the run times on a given machine.
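Why the count depends on both the BOINC setting and the machine's speed can be shown with a back-of-the-envelope model. It assumes (hypothetically) that the app reaches a possible save point every `step_s` seconds of run time and saves only once the BOINC interval has elapsed, so checkpoints end up roughly `ceil(interval / step) * step` seconds apart. The run times and the 10-second step below are illustrative assumptions, not measured BRP4 values.

```python
import math


def checkpoint_count(runtime_s, interval_s, step_s):
    """Estimate how many checkpoints one task writes.

    Hypothetical model: a save point is reached every `step_s` seconds, and a
    save happens only once `interval_s` seconds have passed since the last one.
    """
    steps_between_saves = max(1, math.ceil(interval_s / step_s))
    spacing_s = steps_between_saves * step_s  # actual time between checkpoints
    return int(runtime_s // spacing_s)        # saves completed before the task ends


# Illustrative values only, with a save point every 10 s of computation.
for runtime_s in (30, 2400):          # fast GPU host vs. hypothetical slow CPU host
    for interval_s in (60, 1500):     # "every minute" vs. a 1500 s interval
        print(runtime_s, interval_s, checkpoint_count(runtime_s, interval_s, step_s=10))
```

Under these assumptions, a 30-second run writes no checkpoints at either setting, while a 40-minute run drops from 40 checkpoints at a one-minute interval to 1 at 1500 seconds.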

ahorek's team
Joined: 16 Dec 05
Posts: 39
Credit: 249688102
RAC: 2800

Someone tested the effect of checkpointing on an RPi 4 with an SD card that had extremely poor I/O performance.

It resulted in a 10% performance drop. However, under normal conditions, checkpoints should have a negligible impact on performance, 1% at most.
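The gap between under 1% on normal storage and about 10% on a slow SD card follows from simple arithmetic: the share of run time spent on checkpoints is roughly (number of checkpoints × seconds per write) / total run time. The sketch below assumes computation pauses during each write, as Risky64 observed; the write times and the 2400 s run time are made-up illustrative values, not measurements from this thread.

```python
def checkpoint_overhead(n_checkpoints, write_s, compute_s):
    """Fraction of a task's wall-clock time spent writing checkpoints.

    Assumes computation stops while each checkpoint is written, so the write
    time adds directly to the task's run time.
    """
    write_total_s = n_checkpoints * write_s
    return write_total_s / (compute_s + write_total_s)


# Hypothetical figures: 40 checkpoints in a task with 2400 s of computation.
fast_disk = checkpoint_overhead(40, write_s=0.05, compute_s=2400)  # quick disk write
slow_sd = checkpoint_overhead(40, write_s=6.0, compute_s=2400)     # very slow SD card
print(f"{fast_disk:.2%} vs {slow_sd:.2%}")  # → 0.08% vs 9.09%
```

The same number of checkpoints costs almost nothing when each write is fast, so cutting the count only pays off noticeably when storage is very slow.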

AndreyOR
Joined: 28 Jul 19
Posts: 88
Credit: 774765753
RAC: 1063986

ahorek's team wrote:

Someone tested the effect of checkpointing on an RPi 4 with an SD card that had extremely poor I/O performance.

It resulted in a 10% performance drop. However, under normal conditions, checkpoints should have a negligible impact on performance, 1% at most.

Interesting. I tested it on an old laptop with a Celeron N3050 CPU and a 32 GB eMMC drive and saw no difference. It seems that people with slow storage, such as an SD card, could improve their BRP4 run times meaningfully by going with 1 or 0 checkpoints.

Risky64
Joined: 1 Feb 25
Posts: 30
Credit: 55670
RAC: 2633

ahorek's team wrote:

Someone tested the effect of checkpointing on an RPi 4 with an SD card that had extremely poor I/O performance.

It resulted in a 10% performance drop. However, under normal conditions, checkpoints should have a negligible impact on performance, 1% at most.

It seems I have overreacted. I am sorry! This is a non-issue.

I have tried a 1500-second checkpoint interval. That is okay for me.

Scrooge McDuck
Joined: 2 May 07
Posts: 1142
Credit: 18951129
RAC: 12500

Risky64 wrote:

It seems I have overreacted. I am sorry! This is a non-issue.

Not at all. The checkpointing mechanism is not obvious. It's important to discuss its details every now and then: how it works between the BOINC client and the science app, as the useful repost by 'San-Fernando-Valley' did.

Risky64
Joined: 1 Feb 25
Posts: 30
Credit: 55670
RAC: 2633

Someone could write a guide with hints about sensible defaults for newbies and noobs.

Settings made easy, or Noob guide to Einsteinathome.
