"Normal" FGRP5 CPU tasks do some ~60 checkpoints between 0% and ~90% progress. The final 10% of progress is spent on the final evaluation of the 10 toplist candidate signals, with a further 10 checkpoints.
At the beginning of a new raw data file (e.g. LATeah2108.dat) there are lots of tasks with only a few skypoints (as few as 6, ... 4, or, as can currently be seen, only TWO skypoints!!!).
These workunits can be identified by a small number (< 100) in the workunit name next to the leading raw data file name:
e.g.: LATeah2108F_72.0_3124_-8.599999999999999e-11 --> only 2 skypoints: 2 checkpoints until 90% progress
See command line:
This workunit checkpoints at ~45% and ~90%, then at ten further checkpoints until 100% progress.
See: stderr.txt logfile of running tasks:
% C 1 0 <-- 1st cp: ~45% progress
% C 2 0 <-- 2nd cp: ~90% progress
% C 3 2 <-- 1st toplist candidate checkpoint
% C 4 3 <-- 2nd toplist...
% C 5 4 <-- 3rd
% C 6 5
% C 7 6
% C 8 7
% C 9 8
% C 10 9
% C 11 10
% C 12 11 <-- 10th toplist candidate --> 100% progress
FPU status flags: PRECISION
12:58:08 (1960): [normal]: done. calling boinc_finish(0).
12:58:08 (1960): called boinc_finish
It's almost impossible to run such tasks on a computer that is powered down regularly. You will lose most of the computation effort due to frequent resets to the last checkpoint, because checkpoints are extremely rare (HOURS apart!). So either you run it 24/7, or you have to use hibernate mode (suspend to disk) instead of shutting the BOINC client down. Or avoid FGRP5 CPU with these "unnormal" tasks.
On the other hand: these occasionally occurring WUs complete in less than half the time of "normal" FGRP5 tasks while giving the same credit, which boosts your RAC by 100...120% if, like me, you don't use a real GPU.
We are back at 22 skypoints, that is 22 checkpoints within the first 90% (more precisely 89.989%) of progress for the FGRP5 CPU tasks currently being sent out.
90% / 22 = 4.09% --> a checkpoint every ~4% of progress.
As SFV already mentioned, memory usage can be limited to a fixed share:
Use max xx% of memory*:
...or by limiting the number of tasks running in parallel (FGRP5 currently: ~700 MiB per task):
Use max xx% of CPUs:
CPU usage can be throttled as well (see the effect in Windows Task Manager):
Use max xx% of CPU time
There are also 3rd-party throttling tools for BOINC (e.g. TThrottle) that let you set CPU temperature thresholds, so that throttling follows the CPU's actual thermal output (e.g. different processing steps of an FGRP5 task cause different heat output).
[*] Not sure about the exact wording, because I have a German localization of BOINC.
In general we design our Apps to (potentially) checkpoint as often as possible / feasible, i.e. after each reasonably independent computation.
Feasibility limits here include the programming effort (parameters in data structures that are modified in nested loops must be saved and restored) and the data volume (storage space and time to write) of the necessary checkpoints. It doesn't make much sense to checkpoint every minute when writing the checkpoint takes several seconds (multiplied by the number of instances that may be running and checkpointing at once) and thus noticeably slows down computation, or if initializing the application from a checkpoint alone takes several minutes.
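To make the checkpoint arithmetic above concrete (N skypoint checkpoints spread over the first ~90% of progress, then ten toplist-candidate checkpoints over the last ~10%), here is a minimal Python sketch. It assumes the checkpoints are evenly spaced, which matches the ~45% / ~90% positions in the stderr log of the 2-skypoint workunit and the 90% / 22 ≈ 4.09% estimate; the function and constant names are hypothetical and not taken from the FGRP5 app.

```python
# Hypothetical model of FGRP5 CPU checkpoint positions, assuming:
#  - one checkpoint per skypoint, evenly spread over the first 90% of progress
#  - 10 toplist-candidate checkpoints evenly spread over the final 10%
# An illustration based on the forum posts, not the app's actual code.

SKYPOINT_SHARE = 90.0    # percent of progress spent on the skypoint search
TOPLIST_CANDIDATES = 10  # final toplist evaluation steps


def checkpoint_schedule(n_skypoints: int) -> list[float]:
    """Return the approximate progress (in %) at which each checkpoint is written."""
    if n_skypoints < 1:
        raise ValueError("need at least one skypoint")
    skypoint_cps = [SKYPOINT_SHARE * k / n_skypoints for k in range(1, n_skypoints + 1)]
    toplist_step = (100.0 - SKYPOINT_SHARE) / TOPLIST_CANDIDATES
    toplist_cps = [SKYPOINT_SHARE + toplist_step * j for j in range(1, TOPLIST_CANDIDATES + 1)]
    return skypoint_cps + toplist_cps


def skypoint_spacing(n_skypoints: int) -> float:
    """Progress (in %) between two consecutive skypoint checkpoints."""
    return SKYPOINT_SHARE / n_skypoints


if __name__ == "__main__":
    for n in (2, 22):
        cps = checkpoint_schedule(n)
        print(f"{n} skypoints: {len(cps)} checkpoints, "
              f"every {skypoint_spacing(n):.2f}% during the first 90%")
        print("  first checkpoints:", [round(p, 2) for p in cps[:3]])
```

For N = 2 this yields checkpoints at 45% and 90% followed by 91% ... 100%, i.e. the 12 "% C" lines in the log; for N = 22 the skypoint checkpoints come every ~4.09% of progress.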
An old thread from the "wish list" forum on the same issue* with FGRP5 CPU, where I explained it in detail before:
https://einsteinathome.org/de/content/cpu-time-checkpoint-4h
* Not an issue but a feature. ;-)
Thank you.
This should be the final update.
I removed the project and added it back several days later. Since last night, it has been checkpointing more frequently: around every 20 minutes, judging from my occasional checks.
Thanks to all for teaching me more about this.
Emails and other personal input while the program is running result in program restarts.
I enjoy running Einstein@home, but don't feel it should dominate the computer's use.
Checkpoints seem fewer lately. I won't stop running E@home; I have just let my "aborts" be a friendly heads-up to you.
Resend, and I'll give it another try!
jm
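Rare checkpoints are why regular shutdowns or program restarts hurt: after a restart the task resumes from its last checkpoint, and everything computed since then is done again. The sketch below is a simplified model, assuming checkpoints evenly spaced in CPU time and a fixed number of computing hours per session; all names and numbers are hypothetical illustrations, not measured FGRP5 values.

```python
# Hypothetical model: a host that is shut down (not hibernated) after every
# `session_h` hours of computing, running a task whose checkpoints are assumed
# to be evenly spaced every `interval_h` hours of CPU time. Each new session
# resumes from the last checkpoint, so work done after it is lost.

import math


def kept_per_session(session_h: float, interval_h: float) -> float:
    """CPU hours saved by checkpoints during one session (the rest is lost)."""
    return math.floor(session_h / interval_h) * interval_h


def sessions_to_finish(task_h: float, session_h: float, interval_h: float) -> float:
    """Sessions needed to complete the task; math.inf if it can never finish."""
    kept = kept_per_session(session_h, interval_h)
    done = 0.0
    sessions = 1
    while task_h - done > session_h:  # the task cannot finish in this session
        if kept == 0:
            return math.inf  # every session ends before the next checkpoint
        done += kept
        sessions += 1
    return sessions


if __name__ == "__main__":
    # Illustrative numbers only: a 12 h task on a host that computes 4 h per session.
    for interval in (0.5, 3.0, 5.0):
        print(f"checkpoint every {interval} h -> {sessions_to_finish(12, 4, interval)} sessions")
```

With a 12 h task and 4 h sessions, checkpoints every 0.5 h need 3 sessions and every 3 h need 4, while checkpoints every 5 h mean the task never finishes: the "almost impossible" case from the first post. Hibernation avoids this because the session never really ends.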
You should be able to control this behavior to suit your needs via the "Options" tab.
sfv
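The Options-tab limits discussed in this thread (share of memory, share of CPUs, share of CPU time) can also be sized with a little arithmetic. This Python sketch computes how many FGRP5 CPU tasks fit at ~700 MiB each and emits a matching global_prefs_override.xml body. To the best of my knowledge max_ncpus_pct, cpu_usage_limit and ram_max_used_busy_pct are BOINC global-preference fields, but please verify them against the BOINC documentation for your client version; the one-task-per-CPU mapping assumes single-threaded tasks, and the helper names are hypothetical.

```python
# Hypothetical helper for sizing BOINC limits for FGRP5 CPU tasks and writing them
# as a global_prefs_override.xml body. Tag names are BOINC global-preference
# fields as I understand them; check them against your client's documentation.

import math
import xml.etree.ElementTree as ET

MIB_PER_FGRP5_TASK = 700  # approximate per-task memory quoted in this thread


def max_parallel_tasks(total_ram_mib: float, ram_share_pct: float) -> int:
    """Number of ~700 MiB tasks that fit into the allowed share of RAM."""
    usable_mib = total_ram_mib * ram_share_pct / 100.0
    return int(usable_mib // MIB_PER_FGRP5_TASK)


def cpu_pct_for_tasks(n_tasks: int, n_cpus: int) -> float:
    """'Use at most X% of the CPUs' value allowing n_tasks single-threaded tasks.

    Rounded up to the next 0.1% so that a client which rounds the resulting CPU
    count down (an assumption) does not lose one to floating-point error.
    """
    return min(100.0, math.ceil(1000 * n_tasks / n_cpus) / 10)


def prefs_override_xml(max_ncpus_pct: float, cpu_usage_limit: float,
                       ram_max_used_busy_pct: float) -> str:
    """Serialize the three limits as a <global_preferences> element."""
    root = ET.Element("global_preferences")
    for tag, value in (
        ("max_ncpus_pct", max_ncpus_pct),                  # % of the CPUs
        ("cpu_usage_limit", cpu_usage_limit),              # % of CPU time
        ("ram_max_used_busy_pct", ram_max_used_busy_pct),  # % of RAM while in use
    ):
        ET.SubElement(root, tag).text = f"{value:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example: 8 GiB of RAM, 50% allowed for BOINC, 8 logical CPUs.
    n = min(max_parallel_tasks(8192, 50), 8)
    print(n, "parallel tasks")
    print(prefs_override_xml(cpu_pct_for_tasks(n, 8), 100, 50))
```

As far as I know, saving the result as global_prefs_override.xml in the BOINC data directory and then using "Read local prefs file" in BOINC Manager (or `boinccmd --read_global_prefs_override`) applies it. The Options dialog writes the same file, so for most users the dialog remains the simpler route.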
A ten-year-old post from a project admin on checkpoint frequency:
https://einsteinathome.org/de/content/more-checkpoints#comment-119513
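The admin's point about checkpoint cost versus frequency can be put into a rough formula: the time spent writing checkpoints falls as the interval I grows, while the work lost at each restart grows with it, giving overhead ≈ k·w/I + (I/2 + r)/R, which is smallest at I = sqrt(2·k·w·R). This is essentially Young's classic checkpoint-interval approximation, swapped in here purely as an illustration; it is not how the Einstein@home apps choose their checkpoints, and every parameter value below is a hypothetical example.

```python
# Back-of-the-envelope model of the checkpoint-frequency trade-off. Assumptions
# (mine, not the project's): each checkpoint costs write_s seconds per instance and
# the instances share the disk, restarts happen on average every restart_every_s
# seconds at random moments (losing half an interval of work on average), and
# resuming from a checkpoint costs init_s seconds.

import math


def overhead_fraction(interval_s: float, write_s: float, instances: int,
                      restart_every_s: float, init_s: float) -> float:
    """Approximate fraction of computing time lost to checkpoints and restarts."""
    writing = write_s * instances / interval_s               # checkpoint writes
    restarts = (interval_s / 2 + init_s) / restart_every_s  # lost work + re-init
    return writing + restarts


def best_interval(write_s: float, instances: int, restart_every_s: float) -> float:
    """Interval minimising overhead_fraction (Young's approximation)."""
    return math.sqrt(2 * write_s * instances * restart_every_s)


if __name__ == "__main__":
    # Illustrative numbers: 5 s per write, 8 instances, one restart per 8 hours.
    best = best_interval(5, 8, 8 * 3600)
    for interval in (60, best, 4 * 3600):
        print(f"{interval / 60:6.1f} min -> {overhead_fraction(interval, 5, 8, 8 * 3600, 0):.1%}")
```

With 5 s per write, 8 instances and one restart every 8 hours, checkpointing every minute costs roughly two thirds of the computing time in this model, while the optimum lies around 25 minutes.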