FGRP5 (FGRPSSE 1.08) tasks reset progress

Scrooge McDuck
Joined: 2 May 07
Posts: 1052
Credit: 17869501
RAC: 12422

"Normal" FGRP5 CPU tasks write some ~60 checkpoints between 0% and ~90% progress. The final 10% of progress is spent on the final evaluation of 10 toplist candidate signals, during which there are 10 further checkpoints.

At the beginning of a new raw data file (e.g. LATeah2108.dat) there are lots of tasks with only a few skypoints (as few as 6, ... 4, or, as can currently be seen, only TWO skypoints!!!).

These workunits can be identified by a small number (< 100) in the workunit name next to the leading raw data file name:

e.g.: LATeah2108F_72.0_3124_-8.599999999999999e-11  --> only 2 skypoints: 2 checkpoints until 90% progress

see command line:

command line: projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_windows_intelx86__FGRPSSE.exe --inputfile [...] --numSkyPoints 2 --f1dot -8.699999999999998e-11 [...]
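If you want to check this for many tasks at once, the skypoint count can be pulled out of the command line with a few lines of Python. This is only a minimal sketch: the helper name `num_sky_points` is made up for illustration, and it assumes the flag always appears as `--numSkyPoints <integer>`, as in the line quoted above.

```python
import re
from typing import Optional


def num_sky_points(command_line: str) -> Optional[int]:
    """Return the integer following --numSkyPoints, or None if the flag is absent."""
    match = re.search(r"--numSkyPoints\s+(\d+)", command_line)
    return int(match.group(1)) if match else None


# Command line as quoted in the post (elided parts left as "[...]").
cmd = (
    "projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_windows_intelx86__FGRPSSE.exe "
    "--inputfile [...] --numSkyPoints 2 --f1dot -8.699999999999998e-11 [...]"
)
print(num_sky_points(cmd))
```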

This workunit checkpoints at ~45%, ~90%, then ten further checkpoints until 100% progress.

See: stderr.txt logfile of running tasks:

% C 1 0   <-- 1st cp: ~45% progress
% C 2 0   <-- 2nd cp: ~90% progress
% C 3 2   <-- 1st toplist candidate checkpoint
% C 4 3   <-- 2nd toplist...
% C 5 4   <-- 3rd
% C 6 5
% C 7 6
% C 8 7
% C 9 8
% C 10 9
% C 11 10
% C 12 11 <-- 10th toplist candidate --> 100% progress
FPU status flags:  PRECISION
12:58:08 (1960): [normal]: done. calling boinc_finish(0).
12:58:08 (1960): called boinc_finish
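The checkpoint pattern in the log above can be reproduced with a small Python sketch. It assumes what is described in this post: one checkpoint per skypoint, spread evenly over the first ~90% of progress, followed by one checkpoint per toplist candidate over the last 10%. The function name and the even spacing are my simplification, not taken from the application's source code.

```python
def checkpoint_progress(num_sky_points, sky_share=90.0, num_candidates=10):
    """Approximate progress values (in percent) at which a task checkpoints.

    Assumes evenly spaced skypoint checkpoints up to `sky_share` percent,
    then evenly spaced toplist-candidate checkpoints up to 100 percent.
    """
    sky = [sky_share * k / num_sky_points for k in range(1, num_sky_points + 1)]
    step = (100.0 - sky_share) / num_candidates
    toplist = [sky_share + step * j for j in range(1, num_candidates + 1)]
    return sky + toplist


# Task with only 2 skypoints: 2 + 10 = 12 checkpoints in total.
print(checkpoint_progress(2))
```

For 2 skypoints this gives 12 checkpoints, matching the 12 `% C` lines in the stderr.txt excerpt.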

It's almost impossible to run such tasks on a computer which is powered down regularly. You will lose almost all of the computation effort due to frequent resets to the last checkpoint, because checkpoints are extremely rare (HOURS apart!). So either you run it 24/7, or you have to use hibernate mode (suspend to disk) instead of shutting the BOINC client down. Or avoid FGRP5 CPU while these "abnormal" tasks are being sent out.

On the other hand: these occasionally occurring WUs complete in less than half the time of "normal" FGRP5 tasks while giving the same credit; that is, they boost your RAC by 100...120% if, like me, you don't use a real GPU.
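The arithmetic behind that figure, as a rough Python sketch: if a task gives the same credit in a fraction of the normal run time, the credit rate scales with the inverse of that fraction. The function name is made up, and it assumes the machine runs nothing but such tasks.

```python
def rac_boost_percent(time_fraction):
    """Percent increase in credit rate when a task needs only
    `time_fraction` of the normal run time for the same credit."""
    if time_fraction <= 0:
        raise ValueError("time_fraction must be positive")
    return (1.0 / time_fraction - 1.0) * 100.0


print(rac_boost_percent(0.5))                 # exactly half the time
print(round(rac_boost_percent(1 / 2.2), 1))   # a bit less than half the time
```

Half the run time corresponds to +100%; a boost of +120% corresponds to about 45% of the normal run time.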

Scrooge McDuck
Joined: 2 May 07
Posts: 1052
Credit: 17869501
RAC: 12422

An old thread from the "wish list" forum on the same issue* with FGRP5 CPU, where I explained it in detail before:

https://einsteinathome.org/de/content/cpu-time-checkpoint-4h

* not an issue but a feature.   ;-)

carmar
Joined: 27 May 21
Posts: 32
Credit: 535865
RAC: 506

Thank you.

carmar
Joined: 27 May 21
Posts: 32
Credit: 535865
RAC: 506

This should be the final update.

I removed the project and added it back several days later. Since last night, it has been checkpointing more frequently: roughly every 20 minutes, judging from my occasional checks.

Thanks to all for teaching me more about this. 

Jim Martin
Joined: 24 Jun 05
Posts: 9
Credit: 9596878
RAC: 11005

Emails/personal inputs, during program, result in program restarts.

I enjoy running Einstein@home, but don't feel it should dominate the computer's use.

Checkpoints seem fewer, lately.  I'll not stop running E@home, have just let my "aborts" be a friendly heads-up to you.

Resend, and will give it another try!

jm

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 397
Credit: 10113953455
RAC: 28518589

Jim Martin wrote:

Emails/personal inputs, during program, result in program restarts.

I enjoy running Einstein@home, but don't feel it should dominate the computer's use.

...

You should be able to control the behavior to your needs with the "Options" tab.

sfv

Scrooge McDuck
Joined: 2 May 07
Posts: 1052
Credit: 17869501
RAC: 12422

Jim Martin wrote:

Checkpoints seem fewer, lately.

We are back at 22 skypoints, that is, 22 checkpoints within the first 90% (more precisely, 89.989%) of progress for the FGRP5 CPU tasks currently being sent out.

90% / 22 = 4.09% --> checkpointing each ~4% of progress.
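The same calculation for any skypoint count, as a small Python sketch. The conversion to hours assumes that progress advances linearly with run time, and `task_hours` is a hypothetical total run time you would have to fill in from your own machine.

```python
def checkpoint_spacing(num_sky_points, sky_share=90.0):
    """Progress (in percent) between two skypoint checkpoints."""
    return sky_share / num_sky_points


def hours_between_checkpoints(num_sky_points, task_hours, sky_share=90.0):
    """Approximate run time between checkpoints, assuming linear progress.

    `task_hours` is a hypothetical total run time of the task.
    """
    return task_hours * checkpoint_spacing(num_sky_points, sky_share) / 100.0


print(round(checkpoint_spacing(22), 2))    # current tasks with 22 skypoints
print(checkpoint_spacing(2))               # tasks with only 2 skypoints
print(hours_between_checkpoints(2, 8.0))   # example: hypothetical 8-hour task
```

With only 2 skypoints the spacing is 45% of progress, which is why such tasks can run for hours without a checkpoint.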

As SFV already mentioned: memory usage can be limited to a fixed share:

Use max xx% of memory*:

...or by limiting the number of parallel running tasks (FGRP5 currently: ~700 MiB per task):

Use max xx% of CPUs:

CPU usage can be throttled as well: (see effect in Windows task manager)

Use max xx% of CPU time

There are also 3rd-party throttling tools for BOINC (e.g. TThrottle) which allow you to set CPU temperature thresholds, so that CPU throttling is controlled by the actual thermal output of the CPU (e.g. different processing steps of an FGRP5 task cause different heat output).

[*] not sure about the exact wording because I have a German localization of BOINC.

Scrooge McDuck
Joined: 2 May 07
Posts: 1052
Credit: 17869501
RAC: 12422

A ten-year-old post from the project admin on checkpoint frequency:

https://einsteinathome.org/de/content/more-checkpoints#comment-119513

Bernd Machenschalk wrote:

In general we design our Apps to (potentially) checkpoint as often as possible / feasible, i.e. after each reasonably independent computation.

Feasibility limits here include the programming effort (parameters in data structures modified in nested loops saved and restored) and the data volume (storage space and time to write) of the necessary checkpoints. It doesn't make much sense to checkpoint every minute when writing the checkpoint takes several seconds (multiplied by the number of instances that may be running and checkpointing at once) and thus noticeably slows down computation, or if initializing the application picking up from a checkpoint takes several minutes alone.
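To put rough numbers on the trade-off described in the quote, here is a small Python sketch. It is only an illustration under a pessimistic assumption of mine: all running instances write their checkpoints one after another to the same disk, so the write times add up. The function name and the example values are hypothetical, not measurements.

```python
def checkpoint_overhead(write_seconds, interval_seconds, instances=1):
    """Fraction of wall-clock time spent writing checkpoints.

    Assumes the checkpoint writes of all instances are serialized,
    so the cost per interval is instances * write_seconds.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return instances * write_seconds / interval_seconds


# Hypothetical values: 5 s per checkpoint write, 4 tasks running in parallel.
print(checkpoint_overhead(5, 60, instances=4))     # checkpoint every minute
print(checkpoint_overhead(5, 3600, instances=4))   # checkpoint every hour
```

With these made-up numbers, checkpointing every minute would cost about a third of the run time, while hourly checkpoints cost well under 1%.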
