Progress not saved

Redvibe
Joined: 5 Apr 18
Posts: 11
Credit: 2189846
RAC: 0
Topic 226853

I have been running Einstein@home for some time now. However, recently, the progress made has not been saved when I shut down the computer. Does anyone else have this problem?

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 469
Credit: 10398177370
RAC: 4041183

@redvibe:

Either you forgot to "suspend" and "update" the project before exiting or, more likely, a checkpoint has not yet been written, which you can check before closing or shutting down.

Just a thought of mine.

Redvibe
Joined: 5 Apr 18
Posts: 11
Credit: 2189846
RAC: 0

This problem only recently started, and I have not changed my habits. I always suspend before shutting down. There is no manual "update" option as far as I am aware, and I have never used one. I do not know how to check for a checkpoint - I have never had to do that before.

Keith Myers
Joined: 11 Feb 11
Posts: 5021
Credit: 18924700522
RAC: 6534122

Click on any running task in your list and select Properties from the left-hand menu. The last checkpoint is listed for the task there.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118343552357
RAC: 25446910

Redvibe wrote:
... I always suspend before shutting down.

You were given incorrect information.  When you wish to shut down your computer, there is no need to either 'suspend' or 'update' anything.  You just shut down the BOINC client and shut down the computer.  Even if you don't manually stop the client, shutting down the computer will stop the client anyway.

Redvibe wrote:
There is no manual "update" option as far as I am aware, and I have never used one.

It's in BOINC Manager, advanced view, on the projects tab.  If you select a particular project, an "update" button will be highlighted.  Clicking that button forces any outstanding transaction between client and server (eg. uploaded tasks, if any, that haven't yet been reported) to be performed immediately.  You don't have to do this since it will happen automatically the next time you restart.  However, if you were not restarting for quite a while (eg holidays) it would be advisable to do that update, since there is a time limit (7 or 14 days for E@H) before tasks expire.

Another good use of the 'update' function is to allow your client to become aware immediately of a preference change made on the project website.  Otherwise it could take some time for the client to find out through some other routine communication - eg. a work request.

Redvibe wrote:
I do not know how to check for a checkpoint - I have never had to do that before.

Checkpoints are intermediate stages of the calculations that are saved automatically from time to time.  Usually, they're not something you need to worry about at Einstein since they're created/updated quite regularly.

When you shut down and, later on, restart your computer, the calculations are continued from the last saved checkpoint - as long as one exists.  Checkpoint timing intervals depend on the type of data being analysed and can vary as data files change.

If (by chance) you happen to shut down before the very first checkpoint has been saved, the calculations will be restarted from the very beginning for that particular task.
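
To make the effect concrete, here is a minimal sketch in Python of how progress survives (or doesn't) across restarts. It is only an illustration, not how the science app is actually coded. It assumes that a checkpoint is written at fixed intervals of computation and that every restart resumes from the last checkpoint; the function name and the numbers are made up for the example.

```python
def retained_progress(session_minutes, checkpoint_interval_minutes):
    """Return the minutes of work that survive a series of run sessions.

    Assumption (for illustration only): a checkpoint is written after every
    `checkpoint_interval_minutes` of computation, and each restart resumes
    from the most recent checkpoint.
    """
    saved = 0  # work captured in the most recent checkpoint
    for session in session_minutes:
        progress = saved + session  # work done by the time the session ends
        # Only whole checkpoint intervals are kept; the remainder is lost.
        saved = (progress // checkpoint_interval_minutes) * checkpoint_interval_minutes
    return saved


# Three 90-minute sessions with a checkpoint every 120 minutes:
# the first checkpoint is never reached, so nothing is kept.
print(retained_progress([90, 90, 90], 120))  # 0

# The same sessions with a checkpoint every 60 minutes keep most of the work.
print(retained_progress([90, 90, 90], 60))   # 180
```

If the sessions are always shorter than the checkpoint interval, the task restarts from zero every time, which is what "losing progress" looks like from the outside.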

Your computers are 'hidden' so nobody else can see information about the frequency of checkpointing for the tasks you have been completing recently.  If you change your preferences to allow others to see non-sensitive information about your completed tasks, it would be easy to see if there is anything unusual about checkpoints being created regularly.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118343552357
RAC: 25446910

After I posted my previous reply, I had a funny feeling that your user name was familiar.  I checked your previous messages and found this thread you started almost 3 years ago. Maybe it was much the same situation then, as you have now.  There was a similar report about "losing progress" every time you shut down.

In the information you posted at that time there was a reference to your computer ID - 12639019 - so I used that to see if it was still your current system. It must be, since it has some current tasks, including one that you aborted with 6,140 secs of run time (in other words, there should be a checkpoint).  From the stderr output that you can see by clicking on the task ID link for that task, you can work out what was happening prior to you aborting the task.

It looks quite daunting and you can see many instances of the task stopping and being restarted before the very first checkpoint was ever written.  Maybe you have your preferences set to NOT keep tasks in memory when suspended and maybe you also have the preference to suspend crunching when you are physically using your keyboard/mouse.  The task does run at lower priority so it isn't usually necessary to suspend tasks while you use the machine yourself.

The other thing I noticed in the stderr output was this particular parameter:-

--numSkyPoints 6

This is one of the many parameters that are in the command string that launches the app.  You can find a long list of parameters fairly close to the start of the output.  I'm not running CPU tasks at all but I seem to remember that the number of skypoints controls the number of checkpoints.  A task may take from 12 to 24 hours for an average machine so at a rough guess it might mean around 2 to 4 hours between checkpoints.  In that case, the above two preference settings would be vital to have right if you ever want to make progress.  Normally there are a lot more than 6 skypoints so that is probably why you are seeing the problem right now.
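
The rough guess above is just a division, sketched here in Python. It assumes one checkpoint per skypoint, which is my recollection rather than something confirmed for this app, and the function name is made up for the example.

```python
def hours_between_checkpoints(task_hours, num_sky_points):
    """Estimate the checkpoint spacing, assuming one checkpoint per skypoint."""
    return task_hours / num_sky_points


# A task taking 12 to 24 hours with --numSkyPoints 6:
for task_hours in (12, 24):
    print(task_hours, "hour task:", hours_between_checkpoints(task_hours, 6), "hours per checkpoint")
```

For 12 and 24 hour tasks this gives 2.0 and 4.0 hours, so any session shorter than that risks ending before a checkpoint is written.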

As a matter of interest, you did create one checkpoint in the above linked stderr output.  Here is the actual line where the checkpoint was created:-

% C 1 0

It's a short line all on its own, quite a long way down in the output.  'C 1' stands for checkpoint #1 and you would expect to see "% C 2  0", etc., immediately on following lines if crunching wasn't being interrupted and restarted all the time.

After the above first checkpoint, crunching was stopped and you can see in the output the following message:-

% checkpoint read: skypoint 1 binarypoint 0

which is exactly what you should see for a partially crunched task being restarted and using the saved checkpoint.  However, you didn't allow it to continue since we can see:-

-- signal handler called: signal 1   App got 4th kill-signal, guess you mean it. Exiting.
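
If you would rather not read through the whole stderr output by eye, a small script can pick out the lines quoted above. This is a minimal sketch in Python: the patterns are based only on the three sample lines in this post, so the exact formats are an assumption and other app versions may print them differently. The function name is made up for the example.

```python
import re

# Patterns inferred from the sample lines quoted in this post (an assumption).
CHECKPOINT_WRITTEN = re.compile(r"^%\s*C\s+(\d+)\s+(\d+)\s*$")
CHECKPOINT_READ = re.compile(r"checkpoint read: skypoint (\d+) binarypoint (\d+)")
KILL_SIGNAL = re.compile(r"signal handler called: signal (\d+)")


def summarize_stderr(text):
    """Collect checkpoint writes, checkpoint reads and kill signals from stderr text."""
    summary = {"written": [], "read": [], "signals": []}
    for line in text.splitlines():
        line = line.strip()
        match = CHECKPOINT_WRITTEN.match(line)
        if match:
            summary["written"].append(int(match.group(1)))  # checkpoint number
            continue
        match = CHECKPOINT_READ.search(line)
        if match:
            summary["read"].append(int(match.group(1)))  # skypoint resumed from
            continue
        match = KILL_SIGNAL.search(line)
        if match:
            summary["signals"].append(int(match.group(1)))  # signal number
    return summary


sample = """% C 1 0
% checkpoint read: skypoint 1 binarypoint 0
-- signal handler called: signal 1   App got 4th kill-signal, guess you mean it. Exiting.
"""
print(summarize_stderr(sample))
```

Paste the full stderr text in place of the sample. Many restarts and signals with few or no entries under "written" would point to the task being stopped before it can checkpoint.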

If you need assistance setting the preferences to keep the task in memory when suspended and to not suspend crunching when the user is active, please ask.  Those settings should allow tasks to complete over time if your computer is running for enough hours each day.

Cheers,
Gary.

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 469
Credit: 10398177370
RAC: 4041183

Gary Roberts wrote:

... You were given incorrect information.  When you wish to shut down your computer, there is no need to either 'suspend' or 'update' anything ...

@Gary:

... that was not incorrect information - it was only a correct tip ...

In my experience, suspending and updating before shutting down is an excellent practice.

It CAN spare the user various problems, such as not being able to restart or start the machine (after 4 weeks in the hospital) and therefore "losing out".

AND if you had looked at the three provided links, you would have seen the settings Redvibe has made.

... just had to write this ...

Have a nice day !

Redvibe
Joined: 5 Apr 18
Posts: 11
Credit: 2189846
RAC: 0

Thanks for the replies.

I found the update option. I suspended and updated before shutting down today. When I started up again a few minutes later, I could see that the progress had been saved.

Nevertheless, any specific advice on how to adjust the settings so that it keeps crunching when I am active would be appreciated.

Keith Myers
Joined: 11 Feb 11
Posts: 5021
Credit: 18924700522
RAC: 6534122

BOINC Manager >> Options >> Computing Preferences >> Computing >> When to Suspend >> "In use" >> Suspend when non-BOINC CPU usage is above xx%

Try 80%

Redvibe
Joined: 5 Apr 18
Posts: 11
Credit: 2189846
RAC: 0

That seemed to help at first, but now all my tasks say "suspended by user" even though I didn't suspend them. They do not start up again when I keep my hands off the keyboard and do nothing. I have tried raising the "suspend when non-BOINC CPU usage is above" setting to 82%, but it made no difference.

Redvibe
Joined: 5 Apr 18
Posts: 11
Credit: 2189846
RAC: 0

Okay, I just found the "resume" command, but I have never had to use that before, even though I have long been in the habit of suspending before shutting down. Every day, there seems to be an extra step that I have to do manually. When I first started running Einstein@home, I didn't even have to suspend before shutting down. Then I had to suspend. Then, more recently, suspend and update. Now I have to suspend, update, shut down, then manually resume upon start-up. What is going on here?
