Losing progress on shutdown

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5882
Credit: 118949898697
RAC: 24021253

MarkJ wrote:
I believe the “disconnect when done” is intended for dial-up connections ......

Yes!!  You'd be absolutely correct!  A much better example of the use of that setting.  I'd forgotten all about the 'dialup days' since I never had the pleasure of having to use that system :-).

 

Cheers,
Gary.

daghtus
Joined: 1 Mar 19
Posts: 11
Credit: 31994422
RAC: 0

I noticed the same thing about suddenly losing progress with some projects. It does not seem to plague all of them, though. This one is especially affected:

Gravitational Wave All-sky search on LIGO O1 Open Data 0.03

I have lost over 8 hrs of calculations multiple times. I do not usually leave my PC on for more than 12 hours, so I doubt a task ever gets finished.

anniet
Joined: 6 Feb 14
Posts: 1348
Credit: 5079314
RAC: 0

Hi Annie, What a pleasant surprise to see you here <==    rather than over there ==>   where you usually hang out :-)

It's lovely to see you too, Gary wherever you are :) I do like to pop by the technical sections. I don't always understand all of it but sometimes stuff sticks.

However there will be a mandatory test assignment with an enormous number of arcane questions to prove you've done your homework and taken in all the relevant details .... :-)

Of all the nervous breakdowns to have, those are always the most fun, I think :)

In case you're a bit miffed at being denied a diatribe, here's a mini-one just for you.  With 79 sky points and therefore 79 checkpoints to be written for the main calculations, you can know exactly when each checkpoint has been written without examining the task properties.  Since 90%/79=1.139%, whenever you see the %done tick over to 1.139%, 2.278%, 3.417%, 4.556%, .... then you know a new checkpoint has just been written.  If your current in-progress task has 8 checkpoints, the sequence should be 11.25%, 22.50%, 33.75%, 45.00%, 56.25%, 67.50%, 78.75% and 90.00%.
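
For anyone who wants to work out the sequence for a different number of sky points, the arithmetic above can be sketched in a few lines of Python. This is only an illustration, not anything taken from the project's application: the function name `checkpoint_percentages` and its parameters are made up here, the 90% share for the main calculations is taken from the figures quoted above, and values are cut (not rounded) to three decimals simply because that reproduces the numbers in the post.

```python
import math


def checkpoint_percentages(n_checkpoints, main_fraction=90.0):
    """Return the %done values at which each checkpoint should appear.

    Assumes (as in the post above) that the main calculations cover
    `main_fraction` percent of the task and that checkpoints are evenly
    spaced, one per sky point. Values are truncated to 3 decimals.
    """
    values = []
    for k in range(1, n_checkpoints + 1):
        exact = main_fraction * k / n_checkpoints
        # Truncate rather than round, to match the figures quoted above.
        values.append(math.floor(exact * 1000) / 1000)
    return values


if __name__ == "__main__":
    # First few checkpoints for a task with 79 sky points.
    print(checkpoint_percentages(79)[:4])
    # Full sequence for a task with 8 checkpoints.
    print(checkpoint_percentages(8))
```

If the %done shown for a running task matches one of these values exactly, a checkpoint has most likely just been written. Anything past the last value it reached would be redone after a shutdown.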

I was right... It was. Thank you! :)

In view of the other similar reports, I would think there was a staff mini-oops that probably got quietly and quickly rectified before a huge cacophony broke out.

*ohhh-say-no-more-blink* I have that happen all the time... not with staff. With me. I haven't got staff...

... I am staff...

The file is named 'stdoutdae.txt'

That's the one! I was rather pleased to find it. I'd found the job log one first which made my eyes hurt.
 

Don't worry about the old tasks but it would be helpful to know if there is any further sign in the latest tasks.

I've left checkpoint debug selected in my event log options. I rarely run more than one task at a time, so also changed the interval between checkpoints to 600 seconds. That way there aren't quite so many mentions of them to clutter things up. There have been no more strangenesses....

I meant to respond sooner than this. I'm sorry, but thank you so much for the reply. It is much appreciated.

Hope everyone has a nice weekend :) 

 

Please wait here. Further instructions could pile up at any time. Thank you.
