Hi,
These two WUs, 676685 and 673808, both crashed when they were 99.5% to 99.7% complete. I had taken a backup 3 hours before the first one crashed, so I reverted to it, thinking it was a one-off problem, but it crashed again when it was almost finished. Unfortunately, I didn't have a backup handy when the second one crashed, so I didn't get a chance to rerun it.
The error messages for the WUs are as follows:
02/04/2005 15:27:59 - Einstein@Home - Unrecoverable error for result H1_0260.9__0261.4_0.1_T25_Test02_3 ( - exit code -1073741819 (0xc0000005))
03/04/2005 15:23:26 - Einstein@Home - Unrecoverable error for result H1_0260.9__0261.3_0.1_T25_Test02_6 ( - exit code -1073741819 (0xc0000005))
If you look at these two WUs, you will see that other hosts are experiencing client errors on them too (3 client errors on the first one and 4 on the second). Is anyone else out there having similar problems? Any ideas as to what might be happening?
A WU takes ~11 hours to crunch on my laptop, so ~22 + 3 hours (for the rerun of the first WU from backup) = ~25 hours of crunching have gone to waste! I have never experienced crashes like this with Einstein before; it was quite consistent and stable until yesterday. I have another H1_0260.9 WU, and if this one also crashes, I'm afraid I will have to say goodbye until a new kind of H1_XXXX WU is submitted.
Any kind of help, advice, clue, tip, etc. will be greatly appreciated, especially from project owners and admins.
Best regards,
Ertugrul.
Copyright © 2024 Einstein@Home. All rights reserved.
Problem with H1_0260.9 WUs?
Which version of E@H do you run? As far as I know, 0xc0000005 indicates an access violation. Beyond that, I'm afraid I can't be of much help here.
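The log reports the exit code twice, as -1073741819 and as 0xc0000005. A minimal sketch (Python, purely illustrative) showing that these are the same 32-bit value read as signed or unsigned; the helper names are hypothetical, and 0xC0000005 is the Windows status code for an access violation:

```python
# Windows status code for an access violation (STATUS_ACCESS_VIOLATION).
STATUS_ACCESS_VIOLATION = 0xC0000005

def to_unsigned32(code: int) -> int:
    """Hypothetical helper: reinterpret a signed 32-bit exit code as unsigned."""
    return code & 0xFFFFFFFF

def to_signed32(code: int) -> int:
    """Hypothetical helper: reinterpret an unsigned 32-bit value as signed."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code

exit_code = -1073741819  # decimal exit code as printed in the BOINC log
print(hex(to_unsigned32(exit_code)))                         # 0xc0000005
print(to_unsigned32(exit_code) == STATUS_ACCESS_VIOLATION)   # True
print(to_signed32(STATUS_ACCESS_VIOLATION))                  # -1073741819
```

So both log lines describe the same crash type, not two different errors.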
:
your thoughts - the ways :: the knowledge - your space
:
>
> Which version of E@H do you run? For all I know a 0xc0000005 refers to an
> Access Violation. Yet, that's as far as I can be of help here.
>
I'm running Einstein 4.79 with CC 4.25. I have been running this configuration for quite a while...
I have something rather strange going on here!
> I have another H1_0260.9 WU, if this one also crashes, I'm afraid I will
> have to say goodbye until a new kind of H1_XXXX WU is submitted.
Well, I took a backup just before this one reached 99%, and when I restarted it, it completed fine without any errors. But I still had the impression that those H1_0260.9 WUs were problematic, so I detached and reattached, and got a new set of WUs. And guess what: that one crashed before my eyes at 99% after 11 hours of crunching, too!!! I HATE to waste WUs, so I reverted to the backup I had taken when the WU was 20% through and reran it. But this time I stopped it at 99%, took a backup, restarted, and voilà, it completed without any problem!
So I see a strange pattern:
1) Let the WU finish on its own, and it crashes.
2) Babysit the WU (stop it, back it up, and restart it) just before it finishes, and it completes fine.
I don't mind babysitting, since I regularly back up my BOINC folder anyway, but this pattern seems totally illogical and annoying to me.
Any suggestions?
Ertugrul.
> Well, I took a backup just before this one was 99% through, and when I
> restarted, it completed fine without any errors. But still I had the
> impression that those H1_0260.9 WUs were problematic, so I detached and
> reattached, and got a new set of WU. And guess what, that one crashed before
> my eyes at 99% after 11 hours of crunching, too!!! I HATE to waste WUs,
> so I reverted to my backup which I had taken when the WU was 20% through, and
> reran that. But this time I stopped and took a backup at 99%, restarted, and
> voila, it completed without any problem!
This is weird indeed, but at least it takes the WU file itself out of the equation. Rather, it seems to be a bug in the code.
My question would be: Have WUs been completing with Einstein 4.79 and CC 4.25 without any problems so far, with the crashes only starting very recently? If so, what changes have been made to the system?
For example, my Windows kept crashing after I upgraded to 1 GB, though that wasn't necessarily entirely Windows' fault; it was more likely cheap hardware without BIOS updates (Linux, on the other hand, runs just fine ;-) ). Maybe a new virus scanner, defragmentation tool, or some other system utility has been installed?
I'll keep my fingers crossed!
>
> My question would be: Have the WU been completed with einstein 4.79 and CC
> 4.25 without any problems so far and the WU crash only started very recently?
>
I had no problems whatsoever with Einstein 4.79 and CC 4.25 until the first crash I mentioned; many WUs were completed successfully after I upgraded from 4.19 to 4.25.
> And if so, what changes have been made to the system?
Nothing, no new hardware or software... The only thing that has changed is the way I run Einstein: it used to get 20% of the CPU time, and now I'm running it exclusively at 100%. So maybe it can't handle a WU from start to finish and gets "exhausted" towards the end? :-)
The current WUs crunch in three stages: two almost identical ones that write their results to temporary files on disk, then a third step (which shows up as the last % of the progress) that reads these files back in and writes the result file. The third step should take only a little time, but it requires much more memory than the first two. Apparently your crash happens during this last step, so my first guess would be a bad memory segment.
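To illustrate why the last step is the memory-hungry one, here is a purely hypothetical Python sketch of the three-stage structure described above. It is not the actual Einstein@Home code: `run_stage`, `combine`, the file names, and the toy data are all invented for illustration. The first two stages stream values to temporary files, while the final step loads all intermediate data into memory at once before writing the result.

```python
import os
import tempfile

def run_stage(seed: int, n: int, path: str) -> None:
    """Hypothetical search stage: streams n intermediate values to a temp file,
    so only one value is held in memory at a time."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{(seed * 31 + i) % 1000}\n")

def combine(paths, out_path) -> int:
    """Hypothetical final step: reads ALL intermediate data back into memory
    at once (the peak-memory phase), then writes the result file."""
    data = []
    for p in paths:
        with open(p) as f:
            data.append([int(line) for line in f])
    combined = [a + b for a, b in zip(*data)]
    with open(out_path, "w") as f:
        f.write(f"{max(combined)}\n")
    return len(combined)

with tempfile.TemporaryDirectory() as d:
    tmp = [os.path.join(d, "stage1.tmp"), os.path.join(d, "stage2.tmp")]
    run_stage(1, 10_000, tmp[0])  # first stage: low, constant memory
    run_stage(2, 10_000, tmp[1])  # second, almost identical stage
    n = combine(tmp, os.path.join(d, "result.txt"))  # short but memory-heavy
    print(n)  # 10000
```

Under this kind of structure, a faulty RAM region would tend to be hit only when the final step allocates its large buffers, which would fit a crash that appears at 99%.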
BM