Einstein@Home hangs at 100% and still Running

Filip
Joined: 24 Jun 05
Posts: 5
Credit: 1214247
RAC: 0
Topic 194247

Hi,
In the last couple of weeks, this kind of problem has been happening very often:
an E@H task shows 100% progress with status Running, and it is stuck there, doing no work and blocking all other tasks.

I can manually Suspend the task, and then the next task starts processing normally until the same thing happens again, usually about once an hour.

The task showing 100% is clearly not finished: a) the CPU time is too low, and b) when processing returns to this task, the 100% reverts to the correct value.

I am running BOINC 5.10.45 on Kubuntu Linux 8.04

So far, my other projects were not affected.


Thanks
Filip

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Einstein@Home hangs at 100% and still Running

Does anything suspicious appear in the stderr.txt file in the slot directory?

Does a reboot change anything?

I run 5.10.45 too, though on Windows XP, without problems.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Filip
Joined: 24 Jun 05
Posts: 5
Credit: 1214247
RAC: 0

RE: Does anything

Message 90945 in response to message 90944

Quote:

Does anything suspicious appear in the stderr.txt file in the slot directory?

Does a reboot change anything?

There is no stderr.txt file in the whole system. There is stderrdae.txt in /var/lib/boinc-client, but it is empty.

After a reboot, it behaves like Suspend/Resume: one of the tasks starts and runs for a while until it hangs again.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: There is no stderr.txt

Message 90946 in response to message 90945

Quote:
There is no stderr.txt file in the whole system. There is stderrdae.txt in /var/lib/boinc-client, but it is empty...


No, not stderrdae; it should be relative to your BOINC directory, perhaps without the .txt suffix for unix? Here's my (relative) path:
BOINC\slots\0\stderr.txt

It contains the text that goes into the "stderr out" part of the task details.
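To collect every slot's stderr.txt in one go, a small script can walk the slots directory. This is a minimal Python sketch, assuming the Linux data directory is /var/lib/boinc-client (where stderrdae.txt turned up) with one numbered subdirectory per slot under slots/; adjust the path for other installs. Listing the slots may raise a permission error unless run with admin rights.

```python
from pathlib import Path

# Assumption: Debian/Ubuntu-style BOINC data directory; adjust for your install.
DEFAULT_BOINC_DIR = Path("/var/lib/boinc-client")


def find_slot_stderr(boinc_dir: Path = DEFAULT_BOINC_DIR) -> list[Path]:
    """Return the stderr.txt of every numbered slot directory, sorted by slot."""
    slots = boinc_dir / "slots"
    if not slots.is_dir():
        return []
    found = []
    for slot in slots.iterdir():
        candidate = slot / "stderr.txt"
        if slot.is_dir() and slot.name.isdigit() and candidate.is_file():
            found.append(candidate)
    # Sort numerically so slot 10 comes after slot 9, not after slot 1.
    return sorted(found, key=lambda p: int(p.parent.name))


if __name__ == "__main__":
    for path in find_slot_stderr():
        print(path)
```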

This task of yours (121491487) looks very suspicious. It successively received signal 11, signal 6, signal 15, and signal 11 again before finishing with exit status 41.
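The numbers are standard signal numbers, and Python's `signal` module can translate them into names. A minimal sketch, assuming a Linux host (numbering can differ on other platforms); it does not try to interpret the exit status:

```python
import signal


def describe_signals(numbers: list[int]) -> list[str]:
    """Translate signal numbers into their symbolic names on this platform."""
    names = []
    for n in numbers:
        try:
            names.append(signal.Signals(n).name)
        except ValueError:
            names.append(f"unknown signal {n}")
    return names


# The sequence reported for task 121491487.
# On Linux this prints ['SIGSEGV', 'SIGABRT', 'SIGTERM', 'SIGSEGV'].
print(describe_signals([11, 6, 15, 11]))
```

SIGSEGV is a segmentation violation, SIGABRT an abort, and SIGTERM a termination request.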

Regards,
Gundolf

Filip
Joined: 24 Jun 05
Posts: 5
Credit: 1214247
RAC: 0

RE: No, not stderrdae; it

Message 90947 in response to message 90946

Quote:

No, not stderrdae; it should be relative to your BOINC directory, perhaps without the .txt suffix for unix? Here's my (relative) path:
BOINC\slots\0\stderr.txt


Thanks, I finally found them; the slots folder requires admin rights on Linux.

Yes, there are stderr.txt files, and there is plenty of stuff inside: lots of signals and memory maps. What exactly shall I look for? I don't want to just copy and paste the whole thing here; it's big.

And after I examine the stderr, what can I do with it? All my current tasks do the same thing.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: ...What exactly shall I

Message 90948 in response to message 90947

Quote:
...What exactly shall I look for? I don't want to just copy and paste the whole thing here; it's big.


I don't think you need to paste it here, since your tasks are visible onsite.

Quote:
And after I examine the stderr, what can I do with it? All my current tasks do the same thing.


Sorry, from here on, I must pass on to more knowledgeable people. I do know where to look for error messages, but I can't interpret them (in this case :-).

Regards,
Gundolf

mikey
Joined: 22 Jan 05
Posts: 12863
Credit: 1884357890
RAC: 238400

RE: Hi, in last couple of

Quote:

Hi,
In the last couple of weeks, this kind of problem has been happening very often:
an E@H task shows 100% progress with status Running, and it is stuck there, doing no work and blocking all other tasks.

I can manually Suspend the task, and then the next task starts processing normally until the same thing happens again, usually about once an hour.

The task showing 100% is clearly not finished: a) the CPU time is too low, and b) when processing returns to this task, the 100% reverts to the correct value.

I am running BOINC 5.10.45 on Kubuntu Linux 8.04

So far, my other projects were not affected.

Thanks Filip

You could just be seeing computer rounding of the numbers. You said "when the computer comes back to this task, that 100% turns back to the correct value", so what is that value? Does the task then finish crunching the unit?

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

Hi, two tasks you returned

Hi,

Two tasks you returned recently crashed after receiving signal 11 (segmentation violation), which normally should not occur. This points to faulty RAM and heat.

APP DEBUG: Application caught signal 11.
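Because stderr.txt can be long, one way to pull out only the crash markers is to search for lines like the one quoted above. A minimal Python sketch, assuming every caught signal is logged in that same "Application caught signal N" form; the sample text is a made-up excerpt, not a real log:

```python
import re
from collections import Counter

# Assumption: crashes are logged like "APP DEBUG: Application caught signal 11."
CAUGHT_SIGNAL = re.compile(r"Application caught signal (\d+)")


def count_caught_signals(stderr_text: str) -> Counter:
    """Count how often each signal number appears in a stderr dump."""
    return Counter(int(m.group(1)) for m in CAUGHT_SIGNAL.finditer(stderr_text))


# Hypothetical excerpt for illustration only.
sample = """\
APP DEBUG: Application caught signal 11.
APP DEBUG: Application caught signal 6.
APP DEBUG: Application caught signal 11.
"""
print(count_caught_signals(sample))
```

Repeated signal 11 entries across many tasks, rather than in one isolated task, are what would support the hardware explanation.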

Michael

Filip
Joined: 24 Jun 05
Posts: 5
Credit: 1214247
RAC: 0

RE: You could just be

Message 90951 in response to message 90949

Quote:
You could just be seeing computer rounding of the numbers. You said "when the computer comes back to this task, that 100% turns back to the correct value", so what is that value? Does the task then finish crunching the unit?

At first, everything looks OK: CPU time ticks up, Progress shows some percentage (slowly growing), and Status is Running. Then Progress suddenly jumps to 100%, CPU time stops moving, Status still says Running, and it can stay that way forever; basically BOINC does nothing at all. I can manually suspend the task, which starts another one.
After a reboot, BOINC takes one of the previously "frozen" tasks, CPU time starts growing again, and after a few seconds the Progress indicator changes from 100% back to whatever it was before it froze, which could be 5% or 50%. Everything works fine for a while (from 10 minutes to 2 hours), and then the same thing happens: no CPU time, 100%, Running.
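The freeze described here (Running at 100% while CPU time stops advancing) can be checked by comparing two readings of the same task taken a few minutes apart. This is a hypothetical sketch: the `TaskSnapshot` fields, the task names, and how the readings are obtained (for example, by copying them from BOINC Manager's Tasks tab) are assumptions, not a BOINC API.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """Hypothetical record of what BOINC Manager shows for one task."""
    name: str
    cpu_seconds: float
    progress_percent: float
    status: str


def looks_frozen(earlier: TaskSnapshot, later: TaskSnapshot,
                 min_cpu_gain: float = 1.0) -> bool:
    """Flag the symptom from this thread: 'Running' at 100% with no CPU progress."""
    if earlier.name != later.name:
        raise ValueError("snapshots must describe the same task")
    cpu_gain = later.cpu_seconds - earlier.cpu_seconds
    return (
        later.status == "Running"
        and later.progress_percent >= 100.0
        and cpu_gain < min_cpu_gain
    )


before = TaskSnapshot("task_a", 3600.0, 100.0, "Running")
after = TaskSnapshot("task_a", 3600.0, 100.0, "Running")
print(looks_frozen(before, after))  # True
```

A healthy task fails the check because its CPU time keeps growing, which matches the first clue given in the opening post.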

stderr.txt shows signal 11, signal 6, and signal 15; I think all three appear each time it "freezes".

Filip
Joined: 24 Jun 05
Posts: 5
Credit: 1214247
RAC: 0

RE: This points to faulty

Message 90952 in response to message 90950

Quote:
This points to faulty RAM and heat.

This is quite possible; it is an older PC (made in 2004, and even then it was a cheap one). After reading some other threads, I cleaned the PC to avoid overheating; shortly after that, all seven downloaded E@H tasks crashed with Computation Error. At least this is a clear result instead of "mysterious freezing". Let's see how the next batch works.

Thanks for all the advice

mikey
Joined: 22 Jan 05
Posts: 12863
Credit: 1884357890
RAC: 238400

RE: RE: This points to

Message 90953 in response to message 90952

Quote:
Quote:
This points to faulty RAM and heat.

This is quite possible; it is an older PC (made in 2004, and even then it was a cheap one). After reading some other threads, I cleaned the PC to avoid overheating; shortly after that, all seven downloaded E@H tasks crashed with Computation Error. At least this is a clear result instead of "mysterious freezing". Let's see how the next batch works.

Thanks for all the advice

You are also running VERY old versions of the BOINC software. If you upgraded, maybe some of the problems would go away. Personally I like 6.2.19 for Windows, but I don't have a preference for Linux. I have not tried the brand-new 6.4.7 version for Windows yet, though. You are running 5.?.? versions on both your Windows and Linux machines.
