Hi,
in the last couple of weeks this kind of problem has been happening very often:
An E@H task shows 100% progress and status Running, but it is stuck there, doing no work and blocking all other tasks.
I can manually Suspend the task, and then the next task starts processing OK until the same thing happens again, usually about once an hour.
The task showing 100% is clearly not finished: a) the CPU time is too low, b) when processing gets back to this task, the 100% turns back to the correct value.
I am running BOINC 5.10.45 on Kubuntu Linux 8.04
So far, my other projects were not affected.
Thanks
Filip
Einstein@Home hangs at 100% and still Running
Does anything suspicious appear in the stderr.txt file in the slot directory?
Does a reboot change anything?
I run 5.10.45 too, though on winXP, without problems.
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
RE: Does anything
There is no stderr.txt file in the whole system. There is stderrdae.txt in /var/lib/boinc-client, but it is empty.
After reboot it is similar to Suspend/Resume - one of the tasks gets started and works for a while until it hangs again.
RE: There is no stderr.txt
No, not stderrdae; it should be relative to your BOINC directory, perhaps without the .txt suffix for unix? Here's my (relative) path:
BOINC\slots\0\stderr.txt
It contains the text that goes into the "stderr out" part of the task details.
This one (121491487) of yours looks very suspicious. It successively received signal 11, signal 6, signal 15 and signal 11 again before it finished with exit status 41.
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
RE: No, not stderrdae; it
Thanks, I finally found them; the slots folder needs admin rights on Linux.
Yes, there are stderr.txt files and there is plenty of stuff inside, lots of signals and memory maps. What exactly shall I look for? I don't want to just copy and paste the whole thing here, it's big.
And after I examine the stderr, what can I do with it? All my current tasks do the same thing.
RE: ...What exactly shall I
I don't think you need to paste it here, since your tasks are visible on the site.
Sorry, from here on, I must pass on to more knowledgeable people. I do know where to look for error messages, but I can't interpret them (in this case :-).
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
RE: Hi, in last couple of
You could just be seeing computer rounding of the numbers. You said "when the computer comes back to this task, that 100% turns back to the correct value". What is that value? Does it then finish crunching the unit?
Hi, two tasks you returned
Hi,
two tasks you returned lately crashed after receiving signal 11 (segmentation violation), which normally should not occur. This often points to faulty RAM or overheating.
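In case it helps when reading a stderr.txt, here is a rough Python sketch (not a BOINC tool) that counts the signals mentioned in a log and looks up their names. It assumes the crash lines contain the word "signal" followed by a number; the real wording in stderr.txt may differ. `tally_signals` and `signal_name` are names made up for this example.

```python
import re
import signal
from collections import Counter

# Assumption: crash lines contain "signal" followed by a number,
# e.g. "signal 11". The real stderr.txt wording may differ.
SIGNAL_RE = re.compile(r"signal\s+(\d+)", re.IGNORECASE)


def signal_name(number):
    """Return the OS name of a signal number, or a placeholder if unknown."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"signal {number}"


def tally_signals(text):
    """Count how often each signal number is mentioned in the text."""
    counts = Counter(int(m.group(1)) for m in SIGNAL_RE.finditer(text))
    return dict(sorted(counts.items()))
```

Feeding it the contents of one slot's stderr.txt gives a dictionary of signal number to count, and `signal_name` turns each number into a name. On Linux, 11 is SIGSEGV (segmentation violation), 6 is SIGABRT and 15 is SIGTERM.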
Michael
Team Linux Users Everywhere
RE: You could just be
First, everything looks OK: CPU time ticks up, Progress shows some percentage (slowly growing), Status is Running. Then suddenly Progress jumps to 100%, CPU time stops moving, Status still says Running, and it can stay this way forever; basically BOINC does nothing at all. I can manually suspend the task, which starts another one.
After a reboot, BOINC takes one of the previously "frozen" tasks, CPU time starts growing again and after a few seconds the Progress indicator changes from 100% to whatever it was before it froze. It could be 5% as well as 50%. Everything works fine for a while (from 10 minutes to 2 hours) and then the same happens - no CPU time, 100%, Running.
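Written down as a check, the "frozen" state looks roughly like the Python sketch below. It is only an illustration of the symptom: `looks_stalled` is a made-up name, the threshold of three samples is arbitrary, and how the (CPU time, progress) samples are collected from BOINC is left open.

```python
def looks_stalled(samples, min_stalled_samples=3):
    """Return True if the newest samples show 100% progress but no CPU use.

    samples: list of (cpu_seconds, fraction_done) tuples, oldest first,
    where fraction_done is between 0.0 and 1.0. Hypothetical input format.
    """
    if len(samples) < min_stalled_samples:
        return False
    recent = samples[-min_stalled_samples:]
    # Progress claims to be complete in every recent sample...
    all_full = all(fraction >= 1.0 for _, fraction in recent)
    # ...yet CPU time has not advanced over the same window.
    cpu_flat = recent[-1][0] <= recent[0][0]
    return all_full and cpu_flat
```

A task that is really finishing would still accumulate CPU time while at 100%, so it would not be flagged.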
stderr.txt shows Signal 11, Signal 6, Signal 15, I think all three for each time it "freezes".
RE: This points to faulty
This is quite possible; it is an older PC (made in 2004, and even then it was a cheap one). After reading some other threads I cleaned the PC up to avoid overheating. Shortly after that, all seven downloaded E@H tasks crashed with Computation Error. At least this is a clear result instead of "mysterious freezing". Let's see how the next batch works.
Thanks for all the advice
RE: RE: This points to
You are also running VERY old versions of the BOINC software. Maybe if you upgraded, some of the problems would go away. Personally I like 6.2.19 for Windows but don't have a preference for Linux. I have not tried the brand new 6.4.7 version for Windows yet, though. You are running 5.?.? versions on both your Windows and Linux machines.