Computation Error - Child Processes

tom
Joined: 13 Dec 05
Posts: 7
Credit: 137905
RAC: 0
Topic 190372

(Copy of item posted originally to Cafe Einstein)

I just joined E@H yesterday but have been running SETI and Rosetta under BOINC. E@H has downloaded about 8 work units, but each one ran for anywhere from 2 seconds to 1 hour and then crashed with the error

"There are no Child Processes to Wait for (0x80)
exit code 128(0x80)"

The error does not seem to occur during screensaver mode.

Any suggestions?

Tom

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Quote:

I just joined E@h yesterday but have been running SETI and Rosetta under BOINC. E@h has downloaded about 8 work units, but each one ran for anywhere from 2 seconds to 1 hour and then crashed with the error

"There are no Child Processes to Wait for (0x80)
exit code 128(0x80)"

The error does not seem to occur during screensaver mode.

Any suggestions?

Tom

(I moved this from the cafe einstein thread where I replied to the earlier post)

What were you doing when the error occurred? BOINC gets that message when it tries to start the Einstein application and, for some reason, the application doesn't start. It could be security settings (installed under one account, run under another without the proper authorizations) or other software interfering, such as a firewall, anti-virus, or file backup utility.

Check the messages in BOINC Manager to see what was happening up to the error. If you see something like:

Suspending computation - user is active
Pausing result
Resuming computation activity
Restarting result xxxx with einstein version 479
(and you get the exit code 128 error message)

It means that you did something on your computer, so it suspended the workunit until you finished, and the error occurred when BOINC restarted it. What did you do?

If you see something similar, but instead of "user is active" it's switching to another application (Rosetta), it's the same problem: the "restart" failed.

Either way, the actual messages show what BOINC was doing. Seeing those would help, at a minimum the ones for the previous 3 hours.
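
As a rough illustration of the pattern described above, here is a minimal Python sketch that flags an "exit code" line appearing shortly after a restart message. The function name, the 5-line window and the final sample line are assumptions made up for this example; the other sample lines are copied from the excerpt above, and real BOINC message wording may differ between client versions.

```python
import re

# Message wording is taken from the excerpt above; treat it as an assumption,
# since other BOINC client versions may phrase these lines differently.
RESTART = re.compile(r"Restarting result|Resuming computation", re.IGNORECASE)
FAILURE = re.compile(r"exit code (-?\d+)")


def failures_after_restart(lines, window=5):
    """Return (line_number, exit_code) for errors logged within `window`
    lines after a restart/resume message. `window` is an arbitrary choice."""
    hits = []
    last_restart = None
    for number, line in enumerate(lines, start=1):
        if RESTART.search(line):
            last_restart = number
        match = FAILURE.search(line)
        if match and last_restart is not None and number - last_restart <= window:
            hits.append((number, int(match.group(1))))
    return hits


sample = [
    "Suspending computation - user is active",
    "Pausing result",
    "Resuming computation activity",
    "Restarting result xxxx with einstein version 479",
    # Invented line for illustration only; not copied from a real log.
    "Unrecoverable error for result xxxx ( - exit code 128 (0x80))",
]

print(failures_after_restart(sample))
```

An empty list would suggest the error did not follow a restart, which is worth knowing before spending time on a trace.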

One thing to try: run FileMon from Sysinternals to see what's going on with the BOINC files. In the filter dialog (the funnel-shaped icon brings up the Filemon Filter dialog), set the "include" filter to "einstein*;slots*", leave the other two filters blank, and select all the boxes on the bottom. Set these options:
-Advanced Output
-Clock Time
-Show Milliseconds
-set history depth to 30000
In the Volumes menu, select only the drive BOINC is running on.

Bring up BOINC Manager and switch to the Messages tab so you can see what it's doing. If it suspended the result, let the system sit until it restarts the result again. If you get the "exit code 128" error, stop the FileMon trace and check it.

You have to watch it and stop the trace when you get the error, otherwise the trace buffer will wrap and you'll lose the error information.

Look through the trace; you might see something else accessing the files BOINC or Einstein use, or an error when the application starts. (The "magnifying glass" icon starts and stops the FileMon trace.)

Or save the trace to a file ("File", "Save", give it a name like "exit 128 trace"), and send it to me at wgdebug(at)yahoo.com. Also include the messages: files stdoutdae.txt and stderrdae.txt. Zip them, otherwise the mail programs munge the text and it's unusable. If you do email them, put a note here so I know to check the account.

After you do that, you could try changing the preference "Leave applications in memory while preempted?" to "yes". That way it won't keep reloading the application every time you switch applications or use the computer.

Walt

tom
Joined: 13 Dec 05
Posts: 7
Credit: 137905
RAC: 0

Thanks Walt for the info.

I will work through some of those issues tomorrow. The error appears to occur during the actual running of the program, not during a restart. I also have my settings to leave the program in memory while preempted.

The program is installed and running on my desktop computer with only one user.

I upgraded BOINC to the .13 version and have lost my old message file. I am running E@H at the moment to see if I can recreate the problem.

I will keep you informed and will send you any system files which are created by following your suggestions.

Thanks

Tom

tom
Joined: 13 Dec 05
Posts: 7
Credit: 137905
RAC: 0

Hi Walt,

I have sent off some files to you at Yahoo documenting the most recent computation errors.
I hope this helps.

Tom

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 21432 in response to message 21431

Quote:

Hi Walt,

I have sent off some files to you at Yahoo documenting the most recent computation errors.
I hope this helps.

Tom

Hi Tom,

I got them. Yes it helps, I see the error.

Are you sure about not running the screensaver? The error occurred right after it loaded the Intel OpenGL graphics driver. That's loaded the first time graphics are displayed, either by the screensaver or by the "show graphics" button.

From the trace:
9:45:04.194 AM einstein_4.79_w:1696 IRP_MJ_CLOSE C:\WINNT\system32\ialmgicd.dll SUCCESS

9:45:11.132 AM einstein_4.79_w:1696 IRP_MJ_WRITE C:\Program Files\BOINC\slots\1\stderr.txt SUCCESS Offset: 0 Length: 128

The first line shows when the DLL finished loading (that's part of opening the graphics device); the second line shows Einstein writing out an error message.
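
To make the timing between those two trace lines concrete, here is a small Python sketch that parses the leading clock time of each line and prints the gap. It assumes the "Clock Time" plus "Show Milliseconds" layout seen in the two lines above; the helper name `trace_time` is made up for this example and is not part of FileMon.

```python
from datetime import datetime


def trace_time(line):
    """Parse the leading 'H:MM:SS.mmm AM/PM' clock time of a trace line.

    Assumes the first two whitespace-separated fields are the time and
    the AM/PM marker, as in the two lines quoted from the trace.
    """
    stamp = " ".join(line.split()[:2])  # e.g. '9:45:04.194 AM'
    return datetime.strptime(stamp, "%I:%M:%S.%f %p")


dll_closed = r"9:45:04.194 AM einstein_4.79_w:1696 IRP_MJ_CLOSE C:\WINNT\system32\ialmgicd.dll SUCCESS"
stderr_write = r"9:45:11.132 AM einstein_4.79_w:1696 IRP_MJ_WRITE C:\Program Files\BOINC\slots\1\stderr.txt SUCCESS Offset: 0 Length: 128"

gap = (trace_time(stderr_write) - trace_time(dll_closed)).total_seconds()
print(f"{gap:.3f} seconds between DLL close and stderr write")
```

For these two lines the gap works out to just under 7 seconds, which fits the error being written shortly after the graphics driver DLL was loaded.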

From stdoutdae.txt:
2005-12-15 09:45:11 [Einstein@Home] Unrecoverable error for result w1_1339.0__1339.3_0.1_T04_S4hD_0 ( - exit code -164 (0xffffff5c))

The "exit code -164" means it got an error while it was already handling an error; the first one was 0xC0000005. That's from the result messages. Most likely this is one of the "graphics bugs", in this case one that occurs when initializing graphics.
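
For anyone puzzled by the two ways the exit code is printed: -164 and 0xffffff5c are the same 32-bit value, shown once as a signed number and once as unsigned hex. A minimal Python sketch of the conversion (the function names are made up for this example):

```python
def as_unsigned32(code):
    """View a signed 32-bit exit code as its unsigned bit pattern."""
    return code & 0xFFFFFFFF


def as_signed32(value):
    """View an unsigned 32-bit pattern as a signed (two's complement) number."""
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(as_unsigned32(-164)))   # 0xffffff5c, as in stdoutdae.txt
print(as_signed32(0xFFFFFF5C))    # -164
print(hex(128))                   # 0x80, the earlier "exit code 128"
print(as_signed32(0xC0000005))    # -1073741819, the first error's signed form
```

The 0xC0000005 value is the Windows access violation status code, which is consistent with a crash inside a driver rather than a problem with the work unit itself.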

This looks very much like a graphics problem.

You should set your screensaver to "none" or "blank" and not use the "show graphics" button. Run that way for a couple of days and see if the errors go away.

If so, and you want the graphics, try the beta test application. It's described here. And it's probably a good idea to update your graphics drivers; Intel is another vendor that had buggy OpenGL drivers.

Walt

Michael Roycraft
Joined: 10 Mar 05
Posts: 846
Credit: 157718
RAC: 0

Message 21433 in response to message 21432

Quote:

... And it's probably a good idea to update your graphics drivers; Intel is another vendor that had buggy OpenGL drivers.

Walt

tom,
This thread has links to download sites for graphics drivers. :-)

microcraft
"The arc of history is long, but it bends toward justice" - MLK

tom
Joined: 13 Dec 05
Posts: 7
Credit: 137905
RAC: 0

Thanks for your comments. I got my first two units processed by turning off the screensaver. This has prevented both the exit code 128 and 164 errors from occurring. I have downloaded the latest Intel graphics driver and will now experiment with using the screensaver.

Tom

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

Message 21435 in response to message 21434

Quote:

Thanks for your comments. I got my first two units processed by turning off the screensaver. This has prevented both the exit code 128 and 164 errors from occurring. I have downloaded the latest Intel graphics driver and will now experiment with using the screensaver.

Tom

Most likely if you try to use the graphics again, you'll get bitten by the bug. If you want to use the graphics, you'll have to use the Beta App.

There are many threads on this topic. Just search for "Graphics Bug".

Kathryn

Kathryn :o)

Einstein@Home Moderator

tom
Joined: 13 Dec 05
Posts: 7
Credit: 137905
RAC: 0

Right you are.
I started using the screensaver, and both running work units ended with exit code -164 (0xffffff5c).

Tom

tom
Joined: 13 Dec 05
Posts: 7
Credit: 137905
RAC: 0

Moved to Beta .18 and everything seems to be running well.

Thanks for all the assistance.

Tom
