Albert error 10 [ Resolved ]

CJOrtega
Joined: 19 Feb 05
Posts: 39
Credit: 1742781
RAC: 0
Topic 190701

Starting a new thread as the other one got hijacked. :-)

I managed to catch an error using Sysinternals' FileMon. The zipped log is about 90 KB, which expands to 1.6 MB.

I don't want to include a block of text of that size in this forum. : )

So if someone wants to review the log, tell me where to send it.

There are some 'name collision' errors.

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Hi Claude,

Send it to me at wgdebug(at)yahoo(dot)com.

If you continue running FileMon, would you make sure these options are set:

Advanced output
Clock time
Show milliseconds

Would you also get pslist and psservice from Sysinternals, run them, and send the output with the FileMon trace? You can run them from a command prompt, redirecting the output to a file like this:

pslist >pslist.txt
psservice >psservice.txt

Thanks,

Walt

CJOrtega

Thanks Walt.

Message sent.

CJOrtega

Well, I'm still getting error 10s. About 30 to 50% of the WUs error out, at random, on two of my three Windows XP Home systems. The three Linux systems haven't had any of these errors.

I don't understand how my two systems are the only ones, out of the thousands of computers doing Albert WUs, to get these errors.

I see no problems with the SETI and Climate Prediction computations, or with any of the other processes that run on these systems.

It looks as though the program is trying to write to a file before the file has been created or unlocked, or something along those lines.

Walt Gribben

Message 24730 in response to message 24729

Hi Claude,

The trace you sent shows something is reading the same file Albert is using. Each time Albert writes a block of data, the other application reads the file. When Albert tries to delete the old file, the delete fails with "sharing violation" and the subsequent rename fails with "Name Collision".
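
For readers following along, here is a minimal Python sketch of the failure sequence described above: a delete that fails leaves the old file in place, so the rename that follows has nowhere to go. This is not Albert's actual code. The function names, the file names, and the injectable `remove` parameter are assumptions made up for illustration, and the locked file is simulated rather than produced by a second process.

```python
import os
import tempfile


def swap_in_new_file(new_path, final_path, remove=os.remove):
    """Delete final_path, then rename new_path to final_path.

    Returns (delete_status, rename_status) so the two steps can be
    inspected separately. `remove` is injectable only so that a locked
    file can be simulated without a second process (hypothetical helper).
    """
    try:
        remove(final_path)
        delete_status = "ok"
    except PermissionError:
        # Assumption: a delete blocked by another process holding the
        # file open shows up in Python as PermissionError.
        delete_status = "sharing violation"

    if os.path.exists(final_path):
        # The old file is still there, so the target name is taken.
        return delete_status, "name collision"

    os.rename(new_path, final_path)
    return delete_status, "ok"


def locked_remove(path):
    """Stand-in for a delete that fails because someone else has the file open."""
    raise PermissionError(f"simulated sharing violation on {path}")


def demo(remove=os.remove):
    """Create an old and a new file in a scratch directory and swap them."""
    with tempfile.TemporaryDirectory() as workdir:
        final_path = os.path.join(workdir, "checkpoint.dat")
        new_path = os.path.join(workdir, "checkpoint.tmp")
        for path, text in ((final_path, "old"), (new_path, "new")):
            with open(path, "w") as handle:
                handle.write(text)
        return swap_in_new_file(new_path, final_path, remove=remove)
```

Calling `demo()` exercises the normal case, and `demo(locked_remove)` exercises the case where another reader blocks the delete and the rename then collides with the file that was never removed.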

I'll send a couple of things to try in email.

Walt

CJOrtega

Walt, Et Al,

Received your email, thought about what was similar and different between the three computers, and realized that the two systems that were getting the errors had their BOINC folders 'shared'; the other system didn't.

I unshared both folders and ran FileMon overnight. I don't see that interleaved write/read in the log this morning, so sharing the folders might be the problem.

I've emailed the log to you, and will continue to monitor for a while, to see if the problem occurs again.

Claude

Walt Gribben

Message 24732 in response to message 24731

Hi Claude,

Didn't think about file sharing. Explains a lot actually.

The share by itself won't cause any problem. But it means that a program on another computer could be accessing the BOINC files through the network, like a network-based file backup utility. Or the BOINC folder was mapped on the other PC, and utilities there were reading the files over the network as though the folder were local.

I looked thru one of the other traces you sent, the one with 10 minutes of all the file I/O, and it suggests that another PC was accessing the BOINC directory. If you look thru that trace, you'll see trace entries like this:

1:05:00.948 PM svchost.exe:732 IRP_MJ_DIRECTORY_CONTROL C:\$Extend\$ObjId SUCCESS Change Notify

That "Change Notify" appears every time a file is changed, renamed or deleted. It notifies an application that a file was changed, in case the program wants to do something with the file. Explorer uses that facility to keep its file list up to date, but utility programs also use it to see which files change, especially on remote filesystems accessed over the network.
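
As a rough illustration of how a trace entry like the one quoted above breaks into fields, here is a small Python sketch. The field names and the regular expression are assumptions based only on that single quoted line (whitespace-separated fields, no spaces in the path, a one-word result), so a real FileMon log may well need a different pattern.

```python
import re

# Pattern inferred from one sample line only; treat it as a starting point.
LINE_PATTERN = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}\.\d{3} [AP]M)\s+"
    r"(?P<process>\S+?):(?P<pid>\d+)\s+"
    r"(?P<request>\S+)\s+"
    r"(?P<path>\S+)\s+"
    r"(?P<result>[A-Z_]+)\s*"
    r"(?P<other>.*)$"
)

SAMPLE_LINE = (
    r"1:05:00.948 PM svchost.exe:732 IRP_MJ_DIRECTORY_CONTROL "
    r"C:\$Extend\$ObjId SUCCESS Change Notify"
)


def parse_filemon_line(line):
    """Split one trace line into named fields, or return None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["pid"] = int(entry["pid"])
    return entry


def is_change_notify(entry):
    """True when the entry's trailing column is the Change Notify marker."""
    return entry is not None and entry["other"] == "Change Notify"
```

Filtering a whole trace with `is_change_notify` would give a quick list of which processes are registering for change notifications, which is the clue being pointed at here.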

You could try enabling the share again and see if the problem comes back, although it would be better to check with FileMon than to wait for another error. If you see the excess file activity from "system:4", you can check the active shares:

Open Computer Management: right-click "My Computer" and click "Manage", or click "Start", "Administrative Tools", "Computer Management". If you don't have a menu item for Administrative Tools, look in the Control Panel.

In the Computer Management window, double-click "Shared Folders" to open it. In the left pane, click "Sessions" to see which sessions are active, and click "Open Files" to see which files are being used by the other systems. It's not a dynamic display; you have to press F5 to refresh it.

If you see open files, run FileMon on the other PC (the one listed in "Sessions") and select "Network" in the "Volumes" menu (unselect the other items). That should show all the remote file activity.

Walt

CJOrtega

Walt,

Verified.

I enabled 'share' on blnt30, and FileMon showed a flurry of activity.

'Computer Management' showed the einstein data file open.

I disabled 'share', and FileMon went back to the usual entries.

I had the folder 'shared' because BoincView had used that method to get data from remote systems. When BoincView went to the 'RPC' method I followed, but didn't disable the 'share', as it seemed a convenient way of checking things without having to go to the remote systems (in another room).

I can cope with having to walk a few feet to do the checking. :-)

Thanks,

Claude

CJOrtega

Walt,

Now all that someone has to do is find out why the Einstein API has trouble with a 'share' set when SETI and CP don't.

:-)

Walt Gribben

Message 24735 in response to message 24734

Hi Claude,

Knowing which program on the "remote" system is accessing the files would be a great help. Running FileMon on the remote PCs should give you that information:

-Enable the share on the local machine.
-Run FileMon on the local machine to verify the excessive file activity is taking place.
-Use the Computer Management console, "Shared Folders", "Sessions" to see which systems are accessing the local machine.
-Run FileMon on the remote machine:
--Enable "Volumes" menu item "Network" (only that one, clear the others).
--Set "Include" filter to "*".
--Remove "Exclude" filter.
--Set all the trace options at the bottom of the filter dialog.
--Enable options "Advanced output", "Clock time", "Show milliseconds".
-Start tracing.

You should see trace entries for the "remote" file read activity, with the name of the process reading it.
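
Once the remote trace is captured, the last step amounts to counting which process names show up against the shared folder. Here is a minimal Python sketch of that tally, assuming the trace has already been reduced to (process, path) pairs. The process names, host name, and paths below are placeholders invented for illustration and are not taken from any real trace.

```python
from collections import Counter


def remote_readers(entries, share_prefix):
    """Count accesses per process for paths that start with share_prefix.

    `entries` is an iterable of (process_name, path) pairs. The comparison
    is case-insensitive because Windows paths are.
    """
    prefix = share_prefix.lower()
    counts = Counter()
    for process, path in entries:
        if path.lower().startswith(prefix):
            counts[process] += 1
    # Busiest process first.
    return counts.most_common()


# Placeholder data, made up purely to show the shape of the input.
SAMPLE_ENTRIES = [
    ("reader_a.exe", r"\\HOSTNAME\boinc\file1"),
    ("reader_a.exe", r"\\HOSTNAME\boinc\file2"),
    ("explorer.exe", r"C:\local\notes.txt"),
    ("reader_b.exe", r"\\HOSTNAME\boinc\file1"),
]
```

With the placeholder data, `remote_readers(SAMPLE_ENTRIES, r"\\hostname\boinc")` ranks `reader_a.exe` ahead of `reader_b.exe` and ignores the local `explorer.exe` entry. In a real trace, the process at the top of the list is the one to look at.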

CJOrtega

Message 24736 in response to message 24735

Ok, done.

I've emailed you the FileMon log results of this test.

[ It's BoincLogX. ]

:-)

Claude

[edit] Of course, without the share enabled, BoincLogX can't get data from the remote systems. :-(
[/edit]
