albert buffer overflow

Ardis
Joined: 17 Dec 05
Posts: 6
Credit: 68787936
RAC: 0
Topic 190520

Hey, this isn't good. Just got the third "computation error" message from Albert 4.37. My change alert program (Prevx) didn't like it one bit. Is this a bug in the new code, or am I getting hacked?

Regards,

Ardis

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Hi Ardis,

0xC0000022 means access was denied to a resource because of security settings.

What does the BOINC message log show was happening when the error occurred? The messages around the time of the errors would be very helpful.

Walt

Ardis

Hi Walt,

There are indeed 0xC0000022 entries in the BM log, 9:02:10 today (the latest occurrence) and surrounding messages follow. I also have the Prevx log, which includes a stack dump, if that's helpful. Or do I just need to get Prevx to allow normal Albert activity?

Thanks for looking at this. It seemed like the Einstein client worked fine.

Regards,

Ardis

1/5/2006 8:48:53 AM|Einstein@Home|Started upload of r1_0471.0__47_S4R2a_2_0
1/5/2006 8:49:16 AM||Couldn't connect to hostname [einstein.phys.uwm.edu]
1/5/2006 8:49:16 AM|Einstein@Home|Temporarily failed upload of r1_0471.0__47_S4R2a_2_0: system I/O
1/5/2006 8:49:16 AM|Einstein@Home|Backing off 3 hours, 21 minutes, and 47 seconds on upload of file r1_0471.0__47_S4R2a_2_0
1/5/2006 8:54:19 AM|SETI@home|Started upload of 24fe05aa.27628.8272.609658.1.147_2_0
1/5/2006 8:54:41 AM||Couldn't connect to hostname [setiboincdata.ssl.berkeley.edu]
1/5/2006 8:54:41 AM|SETI@home|Temporarily failed upload of 24fe05aa.27628.8272.609658.1.147_2_0: system I/O
1/5/2006 8:54:41 AM|SETI@home|Backing off 3 hours, 11 minutes, and 53 seconds on upload of file 24fe05aa.27628.8272.609658.1.147_2_0
1/5/2006 9:01:43 AM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/5/2006 9:01:43 AM|Einstein@Home|Reason: To fetch work
1/5/2006 9:01:43 AM|Einstein@Home|Requesting 2008 seconds of new work, and reporting 1 results
1/5/2006 9:02:06 AM||Couldn't connect to hostname [einstein.phys.uwm.edu]
1/5/2006 9:02:08 AM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi failed with a return value of -106
1/5/2006 9:02:08 AM|Einstein@Home|No schedulers responded
1/5/2006 9:02:10 AM|Einstein@Home|Unrecoverable error for result r1_0471.0__46_S4R2a_1 ( - exit code -1073741790 (0xc0000022))
1/5/2006 9:02:11 AM||request_reschedule_cpus: process exited
1/5/2006 9:02:11 AM|Einstein@Home|Computation for result r1_0471.0__46_S4R2a_1 finished
1/5/2006 9:02:11 AM|SETI@home|Resuming result 15mr05aa.13934.544.990892.1.22_3 using setiathome version 418
1/5/2006 9:02:13 AM|Einstein@Home|Started upload of r1_0471.0__46_S4R2a_1_0
1/5/2006 9:02:35 AM||Couldn't connect to hostname [einstein.phys.uwm.edu]
1/5/2006 9:02:35 AM|Einstein@Home|Temporarily failed upload of r1_0471.0__46_S4R2a_1_0: system I/O
1/5/2006 9:02:35 AM|Einstein@Home|Backing off 1 minutes and 0 seconds on upload of file r1_0471.0__46_S4R2a_1_0
1/5/2006 9:03:35 AM|Einstein@Home|Started upload of r1_0471.0__46_S4R2a_1_0
1/5/2006 9:03:57 AM||Couldn't connect to hostname [einstein.phys.uwm.edu]
1/5/2006 9:03:58 AM|Einstein@Home|Temporarily failed upload of r1_0471.0__46_S4R2a_1_0: system I/O
1/5/2006 9:03:58 AM|Einstein@Home|Backing off 1 minutes and 0 seconds on upload of file r1_0471.0__46_S4R2a_1_0
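
For anyone reading a log like the one above, here is a minimal sketch of how the negative exit code and the hex value relate. It assumes the three-field, pipe-delimited layout shown in these lines (timestamp, project, message); the names `to_ntstatus` and `find_errors` are made up for this illustration and are not part of BOINC.

```python
import re

# Sample lines copied from the message log above.
LOG = """\
1/5/2006 9:02:08 AM|Einstein@Home|No schedulers responded
1/5/2006 9:02:10 AM|Einstein@Home|Unrecoverable error for result r1_0471.0__46_S4R2a_1 ( - exit code -1073741790 (0xc0000022))
1/5/2006 9:02:11 AM||request_reschedule_cpus: process exited
"""

EXIT_RE = re.compile(r"exit code (-?\d+)")


def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex value."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def find_errors(log_text: str):
    """Yield (timestamp, project, hex_code) for each line carrying an exit code."""
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # assumed layout: timestamp|project|message
        if len(parts) != 3:
            continue
        timestamp, project, message = parts
        match = EXIT_RE.search(message)
        if match:
            yield timestamp, project, to_ntstatus(int(match.group(1)))


errors = list(find_errors(LOG))
for entry in errors:
    print(entry)
```

The masking step is the whole trick: -1073741790 and 0xC0000022 are the same 32 bits, read as signed and unsigned respectively.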

Walt Gribben

Message 23392 in response to message 23391

Hi Ardis,

The stack dump from Prevx would be quite useful.

And yes, you need to set Prevx to allow Albert to access BOINC\projects\einstein.phys.uwm.edu\ and BOINC\slots and all subdirectories. That means allowing all file activity, plus access to the desktop.

Walt

Ardis

Message 23393 in response to message 23392

Walt,

The Prevx report follows. Hope it helps. I changed the Prevx Other Services stack overflow policy from "deny" to "report." Internet traffic is monitored by firewall and antivirus apps, and they haven't complained about Albert at all.

Regards,

Ardis

Event Information:
===================
Date: 1/5/2006
Time: 9:02:09 AM
Type: EVENT
Source: WKCOM
Category: BUFFEROVERFLOW ALERT

Extended Event Information:
============================
Prevx has prevented ALBERT_4.37_WINDOWS_INTELX86.EXE from causing a buffer overflow.

The following information has been obtained:
Process: ALBERT_4.37_WINDOWS_INTELX86.EXE
Path: C:\PROGRAM FILES\BOINC\PROJECTS\EINSTEIN.PHYS.UWM.EDU\ALBERT_4.37_WINDOWS_INTELX86.EXE
Pid: 2288
Parentprocess: BOINC.EXE
Parentpath: C:\PROGRAM FILES\BOINC\BOINC.EXE
Pid: 228

EIP: 13040464
Return EIP: 0x48925E->0xC6FB50->
Number of frames: 2
Frame Pointer: 0xC6F844->0x484766->
Memory Type: 0
Mechanism Flags: 3
Mechanism Name: ~ALL~
EIP Data:
00000000 C600 00FC 8000 FCFD C600 0008 7851 FD72 ......x...G.pQ.r
00000010 7D40 403B C05C 3B74 443D 5CFB 5608 7437 }@.;D\\Vt.=+....7
00000020 FF3F 3F10 B6A0 1019 E4BF A04A 84E8 1909 .?.........J..P.
00000030 3940 4037 342E 3732 3132 2E31 3139 3231
00000040 2031 3139 2E31 3937 352D 312E 3130 3732 .1.95117.-1.0062
00000050 3320 202E 3139 2E35 342D 3931 3620 3535 3.1.4965e-011.25
00000060 2E30 3036 33?? 36?? ???? ???? ???? ???? .036????????????

EIP Stackdump:
00000000 6647 4700 48FB 0000 50FF FBFF C644 0000 fGH.P........DH.
00000010 CCFF FFFF FF00 FF00 004C 0000 0000 0000 .........LH.....
00000020 0A00 0000 0085 0000 284C 8500 4F00 0000 ....(.O..LH.....
00000030 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040 0000 0000 0000 0000 3400 0000 0000 0000 ....4...........
00000050 0000 0000 0000 0000 0000 0000 0036 0000 ............364.
00000060 3400 0000 0000 0000 34FB 0000 00FF 00FF 4...4...P.......
00000070 0000 0000 0000 0000 0007 007C B500 0000 ........2..|....
00000080 580C 0C00 B500 0000 00AE 0000 B5F8 0000 X.......X.......
00000090 C4F8 F800 C6FB 0000 04EE FB7C C607 007C ...........|8..|
000000A0 FFFF FFFF FF07 FF7C 3206 077C 9106 7C7C ....2..|...|...|
000000B0 0E03 0300 0000 0000 F057 0000 0057 0000 ........HW..HW..
000000C0 4857 5700 B505 007C 5107 0500 9105 7C7C HW..Q..|x...m..|
000000D0 08AE AE00 E1AD 0000 B800 AD00 E1AE 0000 ................
000000E0 30AB AB00 E1FA 0000 9000 FA00 C600 0000 0.......(...m...
000000F0 708F 8F40 5B37 40BC B91F 373F E1F9 BC00 p.[@.7.....?P...
00000100 0000 0000 0005 007C C8AE 0500 91FA 7C00 .......|(.......
00000110 64F9 F900 C600 0000 0005 007C 00AE 0000 d..........|....
00000120 30FA FA00 C605 007C 5107 0500 9105 7C7C 0...Q..|x...m..|
00000130 0000 0000 00AE 0000 08AB AE00 E183 001E ........0.......
00000140 C601 013E 76B5 3ECE 4FCB B5BF 6C11 CEB0 ..v.O.l.........
00000150 5F7A 7A3F 4000 3F00 00D5 00BE B509 00AA _z@?........F...
00000160 3FB3 B3BF EB72 BFF3 35AC 72C0 904B F300 ?...5r..%....K..
00000170 0500 0000 00F9 0000 1400 F900 C600 0000 ........(.......
00000180 6800 0000 0000 0000 3800 0000 0000 0000 h...8.......(...
00000190 75D3 D300 4700 0000 0000 0000 B5D3 0000 u.G.........z.G.
000001A0 0000 0000 B500 0000 0000 0000 B500 0000 ................
000001B0 0000 0000 B500 0000 00FF 0000 B500 0000 ................
000001C0 60F9 F900 C6F9 0000 64FA F900 C6EE 0000 `...d...H.......
000001D0 0500 0000 00F9 0000 7405 F97C C6FA 0000 ....t...m..|\\...
000001E0 18EE EE7C 9005 7C7C 70FF 05FF 9105 7C7C ...|p..|....m..|
000001F0 75D3 D300 4700 0000 0000 0000 B5D3 0000 u.G.........z.G.
00000200 0000 0000 00BD 0000 40FA BD00 5100 0000 ....@.Q.\\.......
00000210 C805 057C 91AB 7C00 C8FB AB00 E105 007C ...|....(...Q..|
00000220 F811 1100 B505 007C 6DAF 0500 91AB 7C00 ....m..|H.......
00000230 4857 5700 B557 0000 4857 5700 B505 0000 HW..HW..HW......
00000240 D0AB AB00 E100 007C 00AA 0041 00A8 7C2F .......|T..A..T/
00000250 0000 0000 00D0 0032 0887 D0BF 0E08 3249 .......2..}.+..I
00000260 ABCF CF2A 55FA 2A00 E4F5 FA00 C6FA 0000 ..U*....V.G.....
00000270 4AA7 A700 4CFB 0000 2050 FB00 C688 0000 J.L......PH.P...
00000280 0CFB FB00 C6D1 0000 6600 D100 47D0 0000 ....f.G.....3.G.
00000290 2885 8500 4FD0 0000 2A00 D000 47FC 0000 (.O.*.G.........
000002A0 F60E 0E00 0000 0000 3485 0000 0000 0000 ....4...(.O.....
000002B0 DCFA FA00 C608 0051 47FF 0800 702F 5100 ....G.pQ...../H.
000002C0 C074 7400 4BFF 00FF FFD0 FF00 FF09 FF00 .tK.....*.G.3.A.
000002D0 2885 8500 4FA7 0000 4CFB A700 4CFC 0000 (.O.L.L.P.......
000002E0 78FD FD00 C608 0051 4772 0840 703B 515C x...G.pQ.r}@.;D\\
000002F0 5674 743D B0FB 3D08 2B37 FB3F AE10 08A0 Vt.=+....7.?....
00000300 8419 19BF F04A BFE8 3E09 4A40 1637 E82E .....J..P.9@471.
00000310 3132 3232 3831 3239 3531 3131 3439 3931 1282514931.1.951
00000320 3137 372D 202E 2D30 3132 2E20 302E 3039 17.-1.00623.1.49
00000330 3635 352D 6531 2D20 3035 3130 3136 200A 65e-011.25.0364.
00000340 000A 0A3A 0C55 3ABA 4774 55BE A6EF BA61 ...:GU...t.....a
00000350 F856 56BD 4EC4 BDBE 0676 C4BE C080 BED9 .VN......v......
00000360 0D02 02C0 3028 C0C7 D65A 2800 ABC7 C737 ..0..(...Z....a7
00000370 C11E 1E40 59EE 40D2 F1D4 EE3F 963B D200 ..Y@.......?.;H.
00000380 B73B 3B00 48D6 0000 E000 D600 5500 0000 .;H...U.........
00000390 C8FB FB00 C6AE 0000 58FC AE00 E1CF 002A ....X.........U*
000003A0 20FC FC00 C693 0000 2200 9300 4800 0009 ....".H.......2.
000003B0 A54E 4E00 004E 0000 A585 4E00 0000 0000 .N...N..(.O.....
000003C0 2E90 903F FCFB 3F00 F887 FBC0 C6FC 0000 ...?........l...
000003D0 1C2F 2F00 48BA 0000 1800 BA00 4BFC 0000 ./H...K.....|...
000003E0 EFE4 E400 4700 0000 0500 0009 004E 0000 ..G.......2..N..
000003F0 C2E2 E200 1C85 0000 2805 8500 4FE5 0000 ....(.O.......G.
00000400 2885 8500 4F00 0000 00E6 0000 0085 0000 (.O.....N.G.(.O.
00000410 C2E2 E200 1CFD 0000 7805 FD00 C66D 0024 ....x........m.$
00000420 54FC FC00 C63B 005C C0FF 3B00 442F 5C00 T....;D\\...../H.
00000430 C875 7500 4B00 0000 00FF 0000 0098 0000 .uK...........@.
00000440 2885 8500 4F0A 0031 3E30 0A30 0000 3100 (.O....10000....
00000450 C2E2 E200 1C19 0000 51F7 193F 000B 0005 ....Q...f..?%...
00000460 C005 0500 0011 0090 4F2F 1140 6600 9000 ....O.f.k/.@....
00000470 4972 7240 7DFB 4008 2B37 FB3F AE10 08A0 Ir}@+....7.?....
00000480 8419 19BF F000 BF00 0000 0000 0000 0000 ................
00000490 0000 0000 00FB 0008 2B37 FB3F AE10 08A0 ....+....7.?....
000004A0 8419 19BF F000 BF00 00E0 007F 0000 0000 ................
000004B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000004C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000004D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000004E0 0000 0000 0000 0000 0000 0000 0028 0000 .............(..
000004F0 0000 0000 002E 003C 811D 2E3E C1A6 3CA8 ................
00000500 C0BC BC3D F4AE 3D00 80C9 AE00 B56D 0024 ...=.........m.$
00000510 4972 7240 7D3B 405C C074 3B3D 44FB 5C08 Ir}@.;D\\Vt.=+...
00000520 FE37 373F FF10 3FA0 B619 10BF E411 A090 .7.?........O.f.
00000530 6B2F 2F40 02F8 40BD B072 F840 3163 BD42 k/.@..1..r}@.c[B
00000540 7BB5 B5BD F375 BD69 CDBC 753F 1A0B 69A9 {....u.i...?....
00000550 82D1 D1BF F2EB BFEC 1B2C EB40 092E EC2F ..........D@@../
00000560 2E2E 2E70 2F6F 7065 7274 6F2F 6A69 6573 ../projects/eins
00000570 7465 656E 6970 6E79 2E2E 7077 682E 7964 tein.phys.uwm.ed
00000580 752F 2F6F 6366 6F67 6E53 6652 6961 6763 u/config_S4R2a.c
00000590 6667 6700 0000 0000 0000 0000 0000 0000 fg..............
000005A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000005B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000005C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000005D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000005E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000005F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000600 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000610 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000620 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000630 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000640 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000650 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000660 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000670 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000680 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000690 0000 0000 0000 0000 0000 0000 0000 0000 ................
000006A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000006B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000006C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000006D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000006E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000006F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000700 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000710 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000720 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000730 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000740 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000750 0000 0000 0000 0000 0000 0000 00FF 0000 ................
00000760 63A1 A100 4000 0000 0BFD 0000 0030 0030 c.@.........0000
00000770 2E71 7100 41B5 007C 0B00 B500 800A 7C31 .qA....|.......1
00000780 3030 3030 3000 3000 00E0 007F 0056 0081 0000.........V..
00000790 C0FF FF00 C608 00FE A8FF 08FF 7E99 FE7C ......~........|
000007A0 18B5 B57C 8000 7C00 0000 0000 0000 0000 ...|............
000007B0 2071 7100 4100 0000 0000 0000 0000 0000 .qA.............
000007C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000007D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000007E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
000007F0 0000 0000 0000 0000 0000 0000 0000 0000 ................

Policy causing this event: Other Services (Stack)
The action has been denied

System Information:
============================
System : Microsoft Windows XP Personal Service Pack 2 (Build 2600)
ComputerIdentifier: AT/AT COMPATIBLE
BiosVersion : DELL - 7
CPUName: Intel(R) Pentium(R) 4 CPU 2.66GHz
CPUVendor: GenuineIntel
CPUIdent: x86 Family 15 Model 2 Stepping 9
MemUsage : 74%
MemPhysicalTotal: 266338304 (254 MB)
MemPhysicalAvail: 66662400 (63 MB)
MemTotalAvail : 655286272 (624 MB)
MemPageAvail : 318197760 (303 MB)
MemPageAvail : 2147352576 (2047 MB)
ullAvailExtendedVirtual : 2059726848 (1964 MB)

Walt Gribben

Message 23394 in response to message 23393

Thanks. Albert only communicates with BOINC, and that's through a shared memory file, so you wouldn't see any internet activity from it.

If this is actually a stack overflow, it could be a problem with the program. That's very good to know; it helps to identify the real problem.

What other options do you have for "action"? Will it produce a minidump? Or a crash dump?

Looking at the dump, the instruction pointer is at 13040464, which isn't in the Albert app. But it could be in a DLL needed by the program. If you run Process Explorer from Sysinternals, it'll show the starting address for each module. Run it, select the albert process, and look through the DLLs for the base address right before 0x13040464.

The first time, you'll probably have to configure it to list the DLLs and base addresses. Press Ctrl+D to show the DLLs. Right-click the header, click "Select Columns" and switch to the DLL tab. Select "Name", "Description", "Company", "Version", "Base" and "Path".

Back in Process Explorer, in the lower pane, sort the modules by address by clicking the "Base" header. Look for the one that starts before 0x13040464. What module is it? What are the details on that line?

Further on, the trace shows:

Return EIP: 0x48925E->0xC6FB50->

The address 0x48925E is where the current function returns to, and a disassembly shows that address does indeed follow a CALL. This one (at 0x48925E) is part of the C runtime library routine write(), which is where a lot of other results are showing errors, so this could be the problem. The next return address, 0xC6FB50, is garbage; on my system that location is all zeros, which would indeed cause errors. (The return address normally points back to the Albert function that called write().)

I don't think it'll be a problem setting Prevx to "report" as the Albert application will catch the error and abort the workunit. But the Prevx trace does show a problem where the application is only reporting "write error".

Which is why I was interested in the other options.

Walt

Ardis

Message 23395 in response to message 23394

Hi Walt,

An interesting moment earlier this evening that may be related to this issue. While I happened to be looking at Windows Task Manager (and Albert was running), in just a few seconds CPU Usage went from 100% to zero, and then BOINC Mgr. produced several messages, saying Albert had exited with no output and if this happens a lot I might need to reset the project. Then Albert restarted and appears to be running fine (I thought I'd copied and saved the messages before a reboot, but alas...).

It doesn't look like Prevx offers any dump formats other than the one you already have.

Process Explorer has been installed, and I did what you said to get the Albert DLLs sorted by address. There is no 0x13040464 entry in the list, it jumps from 0xB40000 to 0x5B0A0000 (no 0x48925E or 0xC6FB50 either). The first of those:

Name: CTYPE.NLS
Description: (blank)
Company Name: (blank)
Version: (blank)
Base: 0xB40000
Path: C:\WINDOWS\SYSTEM32\CTYPE.NLS

Here's the whole list of DLLs:

Process: albert_4.37_windows_intelx86.exe Pid: 2844

Name | Description | Company Name | Version | Base | Path
UNICODE.NLS | (blank) | (blank) | (blank) | 0x260000 | C:\WINDOWS\SYSTEM32\UNICODE.NLS
locale.nls | (blank) | (blank) | (blank) | 0x280000 | C:\WINDOWS\SYSTEM32\locale.nls
SORTKEY.NLS | (blank) | (blank) | (blank) | 0x2C0000 | C:\WINDOWS\SYSTEM32\SORTKEY.NLS
sorttbls.nls | (blank) | (blank) | (blank) | 0x310000 | C:\WINDOWS\SYSTEM32\sorttbls.nls
albert_4.37_windows_intelx86.exe | (blank) | (blank) | (blank) | 0x400000 | C:\Program Files\BOINC\projects\einstein.phys.uwm.edu\albert_4.37_windows_intelx86.exe
CTYPE.NLS | (blank) | (blank) | (blank) | 0xB40000 | C:\WINDOWS\SYSTEM32\CTYPE.NLS
UMDMXFRM.DLL | Unimodem Tranform Module | Microsoft Corporation | 5.01.2600.0000 | 0x5B0A0000 | C:\WINDOWS\SYSTEM32\UMDMXFRM.DLL
SERWVDRV.DLL | Unimodem Serial Wave driver | Microsoft Corporation | 5.01.2600.0000 | 0x5CD70000 | C:\WINDOWS\SYSTEM32\SERWVDRV.DLL
comctl32.dll | Common Controls Library | Microsoft Corporation | 5.82.2900.2180 | 0x5D090000 | C:\WINDOWS\SYSTEM32\comctl32.dll
opengl32.dll | OpenGL Client DLL | Microsoft Corporation | 5.01.2600.2180 | 0x5ED00000 | C:\WINDOWS\SYSTEM32\opengl32.dll
glu32.dll | OpenGL Utility Library DLL | Microsoft Corporation | 5.01.2600.2180 | 0x68B20000 | C:\WINDOWS\SYSTEM32\glu32.dll
ddraw.dll | Microsoft DirectDraw | Microsoft Corporation | 5.03.2600.2180 | 0x73760000 | C:\WINDOWS\SYSTEM32\ddraw.dll
dciman32.dll | DCI Manager | Microsoft Corporation | 5.01.2600.2180 | 0x73BC0000 | C:\WINDOWS\SYSTEM32\dciman32.dll
winmm.dll | MCI API DLL | Microsoft Corporation | 5.01.2600.2180 | 0x76B40000 | C:\WINDOWS\SYSTEM32\winmm.dll
comctl32.dll | User Experience Controls Library | Microsoft Corporation | 6.00.2900.2180 | 0x773D0000 | C:\WINDOWS\WinSxS\x86_Microsoft.Windows.Common-Controls_6595b64144ccf1df_6.0.2600.2180_x-ww_a84f1ff9\comctl32.dll
msvcrt.dll | Windows NT CRT DLL | Microsoft Corporation | 7.00.2600.2180 | 0x77C10000 | C:\WINDOWS\SYSTEM32\msvcrt.dll
user32.dll | Windows XP USER API Client DLL | Microsoft Corporation | 5.01.2600.2622 | 0x77D40000 | C:\WINDOWS\SYSTEM32\user32.dll
advapi32.dll | Advanced Windows 32 Base API | Microsoft Corporation | 5.01.2600.2180 | 0x77DD0000 | C:\WINDOWS\SYSTEM32\advapi32.dll
rpcrt4.dll | Remote Procedure Call Runtime | Microsoft Corporation | 5.01.2600.2180 | 0x77E70000 | C:\WINDOWS\SYSTEM32\rpcrt4.dll
gdi32.dll | GDI Client DLL | Microsoft Corporation | 5.01.2600.2818 | 0x77F10000 | C:\WINDOWS\SYSTEM32\gdi32.dll
shlwapi.dll | Shell Light-weight Utility Library | Microsoft Corporation | 6.00.2900.2781 | 0x77F60000 | C:\WINDOWS\SYSTEM32\shlwapi.dll
kernel32.dll | Windows NT BASE API Client DLL | Microsoft Corporation | 5.01.2600.2180 | 0x7C800000 | C:\WINDOWS\SYSTEM32\kernel32.dll
ntdll.dll | NT Layer DLL | Microsoft Corporation | 5.01.2600.2180 | 0x7C900000 | C:\WINDOWS\SYSTEM32\ntdll.dll

Walt Gribben

Hi Ardis,

Thanks for the info. If the EIP isn't pointing to a loaded module (DLL), it might be something that Prevx stuck in to check for buffer overruns.
[edit] You won't find the address explicitly; it'll be inside one of the modules. Add "Size" to the columns. For instance, albert starts at 0x400000 (base) and is 0x163000 bytes (size), so 0x48925E will be somewhere within albert, but 0xC6FB50 is past it. And you won't find 0x13040464 either; no module is in range. [/edit]
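
The base-plus-size check described in the edit can be sketched in a few lines. This is only an illustration: the table is a subset of the Process Explorer list posted earlier, the only size known from this thread is the one for albert, and `locate` is a made-up helper name.

```python
from bisect import bisect_right

# (base, name, size) sorted by base. Subset of the list posted above;
# size is None where the thread does not give one.
MODULES = [
    (0x00260000, "UNICODE.NLS", None),
    (0x00400000, "albert_4.37_windows_intelx86.exe", 0x163000),
    (0x00B40000, "CTYPE.NLS", None),
    (0x5B0A0000, "UMDMXFRM.DLL", None),
    (0x7C800000, "kernel32.dll", None),
    (0x7C900000, "ntdll.dll", None),
]


def locate(address):
    """Return (name, status) for the module whose base is nearest below address."""
    bases = [base for base, _, _ in MODULES]
    index = bisect_right(bases, address) - 1
    if index < 0:
        return None, "below every module"
    base, name, size = MODULES[index]
    if size is None:
        return name, "size unknown, check the Size column"
    if address < base + size:
        return name, f"inside, offset 0x{address - base:X}"
    return name, "past the end of this module"


for addr in (0x48925E, 0xC6FB50, 0x13040464):
    print(f"0x{addr:08X}", locate(addr))

# Side note, resting on an assumption: the Prevx report prints the EIP
# without a 0x prefix. If that figure is decimal rather than hex, it is
# the same number as the second return address.
print(13040464 == 0xC6FB50)
```

Only 0x48925E can be placed with certainty from the data in this thread; for the other two addresses the nearest base below is CTYPE.NLS, whose size would have to be read from Process Explorer.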

On the "exited with zero status" message: it's only a problem if it's continuous, for example if every time the application starts it runs for a minute (initializing) and then exits. What you can do is look in the slots\n folder (n is the slot number) for the stderr.txt file and see what kind of error messages show up in it.
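
If there are several slots, a small script can gather the stderr.txt files in one pass. This is a rough sketch: the install path is an assumption taken from the paths in the Prevx report, and `collect_stderr` is a hypothetical helper, not a BOINC tool.

```python
from pathlib import Path


def collect_stderr(boinc_dir):
    """Return {slot folder name: text of its stderr.txt} for every slot that has one."""
    found = {}
    slots = Path(boinc_dir) / "slots"
    if not slots.is_dir():
        return found
    for slot in sorted(slots.iterdir()):
        log = slot / "stderr.txt"
        if log.is_file():
            found[slot.name] = log.read_text(errors="replace")
    return found


if __name__ == "__main__":
    # Assumed install location; adjust to wherever BOINC lives on your system.
    for name, text in collect_stderr(r"C:\Program Files\BOINC").items():
        print(f"--- slots\\{name}\\stderr.txt ---")
        print(text[-2000:])  # show only the last part of each file
```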

When Albert exits like you saw, it was because it "lost contact" with BOINC. BOINC sends heartbeat messages to each application that say "I'm alive"; when an application doesn't get any for 30 seconds, it exits with a "no heartbeat" message. Check the result, it'll show the message in the section.

A couple of reasons this happens - BOINC does a DNS lookup, and "hangs" until the response comes back. If your internet connection isn't working or you're disconnected, and it "hangs" for 30 seconds or more, the applications will all exit. You'll see DNS lookup type errors in the log, followed by the WU restarting.

Another time is when BOINC starts a new WU, it checks to make sure it can communicate with the newly started application. If it can't, it'll "sleep" for 35 seconds and all the applications will exit with "no heartbeat" messages. This is intentional, and makes sure the just-started application is using the correct "shared memory" segment. You'll see messages about BOINC starting a new workunit and suspending (preempting) another. And the "exiting with zero status" message at least 30 seconds after that.

A third way is when BOINC can't write the client state file: it'll "sleep" for 60 seconds to give the other application time to move out of the way. A side effect is that the applications all exit and BOINC has to restart them. You'll see "couldn't write state file" in the message. If you see these, it usually means other programs are interfering with BOINC, like anti-virus or file indexing utilities.

Last is when file system activity on the Windows system is heavy enough that requests get queued up faster than Windows can service them. Copying a lot of small files from one directory to another is one cause; anything scanning all the files on a disk is another. BOINC can "hang" since it checks the state of its files at regular intervals. You'll notice very slow response time when this happens.
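
The 30-second heartbeat rule behind all four cases can be pictured with a small watchdog. This is a sketch of the idea only, not BOINC's actual code; the class name and the injectable clock are assumptions made so the example runs without waiting.

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds, the figure quoted above


class HeartbeatWatchdog:
    """Tracks when the last heartbeat arrived and reports a timeout."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()

    def beat(self):
        """Record that a heartbeat message was just received."""
        self.last_seen = self.clock()

    def expired(self):
        """True once no heartbeat has arrived for longer than the timeout."""
        return self.clock() - self.last_seen > self.timeout


# Demo with a fake clock so nothing actually sleeps.
now = [0.0]
dog = HeartbeatWatchdog(clock=lambda: now[0])
now[0] = 29.0
print(dog.expired())  # False: only 29 seconds of silence
dog.beat()
now[0] += 35.0        # the sender goes quiet for 35 seconds
print(dog.expired())  # True: 35 seconds exceeds the 30-second limit
```

Seen this way, any pause on the BOINC side longer than 30 seconds (a stalled DNS lookup, the 35-second or 60-second sleeps, a swamped file system) has the same effect on every running application.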

Walt

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

Walt,

Gosh you are good... :)

I stole it ... would you look and see if I messed it up when I added to it and edited it a little?

See Result '(result)' exited with zero status but no 'finished' file

Thanks!

Walt Gribben

Ardis,

Your results are all completing successfully now, and are passing validation. Was that the only change you made - having Prevx "report" instead of "deny" the error?

Walt

Ardis

Message 23399 in response to message 23398

Hi Walt,

Your explanation on Jan 6 was awesome! Thanks! Sorry for the delay in getting back to you.

The "report" change is the only proactive alteration I've made to Prevx. It has asked for write permission for Albert once or twice since the 6th, but that happened for the other clients also. Those permission requests were entered as permanent changes for that specific app only.

So at the moment all is quiet. I'm hoping it will stay that way and we can just crunch for a while.

Thanks again for your help, and for all the effort you put into looking at this,

Ardis
