Albert errors

CJOrtega
Joined: 19 Feb 05
Posts: 39
Credit: 1742781
RAC: 0
Topic 190512

System: WinXpHome-sp2, on a P4 ( single cpu ).

Two WU's errored out with the same problem.

Output file too big?

1/4/2006 6:43:19 AM|Einstein@Home|Resuming result r1_0389.0__19_S4R2a_0 using albert version 437
1/4/2006 7:00:35 AM|Einstein@Home|Unrecoverable error for result r1_0389.0__19_S4R2a_0 (The environment is incorrect. (0xa) - exit code 10 (0xa))
1/4/2006 7:00:35 AM||request_reschedule_cpus: process exited
1/4/2006 7:00:35 AM|Einstein@Home|Computation for result r1_0389.0__19_S4R2a_0 finished
1/4/2006 7:00:35 AM|Einstein@Home|Output file r1_0389.0__19_S4R2a_0_0 for result r1_0389.0__19_S4R2a_0 exceeds size limit.
1/4/2006 7:00:35 AM|Einstein@Home|File size: 3374418.000000 bytes. Limit: 3000000.000000 bytes
1/4/2006 7:00:36 AM|Einstein@Home|Starting result r1_0389.0__7_S4R2a_1 using albert version 437
1/4/2006 7:15:30 AM|Einstein@Home|Unrecoverable error for result r1_0389.0__7_S4R2a_1 (The environment is incorrect. (0xa) - exit code 10 (0xa))
1/4/2006 7:15:30 AM||request_reschedule_cpus: process exited
1/4/2006 7:15:30 AM|Einstein@Home|Computation for result r1_0389.0__7_S4R2a_1 finished
1/4/2006 7:15:30 AM|Einstein@Home|Output file r1_0389.0__7_S4R2a_1_0 for result r1_0389.0__7_S4R2a_1 exceeds size limit.
1/4/2006 7:15:30 AM|Einstein@Home|File size: 5245780.000000 bytes. Limit: 3000000.000000 bytes
1/4/2006 7:15:32 AM|climateprediction.net|Resuming result 3hl8_200184401_1 using hadsm3 version 413
1/4/2006 7:16:32 AM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/4/2006 7:16:32 AM|Einstein@Home|Reason: To fetch work
1/4/2006 7:16:32 AM|Einstein@Home|Requesting 34560 seconds of new work, and reporting 2 results
1/4/2006 7:16:43 AM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
1/4/2006 7:16:46 AM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/r1_0389.0__7_S4R2a_1_0
1/4/2006 7:16:46 AM|Einstein@Home|Started download of skygrid_0390_r_T00.dat
1/4/2006 7:16:47 AM|Einstein@Home|Finished download of skygrid_0390_r_T00.dat
1/4/2006 7:16:47 AM|Einstein@Home|Throughput 129066 bytes/sec
1/4/2006 7:16:48 AM||request_reschedule_cpus: files downloaded
1/4/2006 7:16:48 AM||request_reschedule_cpus: files downloaded
1/4/2006 8:09:29 AM||Suspending computation and network activity - running CPU benchmarks
1/4/2006 8:09:29 AM|climateprediction.net|Pausing result 3hl8_200184401_1 (left in memory)
1/4/2006 8:09:32 AM||Running CPU benchmarks
1/4/2006 8:10:30 AM||Benchmark results:
1/4/2006 8:10:30 AM|| Number of CPUs: 1
1/4/2006 8:10:30 AM|| 781 double precision MIPS (Whetstone) per CPU
1/4/2006 8:10:30 AM|| 1587 integer MIPS (Dhrystone) per CPU
1/4/2006 8:10:30 AM||Finished CPU benchmarks
1/4/2006 8:10:31 AM||Resuming computation and network activity
1/4/2006 8:10:31 AM||request_reschedule_cpus: Resuming activities
1/4/2006 8:10:31 AM|climateprediction.net|Resuming result 3hl8_200184401_1 using hadsm3 version 413
1/4/2006 9:10:31 AM|climateprediction.net|Pausing result 3hl8_200184401_1 (left in memory)
1/4/2006 9:10:31 AM|Einstein@Home|Starting result r1_0389.0__2_S4R2a_0 using albert version 437
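For anyone who wants to pull the oversize reports out of a long message log, here is a minimal Python sketch. It assumes the log lines keep the `File size: ... bytes. Limit: ... bytes` wording shown above. The function name `oversize_reports` is made up for illustration and is not part of BOINC.

```python
import re

# Matches the "File size / Limit" wording seen in the message log above.
SIZE_RE = re.compile(
    r"File size: (?P<size>[\d.]+) bytes\. Limit: (?P<limit>[\d.]+) bytes"
)

def oversize_reports(lines):
    """Yield (size, limit, excess) in bytes for every size-limit line."""
    for line in lines:
        match = SIZE_RE.search(line)
        if match:
            size = int(float(match.group("size")))
            limit = int(float(match.group("limit")))
            yield size, limit, size - limit

# Two lines copied from the log above.
log = [
    "1/4/2006 7:00:35 AM|Einstein@Home|File size: 3374418.000000 bytes. Limit: 3000000.000000 bytes",
    "1/4/2006 7:15:30 AM|Einstein@Home|File size: 5245780.000000 bytes. Limit: 3000000.000000 bytes",
]

for size, limit, excess in oversize_reports(log):
    print(f"{size} bytes is {excess} bytes over the {limit}-byte limit")
```

For the two results above, the output files come out 374418 and 2245780 bytes over the limit.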

--

CJOrtega
Joined: 19 Feb 05
Posts: 39
Credit: 1742781
RAC: 0

Albert errors

Just got another one, same error, different computer (P4 HT).

Bad batch of WU's?

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Did you make any changes to your computers recently? Most of the errors show Einstein had problems reading or writing files, so if you didn't make changes did your firewall/Anti-Virus software auto-update itself? Or did the errors occur when it was scanning for infected files?

The message log shows this: The environment is incorrect. (0xa) - exit code 10 (0xa)

Which is a Windows error.

And the log also shows this: Couldn't delete file projects/einstein.phys.uwm.edu/r1_0389.0__7_S4R2a_1_0

Which suggests something is interfering with Einstein.

Looking thru your Windows computers, they all have at least one "client error", which is why I was wondering if something changed. Like Host 356683 had two errors on Jan 1, then the problem cleared up.

These are the errors that show up in the results:

2006-01-04 22:37:16.3774 [CRITICAL]: Couldn't write compacted toplist to '../../projects/einstein.phys.uwm.edu/r1_1244.0__1892_S4R2a_1_0'

(that same error occurred a couple of times, different result files each time of course)

Can't resolve file "polka.out"
If running a non-BOINC test, create [INPUT] or touch [OUTPUT] file

w1_1086.0__1086.2_0.1_T12_S4hD_2_0
-161

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4317
Credit: 250804425
RAC: 34146

To me this looks like the user BOINC is currently running as can't write to the slots directories. I'd suggest checking the access permissions of
Program Files\BOINC\slots
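One quick way to test this is to try creating and deleting a small file in that directory. The Python sketch below does that. The drive letter and full path are assumptions, so adjust them to your install. It also has to be run under the same account BOINC runs as, otherwise the result says nothing about BOINC's own access. The helper name `can_write_to` is hypothetical.

```python
import os
import tempfile

def can_write_to(directory):
    """Return True if a file can be created, written and deleted in `directory`."""
    try:
        fd, path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            handle.write("probe")
        os.remove(path)
        return True
    except OSError:
        # Covers a missing directory as well as denied permissions.
        return False

# Assumed install location; change it to wherever BOINC actually lives.
slots_dir = r"C:\Program Files\BOINC\slots"
print(slots_dir, "writable:", can_write_to(slots_dir))
```

A result of False only says the probe failed. It does not say whether the cause is permissions, a missing directory, or another program holding the directory.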

BM

CJOrtega
Joined: 19 Feb 05
Posts: 39
Credit: 1742781
RAC: 0

No changes were made to the computers prior to the errors showing up, and the computers were idling at the time the errors occurred. They both completed WU's after this without errors and without my doing anything to them.

I wonder if the WU's are noisy, like some of the SETI ones that pick up local transmissions.

1/4/2006 7:15:30 AM|Einstein@Home|Output file r1_0389.0__7_S4R2a_1_0 for result r1_0389.0__7_S4R2a_1 exceeds size limit.
1/4/2006 7:15:30 AM|Einstein@Home|File size: 5245780.000000 bytes. Limit: 3000000.000000 bytes

The Jan 1 errors occurred after I had to do a system restore back to a date before my son opened a spam email and got us a trojan infection. :-(

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 23137 in response to message 23136

Quote:

No changes were made to the computers prior to the errors showing up, and the computers were idling at the time the errors occurred. They both completed WU's after this without errors and without my doing anything to them.

I wonder if the WU's are noisy, like some of the SETI ones that pick up local transmissions.

1/4/2006 7:15:30 AM|Einstein@Home|Output file r1_0389.0__7_S4R2a_1_0 for result r1_0389.0__7_S4R2a_1 exceeds size limit.
1/4/2006 7:15:30 AM|Einstein@Home|File size: 5245780.000000 bytes. Limit: 3000000.000000 bytes

The Jan 1 errors occurred after I had to do a system restore back to a date before my son opened a spam email and got us a trojan infection. :-(

I don't think it's a workunit problem; other hosts are returning results that don't show these errors. And the specific error you show is most likely a follow-on from another error that occurred first.

It's the other messages that matter: the ones that show Albert had problems writing to files, and BOINC had problems deleting files. That suggests a background process was active on the computer, perhaps accessing the files when Albert/BOINC tried to use them.

Or, as Bernd suggests, the user doesn't have write access to the BOINC\slots directory.

Walt

CJOrtega
Joined: 19 Feb 05
Posts: 39
Credit: 1742781
RAC: 0

Well, the 2 systems have done 3 & 8 WU's since the last error on the 11th without repeating the error, and I haven't done anything to those systems, so I'm at a loss.

Back to your regular bug hunting. :-) :-)

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4317
Credit: 250804425
RAC: 34146

Are the machines (or your user account) part of a domain? Then something might have changed in the domain as well, might even be a database change that was slow to propagate through the domain hierarchy or even from the PDC to the BDCs...

Anyway - can't track a problem that has vanished.

Best,
Bernd

BM

CJOrtega
Joined: 19 Feb 05
Posts: 39
Credit: 1742781
RAC: 0

Message 23140 in response to message 23139

Quote:

Are the machines (or your user account) part of a domain? Then something might have changed in the domain as well, might even be a database change that was slow to propagate through the domain hierarchy or even from the PDC to the BDCs...

Anyway - can't track a problem that has vanished.

Best,
Bernd

The systems are not in a domain, as they are all Win XP Home systems. They are all current as to patch level. The Event Logs don't show any activity at the time the errors occurred. I hadn't made any changes to the systems prior to the errors occurring, and the error hasn't gone away, it just isn't occurring as often.

What would cause the file size to exceed the 3 MB limit?

[clip of most recent error]
1/16/2006 4:29:48 PM|Einstein@Home|Unrecoverable error for result r1_1244.0__1782_S4R2a_2 (The environment is incorrect. (0xa) - exit code 10 (0xa))
1/16/2006 4:29:48 PM||request_reschedule_cpus: process exited
1/16/2006 4:29:48 PM|Einstein@Home|Computation for result r1_1244.0__1782_S4R2a_2 finished
1/16/2006 4:29:48 PM|Einstein@Home|Output file r1_1244.0__1782_S4R2a_2_0 for result r1_1244.0__1782_S4R2a_2 exceeds size limit.
1/16/2006 4:29:48 PM|Einstein@Home|File size: 5243115.000000 bytes. Limit: 3000000.000000 bytes

The systems are also running Seti and Climate Prediction without error.

I ran Filemon, but the error didn't occur for the WU that was running, and the log got too big and caused the OS to choke. :-(

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4317
Credit: 250804425
RAC: 34146

What virus scanner are you using? Can you exclude the BOINC or at least the BOINC/slots directory and subdirectories?

BM

CJOrtega
Joined: 19 Feb 05
Posts: 39
Credit: 1742781
RAC: 0

Message 23142 in response to message 23141

Quote:

What virus scanner are you using? Can you exclude the BOINC or at least the BOINC/slots directory and subdirectories?

BM

I have three Win Xp Home systems, one using McAfee VirusScan [getting errors], and the other two using GriSoft AVG [freeware version][one getting errors]. The third system isn't generating these errors when running Albert WU's.

I don't see, in either one, where I can exclude files/folders from being scanned.

McAfee is set to scan on Friday @ 2000, and AVG is set to scan daily at 8:00AM.

The errors haven't occurred during those time slots.

I also noticed that the WU's get to about 15 to 17 minutes of processing time when they error out.
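To check that timing against the message log rather than by eye, something like the following Python sketch could be used. It assumes the `date time AM/PM|project|message` layout and the message wording shown in the logs earlier in this thread. It measures wall-clock time from the most recent "Starting"/"Resuming" line for a result to its "Unrecoverable error" line, so for a resumed result it is not total CPU time. The function name `minutes_until_error` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the message log, e.g. "1/4/2006 6:43:19 AM".
FMT = "%m/%d/%Y %I:%M:%S %p"

def minutes_until_error(lines):
    """Map result name -> minutes from its last start/resume to its error."""
    started = {}
    elapsed = {}
    for line in lines:
        stamp, _project, message = line.split("|", 2)
        when = datetime.strptime(stamp, FMT)
        words = message.split()
        if message.startswith(("Starting result", "Resuming result")):
            started[words[2]] = when
        elif message.startswith("Unrecoverable error for result"):
            name = words[4]
            if name in started:
                elapsed[name] = (when - started[name]).total_seconds() / 60
    return elapsed

# Lines copied from the first log in this thread.
log = [
    "1/4/2006 6:43:19 AM|Einstein@Home|Resuming result r1_0389.0__19_S4R2a_0 using albert version 437",
    "1/4/2006 7:00:35 AM|Einstein@Home|Unrecoverable error for result r1_0389.0__19_S4R2a_0 (The environment is incorrect. (0xa) - exit code 10 (0xa))",
    "1/4/2006 7:00:35 AM||request_reschedule_cpus: process exited",
    "1/4/2006 7:00:36 AM|Einstein@Home|Starting result r1_0389.0__7_S4R2a_1 using albert version 437",
    "1/4/2006 7:15:30 AM|Einstein@Home|Unrecoverable error for result r1_0389.0__7_S4R2a_1 (The environment is incorrect. (0xa) - exit code 10 (0xa))",
]

for name, minutes in minutes_until_error(log).items():
    print(f"{name}: {minutes:.1f} minutes")
```

On those sample lines the two results fail roughly 17 and 15 minutes after they were resumed or started.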
