System: WinXP Home SP2, on a P4 (single CPU).
Two WUs errored out with the same problem.
Output file too big?
1/4/2006 6:43:19 AM|Einstein@Home|Resuming result r1_0389.0__19_S4R2a_0 using albert version 437
1/4/2006 7:00:35 AM|Einstein@Home|Unrecoverable error for result r1_0389.0__19_S4R2a_0 (The environment is incorrect. (0xa) - exit code 10 (0xa))
1/4/2006 7:00:35 AM||request_reschedule_cpus: process exited
1/4/2006 7:00:35 AM|Einstein@Home|Computation for result r1_0389.0__19_S4R2a_0 finished
1/4/2006 7:00:35 AM|Einstein@Home|Output file r1_0389.0__19_S4R2a_0_0 for result r1_0389.0__19_S4R2a_0 exceeds size limit.
1/4/2006 7:00:35 AM|Einstein@Home|File size: 3374418.000000 bytes. Limit: 3000000.000000 bytes
1/4/2006 7:00:36 AM|Einstein@Home|Starting result r1_0389.0__7_S4R2a_1 using albert version 437
1/4/2006 7:15:30 AM|Einstein@Home|Unrecoverable error for result r1_0389.0__7_S4R2a_1 (The environment is incorrect. (0xa) - exit code 10 (0xa))
1/4/2006 7:15:30 AM||request_reschedule_cpus: process exited
1/4/2006 7:15:30 AM|Einstein@Home|Computation for result r1_0389.0__7_S4R2a_1 finished
1/4/2006 7:15:30 AM|Einstein@Home|Output file r1_0389.0__7_S4R2a_1_0 for result r1_0389.0__7_S4R2a_1 exceeds size limit.
1/4/2006 7:15:30 AM|Einstein@Home|File size: 5245780.000000 bytes. Limit: 3000000.000000 bytes
1/4/2006 7:15:32 AM|climateprediction.net|Resuming result 3hl8_200184401_1 using hadsm3 version 413
1/4/2006 7:16:32 AM|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
1/4/2006 7:16:32 AM|Einstein@Home|Reason: To fetch work
1/4/2006 7:16:32 AM|Einstein@Home|Requesting 34560 seconds of new work, and reporting 2 results
1/4/2006 7:16:43 AM|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
1/4/2006 7:16:46 AM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/r1_0389.0__7_S4R2a_1_0
1/4/2006 7:16:46 AM|Einstein@Home|Started download of skygrid_0390_r_T00.dat
1/4/2006 7:16:47 AM|Einstein@Home|Finished download of skygrid_0390_r_T00.dat
1/4/2006 7:16:47 AM|Einstein@Home|Throughput 129066 bytes/sec
1/4/2006 7:16:48 AM||request_reschedule_cpus: files downloaded
1/4/2006 7:16:48 AM||request_reschedule_cpus: files downloaded
1/4/2006 8:09:29 AM||Suspending computation and network activity - running CPU benchmarks
1/4/2006 8:09:29 AM|climateprediction.net|Pausing result 3hl8_200184401_1 (left in memory)
1/4/2006 8:09:32 AM||Running CPU benchmarks
1/4/2006 8:10:30 AM||Benchmark results:
1/4/2006 8:10:30 AM|| Number of CPUs: 1
1/4/2006 8:10:30 AM|| 781 double precision MIPS (Whetstone) per CPU
1/4/2006 8:10:30 AM|| 1587 integer MIPS (Dhrystone) per CPU
1/4/2006 8:10:30 AM||Finished CPU benchmarks
1/4/2006 8:10:31 AM||Resuming computation and network activity
1/4/2006 8:10:31 AM||request_reschedule_cpus: Resuming activities
1/4/2006 8:10:31 AM|climateprediction.net|Resuming result 3hl8_200184401_1 using hadsm3 version 413
1/4/2006 9:10:31 AM|climateprediction.net|Pausing result 3hl8_200184401_1 (left in memory)
1/4/2006 9:10:31 AM|Einstein@Home|Starting result r1_0389.0__2_S4R2a_0 using albert version 437
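The size-limit messages can be pulled out of a message log like this one and compared with the limit using a short script. This is only a sketch that assumes the exact message format shown above. The regular expression and the `size_overruns` helper are hypothetical and not part of BOINC. The script reads the log text only and says nothing about why Albert produced the larger file.

```python
import re

# Matches the size-limit message as it appears in this log, e.g.
# "File size: 3374418.000000 bytes. Limit: 3000000.000000 bytes"
SIZE_RE = re.compile(r"File size: ([\d.]+) bytes\. Limit: ([\d.]+) bytes")


def size_overruns(log_lines):
    """Return (size, limit, excess) tuples for every size-limit message found."""
    results = []
    for line in log_lines:
        m = SIZE_RE.search(line)
        if m:
            size = int(float(m.group(1)))
            limit = int(float(m.group(2)))
            results.append((size, limit, size - limit))
    return results


if __name__ == "__main__":
    sample = [
        "1/4/2006 7:00:35 AM|Einstein@Home|File size: 3374418.000000 bytes. Limit: 3000000.000000 bytes",
        "1/4/2006 7:15:30 AM|Einstein@Home|File size: 5245780.000000 bytes. Limit: 3000000.000000 bytes",
    ]
    for size, limit, excess in size_overruns(sample):
        print(f"{size} bytes, limit {limit}, over by {excess} bytes")
```

Run against the two lines above, it reports the first file as 374,418 bytes over the 3,000,000-byte limit and the second as 2,245,780 bytes over.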
--
Copyright © 2024 Einstein@Home. All rights reserved.
Albert errors
Just got another one, same error, different computer (P4 HT).
Bad batch of WUs?
Did you make any changes to
Did you make any changes to your computers recently? Most of the errors show Einstein had problems reading or writing files, so if you didn't make changes, did your firewall/antivirus software auto-update itself? Or did the errors occur while it was scanning for infected files?
The message log shows this: The environment is incorrect. (0xa) - exit code 10 (0xa)
That is a Windows error.
And the log also shows this: Couldn't delete file projects/einstein.phys.uwm.edu/r1_0389.0__7_S4R2a_1_0
That suggests something is interfering with Einstein.
Looking through your Windows computers, they all have at least one "client error", which is why I was wondering if something had changed. For example, host 356683 had two errors on Jan 1, and then the problem cleared up.
These are the errors that show up in the results:
2006-01-04 22:37:16.3774 [CRITICAL]: Couldn't write compacted toplist to '../../projects/einstein.phys.uwm.edu/r1_1244.0__1892_S4R2a_1_0'
(that same error occurred a couple of times, different result files each time of course)
Can't resolve file "polka.out"
If running a non-BOINC test, create [INPUT] or touch [OUTPUT] file
w1_1086.0__1086.2_0.1_T12_S4hD_2_0
-161
To me this looks like the
To me this looks like the user BOINC is currently running as can't write to the slots directories. I'd suggest checking the access permissions of
Program Files\BOINC\slots
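One way to follow this suggestion is to try creating and deleting a scratch file in each slot directory while logged in as the account BOINC runs under. The sketch below assumes Python is available on the host and that BOINC is installed in the default `C:\Program Files\BOINC` folder; `can_write` is a hypothetical helper. It only catches a persistent permission problem. A short-lived lock held by another process, such as a virus scanner, would not show up unless it happened to be active at that moment.

```python
import os
import tempfile


def can_write(directory):
    """Try to create and delete a scratch file in `directory`.

    Returns (True, None) on success, or (False, error_message) if the
    current user cannot create or remove files there.
    """
    try:
        # NamedTemporaryFile creates the file inside `directory` and
        # deletes it again when the `with` block closes it.
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            f.write(b"test")
        return True, None
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # Assumed default install path; adjust to the actual BOINC folder.
    slots = r"C:\Program Files\BOINC\slots"
    if not os.path.isdir(slots):
        print(f"{slots} not found; adjust the path")
    else:
        for entry in sorted(os.listdir(slots)):
            path = os.path.join(slots, entry)
            if os.path.isdir(path):
                ok, err = can_write(path)
                print(path, "writable" if ok else f"NOT writable: {err}")
```

Because the check runs with the permissions of whoever starts it, running it from a different account than BOINC's can give a misleading "writable" result.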
BM
No changes were made to the
No changes were made to the computers before the errors showed up, and the computers were idling when the errors occurred. They both completed WUs afterwards without errors and without my doing anything to them.
I wonder if the WUs are noisy, like some of the SETI ones that pick up local transmissions.
1/4/2006 7:15:30 AM|Einstein@Home|Output file r1_0389.0__7_S4R2a_1_0 for result r1_0389.0__7_S4R2a_1 exceeds size limit.
1/4/2006 7:15:30 AM|Einstein@Home|File size: 5245780.000000 bytes. Limit: 3000000.000000 bytes
The Jan 1 errors occurred after I had to do a system restore to a date before my son opened a spam email and got us a trojan infection. :-(
RE: No changes were made to
I don't think it's a workunit problem; other hosts are returning results that don't show these errors. And the specific error you show is most likely the result of another error that occurred first.
It's the other messages that matter, the ones showing that Albert had problems writing to files and BOINC had problems deleting files. They suggest a background process was active on the computer, perhaps accessing the files when Albert/BOINC tried to use them.
Or, as Bernd suggests, the user doesn't have write access to the BOINC\slots directory.
Walt
Well, the 2 systems have done
Well, the two systems have done 3 and 8 WUs respectively since the last error on the 11th without repeating the error, and I haven't done anything to those systems, so I'm at a loss.
Back to your regular bug hunting. :-) :-)
Are the machines (or your
Are the machines (or your user account) part of a domain? If so, something might have changed in the domain as well; it might even be a database change that was slow to propagate through the domain hierarchy, or from the PDC to the BDCs...
Anyway, I can't track down a problem that has vanished.
Best,
Bernd
BM
RE: Are the machines (or
The systems are not in a domain, as they are all Win XP Home systems. They are all fully patched. The Event Logs don't show any activity at the time of the errors. I hadn't made any changes to the systems before the errors occurred, and the error hasn't gone away; it just isn't occurring as often.
What would cause the file size to exceed the 3 MB (3,000,000-byte) limit?
[clip of most recent error]
1/16/2006 4:29:48 PM|Einstein@Home|Unrecoverable error for result r1_1244.0__1782_S4R2a_2 (The environment is incorrect. (0xa) - exit code 10 (0xa))
1/16/2006 4:29:48 PM||request_reschedule_cpus: process exited
1/16/2006 4:29:48 PM|Einstein@Home|Computation for result r1_1244.0__1782_S4R2a_2 finished
1/16/2006 4:29:48 PM|Einstein@Home|Output file r1_1244.0__1782_S4R2a_2_0 for result r1_1244.0__1782_S4R2a_2 exceeds size limit.
1/16/2006 4:29:48 PM|Einstein@Home|File size: 5243115.000000 bytes. Limit: 3000000.000000 bytes
The systems are also running SETI and Climate Prediction without errors.
I ran FileMon, but the error didn't occur for the WU that was running, and the log got so big that it made the OS choke. :-(
What virus scanner are you
What virus scanner are you using? Can you exclude the BOINC directory, or at least the BOINC/slots directory and its subdirectories?
BM
RE: What virus scanner are
I have three Win XP Home systems: one using McAfee VirusScan [getting errors], and the other two using GriSoft AVG [freeware version] [one getting errors]. The third system isn't generating these errors when running Albert WUs.
I don't see where, in either one, I can exclude files or folders from scanning.
McAfee is set to scan on Fridays at 20:00, and AVG is set to scan daily at 8:00 AM.
The errors haven't occurred during those time slots.
I also noticed that the WUs get to about 15 to 17 minutes of processing time when they error out.
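To check whether the failures really happen at a consistent point, the start/resume and error timestamps in the message log can be compared. This is a minimal sketch, assuming the log line format quoted in this thread; `time_to_error` is a hypothetical helper. It measures wall-clock time since the result was last started or resumed, so it only approximates the CPU time BOINC reports. A resumed result has already accumulated CPU time before that point, and any suspension in between is counted too.

```python
import re
from datetime import datetime

# Lines look like "1/4/2006 6:43:19 AM|Einstein@Home|message text"
LINE_RE = re.compile(r"^(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M)\|[^|]*\|(.*)$")
START_RE = re.compile(r"(?:Starting|Resuming) result (\S+)")
ERROR_RE = re.compile(r"Unrecoverable error for result (\S+)")


def time_to_error(log_lines):
    """Map result name -> seconds between its last start/resume and its error."""
    started = {}
    elapsed = {}
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
        msg = m.group(2)
        s = START_RE.search(msg)
        if s:
            # Remember the most recent start/resume for this result.
            started[s.group(1)] = when
            continue
        e = ERROR_RE.search(msg)
        if e and e.group(1) in started:
            elapsed[e.group(1)] = (when - started[e.group(1)]).total_seconds()
    return elapsed


if __name__ == "__main__":
    sample = [
        "1/4/2006 6:43:19 AM|Einstein@Home|Resuming result r1_0389.0__19_S4R2a_0 using albert version 437",
        "1/4/2006 7:00:35 AM|Einstein@Home|Unrecoverable error for result r1_0389.0__19_S4R2a_0 (The environment is incorrect. (0xa) - exit code 10 (0xa))",
        "1/4/2006 7:00:36 AM|Einstein@Home|Starting result r1_0389.0__7_S4R2a_1 using albert version 437",
        "1/4/2006 7:15:30 AM|Einstein@Home|Unrecoverable error for result r1_0389.0__7_S4R2a_1 (The environment is incorrect. (0xa) - exit code 10 (0xa))",
    ]
    for name, secs in time_to_error(sample).items():
        print(f"{name}: {secs / 60:.1f} min")
```

For the two Jan 4 failures at the top of the thread, this gives 1036 seconds (about 17 minutes) and 894 seconds (about 15 minutes) of wall-clock time before the error.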