Yesterday I noticed I was getting a lot of client errors on my WUs.
Everything had been running fine, but when I checked the machines I noticed they had exceeded their limit for the day.
Anyone else having issues or could there be something odd with my machines?
A little info: I have one machine that was running E@H 24/7: http://einsteinathome.org/host/531087
When it started hitting errors, I split another machine that was running S@H full time so that it also ran E@H, to see if it would work OK: http://einsteinathome.org/host/583917 It didn't; results were hit or miss.
The odd thing is, both of these machines are hitting errors but the third machine I have on the project is not hitting any errors as of yet. http://einsteinathome.org/host/583955
For the time being I have suspended all E@H work on the main machines and redirected their work to other projects. No sense working on units that will just error.
All three have optimized clients on them. When I noticed the errors, I switched back to the normal client to see if that was the issue. It was not.
Looking back through the logs, this seems to have started on the 17th.
Anyone have any ideas?
Client error on Results
Complete error message:
5.2.2
The environment is incorrect. (0xa) - exit code 10 (0xa)
2006-04-18 14:53:40.5720 [normal]: Optimised by akosf (S-39) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-04-18 14:53:40.5720 [normal]: Started search at lalDebugLevel = 0
2006-04-18 14:53:42.0095 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-04-18 14:53:42.0095 [normal]: No usable checkpoint found, starting from beginning.
2006-04-18 15:00:54.6814 [normal]: Fstat file reached MaxFileSizeKB ==> compactifying ...
2006-04-18 15:01:00.6814 [CRITICAL]: Couldn't write compacted toplist to '../../projects/einstein.phys.uwm.edu/z1_1395.5__2356_S4R2a_2_0'
What did you do exactly? It looks like file or directory permissions are messed up.
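One rough way to rule permissions in or out is to try creating a file in the project directory as the same account that runs BOINC. The sketch below is only an illustration: the function name `can_write` is made up for this example, and the relative path assumes the script is started from the BOINC data directory, so adjust it to your install. It only tests whether a file can be created there; it does not reproduce what the science app itself does.

```python
import os
import tempfile


def can_write(directory):
    """Try to create, write, and remove a temp file in `directory`.

    Returns (True, None) on success, or (False, exception) on failure.
    """
    try:
        # The file is deleted automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="write_test_") as f:
            f.write(b"test")
            f.flush()
        return True, None
    except OSError as exc:
        # Covers a missing directory, denied permission, full disk, etc.
        return False, exc


if __name__ == "__main__":
    # Assumed location, relative to the BOINC data directory.
    project_dir = os.path.join("projects", "einstein.phys.uwm.edu")
    ok, err = can_write(project_dir)
    if ok:
        print("Write test succeeded in", project_dir)
    else:
        print("Write test failed in", project_dir, "->", err)
```

If the write test fails, the error text should say whether the directory is missing, access was denied, or something else went wrong.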
Michael
Team Linux Users Everywhere
That's the odd thing: these boxes have been running on auto since they started up, with the exception of switching projects when needed.
I saw the "Couldn't write" error and it got me thinking, so I detached E@H, which removed the directory. I then waited a while and reattached it. After about 2 hrs of CPU time it errored out.
It couldn't be a permission issue; it's running on a local account with sys admin rights to the machine.
I was thinking it might need to be rebooted, but that doesn't make sense because the other machine that was erroring out had just been rebooted.
The only thing I have changed lately, and it was well before the 17th if I recall, was my pref to "keep in memory". But I doubt that would cause this, since my 3rd machine is still running fine.
I am at a loss.