I have been getting lots of compute errors since around December 5, 2007 (a guess based on the BoincStats credit graph). Since then, all work units but one have ended like this:
5.10.28
process exited with code 38 (0x26, -218)
2008-01-23 15:41:35.6203 [normal]: Built at: Nov 29 2007 15:00:43
2008-01-23 15:41:35.6205 [normal]: Start of BOINC application 'einstein_S5R3_4.20_i686-pc-linux-gnu'.
2008-01-23 15:41:36.0558 [debug]: Reading SFTs and setting up stacks ... done
2008-01-23 15:45:31.6187 [normal]: INFO: Couldn't open checkpoint h1_0749.05_S5R2__89_S5R3a_1_0.cpt
2008-01-23 15:45:31.6288 [debug]: Total skypoints = 1199. Progress: 0,
$Revision: 1.80 $ OPT:0 SCV:2, SCTRIM:8
c
1,
APP DEBUG: Application caught signal 8.
FPU status word ffff98c1, flags: ERR_SUMM STACK_FAULT INVALID
Obtained 7 stack frames for this thread.
Use gdb command: 'info line *0xADDRESS' to print corresponding line numbers.
einstein_S5R3_4.20_i686-pc-linux-gnu[0x80a4b9e]
einstein_S5R3_4.20_i686-pc-linux-gnu(LocalComputeFStatFreqBand+0x1b33)[0x80ad153]
einstein_S5R3_4.20_i686-pc-linux-gnu(MAIN+0x352d)[0x80a495d]
einstein_S5R3_4.20_i686-pc-linux-gnu[0x80a5b34]
../../projects/einstein.phys.uwm.edu/einstein_S5R3_4.20_i686-pc-linux-gnu.so(_Z6foobarPv+0x14)[0xb7ce6e24]
/lib/libpthread.so.0[0xb7ec918b]
/lib/libc.so.6(clone+0x5e)[0xb7e5314e]
Stack trace of LAL functions in worker thread:
LocalComputeFStatFreqBand at line 201 of file /home/bema/einsteinathome/HierarchicalSearch/EaH_build_release_einstein_S5R3_4.20/extra_sources/lalapps-CVS/src/pulsar/hough/src2/LocalComputeFstat.c
LocalComputeFStat at line 289 of file /home/bema/einsteinathome/HierarchicalSearch/EaH_build_release_einstein_S5R3_4.20/extra_sources/lalapps-CVS/src/pulsar/hough/src2/LocalComputeFstat.c
(null) at line 0 of file (null)
At lowest level status code = 0, description: NO LAL ERROR REGISTERED
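As the trace itself suggests, the frame addresses can be fed to gdb's 'info line' command. Below is a minimal sketch that pulls the addresses out of the trace lines and prints the matching commands. The helper name gdb_info_line_commands is made up for illustration. Only frames inside the main executable are kept, on the assumption that addresses inside shared libraries are load-time addresses that cannot be looked up directly. Whether gdb actually returns line numbers depends on the binary carrying debug information, which the log does not tell us.

```python
import re

# Matches the trailing "[0x...]" address at the end of a backtrace line.
FRAME_ADDR = re.compile(r"\[(0x[0-9a-fA-F]+)\]\s*$")

BINARY = "einstein_S5R3_4.20_i686-pc-linux-gnu"

# A few frames copied from the log above.
TRACE = [
    "einstein_S5R3_4.20_i686-pc-linux-gnu[0x80a4b9e]",
    "einstein_S5R3_4.20_i686-pc-linux-gnu(LocalComputeFStatFreqBand+0x1b33)[0x80ad153]",
    "../../projects/einstein.phys.uwm.edu/einstein_S5R3_4.20_i686-pc-linux-gnu.so(_Z6foobarPv+0x14)[0xb7ce6e24]",
    "/lib/libc.so.6(clone+0x5e)[0xb7e5314e]",
]


def gdb_info_line_commands(trace_lines, binary):
    """Return one gdb 'info line' command per frame located in `binary`."""
    commands = []
    for line in trace_lines:
        match = FRAME_ADDR.search(line)
        # Skip frames from shared libraries (assumption: not resolvable as-is).
        if match and line.startswith(binary):
            commands.append(f"info line *{match.group(1)}")
    return commands


if __name__ == "__main__":
    for command in gdb_info_line_commands(TRACE, BINARY):
        print(command)
```

The printed commands can be pasted into a gdb session opened on the science app executable.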
I use Gentoo Linux. Einstein stopped working on BOINC 5.8.15 (which had worked fine for several months), and updating to 5.10.28 didn't help. All other projects (SETI and CPDN) work fine.
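For reference, the FPU status word in the log above (ffff98c1) can be decoded by hand. The sketch below assumes the low 16 bits follow the standard x87 status word layout and that the upper 16 bits are just padding. The flag names INVALID, STACK_FAULT and ERR_SUMM are taken from the log; the other names, and the function name decode_x87_status, are labels chosen here for illustration.

```python
# Bit positions in the x87 FPU status word (low 16 bits).
X87_STATUS_FLAGS = [
    (0, "INVALID"),      # invalid operation
    (1, "DENORMAL"),     # denormalized operand
    (2, "ZERO_DIVIDE"),  # divide by zero
    (3, "OVERFLOW"),     # numeric overflow
    (4, "UNDERFLOW"),    # numeric underflow
    (5, "PRECISION"),    # inexact result
    (6, "STACK_FAULT"),  # x87 register stack overflow or underflow
    (7, "ERR_SUMM"),     # error summary: an unmasked exception is pending
    (15, "BUSY"),        # FPU busy bit
]


def decode_x87_status(word):
    """Return (list of set flag names, TOP-of-stack index) for a status word."""
    status = word & 0xFFFF  # assumption: upper bits are padding
    flags = [name for bit, name in X87_STATUS_FLAGS if status & (1 << bit)]
    top = (status >> 11) & 0x7  # bits 11-13 hold the stack top pointer
    return flags, top


if __name__ == "__main__":
    flags, top = decode_x87_status(0xFFFF98C1)
    print("flags:", " ".join(flags))
    print("stack top:", top)
```

Decoded this way, the value has the INVALID, STACK_FAULT and ERR_SUMM bits set, which agrees with the flags the app printed. That points to an invalid operation on the x87 register stack rather than, say, a divide by zero.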
Lots of FPU errors
Hi!
What's striking is that this PC is running extremely slowly!
The log shows it takes almost 4 minutes (!) just to read in the input files, something that should take only 10-20 seconds on your hardware. Either there is a process running on this machine that takes almost 100% of the CPU and slows the E@H app down to a crawl, or the CPU is throttled down because of a cooling problem, which would also explain the computation errors. There's definitely something wrong with this PC.
Bikeman
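The "almost 4 minutes" figure can be checked from the timestamps in the posted log. This is a small sketch, assuming each log line starts with a date and a time with fractional seconds, as in the excerpt; the function names are made up for illustration.

```python
from datetime import datetime

# Two consecutive lines copied from the stderr output in the first post.
READ_DONE = "2008-01-23 15:41:36.0558 [debug]: Reading SFTs and setting up stacks ... done"
NEXT_LINE = "2008-01-23 15:45:31.6187 [normal]: INFO: Couldn't open checkpoint h1_0749.05_S5R2__89_S5R3a_1_0.cpt"


def log_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffff' timestamp of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def elapsed_seconds(first_line, second_line):
    """Seconds between the timestamps of two log lines."""
    return (log_time(second_line) - log_time(first_line)).total_seconds()


if __name__ == "__main__":
    seconds = elapsed_seconds(READ_DONE, NEXT_LINE)
    print(f"{seconds:.1f} s between the two log lines")
```

The gap between these two lines comes out at roughly 235 seconds, i.e. just under 4 minutes.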
RE: This shows it takes
It's a laptop, so the HDD transfer rates suck. I also think I was running an update at the time, which takes 100% of both CPU cores on Gentoo. If you take a look at the long list of failed tasks in my profile, you'll find that most of them read the input files in 20-30 seconds. Temperature is not a problem according to ACPI.
Hi!
So far the wingmen (even those using the same Linux app) do not get those errors, so I really think this is specific to your PC, either corruption of files or hardware failure.
Please note that the same host did produce client errors on SETI@Home and CPDN recently, so I really do think that PC needs repair.
http://setiathome.berkeley.edu/results.php?hostid=2720482
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=818431
Sorry I have no better news,
Bikeman
RE: So far the wingmen
Yes, it may be specific to my PC but it's *NOT* a hardware failure. The problem must be somewhere between the Einstein@Home client and the system libraries or kernel.
I've been using kernel 2.6.23 since December 17; I used 2.6.22 before. Nothing else was changed around the time of the first failures. Were there any Einstein@Home client updates around December 15?
2 segfaults out of 13 reported work units. Most likely a minor SETI client bug. 5 results were correct, the rest are pending validation.
Those are broken work units which have crashed on all computers so far, see for yourself. Another work unit has been running for 124 hours and the results are correct so far.
RE: I'm using kernel
You are currently using version 4.20 of the science app which was made the official app around December 03. There have been bugfixes recently and the current beta (4.27) is much improved. It would be a very good idea for you to give the 4.27 beta app a spin to see if your problems are solved. It's quite easy to do this and you should check out the beta test page for details about the test app and to download the test package.
To run the beta app, all you need to do is:
* Completely stop BOINC
* Copy the package contents into your Einstein project directory, overwriting any existing app_info.xml file that might be there
* Restart BOINC
If there is an "in-progress" task, the new app will pick up from where the old app left off - nothing is lost. It might appear that the old app is still being used but that is not so. New tasks downloaded will be "branded" with the version number of the new app. You should get a nice little performance boost as well.
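The copy step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name install_package and both directory arguments are hypothetical, BOINC must already be stopped before it runs, and the backup of an existing app_info.xml is an extra precaution added here, not part of the instructions above.

```python
import shutil
from pathlib import Path


def install_package(package_dir, project_dir):
    """Copy every file from the unpacked package into the project directory.

    An existing app_info.xml is overwritten, as described above; a copy is
    kept as app_info.xml.bak first (an addition made in this sketch).
    Returns the names of the copied files in sorted order.
    """
    package_dir, project_dir = Path(package_dir), Path(project_dir)
    copied = []
    for src in sorted(package_dir.iterdir()):
        if not src.is_file():
            continue
        dst = project_dir / src.name
        if src.name == "app_info.xml" and dst.exists():
            shutil.copy2(dst, dst.with_name("app_info.xml.bak"))
        shutil.copy2(src, dst)  # copy2 keeps permission bits such as +x
        copied.append(src.name)
    return copied
```

Called with the unpacked package directory and the Einstein project directory, it returns the list of files it copied; BOINC is then restarted by hand.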
If this doesn't solve your problems then I'm afraid I'd have to agree that you are very likely to have a hardware issue, despite your confidence that you don't. Also please realise that different projects put different stresses on different parts of your system so that it's not impossible to see just one project falling over and perhaps not the others.
Cheers,
Gary.
4.24 crashes as well, and rebooting to the old kernel (on which Einstein@Home worked fine for over a month) didn't help either.
The problem always results in only two error messages:
or
These errors happen in both 4.20 and 4.24.
And it appears that my PC has finished only 1 workunit since December 5 (or it might have been a workunit finished before December 5 which was pending credit for a long time), which pretty much coincides with the official 4.20 release.
RE: And it appears that my
I really don't think this is related to 4.20, but to double check, you could actually re-install the version you were using prior to 4.20 and see if it crashes.
Version 4.17 (in the beta package with app_info.xml) can still be downloaded from
http://einstein.phys.uwm.edu/app_test/linux/einstein_S5R1_4.17_i686-pc-linux-gnu.tar.gz
If this one crashes as well, but worked before, I guess you should be convinced that there might be something wrong with that CPU of yours.
CU
Bikeman
Scheduler won't give me any work units for 4.17.
RE: Scheduler won't give me
Did you modify your app_info.xml file?
You probably need to add clauses so that the more recent data versions like 4.20 and 4.24 are allowed to be processed by the older 4.17 app. Open the file with a text editor and see how different task versions are handled.
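As a rough illustration of what "adding clauses" could look like, the sketch below duplicates an existing app_version entry under another version number. Everything in SAMPLE is a made-up stand-in, not the contents of the real 4.17 package, and the function name allow_version is hypothetical. It assumes BOINC's usual encoding of version numbers (4.20 becomes 420). Whether the older app can really process the newer tasks is a separate question that this does not answer.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, simplified app_info.xml; the real file may differ.
SAMPLE = """<app_info>
  <app><name>einstein_S5R3</name></app>
  <file_info><name>einstein_S5R1_4.17_i686-pc-linux-gnu</name><executable/></file_info>
  <app_version>
    <app_name>einstein_S5R3</app_name>
    <version_num>417</version_num>
    <file_ref><file_name>einstein_S5R1_4.17_i686-pc-linux-gnu</file_name><main_program/></file_ref>
  </app_version>
</app_info>"""


def allow_version(app_info_xml, version_num):
    """Return the XML with an extra app_version clause for `version_num`.

    The new clause is a copy of the first existing one, so it points at the
    same executable. Nothing is added if the version is already listed.
    """
    root = ET.fromstring(app_info_xml)
    versions = root.findall("app_version")
    if not versions:
        raise ValueError("no <app_version> element to copy")
    listed = {(v.findtext("version_num") or "").strip() for v in versions}
    if str(version_num) not in listed:
        clone = copy.deepcopy(versions[0])
        clone.find("version_num").text = str(version_num)
        root.append(clone)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(allow_version(allow_version(SAMPLE, 420), 424))
```

Editing the file by hand in a text editor amounts to the same thing; the point is only that each task version to be accepted gets its own app_version clause.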
What version is shown against the tasks listed on your BOINC Manager Tasks tab?
One other point. I posted previously and suggested using the 4.27 version app. I was careless and called it a beta app whereas it's really called a "power user" app - see the appropriate sticky thread. I've got it installed on at least 20 different machines and it's working without issue for me. It's supposed to be the 4.24 code base but maybe there are changes that might make a difference to your machine. It's worth a try. You could use the 4.27 app_info.xml unchanged if you had either 4.20 or 4.24 "branded" tasks in your tasks list.
Cheers,
Gary.