Computational error in GCS5 v1.05

next_ghost
Joined: 25 Mar 05
Posts: 12
Credit: 246383
RAC: 0
Topic 195224

Hi,
I recently switched to a new machine, and most of my GCS5 work units have been crashing since. Arecibo work units don't seem to crash at all, but sometimes they don't validate (2 of the 5 checked so far failed validation). You can see examples of my crash reports here (two different errors, both very common):
http://einsteinathome.org/task/186629067
http://einsteinathome.org/task/186783539

The machine has an Intel Core i5 M 430 and runs Gentoo Linux, kernel 2.6.34-r2 x86_64 with i686 userspace (clean install).

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119064884675
RAC: 24462737

Your new machine joined the project on 26 July and it has been having problems right from the get-go. The problem is most likely hardware-related, so here is a list of things for you to check.

    * If you are overclocking, you need to back off the overclock a bit. In particular, make sure you aren't overclocking your RAM excessively (try relaxing timings if you are).
    * Your CPU may be overheating - check for proper application of the heat sink and fan. There should be no 'looseness' in the heat sink.
    * Run a temperature monitoring program and check both idle and full load temperatures. If your temps seem high, go back to the previous item.
    * Run an extended test using memtest86 to look for any RAM errors. If you get any RAM errors, check your RAM timings against what is specified in SPD. If your actual timings are appropriate you need to try different RAM stick(s).
    * Is your PSU adequate for the job and is it in good condition? Overloaded PSUs can give these symptoms as well.
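
For the temperature check on a Linux box, here is a minimal sketch in Python. It assumes the kernel exposes thermal zones under /sys/class/thermal with readings in millidegrees Celsius, which is common but not guaranteed on every machine. The function names and the 85 C limit are placeholders for illustration only, not recommended values.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for each readable thermal zone under base."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            # Assumption: sysfs reports the value in millidegrees Celsius.
            temps[zone.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Skip zones that are missing, unreadable, or not numeric.
            continue
    return temps


def over_threshold(temps, limit_c=85.0):
    """Return only the zones at or above limit_c (placeholder limit)."""
    return {name: t for name, t in temps.items() if t >= limit_c}
```

Calling read_zone_temps() once at idle and again with all cores busy gives the two figures to compare.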

There are other possible causes but the above are the most likely things to check first. Let us know how you get on.

EDIT: Just noticed the 'M' in your CPU description so I guess your machine is a laptop?

If so, some of the above comments (e.g. overclocking, looseness of heat sink, quality of PSU) shouldn't apply. My suggestions were based on a desktop rather than a laptop - sorry about that.

Laptops aren't really designed to run for long periods at 100% CPU load on all cores, so maybe you should see if limiting BOINC to, say, 2 or 3 cores solves the problems. That should certainly give you a much better idea of what is actually happening.

Cheers,
Gary.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 0

On exit code 38 / signal 8, we had the older FAQ with the kernel problem. Is it possible something similar is broken again in the newer kernels?

Also see the S5R2 run exit code 99 explanation. The solution might still work.

next_ghost
Joined: 25 Mar 05
Posts: 12
Credit: 246383
RAC: 0

Yes, my machine is a laptop. CPU temperature is at 85°C most of the time; ACPI reports it should be safe up to 100°C. I don't overclock manually, but I run the conservative CPU governor to reduce power consumption when running on battery. Memory is also OK: I ran memtest86 before installation. I also run SETI@Home and CPDN. SETI works fine (though most of my finished work units are still pending validation) and CPDN doesn't crash more than usual.

I've set BOINC to use only 2 cores and restarted Einstein@Home. I'll keep kernel config as is for now and try changing preempt options only if the problem persists.
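
For anyone repeating this: BOINC's computing preferences express the core limit as a percentage of processors rather than a count, so the number of cores has to be converted. A rough sketch of that arithmetic in Python follows. The function name is made up for illustration, and the total of 4 assumes the i5 M 430 shows up as 4 logical CPUs (2 cores with Hyper-Threading).

```python
def ncpus_percent(cores_wanted, cores_total):
    """Convert a core count into the percentage of processors to allow."""
    if not 0 < cores_wanted <= cores_total:
        raise ValueError("cores_wanted must be between 1 and cores_total")
    return 100.0 * cores_wanted / cores_total


# Assumption: 4 logical CPUs are visible to BOINC on this laptop.
two_of_four = ncpus_percent(2, 4)    # 50.0
three_of_four = ncpus_percent(3, 4)  # 75.0
```

So 2 cores would correspond to 50% and 3 cores to 75% under that assumption.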

You might also be interested in this line of BOINC log that I've got after BOINC redownloaded all Einstein files after project restart:

So 7. srpen 2010, 18:00:46 CEST Einstein@Home [error] File l1_0870.60_S5R7 has wrong size: expected 3345408, got 3346102
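
The file in that log line came out 694 bytes larger than expected (3346102 - 3345408). To re-check a downloaded file by hand, a small Python sketch like the one below compares the size on disk with the expected value. The helper name is hypothetical and the path is whatever location BOINC uses for project files on your system; this only mirrors the size comparison and is not BOINC's own verification code.

```python
import os


def size_delta(path, expected_size):
    """Return actual minus expected size in bytes; 0 means the sizes match."""
    return os.path.getsize(path) - expected_size


# Figures taken from the log line above.
EXPECTED = 3345408
REPORTED = 3346102
reported_delta = REPORTED - EXPECTED  # 694 bytes too large
```

A positive result means the file on disk is larger than expected, a negative one that it is truncated.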
