S5R4 on Linux x64 errors out

Metod, S56RKO
Joined: 11 Feb 05
Posts: 135
Credit: 834045947
RAC: 83163
Topic 193848

I have an Opteron 2218 (dual core) machine running Debian Lenny which has seemingly developed a dislike for Einstein. About half a year ago it wasn't able to complete an S5R3 WU without problems. With S5R4 I gave it another go, and it did complete almost 20 WUs successfully. Today it choked on another one, catching SIG8: http://einsteinathome.org/task/103577395. The task output includes a backtrace; could a developer take a look?

This machine runs several other projects without trouble: SaH (both seti_enhanced, the KWSN variant, and astropulse) and Rosetta@Home (all apps).

Metod ...

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 832623407
RAC: 1208949

There is a [url=http://www.nabble.com/CONFIG_PREEMPT-causes-corruption-of-application's-FPU-stack-td17293854.html]known bug[/url] in the Linux kernel that was fixed only recently. It is responsible for quite a few "signal 8" errors on Einstein@Home. (An E@H user actually played an important role in identifying the bug and helped to analyze it. We may not have found GWs yet, but at least we found a Linux kernel bug ;-) )

The bug will be visible when the kernel is compiled with the CONFIG_PREEMPT=y setting (often set in "real-time" kernel variants).
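For anyone who wants to check their own machine, here is a minimal Python sketch that reports which preemption option a kernel config enables. It is an illustration only. The helper names `preempt_model` and `read_running_config` are made up for this example. The two file locations are assumptions: `/proc/config.gz` only exists if the kernel was built to expose its config, and `/boot/config-<release>` is a distribution convention that a custom-built kernel may not follow. It only looks at the three preemption options discussed in this thread.

```python
import gzip
import os
import platform

# Kconfig symbols for the three preemption models discussed in this thread.
PREEMPT_OPTIONS = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")


def preempt_model(config_text):
    """Return the preemption option set to 'y' in a kernel .config, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_PREEMPT is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if name in PREEMPT_OPTIONS and value == "y":
            return name
    return None


def read_running_config():
    """Best-effort read of the running kernel's config; None if not found."""
    release = platform.release()  # same string that `uname -r` prints
    # Assumed locations; neither is guaranteed to exist on a given system.
    candidates = ["/proc/config.gz", "/boot/config-" + release]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            return handle.read()
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not found; check the .config used for the build")
    else:
        model = preempt_model(text)
        print("preemption model:", model)
        if model == "CONFIG_PREEMPT":
            print("full preemption is enabled (the setting the bug report mentions)")
```

If neither file is present, passing the contents of the `.config` from the kernel build tree to `preempt_model` gives the same answer.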

Other potential causes of "signal 8" errors include overclocking, undervolting, and poor cooling, but I'd look into the kernel bug first. Updating to a recent kernel could cure this if you are lucky.

Bikeman

Metod, S56RKO

Message 84290 in response to message 84289

Thanks for the pointers, Bikeman!

It sounds surrealistically familiar: a custom-built kernel with CONFIG_PREEMPT=y etc. in my case. I can hardly believe that the other causes you listed apply here, as this is a nice HP workstation designed to take two CPUs with only one installed ATM. No overclocking etc. Which reminds me that I need to install another CPU and a couple of gigs of RAM before AMD becomes extinct :-)

Metod ...

Metod, S56RKO

Message 84291 in response to message 84290

I've installed a new kernel on my machine:

$ uname -a
Linux ural 2.6.26.2 #2 SMP Thu Aug 14 08:42:29 CEST 2008 x86_64 GNU/Linux

I think this one shouldn't hit the same bug anymore. I've also changed the preemption configuration to a slightly less aggressive one:

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set

Hopefully this marks this thread as closed/solved.

Metod ...
