Computation Error - ERROR: Stepped outside the coarse grid!

Apa
Joined: 28 Dec 09
Posts: 10
Credit: 160360
RAC: 0
Topic 195501

I'm killing E@H WUs and need help figuring out why. I don't know everything you may need, but this should get us started.
Running openSUSE 11.3 (Linux 2.6.34.7-0.5)
Tyan K8WE, BIOS 1.06
Dual Opteron 275 (4 cores)
8 GB RAM

BOINC is running from a directory for which I have permissions.
I *think* I have the 32-bit libraries installed.
I've run the memory test with the Linux 'boot' disk.
I've swapped out motherboards.
Leave tasks in memory while suspended: Yes
I do not recall having a problem when I ran XP on this machine, but I never looked. I do know that I suspected something because this system should have run faster than another I was running but never did.

Very few WUs go to successful completion.
Most end with a computation error, which can occur anywhere from the very start to the very end of the WU.
When I look at the WU I see "ERROR: Stepped outside the coarse grid!"

2010-12-09 11:23:48.4518 (4239) [normal]: 57/3
ERROR: Stepped outside the coarse grid!
2010-12-09 11:24:21.9451 (4239) [CRITICAL]: ERROR: MAIN() returned with error '13'
FPU status flags:
2010-12-09 11:24:21.9451 (4239) [normal]: done. calling boinc_finish(13).
11:24:21 (4239): called boinc_finish

I want to continue contributing, but it's killing me to waste so much processing time.

Where do we start? What should I have done and might have missed?

Apa

. . . and . . .
- I switched off the Option ROM error reporting
- disabled any unused on-board devices such as MAC OPSCAN, SCSI, etc.
- switched the OS type to LINUX on the main BIOS page
- set MTRR to discrete
- enabled the IOMMU
- set the memory hole to hardware . . .

Apa

. . . and now . . .
I've reinstalled openSUSE 11.3, selecting the openSUSE BOINC package (6.4.5) as part of the original install.
I'm still getting computation errors, and when I look in the WU it's still "ERROR: Stepped outside the coarse grid!"

HELP!

Stderr output

6.4.5

process exited with code 13 (0xd, -243)

2010-12-11 07:54:58.4207 (19472) [normal]: This program is published under the GNU General Public License, version 2
2010-12-11 07:54:58.4208 (19472) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2010-12-11 07:54:58.4208 (19472) [normal]: This Einstein@home App was built at: Dec 7 2010 23:27:41

2010-12-11 07:54:58.4208 (19472) [normal]: Start of BOINC application 'einstein_S5GC1HF_1.06_i686-pc-linux-gnu__S5GCESSE2'.
command line: einstein_S5GC1HF_1.06_i686-pc-linux-gnu__S5GCESSE2 --Freq=1336.1174671 --FreqBand=0.05 --dFreq=6.71056161393e-06 --f1dot=-2.64248266531e-09 --f1dotBand=2.90673093185e-09 --df1dot=5.77553186099e-10 --skyGridFile=../../projects/einstein.phys.uwm.edu/skygrid_1340Hz_S5GC1.dat --numSkyPartitions=1204 --partitionIndex=868 --tStack=90000 --nStacksMax=205 --gammaRefine=1399 --ephemE=../../projects/einstein.phys.uwm.edu/earth_05_09 --ephemS=../../projects/einstein.phys.uwm.edu/sun_05_09 --nCand1=10000 -o ../../projects/einstein.phys.uwm.edu/h1_1335.85_S5R4__868_S5GC1HFa_1_0 --gridType=3 --printCand1 --semiCohToplist -d1 --Dterms=8 --DataFiles1=../../projects/einstein.phys.uwm.edu/h1_1335.85_S5R4;../../projects/einstein.phys.uwm.edu/h1_1335.85_S5R7;../../projects/einstein.phys.uwm.edu/l1_1335.85_S5R4;../../projects/einstein.phys.uwm.edu/l1_1335.85_S5R7;../../projects/einstein.phys.uwm.edu/h1_1335.90_S5R4;../../projects/einstein.phys.uwm.edu/h1_1335.90_S5R7;../../projects/einstein.phys.uwm.edu/l1_1335.90_S5R4;../../projects/einstein.phys.uwm.edu/l1_1335.90_S5R7;../../projects/einstein.phys.uwm.edu/h1_1335.95_S5R4;../../projects/einstein.phys.uwm.edu/h1_1335.95_S5R7;../../projects/einstein.phys.uwm.edu/l1_1335.95_S5R4;../../projects/einstein.phys.uwm.edu/l1_1335.95_S5R7;../../projects/einstein.phys.uwm.edu/h1_1336.00_S5R4;../../projects/einstein.phys.uwm.edu/h1_1336.00_S5R7;../../projects/einstein.phys.uwm.edu/l1_1336.00_S5R4;../../projects/einstein.phys.uwm.edu/l1_1336.00_S5R7;../../projects/einstein.phys.uwm.edu/h1_1336.05_S5R4;../../projects/einstein.phys.uwm.edu/h1_1336.05_S5R7;../../projects/einstein.phys.uwm.edu/l1_1336.05_S5R4;../../projects/einstein.phys.uwm.edu/l1_1336.05_S5R7;../../projects/einstein.phys.uwm.edu/h1_1336.10_S5R4;../../projects/einstein.phys.uwm.edu/h1_1336.10_S5R7;../../projects/einstein.phys.uwm.edu/l1_1336.10_S5R4;../../projects/einstein.phys.uwm.edu/l1_1336.10_S5R7;../../projects/einstein.phys.uwm.edu/h1_1336.15_S5R4;../../projects/ein
stein.phys.uwm.edu/h1_1336.15_S5R7;../../projects/einstein.phys.uwm.edu/l1_1336.15_S5R4;../../projects/einstein.phys.uwm.edu/l1_1336.15_S5R7;../../projects/einstein.phys.uwm.edu/h1_1336.20_S5R4;../../projects/einstein.phys.uwm.edu/h1_1336.20_S5R7;../../projects/einstein.phys.uwm.edu/l1_1336.20_S5R4;../../projects/einstein.phys.uwm.edu/l1_1336.20_S5R7;../../projects/einstein.phys.uwm.edu/h1_1336.25_S5R4;../../projects/einstein.phys.uwm.edu/h1_1336.25_S5R7;../../projects/einstein.phys.uwm.edu/l1_1336.25_S5R4;../../projects/einstein.phys.uwm.edu/l1_1336.25_S5R7;../../projects/einstein.phys.uwm.edu/h1_1336.30_S5R4;../../projects/einstein.phys.uwm.edu/h1_1336.30_S5R7;../../projects/einstein.phys.uwm.edu/l1_1336.30_S5R4;../../projects/einstein.phys.uwm.edu/l1_1336.30_S5R7;../../projects/einstein.phys.uwm.edu/h1_1336.35_S5R4;../../projects/einstein.phys.uwm.edu/h1_1336.35_S5R7;../../projects/einstein.phys.uwm.edu/l1_1336.35_S5R4;../../projects/einstein.phys.uwm.edu/l1_1336.35_S5R7;../../projects/einstein.phys.uwm.edu/h1_1336.40_S5R4;../../projects/einstein.phys.uwm.edu/h1_1336.40_S5R7;../../projects/einstein.phys.uwm.edu/l1_1336.40_S5R4;../../projects/einstein.phys.uwm.edu/l1_1336.40_S5R7
2010-12-11 07:54:58.4268 (19472) [debug]: Flags: LAL_NDEBUG, HS_OPTIMIZATION, i386, SSE, SSE2, GNUC
2010-12-11 07:54:58.4268 (19472) [debug]: glibc version/release: 2.11.2/stable
2010-12-11 07:54:58.4270 (19472) [debug]: Set up communication with graphics process.
Code-version: %% LAL: 6.4.1.3 (CLEAN b7467eb27cacb65a3f482f602dd7da1308de7915)
%% LALPulsar: 1.0.0.3 (CLEAN b7467eb27cacb65a3f482f602dd7da1308de7915)
%% LALApps: 6.4.0.3 (CLEAN b7467eb27cacb65a3f482f602dd7da1308de7915)

2010-12-11 07:54:58.5599 (19472) [normal]: Reading input data ... done.
% --- GPS reference time = 847063082.5000 , GPS data mid time = 847063082.5000
% --- Setup, N = 205, T = 90000s, Tobs = 56435059s, gammaRefine = 1399.000000
2010-12-11 07:55:31.8709 (19472) [normal]: INFO: No checkpoint h1_1335.85_S5R4__868_S5GC1HFa_1_0.cpt found - starting from scratch
% --- Cpt:0, total:834, sky:1/139, f1dot:1/6
2010-12-11 07:55:31.8808 (19472) [normal]: 1/1
% --- CG:9881 FG:10423949 f1dotmin_fg:-2.931052841924e-09 df1dot_fg:4.128328706926e-13
2010-12-11 07:56:10.3103 (19472) [normal]: 1/2
2010-12-11 07:56:48.9588 (19472) [normal]: 1/3
2010-12-11 07:57:29.1210 (19472) [normal]: 1/4
2010-12-11 07:58:09.9744 (19472) [normal]: 1/5
2010-12-11 07:58:51.4163 (19472) [normal]: 1/6
2010-12-11 07:59:31.8327 (19472) [normal]: 2/1
2010-12-11 08:00:10.9072 (19472) [normal]: 2/2
2010-12-11 08:00:52.3731 (19472) [normal]: 2/3
2010-12-11 08:01:32.9458 (19472) [normal]: 2/4
2010-12-11 08:02:12.9476 (19472) [normal]: 2/5
ERROR: Stepped outside the coarse grid!
2010-12-11 08:02:48.3299 (19472) [CRITICAL]: ERROR: MAIN() returned with error '13'
FPU status flags:
2010-12-11 08:02:48.3300 (19472) [normal]: done. calling boinc_finish(13).
08:02:48 (19472): called boinc_finish
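
For anyone triaging a batch of failed tasks, the stderr text above can be summarized mechanically. Here is a minimal Python sketch, assuming the log format shown in this thread (progress lines ending in `n/m`, which appear to be the sky-point and f1dot counters). The names `summarize_stderr` and `SAMPLE` are hypothetical and made up for illustration; they are not part of BOINC or the Einstein@Home application.

```python
import re

# Progress lines look like "2010-12-11 08:02:12.9476 (19472) [normal]: 2/5".
PROGRESS_RE = re.compile(r"\[normal\]:\s+(\d+)/(\d+)\s*$")
# The application reports its status as "MAIN() returned with error '13'".
EXIT_RE = re.compile(r"MAIN\(\) returned with error '(\d+)'")
GRID_ERROR = "Stepped outside the coarse grid"


def summarize_stderr(text):
    """Return the last progress step seen, whether the grid error occurred,
    and the reported exit code (None if a field was not found)."""
    last_step = None
    grid_error = False
    exit_code = None
    for line in text.splitlines():
        match = PROGRESS_RE.search(line)
        if match:
            last_step = (int(match.group(1)), int(match.group(2)))
        if GRID_ERROR in line:
            grid_error = True
        match = EXIT_RE.search(line)
        if match:
            exit_code = int(match.group(1))
    return {"last_step": last_step, "grid_error": grid_error, "exit_code": exit_code}


# Tail of the stderr output quoted in this post.
SAMPLE = """\
2010-12-11 08:01:32.9458 (19472) [normal]: 2/4
2010-12-11 08:02:12.9476 (19472) [normal]: 2/5
ERROR: Stepped outside the coarse grid!
2010-12-11 08:02:48.3299 (19472) [CRITICAL]: ERROR: MAIN() returned with error '13'
2010-12-11 08:02:48.3300 (19472) [normal]: done. calling boinc_finish(13).
"""

if __name__ == "__main__":
    print(summarize_stderr(SAMPLE))
```

Running this over several failed tasks would show whether the error always strikes at the same step (which would point at the data) or at random steps (which would point at the host).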

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6596
Credit: 340222537
RAC: 135964

I've moved this query from 'Getting Started'. :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4352
Credit: 253910220
RAC: 34663

We're getting some of these errors recently. My current guess is that it's another symptom of the "preemption kernels", see e.g. this thread. Try a different kernel.
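
A quick way to see which kernel a host is actually running is to look at the release and version strings. The Python sketch below is only illustrative: `classify_kernel` is a hypothetical helper, and it assumes that openSUSE appends the flavor ("desktop", "default") as the last dash-separated field of the release string and that preemptible builds carry a `PREEMPT` token in the version string. Neither assumption is confirmed in this thread.

```python
import platform


def classify_kernel(release, version):
    """Guess the kernel flavor and whether the build advertises preemption.

    release: string such as platform.release() returns
    version: string such as platform.version() returns
    """
    # Assumption: the flavor is the last dash-separated field and is purely
    # alphabetic; a release like "2.6.34.7-0.5" has no flavor suffix.
    last_field = release.rsplit("-", 1)[-1]
    flavor = last_field if last_field.isalpha() else "unknown"
    # Assumption: preemptible builds list a token starting with PREEMPT.
    preempt = any(token.startswith("PREEMPT") for token in version.split())
    return {"flavor": flavor, "preempt": preempt}


if __name__ == "__main__":
    # Reports on the machine this script runs on.
    print(classify_kernel(platform.release(), platform.version()))
```

If the flavor comes back as "desktop" or the preempt flag is set, trying a different kernel as suggested above would be the first thing to test.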

BM

Apa

Quote:

We're getting some of these errors recently. My current guess is that it's another symptom of the "preemption kernels", see e.g. this thread. Try a different kernel.


I tried to read through these, but was not able to make the changes needed. How do I run a different kernel? Do I need to just run openSUSE 11.1? Can someone please tell me if I can change some configurations or what I need to do to be able to crunch E@H.

Thanks for your help.

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

Reboot your machine, and when you see the message to do so, hit a key to bring up the boot menu. If you see a "default" kernel, choose to boot from it. (Ironically, the "Desktop" kernel, which is likely the source of your problems, is actually set as the default.)

If the "default" kernel isn't installed, open YaST and install it from there.

Apa

Yes. I get to choose between the default (desktop) and failsafe. The desktop option is the one that has not been working.
Should I be using the failsafe?

Donald A. Tevault

No, unfortunately, this is a bit confusing. (But then, most things with OpenSuSE are, which is one reason why I hate it.)

The "Desktop" kernel that's set as the default isn't the same as the kernel that's named "Default". To help clarify, here are the relevant lines from my GRUB configuration file.

default 0
timeout 8
##YaST - generic_mbr
gfxmenu (hd0,0)/message
##YaST - activate

###Don't change this comment - YaST2 identifier: Original name: linux###
title openSUSE 11.2 - 2.6.31.14-0.4 (default)
root (hd0,0)
kernel /vmlinuz-2.6.31.14-0.4-default root=/dev/disk/by-id/ata-WDC_WD6400AACS-00G8B1_WD-WCAUF2791337-part2 resume=/dev/disk/by-id/ata-WDC_WD6400AACS-00G8B1_WD-WCAUF2791337-part3 splash=silent showopts vga=0x375
initrd /initrd-2.6.31.14-0.4-default

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- openSUSE 11.2 - 2.6.31.14-0.4 (default)
root (hd0,0)
kernel /vmlinuz-2.6.31.14-0.4-default root=/dev/disk/by-id/ata-WDC_WD6400AACS-00G8B1_WD-WCAUF2791337-part2 showopts ide=nodma apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 x11failsafe vga=0x375
initrd /initrd-2.6.31.14-0.4-default

###Don't change this comment - YaST2 identifier: Original name: linux###
title Desktop -- openSUSE 11.2 - 2.6.31.14-0.4
root (hd0,0)
kernel /vmlinuz-2.6.31.14-0.4-desktop root=/dev/disk/by-id/ata-WDC_WD6400AACS-00G8B1_WD-WCAUF2791337-part2 resume=/dev/disk/by-id/ata-WDC_WD6400AACS-00G8B1_WD-WCAUF2791337-part3 splash=silent showopts vga=0x375
initrd /initrd-2.6.31.14-0.4-desktop

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- openSUSE 11.2 - 2.6.31.14-0.4 (desktop)
root (hd0,0)
kernel /vmlinuz-2.6.31.14-0.4-desktop root=/dev/disk/by-id/ata-WDC_WD6400AACS-00G8B1_WD-WCAUF2791337-part2 showopts ide=nodma apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 x11failsafe vga=0x375
initrd /initrd-2.6.31.14-0.4-desktop

Here, I have both the "Desktop" and the "Default" kernels installed. And, it happens that I have the "Default" kernel set as the default for when I boot the machine. But, I could easily change that to make the "Desktop" kernel the default. (Yeah, it sounds crazy, but that's the way the SuSE folk do pretty much everything.)

Anyway, it sounds like you don't have the "Default" kernel installed, so you'll want to open your package manager and search for a kernel with "default" in its name.
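
To make the "default entry" versus "Default kernel" distinction concrete, here is a rough Python sketch that reads a GRUB legacy menu file like the one above and reports which kernel flavor each entry boots and which entry `default N` selects. The names `parse_menu_lst`, `booted_entry` and `SAMPLE` are hypothetical, and `SAMPLE` is a shortened copy of my file with the root and resume arguments left out. It assumes space-separated `default N` lines, as shown above.

```python
def parse_menu_lst(text):
    """Return (default_index, entries) for a GRUB legacy menu file.

    Each entry is a dict with the menu title and the kernel flavor taken
    from the last dash-separated field of the kernel image name.
    """
    default_index = 0
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and YaST comment lines
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) == 2 else ""
        if key == "default" and value.isdigit():
            default_index = int(value)  # zero-based index into the entries
        elif key == "title":
            entries.append({"title": value, "flavor": None})
        elif key == "kernel" and entries and value:
            image = value.split()[0]  # e.g. /vmlinuz-2.6.31.14-0.4-desktop
            suffix = image.rsplit("-", 1)[-1]
            entries[-1]["flavor"] = suffix if suffix.isalpha() else None
    return default_index, entries


def booted_entry(text):
    """Return the entry that 'default N' selects, or None if out of range."""
    index, entries = parse_menu_lst(text)
    return entries[index] if 0 <= index < len(entries) else None


SAMPLE = """\
default 0
timeout 8

title openSUSE 11.2 - 2.6.31.14-0.4 (default)
    kernel /vmlinuz-2.6.31.14-0.4-default showopts

title Failsafe -- openSUSE 11.2 - 2.6.31.14-0.4 (default)
    kernel /vmlinuz-2.6.31.14-0.4-default showopts x11failsafe

title Desktop -- openSUSE 11.2 - 2.6.31.14-0.4
    kernel /vmlinuz-2.6.31.14-0.4-desktop showopts
"""

if __name__ == "__main__":
    print(booted_entry(SAMPLE))
```

With `default 0`, the entry picked is the first `title` block, which here boots the "default" flavor. On a stock install where only the "desktop" kernel is present, the same index would land on a desktop entry instead.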

Apa

I found the default kernel in YaST and installed it. I now have it running and have downloaded some more WUs. We'll see in the morning how this works out.

I sure hope this fixes it. Then I'll only have to automate it and forget about it.

I'll post back on how it goes.

Donald A. Tevault

Quote:

I found the default kernel in YAST and installed it. I now have it running and downloaded some more WUs. We'll see in the morning how this works out.

I sure hope this fixes it. Then I'll only have to automate it and forget about it.

I'll post back on how it goes.

That should fix it. The "Default" kernel has never caused a problem on either of my OpenSuSE boxen. (By the way, whatever happened to the guy who always used the word "boxen" in his posts?)
