Lots of Client errors

mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200374822
RAC: 0

Message 96016 in response to message 96015

Quote:

Hello,
I have discussed this problem in one of our national forums, and there was a suggestion to run an infinite loop and try to provoke client errors. I did so, and the result is that all workunits computed during the test finished with an error.

The test conditions were simple: I ran two infinite loops for approx. 18 hours, which occupied two of the four cores of my CPU. The remaining two cores were shared among four BOINC processes (Einstein and Rosetta).

Could you please try it too, to check whether this error can be reproduced on different hardware or a different distro?

Pushkin

Just to make sure: you are testing with the fully preemptible kernel (CONFIG_PREEMPT=y), right?

Cheers,
Michael

Pushkin
Joined: 12 Mar 07
Posts: 15
Credit: 33187685
RAC: 0

Message 96017 in response to message 96016

Quote:
Just to make sure: you are testing with the fully preemptible kernel (CONFIG_PREEMPT=y), right?

Cheers,
Michael

Yes, I am using the default openSUSE 11.2 kernel 2.6.31.8 with the following settings:

# CONFIG_PREEMPT_RCU is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
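
For anyone who wants to compare their own machine against the listing above, here is a minimal sketch that reads kernel config text and reports which of the three preemption models is enabled. The function name `preemption_model` is made up for illustration, and the sample text is simply the listing from this post. On many distributions the real config is available as a file such as /boot/config-<kernel release>, but that location is an assumption and may differ on your system.

```python
import re

def preemption_model(config_text: str) -> str:
    """Return the enabled preemption option, or "unknown" if none is found."""
    # Collect every option that is switched on with "=y".
    enabled = set(
        re.findall(r"^(CONFIG_\w+)=y[ \t]*$", config_text, flags=re.MULTILINE)
    )
    # Only one of these three options is expected to be enabled at a time.
    for option in ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT"):
        if option in enabled:
            return option
    return "unknown"

# Sample input: the settings quoted in this post.
sample = """\
# CONFIG_PREEMPT_RCU is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
"""

print(preemption_model(sample))  # prints CONFIG_PREEMPT
```

To check a real machine, read the config file into a string and pass it to the function instead of `sample`.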

Pushkin

Caduceus
Joined: 5 Jan 10
Posts: 2
Credit: 2545752
RAC: 0

Hi!
I'm new at Einstein, but I have been running SETI successfully for over a year.
Every Einstein-WU ends with Client error -> Compute error ... SETI and climateprediction still work properly. What's wrong?

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Message 96019 in response to message 96018

Quote:
Hi!
I'm new at Einstein, but I have been running SETI successfully for over a year.
Every Einstein-WU ends with Client error -> Compute error ... SETI and climateprediction still work properly. What's wrong?

@Lars

All your results exit with the following error:

6.10.18

too many exit(0)s

See this thread for more info.

/Holmis

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Message 96020 in response to message 96018

The real problem is

14:10:28 (3144): Can't acquire lockfile (32) - exiting
14:10:28 (3144): Error: The process cannot access the file because it is being used by another process. (0x20)


A more recent thread about that problem would be this one.

Regards,
Gundolf
[edit]Did you try a reboot? Sometimes it's just a left-over process.[/edit]
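
To illustrate why a left-over process can cause the "Can't acquire lockfile" message, here is a minimal sketch of the general lockfile idea. It is not taken from the BOINC source: it uses exclusive file creation as a simple stand-in for whatever OS-level locking the science app really uses, and the names `acquire_lock`, `release_lock` and `demo` are made up for illustration. Note one difference: a file created this way stays behind after a crash, whereas an OS-level lock is released when the holding process ends.

```python
import os
import tempfile

def acquire_lock(path):
    """Try to create the lockfile exclusively; return a file descriptor or None."""
    try:
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Somebody else already holds the lock.
        return None

def release_lock(fd, path):
    """Close and delete the lockfile so the next task can acquire it."""
    os.close(fd)
    os.remove(path)

def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo_lockfile")
    first = acquire_lock(path)    # the task that is already running
    second = acquire_lock(path)   # a new task started while the first one still holds the lock
    release_lock(first, path)     # the old task finally goes away
    third = acquire_lock(path)    # now a new task can start
    release_lock(third, path)
    return first is not None, second is None, third is not None

print(demo())  # prints (True, True, True)
```

The second attempt fails for the same reason a new task fails while an old one is still hanging around in the same slot directory.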

Computers aren't everything in life. (Just kidding)

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 730762064
RAC: 1196003

This is very interesting, because a new Windows app is now being distributed automatically, which MIGHT cure this by eliminating the switcher app. The way the switcher app works might cause virus scanner heuristics to stop the GW search app.

In any case, you might want to have a look at your virus scanning software to check it's not blocking the app.

CU
HB

mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200374822
RAC: 0

Message 96022 in response to message 96017

Quote:
Yes, I am using the default openSUSE 11.2 kernel 2.6.31.8 with the following settings:

# CONFIG_PREEMPT_RCU is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set

Pushkin

Hi,

I have tested on three machines (one, two, three) with different kernel versions and different hardware.

The first one is a normal desktop machine running SUSE 11.2. When I switch to the fully preemptible kernel I get the signal 8 error (with or without infinite loops running). When set to CONFIG_PREEMPT_VOLUNTARY=y everything is OK.
The other two machines are two disk-less machines running a self-compiled LFS. The OS is stripped down to the minimum - so no GUI, multimedia software or unnecessary services running in the background. Both machines are running stable with CONFIG_PREEMPT=y - no errors so far. The kernel of the two LFS clients is the official kernel from Kernel HQ.
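
For anyone who wants to repeat the load test, here is a minimal sketch of one way to run the infinite loops discussed in this thread. It is an assumption about the setup, not the exact commands anyone here used: the function name `run_infinite_loops` is made up, each loop is a separate Python process, and the scheduler decides which cores they land on. The original test ran for approx. 18 hours; the example call below only runs for two seconds.

```python
import subprocess
import sys
import time

def run_infinite_loops(count: int, seconds: float) -> int:
    """Start `count` busy-looping processes, stop them after `seconds`.

    Returns the number of processes that were started and stopped.
    """
    # Each child runs a tight infinite loop and keeps one core busy.
    procs = [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(count)
    ]
    try:
        time.sleep(seconds)
    finally:
        # Always clean up, even if the sleep is interrupted with Ctrl-C.
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait()
    return len(procs)

if __name__ == "__main__":
    # Two loops, as in the test described earlier; raise `seconds` for a long run.
    stopped = run_infinite_loops(count=2, seconds=2.0)
    print(stopped, "loops stopped")
```

Run it alongside BOINC and watch whether the tasks computed during that time end with the signal 8 error.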

Cheers,
Michael

mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200374822
RAC: 0

OK,

just after finishing my previous post, one of my LFS machines (here) showed its first signal-8 error. So it does happen on non-SUSE systems. Interestingly, it's an AMD CPU as well.

Cheers,
Michael

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

I am running SuSE 11.1 on an Opteron 1210. The only error I had was on an orca-alpha unit at QMC@home. All other 8 CPUs errored on it, with both Linux and Windows OS, so it must have been a corrupt WU.
Tullio

mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200374822
RAC: 0

@tullio: As you said in an earlier post you are running your Linux with CONFIG_PREEMPT_NONE=y, right? The error shouldn't happen with this setting.

Meanwhile I have encountered another signal-8 error on the non-SUSE AMD machine (155842220).

Cheers,
Michael
