S5R2c Client Error

Jim Howe

Joined: 25 Mar 05

Posts: 18

Credit: 11707416

RAC: 0

On my machines I am getting

26 Apr 2007 1:23:01 UTC

Message 62704

On my machines I am getting SIGABRT in every case where a machine is using core_client_5.4.9 and a WU '...S5R2c_1', but not in other cases. I have three machines running core_client_5.8.15 (two AMD, one Intel), and so far these machines are completing these WUs successfully. All of my machines running 5.4.9 are getting the SIGABRT errors described.

Jim Howe
Zhuhai, China and Portland, Oregon

Ananas

Joined: 22 Jan 05

Posts: 272

Credit: 2500681

RAC: 0

Here's one more (not mine),

26 Apr 2007 6:16:25 UTC

Message 62705

Here's one more (not mine). He had no problems with S5RIa, but every S5R2c gives him:

5.4.9

Couldn't start or resume: -108

This one gets a totally different error with S5R2c :

5.2.13
process exited with code 2 (0x2)

2007-04-20 20:28:51 [Einstein@Home] execv(../../projects/einstein.phys.uwm.edu/einstein_S5R2_4.14_i686-pc-linux-gnu) failed: error -1
execv: No such file or directory

These totally different errors look very much like a pointer or array-size problem. Too many open files could cause something like that too.

Sir Barsteward ...

Joined: 8 Apr 06

Posts: 5

Credit: 84336737

RAC: 0

RE: All my WU - client

26 Apr 2007 9:31:55 UTC

Message 62706 in response to message 62697

Quote:

All my WU - client error. :(

Since switching over I've lost 730000 CPU secs (280 credits) and have yet to see one reach a successful conclusion :o((

The Barsteward

Beer is proof that God loves us and wants us to be happy.
--Benjamin Franklin

Sir Barsteward ...

Joined: 8 Apr 06

Posts: 5

Credit: 84336737

RAC: 0

RE: RE: All my WU -

26 Apr 2007 14:06:26 UTC

Message 62707 in response to message 62706

Quote:

Quote:
All my WU - client error. :(

Since switching over I've lost 730000 CPU secs (280 credits) and have yet to see one reach a successful conclusion :o((

Another 60000 secs lost for no apparent reason, so I will not transfer any more WUs until it seems to be resolved.

The Barsteward

Beer is proof that God loves us and wants us to be happy.
--Benjamin Franklin

Ananas

Joined: 22 Jan 05

Posts: 272

Credit: 2500681

RAC: 0

Got a new one @ CPU time =

26 Apr 2007 17:00:30 UTC

Message 62708

Got a new one @ CPU time = 22089 :

- Unhandled Exception Record -
Reason: Privileged Instruction (0xc0000096) at address 0x0044FC48

http://einsteinathome.org/workunit/33428928

The other result in that WU is nice too :

5.2.13
Maximum disk usage exceeded

Nice to see that Bruce Allen still uses 5.2.13 too :-)

Signal 11 on Bruce's box as well, btw: resultid=83588082

jjwhalen

Joined: 21 Jun 06

Posts: 7

Credit: 645238

RAC: 0

One of my S5R2 workunits:

27 Apr 2007 3:25:40 UTC

Message 62709

One of my S5R2 workunits: http://einsteinathome.org/workunit/33355866 looks to be stuck in database limbo.

Its status shows 2 successful results (a quorum) pending validation, plus a 3rd with client error (hence posting in this thread), plus a 4th 'unsent'. The WU appears in my Results list as 'pending' but doesn't show up in my Pending Credit list at all (hence the limbo).

No change in validation status for over 24 hours. No indication what's keeping it from validating with a quorum, though I notice that Server Status doesn't yet show the S5R2 validators/assimilators either up OR down.

It's not about the credit; I just hate to see 30-plus hours of CPU time go down the toilet, since I have other project mouths to feed. With the larger workunits, time invested in a single result that won't validate becomes a concern.

B/W

Best wishes :)

archae86

Joined: 6 Dec 05

Posts: 3161

Credit: 7305018356

RAC: 2286554

RE: One of my S5R2

27 Apr 2007 3:34:47 UTC

Message 62710 in response to message 62709

Quote:

One of my S5R2 workunits: http://einsteinathome.org/workunit/33355866 looks to be stuck in database limbo.

...

No indication what's keeping it from validating with a quorum, though I notice that Server Status doesn't yet show the S5R2 validators/assimilators either up OR down.

The validator has looked, and does not like what it sees enough to accept the result. If you look at the end of the result detail, this shows as:

Checked, but no consensus yet

So another result is already prepared to send out to another host. When that goes out and comes back, if it is sufficiently similar to one of the two already returned, it will decide the issue of who is right.

I'll speculate that one of the hosts suffered a computational error, but not of the sort which generates an illegal memory access or anything else caught by run-time checking in the application.
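To make the "no consensus yet" state concrete, here is a minimal sketch of a quorum check. It is not the actual Einstein@Home validator: the function names, the representation of a result as a list of numbers, and the relative tolerance are all assumptions chosen for illustration.

```python
from math import isclose


def results_agree(a, b, rel_tol=1e-5):
    """Hypothetical similarity test: same length and element-wise close."""
    return len(a) == len(b) and all(
        isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def find_consensus(results, min_quorum=2, rel_tol=1e-5):
    """Return the indices of the first group of at least min_quorum results
    that all agree with one candidate result, or None if no such group
    exists yet ("Checked, but no consensus yet")."""
    for candidate in results:
        group = [
            j for j, other in enumerate(results)
            if results_agree(candidate, other, rel_tol)
        ]
        if len(group) >= min_quorum:
            return group
    return None


# Two results came back but disagree, so the work unit stays pending.
returned = [[1.0, 2.0], [1.0, 2.5]]
print(find_consensus(returned))   # None

# A third result matches the first one closely enough to settle it.
returned.append([1.0, 2.0000001])
print(find_consensus(returned))   # [0, 2]
```

Under these assumptions, two returned results that differ leave the work unit undecided, and a third result only settles it if it is close enough to one of the first two.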

PaperDragon

Joined: 31 Mar 05

Posts: 6

Credit: 72813968

RAC: 0

So far every one has errored

27 Apr 2007 17:12:16 UTC

Message 62711

So far every one has errored out on one of my machines. It says client error, and when I look into the work units it has 'invalid function' logged in it.

Machine with errors

You like Myst? Uru Live returns! www.urulive.com

Adi

Joined: 1 Jan 06

Posts: 11

Credit: 43581749

RAC: 0

I have about 35 hosts, maybe

27 Apr 2007 21:06:38 UTC

Message 62712 in response to message 62711

I have about 35 hosts, maybe more.
Almost none of them have received credit in the last week.
NONE of them are overclocked; almost all are Linux servers at different companies.

For example, my dual Xeon at home CANNOT be overclocked (HP xw6000)
(BIOSes for dual-processor machines don't have such options; servers are made for stability),
but I haven't received credit since 22 Apr.

results

more than 800 credit lost, and that's only on 1 host!

ALL other projects (seti, climate, predictor) are OK on ALL hosts

for me the decision is simple:

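#include <stdbool.h>

/* Placeholders: answer these from next week's results and the forum. */
static bool seeVeryFewErrors(void)  { return false; }
static bool seeFewBugMessages(void) { return false; }

static int  resourceShare = 1;
static bool veryFewClientErrors = false;
static bool bugMessages = true;
/* Days. long, not int: 365*100 = 36500 > 32767 on a 16-bit int. */
static long date = 1;
/* Let's hope it'll not take more than 100 years. */
static const long theEndOfTheUniverse = 365L * 100;

/* Placeholder: restore the previous resource share. */
static void increaseBack(int *share) { (*share)++; }

static void checkNextWeek(void) {
    if (seeVeryFewErrors())  veryFewClientErrors = true;
    if (seeFewBugMessages()) bugMessages = false;
    date += 7;
} /* end check */

int main(void) {
    while (date < theEndOfTheUniverse) {
        checkNextWeek();
        if (veryFewClientErrors && !bugMessages) {
            increaseBack(&resourceShare);
            return 0;
        } /* end if */
    } /* end while */
    return 1;
} /* end main */
/* EOF */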

Have a nice (and successful bug-hunting) week.

wijata.com

Joined: 11 Feb 05

Posts: 113

Credit: 25495895

RAC: 0

It seems that every WU that

2 May 2007 5:58:57 UTC

Message 62713

It seems that every WU that was interrupted/resumed gets a compute error with signal 11/SIGABRT on a Linux machine.
Example: http://einsteinathome.org/task/83757575, and this host has more like it.
It's a pity, as I have to restart them quite often...
