BRP5 Validate Errors && Error While Computing

mountkidd
Joined: 14 Jun 12
Posts: 176
Credit: 12555432555
RAC: 8015741
Topic 197643

I am seeing an unusually high number of EWC (Error While Computing) errors across all my machines in the period from 18 July to 20 July. All of the EWC errors occur within the first few seconds of startup. The Validate Errors have occurred on 3 different hosts with a mix of AMD and NVIDIA GPUs.

I have looked at a number of the errored WUs, and the same errors seem to be occurring for the wingmen as well. Is there a problem with this batch of data?

Gord

Gavin
Joined: 21 Sep 10
Posts: 191
Credit: 40644337738
RAC: 2

I have also seen a fair number of tasks fail in the same way across my fleet.

I've had a quick look around the top hosts for others with errors and found many machines have had the same problem.

The common factor being task names beginning PB0028_003xx_xxx_x

It could be a bad batch of data, but I have seen more than one instance where these tasks went on to be successfully completed and validated by wingmen.

Gav.

edit: example here

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

Quote:
The common factor being task names beginning PB0028_003xx_xxx_x


Same here. I have 3 validate errors and 2 error while computing on BRP5 and all are PB0028_003 tasks.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7219624931
RAC: 975722

Quote:
Quote:
The common factor being task names beginning PB0028_003xx_xxx_x

Same here. I have 3 validate errors and 2 error while computing on BRP5 and all are PB0028_003 tasks.


I am also seeing this on one of my hosts with a Maxwell (GTX 750) GPU.

Four "error while computing" outcomes on PB0028_003xx_xxx_x tasks, all with elapsed times between 2 and 3 seconds.

In case they are of any use, here are links to the stderr output pages for my four:

error 1
error 2
error 3
error 4

Gavin
Joined: 21 Sep 10
Posts: 191
Credit: 40644337738
RAC: 2

Just checked the logs across the whole of my fleet and found a total of 81 PB0028_003xx_xxx_x tasks have ended in error. Fortunately, 61 of those tasks failed after only a few seconds; the remaining 20 crunched to completion only to end up as validate errors, which is a tad annoying :)

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1587495744
RAC: 753436

Yes, that is annoying. I surely hope they get this data set figured out soon.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1587495744
RAC: 753436

I wonder if we will be told what went wrong.

Maximilian Mieth
Joined: 4 Oct 12
Posts: 130
Credit: 10275215
RAC: 3718

sry, wrong post...

The Xorcist
Joined: 16 Aug 11
Posts: 16
Credit: 464281554
RAC: 0

I realise my machine means very little, but the amount of electricity alone wasted on some of these tasks seems a real shame.

What is going on here?

And another compute error (Unhandled Exception Detected...):
http://einsteinathome.org/task/446602736

mikey
Joined: 22 Jan 05
Posts: 12679
Credit: 1839079911
RAC: 3931

Quote:

I realise my machine means very little, but the amount of electricity alone wasted on some of these tasks seems a real shame.

What is going on here?

And another compute error (Unhandled Exception Detected...):
http://einsteinathome.org/task/446602736

Are you leaving a CPU core free to feed the GPU? With only 4 GB of RAM in that machine, not doing so could be at least part of the problem. The percentage of CPU usage shown in the BOINC Manager is apparently just a figment of someone's imagination compared to what is really being used. If you are not, try it and see if the errors go away; if they do not, you have at least eliminated one possible cause.