Validate error - What this really means!

CGR

Joined: 2 Sep 12

Posts: 3

Credit: 272118

RAC: 0

RE: It is not uncommon that

20 Jul 2013 14:00:13 UTC

Message 107890 in response to message 107887

Quote:

It is not uncommon that we have a few bad BRP4 "beams" every month that slip through pre-processing without being detected as such. Most of the tasks generated from these will end up as validate errors.

Normally we do have scripts and internal web pages that monitor these, and it's usually me who then cancels the respective workunits.

Is this what happened to WU #170248721? I never had a validate error before that and I would like to understand what caused it. So any information on this particular WU is appreciated.

Thanks in advance.

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 507064930

RAC: 87789

RE: what happened to WU

20 Jul 2013 14:06:34 UTC

Message 107891 in response to message 107890

Quote:

what happened to WU #170248721?

Already marked as cancelled. This happens from time to time. Nothing to worry about.

Maximilian Mieth

Joined: 4 Oct 12

Posts: 130

Credit: 10286732

RAC: 4018

Thanks to this thread I am

30 Jul 2013 8:08:04 UTC

Message 107892

Thanks to this thread I am well informed about validate errors, but what does 'validation inconclusive' actually mean? See e.g. this workunit.

MAGIC Quantum M...

Joined: 18 Jan 05

Posts: 1886

Credit: 1408157900

RAC: 1159282

That happens when there are

30 Jul 2013 8:36:43 UTC

Message 107893 in response to message 107892

That happens when different hosts return different results for the same task. It then sometimes takes a 3rd result to see whether 2 of the results agree... or whether none of them do.
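To make the idea concrete, here is a rough sketch of that kind of quorum check. It is not the project's actual validator: the function name `check_quorum`, the tolerance-based comparison, and the quorum of 2 are all assumptions chosen for illustration. Each result is modelled as a plain list of numbers.

```python
def check_quorum(results, min_quorum=2, rel_tol=1e-5):
    """Return ("valid", canonical_result) once enough results agree,
    otherwise ("inconclusive", None). Hypothetical helper, illustration only."""

    def agree(a, b):
        # Two results agree if they have the same length and every value
        # matches within a relative tolerance (assumed comparison rule).
        return len(a) == len(b) and all(
            abs(x - y) <= rel_tol * max(abs(x), abs(y), 1.0)
            for x, y in zip(a, b)
        )

    for candidate in results:
        matches = [r for r in results if agree(candidate, r)]
        if len(matches) >= min_quorum:
            return "valid", candidate
    return "inconclusive", None


host_a = [1.0, 2.0]
host_b = [1.0, 2.5]          # disagrees with host_a
host_c = [1.0, 2.0000001]    # agrees with host_a within tolerance

print(check_quorum([host_a, host_b]))          # two results that differ
print(check_quorum([host_a, host_b, host_c]))  # third result settles it
```

With only the two differing results the check stays inconclusive; once the third result arrives and matches one of them, that pair forms the quorum and the odd one out would be the invalid result.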

edenist

Joined: 18 Jan 05

Posts: 2

Credit: 17516500

RAC: 0

Hi everyone, I've just

1 Jun 2014 13:32:55 UTC

Message 107894

Hi everyone,

I've just built a new system and have been running BOINC on it this last week.
It's an AMD A10-7850K, running on Ubuntu 14.04 x64.

It seems that almost all of my BRP5 WUs are resulting in validate errors when run with the opencl-ati application.

I had a couple which passed last week, but all the rest are failing. Is there an issue with running GPU code on the APUs?

Here are a couple of WUs which have validate errors...

http://einsteinathome.org/task/437729215
http://einsteinathome.org/task/437808773
http://einsteinathome.org/task/437841431

And this one passed [http://einsteinathome.org/task/437809655]

Any information would be helpful.

Cheers!

[EDIT]:
Taking a closer look at the stderr logs from each WU, it appears that the two which completed successfully never had an application restart occur during their run, whereas all failed WUs did. Is it possible that the pause/resume is causing an error in the computation?

Gordon Lack

Joined: 19 Jun 13

Posts: 6

Credit: 1378284

RAC: 557

I've just had a

26 Jul 2014 1:58:11 UTC

Message 107895 in response to message 107894

I've just had a BRP5-opencl-ati complete and fail to validate.

http://einsteinathome.org/workunit/194577452

The stderr output finishes with:

Quote:

[01:39:49][4893][INFO ] Data processing finished successfully!

but the validate status on the job is:

Quote:

Workunit error - check skipped

(there are no errors mentioned...)

Mind you, I've just spotted that the link above now reports:

Quote:

errors WU cancelled

Logforme

Joined: 13 Aug 10

Posts: 332

Credit: 1714373961

RAC: 0

RE: I've just had a

26 Jul 2014 6:21:05 UTC

Message 107896 in response to message 107895

Quote:

I've just had a BRP5-opencl-ati complete and fail to validate.

This problem is covered in another thread

John Jamulla

Joined: 26 Feb 05

Posts: 32

Credit: 1172292348

RAC: 547295

Sorry to bother all of you,

26 Jul 2014 13:54:58 UTC

Message 107897

Sorry to bother all of you, but I would like to know what's wrong with my GPU crunching tasks....

I have a relatively new machine. It had all kinds of problems out of the box (bad Mobo, memory, CPU) for roughly the first 6 months I had it. Theoretically it's fixed now, though apparently not for GPU work with Einstein@Home.

The GPU tasks never seem to validate correctly.

It's an i7-3930k 6-core, AsRock Z77 Extreme 6 Mobo, OC-worthy Mobo and memory, excellent Corsair PSU, waterblock cooler, etc. There isn't a heat problem with it, memory and CPU are fine, and most of the tasks from the CPU are fine and working as expected (no errors most of the time).

The CPU is using 12 threads and is overclocked to 4.3 GHz (the GPU is not overclocked). I have a GTX 770 in it.
It appears the GPU is running, but NO TASK ever validates. I don't see a single task from the GPU as "good".

In my list of tasks under "invalid", I either get "Validate error" or "Completed, marked as invalid"
My computer ID: 11453074
Tasks are all (ON GPU): BRP5-cuda32-nv301

Here's at least one of each type with validation errors, can someone tell me what's wrong?

Loading GPU driver 337.88 "fresh" from NVIDIA now to see if it matters...
http://einsteinathome.org/workunit/194670882 - Binary Radio Pulsar Search (Perseus Arm Survey) v1.39 (BRP5-cuda32-nv301)

This one completed, but was marked as invalid:
http://einsteinathome.org/workunit/194648623

How can I tell what's going wrong? There doesn't seem to be a graphics type problem with my GPU, it's fine on screen (no games on this machine, cruncher only).

I am set to run 2 tasks simultaneously on the GPU...

Gavin

Joined: 21 Sep 10

Posts: 191

Credit: 40644337738

RAC: 1

Here's a snippet from one of

26 Jul 2014 14:16:52 UTC

Message 107898 in response to message 107897

Here's a snippet from one of your output files (it doesn't matter which; they are all the same) that hopefully gives a clue, in bold!:

7.2.42

Activated exception handling...
[15:01:26][3632][INFO ] Starting data processing...
[15:01:26][3632][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 299 MB (1750 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[15:01:26][3632][INFO ] Using CUDA device #0 "GeForce GTX 770" (0 CUDA cores / 0.00 GFLOPS)
[15:01:26][3632][INFO ] Version of installed CUDA driver: 6000
[15:01:26][3632][INFO ] Version of CUDA driver API used: 3020
[15:01:27][3632][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...

I would be tempted to remove then re-install the driver for your GPU with a fresh copy from the NVidia website.

Claggy

Joined: 29 Dec 06

Posts: 560

Credit: 2699403

RAC: 0

RE: ------> Used in total:

26 Jul 2014 14:28:51 UTC

Message 107899 in response to message 107898

Quote:

------> Used in total: 299 MB (1750 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[15:01:26][3632][INFO ] Using CUDA device #0 "GeForce GTX 770" (0 CUDA cores / 0.00 GFLOPS)
[15:01:26][3632][INFO ] Version of installed CUDA driver: 6000
[15:01:26][3632][INFO ] Version of CUDA driver API used: 3020

I wouldn't worry about that; the CUDA 3.2 API doesn't know how GTX 770s are made up.
Arvid Almstrom's GTX 780 host, the top Nvidia host, also doesn't report the number of CUDA cores or GFLOPS:

http://einsteinathome.org/host/6216490

[18:11:04][11780][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 675 MB (2399 MB free / 3074 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[18:11:04][11780][INFO ] Using CUDA device #1 "GeForce GTX 780" (0 CUDA cores / 0.00 GFLOPS)
[18:11:04][11780][INFO ] Version of installed CUDA driver: 6000
[18:11:04][11780][INFO ] Version of CUDA driver API used: 3020

and on my GT650M:

[18:30:27][5600][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 71 MB (1978 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[18:30:27][5600][INFO ] Using CUDA device #0 "GeForce GT 650M" (0 CUDA cores / 0.00 GFLOPS)
[18:30:27][5600][INFO ] Version of installed CUDA driver: 6050
[18:30:27][5600][INFO ] Version of CUDA driver API used: 3020

Claggy
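For anyone puzzling over the 6000 / 3020 / 6050 numbers in these logs: CUDA reports versions as a single integer, 1000 × major + 10 × minor, so 3020 is the CUDA 3.2 API mentioned above. Below is a small sketch that pulls those two lines out of a stderr log and decodes them. The helper names `decode_cuda_version` and `summarize` are made up for this example, and the regex only assumes the log wording shown in the snippets in this thread.

```python
import re

# Matches the two version lines as they appear in the stderr snippets above.
VERSION_LINE = re.compile(
    r"Version of (installed CUDA driver|CUDA driver API used): (\d+)"
)


def decode_cuda_version(encoded):
    """Turn CUDA's integer encoding (1000*major + 10*minor) into 'major.minor'."""
    major, rest = divmod(encoded, 1000)
    return f"{major}.{rest // 10}"


def summarize(log_text):
    """Map each version label found in the log to a readable version string."""
    return {
        m.group(1): decode_cuda_version(int(m.group(2)))
        for m in VERSION_LINE.finditer(log_text)
    }


log = """[15:01:26][3632][INFO ] Version of installed CUDA driver: 6000
[15:01:26][3632][INFO ] Version of CUDA driver API used: 3020"""

print(summarize(log))
# {'installed CUDA driver': '6.0', 'CUDA driver API used': '3.2'}
```

Read this way, the logs only say that a CUDA 6.x driver is serving an application built against the 3.2 API, which fits the explanation that the zero core count and GFLOPS are a reporting gap rather than a fault.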
