Validate error - What this really means!

robertmiles
Joined: 8 Oct 09
Posts: 127
Credit: 29047553
RAC: 41795

Quote:

OK, my host has not produced a single valid BRP4cuda32 task; all results are coming back as validate errors.
OS: Ubuntu 12.04 64-bit
32-bit compatibility libraries are installed as per the BOINC instructions.
Before I'm told it must be a hardware problem: GPU apps for SETI@home and GPUGRID work fine.
No overclocking is in use; the BIOS is specifically set to run at stock speeds.
The computer, GPU, RAM and hard drive are all new, with not a speck of dust, and all fans work.
RAM has been tested for 48 hours straight and did not produce a single error.
GPU is a GTX 460 2WIN: two 460s on one card (just FYI).

I've been through 6 builds of the NVIDIA driver to find one that lets SETI@home work. Maybe I just don't have the right one yet?

With Ubuntu 12.xx I can no longer get NVClock to work (it worked with older Ubuntu releases), so I have no manual voltage or fan control of the GPU. Manual attempts to change individual functions via nv-config act as if something happened, but they give no confirmation output and the features do not change from their default settings.
Any ideas?

Something to check: How many of the workunits that didn't validate had at least one wingmate that also ran them under Linux? Was there a difference in the failure rate depending on whether your wingmates ran them under Linux or under some other operating system?

Tron
Joined: 5 Nov 12
Posts: 8
Credit: 49207
RAC: 0

Quote:

Something to check: How many of the workunits that didn't validate had at least one wingmate that also ran them under Linux? Was there a difference in the failure rate depending on whether your wingmates ran them under Linux or under some other operating system?

After examining a selection of workunits, I find no correlation with the OS of my wingmates; in fact, all BRP4cuda WUs that ran on my machine failed to validate.

Tron
Joined: 5 Nov 12
Posts: 8
Credit: 49207
RAC: 0

Hello?

I still can't figure out why my CUDA WUs don't pass validation...
Not one.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5870
Credit: 116965878245
RAC: 36810368

Quote:
Still can't figure out why my CUDA WUs don't pass validation...
not one.


It's not a matter of "passing validation". They don't even get to the point of being compared with another result. Before a comparison is even attempted, the validator does a 'sanity check' on your result and aborts the comparison with a 'Validate error' if it doesn't like what it sees. That's why advice to look at anything to do with the other result(s) in the quorum is quite irrelevant.

I picked one of your validate errors at random and checked what is recorded about it in the database. This is what I found:

Server state 	Over [5]
Outcome 	Validate error [6] (00001000)
- a number is out of valid range for this result
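
As a rough illustration of what such a 'sanity check' might look like, here is a minimal Python sketch. It is not the project's validator code: the function name, the bounds and the sample values are all made up for illustration. It only shows the general idea of rejecting a result before any comparison with a wingmate's result takes place.

```python
import math


def passes_sanity_check(values, lower, upper):
    """Return True only if every value is finite and lies in [lower, upper].

    Hypothetical helper: a real validator checks project-specific fields
    with project-specific limits.
    """
    for v in values:
        # NaN and +/-inf are typical symptoms of a bad GPU computation.
        if not math.isfinite(v):
            return False
        # A finite number can still be outside the range the science allows.
        if v < lower or v > upper:
            return False
    return True


if __name__ == "__main__":
    # Illustrative values only, not taken from any real result file.
    plausible = [0.5, 12.0, 99.9]
    corrupted = [0.5, float("nan"), 99.9]
    print(passes_sanity_check(plausible, 0.0, 100.0))
    print(passes_sanity_check(corrupted, 0.0, 100.0))
```

A result that fails a check of this kind would be marked as a validate error on its own, whatever the other hosts in the quorum returned.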

It is possible (although quite unlikely) that there is some sort of bug in the app code. I originally started this thread because of the large discrepancy in validate error rates for the FGRP app between the Windows and Unix based variants. To me, that behaviour pointed to the likelihood of a bug. That doesn't seem to be the case with the BRP4 app.

I didn't respond to your first post because the tone of your message was such that you didn't want to be told it was likely to be a 'hardware problem' of some description. Unfortunately, that's the only advice I could offer.

Cheers,
Gary.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

I think I would agree the problem appears hardware related.

The 2WIN is two GTX 460s in SLI.

I have two GTX 460s running on Ubuntu 10.04, not in SLI, without problems.

This thread suggests problems with running in SLI mode.

I would try disabling SLI and/or running a single GPU task on a single card.

Tron
Joined: 5 Nov 12
Posts: 8
Credit: 49207
RAC: 0

Thanks for replying.

Darn, I have no control over this GPU card with this OS (Ubuntu 12.04).
Anyone know if nvclock works under Mint?

Quote:
I didn't respond to your first post because the tone of your message was such that you didn't want to be told it was likely to be a 'hardware problem' of some description. Unfortunately, that's the only advice I could offer.

Sorry it sounded that way. I only meant to rule out the most obvious causes, which have already been checked and re-checked. I should not have discounted the possibility of a hardware problem 'all inclusively'.

Actually, I kind of assumed it was, but I did not know what to look for.

So this card runs in SLI mode with itself... that might just be the answer I needed.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:

Thanks for replying.

Darn, I have no control over this GPU card with this OS (Ubuntu 12.04).
Anyone know if nvclock works under Mint?

No idea.

I would search the Ubuntu help forums for nvidia-xconfig; the options --sli=off and --multigpu=off are probably close to what you need.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 450

Quote:
Anyone know if nvclock works under mint?


It doesn't. However, to use the fan speed control, follow the steps in this EVGA thread.

Tron
Joined: 5 Nov 12
Posts: 8
Credit: 49207
RAC: 0

Fan control achieved! Very cool. I tried this once before from a different "how to" and it did not work.
The link you provided was a winner!

Testing the SLI off and MultiGPU off settings now; will report back.

Tron
Joined: 5 Nov 12
Posts: 8
Credit: 49207
RAC: 0

xorg log wrote:

[ 13.183] (**) NVIDIA(0): Option "SLI" "off"
[ 13.183] (**) NVIDIA(0): Option "MultiGPU" "off"
[ 13.184] (**) NVIDIA(0): NVIDIA SLI disabled.
[ 13.184] (**) NVIDIA(0): NVIDIA Multi-GPU disabled.
[ 13.184] (**) NVIDIA(0): Option "Coolbits" "5"

:-( It still produces errors; maybe this card is somehow defective.
Are there any other features that could be enabled or disabled to help narrow this down?

Quote:

[ 15.555] (II) NVIDIA(GPU-1): NVIDIA GPU GeForce GTX 460 (GF104) at PCI:4:0:0 (GPU-1)
[ 15.555] (--) NVIDIA(GPU-1): Memory: 1048576 kBytes
[ 15.555] (--) NVIDIA(GPU-1): VideoBIOS: 70.04.2e.00.86
[ 15.555] (II) NVIDIA(GPU-1): Detected PCI Express Link width: 16X
[ 15.555] (--) NVIDIA(GPU-1): Interlaced video modes are supported on this GPU
[ 15.555] (--) NVIDIA(GPU-1): Valid display device(s) on GeForce GTX 460 at PCI:4:0:0
[ 15.555] (--) NVIDIA(GPU-1): CRT-0
[ 15.555] (--) NVIDIA(GPU-1): CRT-1
[ 15.555] (--) NVIDIA(GPU-1): DFP-0
[ 15.555] (--) NVIDIA(GPU-1): DFP-1
[ 15.555] (--) NVIDIA(GPU-1): CRT-0: 400.0 MHz maximum pixel clock
[ 15.555] (--) NVIDIA(GPU-1): CRT-1: 400.0 MHz maximum pixel clock
[ 15.555] (--) NVIDIA(GPU-1): DFP-0: 330.0 MHz maximum pixel clock
[ 15.555] (--) NVIDIA(GPU-1): DFP-0: Internal Single Link TMDS
[ 15.555] (--) NVIDIA(GPU-1): DFP-1: 330.0 MHz maximum pixel clock
[ 15.555] (--) NVIDIA(GPU-1): DFP-1: Internal Single Link TMDS
[ 15.555] (II) NVIDIA: Using 3072.00 MB of virtual memory for indirect memory
[ 15.555] (II) NVIDIA: access.
[ 15.562] (II) NVIDIA(0): Setting mode "DFP-0:nvidia-auto-select"
[ 15.610] (II) Loading extension NV-GLX
[ 15.669] (==) NVIDIA(0): Disabling shared memory pixmaps
[ 15.669] (==) NVIDIA(0): Backing store disabled
[ 15.669] (==) NVIDIA(0): Silken mouse enabled
[ 15.669] (**) NVIDIA(0): DPMS enabled

Edit: something I noticed in the system log that might have something to do with my trouble:

Quote:
Dec 3 17:53:00 zotac2-4000 kernel: [ 14.887559] NVRM: GPU at 0000:03:00: GPU-aa2769b1-5374-2a4b-6424-8bf8cca108fd
Dec 3 17:53:00 zotac2-4000 kernel: [ 14.887564] NVRM: Your system is not currently configured to drive a VGA console
Dec 3 17:53:00 zotac2-4000 kernel: [ 14.887566] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
Dec 3 17:53:00 zotac2-4000 kernel: [ 14.887568] NVRM: requires the use of a text-mode VGA console. Use of other console
Dec 3 17:53:00 zotac2-4000 kernel: [ 14.887569] NVRM: drivers including, but not limited to, vesafb, may result in
Dec 3 17:53:00 zotac2-4000 kernel: [ 14.887571] NVRM: corruption and stability problems, and is not supported.
