Validate errors

paris
Joined: 11 Jan 06
Posts: 50
Credit: 10934922
RAC: 14840
Topic 194421

Does anyone know why I might be getting validate errors on a string of work units (54760007, ..6, ..3, ..1, 54759941, ..38, ..37, ..15, ..11)? Some of the units on this machine (ID#1094479) did just fine but nine units apparently produced the above error. There may be more in the works as the pendings clear but I'm not sure if it will work that way.


Plus SETI Classic = 21,082 WUs

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119065828000
RAC: 24445842

Validate errors

Quote:
Does anyone know why I might be getting validate errors ....

Because you are using BOINC 5.10.32. There is a rather nasty bug in BOINC versions from 5.10.21 up to v6. It is triggered by the project file upload handler not being shut down in the "approved" manner - something that is a little hard to do if the server crashes. At the time of the last major outage nearly 3 months ago it caused quite a deal of havoc for a number of people, so I'm surprised you didn't notice it at that stage.

If you do an advanced search for "validate error" and set the time frame for at least 6 months you will find the discussion about it last time - including this message where I gave more information about the problem. Do the search and read all the "hits" if you want the full story. I lost hundreds of results as a result of this bug.

To avoid the problem, either downgrade to 5.10.20 (the problem was introduced in 5.10.21) or upgrade to 6.2.19 or more recent. If you don't need to support CUDA on other projects, I'd steer clear of the very recent stuff as there are many problems introduced - seemingly by the mad panic to support CUDA no matter how much "collateral damage" is also introduced.
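For anyone checking several machines, the version advice above can be sketched as a small Python check. This is only an illustration: the function names are made up, and it assumes plain dotted version strings such as "5.10.32". It treats everything after 5.10.20 and before 6.2.19 as "not recommended", which follows the advice in this post rather than any official list of affected releases.

```python
from typing import Tuple

# Boundaries taken from the advice in this thread (not an official list).
LAST_SAFE_OLD = (5, 10, 20)    # the problem was introduced in 5.10.21
FIRST_SAFE_NEW = (6, 2, 19)    # suggested minimum when upgrading


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string like '5.10.32' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_recommended(version: str) -> bool:
    """True if the version is 5.10.20 or older, or 6.2.19 or newer."""
    v = parse_version(version)
    return v <= LAST_SAFE_OLD or v >= FIRST_SAFE_NEW


if __name__ == "__main__":
    for candidate in ("5.10.20", "5.10.32", "6.2.19"):
        status = "ok" if is_recommended(candidate) else "change it"
        print(candidate, status)
```

With these boundaries, 5.10.32 is flagged, while 5.10.20 and 6.2.19 pass.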

There is nothing you can do about what you have lost. Now that the project is back up and the file upload handler is receiving new uploads, there will be no further problem until the next crash. Any "pendings" you have will not turn into further "validate error"s. They could be marked "invalid" but that is an entirely separate issue and extremely unlikely unless your machine is flakey.

Cheers,
Gary.

paris
Joined: 11 Jan 06
Posts: 50
Credit: 10934922
RAC: 14840

Well, drat. Thanks for the info. I didn't have a problem before because I didn't have that particular machine running Einstein at the time. It is interesting that not all of the units from this machine ran into problems - it may have been the exact timing of the uploads. Oh, well. I guess I had better upgrade or downgrade to avoid the problem in the future as you suggested.

Thanks again.



tapir
Joined: 19 Mar 05
Posts: 23
Credit: 462935446
RAC: 0

Message 93525 in response to message 93523

Quote:
Now that the project is back up and the file upload handler is receiving new uploads, there will be no further problem until the next crash. Any "pendings" you have will not turn into further "validate error"s.

No, ... all "pendings" will turn into "validate errors" at the time your wingman reports the WU; at least that is what occurred on my host.

host

paris
Joined: 11 Jan 06
Posts: 50
Credit: 10934922
RAC: 14840

Please excuse the newbie-type question, but is there a simple way to downgrade from 5.10.32 to 5.10.20? Or is it necessary to do a complete uninstall and then reinstall, reattach, and merge hosts? The computer in question is a Mac mini core duo running OS X 10.4.11 but I may want to do the same on a couple of older Macs, too. Maybe it would be better to upgrade to a version 6 (greater than 6.2.19). If so, is there a most efficient or least problematic choice for said machine?

Thank you for any help you can give. I searched the various forums in multiple projects but did not find anything explicit.



Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Message 93527 in response to message 93526

I'm no Mac user, but I think the procedure is the same across all platforms:

* Stop BOINC completely, that means if running as service, use Advanced/Shutdown connected client...
* (Make a backup of your BOINC directory)
* Uninstall BOINC (that would not delete any tasks or account information)
* Install 5.10.20
* Disconnect from the Internet, so that any errors don't get reported to the servers before you can restore from the backup, if that should be necessary.
* Start BOINC
* If all seems well (task list complete :-) reconnect to the 'net.

That should restart at the point where you left off. To be sure not to lose any work, you could set "No new tasks" for all projects and wait until all tasks are uploaded and reported prior to uninstalling.
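The backup step in the list above could be done by hand in a file manager, or with a few lines of Python such as this rough sketch. The function name and the folder naming are invented for the example, and the location of the BOINC directory is something you need to look up for your own platform and installation; nothing here comes from BOINC itself. Run it only after BOINC has been stopped completely.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_dir(data_dir: Path, backup_root: Path) -> Path:
    """Copy the whole BOINC directory into a new timestamped folder.

    data_dir    -- your BOINC directory (location differs per platform)
    backup_root -- folder under which the backup copy is created
    """
    if not data_dir.is_dir():
        raise FileNotFoundError(f"BOINC directory not found: {data_dir}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-backup-{stamp}"
    # copytree creates the target (and missing parents) and refuses to
    # overwrite an existing folder, so an older backup is never clobbered.
    shutil.copytree(data_dir, target)
    return target
```

A call would look like `backup_boinc_dir(Path("/path/to/your/BOINC"), Path.home() / "boinc-backups")`, with the first path replaced by wherever your installation keeps its files. Restoring is the reverse copy, again with BOINC stopped.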

Gruß,
Gundolf

Computer sind nicht alles im Leben. (Kleiner Scherz)

paris
Joined: 11 Jan 06
Posts: 50
Credit: 10934922
RAC: 14840

Thank you. I think that your instructions can be easily adapted for OS X on a Mac. There is no equivalent to running as a service that I am aware of but the rest seems reasonable. Thanks again.



Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119065828000
RAC: 24445842

Message 93529 in response to message 93525

Quote:
Quote:
Now that the project is back up and the file upload handler is receiving new uploads, there will be no further problem until the next crash. Any "pendings" you have will not turn into further "validate error"s.

No, ... all "pendings" will turn into "validate errors" at the time your wingman reports the WU; at least that is what occurred on my host.

host


I'm sorry, I should have phrased it a bit more clearly.

Now that the project is back up and the file upload handler is receiving new uploads, there will be no further problem until the next crash. Any "pendings" you have from these new uploads will not turn into further "validate error"s.

It goes without saying that anything the faulty BOINC client attempted to upload during the outage will have been lost, irrespective of whether or not your wingman had reported in. This is because the result file has been deleted permanently at source (i.e. permanent upload failure) rather than being held for further attempts (temporary upload failure), as is supposed to happen.
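To make the distinction concrete, here is a toy Python model of the two outcomes. It is not BOINC code and every name in it is invented; it only restates the behaviour described above, where a good client treats an upload attempt during an outage as a temporary failure and keeps the file, while a faulty client treats it as permanent and the file is gone.

```python
from enum import Enum


class UploadOutcome(Enum):
    SUCCESS = "success"
    TEMPORARY_FAILURE = "temporary failure"   # file is held, retried later
    PERMANENT_FAILURE = "permanent failure"   # file is deleted at source


def outcome_during_outage(client_is_faulty: bool) -> UploadOutcome:
    """Toy model of an upload attempt made while the server is down."""
    if client_is_faulty:
        return UploadOutcome.PERMANENT_FAILURE
    return UploadOutcome.TEMPORARY_FAILURE


def held_for_retry(outcome: UploadOutcome) -> bool:
    """Only a temporary failure leaves the result file waiting for a retry."""
    return outcome is UploadOutcome.TEMPORARY_FAILURE
```

In this model `held_for_retry(outcome_during_outage(True))` is false, which is the lost result that later shows up as a "validate error".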

I didn't notice the start of the outage but as soon as I did I terminated network access for those hosts (with the faulty client) which didn't have commitments to other projects, and suspended EAH on those hosts that had alternative projects. That way I could guarantee that no attempt would be made to upload EAH results from faulty clients during the outage. Because I missed the start of the outage, I did have some losses until I noticed the problem. I had previously moved a number of machines to non-faulty clients but still had quite a few that I hadn't attended to. When the project was back up, I was able to safely upload the tasks that had completed after the network was disabled. Some of those are "pending" but I'm sure they will validate eventually when the other result is returned. Fortunately I had a big enough cache (just) to allow me to disable the network and not run out of tasks.

Cheers,
Gary.
