Suddenly, Most Tasks Marked "Validate Error"?

Paul
Joined: 3 May 07
Posts: 123
Credit: 1785649111
RAC: 266880
Topic 227336

Today I found many "Validation Error" tasks. I'm not sure I've seen this happen before. It seems to be happening to all my work all of a sudden, though it's hard to tell whether it's all tasks or just some. I think I've received credit for some recent tasks, but not the most recent ones. Also, it looks like E@H is trying to re-validate by creating new tasks for the same WU and sending them out. I guess that makes sense, but it's not usually necessary, right? Even when my tasks are judged invalid, it normally takes only three results, not four or five. Something is different.

Can someone explain?

Here is an example WU: https://einsteinathome.org/workunit/623832532

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47030242642
RAC: 65090720

You have 33 tasks marked as invalid due to the validation errors, but over 1600 valid tasks. I wouldn't call that "most".

Maybe there was an issue with the project validator on these tasks and they had to be resent.

I wouldn't worry about it. You have hundreds of valid tasks from just the past 2 days.


GWGeorge007
Joined: 8 Jan 18
Posts: 3065
Credit: 4971187686
RAC: 1428222

I also have received many (nearly 60 so far) tasks marked "Validation Error". What stands out to me is the statement that says: "Couldn't open file "LATeah3012L05....." No such file or directory"

Here is a shot of what I'm talking about.

I would like at least some explanation of what this means, and why we're all of a sudden getting it.

George

Proud member of the Old Farts Association

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47030242642
RAC: 65090720

GWGeorge007 wrote:

I also have received many (nearly 60 so far) tasks marked "Validation Error". What stands out to me is the statement that says: "Couldn't open file "LATeah3012L05....." No such file or directory"

Here is a shot of what I'm talking about.

I would like at least some explanation of what this means, and why we're all of a sudden getting it.

The stderr output doesn't provide anything useful for this. That's just information about the task itself (from your system). It shows the task completed fine.

The validation happens on the project's servers, and we don't have any insight into what it's doing unless a project admin posts about it.


GWGeorge007
Joined: 8 Jan 18
Posts: 3065
Credit: 4971187686
RAC: 1428222

Ian, you might want to look at your own computer's "Validation Error", computer #12803503.  You, too, have many such tasks.

Could you contact Bernd and see what's up?

.....[EDIT].....

You seem to have a better connection with him than I do.

George

Proud member of the Old Farts Association

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47030242642
RAC: 65090720

Yes, I see them. Everyone likely has them; I have more than most because my systems are processing many more tasks than most. That means it's not a problem with anyone's specific computer but something wrong with the project validator. There's nothing you can or need to do about it. If the project has an issue with the validator, they will see it and fix it when able.

I sent Bernd a message earlier (about something else, but also project server related) but haven't heard back. He's busy and doesn't always answer.

My hunch is that there was a problem with the validator not validating a bunch of results. Maybe they were stuck. They unstuck it, but it errored out all the stuck ones, which needed to be sent back out. Just my guess.

Edit: just double checked. It looks like most validation quit around 1500 UTC today. The admins will need to fix it, but they usually award credits retroactively for problems like this. I'll try to cross-post the issue into one of the technical forums that Bernd frequents.


bluestang
Joined: 13 Apr 15
Posts: 34
Credit: 2492970228
RAC: 0

I as well have had many, many more than what I would usually get... if any at all in the past.

There must be something different with this new app for NVIDIA GPUs causing this issue. But that doesn't appear to be the case with Paul, as he has an AMD GPU.

I just started running E@H again about 5 days ago and these are my Invalid totals:

1x 3080ti machine = 209 Invalids and counting

1x 3070ti machine = 144 Invalids and counting

1x 3060ti machine = 104 Invalids and counting (only been 3 days)

2x 1660ti machine = 75 Invalids and counting (only been 2 days)

That is a lot of points gone to the crapper and quite a bit of wasted resources IMO. Something like a 10-15% Invalid rate at minimum.

Sid
Joined: 17 Oct 10
Posts: 164
Credit: 971869807
RAC: 421798

Same here. Nvidia 750Ti(s). A dozen for 5 Apr.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47030242642
RAC: 65090720

bluestang wrote:

I as well have had many, many more than what I would usually get... if any at all in the past.

There must be something different with this new app for NVIDIA GPUs causing this issue. But that doesn't appear to be the case with Paul, as he has an AMD GPU.

I just started running E@H again about 5 days ago and these are my Invalid totals:

1x 3080ti machine = 209 Invalids and counting

1x 3070ti machine = 144 Invalids and counting

1x 3060ti machine = 104 Invalids and counting (only been 3 days)

2x 1660ti machine = 75 Invalids and counting (only been 2 days)

That is a lot of points gone to the crapper and quite a bit of wasted resources IMO. Something like a 10-15% Invalid rate at minimum.

This has nothing to do with the NVIDIA app. It's affecting everyone just the same, no matter the system or application (for FGRPB1G).

I've seen the admins award credits retroactively for problems like this that aren't the user's fault. I wouldn't be surprised if they do it here too.

Or you could suspend Einstein and crunch something else until they fix it.


Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18750763806
RAC: 7099853

I wonder if the recent change to the validators for BRP4 ARM64 tasks and the new 1.61 application is connected somehow.

Paul
Joined: 3 May 07
Posts: 123
Credit: 1785649111
RAC: 266880

Thanks, everyone. I appreciate your responses. I feel better now about waiting. I was worried something on my end was broken.
