I was looking through my tasks and I found:

lazlo
Joined: 20 Nov 19
Posts: 9
Credit: 2958620
RAC: 0
Topic 220186

One of my wingmen has almost seven times as many aborted/timed-out WUs as completed WUs!

 

https://einsteinathome.org/host/12791233/tasks/0/0

 

Keith Myers
Joined: 11 Feb 11
Posts: 4981
Credit: 18803032053
RAC: 7910871

Send them a friendly PM saying that their host is not performing correctly. They may not be aware that it produces only bad results.

We have a dedicated thread "Invalid host messaging" over at Seti that documents all the bad hosts on the project.

The possible outcomes are that they review their host and correct the problem, that they ignore the PM, or that they are unreachable because their computers are hidden.

The best scenario is that they ask for help correcting the invalid computer and start producing valid work.

The BOINC mechanism is supposed to automatically throttle the amount of work delivered to a host if it continuously returns errors or invalids.  But it rarely works as designed.  These bad hosts inflate the size of the project database needlessly.

 

lazlo
Joined: 20 Nov 19
Posts: 9
Credit: 2958620
RAC: 0

I sent him a PM, but I don't think it helped.  At this time he has 842 "Error" tasks!

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1596632289
RAC: 773591

It seems to me that most of those "bad" hosts belong to people of the set-it-and-forget-it variety.

Keith Myers
Joined: 11 Feb 11
Posts: 4981
Credit: 18803032053
RAC: 7910871

In our "Invalid host messaging" thread we have actually had half a dozen responses from people we PM'd to alert them that AMD 5700/XT cards should not be used for Seti, because they produce nothing but bad results and tend to cross-validate against another 5700/XT. The amount of bad science being injected into the database is becoming alarming. Those who have responded have said they will remove the card from Seti, but the list of bad hosts we have discovered so far is four times that size.

It is going to get worse with today's announcement of an even cheaper Navi 5500 model. And still no response from AMD other than that they are investigating the issue.

We are trying to come up with a way to send each workunit to only one 5700/XT card and never pair it with a Navi wingman, but there has been no response yet from the scientists. Another solution would be to stop sending work to Navi cards altogether. Either way, it has to be coded into the Seti server code for that to happen.
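Both options can be expressed as a single eligibility check the scheduler runs before handing out a replica. This is a minimal Python sketch, not SETI@home or BOINC server code; `Host`, `Workunit`, `is_navi`, and `may_send` are hypothetical names, and it assumes the server already knows which hosts carry a Navi card.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    host_id: int
    is_navi: bool  # hypothetical flag: host reports a Navi (5700/XT, 5500) GPU


@dataclass
class Workunit:
    wu_id: int
    assigned: list = field(default_factory=list)  # hosts already holding a replica


def may_send(wu: Workunit, host: Host, exclude_entirely: bool = False) -> bool:
    """Return True if the (hypothetical) scheduler may send a replica of wu to host."""
    if not host.is_navi:
        return True
    if exclude_entirely:
        # Option 2: never send work to Navi cards at all.
        return False
    # Option 1: at most one Navi card per workunit, so two Navi cards
    # can never end up as each other's wingman.
    return not any(h.is_navi for h in wu.assigned)
```

With option 1, a Navi host still gets work, but its result is always checked against a non-Navi wingman.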

 

lazlo
Joined: 20 Nov 19
Posts: 9
Credit: 2958620
RAC: 0

I just found another one with over 300 "Error" tasks:

https://einsteinathome.org/host/12462057/tasks/0/0

The owner does have two other systems that produce valid results.  I sent him a PM. 

 

I am a bit surprised by this, though. I wonder why the servers are not set to detect that "X failures" are coming from a single host within "Y amount of time" and throttle it down by sending it only one task a day until it returns "Z amount of good work". That setup seems to work well over at LHC@Home.
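The X/Y/Z idea above amounts to a sliding-window throttle. This is a minimal Python sketch of that idea only; it is not Einstein@Home, LHC@Home, or BOINC server code, and `HostThrottle` and its fields are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HostThrottle:
    """Hypothetical per-host throttle: X failures within Y seconds puts the
    host on probation (one task per day) until it returns Z good results."""
    max_failures: int      # "X"
    window_seconds: float  # "Y"
    good_to_recover: int   # "Z"
    failures: deque = field(default_factory=deque)
    throttled: bool = False
    good_since_throttle: int = 0

    def _prune(self, now: float) -> None:
        # Drop failure timestamps that have fallen outside the Y window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()

    def record_failure(self, now: float) -> None:
        self.failures.append(now)
        self._prune(now)
        if self.throttled or len(self.failures) >= self.max_failures:
            # Enter (or stay in) probation; any failure restarts the Z count.
            self.throttled = True
            self.good_since_throttle = 0

    def record_success(self) -> None:
        if self.throttled:
            self.good_since_throttle += 1
            if self.good_since_throttle >= self.good_to_recover:
                # Z good results returned: lift the throttle, forget old failures.
                self.throttled = False
                self.failures.clear()

    def tasks_allowed_today(self, normal_quota: int) -> int:
        return 1 if self.throttled else normal_quota
```

For example, with `max_failures=5` and a one-day window, five errors in a few minutes drop the host to one task a day, and three good results restore its normal quota when `good_to_recover=3`.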

Keith Myers
Joined: 11 Feb 11
Posts: 4981
Credit: 18803032053
RAC: 7910871

The BOINC software does have that mechanism coded into it. But it does not work in all conditions on the majority of projects. The other factor is that Einstein does not run the latest BOINC server code; it runs a version many releases older and has mostly gone its own way. So the improvements the BOINC developers have made to handle bad hosts in the standard BOINC software are not being used here.

 

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7234144457
RAC: 1190432

I have seen Einstein limit task dispatch to me because of recent errors.  I don't know of any evidence that the function is currently turned off here.

However, once errors have reduced the allowed daily task download count, it doubles with each successfully returned task, so just a few successes can bring a host back up to its full request level. Some (many) users set their requested task pre-fetch depth very high (in days). Furthermore, the estimated task productivity can be wildly off in such common cases as switching from one task type to another (say, GPU GRP to GPU GW, or worse yet, between CPU and GPU tasks).

In sum, with the mechanism working as designed, it is possible and common for individual machines to display quite large Error counts on the task list.
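To see why a few successes undo the throttle so quickly, here is a minimal Python sketch of the quota behavior described above. It is not the actual BOINC scheduler code; `DailyQuota` is a hypothetical name, and the rule that each error lowers the quota by one (never below one task per day) is an assumption.

```python
class DailyQuota:
    """Hypothetical per-host daily task quota: errors cut it down,
    each successfully returned task doubles it."""

    def __init__(self, max_per_day: int) -> None:
        self.max_per_day = max_per_day  # the host's normal limit (assumed value)
        self.current = max_per_day

    def on_error(self) -> None:
        # Assumption: each error drops the quota by one, floor of one task/day.
        self.current = max(1, self.current - 1)

    def on_success(self) -> None:
        # Each successful task doubles the quota, capped at the normal limit.
        self.current = min(self.max_per_day, self.current * 2)


q = DailyQuota(max_per_day=256)
for _ in range(300):
    q.on_error()
print(q.current)  # 1

successes = 0
while q.current < q.max_per_day:
    q.on_success()
    successes += 1
print(successes)  # 8
```

Under these assumptions, a host that errored its way down to one task a day is back to a full 256-task quota after only eight good results (1, 2, 4, ..., 256), which is consistent with large Error counts piling up even while the mechanism works as designed.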

 

Keith Myers
Joined: 11 Feb 11
Posts: 4981
Credit: 18803032053
RAC: 7910871

Quote:
I have seen Einstein limit task dispatch to me because of recent errors.  I don't know of any evidence that the function is currently turned off here.

Yes, I agree. I see the same thing when I dump a bunch of work because of a silly mistake. The BOINC mechanism works as designed in that case, limiting you to a single task per day until you start returning valid work again; it then slowly ramps up the work sent to you until you are back at your standard cache levels.

But why the servers keep sending work to bad hosts at every scheduler connection, when those hosts haven't returned a single valid task, is beyond my understanding.

 
