Validating work taking...VERY long.

Ecip
Joined: 24 Oct 20
Posts: 3
Credit: 1621964
RAC: 0
Topic 223832

I have work from the 24th that is still pending validation. It seems that GPU work is what's taking so long; the CPU stuff seems to go quickly. What gives?

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7272611730
RAC: 1819488

The 24th of which month?

Hint: if you unhide your computer(s) someone who has an interest in helping you would actually have some information to go on.

Account|Preferences|Privacy

Should Einstein@Home show your computers on its website?

Save Changes

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118372412167
RAC: 25530863

Ecip wrote:
... What gives?

Hi Ecip, welcome to Einstein@Home.

The thing that "gives" is probably the lack of a 2nd result to compare with your result.  If you look at the workunit (or quorum) for any of your pending results, you will probably find that none have a 2nd completed result.  Until such a result exists, the validation process can't be started.

If you find any quorum that contains 2 successfully completed results that are both still 'pending', that might indicate a problem, since validation is virtually instantaneous once the 2nd result arrives.

And, as archae86 suggests, if you need assistance with figuring out how things work here, it's a good idea to allow helpers to investigate the status of any tasks assigned to your computer by 'un-hiding' it.  No sensitive information is disclosed and it saves you the hassle of having to give lots of extra details about any issues of concern to you.

Cheers,
Gary.

Ecip
Joined: 24 Oct 20
Posts: 3
Credit: 1621964
RAC: 0

Gary Roberts wrote:
The thing that "gives" is probably the lack of a 2nd result to compare with your result.  If you look at the workunit (or quorum) for any of your pending results, you will probably find that none have a 2nd completed result.  Until such a result exists, the validation process can't be started.

If you find any quorum that contains 2 successfully completed results that are both still 'pending', that might indicate a problem, since validation is virtually instantaneous once the 2nd result arrives.

I see, ok. That indeed seems to be the issue. Is the second result usually computed by my own computer or does it get randomly assigned to someone?

archae86 wrote:
The 24th of which month?

October.

archae86 wrote:
Hint: if you unhide your computer(s) someone who has an interest in helping you would actually have some information to go on.

Account|Preferences|Privacy

Should Einstein@Home show your computers on its website?

Done. Thanks.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Ecip wrote:
Is the second result usually computed by my own computer or does it get randomly assigned to someone?

Second results are never computed by any computer under the same user account. So yes, they are assigned randomly to someone else.

mikey
Joined: 22 Jan 05
Posts: 12780
Credit: 1867890186
RAC: 1854095

Ecip wrote:

I have work from the 24th that is still pending validation. It seems that GPU work is what's taking so long; the CPU stuff seems to go quickly. What gives?

Another problem for you seems to be a lack of people crunching the same type of tasks you are crunching. I checked the first half dozen of your tasks and they haven't even been sent to a wingman yet, even though you returned some of them on the 24th of October. I guess it could also be that too many people are crunching your type of tasks and the cache is just huge. But since you returned them, they should eventually validate, and then the RAC should build.

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7272611730
RAC: 1819488

Thanks for unhiding your computers.

The one with the 5700 XT GPU has Gravitational Wave GPU work in the relatively new Spotlight series.  For this type of GPU work (but not for Gamma-Ray type work) Einstein economizes on the huge amount of file transmission traffic by assigning a given machine closely related tasks which share the same files whenever it can.  This tends to mean that you get a lot of tasks with the same leading part of the task name and slowly declining sequence numbers in the task name.

For example, your oldest task currently has the task name:
h1_0421.55_O2C02Cl4In0__O2MDFS2_Spotlight_422.10Hz_2261_0

The second field, in this case 0421.55, is a base frequency, telling you something about the frequency range of gravitational waves being searched for.

The third from last field, in this case 422.10, is also a frequency, and gives the other bound of the search to be performed for this WU.  We participants have informally decided to call the difference between the two the Difference Frequency, or DF for short.  In this case that is 0.55, which is the highest DF I personally have yet observed assigned to a task distributed in this current series.

The next to last field, in this case 2261, is a sequence number which we see descend steadily as tasks from a given base frequency are issued.  I've called it the issue number.  2261 is by far the highest issue number value I have seen in the current work.
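For anyone who wants to pull these fields out of a task name themselves, here is a minimal Python sketch. It is based only on the single task name quoted above, so the field positions are an assumption that may not hold for other searches. The names `parse_task_name` and `TaskName` are made up for this example and are not part of any Einstein@Home or BOINC tool.

```python
from decimal import Decimal
from typing import NamedTuple


class TaskName(NamedTuple):
    base_freq: Decimal   # second field, e.g. 421.55
    bound_freq: Decimal  # third-from-last field, e.g. 422.10
    df: Decimal          # the informal "Difference Frequency"
    issue: int           # next-to-last field, the "issue number"


def parse_task_name(name: str) -> TaskName:
    """Split a Spotlight task name on underscores and pick out the fields.

    Assumes the layout seen in the one example from this thread.
    """
    fields = name.split("_")
    base = Decimal(fields[1])
    # Drop the "Hz" unit before converting to a number.
    bound = Decimal(fields[-3].replace("Hz", ""))
    issue = int(fields[-2])
    return TaskName(base, bound, bound - base, issue)


task = parse_task_name("h1_0421.55_O2C02Cl4In0__O2MDFS2_Spotlight_422.10Hz_2261_0")
print(task.df, task.issue)  # 0.55 2261
```

`Decimal` is used instead of `float` so that the subtraction gives exactly 0.55 rather than a value with binary rounding noise.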

In order for your task to get validated and receive credit, Einstein has to decide to assign tasks from the same base frequency as your tasks to one or more other participant computers.  Those tasks have to be sent out.  The other computer has to complete each one without detected error and send it back.  Finally, your result and your quorum partner's result are compared at the validation step, and if the comparison is "close enough" both are declared valid and credit is issued to both of you.
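The sequence just described can be sketched as a toy model in Python. To be clear, this is not the project's validator: the `Result` class, the single numeric `value` per result, and the relative-tolerance test are all assumptions chosen only to illustrate "pending until a second result exists, then compare".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    host: str
    value: Optional[float]  # None until the task is returned without error


def validate_quorum(a: Result, b: Result, tolerance: float = 1e-5) -> str:
    """Return 'pending', 'valid' or 'invalid' for a two-member quorum."""
    if a.value is None or b.value is None:
        # No second completed result yet, so validation cannot start.
        return "pending"
    # Hypothetical "close enough" test: relative difference within tolerance.
    scale = max(abs(a.value), abs(b.value), 1.0)
    return "valid" if abs(a.value - b.value) <= tolerance * scale else "invalid"


mine = Result("my_host", 1.000001)
print(validate_quorum(mine, Result("partner", None)))      # pending
print(validate_quorum(mine, Result("partner", 1.000002)))  # valid
```

The point of the first branch is that a result can sit in "pending" indefinitely through no fault of its own, which is the situation discussed in this thread.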

As Mikey mentioned, your very first task does not yet have a quorum partner task issued, which probably means that none of the ones of lower sequence number from the same base frequency do either.  Once Einstein does assign that frequency to another computer, if it is a fast one taking many tasks using a short queue, your validations may come through in a flood.  On the other hand, your first few quorum partners may be slow machines returning few tasks, or have a large queue and return them slowly, or may generate errors.

Spotlight work still is being issued with the unusually tight deadline of 7 days.  So once you have a quorum partner with an issued task they can't delay you for more than an additional week.  If they have not given a successful reply by then, a different computer will be selected as an additional quorum partner.
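As a small worked example of that deadline arithmetic, here is a sketch in Python. The 7-day figure is the one quoted above; the function name `partner_deadline` and the sample date are hypothetical and only there for illustration.

```python
from datetime import date, timedelta

SPOTLIGHT_DEADLINE = timedelta(days=7)  # deadline quoted in this thread


def partner_deadline(sent_to_partner: date) -> date:
    """Latest date a quorum partner can hold a task before it times out."""
    return sent_to_partner + SPOTLIGHT_DEADLINE


# Hypothetical example: partner task sent on 9 November 2020.
print(partner_deadline(date(2020, 11, 9)))  # 2020-11-16
```

Note that this only bounds the delay from one partner. If that partner times out or errors, the clock starts again for the next computer selected.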

When I, personally, started running a Radeon 5700 GPU machine on Spotlight Gravity Wave work, I got no quorum partners and no validations for quite a few days, in a situation looking a bit like yours.  But the pattern of task assignment eventually settled down and now most of my tasks validate within a couple of days.

If this all distresses you, one option is to use the project preferences on your account page to adjust the task type restrictions for the location (aka venue) assigned to your machine to allow work of the type:
Gamma-ray pulsar binary search #1 (GPU)

and not allow work of the type you are currently receiving:
Gravitational Wave search O2 Multi-Directional GPU

These Gamma-ray tasks would likely run very well on your machine, and generally you'll find your quorum partners are much less likely to have the oddly bursty pattern of assignment provided for GW, so at least some of your tasks will likely reach validation very quickly.  The amount of credit your 5700 machine will receive for an hour's work will also be much more.

On the other hand, you may feel that you prefer to do Gravity Wave work, in which case some patience will eventually be rewarded with a flood of validations.  It could easily be a couple more weeks.

Welcome to Einstein, and good luck.

 

GWGeorge007
Joined: 8 Jan 18
Posts: 3119
Credit: 5013180073
RAC: 1741879

Archae86, you have just given an EXCELLENT example explanation of how Einstein@Home works, if not most other projects also.  For that, I thank you very much.

I've been reading your posts as a 'lurker' before now, and I just wanted to say thank you.  After 3 years of belonging to BOINC projects (5 in all) and seeking help in many ways not unlike Ecip, who is very new to E@H, I've sort of understood what was going on but have never heard it explained as well as you just did.

Again, thank you for being so helpful for so many and for being so patient too.  I will continue to watch and learn as I consider myself to be a lowly being when compared to you and several others.

As I have said before; "I have so much to learn and so little time".

George

Proud member of the Old Farts Association

Ecip
Joined: 24 Oct 20
Posts: 3
Credit: 1621964
RAC: 0

archae86 wrote:
If this all distresses you...

 Negative. I was worried that there may be something wrong with the validation process, but as per your explanation, it seems to be the norm.

With regard to Gravitational Wave search vs Gamma-ray pulsar binary search, I will simply let the Einstein servers decide what to send to my machine(s). I'm not too picky on that aspect.

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7272611730
RAC: 1819488

For others who might browse to this thread:

Ecip's 5700 XT machine continued to be unlucky in quorum partners for quite a while.  First, the project did not even send out work to a second machine for some days in many cases.  Second, the first such partner in a large number of cases did not return the work before timeout or errored out.

But eventually the flood came.  The big uptick in credited work (meaning a quorum partner returned a task and the task comparison deemed the two results "close enough" to give credit to both) came on November 10, with a really huge amount on November 11.  As I type on November 15, the machine is showing 194 validated tasks and 80 pending.  The oldest pending task was sent to the machine on November 9, so is not actually very stale at all.

This will ebb and flow.  As an example, I operate a 5700 machine 24/7 on Einstein Gravity Wave.  I run a pretty big queue (between two and three days) and my pending task count got down to just 2 on November 2, but has since rebounded to 155 today.  It is mostly down to the luck of server decisions to send out your work, and luck of the draw on how quickly and accurately your quorum partners get their tasks done.
