Machine is getting no credits for 2.07 (GPU)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117667546013
RAC: 35166324

Cruncher-American wrote:

I am still getting almost entirely no wingmen with my 2.07 (GPU) tasks, same as above. Why is this happening? Should I shut off my machine(s) and at least save on my electric bill? Is this project becoming (already!) as lazy as Seti was at times recently in not caring about their volunteers? Frankly, this sucks.

And are they fated to expire with no credit granted in 2 weeks or so? How long will they sit while waiting for validation? Or will they just hang for an indefinite length of time?

What you are describing is quite normal and there is no need for concern. The project needs to use Locality Scheduling to keep data bandwidth, for both the project and its volunteers, under some semblance of control. There used to be a much more complete description on the BOINC website. It now seems that, apart from a brief mention, the specialised version used by Einstein is not described there at all.

Rest assured that all tasks you complete will eventually be validated against a copy sent to another host. It can sometimes take a week or two for enough hosts to join the particular frequency bin you happen to be using before the second copy can be sent out. There is absolutely no problem, just a temporary shortage of hosts that have the same group of large data files that you have.
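A minimal Python sketch of the "validated against a copy sent to another host" idea, assuming a simplified quorum of two. The `Workunit` class, `min_quorum`, and the single-number comparison are illustrative assumptions, not Einstein's actual validator, which compares real scientific output.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Toy workunit: credit only once a quorum of matching results is in.

    min_quorum=2 is an assumption mirroring 'a copy sent to another host';
    a real validator compares scientific output, not a single number.
    """
    name: str
    min_quorum: int = 2
    results: dict = field(default_factory=dict)  # host name -> returned value

    def report(self, host: str, value: float, tol: float = 1e-6) -> str:
        self.results[host] = value
        if len(self.results) < self.min_quorum:
            return "pending"  # no wingman yet, so nothing to compare against
        values = list(self.results.values())
        # All returned results must agree within the tolerance.
        if max(values) - min(values) <= tol:
            return "valid"
        return "inconclusive"  # a real project would send out a further copy


wu = Workunit("example_task")
print(wu.report("ugh-pc", 0.42))   # pending
print(wu.report("wingman", 0.42))  # valid
```

The point of the sketch is that "pending" is simply the state of a finished task whose second copy has not come back yet. Nothing is lost while it waits.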

If you want more information, do a forum search for the term "locality AND scheduling", typed exactly as shown but without the quotes. You will find all posts that use those two words. There are bound to be quite a few, because this has been asked many times in the past.

Cheers,
Gary.

Cruncher-American
Joined: 24 Mar 05
Posts: 71
Credit: 5494431735
RAC: 4344740

@Keith M: those aborted tasks were all early on, when I was having a problem with Einie sending me far too many tasks because of overly large task limits. That ended when I changed to a 0.2-day queue with 0.1 days of backup (or whatever it is called). It never affected anything else: I have never had any problems getting work, not before this latest issue and not now. There are just no validations, because there are no wingmen for my 2.07 GPU tasks.

Cruncher-American
Joined: 24 Mar 05
Posts: 71
Credit: 5494431735
RAC: 4344740

@Gary: Thanks, that at least makes sense. If it is correct, I should see ugh-PC gradually ramp back up to its previous RAC.

Any idea why I was picked for this? Or do the BOINC Gods have a dislike of me for some reason...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117667546013
RAC: 35166324

Cruncher-American wrote:
Any idea why I was picked for this?

Because you're special. :-)

With existing hosts, the scheduler tries very hard to keep sending tasks for the same 'data set' (a group of closely related frequency bins). When a new host comes along, it's a golden opportunity for the scheduler to open up a new data set, since that host doesn't have a pre-existing one. Having done that, it can take a while for the scheduler to add enough further hosts to that set and thereby start 'catching up' with a first host that has 'raced ahead'.
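To make the "raced ahead" effect concrete, here is a hypothetical toy model of locality scheduling in Python. `ToyLocalityScheduler`, its methods, and the rule of preferring a second copy before issuing a fresh task are all assumptions for illustration. This is not Einstein@Home's scheduler. It only shows why a lone host on a new data set accumulates pending tasks until another host joins that set.

```python
from itertools import count


class ToyLocalityScheduler:
    """Hypothetical model of locality scheduling (not Einstein@Home's code).

    Each data set issues tasks with a sequence number that counts down, and
    every task needs copies on two different hosts. A host keeps drawing from
    the data set it already holds; a host with no data set gets a new one.
    """

    def __init__(self, tasks_per_set: int = 5):
        self.tasks_per_set = tasks_per_set
        self.host_set = {}   # host -> data set id it currently holds
        self.next_seq = {}   # data set id -> next sequence number to issue
        self.holders = {}    # (set id, seq) -> hosts that received a copy
        self._ids = count()

    def open_new_set(self, host):
        set_id = next(self._ids)
        self.next_seq[set_id] = self.tasks_per_set - 1
        self.host_set[host] = set_id
        return set_id

    def join_set(self, host, set_id):
        """Give a host the data files for an existing set."""
        self.host_set[host] = set_id

    def request_work(self, host):
        set_id = self.host_set.get(host)
        if set_id is None:
            set_id = self.open_new_set(host)  # new host: open a new data set
        # First choice: the second copy of a task someone else already has.
        for key in sorted(self.holders):
            hosts = self.holders[key]
            if key[0] == set_id and len(hosts) == 1 and host not in hosts:
                hosts.add(host)
                return key
        # Otherwise issue a fresh task; move on if this set is used up.
        if self.next_seq[set_id] < 0:
            set_id = self.open_new_set(host)
        seq = self.next_seq[set_id]
        self.next_seq[set_id] -= 1
        self.holders[(set_id, seq)] = {host}
        return (set_id, seq)

    def pending(self, host):
        """Tasks this host holds that still have no wingman."""
        return sorted(k for k, hosts in self.holders.items() if hosts == {host})


sched = ToyLocalityScheduler(tasks_per_set=5)
print([sched.request_work("ugh-pc") for _ in range(3)])  # [(0, 4), (0, 3), (0, 2)]
print(sched.pending("ugh-pc"))                           # [(0, 2), (0, 3), (0, 4)]
sched.join_set("big32", 0)
print(sched.request_work("big32"))                       # (0, 2)
print(sched.pending("ugh-pc"))                           # [(0, 3), (0, 4)]
```

In the usage lines, the first host's tasks stay pending until a second host is put on the same data set. From then on, each of that host's requests clears one of the backlog.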

Obviously, this can be disturbing if you don't know about it.  The nice surprise comes a little later when the extra hosts start catching up to you and you suddenly see a whole lot of validations occurring quite quickly :-).

Cheers,
Gary.

Cruncher-American
Joined: 24 Mar 05
Posts: 71
Credit: 5494431735
RAC: 4344740

That may well be true, except that this phenomenon didn't start happening until the evening of the 16th, 2 weeks after I started in earnest. Until then, as far as I could tell, I was consistently getting full credit pretty quickly. Then it all collapsed. It does seem to be coming back somewhat today in terms of credit, though almost all of my most recently completed 2.07 GPU tasks still have no wingman.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117667546013
RAC: 35166324

Cruncher-American wrote:
That may well be true, except that this phenomenon didn't start happening until the evening of the 16th, 2 weeks after I started in earnest.

That's quite common as well. You were probably using an existing, well-populated data set that ran out of fresh tasks, and you were probably validating tasks that other people had returned, perhaps a lot earlier. When you made a work request, the scheduler had nothing left in the previous set, so it gave you a brand-new one. This happens all the time.

There are huge numbers of different data sets, so exhausting the supply of tasks for one and having to move to a new one is a very common event. This is particularly so when it coincides with the switch from searching for continuous GW from one pulsar (G34731) to another (VelaJr1). A whole new bunch of different frequency bins would need to be distributed for tasks involving the different pulsar.

It's easy for you to check your specific situation if you want to find the precise reason for the sudden change in validation behaviour. Task names, which are visible in BOINC Manager and, more extensively, in your task list on the website (until they get deleted), tell you the pulsar and the frequency bin. If you look back through the list on the website, you will probably see a sudden change in one or both of those, which will correlate with the change in validation behaviour.

I tried to keep the original comments short.  I quite often get accused of explanations that are far too long.  I certainly wasn't trying to give you any sort of implausible answer.

If you do want to check the tasks listed on the website, here is a typical full task name (one of my tasks): h1_1347.95_O2C02Cl4ln0_O2MDFV2g_VelaJr1_1348.75Hz_296_1. There are two frequencies listed: 1347.95 Hz and 1348.75 Hz. The frequency bins go in steps of 0.05 Hz. The two values are somehow connected to the full range of different bins that make up the complete 'data set' required for the analysis of that single task.

There are quite a number of bins in that range, and there are data files starting with both h1_ and l1_, which doubles the actual number of bins. So you can appreciate why Locality Scheduling is needed to avoid inflicting this large number of big data files on an unsuspecting volunteer for every individual task. The field naming the pulsar that is emitting the GW (VelaJr1 in this example) should be quite obvious in the full name.
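For anyone who wants to pick task names apart programmatically, here is a small Python sketch. Purely for illustration, it assumes the data set spans the bins from the first frequency to the second in 0.05 Hz steps. The post only says the two values are "somehow connected" to the range, so the real span may differ. The function names and the file naming in `hypothetical_data_files` are made up. Only the h1_/l1_ prefixes come from the post.

```python
from decimal import Decimal

BIN_STEP_HZ = Decimal("0.05")  # bin spacing quoted in the post


def parse_task_name(name: str) -> dict:
    """Split a task name like the example above into its '_'-separated fields.

    Only the detector prefix, the two frequencies, the pulsar, the sequence
    number and the replica suffix are described in the post; the two
    run/search tags are kept as-is without interpretation.
    """
    fields = name.split("_")
    if len(fields) != 8:
        raise ValueError(f"unexpected task name format: {name!r}")
    detector, f_a, run_tag, search_tag, target, f_b, seq, replica = fields
    return {
        "detector": detector,                          # e.g. "h1"
        "freq_a_hz": Decimal(f_a),
        "freq_b_hz": Decimal(f_b.removesuffix("Hz")),  # Python 3.9+
        "target": target,                              # the pulsar, e.g. "VelaJr1"
        "tags": (run_tag, search_tag),
        "sequence": int(seq),
        "replica": int(replica),
    }


def frequency_bins(f_a: Decimal, f_b: Decimal, step: Decimal = BIN_STEP_HZ) -> list:
    """Bin frequencies from f_a to f_b inclusive, `step` apart (Decimal avoids float drift)."""
    n_steps = int((f_b - f_a) / step)
    return [f_a + i * step for i in range(n_steps + 1)]


def hypothetical_data_files(bins) -> list:
    """Illustrative names only: one h1_ and one l1_ file per bin."""
    return [f"{det}_{b:.2f}" for b in bins for det in ("h1", "l1")]


info = parse_task_name("h1_1347.95_O2C02Cl4ln0_O2MDFV2g_VelaJr1_1348.75Hz_296_1")
bins = frequency_bins(info["freq_a_hz"], info["freq_b_hz"])
print(info["target"], info["sequence"], info["replica"])  # VelaJr1 296 1
print(len(bins), len(hypothetical_data_files(bins)))      # 17 34
```

Even under this narrow assumption, a single task touches 17 bins and 34 files. That is the scale of download that locality scheduling tries not to repeat for every task.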

Two final noteworthy points regarding the task name. The final field will be either _0 or _1 for the two primary tasks that make up the original quorum. If a resend is necessary (to replace a failed task), it will be _2 or above. The field immediately before that (_296 in the above example) is a sequence number. That number starts high and gradually counts down to zero with each successive task issued for the same 'data set'. You can tell when a data set is just about exhausted, because that number will be heading towards _0. You can also roughly tell whether lots of hosts are sharing the same data set: if lots of hosts are 'drinking from the same cup', each new task you get will be a long way south of the previous number.
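Here is a rough Python sketch of reading those last two fields. It assumes the sequence number drops by one for every new task issued from a data set. The function names and the example sequence numbers are hypothetical.

```python
def is_resend(replica: int) -> bool:
    """_0 and _1 are the original quorum; _2 and above are resends."""
    return replica >= 2


def sequence_gaps(seqs):
    """Drops between successive sequence numbers, in the order tasks arrived."""
    return [prev - cur for prev, cur in zip(seqs, seqs[1:])]


def average_gap(seqs) -> float:
    """Roughly 1 if you are alone on a data set; much larger if it is shared.

    Assumes the number drops by one for every new task issued from the set,
    so a gap of g means about g - 1 new tasks went to other hosts in between.
    """
    gaps = sequence_gaps(seqs)
    return sum(gaps) / len(gaps) if gaps else 0.0


# Hypothetical sequence numbers, in the order one host received them.
alone = [296, 295, 294, 293]
shared = [300, 270, 240, 210]
print(average_gap(alone), average_gap(shared))  # 1.0 30.0
print(is_resend(1), is_resend(2))               # False True
```

A host that is alone on a data set sees gaps of about 1, which matches the "no wingman yet" situation described in this thread.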

Cheers,
Gary.

Cruncher-American
Joined: 24 Mar 05
Posts: 71
Credit: 5494431735
RAC: 4344740

@Gary: thank you for all the time you have spent explaining the quirks of Einstein to me. I really appreciate it.

Just to bore you, let me explain a bit about my crunchers and crunching to you.

Big32 and ugh-PC are both dual Xeon E5-2680 v2 machines with 64 GB of RAM each (thanks, eBay!). They both run Win 7 Ultimate. ugh-PC has dual RTX 2080s (thanks, Craigslist!) and Big32 has dual hybrid GTX 1080 Tis (also thanks, Craigslist!).

On SETI, the major difference between them was the faster speed of the 2080s, roughly 7-10% credit-wise. (The 30 or so threads of CPU tasks gave a larger share of the credit there than they do on Einstein.) Before this business with the 2.07 GPU app, I was getting about 25% more credit on Ugh-PC, as it should have been. For example, BOINCTasks shows the following for the last week as of 6 pm on 4/19:

Cruncher    CPU/GPU tasks per day
Ugh         60/225
Big32       60/169

That is (since both run roughly the same mix) 50+ more 2.07/1.22 GPU tasks per day on Ugh, or about 100K credit.

But when I look at the pending tasks for each app, I see (GPU only):

App     Pending (Ugh)   Pending (Big32)   Credit/task
2.07    518             43                2000
1.22    79              62                3465

So that is roughly a 900K credit deficit since I was put in solitary with the new data set, which is about how much the gap in total credit between the two machines has changed over the last 3 days.
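Here is a quick Python check of that arithmetic from the pending table. It treats Big32's pending counts as the "normal" baseline, which is an assumption. The extra 2.07 tasks alone account for 950K, and adding the 1.22 difference brings it to about 1.0M, the same ballpark as the rough 900K figure.

```python
# Pending GPU task counts and credit per task, copied from the table above.
CREDIT_PER_TASK = {"2.07": 2000, "1.22": 3465}
PENDING = {
    "2.07": {"Ugh": 518, "Big32": 43},
    "1.22": {"Ugh": 79, "Big32": 62},
}


def pending_credit_gap(pending: dict, credit_per_task: dict) -> dict:
    """Credit tied up in Ugh's extra pending tasks, per app (Big32 as baseline)."""
    return {
        app: (counts["Ugh"] - counts["Big32"]) * credit_per_task[app]
        for app, counts in pending.items()
    }


gap = pending_credit_gap(PENDING, CREDIT_PER_TASK)
print(gap)                # {'2.07': 950000, '1.22': 58905}
print(sum(gap.values()))  # 1008905
```

Since this credit is pending rather than lost, it should all show up once wingmen return their copies.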

====>>> Still waiting for wingmen!!!!

Jon

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Hey Jon, 

Just going to throw this out there: you might want to look at Linux. I can process a GW task in 7.4 minutes. Since I started Einstein again 1 week ago, I have over 1000 GW tasks pending. I don't mind, as the faster we process them, the faster we can move back to the other work. Unfortunately, this new work unit set has exposed a major flaw with Einstein, one that has been known for several years but hadn't been so obvious to everyone until now. Basically: 1) the RAM on Nvidia cards, 2) system configuration (be it PCIe lanes or dual- vs. quad-channel memory), and 3) how much system RAM is available. These are things not really seen at Seti, since those apps were highly refined.

Cruncher-American
Joined: 24 Mar 05
Posts: 71
Credit: 5494431735
RAC: 4344740

@Zalster: thanks for the suggestion about Linux, but I'm not sure what you're getting at; I don't understand what you wrote.

When I started Einie early in April, I was astonished that it assigned 3 task instances to each of my GPUs. (Seti, of course, doesn't do that.) It still seemed OK, since 1) GPU utilization was around 90%, the same as on Seti, and 2) Windows Task Manager showed 10-20% red, so there was not much system thrashing. All my tasks ran at 99%+ CPU utilization. (The exception was when I was running 31 GW CPU threads. Each wanted 1.5 to 2 GB of RAM, so I had to upgrade the RAM on both machines from 32 to 64 GB to avoid really bad thrashing, with the Task Manager red at around 75% or higher. Many tasks went over 100% CPU because the system was so tied up with overcommitted memory.) Seti needed only a fraction of that memory, of course. Fortunately, I had enough spare RAM!
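The RAM upgrade follows from simple arithmetic. Below is a small Python sketch using the figures in the post (31 GW CPU tasks at 1.5 to 2 GB each). The function names are illustrative, and the sketch ignores everything except the GW CPU tasks themselves.

```python
def gw_cpu_ram_gb(tasks: int, gb_per_task: float) -> float:
    """RAM the GW CPU tasks alone would ask for."""
    return tasks * gb_per_task


def headroom_gb(installed_gb: float, tasks: int, gb_per_task: float) -> float:
    """RAM left for everything else (Windows, GPU tasks, ...); negative = overcommitted."""
    return installed_gb - gw_cpu_ram_gb(tasks, gb_per_task)


for installed in (32, 64):
    for per_task in (1.5, 2.0):
        print(installed, per_task, headroom_gb(installed, 31, per_task))
```

At 32 GB, the 31 tasks alone want 46.5 to 62 GB, an overcommit of 14.5 to 30 GB, which fits the heavy thrashing described. At 64 GB, only 2 to 17.5 GB is left for Windows and the GPU tasks.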

How will Linux help me now? Are the Linux versions of the GW app that much more efficient? Now, they run about 50 minutes each, +/-. 

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Hi Jon, 

Unfortunately, this won't help you with getting wingmen to validate your work units. I'm just pointing out how the limitations of these new work units affect whether people will run them. People are finding out that their low-end GPUs are not capable of crunching them, so they avoid them. That significantly shrinks the pool. Also, the credit on offer relative to the time to complete removes an even larger group, as credit hawks (usually those with high-end machines) find it more fruitful (profitable?) to crunch the Gamma Ray tasks. That leaves only a narrow group of users to help validate the work units.

I've been comparing my work units against others from similar systems. The only difference is the OS: almost all of them are running Windows, either 7 or 10. My GPU GW tasks run 3-5 times faster. 7.4 minutes is the average on my two 2080 Supers. From what I remember, my 1080 Tis ran in about the same time as well.
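One caveat when comparing raw runtimes is that the number of tasks sharing each GPU matters. Below is a minimal Python sketch of per-GPU throughput. It assumes, for illustration only, that Jon's ~50-minute runtimes might be with three tasks per GPU, as he described for his setup. Zalster's concurrency isn't stated, so it is shown as 1. The function names are hypothetical.

```python
def minutes_per_task(runtime_min: float, tasks_per_gpu: int) -> float:
    """Effective wall-clock minutes per finished task on one GPU."""
    return runtime_min / tasks_per_gpu


def speedup(slow_runtime, slow_per_gpu, fast_runtime, fast_per_gpu) -> float:
    """How many times more tasks per hour the 'fast' GPU finishes."""
    return minutes_per_task(slow_runtime, slow_per_gpu) / minutes_per_task(fast_runtime, fast_per_gpu)


# ~50 min per task on Windows; 7.4 min per task on Linux (concurrency assumed).
print(round(speedup(50, 1, 7.4, 1), 2))  # 6.76  (if both ran one task per GPU)
print(round(speedup(50, 3, 7.4, 1), 2))  # 2.25  (if the 50 min was with 3 per GPU)
```

Depending on the concurrency on each side, the same two runtimes can mean anything from about a 2x to a nearly 7x difference in throughput.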

Z
