Work unit cache decreasing over time

Richard de Lhorbe
Joined: 15 Dec 05
Posts: 46
Credit: 9519458129
RAC: 772461
Topic 196404

I have two computers running E@h that each have a similar problem ... over time, the work unit cache keeps getting smaller and smaller. One machine is running Ubuntu Linux, and I just upgraded this morning to the newest 12.10 LTS version. In doing so, the cache dropped from having about 8-10 work units spare, to only one. This machine offers 8 cores and a GPU, and is currently set to a location (school) that I have tried to force to a higher cache by setting it to 9.9 days (I read somewhere on this site in the past that this temporary change MIGHT force the software to wake up and reset the parameter that controls the cache size, but it is NOT working). The other machine is a Mac, similar problem, except it is 2 cores and it is down to perhaps 2 to 4 spare work units ... it is also temporarily set to location school, trying to force it to the 9.9 days of cache ... no luck here either.

I tried for a couple of hours this morning to search this Discussion area for the threads that discussed this issue that I found in the past, but I am obviously searching using the wrong words, because I can't find anything at all. I recall the discussion I found a year or more ago was how to edit a certain file to change a parameter to fix the problem, but it was written around fixing Windows, not Linux or Mac, and at that time I could not find the file name they referred to either on the Ubuntu machine, or on the Mac. Both computers are dedicated to E@h and run all the time with minimal errors (just some of the gamma-ray work units end up invalid, a project which seems to just have a somewhat higher error rate anyway), and so have no excuse to constantly shrink their caches. Other machines I have that have been running at least as long (i.e. years) show absolutely no signs of a shrinking cache. Can someone assist with what files to edit and what parameter to reset ? The two problem computer numbers are 4073798 and 3116653. Just a note that if you decide to check the performance parameters for 4073798 it may show a low percentage time that BOINC client is running, but I had this machine down for a few days recently which has warped the number, it would normally show over 99%.

Thanks, Richard

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

What are both of your cache settings?

BOINC 7 has a completely rewritten scheduler. The first cache preference is now the low water mark and the second cache preference is the high water mark. BOINC 7 fills the cache to the high water mark, then waits until it drops to the low water mark before filling again.
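To make the fill-and-wait behaviour concrete, here is a minimal sketch in Python. It is not BOINC's actual scheduler code. The names `CachePrefs` and `simulate` are made up for illustration, and the high water mark is taken as the sum of the two preferences, following Gary Roberts' description later in this thread (4 days plus 0.1 days gives 4.0 and 4.1).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachePrefs:
    """Hypothetical container for the two cache preferences (in days)."""
    min_days: float         # first preference: "connect every ... days"
    additional_days: float  # second preference: "additional ... days"

    @property
    def low_water(self) -> float:
        return self.min_days

    @property
    def high_water(self) -> float:
        # Assumption: high water mark = first preference + second preference.
        return self.min_days + self.additional_days


def simulate(prefs: CachePrefs, steps: int, step_days: float) -> List[float]:
    """Return the buffered work (in days) after each time step.

    Each step consumes `step_days` of work. Work is only fetched once the
    buffer has dropped below the low water mark, and then it is refilled
    all the way to the high water mark.
    """
    buffered = prefs.high_water
    history = []
    for _ in range(steps):
        buffered = max(0.0, buffered - step_days)
        if buffered < prefs.low_water:
            buffered = prefs.high_water
        history.append(buffered)
    return history


if __name__ == "__main__":
    # With a narrow gap between the marks the cache stays close to full.
    print(simulate(CachePrefs(4.0, 0.5), steps=6, step_days=0.25))
```

With a wide gap between the two preferences, the cache visibly drains between top-ups, which can look like a "shrinking" cache even when nothing is wrong.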

Claggy

Richard de Lhorbe
Joined: 15 Dec 05
Posts: 46
Credit: 9519458129
RAC: 772461

Right now I have modified the settings for "School" to be about as long as possible: "Connect every 9.9 days" plus "Maintain enough work for an additional 2.6 days". Both of these problem computers have been set to "School". Before changing them to "School", I had them set to "Work", for which my normal settings are "Connect every 3.9 days" plus an "additional 1.1 days". That serves most of my computers' needs quite well, riding over Internet connection issues as well as Einstein issues. I was hoping that forcing them to "School" would fill the caches as full as possible, and then I would change them back to the normal "Work" setting. However, as noted, that plan has not worked. I made the change to "School" well over a month ago, but the caches are still gradually getting smaller, which is why I need to try something else.

Regards
Richard

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117897174724
RAC: 34606070

Quote:
.... I did the change to "School" well over a month ago, but the caches are still getting gradually smaller, which is why I need to try something else.


I'm not really surprised that your caches are getting smaller and then topping up some before heading smaller again. The 8 core machine you seem to be referring to got a number of new tasks fairly recently but will probably not get much more work until the cache is nearly empty again.

I believe the problem is that you are telling BOINC that it could be 10 days before you connect to the internet again. So imagine what would happen if you've just had a connection and then a task which has already been in your cache for a bit now starts crunching. BOINC has to allow for the possibility that when that task finishes, it might have to hang around in your work queue for up to 10 days before it can be reported. BOINC will probably see this as a risk of a deadline miss so it will be very conservative in how many tasks it allows to download.

It's really quite counterproductive to set a cache size anywhere near or above half the deadline interval, particularly if the CI (connect to internet) preference is the major component of the cache size.
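The deadline reasoning above can be sketched as a worst-case turnaround check. This is a deliberate simplification for illustration, not the logic BOINC actually uses. The function name `at_deadline_risk` and the 14-day deadline in the example are assumptions, so substitute the real deadline of your tasks.

```python
def at_deadline_risk(queue_days: float, run_days: float,
                     connect_days: float, deadline_days: float) -> bool:
    """Return True if a task could miss its deadline in the worst case.

    Worst case: the task waits `queue_days` in the cache, crunches for
    `run_days`, and then sits finished for up to `connect_days` before
    the next connection lets it be reported.
    """
    worst_case_turnaround = queue_days + run_days + connect_days
    return worst_case_turnaround > deadline_days


if __name__ == "__main__":
    # Hypothetical 14-day deadline, 4 days queued, half a day of crunching.
    print(at_deadline_risk(4.0, 0.5, connect_days=9.9, deadline_days=14.0))
    print(at_deadline_risk(4.0, 0.5, connect_days=4.0, deadline_days=14.0))
```

Under these assumed numbers, the 9.9-day connect interval alone uses up most of the deadline, so very little queued work is safe. A 4-day interval leaves several days of headroom.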

Claggy also warns about the changes to the meanings of these preferences for version 7 BOINC. It seems to me that if you would like a 4 day cache for V7 (which is entirely reasonable) you should set the CI (connect to internet) preference to 4 days and the ED (extra days) preference to 0.1 days. That way, the low water mark will be 4 days and the high water mark will be 4.1 days. If you set these values, you may need to wait a bit for BOINC to be comfortable that any 'already in cache' tasks are not at deadline risk. It shouldn't take too long for this to happen.

I'm assuming that these hosts are nearly 'always on' and that BOINC doesn't think otherwise?

Cheers,
Gary.

Richard de Lhorbe
Joined: 15 Dec 05
Posts: 46
Credit: 9519458129
RAC: 772461

As I noted earlier, temporarily upping the cache to a really high value was recommended under BOINC 6.x as a possible solution. Until recently, I had "School" set at about 5 days, only upping it to 9.9 recently based on the old recommendation. The Mac in question has been running BOINC 7.0.25 for a while now under both "School" settings, with no change in cache size (i.e. still small). The Ubuntu machine was running BOINC 6.x until just a couple of days ago, and only with the recent Ubuntu upgrade to 12.10 LTS did BOINC get updated to 7.0.24.

Anyway, based on the discussion for BOINC version 7.x behavior here, I have just changed the "school" setting to 4.0 and 0.2 and will see what happens with both machines and report back in a day or so.

Cheers
Richard

p.s. Yes, both machines are nearly always on (>99%), so I don't think BOINC would think otherwise

Richard de Lhorbe
Joined: 15 Dec 05
Posts: 46
Credit: 9519458129
RAC: 772461

Well, it has been over two days now at the newly suggested cache settings, with no change in the work unit cache size. The caches on both machines are still far too small: the Linux machine is hovering at only around 7 work units, and the Mac is down to perhaps one, both with a cache setting of 4 and 0.2. I am going to try setting them even lower, at 2.5 and 0.1, to see what happens, but I have little expectation of any real change, and even less of any improvement. Any other suggestions? I still think, based on the old discussion thread, that there is some parameter that can be edited in some BOINC file to correct this ...

Thanks
Richard

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

Are you using Web Preferences or Local Preferences? Local Preferences override the Web Preferences, so you may need to clear them.
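For anyone who wants to check this from a file rather than through BOINC Manager: local preferences are stored in `global_prefs_override.xml` in the BOINC data directory. The location depends on the install (commonly `/var/lib/boinc-client` for Debian/Ubuntu packages and `/Library/Application Support/BOINC Data` on a Mac, but verify on your own system). The two cache values are the `work_buf_min_days` and `work_buf_additional_days` elements. Below is a minimal sketch for reading them. The helper name `read_cache_overrides` and the sample text are illustrative only, with the sample using the 0.1 and 0.25 values reported later in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Element names BOINC uses for the two cache preferences.
CACHE_TAGS = ("work_buf_min_days", "work_buf_additional_days")


def read_cache_overrides(xml_text: str) -> Dict[str, float]:
    """Return whichever cache settings are present in an override file."""
    root = ET.fromstring(xml_text)
    found = {}
    for tag in CACHE_TAGS:
        node = root.find(tag)
        if node is not None and node.text and node.text.strip():
            found[tag] = float(node.text)
    return found


# Illustrative sample only; a real file usually holds many more settings.
SAMPLE = """<global_preferences>
   <work_buf_min_days>0.100000</work_buf_min_days>
   <work_buf_additional_days>0.250000</work_buf_additional_days>
</global_preferences>"""

if __name__ == "__main__":
    print(read_cache_overrides(SAMPLE))
```

To use it on a real machine, read the file's contents into a string and pass that to `read_cache_overrides`. If the result shows small values, the local override is what is limiting the cache, regardless of the web settings.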

Claggy

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

Also, are you using an online account manager like BAM, and then making changes only on Einstein? BAM will override those changes.

Mike Pacelli
Joined: 15 Feb 12
Posts: 3
Credit: 2407753
RAC: 0

Along the same lines: I run 2 dual core units, one Intel (*13) and one AMD (*38), and noticed a couple of things. The cache of "ready to run" tasks on the *13 unit has been shrinking over the last couple of days, and the cache of "ready to run" tasks on the *38 machine contains no units. I run E@H exclusively on both. Are we running out of work, or is there something else going on? Thanks

Richard de Lhorbe
Joined: 15 Dec 05
Posts: 46
Credit: 9519458129
RAC: 772461

OK Claggy, I think you nailed it. I looked on the Linux machine, and sure enough the local preferences were for some reason set way low (0.1 and 0.25). I reset them to 4.0 and 0.25 and the machine immediately started downloading work units. The strange thing is I have NEVER touched these settings. Never. Ever. Can I emphasize NEVER some more? And it does not explain why the cache was slowly decreasing over a period of many months. But ... I can probably live with that mystery now that I have solved the basic original issue. Maybe errant gravitational waves pushed the settings downwards over time ...

Can't check the Mac for a few hours, but I hope the fix is just as simple.

Thanks & cheers
Richard

Richard de Lhorbe
Joined: 15 Dec 05
Posts: 46
Credit: 9519458129
RAC: 772461

... and as a follow-up, the Mac had exactly the same issue: somehow the local preferences had drifted down to 0.1 and 0.25 without any input from me. Once they were reset, the cache was reloaded with work units.

Cheers
Richard
