Reached a maximum quota

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117561233330

RAC: 35315127

16 May 2023 21:43:25 UTC

Message 212620 in response to message 212597

Harri Liljeroos wrote:

Check also that the resource share you have for Einstein@home is higher than 0.

In an earlier message he stated that resource share was set to 100.

It seems strange that he mentions trying a variety of work cache sizes and there was no effect. Maybe he is setting the cache size for a location other than the one the host is using. I've been assuming that with a single host just starting up, it would be in the default (generic) location. Maybe he is changing cache size for a different location (home, work, or school).

I looked at a new scheduler contact where 3 completed tasks were returned and 3 new tasks were sent as replacements. After that there was the following comment from the scheduler:-

2023-05-16 21:02:50.7023 [PID=2862386] [send] don't need more work

This implies that he has a cache size of effectively zero. If there is an available resource (e.g. 0.5 of a GPU when running x2), the scheduler will send a single task to keep it working.
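For anyone wanting to spot that scheduler comment in a saved copy of the server log, here is a minimal Python sketch. The line layout (timestamp, `[PID=...]`, a bracketed tag, then the message) is only inferred from the single line quoted above, so other log lines may not match it, and the function names are made up for illustration.

```python
import re
from datetime import datetime

# Pattern inferred from the one scheduler line quoted in this thread (an assumption,
# not a documented format): "<date> <time> [PID=n] [tag] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"\[PID=(?P<pid>\d+)\]\s+"
    r"\[(?P<tag>[^\]]+)\]\s+"
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Split one scheduler log line into its parts, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f"),
        "pid": int(match["pid"]),
        "tag": match["tag"],
        "message": match["msg"],
    }


def wants_no_more_work(entry):
    """True when the entry is the '[send] don't need more work' comment."""
    return entry["tag"] == "send" and "don't need more work" in entry["message"]


if __name__ == "__main__":
    sample = "2023-05-16 21:02:50.7023 [PID=2862386] [send] don't need more work"
    entry = parse_line(sample)
    print(entry["timestamp"], wants_no_more_work(entry))
```

Counting how often this comment follows a small batch of replacement tasks would show whether the host is consistently asking for next to nothing.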

@skillz - Only you (the owner) can see what location a host is assigned to. Check on the website. Are you changing the cache settings for the correct location?

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117561233330

RAC: 35315127

16 May 2023 21:56:29 UTC

Message 212622 in response to message 212603

Scrooge McDuck wrote:

... If there's some mechanism which enforces new users or new hosts to not request too many tasks initially ...

I'm not aware of any such mechanism.

It seems like the scheduler is just making sure that an available computing resource has a task to continue working on. I've never tested it but it seems like this would be the behaviour if the work cache size was actually set to zero.

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117561233330

RAC: 35315127

16 May 2023 22:39:08 UTC

Message 212623 in response to message 212603

Scrooge McDuck wrote:

Skillz started up a high performance host from a user account with a RAC of almost ZERO (1..2 days ago) skyrocketing it until now to a RAC of 18.5M.

The account must have a number of high performing hosts since the one we are discussing has a RAC of less than 1M at the moment - although it is rising very fast :-).

Cheers,
Gary.

Skillz

Joined: 5 May 17

Posts: 16

Credit: 3519991861

RAC: 4656848

17 May 2023 13:22:35 UTC

Message 212645

The location is home and I've been changing the settings for the home location.

I also avoided the web preferences and just made an override file. No fix.

We believe the issue might be client related as the client seems to only request 1 second's worth of tasks each time.

If I let the work finish on the host and then reset the project, it will download more work for a short while before reverting to the issue.
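For reference, a local override file of the kind mentioned above can be sketched like this. This is a hedged example, not the exact file used on the host: the BOINC client reads `global_prefs_override.xml` from its data directory, and `work_buf_min_days` / `work_buf_additional_days` are the two work cache settings. The values 0.5 and 0.25 days and the helper name `build_override` are just illustrations.

```python
import xml.etree.ElementTree as ET


def build_override(min_days, additional_days):
    """Return the text of a global_prefs_override.xml that sets only the work cache size."""
    if min_days < 0 or additional_days < 0:
        raise ValueError("cache sizes must not be negative")
    text = "\n".join([
        "<global_preferences>",
        f"   <work_buf_min_days>{min_days:.2f}</work_buf_min_days>",
        f"   <work_buf_additional_days>{additional_days:.2f}</work_buf_additional_days>",
        "</global_preferences>",
    ])
    ET.fromstring(text)  # sanity check: raises ParseError if the XML is malformed
    return text


if __name__ == "__main__":
    # Illustrative values only: store at least 0.5 days of work plus up to 0.25 days extra.
    print(build_override(0.5, 0.25))
```

After saving the output into the client's data directory, the client needs to be told to re-read it (for example with `boinccmd --read_global_prefs_override`) or be restarted before the new cache size applies.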

mikey

Joined: 22 Jan 05

Posts: 12682

Credit: 1839086099

RAC: 3865

17 May 2023 13:36:43 UTC

Message 212647 in response to message 212645

Skillz wrote:

The location is home and I've been changing the settings for the home location.

I also avoided the web preferences and just made an override file. No fix.

We believe the issue might be client related as the client seems to only request 1 second's worth of tasks each time.

If I let the work finish on the host and then reset the project, it will download more work for a short while before reverting to the issue.

What does your cc_config.xml file look like? Is it full of settings changes from the defaults? Or do you run it fairly clean with only minor things in it? Also, what version of BOINC are you running? Your PCs are hidden so I can't see that. Gary said he thought you were running Petri's tweaked version for Linux. The one other thing is the GPU driver version.

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3946

Credit: 46778262642

RAC: 64118954

17 May 2023 14:07:42 UTC

Message 212649 in response to message 212647

mikey wrote:

Gary said he thought you were running Petri's tweaked version for Linux.

running the special application has nothing to do with work fetch. And some people are having the same issue with the stock application.

my theory is that it's a bug in the client work fetch logic. and likely centers around all the modifications that were made to work around the work fetch bug when using the project_max_concurrent and max_concurrent statements in app_config. it seems that what happens is that a fast host will initially work fine and load up on work. then *something* happens and triggers the host to go into a minimum work request mode, even though there are no settings or any other kind of indication that it's set this way. it seems to be something that gets set in the background and/or internal to the boinc logic. the only fix seems to be resetting the project on the client. but it's just a band-aid, it will pop up again. what that *something* (the trigger) actually is remains unknown. I had this same exact thing happen to me on Primegrid. so it's not specific to Einstein, just a lot of people are experiencing it with Einstein due to the Pentathlon contest going on. It doesn't seem to impact slow hosts.

Skillz isn't going to unhide his hosts, he's aware of it and it's intentional. it's a strategy for competition. But I'm sure he'll provide you with a link to the host(s) in question or any other details you like.

mikey

Joined: 22 Jan 05

Posts: 12682

Credit: 1839086099

RAC: 3865

17 May 2023 16:59:32 UTC

Message 212653 in response to message 212649

Ian&Steve C. wrote:

mikey wrote:

Gary said he thought you were running Petri's tweaked version for Linux.

running the special application has nothing to do with work fetch. And some people are having the same issue with the stock application.

my theory is that it's a bug in the client work fetch logic. and likely centers around all the modifications that were made to work around the work fetch bug when using the project_max_concurrent and max_concurrent statements in app_config. it seems that what happens is that a fast host will initially work fine and load up on work. then *something* happens and triggers the host to go into a minimum work request mode, even though there are no settings or any other kind of indication that it's set this way. it seems to be something that gets set in the background and/or internal to the boinc logic. the only fix seems to be resetting the project on the client. but it's just a band-aid, it will pop up again. what that *something* (the trigger) actually is remains unknown. I had this same exact thing happen to me on Primegrid. so it's not specific to Einstein, just a lot of people are experiencing it with Einstein due to the Pentathlon contest going on. It doesn't seem to impact slow hosts.

Skillz isn't going to unhide his hosts, he's aware of it and it's intentional. it's a strategy for competition. But I'm sure he'll provide you with a link to the host(s) in question or any other details you like.

Oh I don't need to see his hosts, they are his and not mine and he can hide or unhide them as he chooses.

I was just trying to think big picture and come up with some of the things that can affect the BOINC client. That involves the cc_config file when BOINC starts, and then as it runs the app_config file adds more configuration options. Then you add in the various versions of both BOINC and the GPU drivers, and there are a lot of moving parts that all need to communicate efficiently for BOINC to work for us the way we want it to. And, as you said, the "project_max_concurrent and max_concurrent statements in app_config" can also affect how BOINC runs, especially on the newer versions where it knows you are using those files and can adjust itself to work with them.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117561233330

RAC: 35315127

17 May 2023 22:10:29 UTC

Message 212670 in response to message 212647

mikey wrote:

...The one other thing is the gpu driver version.

Mikey,
The host is returning large numbers of validated tasks so how does that suggest there's a problem with "gpu driver version"??

How could it even be remotely possible for the driver to interfere with work fetch??

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117561233330

RAC: 35315127

17 May 2023 22:37:37 UTC

Message 212671 in response to message 212649

Ian&Steve C. wrote:

...my theory is that it's a bug in the client work fetch logic. and likely centers around all the modifications that were made to work around the work fetch bug when using the project_max_concurrent and max_concurrent statements in app_config.

I tend to agree but had discounted the max_concurrent modifications because the client is listed as 7.16.6. I don't know for sure but doesn't that version pre-date those modifications?

I'm now wondering if it might not be a standard 7.16.6 build but perhaps one with some additional performance tweaks that might be mis-behaving?

I don't follow what is happening in the 'BOINC Pentathlon' world so wasn't even aware there was one going on. Yep, I know, I'm living under a rock :-). However that certainly adds additional context as to what might be causing the problem.

Cheers,
Gary.

Skillz

Joined: 5 May 17

Posts: 16

Credit: 3519991861

RAC: 4656848

17 May 2023 23:36:11 UTC

Message 212673

The version of BOINC is the one downloaded from the distro's package manager. No modifications on my end. Installed by simply typing:

sudo apt install -y boinc boinctui

It's running on Ubuntu 20.3.

edit

I have since just created two new instances and loaded E@H on both of them with an app_config that only runs 1 task per GPU. The problem has not happened again on my end, but one of my teammates has experienced the same issue doing this method as well.
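An app_config.xml that runs one task per GPU looks roughly like what this Python sketch produces. It is an illustration under assumptions, not the actual file from these hosts: `EXAMPLE_APP_NAME` is a placeholder for the short name of whichever Einstein@Home GPU application is being run, and the one-CPU-core-per-task reservation is only a guess.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage=1.0, cpu_usage=1.0):
    """Return app_config.xml text. gpu_usage 1.0 means one task per GPU, 0.5 means two."""
    if not 0 < gpu_usage <= 1:
        raise ValueError("gpu_usage should be a fraction of one GPU, e.g. 1.0 or 0.5")
    text = "\n".join([
        "<app_config>",
        "   <app>",
        f"      <name>{app_name}</name>",
        "      <gpu_versions>",
        f"         <gpu_usage>{gpu_usage:.2f}</gpu_usage>",
        f"         <cpu_usage>{cpu_usage:.2f}</cpu_usage>",
        "      </gpu_versions>",
        "   </app>",
        "</app_config>",
    ])
    ET.fromstring(text)  # sanity check: raises ParseError if the XML is malformed
    return text


if __name__ == "__main__":
    # EXAMPLE_APP_NAME is hypothetical; use the real application short name instead.
    print(build_app_config("EXAMPLE_APP_NAME"))
```

The file goes in the project's folder inside the BOINC data directory. Note that it contains no project_max_concurrent or max_concurrent statements, which matters for the theory discussed earlier in the thread.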

What we have begun to do now is set no new tasks, then just wait for the tasks to complete, upload and report. Once all tasks are done and gone, we issue a project reset.

Then we get another full cache (~1000 tasks) downloaded in a few minutes.

After the pentathlon, when I get some time, I will try installing a newer version of BOINC on the host and running E@H on it, to see if the issue remains or if it's indeed a problem with such an old client.
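The drain-and-reset routine can be scripted with boinccmd. The sketch below only builds the three commands and leaves running them to you, since the reset must not happen until every task has been reported. The project URL is a placeholder (use the one `boinccmd --get_project_status` shows for Einstein@Home), the function names are made up, and the final `allowmorework` step is an assumption in case the no-new-tasks flag survives the reset.

```python
import subprocess


def drain_and_reset_commands(project_url):
    """Return the boinccmd invocations for: no new tasks, project reset, allow new tasks."""
    base = ["boinccmd", "--project", project_url]
    return [
        base + ["nomorework"],     # step 1: stop fetching, then wait for tasks to finish and report
        base + ["reset"],          # step 2: only once the task list for the project is empty
        base + ["allowmorework"],  # step 3: assumed necessary if the flag persists after the reset
    ]


def run(command):
    """Run one boinccmd command on the local client and return its output."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholder URL: substitute the project URL your client reports.
    for command in drain_and_reset_commands("https://example.invalid/"):
        print(" ".join(command))
```

This is still only the band-aid described above; it refills the cache but does nothing about whatever sends the client back into minimum request mode.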
