Why is this a thing?
May 15 18:08:14 boinc[2847]: 15-May-2023 18:08:14 [Einstein@Home] (reached daily quota of 2304 tasks)
The host can easily do well above 2,304 tasks per day: with 4 GPUs each completing 2 tasks in 200 seconds, that's 8 tasks every 200 seconds, or 1 task every 25 seconds, which works out to around 3,456 tasks per day.
The host currently has 1,515 pending tasks, 1,077 valid tasks and 6 invalids. It also has 8 tasks in progress, and that's the most it will get: 2 tasks per GPU, with a replacement sent only when a task completes, uploads and reports.
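A quick sanity check of those numbers (assuming steady state and no downtime):

```python
# Theoretical daily throughput: 4 GPUs, 2 concurrent tasks per GPU,
# each pair of tasks finishing every 200 seconds.
gpus = 4
tasks_per_gpu_per_200s = 2
tasks_per_day = gpus * tasks_per_gpu_per_200s * (86_400 / 200)
print(tasks_per_day)  # 3456.0 -- comfortably above the 2304 quota
```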
Most projects put a limit on tasks per day, usually a combination of so many tasks per CPU core and so many tasks per GPU, up to a maximum GPU count of 16.
This is to protect the servers from bad hosts that produce nothing but errors on every task sent to them.
The most common way to get around any limit is to override the reported CPU count via the <ncpus> option in cc_config.xml, increasing it to spoof the actual core count.
Change <ncpus>-1</ncpus> to <ncpus>200</ncpus>, or similar.
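For context, a minimal cc_config.xml with that option set would look something like this (the file lives in the BOINC data directory; 200 is just an illustrative value):

```xml
<cc_config>
  <options>
    <!-- -1 means "report the real core count";
         any positive value is reported to projects instead -->
    <ncpus>200</ncpus>
  </options>
</cc_config>
```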
Skillz wrote: Why is this a thing?
E@H enforces daily limits that are no longer large enough for the highest performing GPUs. Your computers are hidden so we can't see exactly what you are running and so can't give specific advice.
The easiest way around this is to fudge the number of CPUs, making sure to limit BOINC (if necessary) so it doesn't try to run CPU tasks on non-existent CPUs. In the options section of cc_config.xml, just set a higher number of CPUs than you actually have - perhaps 8 times your current core count. For example, if you had 8 real threads you could add something like:
<ncpus>64</ncpus>
as an updated options entry, and then force BOINC to reread the configuration (use the 'Read config files' option in BOINC Manager). This should allow you to get a bunch of new GPU tasks.
Cheers,
Gary.
I faked the CPUs to 128, but still didn't get any more tasks than just 2 tasks per GPU at any given time. Does this take time to happen?
My host is reporting on E@H as having 128 CPUs now.
I also ran
boinccmd --host localhost --passwd password1 --read_cc_config
to force it to re-read the config file. Please no one steal my password. It's top secret.
Skillz wrote: I faked the CPUs to 128, but still didn't get any more tasks than just 2 tasks per GPU at any given time.
That should be fine.
Are you talking about tasks running or new tasks downloading per work fetch request? What is your work cache size?
No - assuming you forced an 'update' to make the server aware of the new setting.
Your hosts are hidden so we can't see what is actually happening.
Cheers,
Gary.
I mean the host will never have more than 8 tasks at a time. When those complete, the server sends no more than 8 to replace them.
So out of 8, if one completes, then I'll get 1 in return. If 3 complete, then I'll get 3 in return.
I've changed the work cache days to various different settings with no change.
10/10 days
5/5 days
1/1 days
1/0.5 days
10/0 days
10/5 days
And many more I've experimented with.
Resource share is at 100 and no other tasks are running on the host.
I can PM you the host if you would like.
Skillz wrote: So out of 8, if one completes, then I'll get 1 in return.
That sounds like you have <fetch_minimal_work> set in cc_config?? Please check all the options.
Please don't set large multi-day values. If it suddenly starts working, your client will go berserk. Unless you really want a large hysteresis effect with BOINC in panic mode, I would suggest setting 0.1/0, since that will give you plenty to start with. Once that settles, gradually increase to something like 0.2/0, 0.3/0, etc., until you have whatever number in reserve that you want. You'll need to increase gradually anyway, so as not to risk quickly exceeding your new daily limit.
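If you're setting the cache locally rather than on the website, those two numbers are the min and additional buffer days in global_prefs_override.xml in the BOINC data directory; a 0.1/0 setting, for example, would be:

```xml
<global_preferences>
  <work_buf_min_days>0.1</work_buf_min_days>
  <work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>
```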
I'm guessing you aren't running the manager since you used boinccmd to reread the changes to cc_config. So are you changing preferences locally or on the website? Are you making sure that both sides become aware of changes you make? If both sides did become aware of the work cache changes, it seems like you must have <fetch_minimal_work> set. That's all I can think of immediately.
Sure, if you wish.
Cheers,
Gary.
No, <fetch_minimal_work> is not in the cc_config.
If the 2304 quota is an actual quota, then the instance is doing more than 2,304 tasks per 24-hour period.
The host has been running BOINC for a little over 1.2 days. It has received 3,104 tasks in that time frame.
Of those 3,104 tasks, 8 are "in progress", 1,589 are "pending", 1,442 are "valid" and 8 are "invalid".
So if Einstein@Home will only send a host 2,304 tasks in a single 24-hour period, and this host can crunch and return more than 2,304 tasks, what happens then?
Skillz wrote: If the 2304 quota is an actual quota, then the instance is doing more than 2304 tasks per 24-hour period.
If that's what you saw when you first got the "exceeded daily quota of ..." message, it no longer applies since your host now clearly shows as having 4 GPUs and 128 CPUs. The daily quota is calculated as something like (A x ngpus + B x ncpus) where A is something rather large like 256 or 384 and B is something like 32. I don't know the current values of A and B. There will now be a much larger daily quota than 2304, depending on the difference between the 'real' and the 'fake' cores and the B value.
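As a sketch of that calculation (the A and B values here are assumptions, as noted above, not confirmed project settings):

```python
# Hypothetical daily quota formula: A x ngpus + B x ncpus.
# A=256 and B=32 are guessed values for illustration only.
def daily_quota(ngpus, ncpus, a=256, b=32):
    return a * ngpus + b * ncpus

print(daily_quota(4, 8))    # 1280 with the real 8 threads
print(daily_quota(4, 128))  # 5120 with the faked 128 CPUs
```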
Now that I have the host ID, I can verify hardware details and can see what the scheduler thinks about a work request. You can too, using the last scheduler contact link for that host on the website. The last request I looked at showed:
2023-05-16 03:36:24.4000 [PID=810862] [send] work_req_seconds: 1.00 secs
So it was only requesting 1 sec of new work. In other words, whatever cache size you had set, the client thought that the in-progress tasks were only short by 1 sec. There was also this line:-
2023-05-16 03:36:24.4485 [PID=810862] [send] est. duration for WU 730378439: unscaled 102941.18 scaled 10296.19
For some reason, the scheduler thinks the estimate for the new task should be 102,941 secs and even when scaled (by things like DCF and on_frac) it would still be 10,296 secs which seems stupidly large.
I see you are running the Petri optimised app (anonymous platform) so I have no idea how the scheduler works out estimates for that app (I have no suitable GPUs so have no knowledge about it). Maybe one of the people running that app might have a clue as to why your tasks are still being estimated at nearly 3 hours after correcting for a 0.1 DCF. Speaking of that, earlier on there is a line showing a DCF of 0.01 being adjusted by the scheduler to 0.1. I've never noticed that adjustment before. Maybe the scheduler considers 0.01 to be impossibly low and 'fixes' it :-).
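The scaling is roughly estimate x DCF, with other factors like on_frac mixed in; as a rough sketch with the numbers from that log line (ignoring the other factors, so it only approximates the logged value):

```python
# Rough sketch of how the scheduler scales a runtime estimate.
# 0.1 is the DCF the scheduler clamped to; on_frac and friends
# are ignored here, hence the small mismatch with the log.
unscaled = 102941.18
dcf = 0.1
scaled = unscaled * dcf
print(round(scaled, 2))  # 10294.12 -- close to the 10296.19 in the log
```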
Are new tasks, when they arrive, being estimated at close to 3 hours?
Cheers,
Gary.
Skillz wrote: No, <fetch_minimal_work> is not in the cc_config.
Check also that the resource share you have for Einstein@Home is higher than 0. With a 0 resource share, Einstein is treated as a backup project and will run and download only 1 task at a time if you don't have work from other projects or the host isn't attached to other projects.
Skillz wrote: The host has been running BOINC for a little over 1.2 days.
Skillz started up a high-performance host on a user account with a RAC of almost zero (1-2 days ago), skyrocketing it to a RAC of 18.5M by now. If there's some mechanism that keeps new users or new hosts from requesting too many tasks initially, then it might look like this. Maybe one just has to wait a few days for the quota to be lifted incrementally? Just my 2 cents.