Hello,
on one of my PCs Einstein is flooding me with hundreds of work units.
It has an RX460 GPU, which needs about 50 minutes per WU; only GPU work is allowed.
https://einsteinathome.org/de/host/12523728
The expected run time shown by the BOINC Manager is realistic (50 minutes).
The buffer is set to 0.01 days + 0.
Every 60 seconds the BOINC Manager requests new work units and gets exactly 1 new WU each time.
I now have more than 500, and the count keeps rising.
The duration correction factor in client_state.xml was about 2.08... - I have lowered it to 0.99.
Why is Einstein asking for more and more and more work?
And why does the server deliver more and more?
How can I stop this? (not by pressing "no new work" - I want a realistic number of WUs as buffer = 2 or 3)
Best Regards
MagicEye
Copyright © 2024 Einstein@Home. All rights reserved.
Do you have an app_config.xml file for Einstein in which you limit the number of workunits with max_concurrent? That would cause this, because of a bug in the BOINC client that makes it request new tasks over and over again.
Yes, that was the reason.
Thank you!
I have used this to crunch one Einstein and one WCG-OPN WU in parallel.
If you use only your GPU for Einstein, you can use the project_max_concurrent tag in app_config.xml to limit the number of tasks. Note that it goes in a different place in that file: https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration
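A minimal sketch of that placement, assuming you want at most one Einstein task running at a time (the value 1 is only an illustration; pick whatever fits your host). The tag sits directly under <app_config>, not inside an <app> block the way <max_concurrent> does:

```xml
<app_config>
    <!-- Project-wide limit: a direct child of <app_config>, not nested in <app>. -->
    <!-- The value 1 is an assumed example, not taken from the original post. -->
    <project_max_concurrent>1</project_max_concurrent>
</app_config>
```

The file goes in the Einstein project folder inside the BOINC data directory, and the client has to re-read its config files (or be restarted) before the change takes effect.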
@magiceye
an optional fix is to revert to boinc-manager 7.14. That version was running fine for me with the app_config line for the <project_max_concurrent> parameter. I am running a mix of cpu and gpu tasks, which is known to be badly managed with regard to cache size, but with limits of 1 + 0.1 it was not overcommitting the cpu load. At least not past the typical 14-day deadlines.
Recently upgraded the Linux system to Debian 11.0.0 and thought I might as well upgrade to the 7.16 boinc packages. BAD IDEA... I had drained the cache before the transition and (foolishly) resumed E@H late at night expecting it to refill the cache and resume normal operation. The next morning I had 1000 tasks downloaded! It had hit the 512 workunit limit before midnight and then got another 512 the "next day." Looking at the event log, it was fetching 3 or 4 work units every (60-second) cycle with total disregard of the cache limit. The thought occurred to me that the new 7.16 had no run-time history to base its estimates on; however, after letting it run for two days I tried enabling new tasks and it immediately downloaded 4 more cpu and 20 gpu tasks. (Those for the gpu were expected as all gpu tasks in the cache had been completed.) I've switched back to boinc 7.14 and now when I do a work fetch, to get more gpu work, it does NOT fetch any cpu work. Alas, I'll have to abort a big bunch of cpu work as there's no way they'll get done before the deadline.
OT - the 7.16 boinc-manager is missing the "shut down connected client" control option. Not a deal breaker but just inconvenient to close boinc gracefully for a system upgrade or such.
The problem is with the <max_concurrent> tag; <project_max_concurrent> should work OK.