Hiya!
I have a condition I'd like to change. Einstein@Home seems to take more of my processing power than I want it to.
I'm trying to balance time among SETI, Rosetta, and Einstein.
Some projects are more "generous" with "credit" than others. Einstein may be more generous than the other two.
I tried to balance among projects by setting resource share in inverse proportion to credit.
However, each time I've seen Einstein send work recently, it was as a group of tasks with a total work time out of proportion to my balance goal. The number of hours in the task groups has not varied with the resource share I've assigned.
Rosetta has a project setting "Target CPU run time" which I adjusted from 24 hours to 6 hours. If Einstein has a similar setting, it could be a big help.
I can restrict Einstein to some number of my total threads. That's not great because (1) my computer's thread count isn't divisible by three, and (2) my computer would probably complete only half of each task group on time.
Anything better?
Resource Share is attempted by BOINC to be worked out based on the daily RAC, but it takes time; the more projects you try to run, the longer it takes and the more often it will be off. Most of the time your CPUs will be running project A, B, or C, and only rarely will some of each be running. The more you play with it, the longer it will take to honor your settings. This is why a lot of us end up getting more than one PC for crunching, to run multiple projects at the same time. Throw in a GPU and the whole thing goes very wonky very quickly!! That's one reason I never run GPU tasks and CPU tasks for the same project on the same PC.
Thanks tons for the answer. It doesn't sound like there's another control I can use. Sigh.
Good to know!
As to, "Most of the time your CPUs will be running project A, B, or C, and only rarely will some of each be running.":
That's kinda different from my experience. I'm almost always running all three projects. I started watching closely a couple of months ago. Initially, Einstein was using something like 80% of the computer for weeks on end, but never all of it.
For the last several weeks, I've limited Einstein to 1 or 2 threads; it always uses all I've given it, without regard to the resource share I give it. The other two projects share the remaining threads, seemingly consistent with resource shares (as mentioned, the inverse of the rate of granting credit). About half of the remaining threads go to each project, though it varies (no surprise).
Have a great day!
The projects don't decide how many of their tasks to run on your machine, nor when. That decision is made by BOINC on your machine, responding to a number of things, of which your declared intention in resource share is only one. The design intention is that "eventually" the workload evens out to your requested resource share, but even when it works, the time frame is more like weeks or months than instantaneous.
One thing that may help is using a requested task queue depth that is much shorter than you have tried. The more projects a machine is running, and the more types of tasks on a project, the shorter the optimal queue request may be.
I mean short like 0.1 day.
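If you prefer editing files to clicking through BOINC Manager, those two values live in global_prefs_override.xml. A 0.1 day request would look roughly like this (I'm quoting the tag names from memory, so check them against your own file):

<global_preferences>
   <work_buf_min_days>0.1</work_buf_min_days>
   <work_buf_additional_days>0.05</work_buf_additional_days>
</global_preferences>

The same numbers appear in BOINC Manager as "Store at least 0.1 days of work" and "Store up to an additional 0.05 days of work".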
All of it depends upon projects having tasks ready to send when your computer asks for them. Unfortunately BOINC can't invent tasks. So if it needs tasks for project A but there are none to be had, it will ask for some from B, even though B is already above its resource share. Eventually it should start getting tasks from A again, and then things begin to even out.
An example: when LHC first started, they only had tasks sporadically. They would generate a batch that everyone might grab up in a couple of hours, and it might be a couple of weeks before they made any more. It doesn't matter what your resource shares are set to; you simply can't get enough work from such a project.
The above is one reason a shorter, much shorter, task queue is a good idea. It makes requests happen more often, so your computer is more likely to ask for work at a moment when there is some available.
My advice, don't micro-manage, it just makes it worse.
Gary Charpentier wrote: ...
@Garry (the OP),
The above, and archae86's before it, both contain good advice. An even more important consideration (which is quite counter-intuitive) is that NOT using a small cache size actually prevents BOINC from giving the project that hasn't been able to supply work the best chance of 'catching up' as quickly as possible.
Many people tend to think that when one particular project has these periods of not having work, the best strategy is to set a large cache size so that when work is available, you can get as much as possible to last you through the outage. By doing that, a different problem is created. As soon as your favoured project can't supply work, BOINC just sees this big cache size, which it is compelled to keep full. So that's exactly what it does - it keeps asking the reliable project(s) because it can't get anything from the unreliable one. This is not "Einstein@Home seems to take more of my processing power" but rather BOINC keeping your cache full.
The 'different problem' comes later when your favoured project is able to supply work but BOINC now has to make sure that all these extra Einstein tasks that it requested earlier, don't get into trouble with deadlines. So at some point, BOINC's deadline protection mechanism will kick in and will force BOINC to deal with tasks at risk - even though it should be crunching the project that had been out of work. The very best way to prevent BOINC from having large numbers of tasks in deadline trouble is not to request them in the first place. In other words, 'small cache size'.
One other point to consider. Your comment, "I tried to balance among projects by setting resource share in inverse proportion to credit" should actually be quite achievable with a small cache size. I assume that 'credit' means 'credit per unit of time', so theoretically this should tend to give all three projects equal RACs. In mentioning RAC, please realise that Mikey gave incorrect advice when he said, "Resource Share is attempted by BOINC to be worked out based on the daily RAC." Resource share is all about time, not credit. If you have equal resource shares for three projects then each one (theoretically) should get 8 hrs per day of access to your machine, irrespective of the RAC.
If you arranged your resource shares to be in inverse proportion to the awarded credit per unit of time for each project, it seems to me that BOINC should eventually end up causing each project to have roughly equal RACs. Quite a cunning way to see how successfully BOINC is doing its job without you having to keep detailed records of the number of tasks and the length of time each one took. :-)
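As a purely made-up illustration: suppose Einstein granted 300 credits per CPU hour, Rosetta 100 and Seti 50. Shares in inverse proportion would be 1 : 3 : 6, i.e. Einstein 10%, Rosetta 30% and Seti 60% of the crunch time. Each project would then earn the same credit for every hour the machine crunches (300 x 0.1 = 100 x 0.3 = 50 x 0.6 = 30), so all three RACs should drift towards the same value.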
@Garry,
Very sound advice. I would add a final clause to it, "unless you really, really understand what you are doing and have the time to properly pay attention."
Here is an example that might work to your benefit. Imagine you decide to take the advice on small cache size and you choose something like 0.5 days total as an example. Even with that, when Seti runs out, other projects will give some extra tasks during the outage that will need to be processed later. Because Seti's weekly outage is known in advance, you could consider having (say) a 1.2 day cache size which you then reduce to, say, 0.2 days a couple of hours before the Seti outage kicks in. So you start with more tasks, which then last a bit longer during the outage. The big benefit is that unless the outage lasts longer than a day, BOINC will not need to request any new tasks from other projects. When Seti tasks are available again, you put it back to 1.2 days and the chances are that BOINC will request Seti work if that project is behind. There should be minimal numbers of tasks from other projects downloaded during the outage that have to be dealt with later due to deadline issues. The above is purely an example. I don't run Seti at all, so I don't really know anything about the outages; you may need to tweak some of the details to optimise things.
Think that through carefully. It would require you to do a bit of micromanaging at carefully selected times, say approximately once per week. Work out for yourself if there are any risks if you get the timing wrong. Do you want to tie yourself to this level of 'paying attention'?
Cheers,
Gary.
Wow. WOW! These are thoroughly reasoned responses. Kudos and thanks! All four of you are giving me much to think about.
Maybe "short queue sizes" have been in effect since I started BOINC here.
Unsaid in my prior comments: I started BOINC on this computer in November. Initially, I operated at "Store at least 0.01 days of work" (about 15 min) and "Store up to an additional 0.04 days of work" (1 hour). Plus marks for small queue sizes; minus marks for not recognizing the two values made better sense reversed.
At the turn of the year, someone kindly pointed out the reversal and I corrected it. Now it's "at least 1 hour" and "up to an additional 15 min".
The result? The scheduler frequently accepted more tasks than the computer could process (well ... requested, as @archae86 says). My computer often submitted tasks late. That's no good, of course.
Active management? Not my preference. Never was, but I forgot for a bit. Kind people like @Gary Charpentier pulled me back. Thanks for asking, @Gary Roberts. All four of you have valuable inputs.
(he he) Just me, or does this conversation have a lot of Gary/Garry's? I hope everyone else is comfortable, too!
There's more information about my efforts at https://setiathome.berkeley.edu/forum_thread.php?id=85155&postid=2033585
My current suspicion is that something in BOINC sets a minimum task batch size and I'm running afoul of it. And the root of that is either (1) that I bought a mid-range computer (and maybe only "one step above minimum capability") or (related) (2) that I'm spreading my extra processing too widely.
It seems that projects send tasks here in batches of six. SETI sends me 4-hour tasks (24 hours total). Rosetta used to send me 24-hour tasks, but has a setting, "Target CPU run time" which I set to 6 hours (36 hours total). My frequent late tasks fully stopped. Einstein sends me six 28-hour tasks (168 hours total; a week, literally 7 x 24).
Deadlines are the reverse of what I need. SETI, with the shortest tasks, gives me deadlines of something like 6 weeks. Rosetta gives me deadlines of about 2 weeks. Einstein: 3 weeks.
For those numbers, I need at least 4 hours a week for SETI, 18 hours for Rosetta, and 56 hours for Einstein.
Please forgive me, @archae86, if I conclude there are project-based choices involved. Maybe not in the sending, as I tend to word it. Somewhere. I believe you.
I could give Einstein 56 hours a week, but that's over half the hours I tend to have for BOINC.
So, I really like the Einstein project and first registered, according to their records, in 2005. (I think it was earlier. No matter. The point is the science completed.) But maybe I need to figure out which two projects I can continue supporting.
Sigh. That sounds like a silly threat on an Einstein board. That's not the intent. Maybe it's just reality.
Garry wrote: "... Now it's 'at least 1 hour' and 'up to an additional 15 min'."
I'm sorry, but I can't see how this can be true, based on the following assumptions.
Please note that since your computers are hidden and you haven't disclosed your hardware details or the search (or searches) you are contributing to, I had to make some assumptions, as follows:
- you are contributing to CPU searches only (no GPU tasks);
- your machine runs for around 12 hours per day;
- the CPU tasks have a 2 week deadline.
I hope you understand that when you don't provide essential details, people either don't answer or are forced to guess.
So, with those assumptions, I could see that a multi-day cache could get you into deadline trouble. Now that you say that your work cache is only 0.04 + 0.01 days (~1.2 hrs), the BOINC client (as a maximum) should only download 1 CPU task per thread until such time as there was less than 1.2 hrs estimated left on a crunching task. At that point the client would download a replacement for that task.
I would love you to explain to me how you could ever get any task to exceed the deadline. Perhaps, by running your machine for only an hour or so per day, you could get early tasks to exceed the deadline, but that would be a one-off because BOINC would quickly find out about that and would severely limit your host's ability to download further work. So how do you achieve this magic?
Not only is there no "minimum batch size" but also, how top-of-the-range your computer happens to be (or how widely you spread your processing) will have no adverse influence. The client requests work when it sees the estimated time remaining for all tasks as being less than your work cache size.
As a simplified example, if BOINC had determined that you have 4 threads available, this means that your cache size would be 4x1.2=4.8 hrs. As soon as the estimated remaining time of the 4 crunching tasks (plus any yet to start tasks) totaled to less than 4.8 hrs, an extra (single) task would be requested from whichever project was owed the debt. This is precisely why I don't think you ever had a work cache size of just 1.2 hrs. The estimates for crunch times would have to be single digit minutes rather than hours for you to get a whole bunch of tasks sufficient in number to eventually exceed the deadline. You really need to explain more precisely what was going on.
This description makes me wonder if you have (through the <ncpus> option in cc_config.xml) fudged the number of cores that your computer has to some impossibly high number. Do you use cc_config.xml? Is there such an option in that file? I'll save further comment until you answer that. :-).
Cheers,
Gary.
Oh. Dang. My bad. Of course I need to make my computer visible. I'd forgotten that setting. Fixed.
I hope it's clear that I wish I hadn't created that inconvenience.
All your comments are reasonable. Thanks.
Maybe short replies here. Providing raw information is likely most valuable.
GPU searches: Right. None. This system uses an Intel i5. I didn't know it when I bought it, but the integrated GPU hasn't seemed up to helping BOINC. Other than that, the system does a great job supporting my needs for it. It's a laptop, so adding a GPU is a challenge.
4 cores. 8 threads.
Typical usage: I looked inside some of the BOINC XML files last night. One of them reported <on_frac>0.393270</on_frac> and <active_frac>0.725284</active_frac>. So, less than your (good) 12-hour assumption.
I have the computer on for 12-14 hours on a typical day. When I don't use it for 30 minutes, it sleeps (interrupting BOINC).
"CPU tasks have a 2 week deadline": Not always my experience, as mentioned.
"should only download 1 CPU task per thread until such time as there was less than 1.2 hrs estimated left on a crunching task": Not consistently my experience. That'd fit my needs!
BTW: The settings are worded "Store at least ..." and "Store up to an additional ...", not "Store at most ...", etc. Relevant?
Six-task batches: It occurs to me that's not always the case. Sometimes I get only one or two tasks from Rosetta. That works well. I need more batches (or bigger batches) from SETI. Recently, Einstein has always sent the batches I described. Most recently, I got no tasks from Einstein until the previous batch completed. Soon after, I got the batch I described.
I wasn't aware of cc_config.xml until within the last ten days. It was absent when I set it up; from memory, the content was roughly this:
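<cc_config>
   <options>
      <allow_multiple_clients>0</allow_multiple_clients>
      <dont_use_vbox>1</dont_use_vbox>
      <no_gpus>1</no_gpus>
   </options>
</cc_config>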
It has since been replaced by a version listing many XML fields. The three above are as I set them. Yours (the <ncpus> option you asked about) appeared as, if I'm remembering the value correctly:
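      <ncpus>-1</ncpus>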
That setting isn't documented at https://boinc.berkeley.edu/wiki/Client_configuration; I didn't put it there. (I don't care to spoof anything.) I set it to "0" and restarted the client (BOINC Manager > Exit BOINC; check that there's nothing "BOINC" in the Task Manager list; restart BOINC Manager).
I didn't expect and haven't noticed any difference in operation since I established cc_config.xml.
Your perceptions of BOINC operation seem to be at odds with my experience. There must be a good reason. I hope we can find that reason. I hope that reason gives us a good chance to change things. Thanks for reading this far!
Garry wrote: "I hope it's clear that I wish I hadn't created that inconvenience."
Of course. Thank you very much for making your computers visible.
The key things that strike me are 4 cores/8 threads (so the potential to run lots of concurrent tasks) but a very stringent limit on CPU frequency of 1.6GHz for a modern CPU that would be capable of running at 2.5 times that speed. Manufacturers tend to use such drastic limits when they are forced to reduce heat - i.e. when they know that they don't really have a good enough cooling system to support a higher speed. As the owner, you need to be mindful of this and watch operating temperatures closely. Crunching can be extremely CPU intensive so that using all (or even most) cores may cause enough overheating to shorten the life of your investment. Probably that machine was really designed for light duties only. As a matter of interest, do you actually run 8 concurrent tasks, i.e. use all available threads? How closely do you check the temperature when it's crunching?
I was referring only to Einstein - and in particular as to why Einstein tasks could be exceeding this length of deadline, given the very small work cache size. I think I may have figured out a possible explanation for that.
'at least' means 'do something if what remains becomes less than'. So this is exactly what BOINC tries to do - keep the work on hand to at least 1.2 hrs at all times. It can only do this in increments of a whole task so when there is a work fetch event, there will be somewhat more than 1.2 hrs, depending on the actual physical 'size' of a task. However, there shouldn't be the fetching of multiple tasks if a single task would take the cache to above the 1.2 hrs setting.
Here is the 'possible explanation' that I alluded to above. I've never used the <max_concurrent> setting in app_config.xml, but I think it might be the culprit for what you see.
Imagine BOINC has a left-hand and a right-hand and that the old saying about 'hands' is true :-). Because of <max_concurrent>, the left-hand knows it can only run 1 EAH task at a time. However, the right-hand is what does the work fetching. It sees that it has to keep 8 threads satisfied (unless you have told it not to use all available threads). So, what will happen if other projects haven't been able to supply work and your cache (for most of the 8 threads) is below 1.2 hrs? The right-hand is likely to request multiple EAH tasks because it's not smart enough to understand the implications of <max_concurrent>. This is the only way I can understand what I currently see in your Einstein tasks list - the 6 tasks fetched between 14:06:23 and 14:10:49 UTC on Feb 25 in 2 separate batches of 3 tasks. Why would it fetch 6 tasks if it understood that it could only run one at a time?
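In case it helps anyone following along, a minimal app_config.xml using <max_concurrent> would look something like this. The app name below is just an illustration - you would use the name of the app you actually run:

<app_config>
   <app>
      <name>hsgamma_FGRPB1G</name>
      <max_concurrent>1</max_concurrent>
   </app>
</app_config>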
So, since you now have these 6 new tasks and since your machine is effectively running significantly less than 50% of the time and since a task seems to take more than 24 hours of continuous running, I can appreciate that the last task or two in that group might be under some deadline pressure - eventually :-).
With regard to a cc_config.xml file, I guess you created one to turn off the use of Intel GPUs. I don't understand why you needed the other options, though. For the majority of volunteers, BOINC has default settings that are perfectly satisfactory. Unless you have some advanced and fairly specialised use case, and unless you really understand the consequences, you would be far better off staying with default settings as much as possible. I use cc_config.xml (in Linux) on some machines and nothing ever gets visibly added to what I insert. I don't understand why your file gets all the extra parameters you mention. Are you sure someone hasn't given you a "souped up" file to try out at some point? :-).
Here is roughly what I would advise to have a nice 'set and forget' experience. I'll make the assumption that the only project for which you have set up app_config.xml is Einstein, and that the only real purpose for that was to use <max_concurrent>. To achieve what you have described, all you really need to do is keep your nice low work cache settings and set your resource shares appropriately. As an example of the latter, I would suggest you have Seti on 600, Rosetta on 300 and Einstein on 100. So Seti has 60% share, Rosetta has 30% share and Einstein has 10% share.
I have no solid knowledge about the credit value for Seti and Rosetta tasks, but on the basis of numbers you have mentioned, I'm guessing those resource shares should tend to eventually stabilize the RACs for each project at roughly equal values. If they don't, after say a month of not interfering, you can tweak the values as you see fit, but resist the urge to make big changes or make it happen fast!!! :-) You also need to get rid of or modify options in both cc_config.xml and app_config.xml. In particular, you need to get rid of <max_concurrent> at the value of 1.
If you think about the above and if you are willing to make some changes to give better behaviour, please let me know and I would be prepared to set out some steps that you could follow. I'll wait until you digest the above and respond.
Cheers,
Gary.
Gary: Thanks for the great note.
Processor: Your comments did my laptop a favor. I run a cooling fan under my laptop whenever it's on. I downloaded a temperature monitoring program and used its readings to better align the cooling fan with the computer's vents. The temp went down. It was a bit high before; it's in the cool range now. Thanks tons.
The temp could be higher when the computer is away from the desk. But then, it's on battery, and BOINC doesn't run. I'll check the temperature next time I'm mobile (not often).
8 concurrent tasks? That's the goal. If I don't get enough work or if I set the concurrency limits out-of-sync with available work, no.
Concurrency settings: I've played with them a lot. For now, I see a lot of value in limiting all projects to a maximum of four threads. That way, I'm forcing them to share the computer while they collect scheduler data. At one point, I had a period when I guessed SETI and Rosetta got pretty evenly matched in the scheduler data; I tested by increasing the concurrency limits for each by one thread. They pretty evenly traded time, sometimes one or the other higher, but most of the time even. That's probably as close as I'll get. Once I got there, maybe I could let both of them go to unlimited access and they'd do the same. I ran out of SETI work before I tried it.
I have to do something to compensate for SETI's weekly maintenance stops. Your idea about manually toying with queue sizes to download extra work just before each outage is great (and I previously tried a version of it). It worked one week, but in later weeks, I kept forgetting. Manual won't work. And I don't care to be so actively adjusting BOINC, either. I could perhaps set up a scheduled task, but I haven't taken that time yet.
If I go with the idea that SETI needs a little extra priority when it returns from an outage, I can leave Rosetta and Einstein at 4 each (or 5). That keeps the processor busy during SETI outages. And set SETI at 6. Presuming a prompt return from maintenance, maybe that'd be good. This week (unlike others in the recent past), I'm still not getting their work on Thursday after a Monday or Tuesday outage. For that long a delay, maybe I'll need to experiment with even as much as no limits for them. Eventually, the scheduler data will reach balance again and start sharing the processor appropriately.
If I'm applying the concept of half-life properly, after about 10 days on a configuration, half of all the scheduling data prior to the change will have worked out of the system. (Sadly, that assertion assumes that those 10 days are representative of the normal 10-day period. This long outage puts that in doubt.) After another 10 days, half again (a total of 75%). The third cycle would be 87.5%; by then, we might be past the point of diminishing returns for the purpose of tuning a configuration. By the end of the second cycle, I suspect we'll pretty well see where the final result is going. And I'd be pretty happy to find a configuration that balances within something like 75% accuracy. (I'm unlikely to collect the data necessary to accurately measure that number.) Here's a case when "perfect" is the enemy of "good enough".
Current concurrency limits: 6 for SETI (but it doesn't matter, because of lack of work), 4 each for Rosetta and Einstein. I started that last night. Since starting it, the system has run 4 and 4 threads every time I've looked. A good candidate for a "set and forget" setting! Later today and presuming I continue not getting SETI data, perhaps I'll relax to 5 and 5 in the hope that'll give me an idea of the scheduler data balance between them. I acknowledge that I don't know whether the scheduler data responds to the concurrency limits, making the experiment poorly designed. But if I get the results I want, I'll be happy! And that'll be an indication that the experiment was okay. When I relaxed for SETI and Rosetta at a different level, the results seemed useful.
Sigh. Two of the six Einstein tasks have completed; all four remaining are running. Within six hours, two of the remaining will complete. If Einstein does for this batch what it did for the last (that is, adds more tasks here only when the sixth task completes), either I give Rosetta more access or the processor isn't so busy. Rosetta is consistently maintaining six tasks in inventory. (BTW, the two backup tasks tend to represent an estimated 18 hours of processing, 72 times my "Store up to an additional 15 min" setting.)
"'at least' means 'do something if what remains becomes less than'." There's probably some ambiguity here. I interpreted that as, 'at least' means 'I'm doing what you ask so long as I keep at least the amount of work you ask for. Ten times that is at least what you asked.' BOINC has a competing value: conservation of bandwidth for the projects. It's of considerable advantage for them to send larger batch sizes, so BOINC doesn't especially share goals with me as a limited contributor. Note that they didn't give me a control labeled "Store at most ...". That setting would force the scheduler to make small downloads. The scheduler did react when I started using these settings; it has a smaller inventory. I don't recall ever seeing it keep inventories near my settings. Maybe my settings are outside a range in which the scheduler fully satisfies those settings.
left hand, right hand: A possible theory. When I feel the scheduler data has the three projects balanced, I'd like to go unrestricted to try it out.
"I can appreciate that the last task or two in that group might be under some deadline pressure - eventually :-).": Yup. Except that no SETI tasks are getting here, so Einstein should be (and is consistently) getting 4 threads (rather than 2 or 3, an equal sharing with two other projects). Einstein is loving it!
cc_config.xml: (1) I've had cases of two clients running (represented by two entries in the Task Manager). Only one of those was under prompt control of the BOINC Manager. (If I closed the BOINC Manager with "Exit BOINC", the first one closed right away. The other might keep going for three or four minutes before finally exiting.) I don't know what that was, but I thought <allow_multiple_clients> might prevent it. (2) I didn't like Virtual Box when I tried it; I concluded it uses lots of resources. I thought <dont_use_vbox> would keep me from unknowingly signing up for one of those projects. (I probably wouldn't anyway.) (3) You're right, I used <no_gpus> to turn off GPUs. I had thought there was a setting in BOINC Manager > Computing preferences, but the closest there only keeps the GPU from running while the computer is active. It seems strange, but on this processor, the CPUs seemed to need to keep so busy supplying the GPU with data the net output decreased. Not a good recipe. Maybe those three settings are unlikely to cause problems from being off the defaults.
"Are you sure someone hasn't given you a "souped up" file to try out at some point? :-).": Certain. That's not me. I'd remember if it had happened in the two weeks since I first established a cc_config.xml file. I'm the only computer user (let alone geek) in this home. So far as I can see, this file is all defaults, except for my settings. Maybe I have reason to believe the system did it.
Resource shares: You recommend "Seti on 600, Rosetta on 300 and Einstein on 100", giving the three projects work shares of (60%, 30%, 10%). I have SETI 3000, Rosetta 1000, and Einstein 300, (70%, 23%, 7%). Yours is a good recommendation, "close" to mine. Today is day 6 with my setting; I have no reason to believe the scheduling data is close to a balance for the three projects. (Maybe SETI and Rosetta were close before the outage and maybe Rosetta and Einstein are close now, but both those are tenuous conclusions. If both those are true, maybe SETI and Einstein are close, but I have no direct data for saying that.) Day 20 or 30 or so, I'm likely to tweak these. I will tweak toward yours, presuming the data points me there!
Your advice is considerably helping my thinking. I'm using it lots. You're getting me to think about more subjects. Thanks on all counts. Please forgive any remaining typos.