I've noticed that BOINC seemingly cannot tell BRP4G and BRP5 tasks apart.
If a GPU completes a long-running BRP5 task, BOINC then applies the resulting DCF to BRP4G tasks, which run for significantly less time.
Makes a bit of a nonsense of having an estimated time for tasks.
Will this anomaly be addressed sometime in the future?
Regards,
Cliff,
Been there, Done that, Still no damm T Shirt.
Copyright © 2024 Einstein@Home. All rights reserved.
Boinc and DCF with GPU tasks
As you've noticed, Einstein is running older server software that uses the DCF (duration correction factor) to correct and adjust the estimated run times supplied by the project. BOINC has never had more than one DCF per attached project, so that single DCF is applied to all the different searches for that project. This is a remnant from the time when every project ran only one search.
If and when Einstein upgrades to a newer server version using CreditNew, the server will take care of estimating run times and things will hopefully work better. When that will happen is anyone's guess, but I don't think it will happen before the new web front end being tested on Albert@home is finished and ready to launch here, and some major improvements to how CreditNew works have been made.
Holmis has addressed the project side. For you or others who might come across this, let me comment on the user side:
Consider low queue length, and consider restricting application types.
1. Einstein has usually had a pretty reliable work supply, so the reasons for deliberately requesting large amounts of work in advance, which applied at SETI when it routinely shut down work distribution for several days a week or had frequent unscheduled multi-day outages, don't apply so strongly here. If you keep something like half a day of pre-fetched work, the variations in estimates are much less likely to cause deadline misses, "panic mode" operation, and other undesired effects. As little as a sixth of a day is probably enough for most purposes.
2. Personally, at any given time I select a single flavor of CPU work and a single flavor of GPU work. Over a few weeks this usually settles to pretty accurate run time estimates for both.
Hi,
Any idea where BOINC stores the DCF factor?
I mainly run BRP4G and the odd BRP5 now and then, but I'm getting wigged out by that incorrect DCF on BRP4G:-)
And it seems to be extending the running times of current tasks:-( Since I don't run my rig 24/7, I need to be able to determine the end time of running tasks fairly accurately.
I normally maintain a 0.5 day cache, but have found I need to up that to 1.5 days now and again to get any WU at all.
Regards,
Cliff
One place is the BOINC Manager: open it, click the Projects tab, select your project, click Properties on the left, and scroll to the bottom of the page. Another is the file "client_state.xml" in C:\programdata\boinc; as you scroll through it you will see a line like this for each project:
<duration_correction_factor>1.000000</duration_correction_factor>
BE CAREFUL if you change the numbers, though; it can be a HUGE mess if you set them wrong. The ideal setting is 1.0, so both of my above numbers are good enough.
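But this one:
<duration_correction_factor>0.488683</duration_correction_factor>
is a little low, while this one:
<duration_correction_factor>3.018424</duration_correction_factor>
is a bit high. For both of the last two I would exit BOINC completely, change the first number (the one before the decimal point) to a ONE, save the file, and then restart BOINC.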
Apparently you can also just reset the project to set it back to the default settings. I think you will lose any workunits for that project though, both in progress and cached ones.
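For anyone who would rather look than edit, here is a small read-only sketch that lists the DCF stored for each project. It assumes each project sits in a <project> element directly under the root of client_state.xml, with <project_name> and <duration_correction_factor> children; the sample text, project names and URLs are made up, and read_dcfs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up fragment in the shape assumed for client_state.xml.
SAMPLE = """<client_state>
  <project>
    <master_url>https://example.invalid/project_a/</master_url>
    <project_name>Project A</project_name>
    <duration_correction_factor>0.488683</duration_correction_factor>
  </project>
  <project>
    <master_url>https://example.invalid/project_b/</master_url>
    <project_name>Project B</project_name>
    <duration_correction_factor>3.018424</duration_correction_factor>
  </project>
</client_state>"""


def read_dcfs(xml_text):
    """Return {project name: DCF}; nothing is written back."""
    root = ET.fromstring(xml_text)
    dcfs = {}
    for project in root.findall("project"):
        # Fall back to the master URL if the project name is missing or empty.
        name = project.findtext("project_name") or project.findtext("master_url", "unknown")
        dcf_text = project.findtext("duration_correction_factor")
        if dcf_text is not None:
            dcfs[name] = float(dcf_text)
    return dcfs


for name, dcf in read_dcfs(SAMPLE).items():
    print(f"{name}: DCF = {dcf:.6f}")
```

On a real host you would pass in the text of your own client_state.xml instead of SAMPLE. Reading it is harmless, but any hand editing should still be done with BOINC shut down.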
Hi Mikey,
Thanks, I'll take a look
Regards,
Cliff
BRP5 tasks take around 3.3 times longer than BRP4G tasks and are awarded 3.3 times more credit. The difference in run time, and the fact that both task types cause BOINC to adjust the (single) DCF, are not issues you should be concerned about. Nor is the actual value of the DCF itself. It really doesn't matter whether the DCF is 0.5, 1.0 or even 5.0, if that is what it really needs to be.
In a perfect world, if the project supplied estimate is precisely the same as the actual crunch time, the DCF will be 1.0 and all will be happy.
In an imperfect world, the project supplied estimate might be twice (or half) the true run time and BOINC will adapt to this by progressively adjusting the DCF until it reaches 0.5 (or 2.0), at which point all will be happy again.
A real problem only arises when there are multiple searches at the one project where the different estimates disagree with the actuals in vastly different amounts, perhaps even in opposing directions.
For example, if BRP4G tasks took an hour and were estimated at 2 hours, and BRP5 tasks took 3.5 hours and were estimated at 7 hours (i.e. both estimates are double the actuals), BOINC would fix this itself by reducing the DCF to 0.5, at which point new tasks of either type would arrive with correct estimates.
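The arithmetic of that example, as a quick sketch. It assumes, as the example implies, that the estimate a client shows is simply the project's raw estimate multiplied by the DCF; displayed_estimate and dcf_wanted are hypothetical names.

```python
def displayed_estimate(raw_estimate_h, dcf):
    # Assumption: shown estimate = project's raw estimate x DCF.
    return raw_estimate_h * dcf


def dcf_wanted(raw_estimate_h, actual_h):
    # The DCF that would make this search's estimate exact on its own.
    return actual_h / raw_estimate_h


# Figures from the example: both searches overestimated 2x.
searches = {"BRP4G": (2.0, 1.0), "BRP5": (7.0, 3.5)}  # (raw estimate h, actual h)

for name, (raw, actual) in searches.items():
    print(name, "wants DCF", dcf_wanted(raw, actual),
          "-> estimate at DCF 0.5:", displayed_estimate(raw, 0.5), "h")
```

Both searches "want" the same DCF of 0.5, which is why a single DCF can serve them both.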
However, if BRP4G tasks took an hour but were estimated at 0.5 hours, whilst BRP5 tasks took 3.5 hours and were estimated at 7, this would create a big problem. Under this regime, each completed BRP4G task would cause BOINC to push the DCF up towards 2.0, since those tasks really take double the estimate, while each completed BRP5 task would still be causing BOINC to try to reduce the DCF towards 0.5. The DCF is likely to bounce around quite a lot depending on the numbers of each task type crunched.
So it doesn't really matter if estimates are inaccurate as long as the different searches have similar inaccuracies to each other in both magnitude and direction.
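A toy simulation makes the bouncing in the second example visible. The update rule is a simplified assumption, loosely based on my understanding of older BOINC clients: if a task overruns its shown estimate by more than 10%, the DCF jumps straight to the value that task wants; otherwise it moves only 10% of the way towards it. The thresholds, the task mix and the function names are illustrative and not taken from the BOINC source.

```python
def update_dcf(dcf, raw_estimate_h, actual_h):
    """One simplified DCF update after a task finishes (constants assumed)."""
    raw_ratio = actual_h / raw_estimate_h          # DCF this task alone would want
    adj_ratio = actual_h / (raw_estimate_h * dcf)  # actual vs. the estimate shown
    if adj_ratio > 1.1:
        return raw_ratio                 # ran long: jump straight up
    return 0.9 * dcf + 0.1 * raw_ratio   # ran short or on time: creep down slowly


# Second example: BRP4G underestimated 2x, BRP5 overestimated 2x.
TASKS = {"BRP4G": (0.5, 1.0), "BRP5": (7.0, 3.5)}  # (raw estimate h, actual h)


def simulate(sequence, dcf=1.0):
    """Return the DCF after each finished task in the sequence."""
    history = []
    for name in sequence:
        raw, actual = TASKS[name]
        dcf = update_dcf(dcf, raw, actual)
        history.append(dcf)
    return history


seq = ["BRP4G"] + (["BRP5"] * 5 + ["BRP4G"]) * 3
print([round(d, 2) for d in simulate(seq)])
```

Under this rule the DCF snaps back to 2.0 after every BRP4G task and sags towards 0.5 during runs of BRP5 work, so BRP5 estimates swing between roughly 10 and 14 hours for 3.5 hours of real crunching.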
To be sure about what is happening, you should publish the estimates for each task type when you receive a new batch of work containing both types. Then we can see exactly the disparity between estimate and actual for both at a time when DCF is constant. You can't really tell by looking at different work fetch events because the DCF may have changed in the interim.
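The comparison suggested here boils down to one number. Estimates that arrive in the same work fetch were all scaled by the same DCF, so dividing one search's actual/estimate ratio by the other's cancels the DCF out. The figures below are the hypothetical ones from the second example above, and the helper names are made up.

```python
def error_ratio(displayed_h, actual_h):
    # How far off one search's displayed estimate is (1.0 = exact).
    return actual_h / displayed_h


def disparity(first, second):
    """first/second: (displayed estimate h, actual h) for two searches whose
    estimates arrived together, i.e. under the same DCF."""
    return error_ratio(*first) / error_ratio(*second)


for dcf in (1.0, 1.7):
    brp4g = (0.5 * dcf, 1.0)  # displayed estimate already includes the DCF
    brp5 = (7.0 * dcf, 3.5)
    print(f"DCF {dcf}: disparity {disparity(brp4g, brp5):.2f}")
```

Both lines print the same value. A result near 1.0 would mean one DCF can serve both searches; 4.0 means they pull it in opposite directions whatever its current value.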
I haven't done any BRP4G for a very long time but my impression when I last did was that the ratio between estimate and actual was pretty much the same for both searches. However it could be quite different on different hosts with different types of CPUs and GPUs.
Another factor to consider is the number of concurrent GPU tasks that you are running. Do you run multiple concurrent tasks and is the concurrency factor different for the two searches?
For example, assume BRP4G tasks are estimated at 1 hour and take 1 hour when done singly. The DCF would be 1.0. You decide to run these 3x and you find that 3 tasks are now completed in 2 hours. Effectively you are completing a task every 40 mins but BOINC sees 2 hours. I don't know for sure how BOINC handles this but what seems to happen is that the DCF blows out to 2.0 and new tasks arrive with estimates that increase immediately to 2 hours. With a single DCF this could be a problem for other searches. All my GPU hosts run both CPU and concurrent GPU tasks. Running multiple GPU tasks causes the estimate for CPU tasks to blow out considerably due to this effect.
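The concurrency example in numbers. This assumes, as described above rather than confirmed from the BOINC source, that the client judges each task by its own elapsed (wall-clock) time against an estimate calibrated on single-task runs; concurrency_view is a hypothetical helper.

```python
def concurrency_view(single_task_h, n_concurrent, batch_wall_h):
    """Return (elapsed per task h, effective minutes per task, implied DCF).

    Assumption: every task in the batch shows the batch's wall-clock time as
    its elapsed time, and the DCF heads towards elapsed / single-task estimate.
    """
    elapsed_per_task_h = batch_wall_h
    effective_minutes_per_task = batch_wall_h * 60 / n_concurrent
    implied_dcf = elapsed_per_task_h / single_task_h
    return elapsed_per_task_h, effective_minutes_per_task, implied_dcf


print(concurrency_view(1.0, 3, 2.0))  # 3 tasks in 2 hours -> (2.0, 40.0, 2.0)
```

Real throughput improves (40 minutes per task instead of 60), yet each task looks twice as long as its estimate, which is what pushes the DCF to 2.0.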
Happily for me, I like seeing the CPU tasks with estimates that are twice or three times what they actually take. I have a number of quad core hosts with AMD GPUs running tasks 4x. This means that 2 CPU cores are crunching and 2 are reserved for GPU support. BOINC still does work fetch based on 4 CPU cores but because the estimates are 2 to 3 times too large, I always end up with about the right amount of CPU work on each host anyway.
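A rough model of why the inflated CPU estimates work out here. It assumes, as a simplification of BOINC's work fetch, that the client requests cache_days of estimated work for every core it counts; the 1-day cache and the function name are hypothetical.

```python
def actual_core_days_fetched(cache_days, cores_counted, estimate_inflation):
    """Real CPU work delivered when the cache is filled using inflated estimates.

    Simplified model (assumption): the client asks for cache_days of *estimated*
    work per counted core, and each task really needs only
    1 / estimate_inflation of its estimated time.
    """
    estimated_core_days = cache_days * cores_counted
    return estimated_core_days / estimate_inflation


cache = 1.0  # hypothetical cache setting, days
for inflation in (2.0, 3.0):
    real = actual_core_days_fetched(cache, cores_counted=4, estimate_inflation=inflation)
    # Only 2 of the 4 cores actually crunch CPU tasks in this setup.
    print(f"inflation {inflation}: {real:.2f} core-days -> {real / 2:.2f} days for 2 cores")
```

With estimates 2x too large, a 4-core request delivers about one day of work for the 2 crunching cores, close to what the cache setting intends.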
Cheers,
Gary.
Hi Gary,
That's all well and fine, provided your rig runs 24/7 and does nothing but process WUs when running.
Mine does not run 24/7, nor does it only process WUs; there are other things I need it to do which cannot be done while BOINC is active. When I tried running them with BOINC active, it crashed, and I had to reset the project.
Additionally I run 2 projects concurrently so I like to know reasonably accurate timings.
I have limited finances, so I have to watch my electricity usage. In 2012 my bill exceeded my ability to pay it, which caused me a few problems sorting out cash flow:-/ So now I severely restrict the time my rig is drawing power, and ensure 'everything' else in the house bar one energy-efficient light is switched off when my rig is running.
Basically I want to get the most out of my rig with the least capital outlay:-)
I have no wish to have to quit DC for another 2 years in order to cover cost outlay again.
Regards,
Cliff
Cliff,
When you said it crashed, did it lock up or did it reboot? I noticed you are using an AMD 8350 chip. I used to run that same chip as well, and only upgraded my machine a few weeks back because of continual reboots, or lock-ups requiring forced reboots.
I know Mike loves the 8350 that he has but what I've learned from others who also have them is that you can't stress that chip like you can other AMD chips.
In a nutshell, you have 8 cores but only 4 floating-point units, so if you try to stress that chip you have 2 cores fighting over each floating-point unit. That causes the computer to either reboot or freeze.
I ended up using the chip only to support the GPUs, and they did fine.
Just thought I'd mention this in case you continue to have problems.
Happy Crunching...
Zalster
Zalster,
Lockup, requiring a hard reboot. BUT having seen similar before, I devised a cunning plan:-) I reset the BIOS to defaults, and since the 8350 is water cooled with dual push/pull fans, it seems to have worked OK.
The problem I find is that over time one tends to tweak the BIOS, then update it several times, and all works OK until you try running both CPU and GPU flat out; then the BIOS problems come to the surface. Since the manufacturer's defaults are fail-safe ones, they tend to sort out most glitches caused by tweaking it over time.
BTW the 8350 is a BE (Black Edition) variant, and since I've become aware of the FP shortage I limit the active CPU cores to leave enough free for the GPUs. I also limit the % of CPU time available to 75%; I was running it at 90% when the lockups occurred, and I was also running a GPU-intensive program at the time, which is what I believe stressed the CPU to the freeze point.
Now that program and a couple of CPU-intensive ones are set as exclusive applications in the BOINC Manager (BM) settings, but I exit BOINC before running them anyway, to be on the safe side.
Regards,
Cliff
One thing you could do is limit Einstein to running either the BRP4 OR the BRP5 tasks. That way your DCF will stabilize. You only have to worry about the DCF PER project, as the Einstein DCF will NOT affect any other project.