Task duration

Apoch
Joined: 5 Feb 08
Posts: 9
Credit: 475028
RAC: 0
Topic 193810

I have noticed that the new S5R4 WUs are, according to my client, scheduled to take only 3-4 hours. Unfortunately, they actually take 12-18 hours depending on which CPU they run on (my two currently attached machines are an AMD Athlon 2500 and a 3200). This resulted in my clients receiving way too much work for me to finish by the deadline.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 34

Task duration

That's due to BOINC's result duration correction factor being set to a number that corrects the run time of the S5R3 tasks. It will have to learn again how long the new tasks take.

Since you knew the new search was coming, you could've told BOINC to fetch a smaller queue of work. You can still do so now.
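
Roughly, the estimate you see is just the WUG's flops estimate scaled by your host's speed and the DCF. A minimal sketch with made-up numbers (illustrative only, not BOINC's exact internals):

def estimated_runtime(rsc_fpops_est, host_flops, dcf):
    # rsc_fpops_est: the work unit generator's estimate of total FP operations
    # host_flops:    the host's benchmarked floating-point speed
    # dcf:           the duration correction factor learned from past tasks
    return rsc_fpops_est / host_flops * dcf

# An S5R3-trained DCF applied to a realistic S5R4 estimate:
print(estimated_runtime(4.3e13, 1.0e9, 0.25) / 3600)  # ~3 h shown by the client
print(estimated_runtime(4.3e13, 1.0e9, 1.0) / 3600)   # ~12 h in reality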

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

The new fpops_est seems to

The new fpops_est seems to require a _very_ different correction factor, especially compared to the "old" optimized apps.

Even with a cache setting of only 1.5 days, I received 16 workunits on a single Pentium III / 800.

It is possible, though, that removing the app_info.xml has reset the DCF to the default of 1.00.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: The new fpops_est seems

Message 83557 in response to message 83556

Quote:

The new fpops_est seems to require a _very_ different correction factor, especially compared to the "old" optimized apps.

Even with a cache setting of only 1.5 days, I received 16 workunits on a single Pentium III / 800.

It is possible, though, that removing the app_info.xml has reset the DCF to the default of 1.00.

/me watching with interest and some dismay...

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

Have to correct my above

I have to correct my above post: I had looked at the wrong venue. The setting for that P3 is actually 0.35 days - and that's good for 16 tasks?
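
The over-fetch itself is simple arithmetic. A toy sketch (all numbers made up; the real scheduler logic is more involved):

cache_days = 0.35
seconds_requested  = cache_days * 24 * 3600        # ~30,000 s of work asked for
actual_task_time   = 16 * 3600                     # a real S5R4 task on a slow P3
stale_dcf          = 0.1                           # left over from optimized S5R3 apps
apparent_task_time = actual_task_time * stale_dcf  # client thinks: ~1.6 h per task

print(round(seconds_requested / apparent_task_time))  # ~5 tasks already "fit"

With an even lower leftover DCF from highly optimized apps, 16 tasks is easy to reach.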

Apoch
Joined: 5 Feb 08
Posts: 9
Credit: 475028
RAC: 0

RE: That's due to BOINC's

Message 83559 in response to message 83555

Quote:

That's due to BOINC's result duration correction factor being set to a number that corrects the run time of the S5R3 tasks. It will have to learn again how long the new tasks take.

Since you knew the new search was coming, you could've told BOINC to fetch a smaller queue of work. You can still do so now.



Well, I may have known it was coming, but I did not know it would need to relearn the task duration factor, this being the first time I have run into a situation like this. Your post seems a bit condescending; excuse my noobness.

Of course the logical thing to do is to reduce the work cache settings, which I did right away; however, this does not really help me after the fact. <_<

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 34

RE: It is possible though,

Message 83560 in response to message 83556

Quote:
It is possible though, that removing the app_info.xml has reset the DCF to default 1.00


Nah. I removed mine and my RDCF is still at 0.427171 on this computer.
I only got 1 task, due to my very low additional work setting (0.05 days).

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

Unless there is some sort of

Unless there is some sort of dramatic speed increase during a task's runtime, my P4 is really struggling. It's been chewing on a task for about 2 hours and 40 minutes and is only at around 9.25% estimated completion.
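
A simple linear projection from that progress, assuming the task keeps the same pace:

elapsed       = 2 * 3600 + 40 * 60        # 2 h 40 min so far
fraction_done = 0.0925
print(elapsed / fraction_done / 3600)     # ~28.8 h projected total run time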

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

I found the problem now. It

I found the problem now. The scheduler delivers S5R4 tasks to computers that have an app_info.xml for S5R3.

Those results are rejected by the BOINC client (5.10.28 in this case):

[error] State file error: missing application einstein_S5R4
[error] Can't handle task h1_0273.50_S5R4__87_S5R4a in scheduler reply

but the BOINC client does not tell the scheduler that it has rejected them.

So the website still lists the result for that box, even though the box doesn't have anything.

So the P3 box mentioned above must have received a bunch of S5R4 WUs before I removed the app_info.xml.

I guess this is either a configuration problem on the Einstein project side or a bug in this BOINC server-side version.

This seems quite critical to me because it will create thousands of ghost WUs.
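
If you want to check whether a box is exposed, here is a quick sketch (the file name and expected app name are just the ones from this thread; the parsing is simplified):

# Does app_info.xml declare the application the scheduler is now sending?
import xml.etree.ElementTree as ET

tree = ET.parse("app_info.xml")   # in the Einstein@Home project directory
declared = {app.findtext("name") for app in tree.getroot().iter("app")}
print(declared)                   # e.g. {'einstein_S5R3'}

if "einstein_S5R4" not in declared:
    print("This host will reject S5R4 tasks (missing application).")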

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2962995782
RAC: 699399

RE: I found the problem

Message 83563 in response to message 83562

Quote:

I found the problem now. The scheduler delivers S5R4 tasks to computers that have an app_info.xml for S5R3.

Those results are rejected by the BOINC client (5.10.28 in this case):

[error] State file error: missing application einstein_S5R4
[error] Can't handle task h1_0273.50_S5R4__87_S5R4a in scheduler reply

but the BOINC client does not tell the scheduler that it has rejected them.

So the website still lists the result for that box, even though the box doesn't have anything.

So the P3 box mentioned above must have received a bunch of S5R4 WUs before I removed the app_info.xml.

I guess this is either a configuration problem on the Einstein project side or a bug in this BOINC server-side version.

This seems quite critical to me because it will create thousands of ghost WUs.


It's already been reported as a 'critical' bug in the BOINC server-side code: trac #713. It's been seen at SETI, SETI Beta and CPDN Beta as well, so it's nothing (special) to do with Einstein - just another BOINC trip-wire for Bruce to fall over during his server upgrade weekend.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117882314797
RAC: 34731082

RE: RE: That's due to

Message 83564 in response to message 83559

Quote:
Quote:

That's due to BOINC's result duration correction factor being set to a number that corrects the run time of the S5R3 tasks. It will have to learn again how long the new tasks take.

Since you knew the new search was coming, you could've told BOINC to fetch a smaller queue of work. You can still do so now.



Well, I may have known it was coming, but I did not know it would need to relearn the task duration factor, this being the first time I have run into a situation like this. Your post seems a bit condescending; excuse my noobness.

Of course the logical thing to do is to reduce the work cache settings, which I did right away; however, this does not really help me after the fact. <_<

I've picked this post to reply to, not because I want to be condescending towards anyone or pick on anyone for perhaps "noobness" or otherwise. I just want to try to explain the situation and give advice on how to best rectify things. Right up front I want to assure people that it's quite easy to get back to some equilibrium.

Firstly a bit of history. The estimate of crunch time is set by the work unit generator (WUG). Back at the start of S5R3, the apps were rather slower than they were at the end. The WUG estimate was OK at the start of S5R3 but towards the end, the tasks were being done much faster than the WUG estimate. BOINC handles this by lowering the duration correction factor (DCF) stored in your host's state file (client_state.xml). BOINC knows nothing about the efficacy of the WUG estimate. It simply reacts to reality as it perceives it over time. BOINC learns to correct the now incorrect estimate built into the WUG by lowering the DCF. At the end of S5R3, it would not be uncommon for the DCF to be 0.25 or less just to cope with the old WUG estimate which is now hopelessly too long for reality.

Enter S5R4, with a new WUG providing new and updated estimates. Your BOINC client cannot possibly know in advance about the changes in the new WUG so until it knows better (by relearning) it will assume that a DCF of 0.25 is still OK. On the other hand, the new WUG now knows a much better estimate of the real time so a new task now contains a realistic estimate of the true time but BOINC then applies the existing correction of say 0.25 and comes up with its own estimate that is 4 times shorter than it should be. When the first new task is completed, BOINC will get a very sudden wakeup call and will (in one big hit) change the DCF from 0.25 to perhaps a lot closer to the "ideal" value of 1.0.
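
In toy numbers (a sketch of the effect, not BOINC's exact update rule):

wug_estimate = 12.0            # hours: the new WUG's realistic estimate
dcf = 0.25                     # left over from the end of S5R3

print(wug_estimate * dcf)      # 3.0 h - what BOINC shows you at first

actual = 12.5                  # hours the first S5R4 task really took
dcf = actual / wug_estimate    # raised in one big hit to ~1.04
print(wug_estimate * dcf)      # ~12.5 h - later estimates are realistic again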

Until BOINC has a chance to do this, the only real problem is that you will likely get a whole bunch of new work - more than your machine can handle - until the new DCF takes effect. This will only be a problem if you have left your cache size at something like 10 days. In that case you might end up with about 40 days of actual work.

Is this a big deal? Well actually, no it's not! All you have to do is firstly return your cache settings to something a bit more realistic, say 1 - 2 days at the most and then simply abort whatever tasks you feel are in excess of what your machine can comfortably handle. Immediately after you abort the excess, hit the "update" function so that the server can immediately be notified and then it will simply resend these excess tasks to someone else. End of problem. Very little bandwidth has been wasted because a task is just a very small set of parameters that tell the science app how to crunch the data that is already on your machine. Aborting tasks is NOT aborting large data files.

Is there a way to avoid all this? No, not really. Whenever the WUG changes, BOINC can't know in advance if the new crunch time estimates are good, bad, or indifferent. It will always have to relearn the new reality. We can assist by not leaving machines with overly large caches.

Cheers,
Gary.
