Task duration

Apoch
Joined: 5 Feb 08
Posts: 9
Credit: 475028
RAC: 0
Topic 193810

I have noticed that the new S5R4 WUs are, according to my client, scheduled to take only 3-4 hours. Unfortunately, they actually take 12-18 hours depending on which CPU they run on (my two currently attached machines are an AMD Athlon 2500 and a 3200). This resulted in my clients receiving way too much work for me to finish by the deadline.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 34

Task duration

That's due to BOINC's result duration correction factor being set to a number that corrects the run time of the S5R3 tasks. It will have to learn again how long the new tasks take.

Since you knew the new search was coming, you could've told BOINC to fetch a smaller queue of work. You can still do so now.
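
Roughly, the estimate you see is just the WUG's flops estimate scaled by your host's speed and the DCF. A minimal sketch with made-up numbers (illustrative only, not BOINC's exact internals):

def estimated_runtime(rsc_fpops_est, host_flops, dcf):
    # rsc_fpops_est: the work unit generator's estimate of total FP operations
    # host_flops:    the host's benchmarked floating-point speed
    # dcf:           the duration correction factor learned from past tasks
    return rsc_fpops_est / host_flops * dcf

# An S5R3-trained DCF applied to a realistic S5R4 estimate:
print(estimated_runtime(4.3e13, 1.0e9, 0.25) / 3600)  # ~3 h shown by the client
print(estimated_runtime(4.3e13, 1.0e9, 1.0) / 3600)   # ~12 h in reality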

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

The new fpops_est seems to

The new fpops_est seems to require a _very_ different correction factor, especially compared to the "old" optimized apps.

Even with a cache setting of only 1.5 days, I received 16 workunits on a single Pentium III / 800.

It is possible, though, that removing the app_info.xml has reset the DCF to the default of 1.00.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: The new fpops_est seems

Message 83557 in response to message 83556

Quote:

The new fpops_est seems to require a _very_ different correction factor, especially compared to the "old" optimized apps.

Even with a cache setting of only 1.5 days, I received 16 workunits on a single Pentium III / 800.

It is possible, though, that removing the app_info.xml has reset the DCF to the default of 1.00.

/me watching with interest and some dismay...

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

Have to correct my above

I have to correct my above post: I had looked at the wrong venue. The setting for that P3 is actually 0.35 days - and that's good for 16 tasks?
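
The over-fetch itself is simple arithmetic. A toy sketch (all numbers made up; the real scheduler logic is more involved):

cache_days = 0.35
seconds_requested  = cache_days * 24 * 3600        # ~30,000 s of work asked for
actual_task_time   = 16 * 3600                     # a real S5R4 task on a slow P3
stale_dcf          = 0.1                           # left over from optimized S5R3 apps
apparent_task_time = actual_task_time * stale_dcf  # client thinks: ~1.6 h per task

print(round(seconds_requested / apparent_task_time))  # ~5 tasks already "fit"

With an even lower leftover DCF from highly optimized apps, 16 tasks is easy to reach.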

Apoch
Joined: 5 Feb 08
Posts: 9
Credit: 475028
RAC: 0

RE: That's due to BOINC's

Message 83559 in response to message 83555

Quote:

That's due to BOINC's result duration correction factor being set to a number that corrects the run time of the S5R3 tasks. It will have to learn again how long the new tasks take.

Since you knew the new search was coming, you could've told BOINC to fetch a smaller queue of work. You can still do so now.



Well, I may have known it was coming, but I did not know it would need to relearn the task duration factor, this being the first time I have run into a situation like this. Your post seems a bit condescending; excuse my noobness.

Of course the logical thing to do is to reduce the work cache settings, which I did right away; however, this does not really help me after the fact. <_<

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 34

RE: It is possible though,

Message 83560 in response to message 83556

Quote:
It is possible though, that removing the app_info.xml has reset the DCF to default 1.00


Nah. I removed mine and my RDCF is still at 0.427171 on this computer.
I only got 1 task, due to my very low additional work setting (0.05 days).

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

Unless there is some sort of

Unless there is some sort of dramatic speed increase during a task's runtime, my P4 is really struggling. It's been chewing on a task for about 2 hours and 40 minutes and is only at around 9.25% estimated completion.
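
A simple linear projection from that progress, assuming the task keeps the same pace:

elapsed       = 2 * 3600 + 40 * 60        # 2 h 40 min so far
fraction_done = 0.0925
print(elapsed / fraction_done / 3600)     # ~28.8 h projected total run time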

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

I found the problem now. It

I found the problem now. The scheduler delivers S5R4 tasks to computers that have an app_info.xml for S5R3.

Those results are rejected by the BOINC client (5.10.28 in this case):

[error] State file error: missing application einstein_S5R4
[error] Can't handle task h1_0273.50_S5R4__87_S5R4a in scheduler reply

but the BOINC client does not tell the scheduler that it has rejected them.

So the website still lists the result for that box, even though the box doesn't have anything.

So the P3 box mentioned above must have received a bunch of S5R4 WUs before I removed the app_info.xml.

I guess this is either a configuration problem on the Einstein project side or a bug in this BOINC server-side version.

This seems quite critical to me because it will create thousands of ghost WUs.
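
If you want to check whether a box is exposed, here is a quick sketch (the file name and expected app name are just the ones from this thread; the parsing is simplified):

# Does app_info.xml declare the application the scheduler is now sending?
import xml.etree.ElementTree as ET

tree = ET.parse("app_info.xml")   # in the Einstein@Home project directory
declared = {app.findtext("name") for app in tree.getroot().iter("app")}
print(declared)                   # e.g. {'einstein_S5R3'}

if "einstein_S5R4" not in declared:
    print("This host will reject S5R4 tasks (missing application).")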

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2962995782
RAC: 699399

RE: I found the problem

Message 83563 in response to message 83562

Quote:

I found the problem now. The scheduler delivers S5R4 tasks to computers that have an app_info.xml for S5R3.

Those results are rejected by the BOINC client (5.10.28 in this case):

[error] State file error: missing application einstein_S5R4
[error] Can't handle task h1_0273.50_S5R4__87_S5R4a in scheduler reply

but the BOINC client does not tell the scheduler that it has rejected them.

So the website still lists the result for that box, even though the box doesn't have anything.

So the P3 box mentioned above must have received a bunch of S5R4 WUs before I removed the app_info.xml.

I guess this is either a configuration problem on the Einstein project side or a bug in this BOINC server-side version.

This seems quite critical to me because it will create thousands of ghost WUs.


It's already been reported as a 'critical' bug in the BOINC server-side code: trac #713. It's been seen at SETI, SETI Beta and CPDN Beta as well, so it's nothing (special) to do with Einstein - just another BOINC trip-wire for Bruce to fall over during his server upgrade weekend.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117882314797
RAC: 34731082

RE: RE: That's due to

Message 83564 in response to message 83559

Quote:
Quote:

That's due to BOINC's result duration correction factor being set to a number that corrects the run time of the S5R3 tasks. It will have to learn again how long the new tasks take.

Since you knew the new search was coming, you could've told BOINC to fetch a smaller queue of work. You can still do so now.



Well, I may have known it was coming, but I did not know it would need to relearn the task duration factor, this being the first time I have run into a situation like this. Your post seems a bit condescending; excuse my noobness.

Of course the logical thing to do is to reduce the work cache settings, which I did right away; however, this does not really help me after the fact. <_<

I've picked this post to reply to, not because I want to be condescending towards anyone or pick on anyone for perhaps "noobness" or otherwise. I just want to try to explain the situation and give advice on how to best rectify things. Right up front I want to assure people that it's quite easy to get back to some equilibrium.

Firstly a bit of history. The estimate of crunch time is set by the work unit generator (WUG). Back at the start of S5R3, the apps were rather slower than they were at the end. The WUG estimate was OK at the start of S5R3 but towards the end, the tasks were being done much faster than the WUG estimate. BOINC handles this by lowering the duration correction factor (DCF) stored in your host's state file (client_state.xml). BOINC knows nothing about the efficacy of the WUG estimate. It simply reacts to reality as it perceives it over time. BOINC learns to correct the now incorrect estimate built into the WUG by lowering the DCF. At the end of S5R3, it would not be uncommon for the DCF to be 0.25 or less just to cope with the old WUG estimate which is now hopelessly too long for reality.

Enter S5R4, with a new WUG providing new and updated estimates. Your BOINC client cannot possibly know in advance about the changes in the new WUG so until it knows better (by relearning) it will assume that a DCF of 0.25 is still OK. On the other hand, the new WUG now knows a much better estimate of the real time so a new task now contains a realistic estimate of the true time but BOINC then applies the existing correction of say 0.25 and comes up with its own estimate that is 4 times shorter than it should be. When the first new task is completed, BOINC will get a very sudden wakeup call and will (in one big hit) change the DCF from 0.25 to perhaps a lot closer to the "ideal" value of 1.0.
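
In toy numbers (a sketch of the effect, not BOINC's exact update rule):

wug_estimate = 12.0            # hours: the new WUG's realistic estimate
dcf = 0.25                     # left over from the end of S5R3

print(wug_estimate * dcf)      # 3.0 h - what BOINC shows you at first

actual = 12.5                  # hours the first S5R4 task really took
dcf = actual / wug_estimate    # raised in one big hit to ~1.04
print(wug_estimate * dcf)      # ~12.5 h - later estimates are realistic again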

Until BOINC has a chance to do this, the only real problem is that you will likely get a whole bunch of new work - more than your machine can handle - until the new DCF takes effect. This will only be a problem if you have left your cache size at something like 10 days. In that case you might end up with about 40 days of actual work.

Is this a big deal? Well actually, no it's not! All you have to do is firstly return your cache settings to something a bit more realistic, say 1 - 2 days at the most and then simply abort whatever tasks you feel are in excess of what your machine can comfortably handle. Immediately after you abort the excess, hit the "update" function so that the server can immediately be notified and then it will simply resend these excess tasks to someone else. End of problem. Very little bandwidth has been wasted because a task is just a very small set of parameters that tell the science app how to crunch the data that is already on your machine. Aborting tasks is NOT aborting large data files.

Is there a way to avoid all this? No, not really. Whenever the WUG changes, BOINC can't know in advance if the new crunch time estimates are good, bad, or indifferent. It will always have to relearn the new reality. We can assist by not leaving machines with overly large caches.

Cheers,
Gary.
