Task duration correction factor

[AF>EDLS]zOU

Joined: 5 May 15

Posts: 65

Credit: 384235373

RAC: 0

15 Nov 2022 11:57:05 UTC

Topic 228570


Can anyone explain in layman's terms what the "Task duration correction factor" value means?

Arc A750 system: Task duration correction factor: 0.180861

Dual GTX980 system: Task duration correction factor: 0.355147

Pure CPU system: Task duration correction factor: 1.591853

[AF>EDLS]zOU

Joined: 5 May 15

Posts: 65

Credit: 384235373

RAC: 0


15 Nov 2022 12:06:29 UTC

Message 203872


OK, I found this:

A TDCF less than 1 means the host is faster than the estimate, and greater than 1 means it is slower.

So, if I understand it correctly, that helps the project send the right number of WUs so they can all be completed before the deadline.

Sending too many WUs would get some of them aborted; sending too few would force the host to ask for work too often.
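To make that concrete, here is a minimal sketch of how the factor scales a runtime estimate, assuming the client simply multiplies the project's raw estimate by the TDCF. The one-hour raw estimate and the function name `corrected_estimate` are made up for illustration; the three factors are the ones from the first post.

```python
def corrected_estimate(raw_estimate_s, dcf):
    """Scale a raw runtime estimate (seconds) by the duration correction factor."""
    return raw_estimate_s * dcf

# TDCF values reported in the first post of this thread.
hosts = {
    "Arc A750": 0.180861,
    "Dual GTX980": 0.355147,
    "Pure CPU": 1.591853,
}

RAW_ESTIMATE_S = 3600.0  # hypothetical raw estimate of one hour per task

for name, dcf in hosts.items():
    minutes = corrected_estimate(RAW_ESTIMATE_S, dcf) / 60
    print(f"{name}: corrected estimate is about {minutes:.1f} min")
```

With a factor below 1 the corrected estimate shrinks, so more tasks fit into the same requested queue; with a factor above 1 it grows and fewer tasks are requested.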

mikey

Joined: 22 Jan 05

Posts: 12676

Credit: 1839076224

RAC: 3989


15 Nov 2022 12:10:26 UTC

Message 203873 in response to message 203872


[AF>EDLS]zOU wrote:

OK, I found this:

A TDCF less than 1 means the host is faster than the estimate, and greater than 1 means it is slower.

So, if I understand it correctly, that helps the project send the right number of WUs so they can all be completed before the deadline.

Sending too many WUs would get some of them aborted; sending too few would force the host to ask for work too often.

Yes, that's correct, but it changes over time as you work on different kinds of tasks, or when the project changes a type of task so that it runs shorter or longer. It also takes time to settle down into a fairly constant number.

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7218694931

RAC: 978638


15 Nov 2022 15:02:24 UTC

Message 203878 in response to message 203873


mikey wrote:

it also takes time to settle down into a fairly constant number

A particular gripe with the system is that it NEVER settles down to a fairly constant number in many cases where you have more than one type of Einstein task running on a system. The "correct" DCF for that system for one type of task differs from that for another type of task--sometimes by an order of magnitude. So the current value of DCF depends heavily on which tasks have completed recently. This in turn leads to surges in work fetch activity, fluctuations in work queue size, and sometimes to "panic mode" prioritization of tasks deemed in danger of failing to meet their deadlines.

I dislike this enough that I have usually run Einstein with just one type of task allowed by my preferences at any given time.

As it happens, in recent weeks I've made an exception to this rule, and am currently running two types: MeerKAT GPU tasks and GRP #1 GPU tasks. These differ by a small enough amount, and I have my queue request set low enough, that the problems I've mentioned have not been bothering me.
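The fluctuation described above can be sketched with a toy simulation. This is not the actual BOINC client code: the update rule (jump up at once when a task overruns its corrected estimate by more than 10%, otherwise drift 10% of the way toward the observed ratio) and all constants are assumptions chosen only to illustrate the effect, and the names `update_dcf` and `simulate` are hypothetical.

```python
def update_dcf(dcf, raw_estimate_s, elapsed_s):
    """Assumed asymmetric update: rise immediately, fall slowly."""
    raw_ratio = elapsed_s / raw_estimate_s          # ideal DCF for this one task
    adj_ratio = elapsed_s / (raw_estimate_s * dcf)  # overrun vs. corrected estimate
    if adj_ratio > 1.1:
        return raw_ratio                            # task ran long: jump straight up
    return 0.9 * dcf + 0.1 * raw_ratio              # otherwise move slowly toward it

def simulate(task_ratios, start=1.0):
    """Feed a sequence of true runtime/estimate ratios through the update rule."""
    dcf = start
    history = []
    for ratio in task_ratios:
        dcf = update_dcf(dcf, raw_estimate_s=3600.0, elapsed_s=3600.0 * ratio)
        history.append(round(dcf, 3))
    return history

# One task type whose "correct" DCF is 0.2: the value settles down.
single = simulate([0.2] * 40)

# Two task types an order of magnitude apart (0.2 and 2.0), five short
# tasks followed by one long task, repeated: the value never settles.
mixed = simulate(([0.2] * 5 + [2.0]) * 7)

print("single type, last value:", single[-1])
print("mixed types, range after first cycle:", min(mixed[6:]), "to", max(mixed))
```

Under these assumptions the single-type host converges close to 0.2, while the mixed host keeps swinging well above that, so estimates for the short tasks stay badly inflated no matter how many of them complete.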

mikey

Joined: 22 Jan 05

Posts: 12676

Credit: 1839076224

RAC: 3989


15 Nov 2022 20:41:09 UTC

Message 203887 in response to message 203878


archae86 wrote:

mikey wrote:

it also takes time to settle down into a fairly constant number

A particular gripe with the system is that it NEVER settles down to a fairly constant number in many cases where you have more than one type of Einstein task running on a system. The "correct" DCF for that system for one type of task differs from that for another type of task--sometimes by an order of magnitude. So the current value of DCF depends heavily on which tasks have completed recently. This in turn leads to surges in work fetch activity, fluctuations in work queue size, and sometimes to "panic mode" prioritization of tasks deemed in danger of failing to meet their deadlines.

I dislike this enough that I have usually run Einstein with just one type of task allowed by my preferences at any given time.

As it happens, in recent weeks I've made an exception to this rule, and am currently running two types: MeerKAT GPU tasks and GRP #1 GPU tasks. These differ by a small enough amount, and I have my queue request set low enough, that the problems I've mentioned have not been bothering me.

Add in running both CPU and GPU tasks on the same PC and it's always in a state of flux. You seem to have found a sweet spot, though.

That's why I have so many PCs: I can run Einstein CPU tasks on this PC and GPU tasks on that PC. Otherwise the cache goes crazy, giving me 200 CPU tasks and then telling me 'the cache is full' when I run out of GPU tasks!!
