Why do some CPUs get 200 tasks and not report?

Chris

Joined: 24 Nov 05

Posts: 5

Credit: 2437261

RAC: 0

31 Oct 2009 15:27:34 UTC

Topic 194605


I note a great variance in "work done" for my computer over time. When I check my task list, I have some completed tasks done 150 tasks ago without corroboration from other computers. When I query the state of those work units, I see a computer assigned maybe 200 work units but only 8 or 10 completed.
The work unit I share with that computer shows "in progress", but it has been "in progress" for many days or weeks.
Why do some computers get hundreds of tasks assigned when they appear to be so slow at producing results? I would think the assignment algorithm would hand out only the number that they can process in a reasonable time, based on historical performance.

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

Why do some CPUs get 200 tasks and not report?

31 Oct 2009 16:01:29 UTC

Message 95379


Quote:

I note a great variance in "work done" for my computer over time. When I check my task list, I have some completed tasks done 150 tasks ago without corroboration from other computers. When I query the state of those work units, I see a computer assigned maybe 200 work units but only 8 or 10 completed.
The work unit I share with that computer shows "in progress", but it has been "in progress" for many days or weeks.
Why do some computers get hundreds of tasks assigned when they appear to be so slow at producing results? I would think the assignment algorithm would hand out only the number that they can process in a reasonable time, based on historical performance.

This depends on the cache size each user sets in their preferences. I keep mine very small, but some people fill their caches like refrigerators.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119940952055

RAC: 26531831

RE: ... I have some

1 Nov 2009 1:27:00 UTC

Message 95380


Quote:

... I have some completed tasks done 150 tasks ago without corroboration from other computers.

Your oldest pending task at the moment is around 12 days old. Your wingman (crunching partner) still has 2 days in which to return the result before it will time out. People stop contributing for all sorts of reasons - from hardware failure to loss of interest. The only way the project can deal with this is to wait for the deadline to arrive and then reissue the task to a new wingman. Even then, there is no guarantee of a speedy resolution.
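As a rough illustration of the deadline logic described above, here is a minimal sketch. It assumes a 14-day deadline (inferred from a task that is about 12 days old with 2 days left) and assumes the wingman received the task at about the same time; the dates and function names are hypothetical and are not taken from the project's server code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 14-day deadline, inferred from "12 days old" plus "2 days" remaining.
DEADLINE = timedelta(days=14)

def time_left(sent_at, now, deadline=DEADLINE):
    """Time remaining before the task times out (negative once overdue)."""
    return (sent_at + deadline) - now

def needs_reissue(sent_at, now, reported, deadline=DEADLINE):
    """A task goes to a new wingman only if it is unreported at its deadline."""
    return (not reported) and now >= sent_at + deadline

# Illustrative dates only: a task sent about 12 days before 1 Nov 2009.
sent = datetime(2009, 10, 20, tzinfo=timezone.utc)
now = datetime(2009, 11, 1, tzinfo=timezone.utc)

print(time_left(sent, now).days)        # 2
print(needs_reissue(sent, now, False))  # False: the project still has to wait
```

Until the deadline passes, the project cannot tell a slow host from an abandoned one, which is why the task stays "in progress".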

For your oldest pending task, your wingman's computer is a 16-core machine running a server OS. It is likely that the machine has stopped crunching (at least for the moment) because it is required for its normal duties, or maybe it has had some sort of hardware failure. It last received new tasks around 10 days ago and doesn't appear to have been in contact since. That machine would be capable of crunching more than 50 tasks per day, so a cache of 200 tasks would be quite reasonable. Its listed turnaround time is in fact 3.31 days, so it certainly doesn't have an excessive cache. If it were a hardware problem, you would think it would be fixed by now, so it has probably been removed from crunching without the owner aborting the cache. If this is the case, the tasks will soon be reissued to other wingmen and there will be no ongoing problem. All of us experience this in our pendings list every day.
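The arithmetic behind "a cache of 200 tasks would be quite reasonable" can be checked with a short sketch. The 50 tasks per day, 200 tasks and 16 cores come from the figures above; the function names are just for illustration, and since the machine can do "more than 50" per day, the results are upper bounds.

```python
def days_of_work(cached_tasks, tasks_per_day):
    """How many days a cache lasts at a given throughput."""
    return cached_tasks / tasks_per_day

def hours_per_task_per_core(tasks_per_day, cores):
    """Average run time of one task on one core, if all cores crunch 24/7."""
    return 24 * cores / tasks_per_day

print(days_of_work(200, 50))            # 4.0 days at most
print(hours_per_task_per_core(50, 16))  # 7.68 hours per task at most
```

A cache that lasts at most about 4 days is in line with the listed turnaround time of 3.31 days.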

Quote:

When I query the state of those work units, I see a computer assigned maybe 200 work units but only 8 or 10 completed.

Soon you will see even fewer as they are deleted from the database. If you had looked back on October 22, you would have seen hundreds of completed tasks. As I said before, the cache size is actually quite normal for a machine of this crunching power.

Quote:

Why do some computers get hundreds of tasks assigned ...

The cache settings are a personal choice for each participant. That machine, if crunching full time for E@H, could easily handle a cache of double what you see. There have been several multi-day outages this year so it's not surprising that 24/7 crunchers would choose this sort of cache size. Remember, it's still a turnaround time of less than 4 days.

Quote:

... when they appear to be so slow at producing results?

That machine is not slow in producing results when it is actually 'in production'.

Quote:

I would think the assignment algorithm would hand out only the number that they can process in a reasonable time, based on historical performance.

And that's exactly what the scheduler does. The scheduler doesn't have a crystal ball so it's not very good at predicting the future when historical performance suddenly and dramatically changes :-).
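To make the point concrete, here is a simplified sketch of a history-based top-up rule. It is not the actual BOINC scheduler code; the function name, the 4-day cache setting and the rounding are assumptions chosen to match the numbers discussed in this thread.

```python
def tasks_to_send(recent_tasks_per_day, cache_days, already_queued):
    """Top the host's queue up to cache_days of work at its recent rate."""
    target = round(recent_tasks_per_day * cache_days)
    return max(0, target - already_queued)

# History says about 50 tasks/day; assume the owner asked for a 4-day cache.
sent = tasks_to_send(50, 4, 0)
print(sent)  # 200

# If the host then stops crunching, its real rate drops to 0 tasks/day,
# and every one of those tasks stays "in progress" until its deadline.
still_in_progress = sent - 0 * 4
print(still_in_progress)  # 200
```

The estimate was correct for the host's past behaviour; it only looks wrong once the host unexpectedly stops.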

Cheers,
Gary.

Nagilum

Joined: 13 Feb 09

Posts: 12

Credit: 524868

RAC: 0

I have just loaded Win7 and

2 Nov 2009 6:07:30 UTC

Message 95381 in response to message 95380


I have just loaded Win7 and solved many issues that I had with Vista, but even with Vista, data crunching has been running smoothly for the last several months. I had to learn what barriers my system had and work out those individual issues before all data crunching ran flawlessly. It seems the better I run, the more work I receive. It's almost like a trust relationship the Einstein@home server develops with our computers. The better we perform, the more the system will rely on us and send first-run data. But that's just my opinion. NAGILUM...

Chris

Joined: 24 Nov 05

Posts: 5

Credit: 2437261

RAC: 0

Hello Gary and thanks for the

4 Nov 2009 3:38:35 UTC

Message 95382 in response to message 95380


Hello Gary and thanks for the reply. I have wondered about the variance for some time, and your info clarifies the process a lot.
Chris


Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119940952055

RAC: 26531831

RE: Hello Gary and thanks

4 Nov 2009 12:10:21 UTC

Message 95383 in response to message 95382


Quote:

Hello Gary and thanks for the reply. I have wondered about the variance for some time, and your info clarifies the process a lot.

You're most welcome! I'm glad the information was of use to you.

Cheers,
Gary.

Forums › Problems and Bug Reports
