Why do some CPUs get 200 tasks and not report?

Chris
Joined: 24 Nov 05
Posts: 5
Credit: 2437261
RAC: 0
Topic 194605

I note a great variance in "work done" for my computer over time. When I check in my task list, I have some completed tasks done 150 tasks ago without corroboration from other computers. When I query the state of those work units, I see a computer assigned maybe 200 work units but only 8 or 10 completed.
The work unit I share with that computer shows "in progress", but it has been "in progress" for many days or weeks.
Why do some computers get hundreds of tasks assigned when they appear to be so slow at producing results? I would think the assignment algorithm would hand out only the number that they can process in a reasonable time, based on historical performance.

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

Quote:
I note a great variance in "work done" for my computer over time. When I check in my task list, I have some completed tasks done 150 tasks ago without corroboration from other computers. When I query the state of those work units, I see a computer assigned maybe 200 work units but only 8 or 10 completed.
The work unit I share with that computer shows "in progress", but it has been "in progress" for many days or weeks.
Why do some computers get hundreds of tasks assigned when they appear to be so slow at producing results? I would think the assignment algorithm would hand out only the number that they can process in a reasonable time, based on historical performance.


This depends on the cache size each user has set. I keep a very small one, but some people fill their caches like refrigerators.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5888
Credit: 119940952055
RAC: 26531831

Quote:
... I have some completed tasks done 150 tasks ago without corroboration from other computers.


Your oldest pending task at the moment is around 12 days old. Your wingman (crunching partner) still has 2 days in which to return the result before it will time out. People stop contributing for all sorts of reasons - from hardware failure to loss of interest. The only way the project can deal with this is to wait for the deadline to arrive and then reissue the task to a new wingman. Even then, there is no guarantee of a speedy resolution.
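
To make the timing above concrete, here is a minimal sketch of the deadline arithmetic. The 14-day deadline is only inferred from the rough figures in this post (about 12 days old, about 2 days left), and the function name and dates are hypothetical, not anything taken from the project's server code.

```python
from datetime import datetime, timedelta

def days_until_timeout(sent: datetime, now: datetime, deadline_days: float) -> float:
    """Days left before a task passes its deadline (negative if overdue)."""
    deadline = sent + timedelta(days=deadline_days)
    return (deadline - now) / timedelta(days=1)

# Hypothetical dates: a task sent 12 days ago with an assumed 14-day deadline.
sent = datetime(2000, 1, 1)
now = sent + timedelta(days=12)
print(days_until_timeout(sent, now, 14))  # 2.0
```

Once the value drops below zero, the task has timed out and can be reissued to a new wingman.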

For your oldest pending task, your wingman's computer is a 16 core machine running a server OS. It is likely that the machine has stopped crunching (at least for the moment) because it is required for its normal duties or maybe it has had some sort of hardware failure. It last received new tasks around 10 days ago and doesn't appear to have been in contact since. That machine would be capable of crunching more than 50 tasks per day so a cache of 200 tasks would be quite reasonable. Its listed turnaround time is in fact 3.31 days so it certainly doesn't have an excessive cache. If it were a hardware problem you would think it would be fixed by now so probably it has been removed from crunching without the owner aborting the cache. If this is the case, the tasks will soon be reissued to other wingmen and there will be no ongoing problem. All of us experience this in our pendings list every day.
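
As a rough check on those figures, the cache arithmetic can be sketched as follows. It assumes the machine crunches at a steady rate, and the function name is purely illustrative; it is not how the BOINC client or scheduler actually computes anything.

```python
def days_of_work(cached_tasks: int, tasks_per_day: float) -> float:
    """Days needed to clear a cache at a steady crunching rate."""
    if tasks_per_day <= 0:
        raise ValueError("tasks_per_day must be positive")
    return cached_tasks / tasks_per_day

# Figures from the post: about 200 cached tasks, at least 50 tasks per day.
print(days_of_work(200, 50))  # 4.0
```

At 50 tasks per day, 200 tasks is about 4 days of work, which is in line with the listed turnaround time of 3.31 days.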

Quote:
When I query the state of those work units, I see a computer assigned maybe 200 work units but only 8 or 10 completed.


Soon you will see even fewer as they are deleted from the database. If you had looked back on October 22, you would have seen hundreds of completed tasks. As I said before, the cache size is actually quite normal for a machine of this crunching power.

Quote:
Why do some computers get 100's of tasks assigned ...


The cache settings are a personal choice for each participant. That machine, if crunching full time for E@H, could easily handle a cache of double what you see. There have been several multi-day outages this year so it's not surprising that 24/7 crunchers would choose this sort of cache size. Remember, it's still a turnaround time of less than 4 days.

Quote:
... when they appear to be so slow at producing results?


That machine is not slow in producing results when it is actually 'in production'.

Quote:
I would think the assignment algorithm would hand out only the number that they can process in a reasonable time, based on historical performance.


And that's exactly what the scheduler does. The scheduler doesn't have a crystal ball so it's not very good at predicting the future when historical performance suddenly and dramatically changes :-).

Cheers,
Gary.

Nagilum
Joined: 13 Feb 09
Posts: 12
Credit: 524868
RAC: 0

Message 95381 in response to message 95380

I have just loaded Win7 and solved many issues that I had with Vista, but even with Vista, data crunching has been running smoothly for the last several months. I had to learn what barriers my system had and work out those individual issues before all data crunching ran flawlessly. It seems the better I run, the more work I receive. It's almost like a trust relationship the Einstein@home server develops with our computers. The better we perform, the more the system will rely on us and send first-run data. But that's just my opinion. NAGILUM...

Chris
Joined: 24 Nov 05
Posts: 5
Credit: 2437261
RAC: 0

Message 95382 in response to message 95380

Hello Gary and thanks for the reply. I have wondered about the variance for some time, and your info clarifies the process a lot.
Chris

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5888
Credit: 119940952055
RAC: 26531831

Message 95383 in response to message 95382

Quote:
Hello Gary and thanks for the reply. I have wondered about the variance for some time, and your info clarifies the process a lot.


You're most welcome! I'm glad the information was of use to you.

Cheers,
Gary.
