So, ANYONE? Why is BOINC not running Einstein jobs?
It now has only one CPU to play with, and I guess the work cache for SETI is still filled from the time when it was scheduling for two CPUs, so the client didn't ask for more CPU tasks for E@H or SETI. Allow some time for BOINC to reach equilibrium again.
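As a rough sketch of the idea (my own simplification in Python, not the actual client algorithm - the function, names and the simple threshold are made up for illustration):

```python
# Toy model: the work-fetch target scales with the number of usable
# CPUs, so dropping from 2 CPUs to 1 halves the target, and a cache
# that was filled for two CPUs looks "full" for quite a while.

def should_fetch_cpu_work(buffered_cpu_seconds, ncpus, cache_days):
    target = cache_days * 86400 * ncpus  # seconds of work to keep on hand
    return buffered_cpu_seconds < target

# With a 1-day cache: target is 172800 s for 2 CPUs, 86400 s for 1 CPU.
print(should_fetch_cpu_work(150000, 2, 1))  # True: under the 2-CPU target
print(should_fetch_cpu_work(150000, 1, 1))  # False: still over the 1-CPU target
```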
CU
Bikeman
Now the message has changed a bit - but the server still does not like me...
Quote:
7/31/2009 12:28:08 AM Einstein@Home Scheduler request failed: Couldn't resolve host name
7/31/2009 12:29:08 AM Einstein@Home Sending scheduler request: To fetch work.
7/31/2009 12:29:08 AM Einstein@Home Requesting new tasks for GPU
7/31/2009 12:29:13 AM Einstein@Home Scheduler request completed: got 0 new tasks
7/31/2009 12:29:13 AM Einstein@Home Message from server: No work sent
7/31/2009 12:49:34 AM SETI@home Computation for task 12mr09ad.23116.22976.5.10.4_0 finished
7/31/2009 12:49:34 AM SETI@home Starting 12mr09ad.23116.22976.5.10.6_0
7/31/2009 12:49:34 AM SETI@home Starting task 12mr09ad.23116.22976.5.10.6_0 using setiathome_enhanced version 608
7/31/2009 12:49:36 AM SETI@home Started upload of 12mr09ad.23116.22976.5.10.4_0_0
7/31/2009 12:49:40 AM SETI@home Finished upload of 12mr09ad.23116.22976.5.10.4_0_0
7/31/2009 12:54:36 AM Einstein@Home Sending scheduler request: To fetch work.
7/31/2009 12:54:36 AM Einstein@Home Requesting new tasks for GPU
7/31/2009 12:54:41 AM Einstein@Home Scheduler request completed: got 0 new tasks
7/31/2009 12:54:41 AM Einstein@Home Message from server: No work sent
7/31/2009 1:14:04 AM SETI@home Computation for task 12mr09ad.23116.22976.5.10.6_0 finished
7/31/2009 1:14:04 AM SETI@home Starting 12mr09ab.23074.23794.10.10.23_1
7/31/2009 1:14:04 AM SETI@home Starting task 12mr09ab.23074.23794.10.10.23_1 using setiathome_enhanced version 608
7/31/2009 1:14:04 AM SETI@home Sending scheduler request: To fetch work.
That is OK, E@H does not have any *G*PU application yet.
Michael
Team Linux Users Everywhere
There is good reasoning behind this work request. It is only done in case the project installs a GPU application from one day to the next, so people who have a GPU and want that project to work on their GPU will get work. It's just a simple check, nothing broken. It won't interfere with requesting CPU work either; all it does is look weird, nothing more.
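As a toy model of what that check amounts to (my own sketch, not real client code - the request layout is invented for illustration):

```python
# The client asks for GPU work whenever a GPU is present, even if the
# project has no GPU app yet; such a project simply answers "No work
# sent" for that part of the request.

def build_scheduler_request(has_gpu, cpu_seconds_needed):
    request = {"cpu_seconds": cpu_seconds_needed}
    if has_gpu:
        request["gpu_seconds"] = 1.0  # probe for GPU work unconditionally
    return request

print(build_scheduler_request(True, 0.0))
# {'cpu_seconds': 0.0, 'gpu_seconds': 1.0} -> server: "No work sent"
```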
Which I understand, but Einstein has not done any work now in 4 - 5 days.
I'm only just reading this thread for the first time, so I'll try to cover some issues raised in your previous messages as well. I may be repeating stuff that others have covered previously, so sorry if any of this is no longer needed.
In your first message you were confusing the large data files (for GW crunching) that the server was acknowledging with the GW tasks that you could see in Boinc Manager. These are two quite separate entities. The large data files are not tasks; they are simply the data that potentially a large number of tasks will operate on. You may get hundreds of tasks over time that use the same set of data files. When the data is exhausted, the data files are thrown away (you will see 'delete' messages) and fresh data is downloaded. This is quite separate from downloading an actual task.
A task is simply a set of parameters to feed to the science app to tell it how to operate on a particular set of data files. Each task listed in Boinc Manager will have a frequency component in its name, e.g. ..._0866.75_..., which will match that in a pair of large data files. However, other data files on either side of this frequency will also be used in the crunching of the task. If you get a new task at the next frequency step, say ..._0866.80_..., it is possible that you may also need to download 2 more large data files at a somewhat higher frequency, say ..._0867.05_... perhaps.
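Just to illustrate the naming (a sketch assuming a name shape like ..._0866.75_...; the real E@H naming scheme has more parts than this, and the example name below is made up):

```python
import re

# Pull the frequency component out of a task name so you can see which
# pair of large data files it belongs with. Only the _NNNN.NN_
# frequency field matters here.

def task_frequency(task_name):
    m = re.search(r'_(\d{4}\.\d{2})_', task_name)
    return float(m.group(1)) if m else None

print(task_frequency("h1_0866.75_S5R4__123_S5R7a_1"))  # 866.75
```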
With regard to the client errors you have been having - yes they were most likely due to the bug when using CPU throttling. No doubt you have resolved those errors by only using 1 CPU and setting throttling at 100%. I understand your concern about allowing your background tasks to run unhindered but have you actually tried using both CPUs and no throttling to see what happens? BOINC runs the science apps at idle (the lowest) priority so that they should give up the CPU the moment anything else needs to run. My experience is that BOINC is quite good at doing this so perhaps your background tasks would be quite OK without closing off a whole CPU to BOINC.
With regard to your final question as to why EAH is not getting work at the moment: you haven't told us about your resource shares, but your current SAH/EAH RAC figures show 1502/147, which suggests something like a 90%/10% or even 95%/5% split. How you split your resources is entirely up to you, but please realise that a heavily one-sided split like this is likely to lead to prolonged periods where the minor project will not have work - just as you are seeing. If Seti has an outage, BOINC is likely to overstock with EAH tasks, which will probably invoke 'high priority' mode for those tasks when Seti again has work. Once the crisis is over and the EAH work is finished, BOINC is unlikely to allow more EAH work until the debt is repaid to Seti - unless another Seti outage comes along and spoils the party.
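The 'debt' bookkeeping works roughly like this (a simplified sketch; the real client's long-term debt calculation is more involved than this):

```python
# With a 95/5 split, the minor project accrues its entitlement very
# slowly, so it can legitimately sit with no work for long stretches.

def update_debts(debts, shares, work_done, total_work):
    total_share = sum(shares.values())
    for proj in debts:
        entitled = total_work * shares[proj] / total_share
        debts[proj] += entitled - work_done.get(proj, 0.0)
    return debts

debts = {"SETI": 0.0, "EAH": 0.0}
shares = {"SETI": 95.0, "EAH": 5.0}
# A day in which SETI did all 24 CPU-hours of the work:
update_debts(debts, shares, {"SETI": 24.0}, 24.0)
print(debts)  # SETI ~ -1.2, EAH ~ +1.2: EAH is 'owed' about 1.2 hours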
You basically have two choices - either accept that EAH will have periods of no work on board, or give EAH a somewhat higher share of the pie. Also, I've just noticed that you are using BOINC 6.6.36. Many people are reporting weird scheduling issues with 6.6.x, and the mere fact that there are now newer versions (.37, .38, and reputedly .39) doesn't give a great deal of confidence that things are fixed yet. Unless you want to run CUDA apps at Seti, I would be tempted to give the whole 6.6.x mess a big miss and revert to something like 6.4.7 perhaps.
Cheers,
Gary.
Thank you Gary.
I do have a pretty good understanding of how this works, slicing the info out of the larger file.
As to resource share, I'm set at 60/40, and to run only when I am not at the computer. And yes, I have a GTX 260, so I'm running CUDA apps for Seti, and the 6.6.x client is needed; otherwise the CUDA apps don't go into standby when I use the computer and can potentially bluescreen me.
I will try 100% with both CPUs - I just don't want to impact the virtual server that's running...
Still, no "work" listed in BOINC. SETI, on the other hand, has hundreds of tasks there, with due dates out into late September...