Why am I getting this message?

Jim Wilkins
Joined: 1 Jun 05
Posts: 33
Credit: 28426884
RAC: 0
Topic 194677

Tue Dec 15 12:42:41 2009 Einstein@Home Sending scheduler request: To fetch work.
Tue Dec 15 12:42:41 2009 Einstein@Home Reporting 1 completed tasks, requesting new tasks
Tue Dec 15 12:42:42 2009 SETI@home Computation for task 07mr07ab.9040.2936.14.10.37_1 finished
Tue Dec 15 12:42:44 2009 SETI@home Started upload of 07mr07ab.9040.2936.14.10.37_1_0
Tue Dec 15 12:42:46 2009 Einstein@Home Scheduler request completed: got 1 new tasks
Tue Dec 15 12:42:46 2009 Einstein@Home Message from server: No work can be sent for the applications you have selected
Tue Dec 15 12:42:46 2009 Einstein@Home Message from server: You have selected to receive work from other applications if no work is available for the applications you selected
Tue Dec 15 12:42:46 2009 Einstein@Home Message from server: Sending work from other applications

Tue Dec 15 12:42:48 2009 Einstein@Home Starting h1_1112.15_S5R4__866_S5R6a_0
Tue Dec 15 12:42:49 2009 Einstein@Home Starting task h1_1112.15_S5R4__866_S5R6a_0 using einstein_S5R6 version 501

Not sure why I am getting this. Doesn't seem to affect downloads.

Jim

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

Why am I getting this message?

Same here.

Maybe some server (mis-)configuration.

Michael

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118664153801
RAC: 19146767

RE: Not sure why I am

Quote:
Not sure why I am getting this. Doesn't seem to affect downloads.


As a general comment to anyone reading this: if you ever see a message like this that seems a bit odd and you would like more info about it, the scheduler logs are your friend. Take a look at the sticky thread that Bruce posted in this very forum a long time ago about the scheduler logs.

One thing that makes this quite easy is that the time of last contact shown in your computers list is a clickable link to the appropriate scheduler log, so you don't even have to work out how to find the correct log - provided you're quick and there is no subsequent contact to spoil the party. Since that was the case here, below is the relevant log snippet, which I found with a single click and a search for 712103, the host ID of Jim's computer. The snippet reveals the 'thinking' of the server concerning the request it received from Jim's host.

I don't claim to be any great shakes at interpreting scheduler logs, so I'll just give a rough outline of what it means to me. I invite anyone with a more detailed interpretation to please join in.

To me it looks like the scheduler thought it might be able to send some old S5R5 work, but in the end it concluded that the best option was to just send an S5R6 task (task ID #151245075), which you now have in your cache. The messages at the end of the snippet tell the client that the request for S5R5 work couldn't be honoured and that S5R6 work was sent instead. I don't really know why it felt the need to tell the client this. I also don't understand why it talked about removing the l1_1112.xx files while confirming the h1_1112.xx files near the start of the snippet. Maybe there's a bit of a bug in the current locality scheduling algorithm.

PS: I've just noticed that the host has made a further (later) contact and, checking the newer log, the same type of exchange has occurred. It wasn't just a one-off.

2009-12-15 17:42:45.4265 [PID=10965]   Request:  [HOST#712103]  client 6.10.21
2009-12-15 17:42:45.6226 [PID=10965] [debug]   [HOST#712103] Resetting nresults_today
2009-12-15 17:42:45.6484 [PID=10965]    [send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2009-12-15 17:42:45.6484 [PID=10965]    [send] effective_ngpus 0 max_jobs_on_host_gpu 999999
2009-12-15 17:42:45.6484 [PID=10965]    [send] Not using matchmaker scheduling; Not using EDF sim
2009-12-15 17:42:45.6484 [PID=10965]    [send] CPU: req 0.62 sec, 0.00 instances; est delay 0.00
2009-12-15 17:42:45.6484 [PID=10965]    [send] work_req_seconds: 0.62 secs
2009-12-15 17:42:45.6484 [PID=10965]    [send] available disk 99.85 GB, work_buf_min 0
2009-12-15 17:42:45.6484 [PID=10965]    [send] active_frac 0.999819 on_frac 0.454276 DCF 1.565010
2009-12-15 17:42:45.6493 [PID=10965]    [send] [HOST#712103] is reliable
2009-12-15 17:42:45.6493 [PID=10965]    [send] set_trust: random choice for error rate 0.000010: yes
2009-12-15 17:42:45.6493 [PID=10965]    [mixed] sending locality work first
2009-12-15 17:42:45.6494 [PID=10965]    [locality] [HOST#712103] removing file l1_1112.15_S5R4 file_infos list
2009-12-15 17:42:45.6494 [PID=10965]    [locality] [HOST#712103] removing file l1_1112.20_S5R4 file_infos list
2009-12-15 17:42:45.6494 [PID=10965]    [locality] [HOST#712103] removing file l1_1112.25_S5R4 file_infos list
2009-12-15 17:42:45.6495 [PID=10965]    [locality] [HOST#712103] removing file l1_1112.30_S5R4 file_infos list
2009-12-15 17:42:45.6495 [PID=10965]    [locality] [HOST#712103] removing file l1_1112.35_S5R4 file_infos list
2009-12-15 17:42:45.6495 [PID=10965]    [locality] [HOST#712103] removing file l1_1112.40_S5R4 file_infos list
2009-12-15 17:42:45.6495 [PID=10965]    [locality] [HOST#712103] removing file l1_1112.45_S5R4 file_infos list
2009-12-15 17:42:45.6495 [PID=10965]    [locality] [HOST#712103] removing file l1_1112.50_S5R4 file_infos list
2009-12-15 17:42:45.6496 [PID=10965]    [locality] [HOST#712103] removing file skygrid_1120Hz_S5R5.dat file_infos list
2009-12-15 17:42:45.6496 [PID=10965]    [locality] [HOST#712103] has file h1_1112.15_S5R4
2009-12-15 17:42:45.6496 [PID=10965]    [locality] [HOST#712103] has file h1_1112.20_S5R4
2009-12-15 17:42:45.6496 [PID=10965]    [locality] [HOST#712103] has file h1_1112.25_S5R4
2009-12-15 17:42:45.6496 [PID=10965]    [locality] [HOST#712103] has file h1_1112.30_S5R4
2009-12-15 17:42:45.6496 [PID=10965]    [locality] [HOST#712103] has file h1_1112.35_S5R4
2009-12-15 17:42:45.6496 [PID=10965]    [locality] [HOST#712103] has file h1_1112.40_S5R4
2009-12-15 17:42:45.6496 [PID=10965]    [locality] [HOST#712103] has file h1_1112.45_S5R4
2009-12-15 17:42:45.6496 [PID=10965]    [locality] [HOST#712103] has file h1_1112.50_S5R4
2009-12-15 17:42:45.7509 [PID=10965]    [locality] send_old_work() no feasible result older than 336.7 hours
2009-12-15 17:42:45.7520 [PID=10965]    [locality] in_send_results_for_file(h1_1112.15_S5R4, 0) prev_result.id=151177303
2009-12-15 17:42:45.7561 [PID=10965]    [version] Best version of app einstein_S5R6 is ID 155 (3.07 GFLOPS)
2009-12-15 17:42:45.7569 [PID=10965] [debug]   Sorted list of URLs follows [host timezone: UTC-18000]
2009-12-15 17:42:45.7569 [PID=10965] [debug]   zone=-21600 url=http://einstein.phys.uwm.edu
2009-12-15 17:42:45.7569 [PID=10965] [debug]   zone=-28800 url=http://einstein.ligo.caltech.edu
2009-12-15 17:42:45.7569 [PID=10965] [debug]   zone=+00000 url=http://einstein.astro.gla.ac.uk
2009-12-15 17:42:45.7569 [PID=10965] [debug]   zone=+03600 url=http://einstein.aei.mpg.de
2009-12-15 17:42:45.7571 [PID=10965]    [send] [HOST#712103] Sending app_version einstein_S5R6 6 501 ; 3.07 GFLOPS
2009-12-15 17:42:45.7574 [PID=10965]    [locality] [HOST#712103] Already has file h1_1112.15_S5R4
2009-12-15 17:42:45.7575 [PID=10965]    [locality] [HOST#712103] reducing disk needed for WU by 4262400 bytes (length of h1_1112.15_S5R4)
2009-12-15 17:42:45.7581 [PID=10965]    [send] est. duration for WU 63946074: unscaled 16446.57 scaled 56669.73
2009-12-15 17:42:45.7581 [PID=10965]    [HOST#712103] Sending [RESULT#151245075 h1_1112.15_S5R4__866_S5R6a_0] (est. dur. 56669.73 seconds)
2009-12-15 17:42:45.7746 [PID=10965]    [send] don't need more work
2009-12-15 17:42:45.7746 [PID=10965]    [send] don't need more work
2009-12-15 17:42:45.7746 [PID=10965]    [send] don't need more work
2009-12-15 17:42:45.7746 [PID=10965]    [mixed] sending non-locality work second
2009-12-15 17:42:45.7746 [PID=10965]    [send] don't need more work
2009-12-15 17:42:45.7946 [PID=10965] [debug]   [HOST#712103] MSG(high) No work can be sent for the applications you have selected
2009-12-15 17:42:45.7946 [PID=10965] [debug]   [HOST#712103] MSG(high) You have selected to receive work other applications if no work is available for the applications you selected
2009-12-15 17:42:45.7947 [PID=10965] [debug]   [HOST#712103] MSG(high) Sending work other applications
2009-12-15 17:42:45.7947 [PID=10965]    Sending reply to [HOST#712103]: 1 results, delay req 60.00
2009-12-15 17:42:45.7949 [PID=10965]    Scheduler ran 0.464 seconds


Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 252175324
RAC: 33709

RE: To me it looks like the

Message 96030 in response to message 96029

Quote:
To me it looks like the scheduler thought it might be able to send some old S5R5 work, but in the end it concluded that the best option was to just send an S5R6 task (task ID #151245075), which you now have in your cache. The messages at the end of the snippet tell the client that the request for S5R5 work couldn't be honoured and that S5R6 work was sent instead. I don't really know why it felt the need to tell the client this. I also don't understand why it talked about removing the l1_1112.xx files while confirming the h1_1112.xx files near the start of the snippet. Maybe there's a bit of a bug in the current locality scheduling algorithm.

There probably is, but not visible here ;-)

I recently switched on a bit more debugging output to find things that need to be fixed in the locality scheduling.

Since the launch of the Radio Pulsar search, Einstein@Home has actually been calling two different scheduler codes to serve both types of work:

The Radio Pulsar search (ABP1) is an ordinary ('non-locality') BOINC type of work: one data file is downloaded for every task, and after the task has been processed that file is deleted from the client (and even from the server once the workunit is done).

The Gravitational Wave search (S5R6) is 'locality work': the same set of data files is used for multiple workunits, so the whole server side (locality scheduler, workunit generator and even the transitioner) tries to generate, find and assign tasks to the client in a way that minimizes its download volume.
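
To illustrate the idea (this is only a rough sketch under my own assumptions, not the actual Einstein@Home server code; the Workunit struct and pick_locality_wu() are hypothetical names), locality scheduling boils down to preferring a workunit whose large data files the host already holds, so the extra download for the new task is as small as possible:

#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

struct Workunit {
    std::string name;
    std::vector<std::string> data_files;  // large input files this WU needs
};

// Return the candidate that needs the fewest files not already on the host.
const Workunit* pick_locality_wu(const std::vector<Workunit>& candidates,
                                 const std::set<std::string>& host_files) {
    const Workunit* best = nullptr;
    std::size_t best_missing = SIZE_MAX;
    for (const auto& wu : candidates) {
        std::size_t missing = 0;
        for (const auto& f : wu.data_files) {
            if (host_files.count(f) == 0) missing++;
        }
        if (missing < best_missing) {
            best_missing = missing;
            best = &wu;
        }
    }
    return best;  // may still need some downloads, but as few as possible
}

This mirrors what the log above shows for Jim's host: the scheduler notes which h1_1112.xx files the host already has and reduces the disk needed for the new workunit accordingly.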

At each scheduler contact, one scheduler part gets called first to handle the client request; then the other one gets called to fill up the request if more work was asked for than the first part could send. Which scheduler part gets the first chance to send work is chosen at random at each contact; the current setting is 70% locality, 30% non-locality. We call this 'mixed' scheduling.
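
As a rough sketch of that 70/30 coin toss (my own illustration, not the real scheduler source; send_work_locality() and send_work_ordinary() are just hypothetical stand-ins for the two passes):

#include <cstdio>
#include <random>

static void send_work_locality() { std::puts("locality pass"); }      // stand-in
static void send_work_ordinary() { std::puts("non-locality pass"); }  // stand-in

static void send_work_mixed() {
    static std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < 0.7) {
        // 70% of contacts: locality work gets the first chance;
        // the ordinary pass fills up any remaining request afterwards.
        send_work_locality();   // "[mixed] sending locality work first"
        send_work_ordinary();   // "[mixed] sending non-locality work second"
    } else {
        // 30% of contacts: the other way around.
        send_work_ordinary();
        send_work_locality();
    }
}

int main() { send_work_mixed(); }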

From the cited log you can see that locality work was sent first ("[mixed] sending locality work first").

The locality scheduler removes everything that is not a 'trigger' (i.e. the beginning of a workunit name) from an internal structure, the 'file_infos' list. As far as I can see, the behavior of the locality scheduler is correct.
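
A small illustration of that pruning step (again my own sketch, not the server code; is_trigger_file() and prune_file_infos() are hypothetical, and I'm assuming that in this run only the h1_* names are triggers, as the log suggests):

#include <string>
#include <vector>

// True if the file name is a 'trigger', i.e. the start of a workunit name.
static bool is_trigger_file(const std::string& name) {
    return name.rfind("h1_", 0) == 0;   // name starts with "h1_"
}

static void prune_file_infos(std::vector<std::string>& file_infos) {
    std::vector<std::string> kept;
    for (const std::string& f : file_infos) {
        if (is_trigger_file(f)) {
            kept.push_back(f);          // "[locality] ... has file h1_..."
        }
        // else: dropped, as in the "removing file ... file_infos list" lines
    }
    file_infos.swap(kept);
}

So, as far as I understand it, the l1_* data files and the skygrid file being 'removed' near the start of the snippet is just this internal bookkeeping, not files being deleted from the host.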

The message apparently comes from the 'non-locality scheduler' that is called second ("[mixed] sending non-locality work second").

Honestly I don't know yet where it comes from, but I'll look into that. It's definitely a problem on the server side.

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118664153801
RAC: 19146767

Thanks, Bernd, for taking the

Thanks, Bernd, for taking the time to explain it all. The things I didn't understand make sense now.

Cheers,
Gary.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 1

RE: The message apparently

Message 96032 in response to message 96030

Quote:

The message apparently comes from the 'non-locality scheduler' that is called second ("[mixed] sending non-locality work second").

Honestly I don't know yet where it comes from, but I'll look into that. It's definitely a problem on the server side.


Possibly having a non-CUDA GPU in the system? I see similar messages for the GPU work requests that my ATI GPU makes once in a blue moon, on different projects.

I just did an update and got this:

16-Dec-09 08:00:58 Einstein@Home update requested by user
16-Dec-09 08:01:01 Einstein@Home [sched_op_debug] Starting scheduler request
16-Dec-09 08:01:01 Einstein@Home Sending scheduler request: Requested by user.
16-Dec-09 08:01:01 Einstein@Home Requesting new tasks for GPU
16-Dec-09 08:01:01 Einstein@Home [sched_op_debug] CPU work request: 0.00 seconds; 0 idle CPUs
16-Dec-09 08:01:01 Einstein@Home [sched_op_debug] ATI GPU work request: 864.86 seconds; 1 idle GPUs
16-Dec-09 08:01:06 Einstein@Home Scheduler request completed: got 0 new tasks
16-Dec-09 08:01:06 Einstein@Home [sched_op_debug] Server version 611
16-Dec-09 08:01:06 Einstein@Home Message from server: No work sent
16-Dec-09 08:01:06 Einstein@Home Message from server: No work is available for Hierarchical S5 all-sky GW search #5
16-Dec-09 08:01:06 Einstein@Home Message from server: No work available for the applications you have selected. Please check your settings on the web site.
16-Dec-09 08:01:06 Einstein@Home Project requested delay of 60 seconds
16-Dec-09 08:01:06 Einstein@Home [sched_op_debug] Deferring communication for 1 min 0 sec
16-Dec-09 08:01:06 Einstein@Home [sched_op_debug] Reason: requested by project

Looks familiar? ;-)

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: Looks familiar?

Message 96033 in response to message 96032

Quote:
Looks familiar? ;-)


Vaguely ;-)

But I get exactly the same, even without a GPU (of any kind):

16/12/2009 06:58:48|Einstein@Home|[sched_op_debug] Starting scheduler request
16/12/2009 06:58:48|Einstein@Home|Sending scheduler request: To fetch work. Requesting 7 seconds of work, reporting 0 completed tasks
16/12/2009 06:58:53||[unparsed_xml] APP_VERSION::parse(): unrecognized: 1.000000
16/12/2009 06:58:53||[unparsed_xml] APP_VERSION::parse(): unrecognized: 1.000000
16/12/2009 06:58:53||[unparsed_xml] APP_VERSION::parse(): unrecognized: 1355697238.732052
16/12/2009 06:58:53||[unparsed_xml] RESULT::parse(): unrecognized:
16/12/2009 06:58:54|Einstein@Home|Scheduler request succeeded: got 1 new tasks
16/12/2009 06:58:54|Einstein@Home|[sched_ops_debug] Server version 611
16/12/2009 06:58:54|Einstein@Home|Message from server: No work can be sent for the applications you have selected
16/12/2009 06:58:54|Einstein@Home|Message from server: You have selected to receive work from other applications if no work is available for the applications you selected
16/12/2009 06:58:54|Einstein@Home|Message from server: Sending work from other applications

16/12/2009 06:58:54|Einstein@Home|Project requested delay of 60.000000 seconds
16/12/2009 06:58:54|Einstein@Home|[sched_op_debug] Deferring communication for 1 min 0 sec
16/12/2009 06:58:54|Einstein@Home|[sched_op_debug] Reason: requested by project

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke.)

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 1

RE: But I get exactly the

Message 96034 in response to message 96033

Quote:
But I get exactly the same, even without GPU (of any kind)


You just need to update your BOINC so that it recognizes what an app_plan is. :-)

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 252175324
RAC: 33709

Should be fixed now. If you

Should be fixed now. If you still get this message, please post here.

BM

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 1

RE: 2009-12-17

Message 96036 in response to message 96035

Quote:
2009-12-17 14:14:52.8293 [PID=12346] Request: [HOST#1260526] client 6.10.25
2009-12-17 14:14:52.8611 [PID=12346] [debug] [HOST#1260526] Resetting nresults_today
2009-12-17 14:14:52.8612 [PID=12346] [send] effective_ncpus 2 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2009-12-17 14:14:52.8612 [PID=12346] [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2009-12-17 14:14:52.8612 [PID=12346] [send] Not using matchmaker scheduling; Not using EDF sim
2009-12-17 14:14:52.8612 [PID=12346] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2009-12-17 14:14:52.8612 [PID=12346] [send] ATI: req 864.86 sec, 1.00 instances; est delay 0.00
2009-12-17 14:14:52.8612 [PID=12346] [send] work_req_seconds: 0.00 secs
2009-12-17 14:14:52.8612 [PID=12346] [send] available disk 4.09 GB, work_buf_min 0
2009-12-17 14:14:52.8612 [PID=12346] [send] active_frac 0.994600 on_frac 0.918957 DCF 1.995207
2009-12-17 14:14:52.8625 [PID=12346] [send] [HOST#1260526] is reliable
2009-12-17 14:14:52.8625 [PID=12346] [send] set_trust: random choice for error rate 0.000010: yes
2009-12-17 14:14:52.8625 [PID=12346] [mixed] sending locality work first
2009-12-17 14:14:52.8626 [PID=12346] [locality] [HOST#1260526] removing file l1_1001.45_S5R4 file_infos list
2009-12-17 14:14:52.8626 [PID=12346] [locality] [HOST#1260526] removing file l1_1001.50_S5R4 file_infos list
2009-12-17 14:14:52.8626 [PID=12346] [locality] [HOST#1260526] removing file l1_1001.55_S5R4 file_infos list
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] removing file skygrid_1010Hz_S5R5.dat file_infos list
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] removing file l1_1001.60_S5R4 file_infos list
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] removing file l1_1001.65_S5R4 file_infos list
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] removing file l1_1001.70_S5R4 file_infos list
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] removing file l1_1001.75_S5R4 file_infos list
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] has file h1_1001.45_S5R4
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] has file h1_1001.50_S5R4
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] has file h1_1001.55_S5R4
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] has file h1_1001.60_S5R4
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] has file h1_1001.65_S5R4
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] has file h1_1001.70_S5R4
2009-12-17 14:14:52.8627 [PID=12346] [locality] [HOST#1260526] has file h1_1001.75_S5R4
2009-12-17 14:14:53.0049 [PID=12346] [locality] send_old_work() no feasible result older than 336.2 hours
2009-12-17 14:14:53.0060 [PID=12346] [locality] in_send_results_for_file(h1_1001.45_S5R4, 0) prev_result.id=151429046
2009-12-17 14:14:53.0181 [PID=12346] [version] Don't need CPU jobs, skipping version 301 for einstein_S5R6 ()
2009-12-17 14:14:53.0181 [PID=12346] [version] no app version available: APP#10 (einstein_S5R6) PLATFORM#2 (windows_intelx86) min_version 0
2009-12-17 14:14:53.0182 [PID=12346] [mixed] sending non-locality work second
2009-12-17 14:14:53.0548 [PID=12346] [debug] [HOST#1260526] MSG(high) No work sent
2009-12-17 14:14:53.0548 [PID=12346] [debug] [HOST#1260526] MSG(high) No work is available for Hierarchical S5 all-sky GW search #5
2009-12-17 14:14:53.0548 [PID=12346] [debug] [HOST#1260526] MSG(high) No work available for the applications you have selected. Please check your settings on the web site.

2009-12-17 14:14:53.0548 [PID=12346] Sending reply to [HOST#1260526]: 0 results, delay req 60.00
2009-12-17 14:14:53.0550 [PID=12346] Scheduler ran 0.507 seconds


I would expect it to say "You don't have an NVIDIA GPU", but alas, it doesn't.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 252175324
RAC: 33709

RE: I would expect it to

Message 96037 in response to message 96036

Quote:
I would expect it to say "You don't have an NVIDIA GPU", but alas, it doesn't.


Hm. I can see that the message is a bit too general. It means that although you have selected to get GPU work (too), the project has no GPU work that could be sent to your machine.

But anyway, that's a (slightly) different message and a different problem from the one I was addressing.

BM
