BRP6 out of work, Arecibo not available for your type of computer

SuperSluether

Joined: 1 Sep 14

Posts: 4

Credit: 54809605

RAC: 0

18 Oct 2015 4:08:26 UTC

Topic 198277

For some reason, BRP6 (Parkes PMPS XT) is out of work, and my computer won't download BRP4 (Arecibo) because it's "not available?" I checked the compatible apps, and I should be getting something here.

I'm on 64-bit Linux with an Nvidia GTX 760. Shouldn't I be getting BRP4G-cuda32-nv270 or BRP4G-Beta-cuda32-nv270? Does the 270 refer to a specific driver or card, hence why there's no work being sent?

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5882

Credit: 118956891987

RAC: 24028127

BRP6 out of work, Arecibo not available for your type of compute

18 Oct 2015 8:13:05 UTC

Message 134606

The server status page shows no tasks for either BRP4G or BRP6. That is expected for BRP4G because data is scarce and available relatively infrequently. There should be plenty of BRP6 data, but there is a history of very occasional outages when the task generator daemon stops working for some odd reason. This is probably one of those occasions.

Seeing as it's now still early on a Sunday morning, maybe nobody has noticed just yet. I'm sure someone might notice shortly :-). What happens will depend on the nature of the problem. It may be as simple as restarting the daemon. At worst, it might be something that can't readily be fixed until Monday, in which case one of the Devs is likely to post a short announcement at some point.

One of Murphy's famous laws says that if something can go wrong it will, and it will always be at a time that causes maximum inconvenience :-). I guess we'll find out later that there was some notorious all night party going on in that part of Hannover and that all the staff were (probably still are) in attendance and in no fit state to fix things :-).

Murphy always knows these things :-).

On a less frivolous note, it's always wise to have a cache of work to outlast the weekend. Your GTX 760 should do quite nicely here. I'm on 64bit Linux too and I've recently added a 750Ti to my farm. I'm running 3 tasks concurrently which improves the output. By the look of your crunch times you are running at least two concurrent GPU tasks, maybe more. How many are you actually running?

Cheers,
Gary.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2999352146

RAC: 706115

I'm not entirely convinced by

18 Oct 2015 9:40:04 UTC

Message 134607 in response to message 134606

I'm not entirely convinced by that. I have a couple of machines which have been concentrating on BRP6/intel_gpu, but didn't get any new work overnight.

The server log shows no attempt to scan the BRP6 plan_classes:

2015-10-18 09:03:51.2336 [PID=32592]   Request: [USER#xxxxx] [HOST#8864187] [IP xxx.xxx.xxx.143] client 7.6.9
2015-10-18 09:03:51.2341 [PID=32592]    [send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2015-10-18 09:03:51.2342 [PID=32592]    [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2015-10-18 09:03:51.2342 [PID=32592]    [send] Not using matchmaker scheduling; Not using EDF sim
2015-10-18 09:03:51.2342 [PID=32592]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2015-10-18 09:03:51.2342 [PID=32592]    [send] Intel GPU: req 26987.07 sec, 0.00 instances; est delay 0.00
2015-10-18 09:03:51.2342 [PID=32592]    [send] work_req_seconds: 0.00 secs
2015-10-18 09:03:51.2342 [PID=32592]    [send] available disk 97.71 GB, work_buf_min 43200
2015-10-18 09:03:51.2342 [PID=32592]    [send] active_frac 0.999982 on_frac 0.999373 DCF 0.860931
2015-10-18 09:03:51.2351 [PID=32592]    [send] [HOST#8864187] is reliable
2015-10-18 09:03:51.2353 [PID=32592]    [send] set_trust: random choice for error rate 0.000010: yes
2015-10-18 09:03:51.2353 [PID=32592]    [mixed] sending non-locality work first (0.2788)
2015-10-18 09:03:51.2549 [PID=32592]    [send] [HOST#8864187] will accept beta work.  Scanning for beta work.
2015-10-18 09:03:51.2846 [PID=32592]    [mixed] sending locality work second
2015-10-18 09:03:51.2879 [PID=32592] [debug]   [HOST#8864187] MSG(high) No work sent
2015-10-18 09:03:51.2879 [PID=32592]    Sending reply to [HOST#8864187]: 0 results, delay req 60.00
2015-10-18 09:03:51.2890 [PID=32592]    Scheduler ran 0.060 seconds

The machine is showing no signs of distress: All Parkes PMPS XT tasks for computer 8864187

Now that I've added BRP4 to the 'selected apps' list, the server log for host 8864187 shows a normal plan_class scan and work allocation.
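For anyone wanting to check their own scheduler logs for the same symptom, the reply line is the quickest tell. A minimal Python sketch, assuming only the line formats visible in the excerpt above (the regular expressions and the `summarize` helper are hypothetical, not part of BOINC), that extracts the per-resource requests and the number of results sent back:

```python
import re

# Per-resource request lines, e.g. "[send] Intel GPU: req 26987.07 sec, 0.00 instances"
REQ_RE = re.compile(r"\[send\] ([\w ]+?): req ([\d.]+) sec")
# Reply lines, e.g. "Sending reply to [HOST#8864187]: 0 results, delay req 60.00"
REPLY_RE = re.compile(
    r"Sending reply to \[HOST#(\d+)\]: (\d+) results, delay req ([\d.]+)"
)


def summarize(log_text):
    """Return (requests, replies) parsed from scheduler log text.

    requests: dict mapping resource name -> requested seconds of work.
    replies:  list of (host_id, results_sent, delay_seconds) tuples.
    """
    requests = {}
    replies = []
    for line in log_text.splitlines():
        m = REQ_RE.search(line)
        if m:
            requests[m.group(1)] = float(m.group(2))
            continue
        m = REPLY_RE.search(line)
        if m:
            replies.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return requests, replies


sample = """\
[send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
[send] Intel GPU: req 26987.07 sec, 0.00 instances; est delay 0.00
Sending reply to [HOST#8864187]: 0 results, delay req 60.00
"""
requests, replies = summarize(sample)
# A host that requested some work but received 0 results matches the symptom above.
starved = [host for host, sent, _ in replies if sent == 0 and any(requests.values())]
print(requests, starved)
```

The patterns search anywhere in the line, so the timestamp and `[PID=...]` prefixes of the real log lines don't need to be stripped first.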

Snow Crash

Joined: 24 Dec 09

Posts: 65

Credit: 100880785

RAC: 0

I am getting occasional

18 Oct 2015 11:00:23 UTC

Message 134608 in response to message 134607

I am getting occasional "resends" for BRP6-beta.
My logs show something similar to RH's, except that I am requesting ATI or NVIDIA work depending on the rig. Of course, this is the morning I was adding a new rig, and without "reliable" status there is no possibility of getting even resends. Thanks, Mr. Murphy :-)

--------------------------
- Crunch, Crunch, Crunch -
--------------------------

SuperSluether

Joined: 1 Sep 14

Posts: 4

Credit: 54809605

RAC: 0

RE: The server status page

18 Oct 2015 12:59:51 UTC

Message 134609 in response to message 134606

Quote:

The server status page shows no tasks for either BRP4G or BRP6. That is expected for BRP4G because data is scarce and available relatively infrequently. There should be plenty of BRP6 data, but there is a history of very occasional outages when the task generator daemon stops working for some odd reason. This is probably one of those occasions.

On a less frivolous note, it's always wise to have a cache of work to outlast the weekend. Your GTX 760 should do quite nicely here. I'm on 64bit Linux too and I've recently added a 750Ti to my farm. I'm running 3 tasks concurrently which improves the output. By the look of your crunch times you are running at least two concurrent GPU tasks, maybe more. How many are you actually running?

Thanks for the info. It looks like I mixed up BRP4 with BRP4G. I'm running 3 tasks on my GPU because each task was only using about 30% of it. My work buffer is set to 0.5 days plus up to an extra day, but I don't want to set it too large because BOINC is running on a 2 GB RAM disk.
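The arithmetic behind running three at once is simple when each task keeps about 30% of the GPU busy. A small Python sketch; the helper names are hypothetical, and the 1/n per-task fraction reflects how BOINC-style GPU-fraction settings (such as a GPU utilization factor in the project preferences, if your account offers one) are commonly used, so confirm the mapping against your own setup:

```python
import math


def concurrent_tasks(per_task_util, headroom=1.0):
    """How many tasks fit on one GPU if each keeps per_task_util of it busy.

    per_task_util: fraction of the GPU one task uses on its own (e.g. 0.30).
    headroom: total fraction of the GPU we are willing to fill (1.0 = all).
    """
    if not 0 < per_task_util <= 1:
        raise ValueError("per_task_util must be in (0, 1]")
    return max(1, math.floor(headroom / per_task_util))


def gpu_fraction_setting(n_tasks):
    """Fraction of a GPU to declare per task so that n_tasks run at once (1/n)."""
    return 1.0 / n_tasks


n = concurrent_tasks(0.30)  # floor(1 / 0.30) = 3
print(n, round(gpu_fraction_setting(n), 2), f"{n * 0.30:.0%}")
```

Real throughput gains are usually smaller than the utilization figures suggest, because concurrent tasks compete for memory bandwidth and CPU support, so it is worth comparing crunch times before and after.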

I thought the server was prejudiced against my system or something. Now I see it was just an oversight on my part, seeing as every BOINC project has intermittent outages. Thanks again for clearing it up. ☺

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5882

Credit: 118956891987

RAC: 24028127

RE: I'm not entirely

18 Oct 2015 21:37:09 UTC

Message 134610 in response to message 134607

Quote:

I'm not entirely convinced by that. I have a couple of machines which have been concentrating on BRP6/intel_gpu, but didn't get any new work overnight.

Somewhere around 1600 UTC (or a bit later) on Oct 17, the flow of BRP6 primary tasks stopped for me. A trickle of resends has continued. The status page continues to show essentially zero tasks for BRP6 and BRP4G. BRP4 (not for mainstream discrete GPUs) continues to have work for the devices it was intended to support. Since Bernd tries not to make changes on a Friday that might impact the weekend, I don't think the lack of work is deliberate or associated with server-side programming changes. I'm still guessing that something unexpected has happened to the work generation process. If I remember correctly, this has happened before and took a while to be noticed then too, because a monitoring script failed to report the problem.

Quote:

The server log shows no attempt to scan the BRP6 plan_classes:

My hosts seem to show the same. Maybe the logic is to scan only those plan classes for which there is work to send? After you added BRP4, the log shows only intel GPU plan classes and not BRP6, so if you didn't disable BRP6, it's perhaps still not a 'normal' log?

Cheers,
Gary.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2999352146

RAC: 706115

RE: After you added BRP4,

18 Oct 2015 23:51:23 UTC

Message 134611 in response to message 134610

Quote:

After you added BRP4, the log shows only intel GPU plan classes and not BRP6, so if you didn't disable BRP6, it's perhaps still not a 'normal' log?

Yes, I did it that way - currently explicitly allowing intel_gpu (only), BRP4 and BRP6 (only). I might try BRP6+'allow others if none available' tomorrow. And I did get one BRP6 resend after I set those preferences and posted.

cliff

Joined: 15 Feb 12

Posts: 176

Credit: 283452444

RAC: 0

Can't get any WU, seems

19 Oct 2015 0:39:13 UTC

Message 134612

Can't get any WUs; it seems the servers are offline. Anyone got any idea when they might get a kick in the pants?

Cheers,

Cliff,

Been there, Done that, Still no damn T-Shirt.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5882

Credit: 118956891987

RAC: 24028127

RE: I thought the server

19 Oct 2015 4:03:46 UTC

Message 134613 in response to message 134609

Quote:

I thought the server was prejudiced against my system or something. Now I see it was just an oversight on my part, seeing as every BOINC project has intermittent outages. Thanks again for clearing it up. ☺

Outages at Einstein are, fortunately, relatively infrequent. Quite often, things are back on track relatively quickly but there is always the slight chance of a more serious situation. None of the Devs have said anything yet that I've seen so there is no way of knowing what the problem actually is.

I'm now at the site where the bulk of my hosts are, and the logs there tell me that things were seemingly OK at 7:22PM UTC on Sat, 17 Oct. The next host of mine 'in the queue' failed to get any BRP6 when it asked at 7:27PM, and from that point onwards only 'resend' tasks have come through (these don't need to be generated, since they are direct extra copies of already generated tasks). I've been getting around 2 resends per hour over my whole fleet.

All this points to a problem with work generation. On the server status page, the work generator for BRP6 is shown as "Not running", but that tells you nothing because it only runs for a short time when needed to top up the supply of 'ready to send' tasks. Most likely, the status page data will be refreshed at a time when the daemon isn't running anyway.
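The point about the "Not running" status can be illustrated with a toy model of a top-up generator: it wakes, creates tasks only if the unsent pool is below a threshold and there is pre-processed input to work from, then exits. Everything here (the `WorkPool` class, the threshold and batch size, one task per input item) is a hypothetical simplification, not Einstein@Home's actual daemon:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkPool:
    """Toy model of a 'ready to send' pool fed by a top-up work generator.

    All parameters are illustrative, not Einstein@Home's real settings.
    """
    low_water: int = 100   # generator tops up only when unsent < low_water
    batch: int = 500       # maximum tasks created per generator pass
    unsent: int = 0        # tasks ready to send
    preprocessed: deque = field(default_factory=deque)  # input waiting to become tasks

    def generator_pass(self):
        """One short generator run; returns the number of tasks created."""
        if self.unsent >= self.low_water:
            return 0  # pool is full enough: exit at once, so it is rarely seen "running"
        created = 0
        while created < self.batch and self.preprocessed:
            self.preprocessed.popleft()  # assume one input item -> one task
            created += 1
        self.unsent += created
        return created


pool = WorkPool(preprocessed=deque(range(1000)))
print(pool.generator_pass(), pool.unsent)  # first pass fills the pool
print(pool.generator_pass())               # pool above threshold: nothing to do

# If the upstream input stage stalls, the generator can run normally and still
# create nothing, so the unsent count drains to zero.
pool.unsent = 0
pool.preprocessed.clear()
print(pool.generator_pass(), pool.unsent)
```

In this model a status snapshot would almost always catch the generator idle, whether or not work is flowing; only the unsent count distinguishes the two cases.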

These daemons have been known to quit unexpectedly on occasion, but previous comments by the Devs lead me to believe that this should be detected and reported. I seem to recall a previous occasion where that didn't work either, but I thought that failure had been fixed.
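A stall check of the kind described here only needs to notice that no new primary tasks have appeared for too long, while ignoring resends, which keep trickling out even when generation has stopped. A minimal Python sketch; the two-hour threshold and the `generation_stalled` helper are hypothetical, and the timestamps come from this thread (19:22 UTC is when Gary's hosts last received new BRP6 work, used here as a stand-in for the last creation time, and 09:03:51 UTC is Richard's empty scheduler request):

```python
from datetime import datetime, timedelta, timezone


def generation_stalled(last_created, now, max_gap=timedelta(hours=2)):
    """Return True if no new primary task was created within max_gap.

    last_created should be the creation time of the newest non-resend task;
    resends are excluded because they are copies of existing tasks and keep
    flowing after generation stops. max_gap is an illustrative threshold.
    """
    return now - last_created > max_gap


# Last new BRP6 work seen on Gary's hosts, and Richard's empty scheduler request.
last_ok = datetime(2015, 10, 17, 19, 22, tzinfo=timezone.utc)
checked = datetime(2015, 10, 18, 9, 3, 51, tzinfo=timezone.utc)

print(checked - last_ok, generation_stalled(last_ok, checked))  # 13:41:51 True
```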

If the lack of available work causes angst to anyone, the two easiest solutions are either to have a backup project or to cache work for a sensible period based on the most likely type of outage. I would put that figure in the 1-3 day range. There have been longer outages but there are disadvantages in trying to cope with something like a 7-10 day outage, which is a very rare event at this project.
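Turning the 1-3 day figure into a cache size is simple arithmetic. A Python sketch; the three-hour task runtime below is a hypothetical placeholder, so substitute the crunch time your host actually reports. The same day-to-second conversion is presumably behind the `work_buf_min 43200` in Richard's scheduler log: 0.5 days × 86,400 s/day = 43,200 s.

```python
import math

SECONDS_PER_DAY = 86_400


def tasks_to_cache(outage_days, task_runtime_s, concurrent=1):
    """Tasks needed to keep `concurrent` slots busy for outage_days.

    task_runtime_s is the wall-clock time of one task while `concurrent`
    tasks share the device; measure it on your own host.
    """
    busy_seconds = outage_days * SECONDS_PER_DAY * concurrent
    return math.ceil(busy_seconds / task_runtime_s)


print(0.5 * SECONDS_PER_DAY)  # 43200.0 seconds, i.e. half a day
for days in (1, 2, 3):
    # Hypothetical host: 3 tasks at once on one GPU, each taking 3 hours.
    print(days, tasks_to_cache(days, 3 * 3600, concurrent=3))
```

The result is rounded up because a partial task still has to be downloaded whole. On a space-limited setup such as a RAM disk, check that this many tasks' files will fit before raising the buffer.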

Cheers,
Gary.

Christian Beer

Joined: 9 Feb 05

Posts: 595

Credit: 197233927

RAC: 58229

We are working on a solution

19 Oct 2015 7:44:44 UTC

Message 134614

We are working on a solution right now. Stay tuned.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4339

Credit: 252579845

RAC: 35427

BRP6 pre-processing got stuck

19 Oct 2015 8:11:42 UTC

Message 134615

BRP6 pre-processing got stuck over the weekend. Unsent tasks are available again.
