For some reason, BRP6 (Parkes PMPS XT) is out of work, and my computer won't download BRP4 (Arecibo) because it's "not available". I checked the compatible apps, and I should be getting something here.
I'm on 64-bit Linux with an Nvidia GTX 760. Shouldn't I be getting BRP4G-cuda32-nv270 or BRP4G-Beta-cuda32-nv270? Does the 270 refer to a specific driver or card, which would explain why no work is being sent?
Copyright © 2024 Einstein@Home. All rights reserved.
BRP6 out of work, Arecibo not available for your type of compute
The server status page shows no tasks for either BRP4G or BRP6. That is expected for BRP4G, because its data is scarce and only becomes available relatively infrequently. There should be plenty of BRP6 data, but there is a history of very occasional outages when the task generator daemon stops working for some odd reason. This is probably one of those occasions.
As it's still early on a Sunday morning, maybe nobody has noticed just yet. I'm sure someone will notice shortly :-). What happens will depend on the nature of the problem. It may be as simple as restarting the daemon. At worst, it might be something that can't readily be fixed until Monday, in which case one of the Devs is likely to post a short announcement at some point.
One of Murphy's famous laws says that if something can go wrong it will, and it will always be at a time that causes maximum inconvenience :-). I guess we'll find out later that there was some notorious all-night party going on in that part of Hannover and that all the staff were (probably still are) in attendance and in no fit state to fix things :-).
Murphy always knows these things :-).
On a less frivolous note, it's always wise to have a cache of work to outlast the weekend. Your GTX 760 should do quite nicely here. I'm on 64-bit Linux too and I've recently added a 750Ti to my farm. I'm running 3 tasks concurrently, which improves the output. By the look of your crunch times you are running at least two concurrent GPU tasks, maybe more. How many are you actually running?
Cheers,
Gary.
I'm not entirely convinced by
I'm not entirely convinced by that. I have a couple of machines which have been concentrating on BRP6/intel_gpu, but didn't get any new work overnight.
The server log shows no attempt to scan the BRP6 plan_classes:
The machine is showing no signs of distress: All Parkes PMPS XT tasks for computer 8864187
Now I've added BRP4 to the 'selected apps' list, the Server log for host 8864187 shows a normal plan_class scan and work allocation.
I am getting occasional
I am getting occasional "resends" for BRP6-beta.
My logs show something similar to RH's, except that I am requesting ATI or NVIDIA work depending on the rig. Of course this is the morning I was adding a new rig, and without a "reliable" factor there is no possibility of getting even resends. Thanks, Mr. Murphy :-)
--------------------------
- Crunch, Crunch, Crunch -
--------------------------
RE: The server status page
Thanks for the info. It looks like I mixed up BRP4 with BRP4G. I'm running 3 tasks on my GPU because each task was only using about 30% of it. I have my work buffer set to 0.5 days up to an extra day, but I don't want to set it too large because BOINC is running on a 2GB Ramdisk.
I thought the server was prejudiced against my system or something. Now I see it was just an oversight on my part; every BOINC project has intermittent outages. Thanks again for clearing it up. ☺
RE: I'm not entirely
Somewhere around 1600 UTC (or a bit later) on Oct 17, the flow of BRP6 primary tasks stopped for me. A trickle of resends has continued. The status page continues to show essentially zero tasks for BRP6 and BRP4G. BRP4 (not for mainstream discrete GPUs) continues to have work for the devices it was intended to support. Since Bernd tries not to make changes on a Friday that might impact the weekend, I don't think the lack of work is deliberate or associated with server-side programming changes. I'm still guessing that something unexpected has happened to the work generation process. It has happened before and, if I remember correctly, took a while to be noticed because a monitoring script failed to report the problem.
My hosts seem to show the same. Maybe the logic is to scan only those plan classes for which there is work to send? After you added BRP4, the log shows only intel GPU plan classes and not BRP6, so if you didn't disable BRP6, it's perhaps still not a 'normal' log?
Cheers,
Gary.
RE: After you added BRP4,
Yes, I did it that way - currently explicitly allowing intel_gpu (only), BRP4 and BRP6 (only). I might try BRP6+'allow others if none available' tomorrow. And I did get one BRP6 resend after I set those preferences and posted.
Can't get any WU, seems
Can't get any WU, seems servers are offline. Anyone got any idea when they might get a kick in the pants?
Cheers,
Cliff,
Been there, Done that, Still no damn T Shirt.
RE: I thought the server
Outages at Einstein are, fortunately, relatively infrequent. Quite often, things are back on track fairly quickly, but there is always the slight chance of a more serious situation. None of the Devs have said anything yet that I've seen, so there is no way of knowing what the problem actually is.
I'm now at the site where the bulk of my hosts are, and the logs there tell me that things were seemingly OK at 7:22PM UTC on Sat, 17 Oct. The next host of mine 'in the queue' failed to get any BRP6 when it asked at 7:27PM, and from that point onwards only 'resend' tasks (which don't need to be generated, since they are direct extra copies of already generated tasks) have come through. I've been getting around 2 resends per hour over my whole fleet.
All this points to a problem with work generation. On the server status page, the work generator for BRP6 is shown as "Not running", but that tells you nothing because it only runs for a short time when needed to top up the supply of 'ready to send' tasks. Most likely, the status page data will be refreshed at a moment when the daemon isn't running anyway.
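To illustrate why a "Not running" snapshot says little on its own, here is a minimal, hypothetical sketch of a top-up style generator: it only does work when the ready-to-send count drops below a low watermark. The watermark, batch size, demand figure and function names are all invented for illustration; the thread doesn't describe how the actual Einstein@Home generator is implemented.

```python
# Hypothetical illustration only: names and numbers are invented,
# not taken from the Einstein@Home server code.

LOW_WATERMARK = 1000  # assumed: refill when ready-to-send drops below this
BATCH_SIZE = 5000     # assumed: tasks generated per refill


def generator_pass(ready_to_send: int) -> tuple[int, bool]:
    """One check by a top-up style work generator.

    Returns the new ready-to-send count and whether the generator
    actually did work (i.e. would have shown as running) on this check.
    """
    if ready_to_send < LOW_WATERMARK:
        return ready_to_send + BATCH_SIZE, True
    return ready_to_send, False


def simulate(checks: int, demand_per_check: int, start: int = 0) -> float:
    """Fraction of checks during which the generator was running."""
    ready = start
    runs = 0
    for _ in range(checks):
        ready, ran = generator_pass(ready)
        runs += ran
        # Hosts drain the pool between checks.
        ready = max(0, ready - demand_per_check)
    return runs / checks


print(simulate(100, 500))
```

With these assumed numbers the generator is active in only about one check in ten, so a status page refreshed at an arbitrary moment will usually show it idle even when everything is healthy.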
These daemons have been known to quit unexpectedly on occasions but previous comments by the Devs lead me to believe that this should be detected and reported. I seem to recall a previous occasion where that didn't work either but I thought that failure had been fixed.
If the lack of available work causes angst to anyone, the two easiest solutions are either to have a backup project or to cache work for a sensible period based on the most likely type of outage. I would put that figure in the 1-3 day range. There have been longer outages but there are disadvantages in trying to cope with something like a 7-10 day outage, which is a very rare event at this project.
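As a rough, back-of-envelope sketch of what "cache work for 1-3 days" means in tasks, the function below converts a desired outage coverage into a task count for one GPU. The task time used in the example is an assumed figure, not a measurement from this thread, and the function name is hypothetical. BOINC itself sizes the cache from its own runtime estimates and the work buffer settings.

```python
import math


def tasks_to_cache(task_hours: float, concurrent: int, outage_days: float) -> int:
    """Tasks needed to keep one GPU busy through an outage (hypothetical helper).

    task_hours:  elapsed time per task while running `concurrent` tasks at once
    concurrent:  number of tasks run simultaneously on the GPU
    outage_days: how long the cached work should last
    """
    tasks_per_day = concurrent * 24.0 / task_hours
    return math.ceil(tasks_per_day * outage_days)


# Assumed example: 3 concurrent tasks, each taking 3 hours.
print(tasks_to_cache(3.0, 3, 1.5))  # a 0.5 + 1 day buffer
print(tasks_to_cache(3.0, 3, 3.0))  # the top of the 1-3 day range
```

Under that assumption, a 0.5-day buffer plus 1 extra day (1.5 days) works out to 36 tasks, and a full 3 days to 72. Doubling the cache doubles the number of tasks held, which is worth weighing against limited disk space such as a small RAM disk.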
Cheers,
Gary.
We are working on a solution
We are working on a solution right now. Stay tuned.
BRP6 pre-processing got stuck
BRP6 pre-processing got stuck over the weekend. Unsent tasks are available again.
BM