While I have a long history of Boinc not wanting to see a GPU, I have always found a fix. This time it is xhost local:boinc, but it only works from the command line. I have put it in .profile, .bashrc, /etc/init.d/boinc-client and mdm.conf (it is Linux Mint). I am sure I tried a few other places, but that is all I remember.
When I boot the machine Boinc will not see the GPUs. It then errors out whatever was in progress unless I remembered to suspend the work before I shut it down. There is the catch: I am not a young man and remember nothing. Due to heat I need to shut it down every day for 10 hours. I turn it on in the evening and lose 8 work units. I run xhost local:boinc from the command line and restart Boinc and all is good.
Any other ideas on where to try it? I have searched and Googled and am stumped.
Thanks.
Copyright © 2024 Einstein@Home. All rights reserved.
I do not know if the following will work for you.
"To automate the restart of BOINC and to do the xhost thing on boot, add this to /etc/rc.local:"
from this url: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3245
Good luck!
I'm assuming this is the AMD GPU host?
AMD needs X running in order for boinc to detect the GPU.
Before we go into detail - if you just restart boinc, no xhost commands - does it solve the problem? I'll assume no but please check.
How exactly are you restarting boinc?
xhost 101
Some info about xhost.
I don't know for 100% certain the format for xhost on Mint ("man xhost" will tell you), but I'm 99.99% certain it is the same as on Ubuntu and many others, and you are running a fairly fresh Mint on that host.
a) xhost with no parameters displays the list of hosts or usernames allowed to connect to the X server. With a parameter prefixed by a plus or minus, xhost adds it to or removes it from the list.
b) You need to be connected to the X server (think of it as a service on your host, not hardware) if you want to make changes with xhost. Simply put, the xhost command has to run inside a session that X is in charge of. Running xhost in a start-up script will usually fail silently (by design) (*)
c) The correct format is as follows (I like the plus (+); it is optional and just makes the "add" explicit):
xhost +si:localuser:boinc # (allows localuser boinc to connect without authentication)
OR
xhost local: # (allows ALL local users to connect without authentication - includes boinc)
NOT
xhost local:boinc
I don't know where xhost local:boinc came from, perhaps an old version, but after I test it you'll see the effect.
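To check point (a) before changing anything, something like the sketch below could tell you whether boinc is already on the access list. It is only a sketch: the function name user_allowed is made up for illustration, and it assumes xhost prints its entries in the form SI:localuser:NAME and LOCAL:, which is what I would expect but have not confirmed on Mint.

```shell
#!/bin/sh
# Sketch only. Reads an xhost listing on stdin and succeeds if the given
# local user is already allowed to connect to the X server.
# Assumption: entries look like "SI:localuser:NAME" or "LOCAL:".
user_allowed() {
    user=$1
    # Match either the per-user entry or the "all local users" entry.
    grep -qiE "^(si:localuser:${user}|local:)[[:space:]]*\$"
}

# Typical use, typed in a terminal inside the X session:
#   xhost | user_allowed boinc || xhost +si:localuser:boinc
```

If the check fails, the add command in the comment is the one from (c); it still has to be run from inside the X session, as per (b).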
The man page states
OK, so the next question is - for "autostart" what do you do to crunch on bootup?
You need to
get X running,
then run xhost
BEFORE boinc does its GPU detection.
GPU detection is a one shot event, so it won't try again later.
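The ordering above could be sketched roughly like this. It is a sketch under assumptions, not a tested fix: wait_for_x and grant_and_restart are names I made up, it assumes DISPLAY (and any X authority) is already set for whoever runs it, and the service command needs root. Because detection is a one shot event, the sketch restarts the client after the grant instead of trying to get in ahead of it.

```shell
#!/bin/sh
# Sketch only: poll until the X server accepts connections, allow the
# boinc user in, then restart the client so GPU detection runs again.

# Try up to $1 times, one second apart. Returns 0 once X answers.
wait_for_x() {
    tries=$1
    while [ "$tries" -gt 0 ]; do
        # xhost with no arguments fails if it cannot open the display.
        if xhost >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# $1 is the number of attempts (defaults to 30).
grant_and_restart() {
    wait_for_x "${1:-30}" || return 1
    xhost +si:localuser:boinc >/dev/null || return 1
    # Needs root; detection only happens when the client starts.
    service boinc-client restart
}
```

Calling grant_and_restart 60 would wait up to about a minute for X. Where to call it from depends on the display manager.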
I use a fudge really - lightdm is the default display manager on ubuntu (but that is not minty) and it has a "startup" command hook (run after X is started but before login).
See my post about this here
hth.
(*) this was not always the case - old versions of xhost would allow xhost changes to go anywhere which meant a superuser could connect to your Xserver very easily.
Thank you Jari for the link. :-) It did not help though. :-(
AgentB
Yes that is the host.
Yes simply restarting Boinc after boot without the xhost command works.
Already had sleep 6 in /etc/init.d/boinc-client
Startup files are set to 99.
On boot if I just type xhost I get:
So at some point that started working but still was not the answer.
I have to manually restart Boinc for it to see the GPUs. Putting the service restart command in rc.local did not work.
Found this repeated 3 times in /var/log/syslog:
Thank you very much for suggestions!!
The plot deepens....
How exactly are you restarting boinc-client? It needs to be run with sudo.
I'm also assuming you are running boinc from a repo, is it LocutusofBorg or Mint?
OK, you could increase the sleep to twice that.
Setting the startup files to 99 is not something I recognize as normal on Debian / Ubuntu. On Ubuntu it is here:
/etc/rc2.d/S20boinc-client -> etc/init.d/boinc-client
I'm guessing you changed to S99, it's not likely to make a difference.
I'm not sure i understand entirely what all the above means, but at some point you may have upgraded your boinc from the repo.
You don't need it there (rc.local)
Hmm, the scheduling part of the boinc-client script tries to set priorities, but for some reason it can't find the oom_score_adj entries.
This might be something to do with running in rc.local or some permissions problem. It originates here in the boinc-client script.
To take this further
* remove from rc.local and any user startup scripts
* set the coproc_debug flag in boincmgr GUI
* restart host and capture the event log, and post the first 20-40 lines.
* Check xhost.
* restart boinc
* post difference
good luck.
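For the capture and compare steps, a rough sketch is below. The function names are made up, and it assumes you have saved a copy of the event log from each run to a file, and that every log line starts with a date field and a time field (stripped here so that timestamps alone do not show up as differences).

```shell
#!/bin/sh
# Sketch only. Print the first N lines (default 40) of a saved event log
# with the two leading timestamp fields removed.
log_head() {
    file=$1
    count=${2:-40}
    head -n "$count" "$file" | sed -E 's/^[^ ]+ [^ ]+ //'
}

# Compare the start of two saved logs.
# Exit status follows diff: 0 means identical, 1 means they differ.
compare_boots() {
    before=$(mktemp) && after=$(mktemp) || return 2
    log_head "$1" "${3:-40}" >"$before"
    log_head "$2" "${3:-40}" >"$after"
    diff "$before" "$after"
    status=$?
    rm -f "$before" "$after"
    return "$status"
}
```

For example, compare_boots first_boot.txt after_restart.txt (both file names hypothetical) prints only the lines that changed between the two runs, which is the part worth posting.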
First boot without restart. I use sudo service boinc-client restart or stop then start.
After restart:
I think I will remove Boinc from startup rc.d and just start it manually. It is not a big deal to do it that way and it works fine.
Thanks for all the help. I may reinstall Linux someday and see if it is any better.
Cheers!
I have just had a quick look over the logs and they look as i expect.
You mention you were running Mint. I don't know what display manager and greeter you are using, but most of them will have an option to run a command immediately after the login screen (hint: X) has started.
If you use MDM then /etc/mdm/mdm.conf should have a place to add the xhost command. LightDM is what i use and i posted the details for that earlier.
It is worthwhile getting the autoboot running if you can especially if you plan on auto-starting unattended - for night crunching for example.
I did solve the stderrgpudetect.txt errors, and I expect you may also see matching errors in stdoutgpudetect.txt, but I can't remember what the solution was.
I'll post back later if i remember.
Thanks again AgentB. The effort you put in helping others here is awesome. :-)
I did have xhost local:boinc in mdm.conf. Made no difference.
I may let the cache run dry and reinstall. I may have done something that I have forgotten about that is now causing this. This has been going on for a few weeks and I have tried many things. So who knows.
Thanks again!
Well we haven't fixed anything yet, so "it's for the win!"
Where exactly in mdm.conf?
I would guess that you had a boinc/fglrx upgrade. I had the same errors but they stopped on March 25, so I know I must have fixed something at that point.
I suspect I was playing around and... the last time I edited lightdm.conf was... March 25. So the cause of those errors is the xhost command not being run in time for boinc.
One simple thing to try: set the start delay to 60 seconds. This will give you enough time to log in and type the xhost command. Then see if it auto-detects.
If it does, then you need to dig around MDM conf here
maybe