Probably not a "bug" but just the way boinc-manager works... I am running O3AS GPU work and the subject skygrid file is downloaded (properly, I think) when a batch of O3AS work is downloaded to the cache. However, if all the GPU tasks are completed, and the cache goes empty (for GPU work), the skygrid file is deleted as soon as the last work unit has been reported. On re-enabling new work, the skygrid file is downloaded again, along with new work units. It IS the same file: same name, same byte count, and a binary compare reports identical content. As long as I don't let the GPU work cache run dry the skygrid file is retained and the work units are executed normally.
So -- you're asking -- why do I ever let the GPU cache run dry? It's a bit OT but let's just say I have been "flooded" with CPU tasks and I'm trying to push through as many as possible before their deadlines pass. Sure, I can just abort the excess, and I am doing that in bunches of 20 or so per day, but it bothers me to see the "error" task count statistics build up. It's a no-win tradeoff, I guess, of aborting work vs. letting the deadlines expire. Waiting for the deadline just seems unfair to the wingpeople.
The skygrid file, BTW, is not of trivial size. It's 42 MB.
As I said in the opening line, it may be just the way boinc works. One would wish that such a file be retained until the project server tells the boinc client to delete it. Maybe it is, since all the work units depending on it have been reported. But maybe it shouldn't be if there are more work units "in the pipe" which will need that skygrid file.
Copyright © 2024 Einstein@Home. All rights reserved.
Eugene Stemple wrote: Probably ...
It's neither of those :-).
The manager simply provides a 'window' into what the client is doing and allows you to interact with the client. The manager doesn't download or delete any files - it just logs the action of the client when those things happen.
Every file that is important to what the client is doing is listed in the state file. For things like data files, the default action for the client is to delete any that are no longer needed if work on board that depends on those files has been completed and returned. That will happen even if a subsequent work request pulls in more work that depends on those same data files. The client can't know that before the event happens.
Usually (eg for all the h1_... and l1_... data files) the project sends them out marked as <sticky> and this overrides the default client action to delete them. You can easily see this by browsing the relevant <file>...</file> blocks in the state file. For some reason, the project has decided not to mark the skygrid files as <sticky/>.
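For anyone who wants to check which files carry the flag, something along these lines might do. It is only a sketch: it assumes each <file> block has a <name> element and, when sticky, a <sticky/> element on separate lines, and the sample content and file names below are made up for illustration. On a real host, STATE would point at the actual state file in the BOINC directory instead.

```shell
#!/bin/sh
# Build a tiny made-up sample; on a real host point STATE at the actual state file
STATE=$(mktemp)
cat > "$STATE" <<'EOF'
<file>
    <name>h1_example_data</name>
    <sticky/>
</file>
<file>
    <name>skygrid_example.dat</name>
</file>
EOF

# Print "<name> sticky" or "<name> not-sticky" for every <file> block
report=$(awk '
    /<file>/     { name = ""; sticky = "not-sticky" }
    /<name>/     { name = $0; gsub(/.*<name>|<\/name>.*/, "", name) }
    /<sticky\/>/ { sticky = "sticky" }
    /<\/file>/   { print name, sticky }
' "$STATE")
echo "$report"
rm -f "$STATE"
```

Any file reported as not-sticky is one the client may delete once the last task depending on it has been reported.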
I see you use Linux. Why don't you just chmod them to 'read only' (chmod 444 <filename>) as you receive them? Unless the client doesn't care about permissions, that should prevent the client from deleting them. If the client still does, use chown root <filename> to make them owned by root but still world readable. As long as they are globally readable, the app will be able to read them but the OS shouldn't allow the client to delete them. I've never used this but I don't know any reason why the client would object to files with changed permissions. Try it and see what happens :-).
You'll just have to remember to clean up after they are truly finished with.
Cheers,
Gary.
@Gary
I've downloaded a "new" copy of skygrid... along with 21 GPU tasks. Those will run about 5 hours and, thus, complete some time in the middle of the night. I did the "chown root" approach which leaves global read permission. Around 24 hours from now I'll get around to enabling new tasks and then we'll see how boinc manages the situation. I will remember to verify the file still exists and to check the event log for possible complaints from boinc that it had a "permission" problem in trying to delete the file. It's remotely possible, I think, that boinc may not list it in the state file even if it exists, since it presumably intended to delete it, in which case the server would send another copy. Exploring is always exciting! So give me 24-hours (until 9/8 04:30 UTC) and I'll report the results...
When I referred to "just the way boinc-manager works" I should have said "the way boinc works." I was just exposing my foggy view of what roles boinc-client and boinc-manager play in the big picture. I didn't realize there was such a thing as a <sticky> file property although I am aware that h1 and l1 data is kept around for locality scheduling purposes.
@Gary
Not the result I expected or hoped for... I set the permissions for skygrid... as [-r--r--r-- root gene ] after it was downloaded last night. It came with 21 tasks. When those completed and were reported, the skygrid file disappeared! I am certainly wondering "How can that be?" The boinc and boincmgr processes are shown in the (top) process monitor with user "gene". Is there such a thing as a "syscall" to delete a file that carries root authority even when issued by a user process? How else can a "read only" file be deleted? This is going to take some searching to figure out, unless there's a boinc guru in the house to step forward. I can try the "+t" tag on the file but if the delete operation really carries "root" authority I don't think that will block it.
Just for the record, the boinc client was initiated by this command line in a restart script:
exec /usr/bin/boinc --dir /home/gene/BOINC --redirectio --allow_multiple_clients &
Eugene Stemple wrote: ... the ...
What is the ownership and permissions of the script? Is it by any chance launching the BOINC client with root privileges?
Cheers,
Gary.
@Gary
The script is run from a user (gene) session and then appears in "top" with user=gene as expected.
Meanwhile, I have done some searching and experimenting and I've come to a reasonable explanation of this puzzle. To wit: when boinc downloads the skygrid file into its own directory, obviously with RW permission, it creates an open file and associated file handle which retains the creation permissions. No matter what happens to the... what shall I call them... "shadow(?)" file permissions that everybody else sees, and may "change", the original file handle retains the RW permissions. So when boinc has no further use of the file and deletes/closes it (and releases the file handle) it is in fact deleted. It doesn't require root privilege. RW authority was retained in the file handle created when the file was opened. Probably any new access to the file would have just the current permissions but even if root changes them it does not propagate into already existing file handles. That's my theory and I'm sticking with it!
I've run this thread way off the E@H track. The bottom line is: I can't stop boinc from deleting the skygrid file and I should just make sure the cache doesn't run empty to prevent a repeat download. And no need for the project servers to make any adjustments.
Eugene Stemple wrote: ... No ...
I'm not really sure what you are talking about.
There is only one file that "everybody sees" and that file lives in the Einstein project directory.
I don't know anything about the file structure on Ubuntu. On my systems, the BOINC directory is /home/gary/BOINC/ and all the BOINC specific programs and configuration live there. There are several sub-directories of which projects/ and slots/ are the ones of interest. Under projects/ there are the various projects you are attached to - einstein.phys.uwm.edu/ for the case in question. Within that project sub-directory will be the skygrid file. That is the only copy. That is the file whose permissions you need to change.
When a task that needs that file starts crunching, a temporary directory under the slots/ directory is created and that is where the bits associated with a particular task are placed. Because you may have several concurrently running tasks that need access to the same group of data files, each task instance will be in a unique slot sub-directory - 0, 1, 2, 3, .... as many as necessary. Within each slot, links to the real files will be created so that several tasks could be accessing the same sets of files simultaneously without impinging on each other. I'm guessing that your reference to "shadow" files is talking about the created links to the real files.
If you are changing permissions on a link and not the real file, it won't work. You just need to change the permissions on the real file in the einstein project directory in order to protect it.
No you haven't. It's quite important that this type of issue is discussed and that people become aware of how to protect their own and the project's bandwidth. An individual case might not matter at all but when you add it up over many tens of thousands of individual computers, it can (and does) have a big impact on the project.
Cheers,
Gary.
@Gary
I'm learning more about Linux internals than I ever wanted to know...
I did this experiment: "gene" is the user identifier, and I presume anybody following this thread this far will understand the [rwxrwxrwx -owner- -group-] Linux syntax.
(1) as root, create a file in /home/gene/ as "permanent.txt" and with [644 root root] permissions;
(2) back to a user terminal screen... "rm permanent.txt" A warning message is displayed: "rm: remove write-protected regular file 'permanent.txt'?"
(3) acknowledge with "y"; and the file is deleted.
I understand this to be the consequences of the permissions at the directory level [777 gene gene] which allow the owner of the directory to override, with explicit override of warnings, any file permission restrictions of files within that directory.
As pertaining to the skygrid situation, since boinc is running in its /home/gene/boinc/ directory, as user=gene, it clearly can override any set of permissions on any file in that directory, or sub-directory. And I see no way to get around that. Take away the "W" permission at the directory level and then boinc can't even create, or save, its essential project files. Leave the "W" permission and there's no way to prevent boinc from deleting any of its files. (From a command line "remove" command a warning/confirmation message is issued. I assume that syscalls within a compiled program (boinc) default to a "yes" response or, more likely, just carry out the requested action if all the permissions would have allowed it.)
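The same behaviour can be reproduced without root in a scratch directory. This is a minimal sketch, assuming a POSIX shell with mktemp available; the file name is just the one from the experiment above. It shows that removing a file needs write permission on the containing directory, not on the file itself.

```shell
#!/bin/sh
# Scratch directory owned by, and writable by, the current user
dir=$(mktemp -d)
touch "$dir/permanent.txt"
chmod 444 "$dir/permanent.txt"      # the file itself is read-only

# rm -f skips the "remove write-protected file?" prompt; the removal
# succeeds because it only needs write permission on the directory
rm -f "$dir/permanent.txt"
if [ -e "$dir/permanent.txt" ]; then result="kept"; else result="deleted"; fi
echo "read-only file in writable directory: $result"

rmdir "$dir"
```

A program calling the remove syscall directly never sees the prompt at all; the prompt is a courtesy of the rm command.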
Does this merit any attention at the project management level? You are the best judge of that. I stumbled into this by allowing the (O3AS gpu task) cache to run out. That may be rare enough that the benefit does not outweigh the effort of a fix. OTOH, as you hinted in the previous post, a relatively rare situation in thousands of hosts can become worth looking into.
Are we having fun yet???
Eugene Stemple wrote: I'm ...
If you want to assert some degree of control over a linux system, a basic understanding of file permissions and how they work is rather important. I have no knowledge of the defaults that come with Ubuntu but if your home directory is drwxrwxrwx and files created there are rw-rw-rw- it means that your home is completely open. In other words, everything is world writable and anyone or any bit of software can do anything it likes with your files.
My background was with Unix in the 1970s and 80s and I'm used to files owned by root not being able to be deleted by anyone or any thing without root privileges. I've repeated your experiment in my home directory where the permissions are drwxr-xr-x and files are rw-r--r-- (in other words the umask in play is 022) and even if my test file is owned by root and has r--r--r-- permissions, I can still delete it after being warned. That must be a 'feature' of modern Linux as compared with traditional Unix. I certainly wasn't aware of that. It also means that your home being drwxrwxrwx is not the cause of the problem. Modern Linux must take the view that if a user is silly enough to continue after being warned, then so be it.
I can think of two ways to solve your problem. The first way is to make the file immutable with the chattr command. The flag you need is chattr +i <filename>. Check the man page for chattr. When you eventually need to delete it, you reverse the immutable flag with chattr -i.
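As a rough sketch of the first way, the two steps could be wrapped like this. The function names are made up for illustration, the path in the usage comment is a placeholder, and both functions would have to be run as root on a filesystem that supports these attributes (ext4, for example).

```shell
#!/bin/sh
# Hypothetical helpers; run as root on a filesystem that supports attributes
protect_file() {
    # Set the immutable flag, then list the attributes to confirm
    chattr +i "$1" && lsattr "$1"   # lsattr should show an 'i' among the flags
}
release_file() {
    # Clear the immutable flag so the file can be deleted again
    chattr -i "$1"
}
# Usage (as root), substituting the real skygrid file name:
#   protect_file /path/to/skygrid_file
#   release_file /path/to/skygrid_file
```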
The second way is a little more involved. It's used in situations where a directory contains files owned by various users and you need to protect a file owned by a particular user from being deleted by other users. The first thing to do is change the ownership and then set the sticky bit on the directory containing the file. Setting the sticky bit on the parent directory should protect the file in much the same way as the project marking a file as <sticky/> in the state file.
The commands (to be issued by root) would be (when in the parent directory ie, projects/):-
chown root:gene einstein.phys.uwm.edu
followed by
chmod 1775 einstein.phys.uwm.edu
If you then looked at the permissions on that directory, you would see drwxrwxr-t where the 't' signifies the sticky bit. If you then changed the skygrid file to be owned by root, only root should be able to delete it and not the boinc client launched by the user gene. The user gene can create and delete other files since the skygrid file is the only one owned by root.
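To see what the mode looks like before touching the real project directory, the chmod can be tried on a scratch directory first. This is a sketch only: the scratch directory stands in for einstein.phys.uwm.edu, and the chown step is left out because it needs root.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for the project directory
dir=$(mktemp -d)
chmod 1775 "$dir"                   # the leading 1 sets the sticky bit
mode=$(ls -ld "$dir" | cut -c1-10)  # first ten characters are type + permissions
echo "$mode"                        # prints drwxrwxr-t
rmdir "$dir"
```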
I haven't tested either of the above two methods but I believe they should work. You should test one or both for yourself and make sure whatever you choose does work :-).
Cheers,
Gary.
@Gary
I tried the "sticky bit" first, using just a file in the /home/gene/ directory as the test case and setting it as owner=root and permissions 400 with the +t parameter. Not effective... The delete (rm) command, as user, produced the usual "... remove write-protected file?" warning, and responding "y" removed the file. Apparently the sticky bit is just another way of blocking "W" access but it is overridden by the directory permissions.
I looked at the chattr +i option. On a test file it did prevent removal. Producing an "Operation not permitted" response. However, the chattr "man" has this comment...
"...file with the 'i' attribute cannot be modified: it cannot be deleted or renamed, no link can be created to this file..." (emphasis added).
We both, I think, believe that boinc is creating links into the /slots/ directories where the task execution is carried out. So, I fear that preventing such links would "crash" the task. Or, boinc might decide the skygrid file is not present and download another copy! In any case, I don't feel comfortable in even trying this option.
(My host is busy aborting tasks that have "not started and deadline has passed." In another couple of hours it will have gotten its cache under control and operation will be "back to normal." At that point, and even now, the gpu tasks in the cache will prevent another download of skygrid. )
The host setup where this skygrid download might have an adverse impact is a host that connects to the network rarely, or on a timed schedule, where the gpu task cache might very well run dry on a regular basis. Such a host might end up downloading skygrid every time it connects to the server. I don't have any idea how many such host systems there might be. <<Remember the days of dial-up modems and lower nighttime long-distance rates??>> For the majority of hosts, with full-time network access, an extra download of skygrid will not happen.
Thanks for your interest, and discussion, of this topic. I didn't want to call it a "bug" in the first place but this thread has shed some light into the recesses of the boinc + E@H interaction. Have a great weekend.
Gene;
Eugene Stemple wrote: I tried ...
The instructions I gave said
Your above description seems to be saying that you set the sticky bit on the file itself, rather than on the directory containing it?? If that's not the case, please explain exactly what you did do.
I've just tested this for myself. On a working machine, alongside the einstein.phys.uwm.edu/ directory, I've created a new directory called junk. The full path is /home/gary/BOINC/projects/junk/ and it's owned by root with the group name of gary. The sticky bit has been set on this directory as per the instructions. As user gary I've created a dummy file in there called junk1; I just used the command "touch junk1". As root, I then did "touch junk2". If I try to delete junk2 as user gary with the command "rm -f junk2" I get the response, "Operation not permitted". As user gary, I have no problem deleting junk1. As root, I have no problem deleting junk2. If applied as per the instructions, the method works.
Firstly, "can't be deleted or renamed" is exactly what you want. When you are finished with this file, "chattr -i <filename>" allows you to remove the immutable attribute and then you can delete it.
Secondly, I don't think the "no link" bit is of concern because I don't think actual links are created. We might refer to them as links but they are just bits of xml code that inform BOINC of the path to the real file. Go into an active slots directory and "cat" one of those files to see the contents. Here's what one of mine says:-
<soft_link>../../projects/einstein.phys.uwm.edu/JPLEPH.405</soft_link>
I'm not a programmer but I imagine the client reads this and passes the path on to the science app. The science app then knows where to find the data it needs and no actual link gets created as a separate file. I'm no expert but I don't think there should be any problem. The term <soft_link> is just a name used by the BOINC Devs to describe the use of this bit of xml code. You'd need somebody familiar with the BOINC code to explain exactly how it works.
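If you want to see where one of these points without reading the xml by eye, something like this would pull the path out. It's just a sketch using a made-up temporary file holding the same line as above; it says nothing about how the BOINC code itself handles these files.

```shell
#!/bin/sh
# Made-up stand-in for a file found in a slot directory
slotfile=$(mktemp)
echo '<soft_link>../../projects/einstein.phys.uwm.edu/JPLEPH.405</soft_link>' > "$slotfile"

# Strip the surrounding tags to leave just the relative path
target=$(sed -n 's|.*<soft_link>\(.*\)</soft_link>.*|\1|p' "$slotfile")
echo "$target"
rm -f "$slotfile"
```

The slot file is an ordinary small text file, which you can confirm with "ls -l" in the slot directory.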
As I've mentioned above, that's not my understanding at all. I'm sorry if you think I'm giving you bogus information. I'm just trying to provide possible solutions in good faith. I believe both methods do work. It's entirely up to you whether or not you decide to use either of them.
Cheers,
Gary.