Hi!
We just noticed that the workunits of the S6BucketFU* runs were generated without the tags for the SFT files (h1_XXXX.XX_S6GC1 and l1_XXXX.XX_S6GC1). For clients older than 7.0 this means that the files already present on the client aren't reported to the server, and locality scheduling doesn't work as it should. In the current runs this shouldn't be much of a problem as far as the download volume for each new task is concerned. But it also means that these clients won't receive "delete" messages for these files, and the files will start to fill up the disk.
Whether resetting the project will help to clean up these files depends on how old the client actually is; upgrading the client certainly helps.
BM
GW follow-up runs (S6BucketFU*) and pre-7.0 clients
Hello Bernd!
I'm wondering about so little response to your thread.
The effect you describe has bothered me for a long time, even though I always update to the current version of BOINC promptly (now 7.4.42, without VirtualBox). A few months ago I had to set up my system completely from scratch because of a new motherboard, but that didn't help either. When BOINC etc. was freshly installed, it used an astonishingly small 75MB of storage; now it is at 2.43GB without any changes to the project on my side. I run only the applications FGRP4, PMPS XT and S6BFup #2.
The usage grows in steps, but my rate of crunching failures is very low, much lower than the rate of these jumps in storage space. My assumption was that the effect comes from FGRP4, since those tasks arrive about as rarely as the jumps in storage space occur, but I haven't collected accurate data on that correlation so far. I also have the impression that the step size is getting larger in absolute terms over time, something like a fixed percentage of the storage already used. If this is of value to you, I'll observe it more carefully. I hope this unnecessarily wasted storage space will not reduce my crunching speed by making the system crawl through unnecessary data to find what is actually required.
Kind regards and happy crunching
Martin
RE: I'm wondering about so
Perhaps people have become used to seeing Einstein consume quite a bit of disk space and aren't worried about it too much these days :-).
If I understand Bernd's message correctly, the problem affects old clients and not so much clients newer than 7.0.
The increase to 2.43GB will be coming from the GW run and NOT from FGRP4. The FGRP4 data files are named LATeahnnnnE.dat, where the current nnnn is 1052. These data files are deleted as soon as the last FGRP4 task depending on a given file is successfully reported, so at any one time the disk usage from FGRP4 is quite minimal. The problem of space consumption growing a lot over time is always caused by a GW science run. Even so, surely 2.43GB is not really very much these days, with multi-terabyte disks now quite common. I'm not really the one to comment on this, since 90% of my machines have 20GB disks :-).
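If you want to check for yourself where the space is going, a quick tally by filename prefix shows the split between the GW data files and everything else. This is only a sketch: the project directory below is a typical Linux default for Einstein@Home and will be somewhere else on Windows or a non-standard install.

```python
# Rough tally of Einstein@Home disk usage by file type.
# The path is a common Linux default; adjust it for your own installation
# (on Windows it is usually under C:\ProgramData\BOINC\projects\...).
import os
from collections import defaultdict

project_dir = "/var/lib/boinc-client/projects/einstein.phys.uwm.edu"

totals = defaultdict(int)
for name in os.listdir(project_dir):
    path = os.path.join(project_dir, name)
    if not os.path.isfile(path):
        continue
    if name.startswith(("h1_", "l1_")):   # GW (S6) data files
        key = "GW data (h1_*/l1_*)"
    elif name.startswith("LATeah"):       # FGRP4 data files
        key = "FGRP4 data (LATeah*)"
    else:
        key = "everything else"
    totals[key] += os.path.getsize(path)

for key, size in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{key:25s} {size / 1e9:6.2f} GB")
```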
It's highly unlikely that you would notice even a minuscule speed difference if you were able to get rid of all the unneeded large data files that make up most of the 2.43GB being consumed. They amount to a relatively small number of files, maybe a couple of hundred, which is vanishingly small compared to the total number of files residing on your disk, and the OS has no problem at all finding any particular file in your BOINC data area. If it really bothers you, set NNT (No New Tasks), complete and report all remaining tasks, and then reset the project. When you restart BOINC you should be back to about 75MB once again, and this will slowly grow over time as before.
If Bernd has fixed the problem for future workunits, perhaps it won't grow nearly as much from now on.
Cheers,
Gary.
The current "follow-up" runs
The current "follow-up" runs are technically different from earlier GW runs (such as S6Bucket or S6CasA). In previous runs we were searching large, contiguous areas of parameter space. Workunits were generated on demand, as we swiped through that parameter space.
Now we are following up on individual candidates that are not necessarily connected in any dimension of the parameter space. Workunits are generated by manually feeding a list of candidates (a "charge" of usually 500k) to the workunit generator. The candidates are ordered by increasing frequency, so workunits for candidates requiring (roughly) the same files are generated at about the same time. This allows us to still make use of "locality scheduling", which minimizes the data transfer volume on both the client's and the server's end.
With no workunit generator continuously running and monitoring the progress, we have to somewhat manually tell the clients which files to keep (because there are still workunits left to send that need these), and which to drop.
There are two types of such data files: "h1_*" (from the Hanford detector), 9MB each, and "l1_*" (from Livingston), 8MB each. Each file spans a frequency range of 50 mHz (0.05 Hz), which makes 20 files per Hz.
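For illustration, here is how a candidate frequency maps onto those 50 mHz bands. The 0.05 Hz spacing is from the description above; the exact file-name layout is an assumption based on names quoted elsewhere in this thread (e.g. h1_0428.25_S6GC1).

```python
# Map a candidate frequency to the pair of SFT data files that cover it.
# The name format (zero-padded frequency, _S6GC1 suffix) is inferred from
# file names mentioned in this thread and may not be exact.
def sft_files_for(freq_hz):
    # Work in integer millihertz to avoid floating-point surprises,
    # then snap down to the start of the 50 mHz band.
    band_start_mhz = (int(round(freq_hz * 1000)) // 50) * 50
    return [f"{det}_{band_start_mhz / 1000:07.2f}_S6GC1" for det in ("h1", "l1")]

print(sft_files_for(428.27))
# ['h1_0428.25_S6GC1', 'l1_0428.25_S6GC1']
```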
We still have workunits left in the ~50Hz area, and the last "charge" of candidates we issued reaches up to ~360Hz, so currently we tell clients to keep files from 50.00Hz up to 399.99Hz.
So there are 350*20=7000 different file frequencies in the current range. In the worst case, i.e. if you got at least one workunit (from the current or the previous run) for _each_ of the 7000 frequencies, you would have 14000 files, occupying 7000*9MB + 7000*8MB ≈ 119GB. However, our "locality scheduling" - assigning work to your computer that matches the files you already have - is meant to avoid such excessive disk usage. On top of that, the scheduler shouldn't send you work that would exceed the disk usage limit you specified in your "computing settings".
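For anyone who wants to redo that worst-case estimate, it is just this arithmetic:

```python
# Back-of-the-envelope check of the worst case described above:
# one h1_* (9 MB) and one l1_* (8 MB) file per 0.05 Hz band
# across the 50.00 - 399.99 Hz range that clients are told to keep.
files_per_hz = 20                          # one file every 0.05 Hz
frequencies  = (400 - 50) * files_per_hz   # 7000 distinct frequencies
n_files      = 2 * frequencies             # h1_* plus l1_* -> 14000 files
size_gb      = (frequencies * 9 + frequencies * 8) / 1000   # MB -> GB

print(n_files, size_gb)                    # 14000 119.0
```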
BM
I think that I was lucky
I think that I was lucky enough to get one of these files. I received it on July 5, 2015 and it has a deadline of December 31, 1969. The name is h1_0428.25_S6GC1__S6BucketFU2UBb_33169323_0
work unit 222157008
What is the deal with this extremely out-of-date deadline?
I have received several of these, but they have achievable deadlines; this one does not.
When I list all tasks from my account, it shows this one as having no deadline and as having timed out or received no response, yet I have only had the task on my computer for a little over six hours and it still has a little under three to go.
If I understand right and the result will not be accepted, should I just abort it, or what?
RE: ..... it has a deadline
Ooooh ..... that's awfully close to a *nix time epoch zero @ 1970-01-01 00:00:00
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
RE: work unit
Unix time is counted as the number of seconds since the 'epoch', which is midnight (UTC) between 31 Dec 1969 and 01 Jan 1970. I would guess that your task from workunit 222157008 somehow had a deadline of zero attached to it rather than the full number of seconds since the epoch. If you click the link you will see that your task has been safely returned and is awaiting two others (normally only one) to be completed and returned. It seems that the scheduler issued your task and a second one (there are two very close issue times) and then some time later decided to send a third because yours was already 45 years past its deadline :-). However, since yours is safely showing as returned and awaiting validation, there should be no problem with your result participating in the validation process. It was wise to get the result returned quickly, since it will actually be used despite how late the scheduler may think it is :-).
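Incidentally, a deadline field of zero lands exactly on the epoch, and any timezone west of UTC will render it as the evening of December 31, 1969, which matches what you saw. A small illustration (the UTC-5 offset is just an example):

```python
# A deadline of 0 seconds is the Unix epoch itself; viewed from any
# timezone west of UTC it displays as the evening of 31 Dec 1969.
from datetime import datetime, timezone, timedelta

deadline = 0   # seconds since the epoch, as a missing/zero deadline would be stored

utc   = datetime.fromtimestamp(deadline, tz=timezone.utc)
local = utc.astimezone(timezone(timedelta(hours=-5)))   # UTC-5 as an example offset

print(utc)     # 1970-01-01 00:00:00+00:00
print(local)   # 1969-12-31 19:00:00-05:00
```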
Cheers,
Gary.
RE: We just noticed that
I have a main and backup DC that do nothing 80% of the time so I installed BOINC 5.10.45 on them (7.*.* won't install on a DC). Both seem to process a lot of h1_XXXX.XX_S6GC1 and l1_XXXX.XX_S6GC1 WUs.
I checked and see that one of these machines has 8GB of disk used for Einstein and 11GB on the other. Using this much disk is not a problem on these machines (they each have 500GB disks with < 100GB used) but it does seem kind of high.
If I want to reclaim the space how would I go about it (short of resetting, cleaning the disk, and reinstalling the project)?
RE: If I want to reclaim
Unless there has been a somewhat unusual malfunction, that is all space for files which may get used again on future work. Assuming you continue to run the project, and continue to run these particular flavors of work, deleting the files will cause additional network traffic and server load, without benefit to you (since you mention you can afford the space).
When things work as intended (usually but not always) the software will figure out eventually which files are no longer in prospect of being reused and delete them without your intervention.
RE: If I want to reclaim
Because of the old client that you are running, the server will not be able to get your client to delete the files when they are no longer needed. You basically have two options.
The simple one is to set NNT (No New Tasks) with BOINC Manager and allow your current work to complete. Then you reset the project which automatically removes all the data and program files, etc, belonging to the project. On 'allowing new tasks' BOINC will fetch everything new again, with just those data files currently needed for whatever tasks are next sent. Over time, data volume will build up but it will take a while to reach current levels.
The second option is to peruse the frequency values you are seeing in current tasks and make an educated guess as to what frequencies have been dealt with. If you read closely what Bernd wrote at the start (actually his 2nd post), he mentions a range from 50Hz to 400Hz in steps of 0.05Hz. Perhaps they are progressing through this range from bottom to top - I'm not doing this run so I don't know for sure. Perhaps by this time they are higher than 100Hz. Perhaps it is now 'safe' to get rid of any data below 100Hz. You would have to make this decision by observing the flow of tasks and see if the frequency is generally on the rise.
If you decide it's safe to delete files below a certain frequency, you can't just delete them as BOINC will immediately re-fetch them. You have to 'tell' BOINC by stopping BOINC and removing the entries for those files from the state file (client_state.xml). When BOINC is restarted it will no longer 'know' about those files so they can be manually deleted. This second option is NOT normal procedure and should NOT be attempted if you don't understand the structure of the state file. If you do know what you are doing, it's quite a trivial exercise and has the advantage of not throwing everything away and then downloading some of the same files again. Obviously the first option is 'safer' if you're not worried about download bandwidth. BOINC isn't too forgiving if you make mistakes in editing your state file :-).
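If you do go the second route, it can help to first list which entries would even be candidates before touching anything. The sketch below only reports matching entries and does not modify the state file; it assumes the data files appear as <file_info> blocks with a <name> child, as in older clients' state files, so check the structure of your own file first.

```python
# List GW data-file entries in client_state.xml below a chosen frequency.
# Read-only: it reports candidates, it does NOT edit the state file.
# Assumes each file is recorded as a <file_info> block with a <name> child,
# which is how older BOINC clients write their state file.
import re
import xml.etree.ElementTree as ET

STATE_FILE = "client_state.xml"   # run this from the BOINC data directory
CUTOFF_HZ  = 100.0                # your educated guess at the "safe" boundary

pattern = re.compile(r"^[hl]1_(\d{4}\.\d{2})_S6GC1$")

tree = ET.parse(STATE_FILE)
for file_info in tree.getroot().iter("file_info"):
    name = file_info.findtext("name", default="")
    match = pattern.match(name)
    if match and float(match.group(1)) < CUTOFF_HZ:
        print(name)
```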
Cheers,
Gary.
RE: Because of the old
Gary, thanks for correcting my falsely optimistic claims about deletion.