[Question] Disk usage seems high

Michael Hoffmann
Joined: 31 Oct 10
Posts: 32
Credit: 31031260
RAC: 0
Topic 197056

Hi folks,

First of all: I don't know if this is the right place to put my question, so please don't hit me too hard.

My question: Even when I have no E@H tasks running, the project takes up roughly 800 MB of disk space. That seems like a lot to me, but I have basically no idea what's going on behind the scenes. Maybe some sort of libraries needed by the project in case new WUs come in?

I'd be glad if someone would help me out with an explanation.

Greetings,
Michel

Om mani padme hum.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

OK, a couple of things to keep in mind.

Generally speaking, the data files EAH uses are much larger than those of other projects you may be used to running.

As a result, EAH uses a BOINC mechanism known as locality scheduling to cut down on the network bandwidth needed to get this data out into the field to be crunched.

The idea is 'datapacks' are sent out to groups of hosts, and then those hosts are able to crunch many individual workunits out of that datapack until it is completely analyzed.

However, a couple of common situations can swell the amount of disk space your host uses to store EAH datapacks.

One of them is when your host has been chewing on a datapack for a while and the number of potential WUs left in it is getting low. Your host may come along asking for work when there isn't any left for the datapack it's currently working on, so the project will assign it a new datapack and get it started on that one.
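To make the locality-scheduling idea a bit more concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not the real BOINC scheduler (which is far more involved): `Datapack`, `Host` and `assign_task` are hypothetical names, and a datapack is reduced to a count of slices not yet handed out. The server first tries to give a host a task from a datapack it already holds; only when those are exhausted does it hand out a new datapack, which the host has to download and then keeps alongside the old one.

```python
from dataclasses import dataclass, field


@dataclass
class Datapack:
    name: str
    remaining_tasks: int  # slices not yet handed out to anyone


@dataclass
class Host:
    name: str
    held: set = field(default_factory=set)  # names of datapacks already on disk


def assign_task(host, datapacks):
    """Pick the next task for `host`; return (datapack_name, needs_download) or None."""
    # 1. Prefer a datapack the host already holds: no new download needed.
    for name in sorted(host.held):
        pack = datapacks[name]
        if pack.remaining_tasks > 0:
            pack.remaining_tasks -= 1
            return name, False
    # 2. Nothing left in the held packs: hand out a fresh datapack. The host
    #    downloads it and keeps it next to the old one(s), so disk usage grows.
    for name in sorted(datapacks):
        pack = datapacks[name]
        if name not in host.held and pack.remaining_tasks > 0:
            pack.remaining_tasks -= 1
            host.held.add(name)
            return name, True
    return None  # no work available anywhere


packs = {"pack_A": Datapack("pack_A", 2), "pack_B": Datapack("pack_B", 5)}
pc = Host("my_pc", {"pack_A"})
for _ in range(3):
    print(assign_task(pc, packs))
# ('pack_A', False), ('pack_A', False), ('pack_B', True)
print(sorted(pc.held))  # ['pack_A', 'pack_B']
```

Note that after the third request the host holds two datapacks even though only one of them still has work in it.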

The other one, which usually gets people's attention, is when we get to the end of a science run, as we have with the Gamma Ray Pulsar Search #2. In this case, there are typically a lot of 'leftovers' in various datapacks that need to be cleaned up to finish the run. The catch is that there may not be enough hosts (or any at all) still assigned to a given datapack to finish it up. Therefore, the project has no choice but to send it out to another host that comes along looking for new work.

I've simplified the details a bit to help clarify the procedure. One other thing to note: when everything works the way it's supposed to, the project will eventually come back and tell your host it can delete a datapack once it's sure your host won't need it any longer.
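This also explains the original question: finishing your last task does not free the space; only the project's later "you can delete this" message does. A minimal Python sketch of that host-side bookkeeping, assuming a hypothetical `HostStorage` class and made-up sizes of 400 MB per datapack (chosen only to mirror the ~800 MB mentioned above, not real E@H file sizes):

```python
class HostStorage:
    """Toy model of the datapack files a volunteer's host keeps on disk."""

    def __init__(self):
        self.files_mb = {}  # datapack name -> size in MB

    def download(self, name, size_mb):
        self.files_mb[name] = size_mb

    def task_finished(self, name):
        # Finishing a task does NOT delete the datapack: more slices
        # from it may still be assigned to this host later.
        pass

    def server_says_delete(self, names):
        # Only the project's "you won't need these any more" message frees space.
        for name in names:
            self.files_mb.pop(name, None)

    def usage_mb(self):
        return sum(self.files_mb.values())


disk = HostStorage()
disk.download("pack_A", 400)  # hypothetical sizes
disk.download("pack_B", 400)
disk.task_finished("pack_A")
print(disk.usage_mb())  # 800, even with no tasks running
disk.server_says_delete(["pack_A"])
print(disk.usage_mb())  # 400
```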

HTH

Michael Hoffmann
Joined: 31 Oct 10
Posts: 32
Credit: 31031260
RAC: 0

Thank you very much for that detailed explanation!
So I'll just let it keep going.
Again, one step more in understanding how things work within the project :)

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 325479808
RAC: 70553

Just to expand on Alinator's excellent response. :-)

The 'datapacks' are like a loaf of bread from which your computer will cut successive slices. Each slice is analysed and called a 'task' by your computer. So you slice, then analyse, and then report a result ( per task ). Iterate until you run out of bread slices from some loaf. Then we give you a new one! :-)

Now for a given task your computer is one of a quorum - a small group of computers working on precisely the same data set. So your 'wingman' in that group also has a copy of that same loaf of bread that you have. He/she is also slicing tasks, analysing and reporting just as you are.

Our server software initially sends out these identical loaves essentially at random to accepting/acceptable hosts ( with some caveats ), and remembers who got what. This is called 'locality scheduling': our server will endeavour to keep your overall downloads to a minimum by sending you slices ( tasks ) from a loaf that you were previously given. Ditto for your wingman. E@H is basically the only BOINC project ( that I'm aware of ) doing this to any significant degree. Ultimately we want to honor and retain low-bandwidth users worldwide ( dial-up is the most common mode ).

Now, provided that you and your wingman have returned results that validate successfully, we say that the quorum of results has been met, and thus that a given 'workunit' is completed from the project's point of view. So a workunit is a slice of data that may be replicated several times, to different hosts, until the analyses from distinct machines belonging to different users agree satisfactorily. That completed workunit then moves on to a better place .... :-)
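The quorum rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's validator: `workunit_complete` is a hypothetical name, a minimum quorum of 2 is assumed, and results are compared for exact equality, whereas a real validator compares scientific output within tolerances.

```python
def workunit_complete(results, min_quorum=2):
    """results: list of (user_id, host_id, answer) tuples for one workunit.

    Complete when at least `min_quorum` results agree AND they come from
    that many distinct hosts owned by that many distinct users.
    """
    groups = {}  # answer -> (set of users, set of hosts) that returned it
    for user_id, host_id, answer in results:
        users, hosts = groups.setdefault(answer, (set(), set()))
        users.add(user_id)
        hosts.add(host_id)
    return any(
        len(users) >= min_quorum and len(hosts) >= min_quorum
        for users, hosts in groups.values()
    )


print(workunit_complete([("alice", "pc1", 0.42), ("bob", "pc7", 0.42)]))    # True
print(workunit_complete([("alice", "pc1", 0.42), ("alice", "pc2", 0.42)]))  # False: same user
print(workunit_complete([("alice", "pc1", 0.42), ("bob", "pc7", 0.43)]))    # False: no agreement
```

Until this returns True, the workunit stays open and may be replicated to yet another host, which is exactly how stray slices end up being re-sent at the end of a run.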

As stated, at the end of a science run ( thousands of bread loaves, suitably duplicated ) we gather up slices that have ( for a blizzard of reasons ) fallen on the floor, been mangled, eaten by the dog, run over by the lawn mower ..... etc. So previously distributed loaves sometimes need to be re-sent to the pool of host machines, even though some of them only need a slice or three to complete. If you are a low-bandwidth user, you may well be annoyed by heavy network traffic for ( relatively ) little credit. We routinely warn about that towards the end of runs.

There are other file types too: indeed the odd library or two, ephemerides ( data tables from which to calculate solar system body positions at given times ), XML files containing user/host/project state, and many others. There is the opportunity to micromanage the host/server interactions for those who prefer to, but that is generally fraught with danger ( of the 'blow both your own feet off with a shotgun' variety ) if you are not absolutely au fait with the details. We therefore recommend using the existing, straightforward interfaces designed for this, e.g. your web account, to select your desired behaviours and allow 'hands-free' operation.
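If you are just curious which of these file types is taking up the space, looking is harmless as long as you don't touch anything. Here is a small read-only Python sketch that totals file sizes per extension under a folder. The location of the BOINC data directory and its per-project subfolder varies by operating system and installation, so the path is an assumption you pass on the command line; the script only reads file sizes and never modifies or deletes anything.

```python
import os
import sys
from collections import defaultdict


def usage_by_extension(root):
    """Return {extension: total bytes} for all regular files under `root`."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if os.path.islink(path):
                continue  # don't count symlinks (or their targets) twice
            ext = os.path.splitext(fname)[1].lower() or "(none)"
            totals[ext] += os.path.getsize(path)
    return dict(totals)


if __name__ == "__main__":
    # Usage (path is hypothetical): python du_by_type.py /path/to/boinc/projects/<project>
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for ext, size in sorted(usage_by_extension(root).items(), key=lambda kv: -kv[1]):
        print(f"{ext:>10}  {size / 1e6:10.1f} MB")
```

Files with no extension (many BOINC data files look like that) are grouped under "(none)".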

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ... Blaise Pascal

... and my other CPU is a Ryzen 5950X :-)
