Huge download

paris
Joined: 11 Jan 06
Posts: 50
Credit: 10706925
RAC: 12645
Topic 195743

I just started to download a new set of files to a computer that had not run Einstein for quite some time. As expected, the old files were deleted and the new files began to arrive. Nearly 400 files making up an estimated 1.5 GB of new stuff were scheduled to download. That is way more than I have ever picked up before. Even though I have a moderately fast connection, this would have taken about three hours. That is way too much, so I aborted the transfer and connected to a different project. I suppose that it is a result of the tail end of the current run as explained by Gary Roberts in another thread.

I am still running Einstein on other machines but I'm not going to try a new batch until the next run brings the downloads to a reasonable size. I'm not complaining here. I just want people to be aware of the possibility of some really big and time-consuming transfers. Imagine if you were on a dial-up connection. It could have taken, quite literally, days. Even a slower DSL connection might run into a monthly cap problem.
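To put rough numbers on those transfer times, here is a minimal sketch. It assumes decimal gigabytes and a nominal 56 kbit/s dial-up line; neither figure comes from the post itself, and real dial-up throughput is usually lower.

```python
def transfer_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained rate_bits_per_s."""
    return size_bytes * 8 / rate_bits_per_s / 3600


SIZE = 1.5e9  # 1.5 GB, the estimate quoted above (decimal gigabytes assumed)

# Sustained rate implied by "1.5 GB in about three hours"
implied_rate = SIZE * 8 / (3 * 3600)  # bits per second

# Assumed nominal dial-up rate of 56 kbit/s
dialup_hours = transfer_hours(SIZE, 56_000)

print(f"implied rate: {implied_rate / 1e6:.2f} Mbit/s")
print(f"dial-up: {dialup_hours:.0f} hours ({dialup_hours / 24:.1f} days)")
```

Under those assumptions the three-hour estimate corresponds to a sustained rate of a little over 1 Mbit/s, and the same download over dial-up would take roughly two and a half days.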


Plus SETI Classic = 21,082 WUs

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2751750
RAC: 1677

How big a cache have you got set, i.e. how many days?

Claggy

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 780250371
RAC: 1201326

Hi!

Note that you can configure a quota on download volume in newer BOINC versions. In the web interface, it is:

Transfer at most XXX Mbytes every XXX days
(Enforced by version 6.10.46+)

Note that this will apply to all BOINC projects, though.
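For anyone who prefers a local override to the web preferences, the same quota can be sketched as an XML fragment. This is an assumption-laden illustration: the tag names `daily_xfer_limit_mb` and `daily_xfer_period_days` and the `global_preferences` wrapper are my understanding of the BOINC preference names and should be checked against the client documentation for your version. The values 2000 MB and 30 days are purely hypothetical.

```python
import xml.etree.ElementTree as ET


def build_transfer_quota(limit_mb: float, period_days: int) -> str:
    """Build a preferences fragment for 'Transfer at most X Mbytes every Y days'.

    Tag names are assumptions, not confirmed by this thread.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "daily_xfer_limit_mb").text = f"{limit_mb:.6f}"
    ET.SubElement(root, "daily_xfer_period_days").text = str(period_days)
    return ET.tostring(root, encoding="unicode")


# Hypothetical quota: at most 2000 MB every 30 days
xml_text = build_transfer_quota(2000, 30)
print(xml_text)
```

As noted above, a limit like this is enforced across every attached project, not just Einstein.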

CU
HB

paris
Joined: 11 Jan 06
Posts: 50
Credit: 10706925
RAC: 12645

I have a two day cache set plus one day of buffer. I've never had so many work units try to download at one time. It's just too much.

My other machines are not pulling in nearly this much at a time, so I think it may be a combination of restarting a machine that hasn't been used for Einstein for a long time and the end-of-run cleanup alluded to before.

If I were just starting Einstein for the first time ever, seeing a set of files to be downloaded like this would certainly scare me off.

Thanks for your help.


Plus SETI Classic = 21,082 WUs

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118655833847
RAC: 18990668

Quote:
I just started to download a new set of files to a computer that had not run Einstein for quite some time. As expected, the old files were deleted and the new files began to arrive. Nearly 400 files making up an estimated 1.5 GB of new stuff were scheduled to download.


Your problem is precisely the same as the one I was describing to Oliver in this message recently. Here is the key part from that post that describes what happens.

Quote:
The remaining 10 tasks will be supplied after jumping to a completely different frequency. Even though 48 LIGO files will need to be downloaded for a single frequency band, the scheduler will not take advantage of that and will not choose more tasks from immediately adjacent frequency bands. Quite often, the new frequency band contains a single task (the scheduler loves to use the opportunity to get rid of a single resend). I can remember one example where the extra 10 tasks were going to be supplied from 7 completely new frequency sets - around 350 new LIGO files in total. It's easy enough to recover from this - suspend network activity, set NNT, abort the 10 downloading tasks, abort around 350 skygrid and LIGO data downloads, resume network activity, and finally report the 10 aborted tasks. The only problem is recovering from the strain of clicking OK to "abort the file transfer" 350 times :-).

When you restarted after a considerable gap, none of your LIGO data files would be likely to be still current. The problem you saw occurs because your client was asking for multiple tasks with no current data. The scheduler will choose some random available frequency band and will assign whatever tasks it has for that band (probably between 1 and 3 tasks). Quite often it will be just 1 task because the scheduler has a habit of choosing a single resend in this sort of case. I've seen it happen many times. The data download for this single task will be 48 LIGO files and 1 skygrid file, assuming you still have the sun and earth files from previously.

With that frequency band now effectively "used up" (temporarily), the scheduler will move on to an entirely different frequency band and repeat the whole process until it has enough tasks to fill the work request. The scheduler is not smart enough to realise that it could use tasks from a related band (0.05 Hz higher) which would only require 4 more LIGO files rather than the full 48.
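The difference between the two behaviours is easy to tabulate. The sketch below uses only the per-band figures quoted in this post (48 LIGO files, 1 skygrid file, 4 extra LIGO files for an adjacent band). It additionally assumes that adjacent bands can share the one skygrid file, which the post does not state.

```python
LIGO_FILES_PER_BAND = 48     # from the post: files needed for one frequency band
SKYGRID_FILES_PER_BAND = 1   # from the post: one skygrid file per task/band
EXTRA_FILES_ADJACENT = 4     # from the post: extra LIGO files for a band 0.05 Hz higher


def files_scattered(bands: int) -> int:
    """Files downloaded when every band is unrelated to the others."""
    return bands * (LIGO_FILES_PER_BAND + SKYGRID_FILES_PER_BAND)


def files_adjacent(bands: int) -> int:
    """Files downloaded if each extra band sat 0.05 Hz above the previous one.

    Assumes the skygrid file is shared between adjacent bands.
    """
    if bands == 0:
        return 0
    first = LIGO_FILES_PER_BAND + SKYGRID_FILES_PER_BAND
    return first + (bands - 1) * EXTRA_FILES_ADJACENT


for bands in (1, 3, 7):
    print(bands, files_scattered(bands), files_adjacent(bands))
```

For 7 unrelated bands this gives 343 files, in line with the "around 350" in the quoted example, against 73 if the bands had been adjacent.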

The easy way to prevent the problem is to temporarily reduce your cache size to 0.05 days total. Then when you request work you're only requesting 1 task and that's exactly what you will get - 1 task and about 180 MB of data files. Now that this data is described with all the blocks in your state file, you can ask for more tasks and the scheduler will send them for this same set of frequencies as for the first task, provided the first task wasn't an isolated resend with nothing else available in a nearby frequency band. You can cop extra full downloads if that happens on the first task.
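As a quick sanity check, the 180 MB per task figure lines up with the 1.5 GB estimate in the opening post. This is a rough sketch that assumes all data files are about the same size, which is a simplification since the skygrid file need not match the LIGO files.

```python
FILES_PER_TASK = 48 + 1   # LIGO files plus one skygrid file, as described above
MB_PER_TASK = 180         # "about 180MB of data files" for a single task
FILES_SCHEDULED = 400     # "nearly 400 files" from the first post

# Average size per file, assuming roughly equal file sizes
mb_per_file = MB_PER_TASK / FILES_PER_TASK
estimated_total_mb = FILES_SCHEDULED * mb_per_file

print(f"average file size: {mb_per_file:.1f} MB")
print(f"estimated total:   {estimated_total_mb:.0f} MB")
```

That works out to just under 1.5 GB for 400 files, so the size of the original download is what you would expect from about eight unrelated frequency bands.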

There is an even simpler way to reinstate a machine like yours and enjoy the benefits of zero downloads. All you do is use the data from one of your other hosts. Here is a step by step set of instructions using a pen drive or network share as the transfer medium.

    * Choose your frequency of interest and make a copy of all LIGO data (and skygrid) for this set of related frequencies.
    * Take a copy of the state file from the donor host so that you can extract the blocks that describe the skygrid file and all your chosen LIGO files.
    * Make sure the host to be revived is not running BOINC yet. Shut BOINC down if necessary.
    * Delete all old LIGO files on that host and seed the project directory with the new files of your chosen frequency that you harvested.
    * Edit the state file on that host and delete all the old blocks for both data and skygrid and then insert the replacement blocks.
    * While you have the state file open you might like to correct the section so that you can pretend that the machine hasn't had a big holiday after all :-).
    * Save your changes and restart BOINC.
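The block-swapping steps above can be sketched in a few lines of Python. Treat this strictly as an illustration: it assumes each data file is described in the state file by a `<file_info>` block containing a `<name>` element, and the file names (`h1_0100.00_DEMO` and so on) are hypothetical. Check the layout of your own state file, keep a backup, and only edit it while BOINC is shut down.

```python
import re

# Assumption: one <file_info>...</file_info> block per data file, with a <name> inside.
BLOCK_RE = re.compile(r"[ \t]*<file_info>.*?</file_info>[ \t]*\n?", re.DOTALL)
NAME_RE = re.compile(r"<name>\s*(.*?)\s*</name>", re.DOTALL)


def block_name(block: str) -> str:
    """Return the file name recorded in a block, or '' if there is none."""
    match = NAME_RE.search(block)
    return match.group(1) if match else ""


def extract_blocks(state_text: str, wanted) -> list:
    """Collect the donor blocks whose file name satisfies wanted(name)."""
    return [b for b in BLOCK_RE.findall(state_text) if wanted(block_name(b))]


def replace_blocks(state_text: str, is_old, new_blocks: list) -> str:
    """Delete blocks matching is_old and put new_blocks where the first one was.

    If no old block is found, the text is returned unchanged.
    """
    inserted = False

    def swap(match):
        nonlocal inserted
        block = match.group(0)
        if not is_old(block_name(block)):
            return block          # leave unrelated blocks untouched
        if inserted:
            return ""             # later old blocks are simply dropped
        inserted = True
        return "".join(new_blocks)

    return BLOCK_RE.sub(swap, state_text)


# Tiny stand-ins for the donor and target state files (hypothetical names)
donor = (
    "<client_state>\n"
    "<file_info>\n    <name>h1_0100.00_DEMO</name>\n</file_info>\n"
    "<file_info>\n    <name>other_project_file</name>\n</file_info>\n"
    "</client_state>\n"
)
target = (
    "<client_state>\n"
    "<file_info>\n    <name>h1_0050.00_DEMO</name>\n</file_info>\n"
    "<file_info>\n    <name>unrelated_file</name>\n</file_info>\n"
    "</client_state>\n"
)

is_data = lambda name: name.startswith(("h1_", "l1_", "skygrid_"))
new_blocks = extract_blocks(donor, lambda name: name.startswith("h1_0100."))
revived = replace_blocks(target, is_data, new_blocks)
print(revived)
```

A real state file carries much more than these blocks, so the sketch only covers the "delete the old blocks, insert the replacement blocks" step; copying the data files themselves still has to be done by hand.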

When BOINC starts up and makes a work request, all the tasks will come from the chosen frequency. You can quite easily have zero downloads.

I've just done something very similar to about a dozen machines today. For the last six weeks they were all crunching tasks for a particular frequency that is now virtually exhausted. I have saved data and blocks for a different frequency that still has plenty of tasks left. So rather than risk the sort of problem you experienced, I added the new frequency set to the existing frequency set. So, when the tasks for the old set do run out, all these hosts will transition to the new frequency when they are good and ready and there will be no big downloads - in fact no downloads at all. You can allow the transition to occur by itself or there are tricks you can use to make it happen exactly when you want it to. But that's getting a bit complicated :-).

Cheers,
Gary.

paris
Joined: 11 Jan 06
Posts: 50
Credit: 10706925
RAC: 12645

Yes, thank you, Gary Roberts. That was exactly the message that I was referring to in my post at the top of this thread. I appreciate both the confirmation and the elaboration in your explanation. You did an excellent job of filling in the gaps in my knowledge. Thanks again.


Plus SETI Classic = 21,082 WUs
