Do some tasks have a much larger download size than the others?

Stephane Yelle
Joined: 14 Oct 05
Posts: 1
Credit: 71868234
RAC: 51013
Topic 224631

I'm trying to figure out what used so much of my bandwidth today. (I'm on a somewhat limited internet plan, for budget reasons.)

I happened to have a bandwidth monitoring app open when this workunit was downloaded, and its total download size was about 200 MB (lots of ~3 MB files, according to BOINC's Transfers tab).

Is that the usual download size for GW O2 GPU workunits? Or is it possible that a batch of workunits created/sent today had a larger size than usual?

 

I've set a limit of 2000 MB of network usage per 30 days in BOINC, and I've processed 100+ GW O2 workunits in the last week and a half. So most of them must have been much smaller?
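Here's the back-of-the-envelope check that makes me think so (plain Python, using my own rough figures - the 10 days elapsed is only an approximation):

# Rough sanity check using my own approximate numbers.
monthly_cap_mb = 2000      # my BOINC network limit per 30 days
tasks_completed = 100      # GW O2 tasks finished in roughly the last 10 days
days_elapsed = 10

budget_available_mb = monthly_cap_mb * days_elapsed / 30   # ~667 MB allowed so far
avg_per_task_mb = budget_available_mb / tasks_completed    # ~6.7 MB per task, at most

print(f"at most ~{avg_per_task_mb:.1f} MB of new download per task, on average")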

Keith Myers
Joined: 11 Feb 11
Posts: 5023
Credit: 18930116165
RAC: 6468214

You might want to set a stricter download limit in your preferences.

The project just started a new O2MDFS3_Spotlight series of tasks.  Your hosts need to download a completely new set of base frequency files for the new series.  The scheduler should be deleting all the old S2-series support files with each new download of S3 work.

But this causes very large download packages.  My hosts have been downloading all day long with no sign of stopping.  The download analyzer on my router shows about 300 GB downloaded so far across 3 hosts.

Eventually, you will accumulate all the necessary base frequency files for the new work and the downloads will drop back to just the tasks that use the files you've already downloaded.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118368182196
RAC: 25524635

Stephane Yelle wrote:
... I happened to have a bandwidth monitoring app open when this workunit was downloaded, and its total download size was about 200 MB (lots of ~3 MB files, according to BOINC's Transfers tab).

All workunits contain 2 tasks - one sent to you and the other to a different computer for verification.  Each task is exceedingly small - just a set of parameters with which to analyse data.  It's the data that's the killer.

Einstein has always used Locality Scheduling.  The scheduler tries very hard to send you tasks for data you already have.  There will be several thousand tasks that use the same set of data files.

The workunit you linked to has 2 tasks with the base name h1_0505.75_O2C02Cl4In0__O2MDFS3_Spotlight_506.35Hz_241, to which _0 or _1 is appended to identify the separate (identical) tasks.  Your machine has the _1 version.  There are two frequencies in the name.  The first (505.75 Hz) identifies the lowest-frequency data file associated with this task.  The second (506.35 Hz) identifies the midpoint of the full range of data-file frequencies required to analyse the task.

We have coined a term for this frequency difference (in this case it is 0.60) - we call it the 'delta frequency', or DF.  If you look back through forum discussions about previous runs over the last year or so, you should find many references to it.  The DF value gives an indication of how much GPU memory a task might require.  There is no guarantee that the new run will behave exactly the same as previous runs.

So, with data files every 0.05 Hz, there will be 12 files to get from the low value to the midpoint and another 12 from the midpoint to the top of the range for your example.  You can double that again because there are separate data files from the Hanford (h1) and Livingston (l1) observatories - assuming this new run follows the pattern of previous runs; I haven't had time to check that yet.  So to do the very first task for this frequency range, you would have downloaded approximately 48 of these large files (at a guess).  As I said, this one-off penalty should be good for lots more tasks using the same data.
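If you want to sanity-check that arithmetic for any task name, here's a rough sketch in Python.  The 0.05 Hz file spacing and the h1+l1 doubling come from the description above; the ~4 MB per file figure is only my rough assumption:

# Estimate the data files behind a GW task from its name (assumptions:
# 0.05 Hz file spacing, separate h1 and l1 files, ~4 MB per file).
def estimate_file_count(task_name, spacing_hz=0.05, observatories=2):
    parts = task_name.split("_")
    low_freq = float(parts[1])                                        # lowest data-file frequency
    mid_freq = float([p for p in parts if p.endswith("Hz")][0][:-2])  # midpoint of the range
    df = mid_freq - low_freq                                          # the 'delta frequency' (DF)
    files_per_observatory = round(2 * df / spacing_hz)                # low->mid plus mid->top
    return df, files_per_observatory * observatories

df, n_files = estimate_file_count(
    "h1_0505.75_O2C02Cl4In0__O2MDFS3_Spotlight_506.35Hz_241_1")
print(f"DF = {df:.2f} Hz, roughly {n_files} data files (~{n_files * 4} MB at ~4 MB each)")
# prints: DF = 0.60 Hz, roughly 48 data files (~192 MB at ~4 MB each)

That estimate lines up quite well with the ~200 MB you saw for your first task in this frequency range.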

One mistake people often make is to try to download too many tasks at a time.  The scheduler has to keep tasks for a wide range of different frequency series, and there are likely to be only small numbers of 'quickly available' tasks for any particular series.  If you make a big work request, you may force the scheduler to tap into several different series in order to quickly find the number of tasks requested.  You will likely get a very bloated download if that happens - big blocks of large data files for several different frequency ranges.  If you're not worried about download limits, no problem.  You mentioned you were concerned, hence the advice.

Particularly when I want to test the behaviour of a new series like this, I find I can get hundreds of tasks - pretty nearly consecutive ones in decreasing issue number (the 2nd to last field in the task name, '241' in your example) - by making repeated small requests for work, maybe around 3-5 tasks at a time.  The scheduler always seems able to refill its stock of 'quick issue' tasks in the 60 sec interval between work requests.  Once your initial work cache is filled, ongoing requests will naturally be spaced out and small, and so Locality Scheduling should work as intended, without any repeated large downloads until a given series is exhausted.
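If anyone wants to automate those 'small sips' rather than clicking Update by hand, something along these lines should do it - treat it strictly as a sketch (Python driving boinccmd; the project URL, request count and spacing are placeholders to adjust, and it assumes boinccmd can talk to your local client).  You'd still want a very small work cache set in preferences so each request only asks for a few tasks:

import subprocess
import time

PROJECT_URL = "https://einsteinathome.org/"  # placeholder - use the URL your client shows
REQUESTS = 20            # how many small work requests to make
SPACING_SECONDS = 65     # just over the 60 sec deferral the scheduler asks for

# With a small work cache (e.g. 0.02 days) each forced update should only
# fetch a handful of tasks, letting Locality Scheduling reuse existing data.
for i in range(REQUESTS):
    subprocess.run(["boinccmd", "--project", PROJECT_URL, "update"], check=False)
    print(f"request {i + 1}/{REQUESTS} sent; waiting {SPACING_SECONDS} s")
    time.sleep(SPACING_SECONDS)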

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118368182196
RAC: 25524635

Keith Myers wrote:
....  The scheduler should be deleting all the old S2 series support files at each new download of the S3 series.

Have you actually seen any evidence of that?  Normally it takes quite a while for all resends to be dealt with, so the scheduler doesn't usually start issuing delete requests until weeks to months afterwards, whilst the final cleanup continues.

If it's happening already, that would be pretty strong evidence that the sudden changeover from S2 to S3 was not due to S2 finishing normally.  In other words, maybe some sort of problem has been discovered with the S2 run.  The plug seems to have been pulled very suddenly - and that's quite unusual.

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 5023
Credit: 18930116165
RAC: 6468214

Yes, I usually see deletions on every new work request.

From my host 'Serenity':

67137    Einstein@Home    1/28/2021 8:45:02 AM    Sending scheduler request: To fetch work.    
67138    Einstein@Home    1/28/2021 8:45:02 AM    Reporting 1 completed tasks    
67139    Einstein@Home    1/28/2021 8:45:02 AM    Requesting new tasks for NVIDIA GPU    
67140    Einstein@Home    1/28/2021 8:45:02 AM    [sched_op] CPU work request: 0.00 seconds; 0.00 devices    
67141    Einstein@Home    1/28/2021 8:45:02 AM    [sched_op] NVIDIA GPU work request: 1.00 seconds; 1.00 devices    
67142    Milkyway@Home    1/28/2021 8:45:16 AM    Computation for task de_modfit_83_bundle4_4s_south4s_bgset_4_1603804501_74374460_0 finished    
67143    Milkyway@Home    1/28/2021 8:45:16 AM    Starting task de_modfit_80_bundle4_4s_south4s_bgset_4_1603804501_74362715_1    
67144    Einstein@Home    1/28/2021 8:45:19 AM    Scheduler request completed: got 1 new tasks    
67145    Einstein@Home    1/28/2021 8:45:19 AM    [sched_op] Server version 611    
67146    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.00_O2C02Cl4In0.9KnA (no longer needed)    
67147    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.00_O2C02Cl4In0.9KnA (no longer needed)    
67148    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.05_O2C02Cl4In0.HaHz (no longer needed)    
67149    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.05_O2C02Cl4In0.HaHz (no longer needed)    
67150    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.10_O2C02Cl4In0.B9aA (no longer needed)    
67151    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.10_O2C02Cl4In0.B9aA (no longer needed)    
67152    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.15_O2C02Cl4In0.q6ze (no longer needed)    
67153    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.15_O2C02Cl4In0.q6ze (no longer needed)    
67154    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.20_O2C02Cl4In0.vWhC (no longer needed)    
67155    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.20_O2C02Cl4In0.vWhC (no longer needed)    
67156    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.25_O2C02Cl4In0.9wXm (no longer needed)    
67157    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.25_O2C02Cl4In0.9wXm (no longer needed)    
67158    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.30_O2C02Cl4In0.NEiW (no longer needed)    
67159    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.30_O2C02Cl4In0.NEiW (no longer needed)    
67160    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.35_O2C02Cl4In0.C7yX (no longer needed)    
67161    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.35_O2C02Cl4In0.C7yX (no longer needed)    
67162    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.40_O2C02Cl4In0.SGPu (no longer needed)    
67163    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.40_O2C02Cl4In0.SGPu (no longer needed)    
67164    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.45_O2C02Cl4In0.dI9Q (no longer needed)    
67165    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.45_O2C02Cl4In0.dI9Q (no longer needed)    
67166    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.50_O2C02Cl4In0.TKgP (no longer needed)    
67167    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.50_O2C02Cl4In0.TKgP (no longer needed)    
67168    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.55_O2C02Cl4In0.Tmw4 (no longer needed)    
67169    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.55_O2C02Cl4In0.Tmw4 (no longer needed)    
67170    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.60_O2C02Cl4In0.rIfV (no longer needed)    
67171    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.60_O2C02Cl4In0.rIfV (no longer needed)    
67172    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.65_O2C02Cl4In0.YOkB (no longer needed)    
67173    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.65_O2C02Cl4In0.YOkB (no longer needed)    
67174    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.70_O2C02Cl4In0.rVcz (no longer needed)    
67175    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.70_O2C02Cl4In0.rVcz (no longer needed)    
67176    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.75_O2C02Cl4In0.9S4c (no longer needed)    
67177    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.75_O2C02Cl4In0.9S4c (no longer needed)    
67178    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.80_O2C02Cl4In0.8LrW (no longer needed)    
67179    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.80_O2C02Cl4In0.8LrW (no longer needed)    
67180    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.85_O2C02Cl4In0.5ZIO (no longer needed)    
67181    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.85_O2C02Cl4In0.5ZIO (no longer needed)    
67182    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.90_O2C02Cl4In0.55Zg (no longer needed)    
67183    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.90_O2C02Cl4In0.55Zg (no longer needed)    
67184    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0500.95_O2C02Cl4In0.3AvQ (no longer needed)    
67185    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0500.95_O2C02Cl4In0.3AvQ (no longer needed)    
67186    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0501.00_O2C02Cl4In0.7Ek4 (no longer needed)    
67187    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0501.00_O2C02Cl4In0.7Ek4 (no longer needed)    
67188    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0501.05_O2C02Cl4In0.co7b (no longer needed)    
67189    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0501.05_O2C02Cl4In0.co7b (no longer needed)    
67190    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0501.10_O2C02Cl4In0.3AWK (no longer needed)    
67191    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0501.10_O2C02Cl4In0.3AWK (no longer needed)    
67192    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file h1_0501.15_O2C02Cl4In0.43XT (no longer needed)    
67193    Einstein@Home    1/28/2021 8:45:19 AM    BOINC will delete file l1_0501.15_O2C02Cl4In0.43XT (no longer needed)    
67194    Einstein@Home    1/28/2021 8:45:19 AM    Project requested delay of 60 seconds    
67195    Einstein@Home    1/28/2021 8:45:19 AM    [sched_op] estimated total CPU task duration: 0 seconds    
67196    Einstein@Home    1/28/2021 8:45:19 AM    [sched_op] estimated total NVIDIA GPU task duration: 2596 seconds    
67197    Einstein@Home    1/28/2021 8:45:19 AM    [sched_op] handle_scheduler_reply(): got ack for task h1_0497.85_O2C02Cl4In0__O2MDFS2_Spotlight_498.05Hz_273_2    
67198    Einstein@Home    1/28/2021 8:45:19 AM    [sched_op] Deferring communication for 00:01:00    
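For anyone who'd rather tally those messages than eyeball them, a throwaway script like this will count the "BOINC will delete file" lines in an event-log excerpt pasted from BOINC Manager in the format above (Python; the ~4 MB per file figure is only a rough estimate):

import re
import sys
from collections import Counter

# Usage: python count_deletes.py event_log_excerpt.txt
# Counts "BOINC will delete file ..." messages per data-file frequency.
delete_re = re.compile(r"BOINC will delete file ([hl]1)_(\d+\.\d+)_\S+")

per_freq = Counter()
with open(sys.argv[1]) as log:
    for line in log:
        match = delete_re.search(line)
        if match:
            per_freq[float(match.group(2))] += 1

total = sum(per_freq.values())
print(f"{total} files marked for deletion (~{total * 4} MB at ~4 MB each)")
for freq in sorted(per_freq):
    print(f"  {freq:.2f} Hz: {per_freq[freq]} file(s)")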
 

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48074011261
RAC: 34060545

+1 Keith, I saw the same thing in my logs.


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118368182196
RAC: 25524635

Keith Myers wrote:
Yes, I usually see deletions on every new work request.

Thanks very much for that.

As I mentioned in another thread, something looked 'not right' with the switch from S2 to S3.  I immediately set the work cache size for all my hosts doing GW to 0.02 days.  Each had ~2 days of work on board, so I wanted to get more information about what was happening before allowing them to be compromised in any way.  As a result, I haven't seen the sort of activity you mention, so I'm grateful for the heads-up.

The first thing to notice is that the files being deleted are NOT S2 files, but rather S3 files.  That is an immediate red flag for me.  The highest frequency I had seen for S2 was about 490 Hz.  These are over 500, which appears to be where S3 started.

To confirm, I allowed the host with the lowest amount of cached work remaining to take a few sips.  What I saw was not pretty.  Each new task received was for a different frequency series, and there was only ever 1 task per series.  As you mention, each new task was accompanied by deletion requests for frequencies over 500.  I quickly put a stop to it after about 5 random tasks for frequencies between ~500 and ~550.  At first glance (I need to check more thoroughly) it looks like a single task is being issued and then the large data files associated with it (or with something else, but not S2) are being deleted.  I dropped everything to post a response here.  I'll check further during the remainder of my day.

There's no way I can sustain that amount of network traffic, so I've immediately reverted all hosts doing GW back to GRP.  I own the premises where I house my fleet, and the major tenant is my daughter's commercial real estate business.  She employs about 20 staff and they use the internet (which we share) all day long.  It would be insane of me to let each host download potentially hundreds of GB of single-use data under those circumstances.

Hopefully this behaviour wasn't intended by the project - particularly as there was no warning about it.

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 5023
Credit: 18930116165
RAC: 6468214

I'd like to know how you determined that the deleted files are for S3 WUs.

I see nothing in the data file names that labels them as S3 data.

How do you know that S2 data topped out at 490 Hz?  Just from what you observed on your hosts?

Are you privy to the dataset design, perchance?

Maybe others have received WUs in the 500 Hz range.

But I was also being impacted by constant downloads that ate a big chunk of the 1 TB monthly download limit from my ISP.

I am grateful that the GRP beta application for Volta/Turing/Ampere cards was released so I could stop GW and resume GRP.

BTW, I haven't seen a single S3 WU I have returned being validated.  I also see no sign of an S3 task validator in the server list.  I also currently see a backlog of 19,000 S3 O2MDF1 tasks awaiting validation, so I assume nobody else has had any S3 work units validated either.  The S3 tasks were apparently a big waste of time.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118368182196
RAC: 25524635

Keith Myers wrote:

Q1.  I'd like to know how you determined that the deleted files are for S3 WUs.

Q2.  I see nothing in the data file names that labels them as S3 data.

Q3.  How do you know that S2 data topped out at 490 Hz?  Just from what you observed on your hosts?

Q4.  Are you privy to the dataset design, perchance?

Q5.  Maybe others have received WUs in the 500 Hz range.

Q6.  BTW, I haven't seen a single S3 WU I have returned being validated.  I also see no sign of an S3 task validator in the server list.  I also currently see a backlog of 19,000 S3 O2MDF1 tasks awaiting validation, so I assume nobody else has had any S3 work units validated either.  The S3 tasks were apparently a big waste of time.

A1.  There is nothing in a data file's name to indicate which particular series it belongs to.  I mentioned I deliberately acquired a small number of S3 tasks (one at a time) on one machine.  I have now had time to go back and analyse the event log to work out what happened as each new task arrived.  I saw data files being downloaded with exactly the frequency values needed to cover the range specified by the DF of each task.  So the downloads were definitely for S3 tasks.

The next work request for a single task gave a completely different frequency series and a fresh batch of downloads.  It also included the message "BOINC will delete file ..." where the full set of files was exactly the set that had just been downloaded for the previous task - a task that wasn't even going to start until the earlier S2 tasks left on the machine had been processed.  The "will delete" means "will delete in the future" - when the task to which the data applies is finally crunched and returned.

Having gone through several of these tasks and seen the delete messages for the previous downloads, it appears that EITHER there will only ever be one task for a particular frequency (so the downloaded data gets deleted once it's done), OR there has been some sort of stuff-up and the delete instruction should never have been issued.  Either way, we won't know until the Devs say something.

A2.  On a bunch of machines I saw S2 resends (many pages of them) where the frequency was always between about 480 and 496.  Several other machines also got the odd S3 task before I put a stop to it.  One machine, by design, got 6 consecutive tasks with a couple of minutes' gap between each one.  In total I probably had around 20 S3 tasks all up.  All of those S3 tasks had frequencies above 500.

A3.  My guess is that S2 never got to finish (so we don't know where the frequency might have ended up) in the haste to get S3 started.  S3 must have started pretty close to 500.  I saw at least one at 501.xx.  Every delete message I saw was for a frequency above 500.

A4.  Absolutely not.  I have no more information than any other volunteer.  I don't get any 'heads-up' from the Devs.  If there is something that appears to be a project problem, I send a PM to Bernd or Oliver.  I just look at what is happening and make judgements based on past experience.  If there is something that might adversely affect volunteers, I try to alert people.  I try not to 'cry wolf' but obviously I can be wrong.

A5.  I'm sure that is so.  I'd be astounded if they're not all S3 tasks.

A6.  This is totally normal.  When a new search is starting, the usual procedure is to hold all returned tasks until validation has been checked manually on a selection of what has been returned.  That saves a lot of hassle if there turns out to be a problem.  Usually, within a day or two, when there is confidence that the results are good, the floodgates will be opened.  The fact that you don't see a validator doesn't mean one doesn't exist.  I'd be confident it's waiting in the wings, ready to go once it has been tested on a selection of the returns.

However, there is still something very unusual with S3 that is yet to be revealed.  To download a single task with a bunch of data and then mark all that same data for deletion simply isn't sustainable.  It will be interesting to see how this all pans out.

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 5023
Credit: 18930116165
RAC: 6468214

Thanks Gary for the detailed

Thanks Gary for the detailed post-mortem.

I agree, there is something very strange and unusual with this new S3 series.

Good catch on the fact that the deletions are attached to the very same task being downloaded at the time.

I missed that.

 

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 389279952
RAC: 461056

...jumping on this thread with this observation: in the past 24 hours I have downloaded 566 GW data files of the h1_0xxx.xx_O2C02xxxxxx.xxxx form (half of them the l1 mates).  They are nominally 4 MB each, so about 2.2 GB total.  I don't remember seeing this high a volume before, so it may be a non-repeating situation.  However, the lifetime (before "BOINC will delete... no longer needed") also appears shorter than I would expect.  Yes, I am aware of Gary's advice that this only means the data file is "marked" for deletion.  One example: a block of 24 files (0525.10 - 0526.25 in 0.05 steps) was downloaded beginning at 23:58 UTC 2/10 and marked "BOINC will delete" 19 minutes later at 00:17 UTC 2/11.  When I searched for files in that block (~05:00 UTC) they were indeed GONE!  There's a very obvious pattern - in maybe 70% of cases the downloaded files are marked "BOINC will delete" within 30 to 45 minutes of the download.  I don't find anything actually deleted less than 5 hours after download, but even deletion within 24 hours seems suspicious to me.  (A quick way to re-check what's actually left on disk is sketched at the end of this post.)

Did I just happen to hit a periodic data flush of some kind?

I'm in NNT mode for a while.
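For anyone wanting to repeat that on-disk check, something like this should do it (Python; the project directory below is only a typical Linux default - point it at wherever your BOINC data directory actually lives):

from pathlib import Path

# Adjust to your own BOINC data directory; this is just a common Linux default.
PROJECT_DIR = Path("/var/lib/boinc-client/projects/einstein.phys.uwm.edu")

data_files = sorted(PROJECT_DIR.glob("[hl]1_0*_O2C02*"))
total_mb = sum(f.stat().st_size for f in data_files) / 1e6

print(f"{len(data_files)} GW data files still on disk, ~{total_mb:.0f} MB total")
for f in data_files:
    print(f"  {f.name}  {f.stat().st_size / 1e6:.1f} MB")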

 
