Disproportionate number of workunits

Henri Ala-Peijari
Joined: 11 Sep 05
Posts: 5
Credit: 6428544
RAC: 0
Topic 195201

I attached to einstein@home to take work mainly when other projects are down/out of work. Now I got stuffed with gigabytes' worth of einstein units right after attaching (on both computers).

The first computer showed close to 2 gigabytes of space used by einstein (I would imagine that the einstein program binaries are nowhere near this) and the other well over 1 gigabyte. Both are running Windows 7; one is x64 and the other is 32-bit.

I've set seti@home to 500, einstein to 1, and milkyway@home to 2 (work proportions).

Any ideas why this is so?

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Disproportionate number of workunits

The reason might be your BOINC version. You should upgrade.

I don't think it's a problem with the Einstein project.

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke.)

Henri Ala-Peijari
Joined: 11 Sep 05
Posts: 5
Credit: 6428544
RAC: 0

I just realized that I'm

Message 98675 in response to message 98674

I just realized that I'm using the Lunatics mod for Seti@home - maybe it's causing this? http://lunatics.kwsn.net/

I checked the BOINC version on one of the computers and it's 6.10.18.

Curiously, it works fine with milkyway@home.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

No, it's not the fault of the

Message 98676 in response to message 98675

No, it's not the fault of the (optimised) science application. There have been BOINC versions with severe bugs in the scheduler code. I'm not quite sure if 6.10.18 is one of them, but the currently recommended version is 6.10.58.

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke.)

Henri Ala-Peijari
Joined: 11 Sep 05
Posts: 5
Credit: 6428544
RAC: 0

I aborted tasks on one

Message 98677 in response to message 98676

I aborted tasks on one computer and only two running tasks were left. Still (after the update and with only 1 einstein task in the task queue), BOINC tells me that einstein@home is using 1.58 GB of disk space. I had updated the client to the latest version, but not before I aborted most of the tasks.

Not that I'm running out of disk space, but does the application really use this much?! Still, I calculated from the number of tasks I originally received and their projected run time that they would take all computing power for at least a week.

I have set the work buffer to 10 days in BOINC. But does this mean that when seti@home runs out of work, I get 10 days' worth of einstein@home work on the spot?

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: Not that I'm running

Message 98678 in response to message 98677

Quote:
Not that I'm running out of disk space, but does the application really use this much?!


It's not the application but the (general/global) data files that take so much space. Aborting the tasks doesn't automatically delete all those data files. See Gary Roberts's messages about locality scheduling for more information and links concerning this.
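
If you're curious where the space actually goes, a minimal sketch like this will total up the files in the Einstein project folder. The BOINC data directory path and the project folder name are assumptions - adjust them for your own installation:

    import os

    # Assumed locations - check them against your own BOINC installation.
    # On Windows 7 the data directory is typically C:\ProgramData\BOINC.
    boinc_data_dir = r"C:\ProgramData\BOINC"
    project_dir = os.path.join(boinc_data_dir, "projects", "einstein.phys.uwm.edu")

    total = 0
    for root, dirs, files in os.walk(project_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))

    print("Einstein@Home project folder uses %.2f MB" % (total / (1024.0 * 1024.0)))

Nearly all of that total will be the large data files, not the science applications themselves.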

Quote:
Still, I calculated from the number of tasks I originally received and their projected run time that they would take all computing power for at least a week.


That's what I meant by "severe bugs in the scheduler code" in the BOINC client. :-)

Quote:
I have set the work buffer to 10 days in BOINC. But does this mean that when seti@home runs out of work, I get 10 days' worth of einstein@home work on the spot?


Yes, that would be the consequence of the aforementioned bug.

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke.)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119064921340
RAC: 24458249

Gundolf has already provided

Gundolf has already provided you with good answers to your queries but I thought I'd add a couple of extra points to help with understanding some options you have.

Quote:
I attached to einstein@home to take work mainly when other projects are down/out of work. Now I got stuffed with gigabytes' worth of einstein units right after attaching (on both computers).


Many people claim this is 'Einstein being greedy', but the behaviour is a combination of BOINC and the preference settings you have chosen; the E@H project has no control over this.

Quote:
The first computer showed close to 2 gigabytes of space used by einstein (I would imagine that the einstein program binaries are nowhere near this) and the other well over 1 gigabyte. Both are running Windows 7; one is x64 and the other is 32-bit.


If your computers are doing both S5GC1 and ABP2 tasks (you can choose not to do ABP2), then you will have the science apps for both. These will be a few tens of MB in total - so quite small in the overall scheme of things. A single GC1 task is very small (in itself) but does require the downloading of close to 100MB of large data files. These can be reused for many more tasks (usually) and this is the function of locality scheduling that Gundolf pointed you towards.

ABP2 tasks do not use locality scheduling. Each task requires an 8MB data download which remains on your machine only until the task is successfully returned. At that point the data is immediately deleted.

GC1 tasks may use common data but unfortunately the WUG (work unit generator) doesn't create very many for a particular frequency band and only creates more when those few have been exhausted. It takes maybe a few minutes for any new extra tasks to be added to the pool, so the scheduler usually moves you on to the next frequency band when there is a temporary shortage of tasks for the existing band. Moving a single frequency step, e.g. from xxx.00 to xxx.05, will result in 4 files (~15MB) being marked for deletion and 4 new files (~15MB) being downloaded and added to the pool of available data files. On a fast machine, you can have several (maybe even >10) of these frequency band movements per day. Files marked for deletion can't be deleted until all tasks that depend on them are completed and returned. So newly added data and data marked for deletion will both sit on disk for many days if you set up large multi-day caches.
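
Purely as an illustration, using the rough figures above (the number of band movements per day and how long the old files hang around are assumptions that depend on your host speed and cache size):

    # Rough, illustrative estimate of how GC1 data files can pile up.
    # The numbers are the approximate figures quoted above, not exact values.
    base_data_mb = 100          # large data files needed for the first GC1 tasks
    step_data_mb = 15           # ~4 files downloaded per frequency-band step
    steps_per_day = 10          # a fast host can move roughly this many bands per day
    days_retained = 7           # old files can't go until dependent tasks are returned

    total_mb = base_data_mb + step_data_mb * steps_per_day * days_retained
    print("GC1 data sitting on disk: ~%d MB" % total_mb)   # ~1150 MB

So a gigabyte or more of data files is entirely plausible on a fast machine with a big cache, even though no single task is large.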

Quote:
I've set seti@home to 500 and einstein to 1 and milkyway@home to 2 (work proportions).


Unfortunately, this isn't the way to solve your problem. You should upgrade to the latest BOINC, as Gundolf advised, as this will make sure that you aren't suffering from KNOWN scheduling bugs. Don't be lulled into thinking that this will fix everything, because it won't. There are still plenty more unsatisfactory scheduling, resource share and debt management behaviours to work around.

For example, BOINC will still download far too much E@H work when Seti has its regular multi-day outages. The resource share of 1 is immaterial if you set a 10 day cache; BOINC will still try to get 10 days of work when Seti can't supply. You won't get it from Milkyway because they limit you to 6 tasks per CPU core. So the bulk of your 10 day cache will fill up with E@H work. When Seti does have more work again, it will be impossible for the oversupply of E@H tasks to be done without invoking panic mode, and this is not Einstein's fault.
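
A rough back-of-the-envelope illustration (the per-task runtime and core count are assumptions for a typical quad core):

    # Illustrative only: how much E@H work a 10-day cache can pull in
    # when the other projects can't supply.
    cache_days = 10
    cores = 4                   # assumed quad-core host
    hours_per_task = 7          # rough E@H runtime per task per core

    tasks_requested = cache_days * 24 * cores / hours_per_task
    print("Roughly %.0f E@H tasks to fill the cache" % tasks_requested)   # ~137

    # With a 2-week deadline and Seti work arriving again within a few days,
    # BOINC has little choice but to run many of those in 'panic' (high priority) mode.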

The only hope you have without severe micro-managing is to try out BOINC's new backup project mode. Even there, you will still get far too much work unless you reduce your cache size as well. Backup project mode is invoked by setting the chosen backup project's resource share to zero. BOINC will only download tasks from it if BOTH other projects run out of work. You would then get a full cache load of backup project tasks, so you should consider setting your cache to around a day or two, or else be prepared to work off several days of backup project work each time the other projects fail to deliver.
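
The rule is simple enough to sketch. This is only an illustration of the documented behaviour (with your shares and E@H as the hypothetical backup project), not the actual client code:

    # Simplified sketch of backup-project behaviour (resource share = 0).
    def should_fetch_from_backup(projects):
        """Fetch from zero-share projects only if no normal-share project can supply work."""
        normal = [p for p in projects if p["share"] > 0]
        return all(not p["has_work"] for p in normal)

    projects = [
        {"name": "SETI@home",     "share": 500, "has_work": False},
        {"name": "MilkyWay@home", "share": 2,   "has_work": False},
        {"name": "Einstein@Home", "share": 0,   "has_work": True},   # backup project
    ]
    print(should_fetch_from_backup(projects))   # True: both normal projects are dry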

The real source of these problems is the way the Seti project is being run. It has always been difficult and is now virtually impossible to come up with a satisfactory 'set and forget' scheme with Seti as your preferred project. The real solution is to not make Seti your principal project. I first joined Seti in 1999 and did over 100K 'classic' tasks. I continued for a bit with Seti BOINC but got frustrated with all the problems and switched to E@H. It was actually quite hard to make that decision and I felt quite 'guilty' about it. However, it was the best decision for me in the long run.

I'd be even more of a nut case if I was still trying to support Seti :-).

And the strange but true situation is that the project would really love a large number of you to switch your allegiances elsewhere ;-).

Good luck with your decisions!

Cheers,
Gary.

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

I am running 6 BOINC projects

I am running 6 BOINC projects with a very small cache, 0.25 days, and I get new WUs one at a time and only when the preceding one has been done and uploaded. I am never out of work, and I presently have 5 SETI pending units on my Linux box and 5 pending units on my OpenSolaris virtual machine, which is much slower than the Linux box. I think SETI users are downloading too many units.
Tullio

Henri Ala-Peijari
Joined: 11 Sep 05
Posts: 5
Credit: 6428544
RAC: 0

I noticed that with

Message 98681 in response to message 98680

I noticed that with einstein@home the deadline is also very near (today I tested and the date was August 8th, about 2 weeks from now), causing BOINC to run einstein@home units in "high priority" mode right after they're downloaded. So with this, I have to say that in my humble opinion, some blame rests with the way einstein@home is set up.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119064921340
RAC: 24458249

RE: I noticed that with

Message 98682 in response to message 98681

Quote:
I noticed that with einstein@home the deadline is also very near (today I tested and the date was August 8th, about 2 weeks from now), causing BOINC to run einstein@home units in "high priority" mode right after they're downloaded. So with this, I have to say that in my humble opinion, some blame rests with the way einstein@home is set up.


Yes, the E@H deadline is precisely 2 weeks and it has pretty much always been that. The MW deadline was 1 day at one stage (I think), which was unworkable; it then became 3 days and, because of continuing complaints, was eventually made 7 days (or thereabouts).

The simple fact is that different projects set different deadlines, usually as a compromise between the true needs of the project and the needs of the volunteers. No project deliberately tries to set unworkable deadlines. Your Q9300 quad core takes around 7 hours per task per CPU core on E@H. So really, a 2-week deadline in normal circumstances could be regarded as quite generous. You cannot blame any one project if the requirements of other projects, in combination with an extreme setting for resource shares, create an unworkable problem for BOINC. The unfortunate fact is that BOINC, on its own, is simply not smart enough (yet). Perhaps it never will be.
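
Just to illustrate, using that approximate 7 hour figure:

    # Illustrative: how many E@H tasks a Q9300 could finish within the deadline.
    deadline_days = 14
    cores = 4
    hours_per_task = 7          # approximate runtime per task per core on this host

    max_tasks = deadline_days * 24 * cores // hours_per_task
    print("Up to ~%d tasks could be completed inside the 2-week deadline" % max_tasks)   # ~192

Of course that assumes the host does nothing but E@H, which is exactly what the resource shares are supposed to prevent.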

Here is another suggestion for you. I have an old Northwood P4 with HT that is jointly doing E@H/Seti in a 50/50 resource share ratio. Running separate projects on the 2 virtual cores actually speeds up both quite significantly so I've always left it that way. It's always had a 6 day cache and it never ran out of work for either project. With the new 3 day outage at Seti, the cache tends to fill up with extra E@H tasks as the Seti cache runs down. I can see this causing future problems and perhaps disturbing my nice arrangement where 1 core is always running Seti and the other - E@H.

So, here is what I'm doing about it. Sometime during the day before the 3 day outage starts, I set E@H to NNT (no new tasks) and set the cache to 8 days. After Seti fills up, I set the cache to 4 days so that neither project will be trying to get work for a couple of days. By the time the Seti outage finishes, the machine will have started topping up with E@H but perhaps not Seti. Over the weekend, when Seti has settled down, I put the cache back to 6 and both projects top up. When the day before the next outage is due - rinse and repeat. In other words, by having 3 micromanaging events per week, I can keep everything nicely balanced, and it's working just fine so far.

So for your resource share situation, you would need to get a bit more extreme. You could try a cache setting of say 0.5 days (or less). On the day before the outage, set E@H to NNT, ramp up to 8 days and fill up with Seti. Set your cache back to less than 0.5 days and continue to the end of the outage. With luck you should still have plenty of Seti left and there would be no need for any more E@H. If Seti's outage was protracted you might need to get some E@H, but 0.5 days (or less) would easily take care of that. You should never get a large swag of E@H tasks. However, you will still need to micromanage, just not that often.
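
If you'd rather not click through the Manager a few times a week, the same juggling can be scripted with boinccmd. This is only a sketch - the boinccmd path, the data directory, the project URL and the exact preference tag are assumptions you'd need to check against your own installation:

    import subprocess

    # Assumed paths and URL - adjust for your own installation.
    BOINCCMD = r"C:\Program Files\BOINC\boinccmd.exe"
    OVERRIDE = r"C:\ProgramData\BOINC\global_prefs_override.xml"
    EINSTEIN = "http://einstein.phys.uwm.edu/"

    def set_cache(days):
        """Write a preferences override with the given work buffer and tell BOINC to reread it."""
        with open(OVERRIDE, "w") as f:
            f.write("<global_preferences>\n"
                    "  <work_buf_min_days>%s</work_buf_min_days>\n"
                    "</global_preferences>\n" % days)
        subprocess.check_call([BOINCCMD, "--read_global_prefs_override"])

    def einstein_nnt(enable):
        """Toggle 'no new tasks' for Einstein@Home."""
        op = "nomorework" if enable else "allowmorework"
        subprocess.check_call([BOINCCMD, "--project", EINSTEIN, op])

    # Day before the Seti outage: stop new E@H work and let Seti fill an 8-day cache.
    # einstein_nnt(True); set_cache(8)
    # After Seti has filled up: drop back to a small cache and allow E@H again.
    # set_cache(0.5); einstein_nnt(False)

You still have to remember to run it at the right times, but it reduces each micromanaging event to a single command.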

Let us know how you get on.

Cheers,
Gary.

glohr
Joined: 27 Aug 05
Posts: 5
Credit: 280015234
RAC: 155337

My problem seems to fit this

My problem seems to fit this thread, so I'll add it here, rather than starting a new thread. Before I set it to NNT on 7/30, E@H loaded Computer ID: 995651 up with over 100 WUs for S5GCESSE2. The work buffer is set to 1.0 day. It can process about 5 per day, so something is clearly amiss.
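
Just to put numbers on "amiss" (purely illustrative, using my own rough figures):

    # Illustrative sanity check with the rough numbers above.
    work_buffer_days = 1.0
    tasks_per_day = 5            # approximate E@H throughput of this host

    expected_queue = work_buffer_days * tasks_per_day
    print("Expected queue: ~%.0f tasks" % expected_queue)   # ~5
    print("Actual queue:   100+ tasks")                     # what E@H actually sent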

It will grind through about 25 of the remaining 80+ WUs before the deadline. Once the report deadlines have passed and it has done all it can, what should I do, if anything? Reset? I don't want to again have a huge backlog of WUs that won't be processed in time.

Normally seti@home is running an equal share on the box, but for the duration it's turned off. Until this event, S@H has been running out of work for part of their weekly outage, but E@H took up the slack without any drama.

Here's some info on the box:

  • CPU type: AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ [Family 15 Model 35 Stepping 2]
  • Number of processors: 2
  • Coprocessors: NVIDIA GeForce GT 220 (1023MB), driver: 19713
  • Operating System: Microsoft Windows XP Professional x86 Edition, Service Pack 3 (05.01.2600.00)
  • BOINC client version: 6.10.58
  • Memory: 2047.48 MB
  • Cache: 1024 KB
