Why so fussy? I will have to get out unless situation improves

Kenny MacPhee

Joined: 26 Feb 05

Posts: 3

Credit: 16071

RAC: 0

5 May 2005 20:00:34 UTC

Topic 189113


Sorry for the sensationalist title, but I am having pretty constant (yet shifting) problems with Einstein that I am simply not getting with SAH or CPDN.

I have two PCs running BOINC and one of them needs to potentially be offline for several days. It is this one that is having the problems, possibly due to the "Contact server every ..." setting being at 10 days.

First off the deadline of 7 days is daft. A couple of months ago I would get four or so WUs in one go ... all with a deadline of 7 days. This would lead to at least some of them not getting processed in time and then me having to cancel WUs or them finishing after the deadline. Wasted cycles. Not good. What is the rationale behind 7 days? Surely it can't be so time-critical that the deadline could not be extended to 14 days?

Then this Monday just gone I started getting a "your connection interval is too high" (I paraphrase) error. Today I tried to connect for the first time since then and am getting "Your network connection interval is longer than WU deadline". What is this all about?!

The other two projects I run do so with absolutely no intervention, and Einstein is the only one that requires "mothering". Are these all post-Beta glitches and will it improve, or should I cut my losses?

Bruno G. Olsen ...

Joined: 20 Dec 04

Posts: 115

Credit: 7668259

RAC: 0


5 May 2005 20:46:23 UTC

Message 11351


The new message you get is probably due to an upgrade on the server end, meant to get the attention of people like yourself who would run into deadline problems. So it's not a glitch, it's actually a good thing. As you said yourself, there would potentially be a lot of wasted cycles as a result of not meeting deadlines.

So, there are two main ways to fix this: 1) the deadline is increased, or 2) you lower the cache. As I personally would imagine there were good scientific reasons to choose a 7 day deadline period, the most appropriate way to fix it would be for you to lower your cache.
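To make the new server message concrete, here is a rough Python sketch of the kind of check it implies. This is only an assumption about the logic, not the actual BOINC scheduler code; the function name and parameters are made up for illustration.

```python
def can_meet_deadline(connect_interval_days: float, deadline_days: float) -> bool:
    """Return True if a host that only contacts the server every
    `connect_interval_days` could still report a result before a
    deadline of `deadline_days` (hypothetical rule, for illustration)."""
    return connect_interval_days < deadline_days


# "Contact server every 10 days" against a 7 day deadline.
print(can_meet_deadline(10, 7))   # False
# A 3.5 day cache against the same deadline.
print(can_meet_deadline(3.5, 7))  # True
```

Under this reading, any cache setting below 7 days would make the message go away, although a setting close to 7 days would still leave very little time to actually crunch the work.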

And then there are two alternative ways to fix this: 1) you could put each computer in a different venue and set those up differently, or 2) detach from the project on the computer that has this problem.

I have two computers myself: one I use myself and it's on 24/7, and the other is used by my father, which means it's only on when he uses it. I ended up having the same problem on the second computer. Actually it was worse, as it would never even get to return an Einstein WU on time while struggling to get work done for SAH... So I ended up using a combination of the two alternative fixes: I detached from all projects other than climate and predictor, AND set up the school venue for it. Then whenever my father gets his own computer and I have the opportunity to let the one he is using now run 24/7, I'll make changes to the current setup.

Hope this was helpful :)

Kenny MacPhee

Joined: 26 Feb 05

Posts: 3

Credit: 16071

RAC: 0


5 May 2005 21:00:56 UTC

Message 11352 in response to message 11351


Thanks Bruno. It was helpful!

The computers are already at two different "locations". The desktop with permanent net access has been fine (apart from at the very beginning) so I'll have to cancel Einstein on the laptop.

I could reduce the cache ... but that would hurt S@H which works just fine as is and is my primary project.

I would be *very* interested to understand the reason for a 7 day deadline though, as I really cannot see any scientific or logistical reason for it.

Thanks for taking the time to reply.

Blank Reg

Joined: 18 Jan 05

Posts: 228

Credit: 40599

RAC: 0


5 May 2005 22:12:57 UTC

Message 11353


Here is the info you wanted

http://einsteinathome.org/node/187702

Bruce Allen
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Oct 15, 2004
Posts: 427
ID: 3
Posted: 18 Feb 2005 17:03:12 UTC - in response to Message ID 3726.

> > > Hi MetalWarrior,
> > >
> > > it IS public since yesterday! That's how I registered.
> >
> > Feb 14, 2005
> >
> > For testing purposes, in anticipation of our public launch, we will be enabling account creation for some unannounced periods this week. If you are interested in joining Einstein@Home, please try the Create Account link.
> >
> > no, its not really public... this also belongs to the testing phase. i think they want to get more users to test if the System can handle it, so its still in testing
>
> O.K., thanks for the information. I was one of the lucky-ones then!

Note: At least in the short term, I am not planning to extend the report deadlines. The size of the database grows directly in proportion to these deadlines, so keeping the deadlines shorter reduces the strain on the server and allows more people to participate.

A single E@H workunit should take 7 or 8 hours on a modern machine, and perhaps 24-36 CPU hours on an old machine. Provided that the machine is available for BOINC at least 10% of the time, most machines should be able to complete the workunit by the deadline.
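A quick back-of-envelope check of these figures, sketched in Python. The function name is illustrative only, and the sketch assumes a single workunit with the full 7 days available and all BOINC time going to Einstein.

```python
def required_availability(cpu_hours: float, deadline_days: float = 7.0) -> float:
    """Fraction of wall-clock time the machine must spend on one
    workunit in order to finish it before the deadline."""
    return cpu_hours / (deadline_days * 24.0)


for label, hours in [("modern, 8 h", 8), ("old, 24 h", 24), ("old, 36 h", 36)]:
    print(f"{label}: {required_availability(hours):.1%}")
# modern, 8 h: 4.8%
# old, 24 h: 14.3%
# old, 36 h: 21.4%
```

So 10% availability (about 16.8 hours per week) comfortably covers a modern machine, while a machine at the slow end of the quoted range would need somewhat more than 10%.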

Bruce

Link to Unofficial Wiki for BOINC, by Paul and Friends

Reid

Joined: 15 Apr 05

Posts: 5

Credit: 45581

RAC: 0


6 May 2005 4:54:35 UTC

Message 11354


It allows more people to participate because deadlines aren't being met...
Therefore a WU has to be sent out several times to be done, so actually one WU can take up to 3 weeks to be completed. That saves no time, when a 10-14 day deadline would allow it to be completed.
I do see how the 7 day deadline becomes worrisome for a few.

I have only missed one deadline, the first one. NP (probably my fault).

BOINC 4.35 and 4.36 are addressing the problem pretty well so far.

I have two 2.4 P4s, and one unit usually takes 14 hrs. So I am set to change projects (Ein & SAH) every 2 hrs: 7 cycles and it is done (approx 28 hrs).
My cache is set for 3.5 days.

I have the workload for SAH at 75% and Ein at 25%.

Everything is working OK for me. I haven't run out of WUs with SAH when they are down, either.

Ziran

Joined: 26 Nov 04

Posts: 194

Credit: 655119

RAC: 1823


6 May 2005 20:00:45 UTC

Message 11355


Determining what the most suitable deadline for a project is ain’t an easy thing. Many factors have to be taken into account. If I remember correctly, the database was the limiting factor for Einstein. So to illustrate the problem, let's imagine that we set the deadline to 14 days.

So what will happen now that we have this longer deadline? According to Bruce, “The size of the database grows directly in proportion to these deadlines, so keeping the deadlines shorter reduces the strain on the server and allows more people to participate”.

Now why is that? An entry for a WU will remain in the database until the WU has been validated, enough matching results have been sent in, and all sent-out WUs have been accounted for or have passed the deadline.

So:

For every WU that will never be returned, we will now have to wait 14 days instead of 7 before we can resend it to a new host.

A result is reported the next time you download work from the project (or contact the scheduler for some other reason), or when the deadline is less than 24 hours away. So with longer deadlines the client will wait longer before reporting the result.

If I am correctly informed: the bigger the database is, the slower it is.

The project has limited hardware so it has to determine how it can use it to get the most science done.

If the database is the limiting factor, you can either try to reduce the average turnaround time for the WUs, or limit the number of participants to something that the hardware can handle.
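The points above can be put into a toy model. The Python sketch below is only an illustration, not anything taken from the project: it assumes a steady state in which the number of outstanding results is the issue rate times the average time a result sits in the database, and every input number is invented.

```python
def outstanding_results(issued_per_day: float,
                        avg_turnaround_days: float,
                        never_returned_fraction: float,
                        deadline_days: float) -> float:
    """Steady-state count of results the database has to track.

    Returned results stay for their turnaround time; results that are
    never returned stay until the deadline expires (toy assumption).
    """
    returned = 1.0 - never_returned_fraction
    avg_days_in_db = (returned * avg_turnaround_days
                      + never_returned_fraction * deadline_days)
    return issued_per_day * avg_days_in_db


# Invented inputs: 10,000 results issued per day, 3 day average
# turnaround, 10% of results never returned.
for deadline in (7, 14):
    rows = outstanding_results(10_000, 3.0, 0.10, deadline)
    print(f"{deadline}-day deadline: about {rows:.0f} results outstanding")
```

With these made-up inputs the count goes from about 34,000 to about 41,000 when the deadline doubles, and that is before allowing for clients holding finished results longer, which would raise the average turnaround as well.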

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

That's all folks

Joined: 7 Apr 05

Posts: 20

Credit: 23903

RAC: 0


6 May 2005 21:39:06 UTC

Message 11356 in response to message 11355


> If the database is the limiting factor you can either try to reduce the average turnaround time for the WU’s, or limit the number of participants to something that the hardware can handle.

I guess you left out step 1, which is the scheduler sending out those work units and managing the feedback. If this part were a bit more clever, it would cut turnaround time as well.

For instance, if one resets the project, the WUs on the client side are deleted. However, the scheduler does not cancel them on the server side but waits the full grace period for a result which is sure never to be returned. On the other hand, the scheduler must have some kind of knowledge about the reset, as it sends some brand new WUs to the client. If the dangling WUs were marked invalid after a reset, they could be sent out again immediately after they have been deleted on the client's side.

Another point would be to reduce the initial WU quota for computers with no current RAC. These are usually newly registered computers and/or computers falling back to S@H if their main project is somehow stuck. These computers start with a quota of 8 WUs for each processor, i.e. without any further knowledge about the average return period, each newly registered computer can download the maximum number of WUs. The quota is reduced by one for each and every undelivered WU after the grace period has finished.

This leaves a lot of dangling WUs as well, as some new users, even if they are experienced in other projects, seemingly tend to overestimate the potential power of their machines. If new machines started with a quota of one WU per processor, there would be a potential risk of resending only 1 unit instead of the 7 or 8 units that can be seen on a lot of new accounts. If they deliver their results on time, they will be rewarded with an increase in WU quota. But if they fail to deliver, they cannot block up to 8 WUs from the very beginning.
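A minimal Python sketch of the quota idea described above. The function names, the step of one per result and the cap of 8 are assumptions made for illustration; this is not how the real scheduler is written.

```python
def worst_case_dangling(initial_quota_per_cpu: int, cpus: int = 1) -> int:
    """Work units left waiting out the deadline if a brand-new host
    downloads its full quota and never returns anything."""
    return initial_quota_per_cpu * cpus


def next_quota(quota: int, delivered_on_time: bool, max_quota: int = 8) -> int:
    """Hypothetical update rule: +1 for an on-time result, -1 for a
    missed one, kept between 1 and max_quota."""
    step = 1 if delivered_on_time else -1
    return max(1, min(max_quota, quota + step))


# Starting quota of 8 versus the proposed starting quota of 1.
print(worst_case_dangling(8), worst_case_dangling(1))  # 8 1

# A reliable new host earns a bigger quota after three on-time results.
quota = 1
for _ in range(3):
    quota = next_quota(quota, delivered_on_time=True)
print(quota)  # 4
```

The trade-off is that a reliable new host needs a few returned results before it can fill a larger cache, while a host that disappears ties up one WU per processor instead of eight.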

Last but not least, IMHO it makes no sense to send a WU out to one computer solely and then wait one or two days before sending it to some other computers as well. This artificially spreads out the active data. Instead of initializing a new data set for the faster machines, it could be a memory-saving technique to let them pick up older work units where some other computers have already failed to deliver. If the pace for putting new units into the database were set by the slower CPUs, the faster ones could act as a de-escalation unit to keep the ends of the active data closer together.

Then it would be no problem to extend the grace time for slower machines, as the waiting time for an undelivered unit would be 14 days plus one for an active fast processor. If the scheduling were done a little better than it is handled right now, the slower computers would have a chance to participate as well.

gravywavy

Joined: 22 Jan 05

Posts: 392

Credit: 68962

RAC: 0


7 May 2005 8:48:12 UTC

Message 11357 in response to message 11352


> I would be *very* interested to understand the reason for a 7 day deadline though, as I really cannot see any scientific or logistical reason for it.

You are right on the first point, there is no scientific reason for it, at least none that has ever been posted on these forums.

There is a sound logistical reason, or actually *three*:

1. For this project the most scarce resource is capacity on the database server - this has been discussed above.

2. Note that with a deadline of 1 week, it can still take 2 weeks for a WU to be processed: four machines are assigned the same wu, but if not enough of them report within the deadline, more machines have to be assigned the same wu, making the total time up to 2 weeks. (It can get to 3 weeks on the odd occasion that the new machines also fail to report.) This not only affects issue 1, but also delays the award of credit to those participants whose machines did return the wu promptly. Clearly credit is an important incentive to some contributors, and even if they get the credit eventually the delay does upset some people. As the project relies on the goodwill of donors it is a wise move to limit the delay to some figure, not necessarily 7 days. Some users say the current setting is too long, others that it is too short: but the important thing to notice is that whatever setting they chose would not please everyone.

3. The project saves on network time by sending data to your machine only once and then sending your machine several wu that all process the same data in different ways. This saves the project's bandwidth, and for donors using a metered connection cuts the costs to the donor of downloading new data.

By a metered connection I mean either that you have dial-up and pay for the calls, or that you have broadband but that your ISP charges extra when you exceed their download limits. Both kinds of metering are common in the UK.

Whenever you ask for another wu, the scheduler will give you a wu that matches the data you already have, until there are no more of these wu left. The longer the deadline, the longer it will take for the last few stragglers to be returned - it is not uncommon for some wu from each dataset to take 3 weeks. The greater this overhang gets the greater the chance that some machine will be asked to download the data just to process the single outstanding wu.

Recently we had a complaint on these forums from someone who was given three different wu in the same connection, all with different datasets, and it took ages for the download to complete on her metered dial-up. This is not a good way to keep users on board, and the project management are wise to limit this sort of effect as much as they can.
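The arithmetic behind point 2 can be sketched in a few lines of Python. This is a simplification with an invented function name: it assumes a wu is only reissued once the previous deadline has fully expired, and that each new batch of machines uses its whole deadline.

```python
def worst_case_days(deadline_days: float, reissue_rounds: int) -> float:
    """Days until a wu can complete if `reissue_rounds` batches of
    hosts miss the deadline before enough results come back."""
    return deadline_days * (reissue_rounds + 1)


for deadline in (7, 14):
    print(deadline, [worst_case_days(deadline, rounds) for rounds in range(3)])
# 7 [7, 14, 21]
# 14 [14, 28, 42]
```

So the 2 and 3 week cases mentioned above would stretch to 4 and 6 weeks under a 14 day deadline, with credit and the dataset overhang delayed to match.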

Weighing up all the considerations, I can't honestly say that the project management are *wrong* to set a 7-day deadline. My own view is that it is better to avoid deadlines that exactly match people's work patterns - I'd go for a 6.5 or 9.5 day deadline rather than a 7 day one, simply to avoid the annoyance when a wu is not quite finished on Friday and the machine has to be turned off till Monday, but that is only my view. You will find other views (both that the deadline should be shorter, or very much longer) expressed in these forums.

Whatever deadline the project management choose in the future it is guaranteed that some donors will be unhappy with their choice.

~~gravywavy
