Not here. The Einstein website and forums are loading very slowly, and that's with my high-speed DSL connection 20 miles from UWM.
RE: RE: Pages are loading
Last night I experienced a very refreshing improvement in the website's responsiveness for a while: pages were loading almost instantly. But today it's back to the 'molasses flowing uphill in January' performance we've been getting for the past few weeks.
As the front page says, work
As the front page says, work is ongoing, and I keep losing all connections - but when the server is up, it seems to be much more responsive.
And my Celeron has just downloaded new work for the first time in over a week.
Definite signs of progress - keep up the good work. (And thanks for the daily updates on the front page).
The webpages and forum are
The webpages and forum are remarkably speedy this early AM (5:30 CST). I no longer feel like I did back in the old dialup days. Excellent work to the E@H team.
--Terry photostuff.org
The project looks better and
The project looks better and better today - the servers have started gobbling up that backlog of unvalidated work, as my dwindling pending credit shows. And the curve in the Total Credit chart in my BOINC Manager is pointing at the sky right now. How can that not make me happy? :)))
Heroic work again. Hats off to the project's team.
Looking very good right now
Looking very good right now at 6:07am CST.
Good to see things going back
Good to see things going back to normal, and very good to see news on the home page telling us about the progress being made. Thanks a lot.
-rg-
(But my two boxes remain committed 88% to climateprediction - those WUs take long weeks to complete, and it's stupid to throw away whatever work has already been done on them.)
RE: Looking very good right
Everything appears to be back to normal as of 11:00AM CST. I still have about 3 times my usual pending numbers, but they have been falling rapidly all morning.
F. Prefect
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.....Douglas Adams
Gary Roberts, pls ... The
Gary Roberts, pls ...
The two threads "Links to informative posts ..." in Cruncher's Corner and in Problems and Bug Reports no longer have a purpose.
Please remove them in any way you think appropriate. Thanks & regards.
-rg-
RE: Please remove them in
Both deleted at your request. When a thread is deleted, a category for the type of deletion has to be assigned. The only options are:
*Obscene,
*Flame/hate mail, and
*Commercial spam.
Obviously your threads fit none of these, so please don't take offense that they were categorised as spam when you get the emails informing you of the deletions :).
Thanks for your assistance during the period of the database problems.
Cheers,
Gary.
Dear Einstein@Home volunteers
Dear Einstein@Home volunteers and contributors,
I thought I would post a description of what went wrong and how it was fixed.
(1) Project performance problems. These were due to our database getting overloaded. It was processing an average of 950 queries per second, with peaks of up to about 3000 queries per second. Ultimately, this was due to the way the BOINC locality scheduler works and the fact that our new analysis run did not have many low-frequency workunits. Einstein@Home is the only project that uses the locality scheduler, which is designed to send many workunits for the same data file, only sending a new data file when there is no work left for the previous one. What happened was that many hosts holding low-frequency files (because they were slower than the majority of hosts) requested more work for those files, or requested new workunits that were also for low-frequency files. When the project ran out of work for these files, the locality scheduler would perform an extremely database-intensive 'crawl' through the database looking for more work. So the slowest 20% of hosts were generating very large numbers of database queries looking for non-existent low-frequency workunits. I fixed this by modifying the algorithm that searches for new work. Anyone interested in the details can look at BOINC CVS next week when I check in the modified code.
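The post describes the scheduler change only at a high level. Purely as illustration, here is a minimal sketch of one way the expensive crawl could be short-circuited: remember which data files have already been drained so that repeated requests for them do not fall through to a full scan every time. All names and data structures below are invented for this sketch; this is not the actual BOINC locality-scheduler code.

```cpp
// Hypothetical sketch of a locality-style work lookup with an
// "exhausted file" cache. Invented for illustration; not BOINC code.
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// One unsent workunit belonging to a particular data file.
struct Workunit {
    std::string data_file;
    int id;
};

// Stand-in for the project database: data file name -> queue of unsent work.
using WorkDatabase = std::map<std::string, std::vector<Workunit>>;

class LocalityScheduler {
public:
    explicit LocalityScheduler(WorkDatabase db) : db_(std::move(db)) {}

    // Try to find work for a file the requesting host already has.
    // Without the exhausted-file cache, every request naming a drained
    // file would fall through to the costly full scan below.
    std::optional<Workunit> find_work(const std::vector<std::string>& host_files) {
        for (const auto& file : host_files) {
            if (exhausted_.count(file)) continue;   // known to be empty: skip it
            auto it = db_.find(file);
            if (it != db_.end() && !it->second.empty()) {
                Workunit wu = it->second.back();
                it->second.pop_back();
                return wu;
            }
            exhausted_.insert(file);                // remember the miss
        }
        return full_scan();                         // last resort: crawl everything
    }

private:
    // The costly path: scan every file for leftover work. Many slow hosts
    // all hitting this path at once is what overloads the database.
    std::optional<Workunit> full_scan() {
        for (auto& entry : db_) {
            auto& queue = entry.second;
            if (!queue.empty()) {
                Workunit wu = queue.back();
                queue.pop_back();
                return wu;
            }
        }
        return std::nullopt;                        // no work anywhere
    }

    WorkDatabase db_;
    std::set<std::string> exhausted_;
};
```

With a cache like this, a request naming an already drained file triggers the costly scan at most once per file rather than on every scheduler request, which is the kind of change that could plausibly bring the query rate down from thousands of queries per second to tens. (A real scheduler would also need to invalidate the cache when new work for a file is generated.)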
The database is now averaging about 60 to 80 queries per second, and the database server and project servers are once again snappy and responsive.
(2) File server problems. Our project uses three file servers, each of which has about 8 TB of RAID-6 disk space. The file servers use Areca 24-port SATA controller cards and Western Digital WD4000YR disks. For a number of months we had been experiencing problems in which a disk would apparently drop out of the array and then reappear a few seconds later, prompting a RAID array rebuild. In the end we sent one of our server boxes (approximately 80 kg, worth about 10kUSD) by express mail to Taiwan, and the Areca engineers examined it more closely. (Many thanks to these engineers, who have given us first-rate support!) It turned out that our problems were due to a hardware issue with the WD4000YR drives: they have a SATA interface chip which (in some revisions of the WD4000YR) is incompatible with an interface chip used on the Areca RAID controller. The incompatibility is only triggered by issuing NCQ (Native Command Queuing) commands, so disabling NCQ on the RAID controller fixed the problem. Our two remaining file servers have now been working without issues for more than two weeks.
These things were further exacerbated by my move to Germany with my family (our kids are 2.5 and 6 years old), which meant that I couldn't give these issues enough attention until now.
Hopefully these problems are behind us! I am grateful to everyone for their patience, and apologize for how long it took to track these things down and deal with them.
Cheers,
Bruce Allen
Director, Einstein@Home