Possible Answers to some of your Questions

Odysseus
Joined: 17 Dec 05
Posts: 372
Credit: 20576627
RAC: 5808

RE: RE: Pages are loading

Message 60797 in response to message 60796

Quote:
Quote:
Pages are loading extremely fast.

Not here; the Einstein website and forums are loading very slowly, and that's with my high-speed DSL connection 20 miles from UWM.


Last night I experienced a very refreshing improvement in the website’s responsiveness for a while: pages were loading almost instantly. But today it’s back to the ‘molasses flowing uphill in January’ performance we’ve been getting for the past few weeks.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2959962752
RAC: 708651

As the front page says, work

As the front page says, work is ongoing, and I keep losing all connections - but when the server is up, it seems to be much more responsive.

And my Celeron has just downloaded new work for the first time in over a week.

Definite signs of progress - keep up the good work. (And thanks for the daily updates on the front page).

Terry
Joined: 18 Feb 07
Posts: 7
Credit: 9229
RAC: 0

The webpages and forum are

The webpages and forum are remarkably speedy this early AM (5:30 CST). I no longer feel like I did back in the old dialup days. Excellent work to the E@H team.

--Terry photostuff.org

Vladimir Zarkov
Joined: 27 Feb 05
Posts: 66
Credit: 4876895
RAC: 0

The project looks better and

The project looks better and better today - the servers have started gobbling up that unvalidated load, as my dwindling pending credit shows. And the curve in the Total Credit chart in my BOINC Manager points at the sky right now. How can it not make me happy? :)))
Heroic work again. Hats off to the project's team.

qdemn7
Joined: 20 Feb 05
Posts: 12
Credit: 3414228
RAC: 0

Looking very good right now

Looking very good right now at 6:07am CST.

kami4ligo
Joined: 15 Mar 05
Posts: 48
Credit: 16105651
RAC: 0

Good to see things going back

Good to see things going back to normal, and very good to see news on the home page about the progress made. Thanks a lot.

-rg-

(But my two boxes remain committed to 88% to climateprediction - these WUs take long weeks to complete, and it's stupid to throw away whatever work was done on them.)

F. Prefect
Joined: 7 Nov 05
Posts: 135
Credit: 1016868
RAC: 0

RE: Looking very good right

Message 60803 in response to message 60801

Quote:
Looking very good right now at 6:07am CST.

Everything appears to be back to normal as of 11:00AM CST. I still have about 3 times my usual pending numbers, but they have been falling rapidly all morning.

F. Prefect

In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.....Douglas Adams

kami4ligo
Joined: 15 Mar 05
Posts: 48
Credit: 16105651
RAC: 0

Gary Roberts, pls ... The

Gary Roberts, pls ...

The two threads Links to informative posts ... in Cruncher's Corner and in Problems and Bug Reports no longer have a purpose.

Please remove them in any way you think appropriate. Thanks & regards.

-rg-

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117728865595
RAC: 34946568

RE: Please remove them in

Message 60805 in response to message 60804

Quote:


Please remove them in any way you think appropriate. Thanks & regards.

-rg-

Both deleted at your request. When a thread is deleted, a category for the type of deletion has to be assigned. The only options are:-

  • Obscene,
  • Flame/hate mail, and
  • Commercial spam.

Obviously your thread fits none of these. Please don't take offense that it was categorised as spam when you get an email informing you of the deletion :).

Thanks for your assistance during the period of the database problems.

Cheers,
Gary.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Dear Einstein@Home volunteers

Dear Einstein@Home volunteers and contributors,

I thought I would post a description of what went wrong and how it was fixed.

(1) Project performance problems. These were due to our database getting overloaded. It was processing an average of 950 queries per second, with peaks of up to about 3000 queries per second. Ultimately, this was caused by the way that the BOINC locality scheduler works, combined with the fact that our new analysis run did not have many low-frequency workunits.

Einstein@Home is the only project that uses the locality scheduler, which is designed to send many workunits for the same data file, only sending a new data file when there is no work left for the previous data file. What happened was that many hosts that had low-frequency files (because they were slower than the majority of hosts) requested more work for these files, or new workunits that also used low-frequency files. When the project ran out of work for these files, the locality scheduler would then perform an extremely database-intensive 'crawl' through the database looking for more work. So the slowest 20% of hosts were generating very large numbers of database queries looking for non-existent low-frequency workunits.

I fixed this by modifying the algorithm that searches for new work. Anyone interested in the details can look at BOINC CVS next week when I check in the modified code.

The database is now averaging about 60 to 80 queries per second, and the database server and project servers are once again snappy and responsive.
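For readers who want a feel for the behaviour described in (1), here is a toy Python model. It is only a sketch under stated assumptions, not the BOINC scheduler code: the class `LocalityScheduler`, the `lookups` counter (a stand-in for database queries), and the file names are all hypothetical. The `remember_exhausted` flag is one plausible way to avoid repeated searches for work that no longer exists; the post does not say what the actual fix was, so it should not be read as the real change.

```python
class LocalityScheduler:
    """Toy model of locality scheduling. Not the BOINC implementation."""

    def __init__(self, workunits_by_file):
        # Map each data file name to the workunits still unsent for it.
        self.pending = {f: list(w) for f, w in workunits_by_file.items()}
        self.exhausted = set()  # data files already found to have no work
        self.lookups = 0        # stand-in for database queries issued

    def _take(self, data_file):
        """Fetch one workunit for data_file, counting one simulated query."""
        self.lookups += 1
        queue = self.pending.get(data_file)
        if queue:
            return queue.pop(0)
        self.exhausted.add(data_file)
        return None

    def request_work(self, host_files, remember_exhausted=True):
        """Prefer files the host already holds, then fall back to new files."""
        others = [f for f in sorted(self.pending) if f not in host_files]
        for data_file in list(host_files) + others:
            if remember_exhausted and data_file in self.exhausted:
                continue  # skip the query: this file is known to be empty
            workunit = self._take(data_file)
            if workunit is not None:
                return data_file, workunit
        return None


def simulate(remember_exhausted, requests=3):
    """Count simulated queries for hosts that hold only exhausted files."""
    scheduler = LocalityScheduler({
        "low_a": [],                      # hypothetical low-frequency files,
        "low_b": [],                      # with no workunits left
        "high_a": ["wu1", "wu2", "wu3"],  # hypothetical file with work
    })
    for _ in range(requests):
        scheduler.request_work(["low_a", "low_b"], remember_exhausted)
    return scheduler.lookups


print("naive lookups:", simulate(False))
print("with exhausted-file memory:", simulate(True))
```

In this toy model every request from a host holding only empty files repeats the fruitless lookups unless the scheduler remembers which files are exhausted, which is the same shape of problem as the 'crawl' described above, on a much smaller scale.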

(2) File server problems. Our project uses three file servers, each of which has about 8TB of RAID-6 disk space. The file servers use Areca 24-port SATA controller cards, and Western Digital WD4000YR disks. For a number of months we have been experiencing problems in which a disk would apparently drop from the array and then reappear a few seconds later, prompting a RAID array rebuild. In the end we sent one of our server boxes (approximately 80 kg, worth about 10kUSD) by express mail to Taiwan, and the Areca engineers looked at it more closely. (Many thanks to these engineers, who have given us first-rate support!) It turned out that our problems were due to a hardware problem with the WD4000YR drives. They have a SATA interface chip which (in some revisions of the WD4000YR) is incompatible with an interface chip used on the Areca RAID controller. This incompatibility is only triggered by issuing NCQ commands. So by disabling NCQ on the RAID controller, the problem was fixed. Our two remaining file servers have now been working without issues for more than two weeks.
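As a rough cross-check of the "about 8TB" figure, the arithmetic can be sketched as follows. This assumes that all 24 controller ports hold one 400 GB disk (the nominal capacity of the WD4000YR) in a single RAID-6 array with no hot spares; the post does not confirm that layout, and the function name `raid6_usable_bytes` is just for illustration.

```python
def raid6_usable_bytes(disk_count, disk_bytes):
    """RAID-6 stores two disks' worth of parity, so usable space is (n - 2) disks."""
    if disk_count < 4:
        raise ValueError("RAID-6 needs at least 4 disks")
    return (disk_count - 2) * disk_bytes


DISK_BYTES = 400 * 10**9  # nominal 400 GB per WD4000YR (decimal gigabytes)
PORTS = 24                # assumption: every port populated, one array

usable = raid6_usable_bytes(PORTS, DISK_BYTES)
usable_tb = usable / 10**12   # decimal terabytes
usable_tib = usable / 2**40   # binary tebibytes, as many tools report

print(f"{usable_tb:.1f} TB, {usable_tib:.2f} TiB")
```

Under these assumptions the usable space comes out a little under 9 decimal terabytes, or roughly 8 when counted in binary units, which is consistent with the figure quoted above.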

These things were further exacerbated by my move to Germany with my family (our kids are 2.5 and 6 years old) which meant that I couldn't give these issues enough attention until now.

Hopefully these problems are behind us! I am grateful to everyone for their patience, and apologize for how long it took to track these things down and deal with them.

Cheers,
Bruce Allen

Director, Einstein@Home
