Not here. The Einstein website and forums are loading very slowly, and that's with my high-speed DSL connection 20 miles from UWM.
RE: RE: Pages are loading
Last night I experienced a very refreshing improvement in the website's responsiveness for a while: pages were loading almost instantly. But today it's back to the 'molasses flowing uphill in January' performance we've been getting for the past few weeks.
As the front page says, work
As the front page says, work is ongoing, and I keep losing all connections - but when the server is up, it seems to be much more responsive.
And my Celeron has just downloaded new work for the first time in over a week.
Definite signs of progress - keep up the good work. (And thanks for the daily updates on the front page).
The webpages and forum are
The webpages and forum are remarkably speedy this early AM (5:30 CST). I no longer feel like I did back in the old dialup days. Excellent work to the E@H team.
--Terry photostuff.org
The project looks better and
The project looks better and better today - the servers have started gobbling up that backlog of unvalidated work, as my dwindling pending credit shows. And the curve in the Total Credit chart in my BOINC Manager is pointing at the sky right now. How can that not make me happy? :)))
Heroic work again. Hats off to the project's team.
Looking very good right now
Looking very good right now at 6:07am CST.
Good to see things going back
Good to see things going back to normal, and very good to see news on the home page telling us about the progress being made. Thanks a lot.
-rg-
(But my two boxes remain committed 88% to climateprediction - those WUs take long weeks to complete, and it's stupid to throw away whatever work has already been done on them.)
RE: Looking very good right
Everything appears to be back to normal as of 11:00AM CST. I still have about 3 times my usual pending numbers, but they have been falling rapidly all morning.
F. Prefect
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.....Douglas Adams
Gary Roberts, pls ... The
Gary Roberts, pls ...
The two threads "Links to informative posts ..." in Cruncher's Corner and in Problems and Bug Reports no longer have a purpose.
Please remove them in any way you think appropriate. Thanks & regards.
-rg-
RE: Please remove them in
Both deleted at your request. When a thread is deleted, a category for the type of deletion has to be assigned. The only options are:
*Obscene,
*Flame/hate mail, and
*Commercial spam.
Obviously your threads fit none of these, so please don't take offense that they were categorised as spam when you get the emails informing you of the deletions :).
Thanks for your assistance during the period of the database problems.
Cheers,
Gary.
Dear Einstein@Home volunteers
Dear Einstein@Home volunteers and contributors,
I thought I would post a description of what went wrong and how it was fixed.
(1) Project performance problems. These were due to our database getting overloaded. It was processing an average of 950 queries per second, with peaks of up to about 3000 queries per second. Ultimately, this was due to the way the BOINC locality scheduler works and the fact that our new analysis run did not have many low-frequency workunits. Einstein@Home is the only project that uses the locality scheduler, which is designed to send many workunits for the same data file, only sending a new data file when there is no work left for the previous one. What happened was that many hosts holding low-frequency files (because they were slower than the majority of hosts) requested more work for those files, or requested new workunits that were also for low-frequency files. When the project ran out of work for these files, the locality scheduler would perform an extremely database-intensive 'crawl' through the database looking for more work. So the slowest 20% of hosts were generating very large numbers of database queries looking for non-existent low-frequency workunits. I fixed this by modifying the algorithm that searches for new work. Anyone interested in the details can look at BOINC CVS next week when I check in the modified code.
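The post describes the scheduler change only at a high level. Purely as illustration, here is a minimal sketch of one way the expensive crawl could be short-circuited: remember which data files have already been drained so that repeated requests for them do not fall through to a full scan every time. All names and data structures below are invented for this sketch; this is not the actual BOINC locality-scheduler code.

```cpp
// Hypothetical sketch of a locality-style work lookup with an
// "exhausted file" cache. Invented for illustration; not BOINC code.
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// One unsent workunit belonging to a particular data file.
struct Workunit {
    std::string data_file;
    int id;
};

// Stand-in for the project database: data file name -> queue of unsent work.
using WorkDatabase = std::map<std::string, std::vector<Workunit>>;

class LocalityScheduler {
public:
    explicit LocalityScheduler(WorkDatabase db) : db_(std::move(db)) {}

    // Try to find work for a file the requesting host already has.
    // Without the exhausted-file cache, every request naming a drained
    // file would fall through to the costly full scan below.
    std::optional<Workunit> find_work(const std::vector<std::string>& host_files) {
        for (const auto& file : host_files) {
            if (exhausted_.count(file)) continue;   // known to be empty: skip it
            auto it = db_.find(file);
            if (it != db_.end() && !it->second.empty()) {
                Workunit wu = it->second.back();
                it->second.pop_back();
                return wu;
            }
            exhausted_.insert(file);                // remember the miss
        }
        return full_scan();                         // last resort: crawl everything
    }

private:
    // The costly path: scan every file for leftover work. Many slow hosts
    // all hitting this path at once is what overloads the database.
    std::optional<Workunit> full_scan() {
        for (auto& entry : db_) {
            auto& queue = entry.second;
            if (!queue.empty()) {
                Workunit wu = queue.back();
                queue.pop_back();
                return wu;
            }
        }
        return std::nullopt;                        // no work anywhere
    }

    WorkDatabase db_;
    std::set<std::string> exhausted_;
};
```

With a cache like this, a request naming an already drained file triggers the costly scan at most once per file rather than on every scheduler request, which is the kind of change that could plausibly bring the query rate down from thousands of queries per second to tens. (A real scheduler would also need to invalidate the cache when new work for a file is generated.)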
The database is now averaging about 60 to 80 queries per second, and the database server and project servers are once again snappy and responsive.
(2) File server problems. Our project uses three file servers, each of which has about 8 TB of RAID-6 disk space. The file servers use Areca 24-port SATA controller cards and Western Digital WD4000YR disks. For a number of months we had been experiencing problems in which a disk would apparently drop out of the array and then reappear a few seconds later, prompting a RAID array rebuild. In the end we sent one of our server boxes (approximately 80 kg, worth about 10kUSD) by express mail to Taiwan, and the Areca engineers examined it more closely. (Many thanks to these engineers, who have given us first-rate support!) It turned out that our problems were due to a hardware issue with the WD4000YR drives: they have a SATA interface chip which (in some revisions of the WD4000YR) is incompatible with an interface chip used on the Areca RAID controller. The incompatibility is only triggered by issuing NCQ (Native Command Queuing) commands, so disabling NCQ on the RAID controller fixed the problem. Our two remaining file servers have now been working without issues for more than two weeks.
These things were further exacerbated by my move to Germany with my family (our kids are 2.5 and 6 years old), which meant that I couldn't give these issues enough attention until now.
Hopefully these problems are behind us! I am grateful to everyone for their patience, and apologize for how long it took to track these things down and deal with them.
Cheers,
Bruce Allen
Director, Einstein@Home