Here’s an impressive illustration of how great the backlog was, from the E@h overview page on BOINCstats:
—it appears that over sixty million credits were granted yesterday.
P.S. The peak is obviously very high; we’ll see how broad it gets over the next few days. It’s a little like watching the light-curve of a supernova …
Well... I still hope that we get an answer as to what caused the issue in the first place.
For now I'm sure that the issues were caused by switching from BOINC 506 to 507 and introducing a "bug" that wasn't there before (that's the server version, of course).
So please tell us what caused the issues in the first place...
Thanks.
Matt L post
Not a guess, but an actual post by a BOINC developer explaining that there were database issues. If you read the whole thread, it does explain that Einstein is part of this issue.
I expect it will be a one-day wonder. From what I could see, they held off the stats export until the validator had finished catching up - and very sensible too.
Matt's post explained what happened, but not why, nor how it came to escape into the wild without checking.
We've just finished 95% of a 16 million WU run with no problems at all: and I think it's generally accepted that the problems with the final 5% were more attributable to data traffic load and fileserver problems, rather than database issues.
So I don't see why the database has such problems with the new 7 million WU run. I'm afraid I just don't buy the 'tipping point' theory, that Moore's Law has speeded up the user base just enough to outrun the servers.
I'm with Crunch3r on this one: I suspect that a BOINC back-end upgrade went out the door without being fully tested. Having seen the state of readiness of the 5.8.x client range when they were declared fit for public use, it wouldn't surprise me if the same thing had happened in the server code.
If that analysis is correct, I hope the BOINC team (by which I mean the integrators and releasers of the volunteer code) learn the lesson: More Haste, Less Speed.
Bruce Allen's post on the subject
If I understand Bruce's post correctly, the late-stage problems in the previous run were directly related to one of this run's problems. When the supply of "short" (as we call them) work units ran lower than the demand (from slow systems), the behavior of the BOINC code bogged the system down very badly with massive slow searches.
Yep, I understood it the same way. Maybe that's why the old P3 here didn't get any more work. No idea, but it sounds kinda logical. Don't worry, it hasn't been idle, my Dad developed a liking for Quantum Monte Carlo so the box was busy there. The post just made me wonder.
Well, never mind. What is more important is that everything now seems to be up and running again and we even got some very good, detailed information about the problems and how they were fixed. Thanks to all the project staff and especially to Dr. Bruce Allen for his latest post (hope you read it over here) and... well, a belated "Welcome to Germany".
You're welcome -- and thank you!
Director, Einstein@Home