BRP4 & FGRP1 download (server) problems

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 317610332
RAC: 379689

RE: This sort of thing

Quote:
This sort of thing should be posted to the front page news section. That way also gets the word out via RSS.


ROFL! Oh, yeah. Right. So the people who cannot reach us can tell us that? :-)

Cheers, Mike.

( edit ) For the rest of us : those who hold the validators for editing the web content are currently incommunicado .... but the next time my car runs out of petrol I'll be sure to drive it to the next town to fill up.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 317610332
RAC: 379689

Now, as to the original issue

Now, as to the original issue : it seems E@H may be a victim of its own success. Again, alas. Posters may recall analogous problems in the past when there's been a change in workflow patterns due to new work unit types etc. Thresholds get reached, bandwidths peak ..... that sort of thing. AFAIK a key problem is maintaining logical coherence of activities across separated hardware. Naturally in a perfect world with infinite funds, plenty of staff and an accurate crystal ball these scenarios would be escaped or never entered. :-)

In any case please bear with us. Most likely temporizing measures will be put in place and then followed by more lasting ones. Right now there's a lot of back end discussion on a wide range of alternatives. Your patience is very much appreciated, but I guess now might be the time ( & I can't think of a better sort of occasion ) to switch to a backup BOINC project of your choice in the meantime, if that suits your mindset.

Cheers, Mike.


Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250539239
RAC: 34258

As for the network outage in

As for the network outage in Hannover: A couple of network switches suddenly blew fuses; the reason is being investigated. Probably a power malfunction.

Anyway, the switches are back to normal operation; the server issue is still being worked on.

BM

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171438
RAC: 34

Ok, we should be back on

Ok, we should be back on track now. We identified the cause and fixed it. Data are flowing again. We'll monitor the situation and ramp up BRP/FGRP work unit distribution over the next hours/days...

Thanks for your patience!

Oliver

Einstein@Home Project

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958409574
RAC: 712991

RE: Ok, we should be back

Quote:

Ok, we should be back on track now. We identified the cause and fixed it. Data is flowing again. We'll monitor the situation and ramp up BRP/FGRP work unit distribution over the next hours/days...

Thanks for your patience!

Oliver


Would you mind telling us what it turned out to be, in case the experience might be useful for other BOINC projects?

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171438
RAC: 34

RE: Would you mind telling

Quote:

Would you mind telling us what it turned out to be, in case the experience might be useful for other BOINC projects?

Sure! A few months ago we noticed that Apache wasn't able to handle the BRP/FGRP download requests anymore and switched to lighttpd, which turned out to be more suitable for our specific setup, data type and access pattern. The load increased even further, and we seem to have crossed a crucial threshold last week such that lighttpd also wasn't up to the task anymore. Various filesystem/network/daemon tests revealed that the web server was in fact the bottleneck, so we have now moved to nginx, the very efficient web server that powers Facebook, WordPress, SourceForge and GitHub, for instance (third, almost second, most popular web server).

Best,
Oliver


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958409574
RAC: 712991

Many thanks. Although BRP4 is

Many thanks. Although BRP4 is probably the highest-download-traffic sub-project I know, there are others with high flows - that could well be useful advice/experience for other admins.

zombie67 [MM]
Joined: 10 Oct 06
Posts: 121
Credit: 495236375
RAC: 1363760

RE: RE: This sort of

Quote:
Quote:
This sort of thing should be posted to the front page news section. That way also gets the word out via RSS.

ROFL! Oh, yeah. Right. So the people who cannot reach us can tell us that? :-).


I don't understand your point. Everyone could get to the web site (and RSS) just fine. It was only the upload/download of tasks that wasn't working. It would have been good to announce the issue, so that crunchers would know to redirect their machines to other projects for the duration. And it helps head off all the posts from people asking "what's up?".

Reno, NV Team: SETI.USA

telegd
Joined: 17 Apr 07
Posts: 91
Credit: 10212522
RAC: 0

Is it just me or have we run

Is it just me or have we run out of BRP4 work today?

I just checked the server status page, which has "Tasks to send" at 0.

Not sure if that was planned...

Svenie25
Joined: 21 Mar 05
Posts: 139
Credit: 2436862
RAC: 0

RE: Is it just me or have

Quote:

Is it just me or have we run out of BRP4 work today?

I just checked the server status page, which has "Tasks to send" at 0.

Not sure if that was planned...

I don't think it was planned, but I wonder why nobody asked about this until now. ;)
