Web replica down

Anonymous

RE: RE: RE: RE: I am

Quote:
Quote:
Quote:
Quote:

I am not sure it's just a "stats" issue. I share my computer with SETI and have plenty of work from them. However, my downloads from E@H have dropped off quite a bit in the last few days. As of right now I have 2 tasks in progress. This is not the norm, so something is not quite right.

I am seeing the following in the online log, which I don't understand:

2013-03-06 16:20:12.3545 [PID=12285] Request: [USER#xxxxx] [HOST#6382800] [IP xxx.xxx.xxx.22] client 7.0.29
2013-03-06 16:20:12.3568 [PID=12285] [debug] [HOST#6382800] Resetting nresults_today
2013-03-06 16:20:12.3576 [PID=12285] [handle] [HOST#6382800] [RESULT#351302051] [WU#151592225] got result (DB: server_state=4 outcome=0 client_state=0 validate_state=0 delete_state=0)
2013-03-06 16:20:12.3576 [PID=12285] [handle] cpu time 0.000000 credit/sec 0.003894, claimed credit 0.000000
2013-03-06 16:20:12.3578 [PID=12285] [handle] [RESULT#351302051] [WU#151592225]: setting outcome SUCCESS
2013-03-06 16:20:12.4147 [PID=12285] [send] effective_ncpus 8 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2013-03-06 16:20:12.4147 [PID=12285] [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2013-03-06 16:20:12.4147 [PID=12285] [send] Not using matchmaker scheduling; Not using EDF sim
2013-03-06 16:20:12.4147 [PID=12285] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-03-06 16:20:12.4147 [PID=12285] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
2013-03-06 16:20:12.4147 [PID=12285] [send] work_req_seconds: 0.00 secs
2013-03-06 16:20:12.4147 [PID=12285] [send] available disk 87.30 GB, work_buf_min 0
2013-03-06 16:20:12.4147 [PID=12285] [send] active_frac 0.999992 on_frac 0.999911 DCF 0.897591
2013-03-06 16:20:12.4169 [PID=12285] Sending reply to [HOST#6382800]: 0 results, delay req 60.00
2013-03-06 16:20:12.4172 [PID=12285] Scheduler ran 0.069 seconds

What is the "matchmaker" comment about?


That's an interesting question, which I'm sure one of our esteemed (and technically adept) moderators will answer in due course.

But it has nothing to do with your downloads dropping off. You're not getting any new work because you're not asking for any new work. Look closer to home.

I am running on Linux x64. If I update E@H using the "update" button in BOINC Manager, I download new WUs. It almost seems as though the "automatic" update is not taking place when jobs are complete. I have looked at the various parameters on this site and do not see any that could affect automatic download of WUs.

What am I missing?

The settings for the cache of work work like this in BOINC version 7:
"Computer is connected to the Internet about every: xx days" is a low water mark.
"Maintain enough work for an additional xx days" forms a high water mark on top of that.
BOINC will request enough work to reach low + high and then wait until the cache drops below the low water mark again before asking for more.
So if you set it to something like 1 + 0.1, BOINC will always keep about one day's worth of work.
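
In rough Python, that hysteresis looks like this (an illustrative sketch only, not the actual BOINC client code; all names are made up):

def work_request_days(buffered_days, connect_every_days, additional_days):
    # Only ask for more work once the buffer falls below the low water mark.
    if buffered_days >= connect_every_days:
        return 0.0
    # When asking, fill back up to the high water mark (low + additional).
    return (connect_every_days + additional_days) - buffered_days

# Example: settings of 1 + 0.1 keep roughly one day's worth of work on hand.
print(work_request_days(0.90, 1.0, 0.1))  # ~0.2 -> below the low mark, tops up
print(work_request_days(1.05, 1.0, 0.1))  # 0.0  -> above the low mark, no request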

If you run more than one project you also have to consider resource share: you have probably run more Einstein work recently than SETI work, and now SETI is allowed to catch up.

Interesting. I am supporting S@H in addition to E@H. S@H had been down for 3 days (Fri, Sat, and most of Sun) for electrical upgrades. They came back up late Sunday, ran Monday, and were down again on Tuesday for their weekly admin functions. This gave E@H free rein over the computer's resources for several days. If a project is down because of admin/hardware upgrades, does it make sense to give it back its time? I thought that timesharing was based upon "online/available" status and that downtime was not factored in. Is my understanding correct?

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956889726
RAC: 719998

RE: RE: RE: 2013-03-06

Quote:
Quote:
Quote:
2013-03-06 16:20:12.4147 [PID=12285] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-03-06 16:20:12.4147 [PID=12285] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00

The settings for the cache of work work like this in BOINC version 7:
"Computer is connected to the Internet about every: xx days" is a low water mark.
"Maintain enough work for an additional xx days" forms a high water mark on top of that.
BOINC will request enough work to reach low + high and then wait until the cache drops below the low water mark again before asking for more.
So if you set it to something like 1 + 0.1, BOINC will always keep about one day's worth of work.

If you run more than one project you also have to consider resource share: you have probably run more Einstein work recently than SETI work, and now SETI is allowed to catch up.


Interesting. I am supporting S@H in addition to E@H. S@H had been down for 3 days (Fri, Sat, and most of Sun) for electrical upgrades. They came back up late Sunday, ran Monday, and were down again on Tuesday for their weekly admin functions. This gave E@H free rein over the computer's resources for several days. If a project is down because of admin/hardware upgrades, does it make sense to give it back its time? I thought that timesharing was based upon "online/available" status and that downtime was not factored in. Is my understanding correct?


Not really. What is counted is the work actually done, without any consideration of why work may not have been available from a particular project at a particular time. Balancing the resource shares between projects is something which is done by your computer: the various project servers don't share information about downtime between themselves - each project is entirely autonomous.
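
To illustrate "done by your computer", here is a deliberately simplified Python sketch of share-based catch-up (the real BOINC client uses a more elaborate debt/recent-credit mechanism; the names and numbers here are made up):

def pick_project(projects):
    # Prefer the project that is furthest behind its resource share.
    total_share = sum(p['share'] for p in projects)
    total_done = sum(p['work_done'] for p in projects)
    def deficit(p):
        entitled = total_done * p['share'] / total_share
        return entitled - p['work_done']
    return max(projects, key=deficit)

projects = [
    # Einstein ran alone while SETI was down, so it is ahead of its share.
    {'name': 'Einstein@Home', 'share': 50, 'work_done': 90000},
    {'name': 'SETI@home', 'share': 50, 'work_done': 10000},
]
print(pick_project(projects)['name'])  # SETI@home: it now gets to catch up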

In this case, your computer requested zero seconds of work for both CPU and GPU - in other words, it asked for no work at all. The reason for that lies entirely in your own computer: it may be possible to deduce it from whatever local logging you were keeping at the time, or you may have to enable extra debug logging and wait for it to happen again.
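
For the record, that extra debug logging is enabled through cc_config.xml in the BOINC data directory. The flags usually suggested for work-fetch problems look like this (then restart the client, or use "Read config files" in the Manager):

<cc_config>
  <log_flags>
    <work_fetch_debug>1</work_fetch_debug>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>

With those enabled, the Event Log shows at each scheduler contact how much work the client decided to request from each project, and why.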

TPCBF
TPCBF
Joined: 24 Nov 12
Posts: 17
Credit: 216330463
RAC: 1552860

RE: - stats have been

Quote:
- stats have been dumped from the master DB now; when they'll pick it up depends on the stats sites.

Thanks, great. BOINCStats already picked it up for me overnight (here in LA)...

Quote:
- work at UWM is progressing faster than feared; we may have the replica working (or at least being worked on) later today.

Good to hear things are getting back to normal...

Ralf

Darth Beaver
Darth Beaver
Joined: 28 Jul 08
Posts: 49
Credit: 14208989
RAC: 0

Hi mate Scootty checked stats

Hi mate Scootty, I checked stats today and yes, E@H has updated its stats, but there seems to be a problem: I've lost my combined stats, my SETI stats are not showing up (I've lost 250,000 points on SETI??), and my E@H score is yesterday's. Hopefully you guys can fix it by the weekend?

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 92

RE: - BOINC offers three

Quote:
- BOINC offers three "schedulers", referred to as "old"/"array", "locality" and "matchmaker". On Einstein@Home we're using the array scheduler to send work for BRP(4) and FGRP(2) and the locality scheduler for GW (S6BucketLVE) work; the matchmaker isn't used. The log entry about it can safely be ignored.


Isn't matchmaker scheduling used in combination with homogeneous redundancy, where hosts are compared to each other to see if they match the required 'level of hardware' before work is sent out to them?

Nobody316
Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0

yay our Einstein@Home server

Yay, our Einstein@Home server status page is fully back up.

PC setup MSI-970A-G46 AMD FX-8350 8 core OC'd 4.45GHz 16GB ram PC3-10700 Geforce GTX 650Ti Windows 7 x64 Einstein@Home

Bernd Machenschalk
Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250470891
RAC: 35329

We are currently using the

We are currently using the replica hosted @AEI for the Einstein@Home web pages. This means

- the information displayed during ~3-6 AM CET will not be updated (while the DB is dumped to disk)

- all information fetched from the replica is transferred (twice) across the Atlantic. This may increase latency.

- to limit I/O, the Pendings page has been modified to show only the sum of claimed credit (see the query sketch below). For individual tasks it now refers to the Tasks page, filtered to Pending tasks.

- to make up for that a bit, the Tasks page now shows the number of tasks that match the current selection (State / Application) next to the "Previous 20 / Next 20" navigation. This is not as convenient as the newer web code we are using over at Albert@Home, but it should be better than nothing.
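
For the technically curious: such a pending-credit summary is essentially one aggregate query against the result table. A sketch against the standard BOINC server schema (the actual page may filter slightly differently, and the user id is a placeholder):

SELECT COUNT(*) AS pending_tasks,
       SUM(claimed_credit) AS pending_credit
FROM result
WHERE userid = 123456       -- placeholder user id
  AND server_state = 5      -- OVER: the result has been reported back
  AND outcome = 1           -- SUCCESS
  AND validate_state = 0    -- INIT: not yet validated, i.e. pending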

BM

ggesmundo
ggesmundo
Joined: 3 Jun 12
Posts: 31
Credit: 18699116
RAC: 0

Thanks for the update. I

Thanks for the update. I especially like the counts; it saves a lot of I/O compared to paging through them just to get a count.

Bernd Machenschalk
Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250470891
RAC: 35329

RE: Seems every time

Quote:
Seems every time something with the hardware on pretty much any of the DC projects goes tits up, it does so big time... :-(

Well, the replica failure is not such a big deal. The main problem was that it coincided with a much larger problem, independent of Einstein@Home, that tied up the manpower that would otherwise have been available to fix the E@H problem quickly.

BM

Bernd Machenschalk
Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250470891
RAC: 35329

RE: Isn't the matchmaker

Quote:
Isn't matchmaker scheduling used in combination with homogeneous redundancy, where hosts are compared to each other to see if they match the required 'level of hardware' before work is sent out to them?

No, HR (homogeneous redundancy) works even with the classical "array" scheduler.

The "matchmaker" scheduler generates a "score" for each task in the array for the host in question, in order to pick the best matching task from the array (the scoring function must be supplied by the project). In contrast the "array" scheduler treats all task equally and just picks the first ones satisfying a couple of constraints (like disk space, computing time and possibly platform if HR is used).

BM
