Looking at the numbers, we see that the ABP2 and S5GC1 caches are draining, while BRP3 and S5GC1HF are still growing. So when the number of tasks to send for BRP3 grows to about 30,000, we'll see the WUG paused for some time. Am I right?
ABP2 and S5GC1 are draining because those research runs are essentially complete, and we're merely chasing down the stragglers.
BRP3 and S5GC1HF are still growing because they are current, active research runs with 'Work still remaining', as it says further down the page.
ABP2 and S5GC1 are being phased out. In particular, there are no workunit generators (WUGs) running that would generate new workunits for these applications. New tasks for these applications will only be generated for the "workunits without canonical result", e.g. because tasks "in progress" get reported as client errors (or because we manually raise the number of tasks to be sent out for these workunits in order to finish these "runs" faster).
Eventually the "Workunits without canonical result" of these applications will reach zero. A week later these will be purged from the database and total tasks and workunits for these applications will show zero, too. Then we usually "deprecate" this application, which means that it doesn't show up on the server status page at all anymore. During the past few weeks this could be observed with "S5GCE".
The variation we see in the numbers of S5GC1HF is actually pretty small; it has reached a more or less steady state by now. When BRP3 goes from experimental to production level and the BRP3 throughput is increased, the numbers for S5GC1HF will decrease noticeably.
For less than 15 minutes the "Tasks to send" count of BRP3 was slightly above 30,000, and the BRP WUGs stopped. However, they are started again every five minutes to see whether the number of unsent tasks has dropped below 20,000, and terminate immediately if it hasn't. The two WUGs shown as running are actually an artifact of the delay caused by the WUGs' database access and the script that checks their status: technically the processes are running, but they won't generate additional workunits.
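The start/stop behaviour described above is a classic hysteresis throttle: stop generating at a high-water mark, resume only once the queue has drained below a lower one. A minimal sketch of one five-minute WUG invocation (the names and the batch size are illustrative, not the actual BOINC daemon code; only the 20,000/30,000 thresholds come from the post above):

```python
# Sketch of the WUG throttle described above: a cron-style loop starts
# the WUG every five minutes; it exits immediately unless the unsent-task
# count has fallen below the low-water mark, then refills the queue.
LOW_WATER = 20_000   # resume generating only below this
HIGH_WATER = 30_000  # fill the queue back up to this level

def wug_pass(unsent_tasks, batch=1_000):
    """One five-minute WUG invocation; returns the new unsent-task count."""
    if unsent_tasks >= LOW_WATER:
        return unsent_tasks          # terminate immediately, generate nothing
    while unsent_tasks < HIGH_WATER:  # generate workunits in batches
        unsent_tasks += batch
    return unsent_tasks
```

The gap between the two thresholds is what keeps the WUGs from flapping on and off every few minutes around a single cutoff.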
(edit) Because it's such a large and complex database, and traditionally a bit persnickety, it's probably not trivial (or wise) to query it for these tallies any closer to the true real-time figures. Each query reduces throughput performance...
There is a great mechanism created especially for such situations. It is called a REGISTER. This wonderful thing is designed to accumulate exactly the kind of numbers we see on the status page. I first saw it in "1C:Accounting", an integrated programmable complex for trade accounting (http://v8.1c.ru/eng/the-system-of-programs/); you know, we have very complicated tax laws and tax accounting here in Russia. Any event (query, insertion or deletion of database records) changes the corresponding register, and a register always holds the current total of whatever it tracks. That gives us current information as soon and as often as we want to receive it.
And it would be wonderful to watch the numbers change in real time, like on a counter. But I think this proposal should be addressed to the BOINC developers.
I think a more powerful mechanism already exists. It's called a READ ONLY TRANSACTION, in Oracle Database terms.
:)
A read-only transaction adds load to the database server. Instead, it would be better to use stored procedures (maybe I'm thinking of MS SQL? I'm not a SQL guru yet, sorry) attached to database events. On each such event, the attached procedure updates the corresponding registers. Then there is no need for a transaction at all, only a single addition or subtraction on the register in question. That is why I insist on using registers.
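The register idea described above can be sketched with database triggers. Here is a minimal, self-contained illustration using SQLite standing in for the project's MySQL database; the table and column names are invented for the example:

```python
import sqlite3

# Sketch of a trigger-maintained "register": every insert or delete on
# the task table adjusts a single counter row, so reading the current
# tally is a one-row lookup instead of a full COUNT(*) over a huge table.
# (SQLite stands in for MySQL here; all names are illustrative.)
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE task (id INTEGER PRIMARY KEY, app TEXT);
    CREATE TABLE register (name TEXT PRIMARY KEY, value INTEGER);
    INSERT INTO register VALUES ('tasks_to_send', 0);

    CREATE TRIGGER task_ins AFTER INSERT ON task BEGIN
        UPDATE register SET value = value + 1 WHERE name = 'tasks_to_send';
    END;
    CREATE TRIGGER task_del AFTER DELETE ON task BEGIN
        UPDATE register SET value = value - 1 WHERE name = 'tasks_to_send';
    END;
""")

con.executemany("INSERT INTO task (app) VALUES (?)", [("BRP3",)] * 5)
con.execute("DELETE FROM task WHERE id = 1")
count = con.execute(
    "SELECT value FROM register WHERE name = 'tasks_to_send'").fetchone()[0]
print(count)  # 4
```

The trade-off is that the counter update happens inside every write, which shifts a small cost onto the hot path in exchange for cheap reads.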
In Oracle, a read-only transaction does not increase the server load if its duration is shorter than the undo_retention parameter (900 seconds by default). But of course, each project database will be in a different situation.
Does MySQL support an equivalent? The BOINC server was designed without a DAL, and is too tightly coupled to MySQL to swap it out for a different DB. That's unfortunate, since some of the larger projects could use the increased scalability of competing products.
MySQL has a SERIALIZABLE mode for transactions, but MySQL is a "blocking" database and this mode would push its performance toward zero (if I understand right).
I think it would be wonderful to add a row called "overall number of tasks" to the "Workunits and tasks" table. Right now we only see the overall number for S5GC1HF, but it would be very interesting to see these numbers for every run we do.
BTW, does anybody there have information (ideally in table form) about all the runs we have completed so far? E.g. the total number of WUs in the run, total number of FLOPS, total time to complete the run (wall-clock time, preferably in days), total number of cores/computers/processors/volunteers involved, total machine time, and average productivity of the run in cobblestones per day or FLOPS?
P.S. Thank you, Jord, for the help with the table ;)
P.P.S. Sadly, the "Total needed" days for S5GC1HF have grown to almost 110. This is because of BRP3 coming out into its production phase, even though it uses only 0.2% of a core. It seems many people reduced their "on multiprocessor systems, use at most ... %" setting to open the road for BRP3. Usually "Total needed" starts decreasing once 40-50% is already complete. :(
P.P.P.S. Yeah! The ABP2 "tasks to send" is zero now. But the last 43 WUs are still in progress. I'll be glad to crunch them as fast as possible if any of them suddenly errors out ;)
Total number of tasks isn't really meaningful for the BRP searches. The LIGO data is heavily culled, so we're only analyzing a small chunk of each science run. In contrast, Arecibo is generating new data every day, and all of its take is being processed, so there will always be more data being fed into the BRP system. I don't know whether the other telescopes whose data is being processed are doing daily collections suitable for a BRP search, or whether it's just a case of existing, completed data sets, collected for something else, that happen to be suitable for analysis.
There was an ABP1/2 progress page that showed the rate of data processed vs. data collected, but it hasn't been updated for BRP yet:
http://einstein-dl.aei.uni-hannover.de/EinsteinAtHome/ABP/
Yes, I understand that the BRP search is somewhat open-ended. But all the other searches do have a certain volume of data split into WUs, so the number of these WUs is known from the beginning of each run. That's why I asked for the row to be added.
Oh! I see the status page content has changed. Now we see the BRP3 status instead of ABP2, and a "Computing" section above it. It looks like BRP3 will last about a year, continually taking time away from S5GC1HF, which reaches the second half of its computation volume today. The "total needed" runtimes of all previous searches went down while they were in progress, but this new one (S5GC1HF) has been going up almost from the beginning. Is that because BRP3 is consuming enough CPU time to make the search longer?
Yes. When we started S5GC1HF there was no radio-pulsar search (ABP2 or BRP3) running (well, to be precise, the last tasks of ABP2 were being shipped).
BRP3 was delayed by almost two months, and we're still experimenting with it. While we ramp up its output, the computation spent on S5GC1HF decreases, and the estimated end of S5GC1HF moves later.
Right now I'm trying to find the limits of the system, i.e. how much BRP3 work we can ship. I'm shifting the scheduling ratio between GW and RP work towards BRP3 about twice a week. I know that we couldn't run the project with more than ~40% ABP2 work. With the new setup and the new workunit generators of BRP3, some limitations have been removed; I'd like to see how far we could get in case we need to.
Then ultimately I'd like to settle for about 50/50 GW/RP search in normal operation.
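The moving estimate described above is simple arithmetic: "Total needed" is roughly the remaining work divided by the throughput currently devoted to that search, so every shift of the scheduling ratio towards BRP3 stretches the S5GC1HF estimate. A toy illustration (all figures invented for the example, not actual project numbers):

```python
# Toy model of the "Total needed" estimate: remaining work divided by
# the share of total project throughput devoted to this one search.
# All figures below are invented for illustration only.
def days_needed(remaining_work, total_throughput, search_share):
    """Estimated days to finish, given this search's throughput share."""
    return remaining_work / (total_throughput * search_share)

TOTAL = 100.0        # project throughput, arbitrary work units per day
REMAINING = 5_000.0  # S5GC1HF work left, same units

before = days_needed(REMAINING, TOTAL, 1.0)  # GW search had all throughput
after = days_needed(REMAINING, TOTAL, 0.5)   # after moving to a 50/50 split
print(before, after)  # 50.0 100.0
```

Halving the share doubles the estimate, which is why "Total needed" can climb even while steady progress is being made.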
Use the [pre ][/pre ] tags in BBCode (hold the space).
So, BRP becomes equally important as the GW search, does it?