I couldn't find a full explanation of what is shown on the status page, so I decided to create a thread about it. Here we can not only explain everything, but also discuss any features that turn up or go missing.
For now, there are a lot of questions.
1) Why isn't "Tasks total" equal to the sum of the numbers below it?
2) Why is "Tasks total" constantly increasing for some projects but not for others?
3) Why don't "Tasks too late" become "Tasks to send" immediately?
And so on. Ask your questions here.
Status page explanation
1) For several reasons, I imagine ...
I'm guessing that the numbers come from a series of queries that run consecutively so that things may have changed a little between each query. This would probably result in small differences due to the dynamic nature of the database. The big factor, I think, is that certain categories of tasks aren't displayed. I don't think there is any accounting shown for tasks which error out in any particular way - such as download errors, compute errors, or any other form of error. I used to think that these were included in the 'invalid tasks' figure but I now think that 'invalid tasks' are just successful tasks that fail validation.
2) Probably because some projects are ramping up and others are ramping down, I would guess. I haven't been following the precise numbers but I imagine that the S5GC1 and ABP2 numbers would be gradually declining. The HF project may have been increasing over the last couple of days because of the shortage of BRP3 tasks. Now that those are flowing again, the BRP3 figure should ramp up quite quickly and the HF figure may actually fall a little. On top of this, there would be an imposed trend following the number of hosts contributing to the project. Do you have any examples that seem to be counter to these trends?
3) Why do you think they don't? Is it just because for some projects (ABP2) there are zero tasks to send and some higher number of 'too late' tasks? If so, it is possibly because the replacement task will be sent quickly but the 'too late' task will remain in the database until the quorum is eventually deleted (just my guess for what it's worth).
Cheers,
Gary.
1) "Tasks total" isn't equal to any combination of the numbers below it, so I guess it reflects something else. Maybe it is the number of tasks generated by the WUG and stored in the database ready to work. That would include "Tasks to send", the number of WUs ready to be sent out to the public.
2) Yes, I watched the page for some time and found that the BRP3 "Tasks total" is constantly increasing, while the numbers for the other projects stay still.
3) I thought this might be a register that only records the number of failed tasks, rather than a counter of failed tasks still remaining in the database, like "Tasks invalid".
Looking at the "S5GC1HF search progress", I found that the "Total needed" number is not equal to the "Tasks total" number. This leads me to conclude that "Tasks total" is only the number of tasks generated by the WUG and temporarily stored in the database ready to work.
Well, I'm not sure that looking for some double entry accounting standard will help here! Probably the 'total' includes many tasks that exist but are not yet due for release, with the 'to send' crew being available whenever someone comes by to pick them up. So perhaps there's an as yet hidden/undeclared category meaning 'not yet to be placed as available'. My guess on that behaviour is simply that the distal pipeline ( beyond E@H ) wants to receive results of analysis more or less in ( ascending ) frequency order to keep its analysis suitably paced, so that's the release strategy to help that. As your observations suggest, there is quite an evident gap in the numbers.
The pulsar counts will frequently go up, at least partly because we are picking up Arecibo realtime now, whereas GW units are bundled in 'runs'.
As for 'too late' vs 'to send', the simplest explanation I can think of is that flow counts are 'pulsed'. Certainly it has been an issue in the past as regards keeping the database coherent over multiple simultaneous accesses, and not overloaded in toto, so it could well be more efficient to do activities in batches - as long as it averages out in the end. Meaning that the sub-task that counts the number of 'to send' WUs is done sometime later than the sub-task that counts the number of 'too late' WUs. A given WU may well be 'instantly' shifted from the 'too late' to the 'to send' basket; it's the tallying that has the lag/non-simultaneity.
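To make the tallying lag concrete, here is a minimal sketch in Python. The counts are invented for illustration (they are not from the status page), and the idea that each category is read by its own query at a slightly different moment is only the guess made above, not a confirmed description of the server.

```python
# Hypothetical counts at three consecutive moments. One task moves from
# "in progress" to "valid" at each step, so the true total is always 100.
TIMELINE = [
    {"to send": 10, "in progress": 50, "valid": 40},
    {"to send": 10, "in progress": 49, "valid": 41},
    {"to send": 10, "in progress": 48, "valid": 42},
]


def tally(timeline, read_at):
    """Read each category at its own time step, as consecutive queries would."""
    return {category: timeline[step][category] for category, step in read_at.items()}


# All categories read at the same moment: the parts add up to the true total.
simultaneous = tally(TIMELINE, {"to send": 0, "in progress": 0, "valid": 0})

# Categories read one step apart: the same task is counted twice.
staggered = tally(TIMELINE, {"to send": 0, "in progress": 1, "valid": 2})

print(sum(simultaneous.values()), sum(staggered.values()))
```

A lag like this can only explain small differences; it would not account for a gap of hundreds of thousands of tasks.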
Cheers, Mike.
( edit ) Because it's such a large and complex database, plus traditionally a bit persnickety, then probably it's not trivial/wise to query it for these tally purposes any closer to the true real time figures. Each query has to reduce throughput performance .....
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Workunits are generated by the workunit generators and put in the database. The transitioner then generates tasks from these; the tasks are sent to the clients, processed, reported back and checked for validity. If no "canonical result" could be found for a workunit during validation, eventually more tasks for this workunit are generated, sent out etc. When a canonical result could be determined, the workunit and all its tasks are kept in the database for another week for inspection. Then these are deleted from the active database. At least that's what we do on Einstein@home.
"Tasks total" and "Workunits total" show the total number of tasks and workunits currently in the active database. The other table rows show tasks and workunits with certain properties, but not all states or situations are covered. Not shown are e.g. the tasks that were reported as a "client error".
Tasks "too late" are tasks that were reported after a canonical result was found for the workunit (precisely: after the result file of the canonical result had been deleted). No additional task is generated for these, as it isn't needed. A new task is generated when a task is overdue, i.e. was not reported by the deadline. The server status page also doesn't list overdue tasks.
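The lifecycle and the "too late" vs "overdue" distinction described above can be sketched roughly as follows. This is only an illustration in Python, not BOINC code: the class and function names are made up, the quorum of 2 is an assumption, and "too late" is simplified to "reported after the canonical result was found" rather than after its result file was deleted.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

RETENTION = timedelta(days=7)  # "another week", per the post above
QUORUM = 2                     # assumption: not stated in this thread


@dataclass
class Task:
    deadline: datetime
    reported_at: Optional[datetime] = None
    result: Optional[str] = None  # None until the client reports back


@dataclass
class Workunit:
    tasks: List[Task] = field(default_factory=list)
    canonical_found_at: Optional[datetime] = None

    def add_task(self, deadline: datetime) -> None:
        """Transitioner step: create one more task for this workunit."""
        self.tasks.append(Task(deadline))

    def validate(self, now: datetime) -> bool:
        """Mark the workunit done once QUORUM reported results agree."""
        reported = [t.result for t in self.tasks if t.result is not None]
        if reported and Counter(reported).most_common(1)[0][1] >= QUORUM:
            if self.canonical_found_at is None:
                self.canonical_found_at = now
            return True
        return False

    def purgeable(self, now: datetime) -> bool:
        """Finished workunits stay in the database for RETENTION, then go."""
        return (self.canonical_found_at is not None
                and now - self.canonical_found_at >= RETENTION)


def task_state(task: Task, wu: Workunit, now: datetime) -> str:
    if task.reported_at is None:
        return "overdue" if now > task.deadline else "in progress"
    # Simplification: compared against the time the canonical result was found.
    if wu.canonical_found_at is not None and task.reported_at > wu.canonical_found_at:
        return "too late"
    return "reported"


def needs_replacement(task: Task, wu: Workunit, now: datetime) -> bool:
    """Only an overdue task triggers a new one; a 'too late' task does not."""
    return wu.canonical_found_at is None and task_state(task, wu, now) == "overdue"
```

In this model a "too late" task never feeds "Tasks to send", which matches the explanation above: the replacement is created when a task becomes overdue, not when a late result finally arrives.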
We are still starting BRP3, had some problems with workunit generation, are still ramping up the output etc. Thus the number of tasks is still growing. The criterion for the workunit generator is the number of "tasks to send": it is programmed to keep this value between 20,000 and 30,000 units (if it can keep up). At some point these numbers will reach a more or less steady state, as was the case for ABP2 for about half a year.
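One common way to keep a value between two marks is a simple hysteresis rule: start generating below the low mark, stop at the high mark, and otherwise keep doing whatever was being done. The sketch below is only a guess at the general shape in Python; the thread gives the 20,000 and 30,000 limits but not the actual logic, and the function name is hypothetical.

```python
LOW_MARK = 20_000   # lower bound for "tasks to send", from the post above
HIGH_MARK = 30_000  # upper bound for "tasks to send", from the post above


def next_generator_state(tasks_to_send: int, generating: bool) -> bool:
    """Return True if the (hypothetical) generator should run in the next cycle."""
    if tasks_to_send < LOW_MARK:
        return True          # queue is running low: (re)start generating
    if tasks_to_send >= HIGH_MARK:
        return False         # queue is full: pause
    return generating        # in between: keep the current state


# BRP3 "tasks to send" from the two snapshots quoted in this thread.
print(next_generator_state(16_045, generating=False))
print(next_generator_state(29_557, generating=True))
```

With a rule like this the "tasks to send" figure would drift up and down inside the band rather than sit at one value.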
I'm not sure that the total number of tasks is really constantly growing on other projects, but I could imagine that these projects are small enough that they don't ever need to purge tasks from their database.
hth,
BM
Edit: sorry if my explanation missed a point, but for me a project is a BOINC project (like Einstein@home), and S5GC1HF, BRP3, ABP2 etc. are applications of Einstein@home.
BM
Indeed, your explanation is quite clear. I missed the key point that tasks and WUs are not the same thing.
BTW, when will the BRP3 progress bar appear on the status page?
There is a great thing created especially for such situations: it is called a REGISTER. It was designed to collect exactly the kind of numbers we see on the status page. I first saw these in "1C:Accounting", an integrated programmable system for trade accounting http://v8.1c.ru/eng/the-system-of-programs/ (you know, we have very complicated tax laws, and therefore tax accounting, here in Russia). Any event (query, insertion or deletion of database records) changes a certain register, and at any time each register holds the current status of what is stored in it. It gives us current information as soon and as often as we want it.
And it would be wonderful to see the numbers changing in real time, like on a counter. But I think this proposal should be addressed to the BOINC developers.
I think a more powerful mechanism already exists: a READ ONLY TRANSACTION, in Oracle Database terms.
:)
A read-only transaction puts additional load on the DB server. Instead, it would be better to use stored procedures (maybe I'm thinking of MS SQL? I'm not an SQL guru yet, sorry) attached to database events. On any such event, the attached procedure updates the corresponding registers. No transaction is needed at all, only a single addition or subtraction on the register in question. That is why I insist on using registers.
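The register idea maps fairly directly onto database triggers. Below is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for the real server only so the example is self-contained, and the `task` and `register` tables are invented for illustration; they are not the BOINC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
CREATE TABLE register (state TEXT PRIMARY KEY, n INTEGER NOT NULL);
INSERT INTO register (state, n)
    VALUES ('to send', 0), ('in progress', 0), ('valid', 0);

-- Each trigger adjusts the register by one whenever a task row changes.
CREATE TRIGGER task_inserted AFTER INSERT ON task BEGIN
    UPDATE register SET n = n + 1 WHERE state = NEW.state;
END;
CREATE TRIGGER task_deleted AFTER DELETE ON task BEGIN
    UPDATE register SET n = n - 1 WHERE state = OLD.state;
END;
CREATE TRIGGER task_moved AFTER UPDATE OF state ON task BEGIN
    UPDATE register SET n = n - 1 WHERE state = OLD.state;
    UPDATE register SET n = n + 1 WHERE state = NEW.state;
END;
""")


def register_value(state: str) -> int:
    """Read a counter from the register: a single-row lookup, no table scan."""
    row = conn.execute("SELECT n FROM register WHERE state = ?", (state,)).fetchone()
    return row[0]


def scanned_value(state: str) -> int:
    """The same number obtained the expensive way, by counting task rows."""
    row = conn.execute("SELECT COUNT(*) FROM task WHERE state = ?", (state,)).fetchone()
    return row[0]


conn.executemany("INSERT INTO task (state) VALUES (?)", [("to send",)] * 3)
conn.execute("UPDATE task SET state = 'in progress' WHERE id = 1")
conn.execute("DELETE FROM task WHERE id = 2")
print(register_value("to send"), register_value("in progress"))
```

The trade-off is that reading a counter becomes cheap while every insert, update and delete does a little extra work, so whether it helps depends on how write-heavy the database is.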
Does MySQL support an equivalent? The BOINC server was designed without a data access layer (DAL) and is too tightly coupled to MySQL to swap it out for a different DB. That is unfortunate, since some of the larger projects could use the increased scalability of competing products.
I've made (thanks to Opera cache) two screen shots of a status page, yesterday and today.
First, 7:50 UTC, 6 Jan 2011:
Work                                  S5GC1HF     BRP3   S5GC1   ABP2      in DB
Tasks total                         1,514,404  133,106  13,851  2,001  1,663,362
Tasks to send                          12,690   16,045   1,170      3     29,908
Tasks in progress                     301,755   36,749   1,322     63    339,889
Tasks valid                           833,579   47,084   3,319    895    884,877
Tasks invalid                             700    2,126      43     15      2,884
Tasks too late                          6,464       28     224     10      6,726
Workunits total                       664,650   58,757   2,331    214    725,952
Workunits without canonical result    249,987   35,210     764     24    285,985
Second, 7:55 UTC, 7 Jan 2011:
Work                                  S5GC1HF     BRP3   S5GC1   ABP2      in DB
Tasks total                         1,499,320  156,961  12,984  1,897  1,671,162
Tasks to send                          11,550   29,557     622      1     41,730
Tasks in progress                     300,066   39,543   1,551     52    341,212
Tasks valid                           827,448   49,713   3,282    862    881,305
Tasks invalid                             679    2,235      39     13      2,966
Tasks too late                          6,437       21     211     10      6,679
Workunits total                       658,423   69,742   2,133    191    730,489
Workunits without canonical result    246,853   44,888     592     20    292,353
(sorry for not a table representation)
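As a quick check on question 1, the short Python script below adds up the task rows that the page does list for the 6 Jan snapshot and compares the sum with "Tasks total". The figures are copied from the post above; the function name is mine.

```python
# Task rows of the snapshot taken at 7:50 UTC, 6 Jan 2011.
SNAPSHOT = {
    "S5GC1HF": {"total": 1_514_404, "to send": 12_690, "in progress": 301_755,
                "valid": 833_579, "invalid": 700, "too late": 6_464},
    "BRP3": {"total": 133_106, "to send": 16_045, "in progress": 36_749,
             "valid": 47_084, "invalid": 2_126, "too late": 28},
    "S5GC1": {"total": 13_851, "to send": 1_170, "in progress": 1_322,
              "valid": 3_319, "invalid": 43, "too late": 224},
    "ABP2": {"total": 2_001, "to send": 3, "in progress": 63,
             "valid": 895, "invalid": 15, "too late": 10},
}


def unaccounted(row):
    """Tasks counted in 'total' but in none of the listed categories."""
    listed = sum(count for name, count in row.items() if name != "total")
    return row["total"] - listed


gaps = {app: unaccounted(row) for app, row in SNAPSHOT.items()}
for app, gap in gaps.items():
    share = gap / SNAPSHOT[app]["total"]
    print(f"{app:8s} unaccounted: {gap:>9,d} ({share:.0%} of total)")
```

For every application the listed rows fall well short of the total, so a large share of the tasks in the database must be in states the page does not show.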
Looking at the numbers, we see that the ABP2 and S5GC1 caches are draining while BRP3 is still growing (S5GC1HF is down slightly). So when the BRP3 tasks to send grow to about 30,000, we'll see the WUG paused for some time. Am I right?
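The day-over-day change in "Tasks total" can be computed directly from the two snapshots. A small Python sketch, with the figures copied from the posts above:

```python
# "Tasks total" per application: (6 Jan 2011, 7 Jan 2011).
TASKS_TOTAL = {
    "S5GC1HF": (1_514_404, 1_499_320),
    "BRP3": (133_106, 156_961),
    "S5GC1": (13_851, 12_984),
    "ABP2": (2_001, 1_897),
}
IN_DB = (1_663_362, 1_671_162)  # the "in DB" column for the same row

deltas = {app: after - before for app, (before, after) in TASKS_TOTAL.items()}
for app, delta in deltas.items():
    print(f"{app:8s} {delta:+,d}")

# The per-application changes should add up to the change in the "in DB" column.
print("all     ", f"{sum(deltas.values()):+,d}", "vs in DB", f"{IN_DB[1] - IN_DB[0]:+,d}")
```

Only BRP3 grew between the two snapshots; the other three applications shrank, and the per-application changes add up to the change in the "in DB" column.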
P.S. Does anybody have ABP2 or S5GC1 tasks in their caches?