Since a disk crash about 10 days ago, the work buffer on the repaired machine fills only partially. The total estimated time for the work in the buffer is well below (about 30% of) the time that should be available.
To recover from the crash, I installed a new hard drive, reinstalled Windows, and reinstalled BOINC 6.4.7 from distribution media. I used the same HOST NAME, and BOINC/E@H recognized the machine as the same one from before the crash. (That seemed strange to me ... usually I have to MERGE machines to incorporate the history from prior to a crash. I don't think that I initiated a MERGE, but since I did the rebuild in the middle of the night, I might not remember correctly.)
The machine (hostid=1945350) task list shows "host detached" for the tasks that were in the buffer at the time of the crash. It is as though the "host detached" tasks were being counted as using up part of the buffer capacity.
Those "host detached" tasks do not appear in my buffer now (unless BOINC has recovered the info and hidden them somewhere ... they do not show in the BOINC Manager task list.)
Is this "normal?" Has BOINC or E@H put the machine "on probation" until it shows that it can stay up for more than a couple weeks? Will the situation "self-correct" or should I be doing something such as "Reset Project"?
Stan
Work Buffer Not Filled
I guess BOINC also needs some time to recalibrate its estimate of your PC's throughput in terms of results returned per day. I'd wait a bit and see how this evolves.
CU
Bikeman
RE: I guess BOINC also
Thank you. Absent strong contraindications, that will be my path.
The aspect that puzzles me is that the time estimates shown in BOINC Manager for each WU are comparable to those on an almost identical Q9550 that is running a full buffer, albeit with an earlier release of BOINC Manager (5.10.13). Also, I compared the various "factors" stored in the machines' profiles and they seem comparable. So, BOINC must have a variable hidden away somewhere that tells it to drag its feet to refill this machine's buffer for a while ... cuz it might still be an unreliable machine. :)
Stan
RE: ...So, BOINC must have
What about the (near the bottom of Computer summary)? Are the values near 1 (resp. 100%)?
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
RE: cuz it might still be
No, your machine is designated as reliable. You're just getting those Arecibo tasks which take quite a bit longer than the Hierarchical ones.
As per the last scheduler log:
RE: RE: cuz it might
I'm not following the "Arecibo task" logic ...
Here are buffers from two nearly identical machines:
Problem machine, ID=1945350: time estimates in its buffer range 5:45 to 6:06 for the 11 Arecibo WU's and 3:44 to 4:54 for the 18 usual jobs. (29 total WU's, approx. 37 wall clock hours.)
On an almost identical Q9550, ID=1719080, time estimates for its buffer of 29 Arecibo WU's range 5:55-6:10 and for 79 usual WU's range 4:10 to 5:38. (108 total WU's, approx. 145 wall clock hours.)
37 hours is approx 25% of the other machine's buffer.
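As a rough cross-check of those totals, here is a minimal sketch that rebuilds the wall clock hours from the quoted ranges. It assumes each task takes the midpoint of its quoted range and that all four cores of the Q9550 each run one task at a time; both are simplifications made for illustration, not something read out of BOINC.

```python
CORES = 4  # the Q9550 is a quad-core CPU; assumes one task per core


def hours(hmm):
    """Convert an 'h:mm' estimate such as '5:45' into fractional hours."""
    h, m = hmm.split(":")
    return int(h) + int(m) / 60


def wall_clock_hours(batches, cores=CORES):
    """batches: list of (task_count, low_estimate, high_estimate).

    Assumption: every task in a batch takes the midpoint of its range.
    """
    cpu_hours = sum(n * (hours(lo) + hours(hi)) / 2 for n, lo, hi in batches)
    return cpu_hours / cores


# Host 1945350: 11 Arecibo tasks plus 18 of the usual tasks
problem_host = wall_clock_hours([(11, "5:45", "6:06"), (18, "3:44", "4:54")])
# Host 1719080: 29 Arecibo tasks plus 79 of the usual tasks
other_host = wall_clock_hours([(29, "5:55", "6:10"), (79, "4:10", "5:38")])

ratio = problem_host / other_host
print(f"{problem_host:.1f} h vs {other_host:.1f} h, ratio {ratio:.0%}")
```

With these assumptions the totals land close to the 37 and 145 hours quoted above, so the roughly one-quarter ratio does not depend on how the ranges are rounded.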
Stan
RE: I'm not following the
And those two machines are in the same venue and have comparable Duration Correction Factors?
Computers aren't everything in life. (Just kidding)
RE: RE: I'm not following
The DCFs are good enough, judging by the time estimates quoted.
The question is, are
as bad as this machine (someone switched it off last week....)
Stan, you (and only you) can see those figures as
on the 'Computer summary' pages on this website. If one of the figures is low on one machine but not the other, it would have the effect you're describing.
RE: The DCFs are good
From Jord's post, we know the values of host 1945350:
So, I wanted to know if the other host has comparable values and if both are in the same venue.
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
RE: So, I wanted to know if
You can look those values up in the scheduler log.
Searching for 1719080 in 2009-06-14_13:14.txt: active_frac 0.999915 on_frac 0.999372 DCF 1.131630
You just can't see the venue. Checking the new values for 1945350: active_frac 0.999926 on_frac 0.999056 DCF 1.073404 (and it made contact 3 minutes earlier).
Perhaps the client versions also matter: 1945350 is using 6.4.7, 1719080 is using 5.10.13.
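To put the two sets of figures side by side, here is a small sketch that parses the quoted values and compares the hosts. Treating the product of on_frac and active_frac as a single "availability" number is an assumption made for this comparison; the helper names `parse_fractions` and `availability` are hypothetical and not part of BOINC.

```python
import re


def parse_fractions(excerpt):
    """Pull 'name value' pairs such as 'on_frac 0.999372' out of a log excerpt."""
    pairs = re.findall(r"(active_frac|on_frac|DCF)\s+([0-9.]+)", excerpt)
    return {name: float(value) for name, value in pairs}


# Values as quoted from the scheduler log for each host ID
hosts = {
    1719080: parse_fractions("active_frac 0.999915 on_frac 0.999372 DCF 1.131630"),
    1945350: parse_fractions("active_frac 0.999926 on_frac 0.999056 DCF 1.073404"),
}


def availability(values):
    """Assumed combined availability: fraction of time on AND allowed to compute."""
    return values["on_frac"] * values["active_frac"]


for host_id, values in hosts.items():
    print(host_id, round(availability(values), 6), values["DCF"])

# How far apart the two hosts are on this measure (1.0 means identical)
ratio = availability(hosts[1945350]) / availability(hosts[1719080])
print(f"availability ratio: {ratio:.5f}")
```

On this measure the two hosts differ by well under 0.1%, which by itself cannot account for one buffer being about a quarter the size of the other.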
But still, running both the new and longer Arecibo (Ar) search and the old and shorter Hierarchical (Hi) searches on the same DCF will make that DCF bounce up and down. The Estimated time to completion numbers aren't reliable as finishing an old Hi task will change the estimated time to completion of the longer Ar tasks.
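The bouncing can be illustrated with a toy model. The update rule below (jump up at once when a task overruns, drift down slowly otherwise) is a simplification chosen for illustration and should not be read as the client's actual code. The base estimates and run times are hypothetical numbers, picked so that Ar tasks overrun their uncorrected estimate while Hi tasks do not.

```python
def update_dcf(dcf, ratio):
    """Toy update rule (an assumption, not BOINC's real code):
    rise immediately to an overrun ratio, otherwise move 10% toward it."""
    if ratio > dcf:
        return ratio
    return 0.9 * dcf + 0.1 * ratio


# Hypothetical uncorrected estimates and actual run times, in hours
BASE_ESTIMATE_H = {"Hi": 4.0, "Ar": 4.0}
ACTUAL_H = {"Hi": 4.0, "Ar": 6.0}

dcf = 1.0
history = []
for finished in ["Hi", "Ar", "Hi", "Hi", "Ar", "Hi"]:
    ratio = ACTUAL_H[finished] / BASE_ESTIMATE_H[finished]
    dcf = update_dcf(dcf, ratio)
    # One DCF is shared by both searches, so every queued estimate moves with it
    ar_estimate = BASE_ESTIMATE_H["Ar"] * dcf
    history.append((finished, round(dcf, 3), round(ar_estimate, 2)))

for row in history:
    print(row)  # (task that just finished, new DCF, new estimate for a queued Ar task)
```

In this model every finished Hi task nudges the shared DCF down and every finished Ar task pushes it back up, so the displayed estimate for a queued Ar task keeps shifting even though its real run time never changes.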
RE: RE: The DCFs are good
From bad machine's state file:
0.999057
1.000000
0.999926
That is, the values are approximately 100%.
The good machine's values are comparable:
0.999375
-1.000000
0.999915
0.993433
Both machines run 24x7 and have had no outages during the past week. The bad machine's last outage was the disk crash/rebuild about a week or so ago.
Both machines are on my home LAN, just a couple of Ethernet switches away from each other (all of the 1000baseT machines are grouped on switches separate from the 100baseT machines).
Stan