My current Einstein work unit, which I got this evening when I finally got boinc working on the second server, shows 108:59:35 minutes to complete, and a deadline of 8/26 8:54pm. Einstein is one of two secondary projects, configured to get 25% of the resources. Just arithmetically, this does *NOT* look good; in fact I'm surprised the scheduler allowed it.
It's possible this is a case of the early part of the work unit moving very slowly, causing a gross overestimate of time to completion, of course. Otherwise, it seems to me that the deadlines are unreasonably tight.
This is certainly a slow machine -- dual processor 200MHz Pentium Pro. An old server, still being used as a server for my home web and misc. hosting.
Regardless of deadline issues, it seems like workunits over 100 hours are just too damned big. Sometimes the problem doesn't chunk smaller well, I know, but it's still a problem. The global climate prediction people are *much* worse; I'm not even *trying* to run them on anything except the new, fast, desktop (which is an illustration of one of the problems; fewer people will run big workunits at all, let alone to completion).
Short deadlines
I assume you mean 109 HOURS?
Then the estimate sounds about right...
We're starting to issue new workunits (your first server got some of them, in fact) which have a 2-week deadline. This should help.
Director, Einstein@Home
RE: My current Einstein
You were just unlucky in that whilst your older cpuID got a set of the new 2 week work, your newer cpuID got a set of the older, one week work.
Bruce has announced that the deadline will be 14 days for new work but there is still obviously some older work to finish off and you were just unlucky enough to be hit with it. Depending on what version of Boinc you are running, there still may be no real problem as the scheduler should go into EDF mode and process your EAH work first and then refuse to get more EAH work until the other project has its fair share again. As long as you allow Boinc to do it, it should take care of this for you. I notice you have three work units. Two of those should be OK but the third will probably be too stale before it gets a chance to run. You may have to abort it. Make sure your connect to network interval is low (eg the default of 0.1 days) to prevent too early downloading of new work.
The estimate before crunching starts is usually much too short. Once crunching starts and things settle down, a fairly accurate number is usually reported. There are 168 hours in a week so work taking 110 hours is possible by Boinc going into EDF mode and temporarily ignoring your resource share. The other projects will get their share later on and it will even out in the long run. For machines this slow, maybe you should consider just one backup project instead of two. That would make things easier for Boinc.
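To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a single CPU running 24/7 with the work unit getting exactly its resource share of it; the function names are made up for illustration and are not part of Boinc.

```python
HOURS_PER_DAY = 24


def wall_hours_needed(cpu_hours, resource_share):
    """Wall-clock hours to finish if the project only gets its share of one CPU.

    Assumption: the machine runs 24/7 and the share is applied evenly.
    """
    return cpu_hours / resource_share


def fits_deadline(cpu_hours, deadline_days, resource_share=1.0):
    """True if the work can finish inside the deadline at the given share."""
    return wall_hours_needed(cpu_hours, resource_share) <= deadline_days * HOURS_PER_DAY


estimate = 110  # CPU hours, roughly the estimate discussed in this thread

print(wall_hours_needed(estimate, 0.25))  # 440.0 wall-clock hours at a 25% share
print(fits_deadline(estimate, 7, 0.25))   # False: 440 hours > 168 hours
print(fits_deadline(estimate, 7, 1.0))    # True: EDF mode, share temporarily ignored
print(fits_deadline(estimate, 14, 0.25))  # False: 440 hours > 336 hours
```

So at a strict 25% share the work would not fit even in two weeks, which is why EDF mode matters on a box this slow.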
The projects have a scientific problem they want our help with. They are constrained by the science requirements. Do you think that if the calcs could easily be broken down into much smaller bites then they wouldn't do so? The projects are going to do what is most efficient for the science. We just have to be a bit sensible in choosing an appropriate project according to the capabilities of our cpu. I think you would find that P-200s are probably rather thin on the ground as serious workhorses these days. The projects will cater for the type of cpu that most people are likely to have and that would be a little faster than a P-200. But hey, I'm not criticising. My slowest successfully contributing box is a P-100 :). It takes 4 days to do a Seti WU but it works!! I wouldn't dream of trying to run EAH on it.
What three projects are you trying to support and which one is your "main" one?
Cheers,
Gary.
I received two E@H WUs today,
I received two E@H WUs today, and after consuming 12 and 8 hours of CPU, the estimate to complete is still a total of 340 hours. Since the deadline for each is 9/11 (about 300 hours from now), I don't think the deadlines are realistic for the size of the WUs being sent out. My reconnect time is only 0.2, trying not to receive too many WUs and miss the deadlines.
My system is a HT 3.0GHz P4 w/1GB RAM which runs 24/7 (except when the power goes out), so it is not an old, slow processor.
The original idea of those
The original idea of those public DC projects was not to make people buy new machines and have them run 24/7, the original idea was to make people use the *spare* CPU time while they have their boxes running anyway.
Maybe the DC freaks spoiled this idea quite a bit *blush* so now the project people expect all helpers to use their computers for DC 24/7 and, if there are some CPU cycles left, might allow other programs to run.
RE: I received two E@H WUs
Check task manager and see if something else is using cpu power.
RE: RE: I received two
Yes, I have other tasks running - that is why I have a computer. The idea of BOINC and projects is to use the otherwise UNUSED cycles, and the 12+8 hours are the UNUSED cycles after my normal work. Luckily, the estimates to complete (ETC) are now down to 9+27=36 more hours. I guess the ETC is just wildly incorrect for the first 10-15 CPU hours of crunch time.
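One way an estimate could swing like that, and this is only my assumption, not BOINC's documented method, is if it were a straight-line extrapolation from the fraction done so far. A minimal sketch with made-up progress fractions (only the 12 and 20 CPU hours echo my numbers above):

```python
def linear_etc(cpu_hours_used, fraction_done):
    """Remaining CPU hours by straight-line extrapolation from progress so far.

    Assumption: progress is linear in CPU time, which a slow early
    phase would violate.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return cpu_hours_used * (1 - fraction_done) / fraction_done


# Hypothetical fractions, chosen only to illustrate the effect.
early = linear_etc(12, 0.0625)  # slow start: little progress after 12 hours
later = linear_etc(20, 0.5)     # progress has caught up by 20 hours

print(early)  # 180.0
print(later)  # 20.0
```

If the early part of a WU really does crawl, an estimator like this would overshoot badly at first and then drop quickly.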
actually the time boinc
actually the time boinc counts is only the time the process actually gets CPU cycles.
besides that, your 3ghz CPU should finish 2 wu in about 12h when hyperthreading. maybe your CPU is getting too hot and therefore it's being throttled? afaik p4s get auto-thermal throttled at 80°C core temp.
can't think of anything else atm.
RE: RE: RE: Check task
RE: Yes, I have other
The ETC is not usually wrong by very much so I think there is something unusual about how your machine is running. I have a P4 2.6G HT machine that does 2 EAH work units every 13.5 hours approximately. Before crunching starts, the estimate is 10 hours but this quickly extends to 13.5 hours and then stays there once crunching starts. In your case, the poor ETC performance is probably associated with the "No heartbeat" errors listed below.
Here is the stderr output from one of my results:
Here is part of the stderr output from one of your results (ResultID=8035023):
The "No heartbeat" error message is documented in the Wiki without any positive conclusion being reached as to what exactly causes one of the components to crash or how serious it really is.
The other thing of note is how often the computation keeps getting resumed. Have you enabled the preference for only doing work when the computer is idle? I've never used it on my boxes, so I can only imagine that it would create a string of these messages if it were enabled.
If so, why don't you try letting BOINC and EAH run all the time, as they really won't interfere with your normal computing activities. BOINC/EAH is very quick in getting out of the way when you have other serious tasks to perform. You won't really be able to detect that it is running unless you happen to catch it when it is performing one of the auto benchmarks or downloading a new data file. Both of these are quite infrequent occurrences.
Cheers,
Gary.