Still a problem with the "enforce_delay_bound" option. Message from server: No work sent (won't finish in time)

Ziran
Joined: 26 Nov 04
Posts: 194
Credit: 665118
RAC: 633
Topic 188071

It seems to me that there is still a problem with the "enforce_delay_bound" option on the server side. According to the message from the server, I wouldn't finish a WU in time, so based on the information from the server I did some calculations:

24*7*0.346=58.128
24*7*0.346*0.346=20.112288

My machine does a WU in 40H and I haven't missed a deadline yet. So according to my calculation, either I would have almost a 50% margin or not a chance in hell of finishing a WU in time.
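Both weekly figures above can be reproduced in a few lines. This is only a sketch of the arithmetic; the helper name `weekly_boinc_hours` is hypothetical, and 40 hours is the observed crunch time per WU quoted in the post.

```python
HOURS_PER_WEEK = 24 * 7  # 168 wall-clock hours in the one-week deadline window
WU_HOURS = 40.0          # observed crunch time per WU on this machine

def weekly_boinc_hours(*fractions: float) -> float:
    """Hypothetical helper: hours per week left after applying each fraction in turn."""
    hours = float(HOURS_PER_WEEK)
    for f in fractions:
        hours *= f
    return hours

once = weekly_boinc_hours(0.346)           # 34.6% applied once
twice = weekly_boinc_hours(0.346, 0.346)   # 34.6% applied twice, as the server message reads

print(f"applied once:  {once:.3f} h/week, enough for one WU: {once >= WU_HOURS}")
print(f"applied twice: {twice:.3f} h/week, enough for one WU: {twice >= WU_HOURS}")
```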

Message from server: No work sent (won't finish in time) Computer on 34.6% of time, BOINC on 34.6% of that, Einstein gets 100.0% of that

The way I read this message, BOINC would only crunch for 20H/week. If the message accurately represents the way the server calculates the time my computer will crunch, then it seems to me that "BOINC on 34.6% of that" might be the problem. Instead of the percentage of processing time BOINC gets when the computer is on, the server uses the percentage of time the computer is on.

--- - 2005-02-26 14:07:16 - Starting BOINC client version 4.19 for windows_intelx86
Einstein@Home - 2005-02-26 14:07:19 - Project prefs: no separate prefs for home; using your defaults
Einstein@Home - 2005-02-26 14:07:22 - Host ID is 5349
--- - 2005-02-26 14:07:25 - General prefs: from Einstein@Home (last modified 2005-02-23 00:30:37)
--- - 2005-02-26 14:07:25 - General prefs: no separate prefs for home; using your defaults
Einstein@Home - 2005-02-26 14:07:33 - Resuming computation for result H1_0953.4__0953.5_0.1_T03_Test02_0 using einstein version 4.79
--- - 2005-02-26 14:37:17 - May run out of work in 0.02 days; requesting more
Einstein@Home - 2005-02-26 14:37:17 - Requesting 1197 seconds of work
Einstein@Home - 2005-02-26 14:37:17 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2005-02-26 14:37:18 - Computation for result H1_0953.4__0953.5_0.1_T03_Test02 finished
Einstein@Home - 2005-02-26 14:37:18 - Started upload of H1_0953.4__0953.5_0.1_T03_Test02_0_0
Einstein@Home - 2005-02-26 14:37:20 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2005-02-26 14:37:20 - Message from server: No work sent (won't finish in time) Computer on 34.6% of time, BOINC on 34.6% of that, Einstein gets 100.0% of that
Einstein@Home - 2005-02-26 14:37:20 - Project prefs: no separate prefs for home; using your defaults
Einstein@Home - 2005-02-26 14:37:20 - No work from project
Einstein@Home - 2005-02-26 14:37:20 - Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
Einstein@Home - 2005-02-26 14:37:25 - Finished upload of H1_0953.4__0953.5_0.1_T03_Test02_0_0
Einstein@Home - 2005-02-26 14:37:25 - Throughput 19575 bytes/sec
--- - 2005-02-26 14:53:11 - Insufficient work; requesting more
--- - 2005-02-26 14:53:15 - Insufficient work; requesting more
Einstein@Home - 2005-02-26 14:53:15 - Requesting 1198 seconds of work
Einstein@Home - 2005-02-26 14:53:15 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2005-02-26 14:53:18 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2005-02-26 14:53:19 - Starting result H1_0953.4__0953.7_0.1_T06_Test02_3 using einstein version 4.79

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Here is the first scheduler interaction, as seen from the server:

2005-02-26 13:29:19 [normal ] OS version Microsoft Windows 98 , (04.10.1998.00)
2005-02-26 13:29:19 [normal ] Request [HOST#5349] Database [HOST#5349] Request [RPC#27] Database [RPC#26]
2005-02-26 13:29:19 [normal ] Processing request from [USER#2042] [HOST#5349] [IP 213.112.125.121] [RPC#27] core client version 4.19
2005-02-26 13:29:19 [debug ] [HOST#5349] Resetting nresults_today
2005-02-26 13:29:19 [normal ] [HOST#5349] got request for 1196.786825 seconds of work; available disk 0.908115 GB
2005-02-26 13:29:19 [debug ] [HOST#5349]: has file H1_0953.4
2005-02-26 13:29:19 [debug ] in_send_results_for_file(H1_0953.4, 0) prev_result.id=1264095
2005-02-26 13:29:19 [debug ] est cpu dur 100426.666667; running_frac 0.119929; rsf 1.000000; est 837382.729324

The estimate is that one WU would take 100426 seconds (say 28 hours) of CPU on your machine. But since the estimate is that the code would only run for 11% of the time, the work was estimated to take 837382 seconds, whereas the deadline is a week, which is less than this.
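The check described above can be sketched as follows. This is not the BOINC scheduler source; it only reproduces the arithmetic visible in the log lines (`est cpu dur`, `running_frac`, `rsf`, `delay_bound`), and the way the three values are combined, cpu_dur / (running_frac * rsf), is an assumption inferred from the logged numbers.

```python
# Values copied from the 13:29:19 scheduler log for HOST#5349
EST_CPU_DUR = 100426.666667   # seconds of CPU, about 27.9 hours
RUNNING_FRAC = 0.119929
RSF = 1.0
DELAY_BOUND = 604800          # one week in seconds

def estimated_wall_seconds(est_cpu_dur: float, running_frac: float, rsf: float) -> float:
    """Assumed reconstruction: wall-clock seconds = CPU seconds / fraction of time the app runs."""
    return est_cpu_dur / (running_frac * rsf)

def will_finish_in_time(est_cpu_dur: float, running_frac: float, rsf: float, delay_bound: float) -> bool:
    """True if the estimated wall-clock time fits inside the deadline."""
    return estimated_wall_seconds(est_cpu_dur, running_frac, rsf) <= delay_bound

est = estimated_wall_seconds(EST_CPU_DUR, RUNNING_FRAC, RSF)
print(f"needs ~{est:.0f} s ({est / 86400:.2f} days); delay_bound {DELAY_BOUND} s")
print("send WU:", will_finish_in_time(EST_CPU_DUR, RUNNING_FRAC, RSF, DELAY_BOUND))
```

The result differs from the logged 837382 by a second or two only because running_frac is printed to six decimals in the log.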

2005-02-26 13:29:19 [debug ] [WU#403228 H1_0953.4__0953.7_0.1_T06_Test02] needs 837382 seconds on [HOST#5349]; delay_bound is 604800 (request.estimated_delay is 0.156596)

Here's an odd thing: your machine is estimated to still have 0.15 seconds of remaining work for E@H. If it had NO remaining work for E@H, then it would have gotten more work.

2005-02-26 13:29:19 [normal ] [HOST#5349] Sent 0 results
2005-02-26 13:29:19 [debug ] [HOST#5349] MSG(high) No work sent
2005-02-26 13:29:19 [debug ] [HOST#5349] MSG(high) (won't finish in time) Computer on 34.6% of time, BOINC on 34.6% of that, Einstein gets 100.0% of that
2005-02-26 13:29:19 [normal ] sending delay request 3600.000000

Here's the later scheduler interaction (when there was no remaining E@H work on your machine):
2005-02-26 13:45:18 [normal ] OS version Microsoft Windows 98 , (04.10.1998.00)
2005-02-26 13:45:18 [normal ] Request [HOST#5349] Database [HOST#5349] Request [RPC#28] Database [RPC#27]
2005-02-26 13:45:18 [normal ] Processing request from [USER#2042] [HOST#5349] [IP 213.112.125.121] [RPC#28] core client version 4.19
2005-02-26 13:45:18 [normal ] [HOST#5349] [RESULT#1264095 H1_0953.4__0953.5_0.1_T03_Test02_0] got result
2005-02-26 13:45:18 [debug ] cpu 142079.230000 cpcs 0.000804, cc 114.191635
2005-02-26 13:45:18 [debug ] [RESULT#1264095 H1_0953.4__0953.5_0.1_T03_Test02_0]: setting outcome SUCCESS
2005-02-26 13:45:18 [normal ] [HOST#5349] got request for 1197.675012 seconds of work; available disk 0.908115 GB
2005-02-26 13:45:18 [debug ] [HOST#5349]: has file H1_0953.4
2005-02-26 13:45:18 [debug ] in_send_results_for_file(H1_0953.4, 0) prev_result.id=1264095
2005-02-26 13:45:18 [debug ] Sorted list of URLs follows [host timezone: UTC+3600]
2005-02-26 13:45:18 [debug ] zone=+3600 url=http://einstein.aei.mpg.de
2005-02-26 13:45:18 [debug ] zone=-21600 url=http://einstein.phys.uwm.edu
2005-02-26 13:45:18 [debug ] [HOST#5349] Sending app_version einstein windows_intelx86 479
2005-02-26 13:45:18 [debug ] [HOST#5349] Already has file H1_0953.4
2005-02-26 13:45:18 [debug ] [HOST#5349] reducing disk needed for WU by 14736000 bytes (length of H1_0953.4)
2005-02-26 13:45:18 [debug ] est cpu dur 100426.666667; running_frac 0.120096; rsf 1.000000; est 836218.454917
2005-02-26 13:45:18 [normal ] [HOST#5349] Sending [RESULT#1429554 H1_0953.4__0953.7_0.1_T06_Test02_3] (fills 836218.45 seconds)
2005-02-26 13:45:18 [normal ] [HOST#5349] Sent 1 results

Director, Einstein@Home

Ziran

Ok, let’s analyze this.

So the server makes this estimate:

A WU will take 28H to crunch.
BOINC can use my CPU 11% of the day.

So to return a WU in time I would need at least a 10-day deadline (9.69 days). I have no chance in hell of crunching that WU in less than 7 days.

In reality things are worse: a WU needs 40H, not 28H, to crunch. So if BOINC can only use my CPU 11% of the day, the deadline would have to be at least 15 days (15.15) for me to return a WU in time. Can we agree on this so far?
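The two deadline figures follow from dividing CPU hours by the hours BOINC is assumed to get per day. A minimal sketch; `days_needed` is a hypothetical name, and it uses Bruce's logged running_frac for the server's 28-hour estimate and the rounded 11% for the real 40-hour WU, as in the post.

```python
def days_needed(cpu_hours: float, running_frac: float) -> float:
    """Wall-clock days to accumulate cpu_hours of crunching at running_frac of each day."""
    return cpu_hours / (24 * running_frac)

server_cpu_hours = 100426.666667 / 3600   # server's CPU estimate, about 27.9 h

print(f"server estimate: {days_needed(server_cpu_hours, 0.119929):.2f} days")
print(f"real 40 h WU:    {days_needed(40, 0.11):.2f} days")
```

Both exceed the 7-day deadline, which is why no work was sent.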

Now my computer must have good connections with some higher power (it should, since it has already been thrown in the dumpster once), or the CPU must be located in a hole in the space-time continuum, because it hasn't returned a WU late yet and has an average turnaround time of 5.3 days.

But before we apply for a grant to explore this and make plans for a BOINC project aimed at exploiting it, we might examine whether there could be an alternative explanation.

The estimate for the crunch time of the WU is wrong, but it is to my advantage, so it isn't the problem here. That only leaves the estimate of how much CPU time BOINC will get in a day. If I understand it correctly, the estimate of the available CPU time (running_frac) is calculated by multiplying "the percentage of the day the computer is on" by "the percentage of CPU time available to BOINC when it's on".

In my case it is:
Computer on 34.6% of time
BOINC on 34.6% of that

This looks suspicious to me. Both values are 34.6%. Since we know something is wrong, my guess would be that one of these values is a copy of the other. That the computer is on 34.6% of the time seems fairly accurate; my guess would be something lower than 50%. That BOINC would only have access to the CPU 34.6% of the time seems very low to me. This computer is mainly used to surf the web, read BOINC message boards and listen to internet radio through Winamp (18% CPU use), so I would expect a value around 80%.

But this doesn't explain it all. If the computer were on 34.6% of the time, BOINC would on average need access to the CPU 97-98% of the time to achieve a turnaround time of 5.3 days. That isn't possible on my machine, so let's play around with my estimates. If my machine were on 50% of the time and BOINC had access to the CPU 80% of that, running_frac would be 40%, which is fairly close to 34.6%. Now let's say my computer were on 43.25% of the time and BOINC had access to the CPU 80% of that; then running_frac would be 34.6%.
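The combinations tried above can be checked quickly. This sketch assumes, as the post does, that running_frac is simply the product of the "computer on" and "BOINC on" fractions; that is the poster's understanding, not something confirmed against the BOINC source, and the function name is illustrative.

```python
def running_frac(on_frac: float, active_frac: float) -> float:
    """Assumed model: fraction of wall-clock time BOINC crunches = on_frac * active_frac."""
    return on_frac * active_frac

for on, active in [(0.346, 0.346), (0.50, 0.80), (0.4325, 0.80)]:
    print(f"on {on:.2%}, BOINC active {active:.0%} -> running_frac {running_frac(on, active):.3f}")
```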

So let’s run the numbers:
A WU will take 40H to crunch.
BOINC can use my CPU 34.6% of the day.

40/(24*0.346)=4.82

Since my last WU was done in 4 days 10H 10 min and my average turnaround time is 5.3 days, this isn’t such a bad estimate.
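The same comparison as a sketch, assuming the reported 34.6% is already the full running_frac (the hypothesis in this post):

```python
WU_CPU_HOURS = 40.0
RUNNING_FRAC = 0.346  # hypothesis: the reported 34.6% is running_frac itself

predicted_days = WU_CPU_HOURS / (24 * RUNNING_FRAC)
last_wu_days = 4 + 10 / 24 + 10 / (24 * 60)   # 4 days 10 h 10 min
avg_turnaround_days = 5.3

print(f"predicted {predicted_days:.2f} d, last WU {last_wu_days:.2f} d, "
      f"average turnaround {avg_turnaround_days} d")
```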

So my guess is that the values shown for "Computer on" and "BOINC on" are both actually running_frac, and when the server calculates running_frac it effectively gets squared.
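The squaring hypothesis can be checked against the scheduler log Bruce posted: if running_frac is the displayed 34.6% squared, the square root of the logged running_frac should come back to about 0.346. A sketch using only numbers from this thread:

```python
import math

reported = 0.346                # "Computer on" and "BOINC on" as shown in the message
logged_running_frac = 0.119929  # from the scheduler log

squared = reported * reported
implied = math.sqrt(logged_running_frac)

print(f"0.346 squared = {squared:.4f}, logged running_frac = {logged_running_frac}")
print(f"sqrt(logged running_frac) = {implied:.4f}")
```

The small mismatch is consistent with the message rounding the fraction to one decimal of a percent.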

How can we test this theory? That's easy: let's look for computers that have the same value for both "Computer on" and "BOINC on". First off, does my computer have that? If not, that's interesting, because that is what the server reported. If it does, is it all computers or just, let's say, all Windows 98 users?

Bruce, do you think I could be on to something here?
Thanks for the fast reply, as always.

-----------------

The reason my machine had 0.15 seconds of remaining work for E@H is that I have set the "contact server every" value to 0.02. When running application version 4.75, the client downloaded a new WU 5 min before the old one finished (with a value of 0.01). With application version 4.79, it seems that the calculation of remaining time is not as accurate in the last minutes. I think I will try 0.05 next time.

debugas
Joined: 11 Nov 04
Posts: 170
Credit: 77331
RAC: 0

Message 5707 in response to message 5705

> The estimate is that one WU would take 100426 seconds (say 28 hours) of CPU on your machine. But since the estimate is that the code would only run for 11% of the time, the work was estimated to take 837382 seconds, whereas the deadline is a week, which is less than this.

The thing is that 100426 seconds is the REAL WORLD time (how long it took to complete the WU), not the computer crunching time (seconds the CPU was working on E@H).

It seems that the BOINC client adjusts the time to completion and then the BOINC server does it once again (why?). This results in over-adjustment.
If I work 24h a day on E@H the estimate will be OK, but if I work 12 hours a day the estimate will not be 1/2 but 1/2*1/2, as if I worked only 6 hours per day.

Or am I wrong here?

OK, I have a PC working on two projects. When E@H is paused, its time to completion and CPU time are paused too, but the time to completion is already estimated based on the time my computer is on, i.e. already based on the percentage of a real-world day my CPU is working (on E@H).

OK, one more thing to consider.
When I get a new WU, the time to completion is estimated at about 41 hours, but in reality I end up crunching the WU in 24-28 hours. I suspect this is because E@H gets only 50% of the time.

Ziran

It seems that I am not the only one noticing that "Computer on" and "BOINC on" have the same value. Debugas spotted it too, and Fj has at least one computer where the values are the same.

http://einsteinathome.org/node/188003

debugas

Message 5709 in response to message 5708

Problem #1:
It seems that the BOINC client adjusts the time to completion and then the BOINC server does it once again (why?). This results in over-adjustment.
If I work 24h a day on E@H the estimate will be OK, but if I work 12h a day the estimate will not be 1/2 but 1/2*1/2, as if I worked only 6 hours per day.

The less I work, the worse the adjustment is: if I worked 1/3 of the day, the BOINC server estimates it to be 1/9.

The BOINC server does not recognize that the BOINC client has already adjusted the time to completion.
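The double adjustment claimed in Problem #1 can be modelled like this. It is a hypothetical model of the claim, not BOINC code: the client divides the WU's crunch time by the fraction of the day the machine is on, and the server then divides by the same fraction again, so the effective fraction becomes its square. The 24-hour WU and the function name are illustrative.

```python
CPU_HOURS = 24.0  # hypothetical WU needing 24 hours of crunching

def server_estimate_hours(client_estimate_h: float, on_frac: float) -> float:
    """Hypothetical model of Problem #1: the server divides by on_frac again,
    even though the client's estimate already includes it."""
    return client_estimate_h / on_frac

for on_frac in (1.0, 0.5, 1 / 3):
    client = CPU_HOURS / on_frac                     # first adjustment (client)
    server = server_estimate_hours(client, on_frac)  # second adjustment (server)
    assumed = CPU_HOURS / server                     # share of the day the server effectively assumes
    print(f"on {on_frac:.3f} of the day -> server behaves as if {assumed:.3f}")
```

The effective fraction is on_frac squared: 1/2 becomes 1/4 and 1/3 becomes 1/9, matching the figures above.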

Problem #2: when sharing E@H with other projects, the BOINC client itself misadjusts the time further by making the same mistake as the server. If E@H takes 0.5 of the time used by all BOINC projects on a given PC, the estimated time to completion is based not on this 0.5 but on it applied twice, i.e. 0.5*0.5 = 0.25.

Ziran

Let's see if we can sort this out.

In addition to the problems I have, you have the following:

1. A WU is estimated to take 41H but in reality only takes 24-28H.
This is a problem if it is correct. My computer says it will take 40H, but the server says it will take 28H. It would be interesting to see how long the server estimates your WU will take.

2. Since you run multiple projects, only half the CPU time is available to each project.
The server isn't misadjusting; it does what it is supposed to do in this case by halving the available CPU time (which doubles the estimated completion time), because only half the time is available to this particular project.
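A sketch of why halving for resource share is legitimate: the share is a separate factor from the on/active fractions, so dividing by it once is correct. The 24 hours and 60% are borrowed from debugas's figures for illustration; active_frac = 1.0 and the function name are assumptions.

```python
def est_wall_hours(cpu_hours: float, on_frac: float, active_frac: float, resource_share: float) -> float:
    """Wall-clock hours for a WU when the project gets resource_share of BOINC's CPU time."""
    return cpu_hours / (on_frac * active_frac * resource_share)

alone = est_wall_hours(24.0, 0.6, 1.0, 1.0)    # E@H is the only project
shared = est_wall_hours(24.0, 0.6, 1.0, 0.5)   # E@H shares the CPU 50/50
print(f"E@H alone: {alone:.1f} h, E@H at 50% share: {shared:.1f} h")
```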

S@NL - Marleen
Joined: 18 Jan 05
Posts: 25
Credit: 4068135
RAC: 0

In my case the log says:
Einstein@Home - 2005-02-27 21:28:11 - Message from server: No work sent (won't finish in time) Computer on 25.9% of time, BOINC on 25.8% of that, Einstein gets 50.0% of that

I think the problem is "BOINC on 25.8% of that". This suggests my computer is turned on for only 1/4 of a day, and then BOINC runs for only 1/4 of that time, resulting in a time of 1/16 of a day.
No, my computer is actually on for about 3/4 of a day. I only run BOINC for 1/4 of a day, and that's what it is reporting as "Computer on 25.9% of time". If BOINC isn't running, how can it know how long my computer is on?
Since BOINC does not get 100% of CPU power (usually 99%), it reports "BOINC on 25.8%", which is a bit less.
But IMHO this does NOT mean "25.8% of 25.9%", but "25.8% of a day"!
Call 25.9% the "gross" BOINC CPU time (the wall clock time the BOINC program is running), and 25.8% the "net" BOINC CPU time (the time it actually crunches WU's).

In my case, the estimate of how long it takes to calculate the WU should be:
CPU time / (0.258 * 0.5), or CPU time / (BOINC on * Einstein share)
and NOT
CPU time / (0.259 * 0.258 * 0.5), or CPU time / (Computer on * BOINC on * Einstein share)

So in the server interaction example posted by Bruce (taking the values of that computer), the running_frac should be 0.346 and not 0.12 (which is 0.346 * 0.346).
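Plugging both versions into the numbers from Bruce's log shows what is at stake. A sketch under the thread's assumptions (estimate = CPU seconds divided by the product of the fractions); `wall_seconds` is a hypothetical helper.

```python
EST_CPU_DUR = 100426.666667  # seconds, server's CPU estimate for Ziran's host
DELAY_BOUND = 604800         # one-week deadline in seconds

def wall_seconds(cpu_seconds: float, *fractions: float) -> float:
    """Divide CPU seconds by the product of the given availability fractions."""
    denom = 1.0
    for f in fractions:
        denom *= f
    return cpu_seconds / denom

proposed = wall_seconds(EST_CPU_DUR, 0.346)        # running_frac = 0.346
current = wall_seconds(EST_CPU_DUR, 0.346, 0.346)  # running_frac = 0.346 * 0.346
for label, secs in (("0.346", proposed), ("0.346 * 0.346", current)):
    print(f"running_frac {label}: {secs / 86400:.2f} days, fits deadline: {secs <= DELAY_BOUND}")

# Marleen's host: Computer on 25.9%, BOINC on 25.8%, Einstein share 50%
print(f"{0.258 * 0.5:.4f} vs {0.259 * 0.258 * 0.5:.4f}")
```

With running_frac = 0.346 the WU fits comfortably inside the week; with the squared value it does not.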

Ziran

Message 5712 in response to message 5711

edit

Now if only I could get the dyslexia under control, I wouldn't have to delete things so often.
Well, I could always blame it on it getting late here.

debugas

Message 5713 in response to message 5709

> Problem #1:
> It seems that BOINC client adjusts the time to completion and then BOINC
> server does it once again (why?)...
> BOINC server does not recognize that BOINC client has already adjusted
> the time to completion

I am more and more sure about the problem. For example, one of my PCs runs E@H only, and it completes a WU in approx. 24 hours. But when I receive a new WU, the time to completion is set to approx. 41 hours. Why is that? It is because my PC is ON 60% of the time: ~24/0.6 = ~41.
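Working backwards from these two figures gives the fraction of time the client appears to be assuming. A sketch using only the numbers in the post:

```python
observed_hours = 24.0          # an E@H-only PC finishes a WU in about 24 h of crunching
initial_estimate_hours = 41.0  # time to completion shown when the WU arrives

implied_on_frac = observed_hours / initial_estimate_hours
print(f"implied fraction of time on: {implied_on_frac:.2f}")
print(f"24 h / 0.6 = {observed_hours / 0.6:.1f} h")
```

The implied fraction is close to the 60% on-time reported for this PC.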

> Problem #2: when sharing E@H with other projects on BOINC client the time is
> further misadjusted

It seems I was mistaken about this one.

Bruce Allen
Moderator

Message 5714 in response to message 5706

> The estimate for the crunch time of the WU is wrong, but it is to my advantage,
> so it isn't the problem here. That only leaves the estimate of how much
> CPU time BOINC will get in a day. If I understand it correctly, the estimate of
> the available CPU time (running_frac) is calculated by multiplying "the
> percentage of the day the computer is on" by "the percentage of CPU time
> available to BOINC when it's on".
>
> In my case it is:
> Computer on 34.6% of time
> BOINC on 34.6% of that
>
> Now this looks suspicious to me. Both values are 34.6%. Since we know
> something is wrong, my guess would be that one of these values is a copy of
> the other.

I'll ask David Anderson (who wrote this bit of the scheduler code) about this.

> That the computer is on 34.6% of the time seems fairly accurate. My
> guess would be something lower than 50%. That BOINC would only have access to
> the CPU 34.6% of the time seems very low to me. This computer is mainly used to
> surf the web, read BOINC message boards and listen to internet radio
> through Winamp (18% CPU use). So I would expect a value around 80%.

Are your preferences set so that BOINC runs all the time? Or just when your computer is idle? In the latter case, how long does it have to be idle before BOINC restarts the work?

> The reason my machine had 0.15 seconds of remaining work for E@H is that I
> have set the "contact server every" value to 0.02. When running
> application version 4.75, the client downloaded a new WU 5 min before the old one
> finished (value 0.01). With application version 4.79, it seems that the
> calculation of remaining time is not as accurate in the last minutes. I think
> I will try 0.05 next time.

OK, I think that makes sense.

Bruce
