Boinc 5.2.2 Behavior

Remember911
Joined: 18 Jan 05
Posts: 20
Credit: 185497
RAC: 0
Topic 190060

It seems that I must manually do an update in order to get Boinc to report a completed work unit. I have a good (dsl) 24/7 connection and never had to do that with Boinc 4.55 or earlier versions.

I am attached to 4 projects but at the moment have the other 3 projects suspended. Yet I am getting messages that the "suspended projects" will communicate at a delayed time. I would think that "suspended" means NO messages about these should appear. Here is a fragment of the log.

9/30/2005 4:21:59 PM|Einstein@Home|Computation for result w1_0710.5__0710.5_0.1_T06_S4hC_2 finished
9/30/2005 4:22:00 PM|Einstein@Home|Starting result l1_1460.5__1460.9_0.1_T04_S4lC_2 using einstein version 479
9/30/2005 4:22:02 PM|Einstein@Home|Started upload of w1_0710.5__0710.5_0.1_T06_S4hC_2_0
9/30/2005 4:22:08 PM|Einstein@Home|Finished upload of w1_0710.5__0710.5_0.1_T06_S4hC_2_0
9/30/2005 4:22:08 PM|Einstein@Home|Throughput 14827 bytes/sec
9/30/2005 4:37:44 PM|Predictor @ Home|Deferring communication with project for 3 weeks, 0 days, 23 hours, 22 minutes, and 41 seconds
9/30/2005 4:37:44 PM|Einstein@Home|Deferring communication with project for 3 weeks, 3 days, 15 hours, 23 minutes, and 26 seconds
9/30/2005 4:37:44 PM|XtremLab|Deferring communication with project for 3 weeks, 1 days, 16 hours, 23 minutes, and 26 seconds
9/30/2005 5:37:46 PM|Predictor @ Home|Deferring communication with project for 3 weeks, 0 days, 22 hours, 22 minutes, and 39 seconds
9/30/2005 5:37:46 PM|Einstein@Home|Deferring communication with project for 3 weeks, 3 days, 14 hours, 23 minutes, and 24 seconds
9/30/2005 5:37:46 PM|XtremLab|Deferring communication with project for 3 weeks, 1 days, 15 hours, 23 minutes, and 24 seconds
9/30/2005 6:37:47 PM|Predictor @ Home|Deferring communication with project for 3 weeks, 0 days, 21 hours, 22 minutes, and 37 seconds
9/30/2005 6:37:47 PM|Einstein@Home|Deferring communication with project for 3 weeks, 3 days, 13 hours, 23 minutes, and 23 seconds
9/30/2005 6:37:47 PM|XtremLab|Deferring communication with project for 3 weeks, 1 days, 14 hours, 23 minutes, and 23 seconds
9/30/2005 6:54:52 PM||request_reschedule_cpus: project op
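
For anyone wanting to check what these deferral messages add up to, here is a minimal Python sketch that parses lines in the format shown above. The function name `parse_deferral` and both regular expressions are my own, written against this log fragment only; they are not part of BOINC and may not match other client versions.

```python
import re
from datetime import datetime, timedelta

# Message log lines in the fragment look like: timestamp|project|message
LOG_LINE = re.compile(r"^(?P<stamp>[^|]+)\|(?P<project>[^|]*)\|(?P<message>.*)$")

# Pattern inferred from the fragment above (an assumption, not a BOINC spec).
DEFERRAL = re.compile(
    r"Deferring communication with project for "
    r"(?:(?P<weeks>\d+) weeks?, )?"
    r"(?:(?P<days>\d+) days?, )?"
    r"(?:(?P<hours>\d+) hours?, )?"
    r"(?:(?P<minutes>\d+) minutes?, )?"
    r"(?:and )?(?P<seconds>\d+) seconds?"
)


def parse_deferral(line):
    """Return (timestamp, project, deferral) for a deferral line, else None."""
    m = LOG_LINE.match(line.strip())
    if not m:
        return None
    d = DEFERRAL.search(m.group("message"))
    if not d:
        return None
    # Keep only the units that were present and convert them to integers.
    parts = {k: int(v) for k, v in d.groupdict().items() if v is not None}
    stamp = datetime.strptime(m.group("stamp"), "%m/%d/%Y %I:%M:%S %p")
    return stamp, m.group("project"), timedelta(**parts)


if __name__ == "__main__":
    sample = (
        "9/30/2005 4:37:44 PM|Einstein@Home|Deferring communication with "
        "project for 3 weeks, 3 days, 15 hours, 23 minutes, and 26 seconds"
    )
    stamp, project, deferral = parse_deferral(sample)
    print(project, stamp + deferral)  # Einstein@Home 2005-10-25 08:01:10
```

Adding each timestamp to its deferral gives the same instant for all three Einstein@Home lines (25 Oct 2005, 08:01:10), so the hourly messages look like a countdown to one fixed time rather than a fresh three-week delay each hour.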


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5876
Credit: 118565227621
RAC: 24157080

Boinc 5.2.2 Behavior

I've never suspended anything so I don't know for sure if the messages are "normal" or not. I'm intrigued as to why BOINC chooses 3 weeks for the deferral period though. If you have suspended the other three projects and not EAH, why is EAH also deferring communication for three weeks as shown in the log snippet??

You don't mention the actual version of BOINC but you do mention 4.55 (4.45??) as being an earlier version so I presume you may have upgraded to 5.2.2. Has all this happened since the upgrade? Your computers are hidden so I can't look at a results list to see what's been happening.

Edit: It just occurred to me that maybe you posted this in the wrong forum. Maybe it's seti that's not suspended and EAH is supposed to be?? :).
Also, sorry, forgot to look in the thread title for the BOINC version.

Cheers,
Gary.

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 33356240
RAC: 10284

RE: It seems that I must

Quote:
It seems that I must manually do an update in order to get Boinc to report a completed work unit. I have a good (dsl) 24/7 connection and never had to do that with Boinc 4.55 or earlier versions.

I would suggest you look at Message 20012 on Message boards : Cafe Einstein : New Boinc Windows core client released - 5.2.1. I reported a similar experience there last week and got a very good explanation in a response from Walt Gribben.

Remember911
Joined: 18 Jan 05
Posts: 20
Credit: 185497
RAC: 0

RE: I've never suspended

Message 18660 in response to message 18658

Quote:

I've never suspended anything so I don't know for sure if the messages are "normal" or not. I'm intrigued as to why BOINC chooses 3 weeks for the deferral period though. If you have suspended the other three projects and not EAH, why is EAH also deferring communication for three weeks as shown in the log snippet??

You don't mention the actual version of BOINC but you do mention 4.55 (4.45??) as being an earlier version so I presume you may have upgraded to 5.2.2. Has all this happened since the upgrade? Your computers are hidden so I can't look at a results list to see what's been happening.

Edit: It just occurred to me that maybe you posted this in the wrong forum. Maybe it's seti that's not suspended and EAH is supposed to be?? :).
Also, sorry, forgot to look in the thread title for the BOINC version.

Einstein is NOT suspended, Predictor is, LHC is and XtremLab is. I changed over from 4.55 to 5.2.2 on Sunday afternoon. It is interesting that Einstein is deferring communication for 3 weeks with this version. It is also interesting that the other 3 projects are showing this 3 week deferral when they are marked as suspended and there are no work units to be processed except Einstein WUs in the work queue. I assume this will provide some food for thought to the devs as they move forward in the development process. I do have the "Run Always" and the "Network Always Available" options checked. I am also using WinXP Pro SP2 on an Intel 3.0 GHz HT with 512 MB of P3200 memory.


Stick
Joined: 24 Feb 05
Posts: 790
Credit: 33356240
RAC: 10284

RE: Einstein is NOT

Message 18661 in response to message 18660

Quote:
Einstein is NOT suspended, Predictor is, LHC is and XtremLab is. I changed over from 4.55 to 5.2.2 on Sunday afternoon. It is interesting that Einstein is deferring communication for 3 weeks with this version. It is also interesting that the other 3 projects are showing this 3 week deferral when they are marked as suspended and there are no work units to be processed except Einstein WUs in the work queue. I assume this will provide some food for thought to the devs as they move forward in the development process. I do have the "Run Always" and the "Network Always Available" options checked. I am also using WinXP Pro SP2 on an Intel 3.0 GHz HT with 512 MB of P3200 memory.

I think "Suspend" just keeps the science app from crunching (i.e., the client can still communicate with the project to report results, etc.). I agree that your "Deferring . . ." messages are a little strange. You might try rebooting to see if things get back to normal.

Remember911
Joined: 18 Jan 05
Posts: 20
Credit: 185497
RAC: 0

RE: RE: It seems that I

Message 18662 in response to message 18659

Quote:
Quote:
It seems that I must manually do an update in order to get Boinc to report a completed work unit. I have a good (dsl) 24/7 connection and never had to do that with Boinc 4.55 or earlier versions.

I would suggest you look at Message 20012 on Message boards : Cafe Einstein : New Boinc Windows core client released - 5.2.1. I reported a similar experience there last week and got a very good explanation in a response from Walt Gribben.

Stick, I saw this posted by Walt Gribben:

The idea is to reduce the number of times BOINC contacts the scheduler to report work or request it, and this way it combines several requests into one contact. With a short "connect every" interval it'll report the previous result when it requests a new workunit. With a longer interval it can report several workunits either after 24 hours have passed or when it requests more work.

And even though a result gets uploaded, the server won't show it as complete until it's reported. And it'll use that as the "completion" time, even though the result had already been uploaded.

Walt Gribben
Forum moderator
Project developer
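
To make the rule in the quoted explanation concrete, here is a rough Python sketch of it: results are reported either when the client asks for more work or once 24 hours have passed. This is only my reading of the quote, not BOINC's actual scheduling code; `should_contact_scheduler`, its parameters and `REPORT_DEADLINE` are hypothetical names made up for illustration.

```python
from datetime import datetime, timedelta

# The "24 hours" figure comes from the quoted explanation above.
REPORT_DEADLINE = timedelta(hours=24)


def should_contact_scheduler(now, oldest_unreported_upload, needs_work):
    """Illustrative rule only: decide whether to contact the scheduler now.

    now                      -- current time
    oldest_unreported_upload -- upload time of the oldest result that is
                                uploaded but not yet reported, or None
    needs_work               -- True if the client wants more work
    """
    if needs_work:
        # A work request goes out anyway; pending reports ride along with it.
        return True
    if oldest_unreported_upload is None:
        # Nothing to report and no work needed, so no contact.
        return False
    # Otherwise wait until the uploaded result has been pending for 24 hours.
    return now - oldest_unreported_upload >= REPORT_DEADLINE


if __name__ == "__main__":
    uploaded = datetime(2005, 9, 30, 16, 22, 8)
    print(should_contact_scheduler(uploaded + timedelta(hours=2), uploaded, False))
    print(should_contact_scheduler(uploaded + timedelta(hours=2), uploaded, True))
```

Under this reading, an uploaded result just sits there until one of the two conditions is met, which would match having to press "Update" to get it reported sooner.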

Although I don't agree with the concept of crunching the work unit in 10.5 to 11 hours time, uploading the results, and then waiting for 22 hours to get credit for the completion of the workunit, I do understand why this is being done. Under this scheme, the scheduler has a shorter contact with the Boinc Client (the 1st time). Thereafter, there is no gain, the processing time is doubled for the workunit (which is a bug in itself) and the credit is further delayed for twice the length of time as reality. I think that this is flawed logic but then I do not know the effects from the server side of the story.


Stick
Joined: 24 Feb 05
Posts: 790
Credit: 33356240
RAC: 10284

RE: Although I don't agree

Message 18663 in response to message 18662

Quote:
Although I don't agree with the concept of crunching the work unit in 10.5 to 11 hours time, uploading the results, and then waiting for 22 hours to get credit for the completion of the workunit, I do understand why this is being done. Under this scheme, the scheduler has a shorter contact with the Boinc Client (the 1st time). Thereafter, there is no gain, the processing time is doubled for the workunit (which is a bug in itself) and the credit is further delayed for twice the length of time as reality. I think that this is flawed logic but then I do not know the effects from the server side of the story.



Although I tend to agree with your view, it's been my experience that the schedule is not delayed as much as you predict. Somehow (without intervention from me), my results "disappear" from the "Work" tab "Ready to report" status well before the next WU is completed, usually within a couple of hours of completion. That is, there must be some other kinds of "interim communications" that result reports can piggy-back on.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5876
Credit: 118565227621
RAC: 24157080

RE: Einstein is NOT

Message 18664 in response to message 18660

Quote:
Einstein is NOT suspended, Predictor is, LHC is and XtremLab is.

In the log snippet you supplied, only three projects were listed. I had no idea that your 4th project was actually LHC (isn't LHC non-compliant for V5.x.x??) and I just wondered if it might have been Seti. I made the mistake of wondering if the three "deferring" projects might have been the three suspended projects.

If EAH has been doing the work lately it probably has a negative LTD (long-term debt). If it was negative at the time the other projects were suspended maybe BOINC is deferring EAH in the hope that a positive LTD project may become "unsuspended" in the near future. These are just random thoughts trying to visualize why BOINC would want to defer the only "unsuspended" project it has. In my mind, it could very well be debt related. Out of interest, how much of a cache of EAH do you have and how is this being maintained?

Quote:
... It is also interesting that the other 3 projects are showing this 3 week deferral when they are marked as suspended and there are no work units to be processed except Einstein WU's in the work queue.

So all four projects are "deferring" for three weeks then? Must be some sort of a bug in BOINC. Maybe you should post a log snippet showing all 4 projects on the BOINC boards where you might get the attention of a BOINC Developer.

Good luck!!

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5876
Credit: 118565227621
RAC: 24157080

RE: ... Thereafter, there

Message 18665 in response to message 18662

Quote:
... Thereafter, there is no gain, the processing time is doubled for the workunit (which is a bug in itself) and the credit is further delayed for twice the length of time as reality. I think that this is flawed logic but then I do not know the effects from the server side of the story.

When I first read this I thought you were trying to say there was a bug in that the cpu crunch time was somehow being doubled. I presume by "processing time" you don't actually mean "cpu crunch time" but rather "reporting time", the value that replaces the "deadline time" in the online database.

If this really bothers you, you can fiddle around with your "connect to network" interval so that the BOINC client is going to be attempting to get new work from the server at a time when a previous result has not long been finished. This way, the finished (uploaded) result doesn't hang around for very long at all. Of course you can only do this with projects that have a very predictable crunch time and it so happens that EAH is one of those. You also need similar boxes if you have more than one machine. I have a number of boxes that do a result every 6 hours. A "connect" setting of around 0.22 has them each looking for new work (and reporting the previous result) within an hour of that previous result finishing. You do have to fiddle a bit and let things settle in order to get it working efficiently. However the benefit is very small latency in reporting.
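
The arithmetic behind the 0.22 figure is easy to check with a few lines of Python. The interval is entered in days, so this is only a unit conversion set against the 6-hour crunch time mentioned above; it does not model how the client actually schedules its contacts, and the names are made up for illustration.

```python
HOURS_PER_DAY = 24


def connect_interval_hours(days):
    """Convert a connect interval given in days to hours."""
    return days * HOURS_PER_DAY


crunch_hours = 6.0                              # one result every 6 hours
interval_hours = connect_interval_hours(0.22)   # the setting quoted above
gap_hours = crunch_hours - interval_hours

print(f"connect interval: {interval_hours:.2f} hours")      # 5.28 hours
print(f"gap to 6-hour crunch time: {gap_hours:.2f} hours")  # 0.72 hours
```

So 0.22 days is a little under one crunch cycle, which fits the client looking for new work within an hour of a result finishing.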

I don't understand your comment about credit being further delayed. Validation (and the granting of credit, if a quorum exists) is virtually instantaneous with the reporting event. The biggest delay is waiting for the quorum to form. In my experience with small "connect" intervals, a bit of a delay in reporting makes virtually zero difference to the granting of credit because there is rarely a quorum formed at the time of reporting.

Cheers,
Gary.
