Possible Bug, Please Check

Steve Dyer
Joined: 24 Feb 05
Posts: 8
Credit: 44607
RAC: 0
Topic 188127

I'm running BOINC 4.19 and Einstein 4.79 on Windows XP. I have a broadband connection, and my BOINC preferences are set to contact the server every 2 days (only to ensure I have enough WUs locally to cover any server downtime).

When a WU completes it "sends" the file to the server, but I noticed that it is not listed as a returned result on my results page.

I thought that completed WUs would show as completed once my computer downloaded new WUs, but this is not the case.

The point is that WUs only show up as completed after I manually choose "update" within my BOINC program. My concern is: if this is project-wide, could machines that are just left alone have WUs reported as not returned in time, even though they were returned to the server days earlier?

http://einsteinathome.org/account/tasks

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0

Hello Steve,
It's not a bug, it's a feature ;)
No really, that's just the way it's supposed to happen.
Reporting a result is a two-step thing:
1. Upload the data
2. Report it (and claim credit)
Afterwards it will appear in your list.
And your machine will contact the scheduler at least every 2-3 days: each time it decides 'I haven't got enough to do for the next 2 days, come on, send me some WUs', it contacts the scheduler.
The manual update is only a speed-up of the process.

BTW:
A good manual for BOINC is Paul D. Buck's BOINC Powered Projects Documentation and the FAQ therein.

For information regarding Credits look here!

Grüße vom Sänger

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118342645697
RAC: 25433249

> The point is that WU only show up as completed after I manually choose
> "update" within my boinc program. My concern is that, if this is project
> wide, then the machines which are just left alone, may have WU reported as not
> returned in time, even though they were returned to server days earlier?
>
> http://einsteinathome.org/account/tasks
>

Steve,

I have written about this very problem in an article about 6 items down from yours entitled "Results truncated to 16" or something like that. The problem is that when your client finishes a WU, the uploading that takes place seems to be effectively an "announcement" rather than a result being uploaded. The scheduling server comes along at some later stage and grabs the actual announced result and sticks it in the list to be validated in due course. My guess is that the time interval between "visits" by the scheduling server may be related to your setting for contacting the server which you said was 2 days. My personal feeling is that if you accept a smaller cache you will get more rapid loading of results into the pending queue.

I notice you have 15 pending results and a queue of 8. Just as you are being held up by others with long caches in getting your pending results validated, I guess there will be others slowed by your last WU, which may not even be started for a couple of days. Why not consider a somewhat smaller cache, which would speed up the pending process for others and perhaps get your finished results into the validation queue in a more timely fashion? It will be interesting to see if an explanation of why results can take a long time to enter the pending queue is forthcoming.

Cheers,
Gary.


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118342645697
RAC: 25433249

Message 6153 in response to message 6151

> Hello Steve,
> It's not a bug, it's a feature ;)
> No really, that's just the way it's supposed to happen.
> Reporting a result is a two-step thing:
> 1. Upload the data
> 2. Report it (and claim credit)
> Afterwards it will appear in your list.
> And your machines will contact at least every 2-3 days, each time it decides
> 'I haven't got enough to do for the next 2 days, come on, send me some WUs' and
> contact the scheduler therefore.
> The manual update is only a speed-up of the process.

Hi Saenger,

Your reply and mine were being composed at the same time, but I'd like to point out a flaw in this "feature", as you call it. If step 1 is actually uploading the result, and a machine dies after a number of these results are uploaded but before the step 2 credit claim is done, all those results will be lost if there is even a few days' delay in getting the dead machine back to life. Why does there have to be a delay? Why can't a successfully completed result upload trigger a report and credit claim, say, one minute later?

Cheers,
Gary.


Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0

Message 6154 in response to message 6153

> Your reply and mine were being composed at the same time but I'd like to point
> out a flaw in this "feature" as you call it. If step 1 is actually uploading
> the result and if a machine dies after a number of these results are uploaded
> but before the step 2 credit claim is done, all those results will be lost if
> there is just a few days delay in getting the dead machine back to life. Why
> does there have to be a delay??? Why can't a successfully completed result
> upload, trigger a reporting and credit claim, say one minute later?

Hello Gary,
I do not have the faintest...
To put it plainly, my wording for the first step may be absolutely incorrect.
I know it's a two-step process from Paul's site.
And I've seen a large amount of data in my Transfers tab during the first step.
And I've not seen a large amount of data in my Transfers tab during the second step (if there was any).

I'm just a plain user when it comes to the whys of this process. I take them for granted, as an expert's solution to some problem I don't even know exists.

That explanation is up to David, Rom and all the other dev people at BOINC.

Grüße vom Sänger

Darren
Joined: 18 Jan 05
Posts: 94
Credit: 69632
RAC: 0

Message 6155 in response to message 6153

>If step 1 is actually uploading
> the result and if a machine dies after a number of these results are uploaded
> but before the step 2 credit claim is done, all those results will be lost if
> there is just a few days delay in getting the dead machine back to life. Why
> does there have to be a delay??? Why can't a successfully completed result
> upload, trigger a reporting and credit claim, say one minute later?

Your machine uploads the results as soon as they are finished. It reports them the next time it has a scheduled contact with the assignment server.

If your machine dies and never contacts the assignment server again (or not for a long time), the work unit will still get reported when the deadline arrives. Before it gives up on you, the system will actually check to see if the WU has been uploaded and just never reported. As long as it gets uploaded, it will find it, eventually.
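
For anyone who prefers to see it spelled out, here is a rough Python sketch of the upload/report split as described in this thread. It is only a toy model: `ToyServer` and its methods are hypothetical names made up for illustration, not BOINC code, and the deadline check simply mirrors the behaviour described above rather than anything confirmed from the server source.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical stand-in for the project's servers; not BOINC code."""
    uploaded: set = field(default_factory=set)   # result files received
    reported: set = field(default_factory=set)   # results listed on the results page

    def upload(self, result_id: str) -> None:
        # Step 1: the output file arrives as soon as the WU finishes.
        self.uploaded.add(result_id)

    def scheduler_contact(self, finished: list) -> None:
        # Step 2: the client reports finished results (and claims credit).
        # Only results whose files were actually uploaded get listed.
        self.reported.update(r for r in finished if r in self.uploaded)

    def deadline_check(self, result_id: str) -> bool:
        # As described in the post above: at the deadline, an uploaded but
        # unreported result is still picked up rather than discarded.
        if result_id in self.uploaded:
            self.reported.add(result_id)
        return result_id in self.reported


server = ToyServer()
server.upload("wu_1")
print("wu_1" in server.reported)      # False: uploaded only, not yet listed
server.scheduler_contact(["wu_1"])
print("wu_1" in server.reported)      # True: listed after the report

server.upload("wu_2")                 # host dies before it can report
print(server.deadline_check("wu_2"))  # True: found at the deadline
```

This also matches what Steve saw: a manual "update" is just an early `scheduler_contact`.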

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118342645697
RAC: 25433249

Message 6156 in response to message 6154

> Hello Gary,
> I do not have the faintest...
> To put it plainly, my wording for the first step may be absolutely incorrect.
> I know it's a two-step process from Paul's site.
> And I've seen the big amount of data in my transition tab with the
> first step.
> And I've not seen a big amount of data in my transition tab with the
> second step (if there was any).
>
> I'm just a plain user, regarding the why's of this process. I take them for
> granted, as an experts solution to some problem I don't even know the
> existence of.
>
> That explanation is up to David, Rom and all the other dev-people at Boinc.

Hi Saenger,

No, I think you are perfectly correct in the way you described it, and I think I was wrong in describing stage 1 as an "announcement" of a result completion. It makes more sense that the data of the result be uploaded to a data server and that the local client then claims credit from a scheduling server.

Also, please don't think I was attacking you in any way. Your explanation to Steve was fine and to the point. I was expressing my frustration with the fact that you can lose a large number of uploaded and otherwise valid results if your machine goes down before a claim is made. I joined the project only on Feb 09 and spent two weeks overseas, expecting to come back to find an unattended machine with a whole bunch of completed results. The machine suffered a power glitch just one day into the two-week period, and 8 completed and uploaded results had the claim for credit delayed for two weeks. Seems like a flaw in the system to me. Losing validly completed results as well as losing two weeks' work time is just frustrating.

Sorry if it appeared that I was attacking you. That was certainly not my intention.

Cheers,
Gary.


Steve Dyer
Joined: 24 Feb 05
Posts: 8
Credit: 44607
RAC: 0

I'm glad I raised the issue. Thanks to those above for the clarification.

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0

Message 6158 in response to message 6156

> Hi Saenger,
> Sorry if it appeared that I was
> attacking you. That was certainly not my intention.

Hello Gary,
I never thought that.
As you can see in my profile, my native tongue is German, so I had the impression that my wording was not quite correct and tried to clarify it.
And this is my last post on mutual excuses ;)
I declare them as superfluous (in my POV).

BUT...
Now you raised my interest.
I really want to know why this is done in such an elaborate fashion, and not the quick'n'dirty way.

Can one of the wizards please answer this?

Grüße vom Sänger

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

If you would like to report the result as soon as it's uploaded, that can be done.
Start the BOINC client with this command:

boinc_gui.exe -return_results_immediately

This tells the client to contact the scheduling server as soon as a result is uploaded. It works with BOINC ver. 4.19 for Windows; I don't know about the newer ones.

ralic
Joined: 8 Nov 04
Posts: 128
Credit: 695810
RAC: 0

Message 6160 in response to message 6158

> I really want to know, why this is celebrated in such an extended fashion, and
> not the quick'n'dirty way.

AFAIK, it's part of the design to prevent overloading the servers.

Under a system of reporting results immediately, if a lot of hosts downloaded units at a particular time and they took a similar amount of time to complete the units, then there would be a lot of uploading and reporting activity, followed by much idle time in between reports.

In the current system, each host will upload data, but report results and request more data at almost random intervals, thereby bringing more equilibrium to the load on the servers.
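
To put rough numbers on that, here is a small Python sketch comparing the two approaches. Everything in it is an assumption chosen for illustration (1000 hosts, all finishing in the same hour, report delays spread uniformly over 48 hours); it is not taken from BOINC or from any project statistics.

```python
import random
from collections import Counter


def peak_reports_per_hour(report_hours):
    """Return the largest number of reports landing in any single hour."""
    return max(Counter(report_hours).values())


rng = random.Random(0)          # fixed seed so the sketch is repeatable
hosts = 1000                    # assumed number of hosts
finish_hour = 10                # assume every host finishes in the same hour

# Immediate reporting: every host contacts the scheduler when it finishes.
immediate = [finish_hour for _ in range(hosts)]

# Deferred reporting: each host reports at its next scheduler contact,
# modeled here as a uniformly random delay of 0 to 47 hours.
deferred = [finish_hour + rng.randrange(48) for _ in range(hosts)]

print(peak_reports_per_hour(immediate))   # 1000: every report in one hour
print(peak_reports_per_hour(deferred))    # far fewer in the busiest hour
```

The uploads themselves still happen at finish time in both cases; only the scheduler contacts get spread out.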

It is also for this purpose that the extended backoff policy exists. If a project went offline and there was no backoff policy, all the hosts would hammer the project as soon as it came back up in an attempt to get work, effectively causing something akin to a DoS attack.
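
As a rough idea of what such a backoff can look like, here is a generic exponential backoff with random jitter in Python. This is the textbook technique, not BOINC's actual policy; the function name `backoff_delay` and the `base` and `cap` values are assumptions picked for the example.

```python
import random


def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with each failed attempt up to `cap`, and the actual
    delay is drawn at random below it so hosts do not retry in lockstep.
    `base` and `cap` are illustrative values, not BOINC's real settings.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


rng = random.Random(1)          # fixed seed so the sketch is repeatable
for attempt in range(5):
    # Each line shows the attempt number and the randomized wait in seconds.
    print(attempt, round(backoff_delay(attempt, rng=rng)))
```

The random draw is what keeps thousands of hosts from all retrying at the same moment once the project comes back up.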

But I could be wrong... ;-)

Once the projects have got most of their kinks ironed out, it would be nice to see them each implement a load & bandwidth graph, so that the users can get an idea of how busy each project is and it can be monitored over periods of weeks, months, years etc. to establish trends.
