Problem with a new work

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117820781780

RAC: 34754070

15 Jul 2007 2:59:39 UTC

Message 38269

I think you are misunderstanding how EAH data files work.

From time to time you get large data files sent to you. Once they are all used up, they get marked for deletion, and that is exactly what has happened. The results you work on are very small sets of instructions which tell the science app how to interact with the large data files. Over the life of a data file you could receive many such sets of instructions. If your current result is at 99%, it will get finished and returned in due course, and the delete instructions received from the server will then be actioned. This is how it works for everybody.

So the problem is not that BOINC is going to delete the large data files. The real problem is that the BOINC server doesn't think there will be enough time to complete any further work your client is requesting, so it hasn't sent you any more. This issue will probably be resolved when your current result is returned and BOINC gets to reassess things like the duration correction factor (DCF). Over time a debt to Einstein will build up, which will probably allow BOINC to send more work. By analysing certain numbers you could work this out yourself. Let us know if you need any assistance.

EDIT: One question to ask yourself: if your computer is running 91% of the time, why is BOINC only getting 47% of that? If you could increase this percentage, even just a little, it would probably make a big difference to fetching new work.
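To put rough numbers on that question, here is a minimal sketch. It assumes the 91% is the fraction of time the computer is on and the 47% is the share of that time BOINC is allowed to compute; that reading, the function names, and the 0.60 "improved" value are illustrative assumptions, not figures reported by the client.

```python
def effective_fraction(on_frac: float, active_frac: float) -> float:
    """Fraction of wall-clock time BOINC actually gets to crunch."""
    return on_frac * active_frac


def wall_days_per_cpu_day(on_frac: float, active_frac: float) -> float:
    """Wall-clock days needed to deliver one full day of BOINC crunching."""
    return 1.0 / effective_fraction(on_frac, active_frac)


current = effective_fraction(0.91, 0.47)
improved = effective_fraction(0.91, 0.60)  # 0.60 is a hypothetical target

print(f"current effective fraction:  {current:.4f}")
print(f"improved effective fraction: {improved:.4f}")
print(f"wall days per crunching day now:   {wall_days_per_cpu_day(0.91, 0.47):.2f}")
print(f"wall days per crunching day after: {wall_days_per_cpu_day(0.91, 0.60):.2f}")
```

The point is only that the two percentages multiply, so raising the second one shortens every time estimate that depends on the product.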

Cheers,
Gary.

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

15 Jul 2007 15:30:10 UTC

Message 38270

Agreed. According to the scheduler log, when the project looked at everything the host has reported back about what it's doing, it figured the host would need around 52 days, as things stand, to return the proposed WU.

Therefore, even if he could instantly double the time BOINC is allowed to crunch, I doubt the project would send work. My guess is there is a scheduling jam on the host with the work currently onboard, assuming the performance metrics haven't gotten screwy for some reason.

Alinator

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117820781780

RAC: 34754070

15 Jul 2007 23:10:24 UTC

Message 38271 in response to message 38270

Quote:

... need around 52 days ...

Are you sure about that? I've never really developed the skill for trawling through scheduler logs although I do recall taking a peek a long time ago and backing out pretty quick :). With the state of my memory these days it's not surprising that I've forgotten the details of what you can extract from them :).

I would have thought that Gunnar's machine would be able to handle even the monster work units with a 40% resource share. If the server is calculating 52 days, there must be something drastically wrong with the settings his machine is reporting to the server. His benchmarks are shown as 1250/2580 (which is fine) so the machine should be capable of doing a 650 credit result in around 2.5 days or so.

Gunnar, could you please look (with Notepad, for example) in your state file (client_state.xml) and tell us the DCF value for each of your projects, as well as the short-term and long-term debt values? Thanks. Be careful not to change anything :).
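If hunting through the file by eye is awkward, a read-only script is another option. The following is a minimal sketch, not an official tool: the tag names (`project`, `master_url`, `duration_correction_factor`, `short_term_debt`, `long_term_debt`) are assumptions about how client_state.xml is laid out, and the sample data is made up so the sketch runs without a real state file.

```python
import xml.etree.ElementTree as ET

# Assumed tag names; check them against your own client_state.xml.
FIELDS = ("duration_correction_factor", "short_term_debt", "long_term_debt")


def project_scheduling_values(xml_text):
    """Return {project URL: {field: float or None}}; never writes anything."""
    root = ET.fromstring(xml_text)
    report = {}
    for project in root.iter("project"):
        url = (project.findtext("master_url") or "unknown").strip()
        values = {}
        for name in FIELDS:
            raw = project.findtext(name)
            values[name] = float(raw) if raw and raw.strip() else None
        report[url] = values
    return report


# Hypothetical sample, invented for illustration only.
SAMPLE = """<client_state>
  <project>
    <master_url>http://example.org/project_a/</master_url>
    <duration_correction_factor>1.250000</duration_correction_factor>
    <short_term_debt>-120.500000</short_term_debt>
    <long_term_debt>3400.000000</long_term_debt>
  </project>
</client_state>"""

for url, values in project_scheduling_values(SAMPLE).items():
    print(url, values)
```

To try it on a real file, read the text with `open(path).read()` and pass it in; the file is only opened for reading. If the parser rejects the real file, searching for those tag names in Notepad gives the same information.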

Cheers,
Gary.

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

16 Jul 2007 18:38:27 UTC

Message 38272

The general procedure is to click the 'Last Contact' link and then search the text for your Host ID.

There is one catch: what's listed in the scheduler log differs depending on what type of scheduler request was received, so you have to look carefully to find the info you're after.

Here's the snippet from Gunnar's (which shows he got a result sent):

2007-07-15 23:00:27.7956 [PID=11845] [debug ] est cpu dur 294593.013414; running_frac 0.419615; rsf 0.400000; est 4842657.365231
2007-07-15 23:00:27.7956 [PID=11845] [normal ] [HOST#947683] Sending [RESULT#85750584 h1_0498.70_S5R2__219_S5R2c_1] (fills 4842657.37 seconds)
2007-07-15 23:00:27.7968 [PID=11845] [normal ] [HOST#947683] Sent 1 results [scheduler ran 0.500491 seconds]
2007-07-15 23:00:27.7976 [PID=11845] [normal ] sending delay request 60.000000
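For anyone who wants to pull the numbers out of a line like the first one in that snippet, here is a minimal sketch. The field meanings are my reading of the labels and should be treated as assumptions: "est cpu dur" as estimated CPU seconds, "running_frac" as the fraction of time BOINC runs, "rsf" as the resource share fraction, and "est" as the estimated wall-clock seconds. The "naive" figure simply divides the CPU estimate by the two fractions; it is not claimed to be the scheduler's formula.

```python
import re

LINE = (
    "2007-07-15 23:00:27.7956 [PID=11845] [debug ] est cpu dur 294593.013414; "
    "running_frac 0.419615; rsf 0.400000; est 4842657.365231"
)

PATTERN = re.compile(
    r"est cpu dur (?P<cpu>[\d.]+); running_frac (?P<frac>[\d.]+); "
    r"rsf (?P<rsf>[\d.]+); est (?P<est>[\d.]+)"
)

SECONDS_PER_DAY = 86400.0


def parse_estimate(line):
    """Extract the four numeric fields from an 'est cpu dur' log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not an estimate line")
    return {key: float(value) for key, value in match.groupdict().items()}


fields = parse_estimate(LINE)
cpu_days = fields["cpu"] / SECONDS_PER_DAY
est_days = fields["est"] / SECONDS_PER_DAY
# Naive scaling by the two logged fractions only (an assumption, see above).
naive_days = fields["cpu"] / (fields["frac"] * fields["rsf"]) / SECONDS_PER_DAY

print(f"estimated CPU time:        {cpu_days:.2f} days")
print(f"logged wall-clock estimate: {est_days:.2f} days")
print(f"naive cpu/(frac*rsf):       {naive_days:.2f} days")
print(f"unexplained factor:         {est_days / naive_days:.2f}")
```

On this line the logged estimate comes to about 56 days, while the naive scaling gives only about 20, so some further multiplier is involved that the line itself does not show. It may be something like the DCF discussed earlier in the thread, but that is a guess.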

It will be interesting to see what happens here, since it looks like he may have 'rammed' the result down BOINC's throat in this case, unless of course the problem is that the time stats and performance metrics on the host are whacked.

The other possibility is that we are merely seeing BOINC give EAH its 'fair share' once resource share, debt, and other factors are accounted for.

Alinator
