Results shows 27 in progress, Boinc only shows 14?

Mystic

Joined: 24 Feb 05

Posts: 7

Credit: 806007

RAC: 0

8 Jan 2006 18:18:09 UTC

Topic 190556

Hello all,

I have two new issues that showed up this morning. I requested new work and received three new WUs, but my results page now shows 16 new WUs. Why did 13 of them never make it to me?

Second issue: I completed 3 WUs overnight, signed on and uploaded them, and the client keeps showing me this:

Einstein@Home - 2006-01-08 12:05:22 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 12:05:28 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 12:05:28 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 12:05:28 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 12:05:28 - Already have result r1_1002.0__917_S4R2a_0
Einstein@Home - 2006-01-08 12:07:07 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 12:07:12 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 12:07:12 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 12:07:12 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 12:07:12 - Already have result r1_1002.0__917_S4R2a_0
Einstein@Home - 2006-01-08 12:12:09 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 12:12:13 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 12:12:14 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 12:12:14 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 12:12:14 - Already have result r1_1002.0__917_S4R2a_0

Any ideas for a culprit or a fix for these?

Sharky T

Joined: 19 Feb 05

Posts: 159

Credit: 1187722

RAC: 0

8 Jan 2006 18:22:39 UTC

Message 23791

Maybe it's that way-old 4.19 client wanting to retire... ;)

Mystic

Joined: 24 Feb 05

Posts: 7

Credit: 806007

RAC: 0

8 Jan 2006 18:27:08 UTC

Message 23792

He has kicked butt through all of this; I was hoping he would make it a little longer...

I just figured out that the three it says it "already has" are the three I did receive this morning. But then why is it trying to resend those three and not the ones I didn't get, if that is what it's doing? And why won't it send in the results for the 3 that I completed overnight?

Sharky T

Joined: 19 Feb 05

Posts: 159

Credit: 1187722

RAC: 0

8 Jan 2006 18:49:02 UTC

Message 23793

Maybe the EAH servers are about to crap out.
Even your average download speed looks, well... let's say a bit fast. LOL
Which ones are you trying to upload? (I'll look in the scheduler logs to see if I can find any clues.)
*edit* Found this in the scheduler logs:
2006-01-08 18:06:52.8874 [PID=26165] [debug ] REQUEST_METHOD=POST CONTENT_TYPE=application/octet-stream HTTP_ACCEPT= HTTP_USER_AGENT=
2006-01-08 18:06:52.8874 [PID=26165] [debug ] CONTENT_LENGTH=6719
2006-01-08 18:06:54.1277 [PID=26165] [normal ] Handling request: host 494017, platform windows_intelx86, version 4.19.0, RSF 1.000000
2006-01-08 18:06:54.1277 [PID=26165] [normal ] OS version Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)
2006-01-08 18:06:54.1335 [PID=26165] [debug ] Request [HOST#494017] Database [HOST#494017] Request [RPC#50] Database [RPC#49]
2006-01-08 18:06:54.1345 [PID=26165] [normal ] Processing request [HOST#494017] [RPC#50] core client version 4.19.0
2006-01-08 18:06:54.1408 [PID=26165] [normal ] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result
2006-01-08 18:06:54.1408 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result twice
2006-01-08 18:06:54.1409 [PID=26165] [normal ] [HOST#494017] [RESULT#13706614 r1_1002.0__958_S4R2a_0] got result
2006-01-08 18:06:54.1409 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13706614 r1_1002.0__958_S4R2a_0] got result twice
2006-01-08 18:06:54.1409 [PID=26165] [normal ] [HOST#494017] [RESULT#13910792 r1_1002.0__954_S4R2a_1] got result
2006-01-08 18:06:54.1409 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13910792 r1_1002.0__954_S4R2a_1] got result twice
2006-01-08 18:06:54.1417 [PID=26165] [normal ] sending delay request 61.000000
Anybody out there who can decode this?
I read it as you sending the same results over again..??
At 18:05 it had already got those results, and it does this again at 18:11..
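For anyone else trying to decode a scheduler log like the one above, here is a minimal Python sketch. It assumes the log format shown in the excerpt (one entry per line, with `[CRITICAL] ... got result twice` marking a duplicate report); the helper name `duplicate_reports` is hypothetical and not part of BOINC.

```python
import re

# Hypothetical pattern for the scheduler's duplicate-report lines, e.g.
# "... [CRITICAL] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result twice"
DUPLICATE = re.compile(r"\[CRITICAL\].*\[RESULT#(\d+) ([^\]\s]+)\] got result twice")


def duplicate_reports(lines):
    """Return (result_id, result_name) for every result the scheduler saw reported twice."""
    found = []
    for line in lines:
        match = DUPLICATE.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


scheduler_log = [
    "2006-01-08 18:06:54.1408 [PID=26165] [normal ] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result",
    "2006-01-08 18:06:54.1408 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result twice",
    "2006-01-08 18:06:54.1409 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13706614 r1_1002.0__958_S4R2a_0] got result twice",
    "2006-01-08 18:06:54.1409 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13910792 r1_1002.0__954_S4R2a_1] got result twice",
]

# The plain "got result" line is ignored; only the CRITICAL duplicates are listed.
for result_id, name in duplicate_reports(scheduler_log):
    print(result_id, name)
```

Reading it this way, the scheduler is saying that the three results in this request had already been reported once before.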

Mystic

Joined: 24 Feb 05

Posts: 7

Credit: 806007

RAC: 0

8 Jan 2006 19:29:08 UTC

Message 23794

OK, I really don't understand the logs, so forgive me if I'm totally off base here. I was looking at my initial contact and found this:

17:59:47.5281 [PID=23678] [debug ] get_working_set_filename(): returning r1_1120.5
2006-01-08 17:59:47.5282 [PID=23678] [debug ] send_new_file_working_set will try filename r1_1120.5
2006-01-08 17:59:47.5376 [PID=23678] [debug ] in_send_results_for_file(r1_1120.5, 0) prev_result.id=0
2006-01-08 17:59:47.7402 [PID=23678] [debug ] est cpu dur 18103.792152; running_frac 0.997095; rsf 1.000000; est 18156.536892
2006-01-08 17:59:47.7409 [PID=23678] [debug ] [HOST#494017] Sending app_version albert windows_intelx86 437
2006-01-08 17:59:47.7421 [PID=23678] [debug ] est cpu dur 18103.792152; running_frac 0.997095; rsf 1.000000; est 18156.536892
2006-01-08 17:59:47.7421 [PID=23678] [normal ] [HOST#494017] Sending [RESULT#14108942 r1_1120.5__1310_S4R2a_2] (fills 18156.54 seconds)
2006-01-08 17:59:47.7432 [PID=23678] [normal ] [HOST#494017] Sent 13 results [scheduler ran 27.638383 seconds]
2006-01-08 17:59:47.7442 [PID=23678] [normal ] sending delay request 61.000000

I received nothing from this batch. Also, for this time period the times on the right-hand side seem to jump around a lot. Is this normal, and just something I hadn't noticed before?

Then, a few minutes later, since my computer didn't receive the original work, it requested more, and since I had already "received" 13, the server sent out three more, which I did receive fine, to make up the daily quota of 16.

18:01:41.4521 [PID=24417] [normal ] [HOST#494017] Sent 3 results [scheduler ran 2.872020 seconds]
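The quota arithmetic above can be written out as a small Python sketch. It assumes a simple per-host cap of 16 results per day that counts a result as soon as the server records it as sent, whether or not the client actually received it; BOINC's real scheduler logic may differ, and `results_still_allowed` is a hypothetical name.

```python
# Assumption: a flat per-host cap of 16 results per day, as described in the post.
DAILY_QUOTA = 16


def results_still_allowed(already_sent_today, quota=DAILY_QUOTA):
    """How many more results a simple per-host daily cap would still allow."""
    return max(0, quota - already_sent_today)


# The first RPC recorded 13 results as sent, even though none of them arrived.
ghost_sent = 13
second_batch = results_still_allowed(ghost_sent)
print(second_batch)               # 3
print(ghost_sent + second_batch)  # 16
```

Under this model the 13 ghost results use up most of the day's allowance, which matches the 3 results the second request actually got.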

Thoughts?

Sharky T

Joined: 19 Feb 05

Posts: 159

Credit: 1187722

RAC: 0

8 Jan 2006 20:12:43 UTC

Message 23795

You mean the times on the left side, don't you? ;)
I've seen jumps like that before, but this one looks big.
Maybe the server had too much to do at that moment.
I noticed one thing, though. It requested a delete of r1_0099.0 after
the 12 r1_1002.0 units, and then came the last r1_1120.5 unit.
Then it requests another delete of the same r1_0099.0 2 minutes later, when you got your 3 successful downloads.
I have one idea in my head of what might have happened, but I really don't think so..
But what the heck, here's the story..
If you limited your disk space in prefs, or in some other way stopped it from accepting more data, it couldn't receive the downloaded stuff until the r1_0099.0 file was gone. And the last r1_1120.5 unit came too soon after that to succeed.
2 minutes go by, and now you can download some more r1_1002.0 units.
But while I was writing this I saw this line in there: "available disk 117.698731 GB", so I guess that kills my theory.. LOL
Man.. I think I'll go watch some TV..

Ziran

Joined: 26 Nov 04

Posts: 194

Credit: 681783

RAC: 1392

8 Jan 2006 23:25:05 UTC

Message 23796

Something is really fishy here. First of all, I think you have a ghost WU problem. A fix for that is implemented in later versions of BOINC, so an upgrade would hopefully solve it. Do a search for "ghost WU"; it was discussed a couple of months ago. Are you behind a proxy?

Looking at the logs, something is really messed up here. The host's logs in your original post refer to the last 3 results you downloaded at 18:01.

The scheduler logs refer to the last 3 results you reported at 18:05. According to the scheduler logs, you are trying to report those same 3 results a second time.

So the host and the scheduler aren't talking about the same 3 results. This is as far as I can help you. Does anyone else have any suggestions?
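The comparison above can be checked mechanically. This Python sketch assumes the client and scheduler log formats quoted earlier in the thread, collects the result names each side mentions, and intersects them; the patterns and the helper `result_names` are hypothetical illustrations, not BOINC tools.

```python
import re

# Assumed line formats, copied from the client and scheduler excerpts in this thread.
CLIENT_PATTERN = re.compile(r"Already have result (\S+)")
SCHEDULER_PATTERN = re.compile(r"\[RESULT#\d+ ([^\]\s]+)\] got result")


def result_names(pattern, lines):
    """Collect the result names that `pattern` captures from a list of log lines."""
    names = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            names.add(match.group(1))
    return names


client_log = [
    "Einstein@Home - 2006-01-08 12:05:28 - Already have result r1_1002.0__919_S4R2a_0",
    "Einstein@Home - 2006-01-08 12:05:28 - Already have result r1_1002.0__918_S4R2a_0",
    "Einstein@Home - 2006-01-08 12:05:28 - Already have result r1_1002.0__917_S4R2a_0",
]
scheduler_log = [
    "2006-01-08 18:06:54.1408 [PID=26165] [normal ] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result",
    "2006-01-08 18:06:54.1409 [PID=26165] [normal ] [HOST#494017] [RESULT#13706614 r1_1002.0__958_S4R2a_0] got result",
    "2006-01-08 18:06:54.1409 [PID=26165] [normal ] [HOST#494017] [RESULT#13910792 r1_1002.0__954_S4R2a_1] got result",
]

client_names = result_names(CLIENT_PATTERN, client_log)
scheduler_names = result_names(SCHEDULER_PATTERN, scheduler_log)
print(sorted(client_names & scheduler_names))  # [] : no result appears in both logs
```

An empty intersection is exactly the mismatch described: the client is talking about results 917 to 919, while the scheduler is talking about 954, 958 and 959.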

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Mystic

Joined: 24 Feb 05

Posts: 7

Credit: 806007

RAC: 0

9 Jan 2006 0:14:02 UTC

Message 23797

Lol, yeah, left, the other right. I have it set to leave 10 GB free, out of a normal ~120 GB free, so I don't think that's the issue. (I have also set it to 200 GB for now, to stop any more work from coming in until this is figured out.)
No proxy involved.
And I did read about the ghost work units when all that became a big issue, and I have downloaded the newest version; I was just hoping to make it to 100,000 credits on the original client I started with.

Update:
Completed work still hanging around: I have since finished another WU. It uploaded and was sent in fine, but the client will not remove it from the work screen. The results page shows it as received and pending, but it's still there with the other 3 from this morning that will not disappear.

On the ghost WU side of the issue, I still get this, for the 3 that I did receive:

Einstein@Home - 2006-01-08 18:02:23 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 18:02:27 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 18:02:27 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 18:02:27 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 18:02:27 - Already have result r1_1002.0__917_S4R2a_0

Mystic

Joined: 24 Feb 05

Posts: 7

Credit: 806007

RAC: 0

9 Jan 2006 4:29:12 UTC

Message 23798

I was going through the logs on the client and found this; it must have happened during the last upload that worked properly. Any ideas what it means, lol?

Einstein@Home - 2006-01-07 18:51:35 - Error on file upload: [r1_1002.0__961_S4R2a_0_0] locked by file_upload_handler PID=14480

And I have since lowered my "connect to network" setting to 0.1 days, but the client does not see this. I have updated it a couple of times and it keeps saying:

--- - 2006-01-08 22:21:44 - May run out of work in 2.50 days; requesting more
Einstein@Home - 2006-01-08 22:21:44 - Requesting 216342 seconds of work
Einstein@Home - 2006-01-08 22:21:44 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 22:21:48 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 22:21:48 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 22:21:48 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 22:21:48 - Already have result r1_1002.0__917_S4R2a_0
Einstein@Home - 2006-01-08 22:21:48 - No work from project
Einstein@Home - 2006-01-08 22:21:48 - Deferring communication with project for 13 minutes and 53 seconds

Does anyone really understand this stuff? 'Cause it's sure going over my head.
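The numbers in that log can be checked with a quick unit conversion. This Python sketch only converts between seconds and days (86,400 seconds per day): 216342 seconds is the 2.50 days the client reports, whereas 0.1 days would be 8640 seconds. Reading that as the client still using the old "connect to network" value is an assumption consistent with the post, not something the log states.

```python
SECONDS_PER_DAY = 86_400


def seconds_to_days(seconds):
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def days_to_seconds(days):
    """Convert a duration in days to seconds."""
    return days * SECONDS_PER_DAY


requested = 216342  # from "Requesting 216342 seconds of work"
print(f"{seconds_to_days(requested):.2f} days requested")  # 2.50 days requested
print(f"{days_to_seconds(0.1):.0f} seconds for 0.1 days")  # 8640 seconds for 0.1 days
```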

S@NL - Marleen

Joined: 18 Jan 05

Posts: 25

Credit: 4068135

RAC: 0

10 Jan 2006 23:40:12 UTC

Message 23799 in response to message 23798

Quote:

I was going through the logs on the client and found this; it must have happened during the last upload that worked properly. Any ideas what it means, lol?

Einstein@Home - 2006-01-07 18:51:35 - Error on file upload: [r1_1002.0__961_S4R2a_0_0] locked by file_upload_handler PID=14480

Looks like a process (file_upload_handler, which I guess is a part of BOINC) on your computer holds the result file open ("locked"). That means the file cannot be deleted, etc., by another (BOINC) process, and Einstein gets confused.

I take it you didn't log out or restart your computer while this was going on? Doing so would end such a process and release the file.
You can also open the Task Manager and kill that process.

After this, I expect Einstein may repeat the "Already have result" message once, but then it should start working normally.
If you get that "locked by file_upload_handler" error again, reinstall BOINC or upgrade to a newer version.

Good luck!

Bruce Allen

Moderator

Joined: 15 Oct 04

Posts: 1119

Credit: 172127663

RAC: 0

16 Jan 2006 18:36:15 UTC

Message 23800

Mystic, it looks as if things are working for you again. You may have had a file upload problem. Could you confirm that all is now OK?

Director, Einstein@Home

Forums › Problems and Bug Reports