Hello all,
I have two new issues that showed up this morning. I requested new work and received three new WUs, but my results page now shows 16 new WUs, 13 of which never made it to me.
Second issue: I completed 3 WUs overnight, signed on and uploaded them, and the client keeps showing me this:
Einstein@Home - 2006-01-08 12:05:22 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 12:05:28 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 12:05:28 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 12:05:28 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 12:05:28 - Already have result r1_1002.0__917_S4R2a_0
Einstein@Home - 2006-01-08 12:07:07 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 12:07:12 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 12:07:12 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 12:07:12 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 12:07:12 - Already have result r1_1002.0__917_S4R2a_0
Einstein@Home - 2006-01-08 12:12:09 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 12:12:13 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 12:12:14 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 12:12:14 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 12:12:14 - Already have result r1_1002.0__917_S4R2a_0
Any ideas for a culprit or a fix for these?
Results shows 27 in progress, Boinc only shows 14?
Maybe it's that way-old 4.19 client wanting to retire.. ;)
He has kicked butt through all this, I was hopin he would make it a little longer...
I just figured out that the three it says "already have result" for are the three that I did receive this morning. But then why is it trying to resend those three and not the ones I didn't get, if that is what it's doing? And why won't it send in the results for the 3 that I did complete overnight?
Maybe EAH servers are about to crap out.
Even your average download speed looks, well.. let's say a bit fast. LOL
Which ones are you trying to upload? (I'll look in the scheduler logs to see if I can find any clues.)
*edit* Found this in scheduler logs:
2006-01-08 18:06:52.8874 [PID=26165] [debug ] REQUEST_METHOD=POST CONTENT_TYPE=application/octet-stream HTTP_ACCEPT= HTTP_USER_AGENT=
2006-01-08 18:06:52.8874 [PID=26165] [debug ] CONTENT_LENGTH=6719
2006-01-08 18:06:54.1277 [PID=26165] [normal ] Handling request: host 494017, platform windows_intelx86, version 4.19.0, RSF 1.000000
2006-01-08 18:06:54.1277 [PID=26165] [normal ] OS version Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)
2006-01-08 18:06:54.1335 [PID=26165] [debug ] Request [HOST#494017] Database [HOST#494017] Request [RPC#50] Database [RPC#49]
2006-01-08 18:06:54.1345 [PID=26165] [normal ] Processing request [HOST#494017] [RPC#50] core client version 4.19.0
2006-01-08 18:06:54.1408 [PID=26165] [normal ] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result
2006-01-08 18:06:54.1408 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result twice
2006-01-08 18:06:54.1409 [PID=26165] [normal ] [HOST#494017] [RESULT#13706614 r1_1002.0__958_S4R2a_0] got result
2006-01-08 18:06:54.1409 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13706614 r1_1002.0__958_S4R2a_0] got result twice
2006-01-08 18:06:54.1409 [PID=26165] [normal ] [HOST#494017] [RESULT#13910792 r1_1002.0__954_S4R2a_1] got result
2006-01-08 18:06:54.1409 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13910792 r1_1002.0__954_S4R2a_1] got result twice
2006-01-08 18:06:54.1417 [PID=26165] [normal ] sending delay request 61.000000
Anybody out there who can decode this?
I read it as you sending the same results over again..??
At 18:05 it had already got those results. And it does this again at 18:11..
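If anyone wants to check this without reading the log by eye, here is a rough Python sketch that pulls the "got result twice" entries out of scheduler log lines like the ones quoted above. The regular expression is only my guess from these few lines, not a documented BOINC log format, and `find_double_reports` is just a name I made up.

```python
import re

# Pattern inferred from the log lines quoted in this thread only
# (an assumption, not an official BOINC log format).
TWICE = re.compile(
    r"\[HOST#(?P<host>\d+)\] \[RESULT#(?P<id>\d+) (?P<name>\S+)\] got result twice"
)

def find_double_reports(lines):
    """Return (host, result_id, result_name) for each 'got result twice' line."""
    hits = []
    for line in lines:
        m = TWICE.search(line)
        if m:
            hits.append((int(m.group("host")), int(m.group("id")), m.group("name")))
    return hits

# Sample lines copied from the scheduler log posted above.
sample = [
    "2006-01-08 18:06:54.1408 [PID=26165] [normal ] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result",
    "2006-01-08 18:06:54.1408 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13706602 r1_1002.0__959_S4R2a_0] got result twice",
    "2006-01-08 18:06:54.1409 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13706614 r1_1002.0__958_S4R2a_0] got result twice",
    "2006-01-08 18:06:54.1409 [PID=26165] [CRITICAL] [HOST#494017] [RESULT#13910792 r1_1002.0__954_S4R2a_1] got result twice",
]

for host, result_id, name in find_double_reports(sample):
    print(host, result_id, name)
```

On the sample it lists results 959, 958 and 954, none of which are the 917 to 919 results the client keeps complaining about.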
Ok, I really don't understand the logs, so forgive me if I'm totally off base here. I was looking at my initial contact and found this:
17:59:47.5281 [PID=23678] [debug ] get_working_set_filename(): returning r1_1120.5
2006-01-08 17:59:47.5282 [PID=23678] [debug ] send_new_file_working_set will try filename r1_1120.5
2006-01-08 17:59:47.5376 [PID=23678] [debug ] in_send_results_for_file(r1_1120.5, 0) prev_result.id=0
2006-01-08 17:59:47.7402 [PID=23678] [debug ] est cpu dur 18103.792152; running_frac 0.997095; rsf 1.000000; est 18156.536892
2006-01-08 17:59:47.7409 [PID=23678] [debug ] [HOST#494017] Sending app_version albert windows_intelx86 437
2006-01-08 17:59:47.7421 [PID=23678] [debug ] est cpu dur 18103.792152; running_frac 0.997095; rsf 1.000000; est 18156.536892
2006-01-08 17:59:47.7421 [PID=23678] [normal ] [HOST#494017] Sending [RESULT#14108942 r1_1120.5__1310_S4R2a_2] (fills 18156.54 seconds)
2006-01-08 17:59:47.7432 [PID=23678] [normal ] [HOST#494017] Sent 13 results [scheduler ran 27.638383 seconds]
2006-01-08 17:59:47.7442 [PID=23678] [normal ] sending delay request 61.000000
I received nothing from this batch. Also, for this time period, the times on the right hand side seem to jump around a lot. Is this normal, just something I hadn't noticed before?
Then, a few minutes later, since my computer didn't receive the original work, it requested additional new work, and since I had already "received" 13, it sent out three more, which I did receive fine, to make the daily quota of 16.
18:01:41.4521 [PID=24417] [normal ] [HOST#494017] Sent 3 results [scheduler ran 2.872020 seconds]
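To double-check the arithmetic, here is a small Python sketch that adds up the "Sent N results" lines from the two scheduler entries quoted above. The pattern is my own guess from those two lines, and `total_sent` is a made-up helper name; the quota of 16 is simply the figure mentioned in this post.

```python
import re

# Pattern guessed from the two "Sent N results" lines quoted in this post.
SENT = re.compile(r"Sent (\d+) results")

def total_sent(lines):
    """Add up the result counts from every 'Sent N results' log line."""
    total = 0
    for line in lines:
        m = SENT.search(line)
        if m:
            total += int(m.group(1))
    return total

sample = [
    "2006-01-08 17:59:47.7432 [PID=23678] [normal ] [HOST#494017] Sent 13 results [scheduler ran 27.638383 seconds]",
    "18:01:41.4521 [PID=24417] [normal ] [HOST#494017] Sent 3 results [scheduler ran 2.872020 seconds]",
]

DAILY_QUOTA = 16  # figure taken from this post, not from any project documentation

sent = total_sent(sample)
print(sent, "sent;", "quota reached" if sent >= DAILY_QUOTA else "quota not reached")
```

So as far as the scheduler is concerned, 13 + 3 = 16 went out, even though only the last 3 ever arrived here.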
Thoughts?
You mean the times on the left side, don't you? ;)
I've seen jumps like that before, but this one looks big.
Maybe the server had too much to do right at that time.
I noticed one thing, though. It requested a delete of r1_0099.0 after the 12 r1_1002.0 units and then the last r1_1120.5 unit.
Then it requested another delete of the same r1_0099.0 two minutes later, when you got your 3 successful downloads.
There's one idea in my head of what might have happened, but I really don't think it's right..
But what the heck, here's the story..
If you limited your HD space in prefs, or in some other way refused to accept more data, it couldn't receive the downloaded stuff until the r1_0099.0 file was gone. And the last r1_1120.5 unit was too close behind it to succeed.
Two minutes go by and now you could download some more r1_1002.0 units.
But while I was writing this I saw this line in there: "available disk 117.698731 GB", so I guess that crashes my theory.. LOL
Man.. I think I'll go watch some TV..
Something is real fishy here. First of all, I think you got a ghost WU problem. A fix for that is implemented in later versions of BOINC, so an upgrade would hopefully fix that. Do a search for ghost WU, it was discussed a couple of months ago. Are you behind a proxy?
Looking at the logs, something is really messed up here. The host's logs in your original post refer to the last 3 results you downloaded at 18:01.
The logs from the scheduler refer to the last 3 results you reported at 18:05. According to the logs on the scheduler, you are trying to report the 3 results you reported at 18:05 a second time.
So the host and the scheduler aren't talking about the same 3 results. This is as far as I can help you. Anyone else have any suggestions?
When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.
Lol, yeah, left, the other right. I have it set to leave 10 GB free, with ~120 GB normally free, so I don't think that's the issue. (I have also set that to 200 for now to stop any more work from coming in until this is figured out.)
No proxy involved.
And I did read about the ghost workunits when all that became a big issue, and I have downloaded the newest version. I was just hoping to make it to 100,000 credits on the original client I started with.
Update:
Completed work still hanging around: I have since finished another WU. It uploaded and was sent in fine, but the client will not remove it from the work screen. The results page shows it as received and pending, but it's still there with the other 3 from this morning that will not disappear.
On the ghost WU side of the issue, I still get this, for the 3 that I have received:
Einstein@Home - 2006-01-08 18:02:23 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 18:02:27 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 18:02:27 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 18:02:27 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 18:02:27 - Already have result r1_1002.0__917_S4R2a_0
I was going through the logs on the client and found this. It should have happened on the last upload that worked properly. Any ideas what it means, lol?
Einstein@Home - 2006-01-07 18:51:35 - Error on file upload: [r1_1002.0__961_S4R2a_0_0] locked by file_upload_handler PID=14480
And I have since lowered my "connect to network" setting to 0.1 days, but the client does not see this. I have updated it a couple of times and it keeps saying:
--- - 2006-01-08 22:21:44 - May run out of work in 2.50 days; requesting more
Einstein@Home - 2006-01-08 22:21:44 - Requesting 216342 seconds of work
Einstein@Home - 2006-01-08 22:21:44 - Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
Einstein@Home - 2006-01-08 22:21:48 - Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
Einstein@Home - 2006-01-08 22:21:48 - Already have result r1_1002.0__919_S4R2a_0
Einstein@Home - 2006-01-08 22:21:48 - Already have result r1_1002.0__918_S4R2a_0
Einstein@Home - 2006-01-08 22:21:48 - Already have result r1_1002.0__917_S4R2a_0
Einstein@Home - 2006-01-08 22:21:48 - No work from project
Einstein@Home - 2006-01-08 22:21:48 - Deferring communication with project for 13 minutes and 53 seconds
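As a quick sanity check on those numbers, here is a tiny Python sketch converting the requested seconds into days. It assumes the amount of work requested tracks the "connect to network" interval, which is only my reading of the messages above and not something I have confirmed; `seconds_to_days` is a made-up helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def seconds_to_days(seconds):
    """Convert a work request in seconds to days."""
    return seconds / SECONDS_PER_DAY

requested = 216342  # from the "Requesting 216342 seconds of work" line above
print(f"{seconds_to_days(requested):.2f} days")  # prints "2.50 days"
```

That matches the "2.50 days" in the log, so if my assumption holds, the client still looks to be working from the old setting rather than the new 0.1 days.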
Anyone really understand this stuff, 'cause it's sure going over my head?
Looks like a process (file_upload_handler, which I guess is a part of BOINC) on your computer holds the result file open ("locked"). That means the file cannot be deleted etc by another (BOINC) process and Einstein gets confused.
I take it you didn't log out or restart your computer while this was going on? Doing that will end such a process and release the file.
You can also open the Task Manager and kill that process.
After this, I expect that Einstein will maybe repeat that message "Already have result" once, but then it should start working normally.
If you get that "locked by file_upload_handler" again, reinstall BOINC or upgrade to a newer version.
Good luck!
Mystic, it looks as if things are working for you again. You may have had a file upload problem. Could you confirm that all is now OK?
Director, Einstein@Home