We updated our scheduler (and the database). The downtime was somewhat longer than we expected; sorry about that.
The only difference that you should notice right now is that the "(Project has no jobs available)" messages should have gone away.
If you notice other problems related to the new scheduler, please post here.
BM
Copyright © 2024 Einstein@Home. All rights reserved.
Scheduler update
Hi, I've been getting "Scheduler Request Failed: HTTP Internal Server Failure", so I haven't been able to upload my last results or get new WU's.
I can check the site and see that the servers are up and I can obviously connect to the internet as I'm sending this reply as well.
Any help would be appreciated.
My appinfo file is below:
-
-
einstein_S5R5
-
einstein_S5R5_3.05_windows_intelx86.exe
-
einstein_S5R5_3.05_windows_intelx86_0.exe
-
einstein_S5R5_3.05_windows_intelx86_1.exe
-
einstein_S5R5_3.05_windows_intelx86_2.exe
-
einstein_S5R5_3.05_graphics_windows_intelx86.exe
-
einsteinbinary_ABP1
-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe
-
cudart.dll
-
cufft.dll
-
einstein_S5R5
305
6.3.0
-
einstein_S5R5_3.05_windows_intelx86.exe
-
einstein_S5R5_3.05_windows_intelx86_0.exe
-
einstein_S5R5_3.05_windows_intelx86_1.exe
-
einstein_S5R5_3.05_windows_intelx86_2.exe
-
einstein_S5R5_3.05_graphics_windows_intelx86.exe
graphics_app
-
einsteinbinary_ABP1
307
cuda
1.0
1.0
-
CUDA
1
6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe
-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app
-
cudart.dll
-
cufft.dll
-
einsteinbinary_ABP1
309
cuda
1.0
1.0
-
CUDA
1
6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe
-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app
-
cudart.dll
-
cufft.dll
-
einsteinbinary_ABP1
310
cuda
1.0
1.0
-
CUDA
1
6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe
-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app
-
cudart.dll
-
cufft.dll
RE: Hi, I've been getting
The "HTTP Internal Server Error" comes from a timeout the scheduler (cgi) gets when handling requests from your host. As far as I can see, you have many, many data files on your machine that your client doesn't delete, although it's been sent "delete" messages for them. Try manually deleting files h1_xxxx.xx_S5R4 and l1_xxxx.xx_S5R4 where xxxx.xx < 0995.00 from your project directory.
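For anyone with a lot of these files, the manual cleanup described above could be scripted. The following is only a minimal sketch: the file name pattern is taken from the post, while the function names, the `dry_run` switch and the project directory argument are my own assumptions for illustration and are not part of BOINC. Stop BOINC before running anything like this.

```python
import re
from pathlib import Path

# Matches names such as h1_0994.95_S5R4 or l1_0123.40_S5R4.
# The pattern is an assumption based on the file names quoted in the post.
DATA_FILE = re.compile(r"^[hl]1_(\d{4}\.\d{2})_S5R4$")


def is_stale(name, cutoff=995.00):
    """Return True if name is an S5R4 data file with a frequency below cutoff."""
    match = DATA_FILE.match(name)
    return match is not None and float(match.group(1)) < cutoff


def stale_files(project_dir, cutoff=995.00):
    """List the stale data files found directly inside project_dir."""
    return sorted(p for p in Path(project_dir).iterdir()
                  if p.is_file() and is_stale(p.name, cutoff))


def delete_stale(project_dir, cutoff=995.00, dry_run=True):
    """Print each stale file; delete it only when dry_run is False."""
    files = stale_files(project_dir, cutoff)
    for path in files:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            path.unlink()
    return files
```

Calling `delete_stale(your_project_directory)` with the default `dry_run=True` only lists the candidates, so you can check the list before deleting anything for real.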
Yes, the scheduler is inefficient in handling many data files, and this is subject to possible improvements. But that part hasn't changed between the old and the new scheduler.
There is almost no S5R5 work left; you should adapt to S5R6 to get more CPU work. See the CUDA App thread in Cruncher's Corner.
BM
RE: RE: Hi, I've been
Thanks for the prompt reply!
I deleted the suggested files (126 files!) and I'll try to adapt to S5R6.
Again, thanks!
Isaac
RE: First, it gives me an
I imagine the parsing error comes from the fact that you seem to have a minus character and a space in front of many of the tags. There should be no such stray characters in front of them. At one point you also have a version number of 101, which can't be correct because that is a Linux version number, not a Windows one. I think it should be 305 rather than 101.
Your simplest solution might be to extract a new copy of app_info.xml from the beta test package and just edit the two lines that give the application name so that they read 'einstein_S5R6' rather than 'einstein_S5R5'. The .... line shouldn't be causing a parsing error. Whether you need it or not depends on which version of BOINC you are using. Since I don't run the CUDA app, I have no knowledge or experience with this. You should reread the discussion about this in both the Linux and Windows CUDA app threads.
All data files that you have on board are listed in your state file (client_state.xml). If your client has been ignoring directives from the server to delete these files (as Bernd mentioned), it means that not only were the physical files not deleted, but the entries for these files in your state file were not trimmed either. In the distant past I have seen such behaviour, where old left-over data files were hanging around, so your experience is not unique. If you just delete the physical files, your client will tell the server that it has entries in the state file but is missing the actual physical files. The server will oblige by resending the missing files - probably all 126 of them in your case :-).
A possible solution is to stop BOINC, delete once again all these old files and then use a text editor like notepad to remove the entries for all of them from your state file. If you are careful just to remove the appropriate
h1_xxxx.xx_S5R4
....
....
....
blocks, one for each file that you are deleting (17 lines of text in total for each block), you will have no further problems. It's a bit tedious to do this but I've done this sort of cleanup quite a few times in the past with no problems. You could probably achieve the same outcome by resetting the project but you will lose all work you currently have on board if you do that.
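If doing this by hand in notepad feels too tedious, the edit could be sketched in a few lines of Python. Be aware of the assumptions: I am assuming each entry is a block that starts with a line containing only `<file_info>`, ends with `</file_info>` and holds the file name in a `<name>` element. Check your own client_state.xml first, since the tag names can differ between BOINC versions; the function and parameter names are made up for this example.

```python
import re

# Name element of an old S5R4 data file, e.g. <name>h1_0100.00_S5R4</name>.
STALE_NAME = re.compile(r"<name>([hl]1_(\d{4}\.\d{2})_S5R4)</name>")


def strip_stale_blocks(state_text, cutoff=995.00,
                       open_tag="<file_info>", close_tag="</file_info>"):
    """Return (new_text, removed_names) with stale file blocks dropped.

    open_tag and close_tag are assumptions about the state file layout.
    """
    kept, removed = [], []
    block = None  # lines of the block currently being read, if any
    for line in state_text.splitlines(keepends=True):
        if block is None:
            if line.strip() == open_tag:
                block = [line]
            else:
                kept.append(line)
            continue
        block.append(line)
        if line.strip() == close_tag:
            match = STALE_NAME.search("".join(block))
            if match and float(match.group(2)) < cutoff:
                removed.append(match.group(1))  # drop the whole block
            else:
                kept.extend(block)
            block = None
    if block is not None:
        kept.extend(block)  # unterminated block: leave it untouched
    return "".join(kept), removed
```

As with the manual method, stop BOINC first, and keep a copy of the original state file so you can put it back if the result looks wrong.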
If you don't have tasks on board or any completed work you are trying to upload or report, your easiest action would be to try resetting to see if that removes all the old entries from your state file. Do browse your state file and make sure you understand what I'm talking about before taking any drastic action.
Cheers,
Gary.
RE: I imagine the parsing
Hint: Don't cut and paste from XML files by opening them in Windows Internet Explorer. (The minuses are where you can collapse tree branches in the IE view; they do not belong to the XML content at all.)
Instead, open the file in notepad or, after opening it in IE, choose "View Source" in the right-click context menu.
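A quick way to check a file for this kind of damage before handing it to BOINC is sketched below. It uses Python's standard XML parser and a simple test for lines that start with "- " in front of a tag. The function name is hypothetical, and passing this check does not guarantee that BOINC accepts the file, since BOINC's own parser may behave differently.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of problems found in xml_text (empty if none)."""
    problems = []
    # Look for the "- " prefix that the IE tree view leaves behind.
    for number, line in enumerate(xml_text.splitlines(), start=1):
        if line.lstrip().startswith("- <"):
            problems.append(f"line {number}: stray '- ' before a tag")
    # Check that the text is well-formed XML at all.
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
    return problems
```

To use it, read your app_info.xml into a string and pass it in; an empty list means neither problem was detected.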
Bikeman
RE: RE: First, it gives
Thanks for the insights Gary and Bikeman!
I edited the app_info file from a backup I made and I get no parsing errors :-)
I also found the state file and deleted all instances of the "old" files.
I restarted BOINC and when it sent the request to fetch CPU work units it gave me the "Scheduler Request Failed: HTTP Internal Server Error". I'm going to let it run during the night and check tomorrow to see what behavior I get.
Again, thanks for the help!
IC
RE: I restarted BOINC and
Not long after you posted the above message, your machine started downloading a whole bunch of new tasks. I counted 134, which is a bit surprising since there is supposed to be a 16 tasks per CPU daily limit and 8x16=128. Anyway I was about to suggest (until I looked and saw all the new tasks) that you should force a couple of retries through the 'update' button to see if the server would cooperate. So either you did that or perhaps your machine retried by itself and was certainly very successful at getting new work.
I presume all is back to normal now?
Cheers,
Gary.
RE: RE: I restarted BOINC
Gary et al,
YES! Everything seems to be working fine now :-)
Thanks for all the help!
IC
Hi, I don't get any more WU's
Hi,
I don't get any more WU's for one of my computers, and I have no WU's left to crunch, so in short, I'm not getting any jobs from Einstein@Home. Can anybody tell me what the problem might be?
Grts
Julie
Big Bang Corollary