Scheduler update

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 252167576
RAC: 33650
Topic 194625

We updated our scheduler (and the database). The downtime was somewhat longer than we expected; sorry about that.

The only difference that you should notice right now is that the "(Project has no jobs available)" messages should have gone away.

If you notice other problems related to the new scheduler, please post here.

BM

I Cruz
Joined: 5 Dec 05
Posts: 5
Credit: 294478
RAC: 0

Scheduler update

Hi, I've been getting "Scheduler Request Failed: HTTP Internal Server Error", so I haven't been able to upload my last results or get new WUs.

I can check the site and see that the servers are up and I can obviously connect to the internet as I'm sending this reply as well.

Any help would be appreciated.

My appinfo file is below:

-
-
einstein_S5R5

-
einstein_S5R5_3.05_windows_intelx86.exe


-
einstein_S5R5_3.05_windows_intelx86_0.exe


-
einstein_S5R5_3.05_windows_intelx86_1.exe


-
einstein_S5R5_3.05_windows_intelx86_2.exe


-
einstein_S5R5_3.05_graphics_windows_intelx86.exe


-
einsteinbinary_ABP1

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe


-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe


-
cudart.dll


-
cufft.dll


-
einstein_S5R5
305
6.3.0
-
einstein_S5R5_3.05_windows_intelx86.exe


-
einstein_S5R5_3.05_windows_intelx86_0.exe

-
einstein_S5R5_3.05_windows_intelx86_1.exe

-
einstein_S5R5_3.05_windows_intelx86_2.exe

-
einstein_S5R5_3.05_graphics_windows_intelx86.exe
graphics_app


-
einsteinbinary_ABP1
307
cuda
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe


-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll


-
einsteinbinary_ABP1
309
cuda
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe


-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll


-
einsteinbinary_ABP1
310
cuda
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe


-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll


Bernd Machenschalk

RE: Hi, I've been getting

Message 95492 in response to message 95491

Quote:

Hi, I've been getting "Scheduler Request Failed: HTTP Internal Server Error", so I haven't been able to upload my last results or get new WUs.

I can check the site and see that the servers are up and I can obviously connect to the internet as I'm sending this reply as well.

Any help would be appreciated.

The "HTTP Internal Server Error" comes from a timeout the scheduler (cgi) gets when handling requests from your host. As far as I can see, you have many, many data files on your machine that your client doesn't delete, although it's been sent "delete" messages for them. Try manually deleting files h1_xxxx.xx_S5R4 and l1_xxxx.xx_S5R4 where xxxx.xx < 0995.00 from your project directory.

Yes, the scheduler is inefficient in handling many data files, and this is subject to possible improvements. But that part hasn't changed between the old and the new scheduler.
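For anyone who wants to see which files match this rule before deleting anything, here is a minimal Python sketch. It assumes the data files are named exactly like the h1_xxxx.xx_S5R4 / l1_xxxx.xx_S5R4 placeholders above (a two-letter detector prefix and a frequency with four digits before and two after the decimal point); the function names are hypothetical, and the sketch only lists candidates, it does not delete them.

```python
import re
from pathlib import Path

# Assumed naming scheme, based on the h1_xxxx.xx_S5R4 / l1_xxxx.xx_S5R4 placeholders.
S5R4_FILE = re.compile(r"^(h1|l1)_(\d{4}\.\d{2})_S5R4$")


def stale_s5r4_files(names, cutoff=995.00):
    """Return the matching names whose frequency part xxxx.xx is below the cutoff."""
    stale = []
    for name in names:
        m = S5R4_FILE.match(name)
        if m and float(m.group(2)) < cutoff:
            stale.append(name)
    return sorted(stale)


def list_in_project_dir(project_dir, cutoff=995.00):
    """List (but do not delete) matching files in a BOINC project directory."""
    names = [p.name for p in Path(project_dir).iterdir() if p.is_file()]
    return stale_s5r4_files(names, cutoff)


if __name__ == "__main__":
    sample = ["h1_0950.25_S5R4", "l1_0994.95_S5R4", "h1_0995.00_S5R4",
              "l1_1002.10_S5R4", "einstein_S5R5_3.05_windows_intelx86.exe"]
    print(stale_s5r4_files(sample))  # ['h1_0950.25_S5R4', 'l1_0994.95_S5R4']
```

Listing first and deleting by hand keeps the decision with the user, which matters here because the client is supposed to manage these files itself.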

Quote:
My appinfo file is below:

There is almost no S5R5 work left; you should adapt your app_info.xml to S5R6 to get more CPU work. See the CUDA App thread in Cruncher's Corner.

BM

I Cruz

RE: RE: Hi, I've been

Message 95493 in response to message 95492

Thanks for the prompt reply!

I deleted the suggested files (126 files!) and I'll try to adapt to S5R6.

Again, thanks!

Isaac

I Cruz

RE: RE: Hi, I've been

Message 95494 in response to message 95492

Quote:
Quote:

Hi, I've been getting "Scheduler Request Failed: HTTP Internal Server Error", so I haven't been able to upload my last results or get new WUs.

I can check the site and see that the servers are up and I can obviously connect to the internet as I'm sending this reply as well.

Any help would be appreciated.

The "HTTP Internal Server Error" comes from a timeout the scheduler (cgi) gets when handling requests from your host. As far as I can see, you have many, many data files on your machine that your client doesn't delete, although it's been sent "delete" messages for them. Try manually deleting files h1_xxxx.xx_S5R4 and l1_xxxx.xx_S5R4 where xxxx.xx < 0995.00 from your project directory.

First, it gives me an XML parsing error even though (as shown below) it looks like it should work. Maybe the tags are confusing?

Second, when I restarted BOINC it downloaded a ton of l1_xxxx.xx_S5R4 and h1_xxxx.xx_S5R4 files with xxxx.xx in the 0900 to 0988 range! These are the files I just deleted. I'm lost.

Any suggestions?

Thanks!

Isaac

Modified appinfo file:
-
-
einstein_S5R6

-
einstein_S5R5_3.05_windows_intelx86.exe


-
einstein_S5R5_3.05_windows_intelx86_0.exe


-
einstein_S5R5_3.05_windows_intelx86_1.exe


-
einstein_S5R5_3.05_windows_intelx86_2.exe


-
einstein_S5R5_3.05_graphics_windows_intelx86.exe


-
einsteinbinary_ABP1

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe


-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe


-
cudart.dll


-
cufft.dll


-
einstein_S5R6
101
6.3.0
-
einstein_S5R5_3.05_windows_intelx86.exe


-
einstein_S5R5_3.05_windows_intelx86_0.exe

-
einstein_S5R5_3.05_windows_intelx86_1.exe

-
einstein_S5R5_3.05_windows_intelx86_2.exe

-
einstein_S5R5_3.05_graphics_windows_intelx86.exe
graphics_app


-
einsteinbinary_ABP1
307
cuda
3000000000.0
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe


-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll


-
einsteinbinary_ABP1
309
cuda
3000000000.0
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe


-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll


-
einsteinbinary_ABP1
310
cuda
3000000000.0
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe


-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118658620501
RAC: 19059828

RE: First, it gives me an

Message 95495 in response to message 95494

Quote:
First, it gives me an XML parsing error even though (as shown below) it looks like it should work. Maybe the tags are confusing?


I imagine the parsing error comes from the fact that you seem to have a minus character and a space in front of many of the opening tags. There should be no such odd characters in front of these tags. You also have, at one point, a version number of 101, which can't be correct, as it's a Linux version number and not a Windows one. I think it should be 305 rather than 101.

Your simplest solution might be to extract a new copy of app_info.xml from the beta test package and just edit the two lines that name the application so that they read 'einstein_S5R6' rather than 'einstein_S5R5'. The line with the 3000000000.0 value shouldn't be causing a parsing error. Whether you need it or not depends on which version of BOINC you are using. Since I don't run the CUDA app, I have no knowledge of or experience with it. You should reread the discussion about this in both the Linux and Windows CUDA app threads.
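A minimal Python sketch of that edit, assuming the usual BOINC app_info.xml layout in which the application name appears once in <app><name> and again in <app_version><app_name> (the tag names are an assumption, since they were lost from the posts above). The function name is hypothetical. It renames only those two kinds of elements and leaves file names and the version number alone, so a wrong version such as 101 still has to be corrected by hand.

```python
import xml.etree.ElementTree as ET


def retarget_app(app_info_xml: str, old: str, new: str) -> str:
    """Rename an application in app_info.xml text (assumed BOINC layout)."""
    root = ET.fromstring(app_info_xml)
    # <app><name>...</name></app> entries directly under the root element.
    for name in root.findall("./app/name"):
        if name.text == old:
            name.text = new
    # <app_version><app_name>...</app_name></app_version> entries.
    for app_name in root.findall("./app_version/app_name"):
        if app_name.text == old:
            app_name.text = new
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ("<app_info><app><name>einstein_S5R5</name></app>"
              "<app_version><app_name>einstein_S5R5</app_name>"
              "<version_num>305</version_num></app_version></app_info>")
    print(retarget_app(sample, "einstein_S5R5", "einstein_S5R6"))
```

Parsing the file rather than doing a blind text search and replace means the executable names, which also contain "S5R5", stay untouched.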

Quote:
Second, when I restarted BOINC it downloaded a ton of l1_xxxx.xx_S5R4 and h1_xxxx.xx_S5R4 files with xxxx.xx in the 0900 to 0988 range! These are the files I just deleted. I'm lost.


All data files that you have on board are listed in your state file (client_state.xml). If your client has been ignoring directives from the server to delete these files (as Bernd mentioned), then not only were the physical files left in place, but their entries in your state file were not removed either. In the distant past I have seen similar behaviour, with old left-over data files hanging around, so your experience is not unique. If you just delete the physical files, your client will tell the server that it has entries in the state file but is missing the actual files. The server will oblige by resending the missing files - probably all 126 of them in your case :-).

A possible solution is to stop BOINC, once again delete all these old files, and then use a text editor like Notepad to remove the entries for all of them from your state file. If you are careful to remove only the appropriate

<file_info>
    <name>h1_xxxx.xx_S5R4</name>
    ....
    ....
    ....
</file_info>

blocks, one for each file that you are deleting (17 lines of text in total for each block), you will have no further problems. It's a bit tedious to do this but I've done this sort of cleanup quite a few times in the past with no problems. You could probably achieve the same outcome by resetting the project but you will lose all work you currently have on board if you do that.
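The same cleanup can be sketched in Python. This assumes client_state.xml is well-formed XML whose <file_info> blocks (each with a <name>) sit directly under the root element, and it reuses the h1/l1 naming pattern and 0995.00 cutoff from Bernd's post; the names are hypothetical. Stop BOINC and keep a backup of the state file before trying anything like this. The sketch does not check whether a task still refers to a file it removes, and rewriting with ElementTree drops the original formatting and any XML declaration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed naming scheme for the old S5R4 data files discussed in this thread.
OLD_DATA = re.compile(r"^(h1|l1)_(\d{4}\.\d{2})_S5R4$")


def drop_file_infos(state_xml: str, cutoff: float = 995.00):
    """Remove top-level <file_info> blocks for old S5R4 data files.

    Returns the new XML text and the list of removed file names.
    """
    root = ET.fromstring(state_xml)
    removed = []
    for fi in list(root.findall("file_info")):
        name = (fi.findtext("name", default="") or "").strip()
        m = OLD_DATA.match(name)
        if m and float(m.group(2)) < cutoff:
            root.remove(fi)  # drop the whole block, not just the <name> line
            removed.append(name)
    return ET.tostring(root, encoding="unicode"), removed
```

To use it, read client_state.xml into a string, call drop_file_infos, look at the returned list of names, and only then write the new text back over the (backed-up) file.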

If you don't have tasks on board or any completed work you are trying to upload or report, your easiest action would be to try resetting to see if that removes all the old entries from your state file. Do browse your state file and make sure you understand what I'm talking about before taking any drastic action.

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 780382641
RAC: 1201549

RE: I imagine the parsing

Message 95496 in response to message 95495

Quote:
I imagine the parsing error comes from the fact that you seem to have a minus character and a space in front of many of the opening tags. There should be no such odd characters in front of these tags.

Hint: Don't cut and paste from XML files by opening them in Windows Internet Explorer. The minuses are where you can collapse tree branches in the IE view; they do not belong to the XML content at all.

Instead, open the file in Notepad or, after opening it in IE, choose "View Source" from the right-click context menu.
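If the text has already been copied out of the IE view, stripping those markers can be sketched in Python as below. It assumes each marker is a "-" at the start of a line, optionally surrounded by spaces or tabs, directly in front of a tag; the function names are hypothetical, and xml.etree.ElementTree is used only to check that the result parses.

```python
import re
import xml.etree.ElementTree as ET

# A "-" at the start of a line (optionally padded with spaces or tabs)
# directly in front of a tag: the collapse marker from IE's XML tree view.
IE_MARKER = re.compile(r"^[ \t]*-[ \t]*(?=<)", re.MULTILINE)


def strip_ie_markers(text: str) -> str:
    """Remove the '-' collapse markers that precede tags at line starts."""
    return IE_MARKER.sub("", text)


def is_well_formed(text: str) -> bool:
    """True if the text parses as a single XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    pasted = "- <app_info>\n- <app>\n  <name>einstein_S5R6</name>\n  </app>\n  </app_info>\n"
    print(is_well_formed(pasted))                    # False
    print(is_well_formed(strip_ie_markers(pasted)))  # True
```

The lookahead only removes a dash that is immediately followed by a tag, so dashes inside element text are left alone.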

Bikeman

I Cruz

RE: RE: First, it gives

Message 95497 in response to message 95495

Thanks for the insights Gary and Bikeman!

I edited the app_info file from a backup I made and I get no parsing errors :-)

I also found the state file and deleted all instances of the "old" files.

I restarted BOINC and when it sent the request to fetch CPU work units it gave me the "Scheduler Request Failed: HTTP Internal Server Error". I'm going to let it run during the night and check tomorrow to see what behavior I get.

Again, thanks for the help!

IC

Gary Roberts

RE: I restarted BOINC and

Message 95498 in response to message 95497

Quote:
I restarted BOINC and when it sent the request to fetch CPU work units it gave me the "Scheduler Request Failed: HTTP Internal Server Error". I'm going to let it run during the night and check tomorrow to see what behavior I get.


Not long after you posted the above message, your machine started downloading a whole bunch of new tasks. I counted 134, which is a bit surprising since there is supposed to be a daily limit of 16 tasks per CPU, and 8x16=128. Anyway, I was about to suggest (until I looked and saw all the new tasks) that you force a couple of retries with the 'update' button to see if the server would cooperate. So either you did that, or your machine retried by itself and was very successful at getting new work.

I presume all is back to normal now?

Cheers,
Gary.

I Cruz

RE: RE: I restarted BOINC

Message 95499 in response to message 95498

Gary et al,

YES! Everything seems to be working fine now :-)

Thanks for all the help!

IC

Julie
Joined: 7 Dec 09
Posts: 166
Credit: 772927
RAC: 0

Hi, I don't get any more WU's

Hi,
I don't get any more WUs for one of my computers and I don't have any WUs left to crunch; in short, I'm not getting any more jobs from Einstein@Home. Can anybody tell me what the problem could be?
Grts
Julie

.

Big Bang Corollary
