Scheduler update

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4332

Credit: 252167576

RAC: 33650

13 Nov 2009 15:55:26 UTC

Topic 194625

We updated our scheduler (and the database). The downtime was somewhat longer than we expected, sorry for this.

The only difference that you should notice right now is that the "(Project has no jobs available)" messages should have gone away.

If you notice other problems related to the new scheduler, please post here.

I Cruz

Joined: 5 Dec 05

Posts: 5

Credit: 294478

RAC: 0

Scheduler update

18 Nov 2009 2:41:30 UTC

Message 95491

Quote:

We updated our scheduler (and the database). The downtime was somewhat longer than we expected, sorry for this.

The only difference that you should notice right now is that the "(Project has no jobs available)" messages should have gone away.

If you notice other problems related to the new scheduler, please post here.

BM

Hi, I've been getting Scheduler Request Failed: HTTP Internal Server Failure so I haven't been able to upload my last results or get new WU's.

I can check the site and see that the servers are up and I can obviously connect to the internet as I'm sending this reply as well.

Any help would be appreciated.

My appinfo file is below:

-
-
einstein_S5R5

-
einstein_S5R5_3.05_windows_intelx86.exe

-
einstein_S5R5_3.05_windows_intelx86_0.exe

-
einstein_S5R5_3.05_windows_intelx86_1.exe

-
einstein_S5R5_3.05_windows_intelx86_2.exe

-
einstein_S5R5_3.05_graphics_windows_intelx86.exe

-
einsteinbinary_ABP1

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe

-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe

-
cudart.dll

-
cufft.dll

-
einstein_S5R5
305
6.3.0
-
einstein_S5R5_3.05_windows_intelx86.exe

-
einstein_S5R5_3.05_windows_intelx86_0.exe

-
einstein_S5R5_3.05_windows_intelx86_1.exe

-
einstein_S5R5_3.05_windows_intelx86_2.exe

-
einstein_S5R5_3.05_graphics_windows_intelx86.exe
graphics_app

-
einsteinbinary_ABP1
307
cuda
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll

-
einsteinbinary_ABP1
309
cuda
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll

-
einsteinbinary_ABP1
310
cuda
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4332

Credit: 252167576

RAC: 33650

RE: Hi, I've been getting

18 Nov 2009 11:20:43 UTC

Message 95492 in response to message 95491

Quote:

Hi, I've been getting Scheduler Request Failed: HTTP Internal Server Failure so I haven't been able to upload my last results or get new WU's.

I can check the site and see that the servers are up and I can obviously connect to the internet as I'm sending this reply as well.

Any help would be appreciated.

The "HTTP Internal Server Error" comes from a timeout the scheduler (cgi) gets when handling requests from your host. As far as I can see, you have many, many data files on your machine that your client doesn't delete, although it's been sent "delete" messages for them. Try manually deleting files h1_xxxx.xx_S5R4 and l1_xxxx.xx_S5R4 where xxxx.xx < 0995.00 from your project directory.

Yes, the scheduler is inefficient in handling many data files, and this is subject to possible improvements. But that part hasn't changed between the old and the new scheduler.
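For anyone who would rather not pick the files out by hand, here is a minimal Python sketch of the selection rule described above. It only lists candidates and deletes nothing. The exact file-name pattern (an h1_ or l1_ prefix, a frequency written as four digits, a dot and two digits, an _S5R4 suffix and no extension) and the function names are assumptions made for illustration, so compare them with what is actually in your project directory first.

```python
import re
from pathlib import Path

# Assumed naming scheme, based on the pattern quoted in the post:
# h1_xxxx.xx_S5R4 and l1_xxxx.xx_S5R4, e.g. h1_0987.65_S5R4.
DATA_FILE = re.compile(r"^(h1|l1)_(\d{4}\.\d{2})_S5R4$")


def stale_data_files(names, cutoff=995.00):
    """Return the names whose frequency field is strictly below the cutoff."""
    stale = []
    for name in names:
        match = DATA_FILE.match(name)
        if match and float(match.group(2)) < cutoff:
            stale.append(name)
    return stale


def find_stale(project_dir, cutoff=995.00):
    """List stale data files in a project directory without deleting anything."""
    names = (p.name for p in Path(project_dir).iterdir() if p.is_file())
    return stale_data_files(names, cutoff)
```

Review the list returned by find_stale() before removing anything. As the replies further down explain, the client may simply download the files again if they are still listed in its state file.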

Quote:

My appinfo file is below:

There is almost no S5R5 work left, you should adapt to S5R6 to get more CPU work. See the CUDA App thread in Cruncher's Corner.

I Cruz

Joined: 5 Dec 05

Posts: 5

Credit: 294478

RAC: 0

RE: RE: Hi, I've been

18 Nov 2009 21:28:06 UTC

Message 95493 in response to message 95492

Quote:

Quote:
Hi, I've been getting Scheduler Request Failed: HTTP Internal Server Failure so I haven't been able to upload my last results or get new WU's.

I can check the site and see that the servers are up and I can obviously connect to the internet as I'm sending this reply as well.

Any help would be appreciated.

The "HTTP Internal Server Error" comes from a timeout the scheduler (cgi) gets when handling requests from your host. As far as I can see, you have many, many data files on your machine that your client doesn't delete, although it's been sent "delete" messages for them. Try manually deleting files h1_xxxx.xx_S5R4 and l1_xxxx.xx_S5R4 where xxxx.xx < 0995.00 from your project directory.

Yes, the scheduler is inefficient in handling many data files, and this is subject to possible improvements. But that part hasn't changed between the old and the new scheduler.

Quote:
My appinfo file is below:

There is almost no S5R5 work left, you should adapt to S5R6 to get more CPU work. See the CUDA App thread in Cruncher's Corner.

BM

Thanks for the prompt reply!

I deleted the suggested files (126 files!) and I'll try to adapt to S5R6.

Again, thanks!

Isaac

I Cruz

Joined: 5 Dec 05

Posts: 5

Credit: 294478

RAC: 0

RE: RE: Hi, I've been

18 Nov 2009 22:44:05 UTC

Message 95494 in response to message 95492

Quote:

Quote:
Hi, I've been getting Scheduler Request Failed: HTTP Internal Server Failure so I haven't been able to upload my last results or get new WU's.

I can check the site and see that the servers are up and I can obviously connect to the internet as I'm sending this reply as well.

Any help would be appreciated.

The "HTTP Internal Server Error" comes from a timeout the scheduler (cgi) gets when handling requests from your host. As far as I can see, you have many, many data files on your machine that your client doesn't delete, although it's been sent "delete" messages for them. Try manually deleting files h1_xxxx.xx_S5R4 and l1_xxxx.xx_S5R4 where xxxx.xx < 0995.00 from your project directory.

First, it gives me an XML parsing error even though (as shown below) it looks like it should work. Maybe the tags are confusing?

Second, when I restarted BOINC it downloaded a ton of l1 and h1_xxxx.xx_S5R4 files with xxxx.xx in the 0900 to 0988 range! These are the files I just deleted. I'm lost.

Any suggestions?

Thanks!

Isaac

Modified appinfo file:
-
-
einstein_S5R6

-
einstein_S5R5_3.05_windows_intelx86.exe

-
einstein_S5R5_3.05_windows_intelx86_0.exe

-
einstein_S5R5_3.05_windows_intelx86_1.exe

-
einstein_S5R5_3.05_windows_intelx86_2.exe

-
einstein_S5R5_3.05_graphics_windows_intelx86.exe

-
einsteinbinary_ABP1

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe

-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe

-
cudart.dll

-
cufft.dll

-
einstein_S5R6
101
6.3.0
-
einstein_S5R5_3.05_windows_intelx86.exe

-
einstein_S5R5_3.05_windows_intelx86_0.exe

-
einstein_S5R5_3.05_windows_intelx86_1.exe

-
einstein_S5R5_3.05_windows_intelx86_2.exe

-
einstein_S5R5_3.05_graphics_windows_intelx86.exe
graphics_app

-
einsteinbinary_ABP1
307
cuda
3000000000.0
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll

-
einsteinbinary_ABP1
309
cuda
3000000000.0
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll

-
einsteinbinary_ABP1
310
cuda
3000000000.0
1.0
1.0
-
CUDA
1

6.7.0
-
einsteinbinary_ABP1_3.10_windows_intelx86_cuda.exe

-
einsteinbinary_ABP1_3.10_graphics_windows_intelx86.exe
graphics_app

-
cudart.dll

-
cufft.dll

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5877

Credit: 118658620501

RAC: 19059828

RE: First, it gives me an

19 Nov 2009 7:37:42 UTC

Message 95495 in response to message 95494

Quote:

First, it gives me an XML parsing error even though (as shown below) it looks like it should work. Maybe the tags are confusing?

I imagine the parsing error comes from the fact that you seem to have a minus character and a space in front of several of your tags. There should be no such odd characters in front of these tags. You also have at one point a version number of 101, which can't be correct as it's a Linux version number and not a Windows one. I think it should be 305 rather than 101.

Your simplest solution might be to extract a new copy of app_info.xml from the beta test package and just edit the two lines that contain the application name so that they read 'einstein_S5R6' rather than 'einstein_S5R5'. The .... line shouldn't be causing a parsing error. Whether you need it or not depends on which version of BOINC you are using. Since I don't run the CUDA app, I have no knowledge or experience with this. You should reread the discussion about this in both the Linux and Windows CUDA app threads.

Quote:

Second, when I restarted BOINC it downloaded a ton of l1 and h1_xxxx.xx_S5R4 files with xxxx.xx in the 0900 to 0988 range! These are the files I just deleted. I'm lost.

All data files that you have on board are listed in your state file (client_state.xml). If your client has been ignoring directives from the server to delete these files (as Bernd mentioned) it means that not only were the physical files not deleted, but also the entries for these files in your state file were not trimmed as well. In the distant past I have seen such behaviour where old left-over data files were hanging around so your experience is not unique. If you just delete the physical files, your client will communicate to the server that it has entries in the state file but is missing the actual physical files. The server will oblige by resending the missing files - probably all 126 of them in your case :-).

A possible solution is to stop BOINC, delete once again all these old files and then use a text editor like notepad to remove the entries for all of them from your state file. If you are careful just to remove the appropriate

h1_xxxx.xx_S5R4
....
....
....

blocks, one for each file that you are deleting (17 lines of text in total for each block), you will have no further problems. It's a bit tedious to do this but I've done this sort of cleanup quite a few times in the past with no problems. You could probably achieve the same outcome by resetting the project but you will lose all work you currently have on board if you do that.

If you don't have tasks on board or any completed work you are trying to upload or report, your easiest action would be to try resetting to see if that removes all the old entries from your state file. Do browse your state file and make sure you understand what I'm talking about before taking any drastic action.
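The manual state-file cleanup described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it assumes that each data file is described in client_state.xml by a <file_info> ... </file_info> block containing a <name> element, and that the stale files follow the h1_/l1_ naming pattern with a frequency below 0995.00. The function names are made up for the example. Look at your own state file to confirm the layout, stop BOINC first, and work on a backup copy.

```python
import re

# ASSUMPTION: one <file_info> ... </file_info> block per data file,
# with the file name inside a <name> element. Verify this in your own file.
BLOCK = re.compile(r"[ \t]*<file_info>.*?</file_info>[ \t]*\r?\n?", re.DOTALL)
NAME = re.compile(r"<name>\s*([^<]+?)\s*</name>")
STALE = re.compile(r"^(h1|l1)_(\d{4}\.\d{2})_S5R4$")


def is_stale(name, cutoff=995.00):
    """True for h1_/l1_ S5R4 data files whose frequency is below the cutoff."""
    match = STALE.match(name)
    return bool(match) and float(match.group(2)) < cutoff


def drop_stale_blocks(state_text, cutoff=995.00):
    """Return (new_text, removed_names) with stale file blocks stripped out."""
    removed = []

    def replace(match):
        block = match.group(0)
        name = NAME.search(block)
        if name and is_stale(name.group(1), cutoff):
            removed.append(name.group(1))
            return ""      # drop the whole block
        return block       # keep every other block untouched

    return BLOCK.sub(replace, state_text), removed
```

The function works on a string and writes nothing to disk, so the result can be compared with the original before anything is replaced. It does not check whether any task still on board refers to a removed file.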

Cheers,
Gary.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 780382641

RAC: 1201549

RE: I imagine the parsing

19 Nov 2009 8:54:00 UTC

Message 95496 in response to message 95495

Quote:

I imagine the parsing error comes from the fact that you seem to have a minus character and a space in front of several of your tags. There should be no such odd characters in front of these tags.

Hint: Don't cut and paste from XML files by opening them in Windows Internet Explorer. The minuses mark where you can collapse tree branches in the IE view; they do not belong to the XML content at all.

Instead, open the file in Notepad or, after opening it in IE, choose "View Source" in the right-click context menu.
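A quick way to see why the stray minus signs matter is to run the text through any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which is not the parser BOINC itself uses, so it only shows that the pasted text is not well-formed XML. The two sample strings are shortened, made-up stand-ins for a real app_info.xml.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return None if the text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Shortened, illustrative stand-ins for a real app_info.xml.
clean = "<app_info>\n<app>\n<name>einstein_S5R6</name>\n</app>\n</app_info>"
pasted_from_ie = "- <app_info>\n- <app>\n<name>einstein_S5R6</name>\n</app>\n</app_info>"

print(parse_error(clean))           # None: the clean text parses
print(parse_error(pasted_from_ie))  # an error message: text precedes the root tag
```

The same check can be applied to the contents of an edited file before restarting BOINC.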

Bikeman

I Cruz

Joined: 5 Dec 05

Posts: 5

Credit: 294478

RAC: 0

RE: RE: First, it gives

20 Nov 2009 2:38:41 UTC

Message 95497 in response to message 95495

Quote:

Quote:
First, it gives me an XML parsing error even though (as shown below) it looks like it should work. Maybe the tags are confusing?

I imagine the parsing error comes from the fact that you seem to have a minus character and a space in front of several of your tags. There should be no such odd characters in front of these tags. You also have at one point a version number of 101, which can't be correct as it's a Linux version number and not a Windows one. I think it should be 305 rather than 101.

Your simplest solution might be to extract a new copy of app_info.xml from the beta test package and just edit the two lines that contain the application name so that they read 'einstein_S5R6' rather than 'einstein_S5R5'. The .... line shouldn't be causing a parsing error. Whether you need it or not depends on which version of BOINC you are using. Since I don't run the CUDA app, I have no knowledge or experience with this. You should reread the discussion about this in both the Linux and Windows CUDA app threads.

Quote:
Second, when I restarted BOINC it downloaded a ton of l1 and h1_xxxx.xx_S5R4 files with xxxx.xx in the 0900 to 0988 range! These are the files I just deleted. I'm lost.

All data files that you have on board are listed in your state file (client_state.xml). If your client has been ignoring directives from the server to delete these files (as Bernd mentioned) it means that not only were the physical files not deleted, but also the entries for these files in your state file were not trimmed as well. In the distant past I have seen such behaviour where old left-over data files were hanging around so your experience is not unique. If you just delete the physical files, your client will communicate to the server that it has entries in the state file but is missing the actual physical files. The server will oblige by resending the missing files - probably all 126 of them in your case :-).

A possible solution is to stop BOINC, delete once again all these old files and then use a text editor like notepad to remove the entries for all of them from your state file. If you are careful just to remove the appropriate

h1_xxxx.xx_S5R4
....
....
....

blocks, one for each file that you are deleting (17 lines of text in total for each block), you will have no further problems. It's a bit tedious to do this but I've done this sort of cleanup quite a few times in the past with no problems. You could probably achieve the same outcome by resetting the project but you will lose all work you currently have on board if you do that.

If you don't have tasks on board or any completed work you are trying to upload or report, your easiest action would be to try resetting to see if that removes all the old entries from your state file. Do browse your state file and make sure you understand what I'm talking about before taking any drastic action.

Thanks for the insights Gary and Bikeman!

I edited the app_info file from a backup I made and I get no parsing errors :-)

I also found the state file and deleted all instances of the "old" files.

I restarted BOINC and when it sent the request to fetch CPU work units it gave me the "Scheduler Request Failed: HTTP Internal Server Error". I'm going to let it run during the night and check tomorrow to see what behavior I get.

Again, thanks for the help!

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5877

Credit: 118658620501

RAC: 19059828

RE: I restarted BOINC and

20 Nov 2009 4:25:43 UTC

Message 95498 in response to message 95497

Quote:

I restarted BOINC and when it sent the request to fetch CPU work units it gave me the "Scheduler Request Failed: HTTP Internal Server Error". I'm going to let it run during the night and check tomorrow to see what behavior I get.

Not long after you posted the above message, your machine started downloading a whole bunch of new tasks. I counted 134, which is a bit surprising since there is supposed to be a 16 tasks per CPU daily limit and 8x16=128. Anyway I was about to suggest (until I looked and saw all the new tasks) that you should force a couple of retries through the 'update' button to see if the server would cooperate. So either you did that or perhaps your machine retried by itself and was certainly very successful at getting new work.

I presume all is back to normal now?

Cheers,
Gary.

I Cruz

Joined: 5 Dec 05

Posts: 5

Credit: 294478

RAC: 0

RE: RE: I restarted BOINC

20 Nov 2009 23:20:13 UTC

Message 95499 in response to message 95498

Quote:

Quote:
I restarted BOINC and when it sent the request to fetch CPU work units it gave me the "Scheduler Request Failed: HTTP Internal Server Error". I'm going to let it run during the night and check tomorrow to see what behavior I get.

Not long after you posted the above message, your machine started downloading a whole bunch of new tasks. I counted 134, which is a bit surprising since there is supposed to be a 16 tasks per CPU daily limit and 8x16=128. Anyway I was about to suggest (until I looked and saw all the new tasks) that you should force a couple of retries through the 'update' button to see if the server would cooperate. So either you did that or perhaps your machine retried by itself and was certainly very successful at getting new work.

I presume all is back to normal now?

Gary et al,

YES! Everything seems to be working fine now :-)

Thanks for all the help!

Julie

Joined: 7 Dec 09

Posts: 166

Credit: 772927

RAC: 0

Hi, I don't get any more WU's

13 Dec 2009 22:13:08 UTC

Message 95500

Hi,
I don't get any more WU's for one of my computers and I don't have any more WU's left to crunch, so in short, I don't get any more jobs from Einstein@home. Can anybody tell me what my problem can be?
Grts
Julie

Big Bang Corollary
