Task running but not progressing

Andre Noel
Joined: 14 Mar 09
Posts: 3
Credit: 7638954
RAC: 0
Topic 194619

Good Morning

I have a series of tasks running continuously but not making any progress.

The task numbers are h1_1035.05_S5R4_192n_S5R6a_m

where n is between 2 and 9 and m is 0 or 1

Most tasks have run for 10 or 11 hours with no progress.

Thanks in advance

Andre

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Task running but not progressing

Did you try suspending/resuming the tasks/BOINC? Does a restart of BOINC or a reboot help?

Are you throttling your CPU via preferences?

The two completed tasks on this host show a very long time between setting up the stacks and reading the checkpoint, mostly about 10 minutes, but this one took almost an hour:

2009-11-07 06:53:11.1406 [debug]: Reading SFTs and setting up stacks ... done
2009-11-07 07:44:44.4531 [debug]: Successfully read checkpoint


On my machine, it takes only about a minute, except when I run other CPU-intensive programs alongside BOINC.

We can't see the stderr output of your currently running tasks, but you could try to look at the file stderr.txt in the appropriate slot directory beneath your BOINC data directory to find those checkpoint times.
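To put a number on the gap between those two [debug] lines, the stderr.txt contents can be scanned with a few lines of Python. This is only a sketch, assuming stderr.txt uses exactly the timestamp format shown in the two lines above; the function name setup_to_checkpoint_gaps is hypothetical and not part of BOINC or Einstein@Home.

```python
from datetime import datetime

# Marker text taken from the two [debug] lines quoted above.
SETUP_DONE = "Reading SFTs and setting up stacks ... done"
CHECKPOINT = "Successfully read checkpoint"


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffff' timestamp of a log line."""
    stamp = " ".join(line.split()[:2])
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def setup_to_checkpoint_gaps(lines):
    """Return the seconds between each 'setting up stacks ... done' line
    and the next 'Successfully read checkpoint' line (hypothetical helper)."""
    gaps = []
    started = None
    for line in lines:
        if SETUP_DONE in line:
            started = parse_timestamp(line)
        elif CHECKPOINT in line and started is not None:
            gaps.append((parse_timestamp(line) - started).total_seconds())
            started = None
    return gaps


if __name__ == "__main__":
    sample = [
        "2009-11-07 06:53:11.1406 [debug]: Reading SFTs and setting up stacks ... done",
        "2009-11-07 07:44:44.4531 [debug]: Successfully read checkpoint",
    ]
    for gap in setup_to_checkpoint_gaps(sample):
        print(f"{gap / 60:.1f} minutes")
```

For the two lines quoted above this gives about 51.6 minutes. To check a real file, open the stderr.txt in a slot directory read-only and pass the file object to setup_to_checkpoint_gaps, since iterating over it yields one line at a time.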

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119044121498
RAC: 24877022

RE: I have a series of task

Quote:

I have a series of tasks running continuously but not making any progress.

The task numbers are h1_1035.05_S5R4_192n_S5R6a_m

where n is between 2 and 9 and m is 0 or 1


What you are quoting are task names, not task numbers. Unfortunately, the name on its own isn't much use. It's much better to quote the hostID (2162771; click 'view computers' on your EAH account page) and better still to give a link to the task list of the particular host that is having the problem. If you follow this link, you will notice that every task listed there has a taskID (a task number, if you like), which is itself a clickable link. If you hover your mouse over any taskID, you will be shown the task name for reference.

The taskID links only contain useful information after a result has been completed, uploaded and reported to the server. All your visible tasks that have been reported are Arecibo Binary Pulsar Search (ABP1) tasks, and none of those show any problems. The tasks shown in green are all Gravity Wave (GW) tasks, and as none of them have completed yet, you are the only person who can actually dig into what might be going on. As Gundolf mentions, examining the stderr.txt file for each stuck task might give you some information. Go to your BOINC Data directory and browse the slots folder you find there. It contains sub-folders named '0', '1', '2', etc., one for each task running simultaneously on your machine. If you browse each sub-folder in turn, it should be pretty obvious which ones contain an in-progress EAH task rather than a task from some other project, because you will recognise the task names in the filenames there. Browse (but don't change) each stderr.txt file with a simple text editor like Notepad and report back what you find.
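The slot-by-slot browsing described above can also be done with a small read-only script. This is a sketch under assumptions: the helper names are hypothetical, and it relies only on what is said above, namely that an EAH slot can be recognised by a filename containing part of the task name (for example "S5R4") and that stderr.txt may or may not be present.

```python
import os


def find_task_slots(slots_dir, name_fragment):
    """Return {slot_name: has_stderr} for every slot sub-folder that holds a
    file whose name contains name_fragment (hypothetical helper)."""
    matches = {}
    for slot in sorted(os.listdir(slots_dir)):
        slot_path = os.path.join(slots_dir, slot)
        if not os.path.isdir(slot_path):
            continue
        names = os.listdir(slot_path)
        if any(name_fragment in name for name in names):
            matches[slot] = "stderr.txt" in names
    return matches


def read_stderr(slot_path):
    """Read a slot's stderr.txt without modifying it (read-only open)."""
    path = os.path.join(slot_path, "stderr.txt")
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()
```

Pass the slots folder inside your BOINC Data directory (the exact location is listed in the first few BOINC startup messages) and a fragment of the task name. Opening the file in read mode only keeps to the "browse but don't change" advice.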

I took the opportunity to look at this host's task list over at Rosetta and could see a few failures there. The messages in those failures are quite indicative of a known BOINC problem that is triggered by setting your CPU use preference (on the website) to anything other than 100%. If you have set this preference lower than 100%, please set it back to 100% on the website and then 'Update' the EAH project on the Projects tab of BOINC Manager. This will communicate the preference change from the server to the client. Then fully stop and restart your BOINC client; once the science app restarts, the crunching should start making normal progress. If it doesn't, your problem must be something else entirely.

Cheers,
Gary.

Andre Noel
Joined: 14 Mar 09
Posts: 3
Credit: 7638954
RAC: 0

Thanks for the replies

Thanks for the replies Gundolf and Gary

I have tried the following actions:
- stopping and restarting BOINC;
- restoring my local preferences to the defaults (please note that I am not aware of having modified my Internet preferences);
- rebooting the PC.

No improvement so far.

I went into the slot directories. For EAH, I did not find stderr.txt in any of them, only stderrgfx files. I did find stderr files for Rosetta and SETI, and both of those projects are working correctly.

All the EAH slots have the same repeating information in their stderrgfx files:
No node found using XPath expression: /project_preferences/graphics/@fps
No node found using XPath expression: /project_preferences/graphics/@quality
No node found using XPath expression: /project_preferences/graphics/@width
No node found using XPath expression: /project_preferences/graphics/@height
No node found using XPath expression: /project_preferences/graphics/@fps
No node found using XPath expression: /project_preferences/graphics/@quality
No node found using XPath expression: /project_preferences/graphics/@width
No node found using XPath expression: /project_preferences/graphics/@height
No node found using XPath expression: /project_preferences/graphics/@fps
No node found using XPath expression: /project_preferences/graphics/@quality
No node found using XPath expression: /project_preferences/graphics/@width
No node found using XPath expression: /project_preferences/graphics/@height
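Each of those messages is an XPath lookup for an attribute (fps, quality, width, height) of a graphics element inside project_preferences that found nothing. On their face they concern graphics settings only, which fits the stderrgfx filename, but that is an inference, not something confirmed here. The sketch below emulates the same four lookups with Python's ElementTree; it is not Einstein@Home code, and the XML snippets are made-up examples.

```python
import xml.etree.ElementTree as ET

# The four attributes named in the XPath expressions above.
GRAPHICS_ATTRS = ("fps", "quality", "width", "height")


def graphics_settings(project_prefs_xml):
    """Emulate /project_preferences/graphics/@<attr> lookups.
    Returns {attr: value or None}; None corresponds to 'No node found'."""
    root = ET.fromstring(project_prefs_xml)
    if root.tag != "project_preferences":
        raise ValueError("expected a <project_preferences> root element")
    graphics = root.find("graphics")
    return {
        attr: graphics.get(attr) if graphics is not None else None
        for attr in GRAPHICS_ATTRS
    }
```

With preferences that contain no graphics element at all, every lookup comes back as None, which matches the pattern of four "No node found" lines repeated over and over.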

Regards
------
Andre

Geoff_D
Joined: 6 Oct 09
Posts: 6
Credit: 23625551
RAC: 0

Hi... I posted this info on

Hi... I posted this info on another thread as well, one that talks about a problem with Einstein S5 WUs... I thought it might be relevant here too.

I have had this problem too. Some additional information that may, or may not, help.

This problem seems to occur on my Intel machine, but not on my AMD machine. That may or may not be relevant.

The Intel machine is a dual quad-core (8 logical CPUs) @ 2.33 GHz, with an nVidia 9400GT and 16 GB of RAM.

The AMD machine is a dual dual-core (4 logical CPUs) @ 2.4 GHz, with an nVidia 210 and 6 GB of RAM.

The machine that has the problem also slowly builds up orphaned conhost processes in Task Manager, and so slowly loses available RAM.

Task Manager also lists quite a few instances of the errant Einstein S5 application, roughly one for each conhost. These seem to use 0 CPU time in Task Manager, but the elapsed time in BOINC Manager keeps climbing.
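The symptom described here (elapsed time climbing while CPU time stays at zero) can be turned into a simple check on two readings of the same task, for example taken an hour apart from BOINC Manager's Tasks tab. This is a hypothetical sketch: the readings are entered by hand, and the 1% threshold is an arbitrary choice, not a BOINC setting.

```python
def looks_stalled(samples, min_cpu_fraction=0.01):
    """samples: (elapsed_seconds, cpu_seconds) readings for one task, oldest
    first. Returns True if elapsed time advanced but CPU time barely moved."""
    if len(samples) < 2:
        return False
    (elapsed_0, cpu_0), (elapsed_1, cpu_1) = samples[0], samples[-1]
    elapsed = elapsed_1 - elapsed_0
    if elapsed <= 0:
        return False
    return (cpu_1 - cpu_0) / elapsed < min_cpu_fraction
```

A task that gained an hour of elapsed time but no CPU time, as reported above, is flagged; one that used most of that hour crunching is not.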

Both machines are running 64-bit Windows 7 Ultimate, release versions, not release candidates or betas.

The part that confuses me is that it only appears on one machine, not the other.

Oh... the Intel machine that has the problems was recently upgraded from the previous version of BOINC to the new 6.10.18 version... I don't know if that is relevant either.

Hope some of that information may help in finding the problem.

Geoff D.

xyzzy
Joined: 12 Jul 09
Posts: 2
Credit: 25168304
RAC: 0

RE: Hi... I posted this

Message 95440 in response to message 95439

Quote:

Hi... I posted this info on another thread as well, one that talks about a problem with Einstein S5 WUs... I thought it might be relevant here too.

Hi,
I have the same problem, on a machine similar to your Intel one. Since I've seen posts about this problem for over a month, my question is: has it been solved yet? Or is it time to shut down Einstein on my machine?
Thanks


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119044121498
RAC: 24877022

RE: RE: Hi... I posted

Message 95441 in response to message 95440

Quote:
Quote:
Hi... I posted this info on another thread as well, one that talks about a problem with Einstein S5 WUs... I thought it might be relevant here too.

Hi,
I have the same problem, on a machine similar to your Intel one. Since I've seen posts about this problem for over a month, my question is: has it been solved yet? Or is it time to shut down Einstein on my machine?
Thanks


When I first saw your message, it looked like you had just quoted the previous message, without adding any content of your own. I was on the point of discarding the message when I noticed two paragraphs starting with "Hi" which made me look closer. To prevent your message being disguised like this, you should avoid placing your text within the quoted text of the previous message. You can always break the single quote into two (or more) quotes by placing extra closing and opening quote tags where you wish and then you can place your responses between each of the separate sub-quotes you have created.

As you mention, if you are seeing the 'zero progress' problem, it has indeed been reported by a number of people over the last month or more. Bernd has already said that he will address this as a matter of urgency very soon. The problem appears to be associated with Win7 and the use of the so-called 'switcher' app, which selects the best science app version to run for your particular CPU. The workaround is to bypass the switcher app and run the appropriate science app directly using BOINC's anonymous platform (AP) mechanism. Bikeman made the suggestion here and I posted a suitable app_info.xml file in the message that immediately followed the linked message.

If you are willing to try, it would be good to see if the workaround solves the problem for you. The first thing you need to do is create a file (on your desktop if you like) called exactly app_info.xml using a text editor like Notepad. You can get the contents you need from the example given in my previous message. The easiest way to get all the text correctly formatted is to start a reply to that message and then copy and paste everything between the opening and closing 'code' tags, but not the tags themselves. When you paste the text into Notepad, you will see the correct formatting. Save it with the file type set to 'All Files' and the name 'app_info.xml', and then you can abandon the reply you started.
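Two easy mistakes at this step are saving the file as app_info.xml.txt (Notepad's default 'Text Documents' type adds .txt) and pasting incomplete text. The sketch below checks only those two things, the exact filename and that the file is well-formed XML with an app_info root element; it does not check that the contents match the files in your project directory. The function name check_app_info is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def check_app_info(path):
    """Return a list of problems found with an app_info.xml file
    (an empty list means none were found by these basic checks)."""
    problems = []
    name = os.path.basename(path)
    if name != "app_info.xml":
        problems.append(f"file is named {name!r}, expected 'app_info.xml'")
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
        return problems
    if root.tag != "app_info":
        problems.append(f"root element is <{root.tag}>, expected <app_info>")
    return problems
```

An empty list only means the file is named correctly and parses; BOINC's startup messages remain the real confirmation that anonymous platform is in use.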

All you now need to do is listed below:-

  * Place a copy of app_info.xml into your E@H project directory.
  * Completely stop BOINC (and confirm with Task Manager).
  * Restart BOINC.

In the BOINC startup messages you should see a message to the effect that app_info.xml was found and that you are using the anonymous platform. If you don't see this, you haven't placed the app_info.xml file in the correct folder.

If you don't know where your E@H project directory is, take a look at any set of BOINC startup messages (BOINC Manager Messages Tab) and in the first few lines of messages BOINC will list the BOINC Data directory. If you explore that directory you will find a folder called 'projects' and if you explore 'projects' you will find a folder called 'einstein.phys.uwm.edu'. That is your E@H project directory (or folder).
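The path described above (BOINC Data directory, then 'projects', then 'einstein.phys.uwm.edu') can be built and checked in a couple of lines. This is a sketch with hypothetical function names; the data directory has to come from your own BOINC startup messages, and the near-miss check simply lists any other file starting with "app_info", such as an accidental app_info.xml.txt.

```python
import os

PROJECT_FOLDER = "einstein.phys.uwm.edu"


def einstein_project_dir(data_dir):
    """Build <BOINC Data directory>/projects/einstein.phys.uwm.edu."""
    return os.path.join(data_dir, "projects", PROJECT_FOLDER)


def app_info_status(data_dir):
    """Return (present, near_misses): whether app_info.xml is in the E@H
    project directory, plus any other names that start with 'app_info'."""
    names = os.listdir(einstein_project_dir(data_dir))
    present = "app_info.xml" in names
    near_misses = sorted(
        n for n in names if n.lower().startswith("app_info") and n != "app_info.xml"
    )
    return present, near_misses
```

If present comes back False while near_misses lists something like app_info.xml.txt, the file has the wrong name, and BOINC will not report anonymous platform at startup.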

When Bernd fixes this problem (my guess is a week or two) you will want to stop using AP and revert to the normal method where new apps are distributed automatically. At that time, you need to set NNT (No New Tasks) in BOINC Manager and allow your cache of work to complete and be reported. Use the 'update' button to immediately report the last tasks as they finish. Then simply stop BOINC, delete app_info.xml and restart BOINC. BOINC will then download the new applications and new work to be processed.
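For those who prefer the command line, the same revert steps can be driven with boinccmd, BOINC's command-line tool. This is a sketch under stated assumptions: boinccmd is assumed to be on your PATH (on Windows it normally sits in the BOINC program folder), and MASTER_URL is an assumed value, so check the exact project URL on the Projects tab. Waiting for the cache to finish and restarting BOINC are still done by hand.

```python
import os
import subprocess

# Assumption: the project URL as registered in BOINC; confirm it on the
# Projects tab of BOINC Manager before using it.
MASTER_URL = "http://einstein.phys.uwm.edu/"


def boinccmd(*args):
    """Build a boinccmd argument list without running it."""
    return ["boinccmd", *args]


STEP_NO_NEW_TASKS = boinccmd("--project", MASTER_URL, "nomorework")  # set NNT
STEP_REPORT = boinccmd("--project", MASTER_URL, "update")  # report finished tasks
STEP_STOP_CLIENT = boinccmd("--quit")  # stop the BOINC client


def run(step):
    """Run one step, raising CalledProcessError if boinccmd fails."""
    subprocess.run(step, check=True)


def remove_app_info(project_dir):
    """Delete app_info.xml (only after BOINC has been stopped).
    Returns True if a file was removed, False if none was there."""
    path = os.path.join(project_dir, "app_info.xml")
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

The intended order follows the instructions above: run STEP_NO_NEW_TASKS, wait until the remaining tasks have completed, run STEP_REPORT, run STEP_STOP_CLIENT, call remove_app_info on the E@H project directory, and then start BOINC again as usual.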

If you have any questions, please ask.

Cheers,
Gary.

Andre Noel
Joined: 14 Mar 09
Posts: 3
Credit: 7638954
RAC: 0

Hi all Everything is back

Message 95442 in response to message 95441

Hi all

Everything is back in shape. It has been working fine for a couple of days.

Thank you

Regards

Andre

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 798588914
RAC: 1199146

As there's a new,

Message 95443 in response to message 95441

As there's now a new, "switcher-less" app for Windows that fixes this problem under Windows 7, please follow Gary's instructions:

Quote:

When Bernd fixes this problem (my guess is a week or two) you will want to stop using AP and revert to the normal method where new apps are distributed automatically. At that time, you need to set NNT (No New Tasks) in BOINC Manager and allow your cache of work to complete and be reported. Use the 'update' button to immediately report the last tasks as they finish. Then simply stop BOINC, delete app_info.xml and restart BOINC. BOINC will then download the new applications and new work to be processed.

CU
HB
