Windows 7 will only process binary pulsar WUs
How did you achieve that? I only know of stopping ABP1 processing.
Computers aren't everything in life. (Just kidding)
By actually aborting them all as they arrive. Take a look back through the results list and you will see that all GW tasks are being aborted as they arrive. Around two days ago, all seemed fine with both types being processed normally. There doesn't appear to be any particular task that gave rise to an error condition. All aborted tasks showed no evidence of CPU time being expended and no useful information in stderr.out.
@cweisel - Currently, on this particular page of your results (this will change as more results are processed), you can see the last of the successfully completed GW tasks and the start of those tasks that you aborted. Whilst all the more recent tasks show 0 CPU time and a 0.00 credit claim, there are 5 tasks which show no CPU time but a partial credit claim, so these must have been the tasks stuck in progress when you decided to abort them. Nothing was returned to the server that gives any indication of what the problem really was, but there may have been something in the slot directories (folders) where those tasks were being processed at the time. BOINC creates a slot directory for each task, and certain diagnostic information is recorded there as processing of the task progresses. The slot directories are sub-directories of the directory called 'slots' in your BOINC Data directory.
If a task appears to be stuck and not making progress, the first thing to do is find its slot directory, look for a text file called 'stderr.out', and see if anything useful is recorded there. Then you could stop and restart BOINC to see if that kick-starts processing. If there is still no action, you could try completely rebooting Windows. At each of these restarts, check whether anything extra is being written to stderr.out, and also take a close look at the 'Messages' tab in BOINC Manager to see if there is anything unusual in the startup messages.
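To make the slot-directory check above quicker on a machine with many cores, a small script could list each numbered slot and print the end of whatever stderr file it holds. This is only a minimal sketch in Python, not part of BOINC: it assumes the default Windows Vista/7 data directory C:\ProgramData\BOINC (pass a different path if BOINC was installed elsewhere), and it looks for both 'stderr.txt' and 'stderr.out' since both names come up in this thread. The function names are hypothetical.

```python
import sys
from pathlib import Path

# The thread mentions both 'stderr.out' and 'stderr.txt', so check for either.
# If both exist in a slot, 'stderr.txt' is reported.
STDERR_NAMES = ("stderr.txt", "stderr.out")

# Assumption: default BOINC data directory on Windows Vista/7.
DEFAULT_DATA_DIR = r"C:\ProgramData\BOINC"


def slot_dirs(boinc_data_dir):
    """Return the numbered slot directories (0, 1, 2, ...) sorted numerically."""
    slots = Path(boinc_data_dir) / "slots"
    if not slots.is_dir():
        return []
    dirs = [p for p in slots.iterdir() if p.is_dir() and p.name.isdigit()]
    return sorted(dirs, key=lambda p: int(p.name))


def tail(path, n=20):
    """Return the last n lines of a text file, replacing undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    return lines[-n:] if n > 0 else []


def report(boinc_data_dir, n=20):
    """Map each slot name to (stderr file name or None, last n lines of it)."""
    result = {}
    for slot in slot_dirs(boinc_data_dir):
        found = next((slot / name for name in STDERR_NAMES
                      if (slot / name).is_file()), None)
        if found is None:
            result[slot.name] = (None, [])
        else:
            result[slot.name] = (found.name, tail(found, n))
    return result


if __name__ == "__main__":
    data_dir = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_DATA_DIR
    for slot, (name, lines) in report(data_dir).items():
        print(f"--- slot {slot}: {name or 'no stderr file found'}")
        for line in lines:
            print("    " + line)
```

Saved under a hypothetical name such as check_slots.py, it could be run as `python check_slots.py`, or with the data directory as an argument if it is not in the default location. The script only reads files; it does not touch the slot directories otherwise.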
Why don't you stop aborting the GW tasks and see whether normal processing is now possible? If a task gets stuck again, do some digging in the slot directories (named 0, 1, 2, 3, ...) and see what you can find. Before aborting any stuck task, report back your findings in case we can think of something else to try.
Cheers,
Gary.
As Gary wrote, it would be especially interesting to see what's in the slot directories. I remember another similar report and there wasn't even a stderr.txt file created, which is very weird.
The h1-xxxx tasks run, ticking off seconds, but do not make any progress. I have let them run for over 24 hours and then aborted them. Any help appreciated.
Did you try a reboot or restart of BOINC?
I didn't find any S5R6 task in your list that wasn't aborted (I didn't check all), so I can't give further advice.
Perhaps you'll find some hints in the thread 'no progress'.
Regards,
Gundolf
I've shifted all messages concerning this problem into a single thread so as to make it easier for us to help you. As mentioned in previous messages we may need you to examine the contents of slot directories. Are you comfortable with doing that or do you need more instructions on how to find these directories (folders)?
If you take a look at your list of tasks on the website, on this particular page (for the moment anyway), you can see a number of GW tasks (h1-xxxx as you call them) that have completed successfully. For example, see task IDs 148668431, 148668438, 148668447, 148747476, 148750364, 148751876, amongst others. The last successful task was returned 25 Nov 2009 12:23:49 UTC. There were also some successful ABP1 tasks, but on 27 Nov 2009 12:33:23 UTC a whole bunch of aborted GW tasks were returned. There are scores of these problem tasks from that time to the current date, with only one further successful GW task, 149210742, returned on 28 Nov 2009 10:27:43 UTC.
The fact that you used to return successful tasks and the fact that you have indeed returned one since seems to suggest that something (perhaps an internet security suite for example) installed and activated on your machine on or after Nov 25 is interfering with the running of BOINC/E@H. Perhaps on Nov 28, that 'something' was temporarily deactivated for long enough to allow a task to run.
Can you remember installing any particular software on Nov 25?
Cheers,
Gary.
Just for information: we have had some (unconfirmed and very localized) reports that one or more of the updates in last Tuesday's Windows Update break certain parts of BOINC, including GPU support. See this Seti thread for example.
The BOINC developers have so far been unable to reproduce the problem.
Hmmm... but the GW tasks are not GPU tasks....
I wonder whether the problem is perhaps more likely to happen on systems with a LOT of cores under Win 7, like the i7s with HT.
So maybe it would be worthwhile to lower the number of cores that BOINC is allowed to use at the same time (not as a solution, but to help diagnose the problem)...maybe there is a deadlock waiting for some resource (shared memory...whatever).
So could you set the number of cores BOINC is allowed to use to (say) 2 or 4 here: http://einstein.phys.uwm.edu/prefs.php?subset=global and see if this helps?
Thanks in advance
Bikeman
Interesting idea and certainly worth testing!
I had a quick look through the top hosts list and found this particular host which is a Core i7 920 running Win7 and doing E@H exclusively, pretty much 24/7 by the look of things. It has a RAC close to 7K. I couldn't see any problems in its rather extensive list of tasks.
Cheers,
Gary.
The following is a task line from BOINC Manager:
Project: Einstein@Home
Application: Hierarchical S5 all-sky GW search #6 3.01
Name: h1_1066.80_S5R4_1651_S5R6a_0
CPU time: 17:30:56
Progress: 0.000%
To completion: 05:07:30
Report deadline: 12/18/09 2:53:17 PM
Status: Running
...
Any suggestions welcome; I would like to continue running Einstein@Home.
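The symptom described in this thread (the time column ticking up while progress stays at 0.000%) can be turned into a simple check. The sketch below is a hypothetical helper in Python, not something BOINC provides: it takes the time and progress values as shown in BOINC Manager and flags a task as suspect once it has run past a threshold with no progress at all. The one-hour default threshold is an arbitrary assumption and may need raising for tasks that legitimately sit at 0% for a while after starting.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' time string, as shown by BOINC Manager, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_stuck(time_shown: str, progress: str, threshold_hours: float = 1.0) -> bool:
    """Heuristic: flag a task that has run past the threshold while still at 0%.

    time_shown      -- the time column, e.g. '17:30:56'
    progress        -- the progress column, e.g. '0.000%'
    threshold_hours -- hypothetical cut-off; 1 hour is an arbitrary default
    """
    percent = float(progress.strip().rstrip("%"))
    return percent == 0.0 and to_seconds(time_shown) >= threshold_hours * 3600


if __name__ == "__main__":
    # Values copied from the task line quoted above.
    print(looks_stuck("17:30:56", "0.000%"))  # True
```

A task flagged this way is a candidate for the slot-directory checks suggested earlier in the thread, before deciding whether to abort it.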