JIMG4956
Joined: 24 Dec 15
Posts: 12
Credit: 10532964
RAC: 4058
29 Feb 2016 20:15:08 UTC
Topic 198475
Blank window titled "BOINC_app" appears at random and won't go a
This problem just started to happen recently with the new "Gravitational Wave Search 01" tasks. The BOINC "Einstein@Home" tasks continue to run OK, but other system functions hang up.
I had a look at your list of computers which you can see through the link on your account page at Einstein. You have three machines, two of which show recent communication with the project. The Core 2 Duo host seems to be crunching CPU tasks quite satisfactorily so I presume you may be referring to your AMD machine which is crunching both CPU and GPU tasks, the latter on the internal GPU with some CPU support. Please advise if this is not the machine having problems.
I don't have any machines like this so I have no direct experience but it seems highly likely that the misbehaviour you are seeing is due to the machine being overstressed by the workload it's trying to deal with. If you look at that last link you will see tasks from all the different science runs here at Einstein. Sure, the new GW run has just started and those tasks do take quite a long time but I don't think that's the real problem. The real problem is likely to be that you are crunching GPU tasks whilst simultaneously crunching CPU tasks on all available CPU cores.
Can you confirm, when you look at BOINC Manager's advanced view, that you can see a total of 5 tasks (4 CPU tasks and 1 GPU task) running simultaneously? If you have done nothing to prevent this, that should be the case. If so, could you please go to the Tools->Computing preferences menu item (unless it's on a different menu in your version of BOINC Manager). In the window that opens, select the processor usage tab and, near the bottom of the page, change the setting for % of processors that BOINC is allowed to use from 100% to 50%. Click OK to make the change.
Go back to the tasks tab in the main window and you should see 2 CPU tasks and 1 GPU task running (3 tasks instead of 5) with the other 2 marked as waiting to run. This should allow GPU tasks to run a lot faster than they currently do and the machine should be a lot happier with the reduced load.
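For anyone wondering how the percentage maps to the number of running CPU tasks, here is a minimal sketch of the arithmetic. It assumes the client rounds the allowed core count down and never drops below one core; the function name `usable_cpus` is made up for illustration and is not part of BOINC. The GPU task runs in addition to these CPU tasks, as described above.

```python
import math


def usable_cpus(total_cores: int, percent_allowed: float) -> int:
    """Cores available for CPU tasks, assuming round-down with a minimum of 1."""
    return max(1, math.floor(total_cores * percent_allowed / 100))


# A 4-core machine at the settings discussed in this thread
for pct in (100, 75, 50):
    print(f"{pct}% of 4 cores -> {usable_cpus(4, pct)} CPU tasks")
```

On a 4-core machine this gives 4, 3 and 2 CPU tasks for 100%, 75% and 50% respectively, which is where the "3 tasks instead of 5" figure comes from once the single GPU task is added.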
Another good thing to do would be to change your preferences on the website to perhaps limit the number of science runs (sub-projects here if you like) in which you participate. Sometimes different runs can conflict with each other and affect crunching efficiency. If you are not sure which science runs to choose, have a look at the first post in this thread for information about the different science runs and for how to select the ones you want to contribute to. If you have questions, please feel free to ask them here.
As you continue to monitor this machine, you should see the big differences between CPU time and elapsed time on CPU tasks become a lot smaller. Both (but particularly the elapsed) may drop significantly. The GPU tasks will still have a much bigger elapsed time but that will be a lot smaller than it currently is. The CPU component of the overall time is not large at the moment and should stay that way on future GPU tasks. Here are some actual examples of the figures I'm talking about taken from your tasks list on that machine.
[pre]
Science Run    Task Type            Elapsed Time (s)  CPU Time (s)  Comment
-------------  -------------------  ----------------  ------------  ------------------------------------
O1AS20-100T    CPU only                   224,038.78    177,694.70  Big difference between the two times
FGRPB1         CPU only                   165,674.23    131,389.80  Big difference between the two times
BRP6 (Parkes)  GPU (+ CPU support)        245,641.81      5,067.24  Elapsed time is far too long
BRP4G-Arecibo  GPU (+ CPU support)         81,179.70      3,649.73  Elapsed time is far too long
[/pre]
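One way to put a number on "big difference" is the fraction of elapsed time that was actual CPU time. The sketch below uses the figures from the table above and assumes both columns are in seconds; `cpu_fraction` is just an illustrative helper, not anything BOINC provides.

```python
# (science run, elapsed time, CPU time), copied from the table above; seconds assumed
tasks = [
    ("O1AS20-100T", 224038.78, 177694.70),
    ("FGRPB1", 165674.23, 131389.80),
    ("BRP6 (Parkes)", 245641.81, 5067.24),
    ("BRP4G-Arecibo", 81179.70, 3649.73),
]


def cpu_fraction(elapsed: float, cpu: float) -> float:
    """Share of the elapsed (wall clock) time that the task spent on the CPU."""
    return cpu / elapsed


for name, elapsed, cpu in tasks:
    print(f"{name:15s} CPU time is {100 * cpu_fraction(elapsed, cpu):5.1f}% of elapsed time")
```

For a CPU-only task on a core that is not being shared, this fraction should be close to 100%. For GPU tasks the fraction is always small, so there the figure to watch is the elapsed time itself.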
Hopefully you will see an improvement from these changes. It may be possible to make further adjustments to get optimum output if you wish.
Gary, Thanks VERY much for your reply. It contained a lot of good info. Sorry, I should have been more specific in my original post.
It turns out the machine on which this error is occurring is the Intel duo core machine running Windows 7. This is a "new-(refurbished)" machine which replaced the other machine running Windows XP which is now sitting unplugged in the corner "JUST IN CASE..."
The other is the AMD machine which is a laptop with Windows 8.1 which has been running without any problems. Yes, I can see that this one is running 5 tasks at 100% CPU and GPU, but until it has any problems (or burns up) I will probably leave it that way. (I do have a temperature monitor running called "CPUID HWMonitor" which I check every so often - so far no overheating.)
Both machines have been running the new Gravitational Wave tasks for a while now, so my suspicion is that it is probably related to the Windows 7 machine and not the tasks. I think I will try reducing the load on the Intel machine as you suggested if the problem continues. It's just a pain, because I had to do a RESTART which took forever.
I am only running projects from Einstein@Home so there shouldn't be any conflicts.
As far as task time to completion, duration, elapsed time, CPU time etc. - is this a consideration that I should be trying to adjust for project reasons and perhaps validation time with my task partner? Or is it more for my potential problems and load on the computers?
Quote:
... the machine on which this error is occurring is the Intel duo core machine running Windows 7.
OK. I looked again at the full list of tasks for that machine and see crunching times that look entirely normal for the power of that CPU. Note how close CPU and elapsed times for each task are. There is an overall consistency of the task to task times for each of the science runs that seems to indicate the machine is coping well with the work load and there is nothing untoward going on. You could think about adjusting your project preferences on the website to limit that machine to just one CPU science run but I don't really expect that to solve the problem.
Quote:
The other is the AMD machine which is a Laptop with Windows 8.1 which has been running without any problems.
I'm glad it's not that one with the problem you mentioned but it is working with a somewhat degraded performance. It really should benefit from better provision of CPU support for the GPU tasks.
It's a very quick change to make and it should have visible benefits as new tasks complete. When I imagined this machine was the one with the problem, I suggested going from 100% of cores to 50% of cores to take a big load off the machine. You should only need one 'free' core to give support to the GPU so at least give 100% -> 75% a try. This will allow 3 CPU tasks and 1 GPU task to run concurrently and I think you will see a nice improvement in overall output. You may even see a temperature increase :-). If I'm wrong, you can easily go back to 100%.
Quote:
... It's just a pain, because I had to do a RESTART which took forever.
I'm not sure why you needed to restart anything. Neither of the changes I mentioned (local changes through the preferences menu item in BOINC Manager or global changes made on the website) require a restart, either of the machine itself or of BOINC. Website changes will take time to propagate but even this can be avoided by using the 'update' function on the tasks tab of BOINC Manager - advanced view, which forces immediate contact and transfer of the changes made.
Quote:
I am only running projects from Einstein@Home so there shouldn't be any conflicts.
I was not talking about any conflicts between different projects. I was talking about different science runs (sub-projects if you like) within the one project. I'm not aware of any particular interference between the two CPU only science runs. I am aware of a number of people who have mentioned conflicts between the GPU runs, BRP4G and BRP6, more likely when trying to run multiple concurrent GPU tasks. Because of the transient nature of BRP4G work, you might have a better experience by just allowing BRP6. I'm not at all suggesting you should even consider running concurrent GPU tasks. First, let's see how well your GPU can perform with some extra CPU support.
Quote:
As far as task time to completion, duration, elapsed time, CPU time etc. - is this a consideration that I should be trying to adjust for project reasons and perhaps validation time with my task partner? Or is it more for my potential problems and load on the computers?
It was nothing to do with the project or your quorum partners. It was to do with making the machine I saw as 'struggling' work more efficiently. That's not the main reason either. There are many people that browse from time to time without posting or asking specific questions. There is quite a bit of interest in the performance of GPU apps. I chose to use your machine (when it presented itself like that) as a good example of how to improve the performance of everyday low to medium range AMD GPUs. If you make the change suggested on that machine and it works out as I think it might, I would produce another table of results and then say, "Look people, this is what you can achieve if you give your AMD GPU a bit more CPU support".
Of course, it might not work out that way and I might come away with egg on my face. I'll risk that just for the chance to have a good, real world example of the potential improvement. If it doesn't work, it's very easy to reverse, so you won't really suffer from being a 'guinea pig' :-).
Gary,
I made the change from 100% to 75% "of the CPUs" and did an update as you suggested at about 9pm Arizona time. The number of running BOINC tasks went from 5 to 4 as anticipated, with the 5th one waiting.
The Windows 8.1 Task Manager now shows an average of about 81% to 85% of the total CPU being used, which previously was a steady 100% (unless suspended or superseded). I now have three processes each using about 25% of the CPU. I previously had four processes, each using about 21%.
It doesn't look like the GPU task is taking advantage of the additional CPU power now available to it, because it jumps from about 0.7 % to a max of about 2.9% CPU usage.
I'm just reporting the info as I see it. I'll leave it up to you to interpret the data that you were looking at to see what impact the changes make to the overall scheme of things.
BTW, the RESTART was because of the original problem, and not related to any BOINC parameter changes.
And thanks for the clarification on the difference between "projects" and "sub-projects". I have no idea what they all are, so I think I'll just leave things as they are.
Thanks for all your help.
Quote:
I made the change from 100% to 75% "of the CPUs" and did an update as you suggested at about 9pm Arizona time. The number of running BOINC tasks went from 5 to 4 as anticipated, with the 5th one waiting.
Thanks very much for doing that.
Quote:
It doesn't look like the GPU task is taking advantage of the additional CPU power now available to it, because it jumps from about 0.7 % to a max of about 2.9% CPU usage.
A GPU task never does end up using a large amount of CPU resources. It seems like it's more important to have 'instant availability when needed'. When next you get to read this, it would be good to know the current values showing on the tasks tab in BOINC Manager for both 'Progress %' and 'Elapsed time'. Those two values now and, say, an hour or two later should give a bit of an indication of how much acceleration has been achieved. Either way, we'll find out when the task finishes. The true figure will only be known when the task after that (done entirely under the new settings) is returned. I'll try to look in the morning (my morning) which will be well into your afternoon. If I'm lucky enough, I may even get to see something before I go to bed tonight which will be your morning.
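To show how two such readings could be turned into a rough estimate, here is a small sketch. It assumes progress is roughly linear between the two readings, which may not hold for every app; the function name and the sample numbers are made up for illustration and are not taken from any task in this thread.

```python
def estimate_remaining(first, second):
    """Each reading is (progress_percent, elapsed_seconds).

    Returns (progress rate in % per hour, estimated seconds still to run),
    assuming progress stays linear after the second reading.
    """
    (p1, t1), (p2, t2) = first, second
    if t2 <= t1 or p2 <= p1:
        raise ValueError("second reading must be later and show more progress")
    rate_per_second = (p2 - p1) / (t2 - t1)
    remaining_seconds = (100.0 - p2) / rate_per_second
    return rate_per_second * 3600, remaining_seconds


# Hypothetical readings taken two hours apart
rate, remaining = estimate_remaining((40.0, 100_000), (45.0, 107_200))
print(f"about {rate:.2f}% per hour, roughly {remaining / 3600:.1f} hours to go")
```

Comparing the rate from the new readings against the average rate before the change (progress % divided by elapsed time at the first reading) gives an early hint of the speed up, well before the task finishes.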
Quote:
BTW, the RESTART was because of the original problem, and not related to any BOINC parameter changes.
No problem - I sorta figured that out by thinking a bit more after I posted :-).
Quote:
It doesn't look like the GPU task is taking advantage of the additional CPU power now available to it, because it jumps from about 0.7 % to a max of about 2.9% CPU usage.
It is not the increased CPU usage for the GPU-task that makes the difference. It is the GPU being able to work faster because of the CPU support. This matter is a little confusing sometimes.
Anyway, you'll see if there is any speed up. Usually the effect is quite nice.
edit: Gary was faster than me :( Probably his CPU support is better^^
I've just had a fresh look at the BRP4G tasks and as luck would have it, there is a newly returned task as at 2 Mar 2016, 9:06:07 UTC which is only about 10-15 mins ago. It shows a significantly improved crunch time compared to two earlier ones.
This task would have been crunched at least partly under the former settings so a further drop in crunch time is expected for the next task.
EDIT: The next GPU task should be the one that is currently showing as 'in progress' (green) on this list. It might take quite a while to be returned (judging by the previous results) but at least it shouldn't be anything like close to 3 days :-).
Quote:
edit: Gary was faster than me :( Probably his CPU support is better^^
Thanks very much for being willing to help.
Usually it's the other way around. I'm too verbose and take too much time trying to make things clear. Others will often beat me to it. An occasional 'win' is quite nice :-).
Quote:
... The next GPU task should be the one that is currently showing as 'in progress' (green) on this list. It might take quite a while ...
And so it did :-). However if you use the above link, you can see that 3rd task (the top one on the list) is no longer green but completed and validated like the previous two. Notice the considerable reduction in elapsed time.
You'll need to wait longer to see the full impact on CPU task crunch times. The ones that have been returned recently were done partly under the previous settings. You will see one of these (a GW task) on the top of this list of O1AST tasks. The elapsed time is quite a bit lower even though only some of the crunching was done under current settings.