They are listed as "A" and "B" because that is the actual bus ID in hex, not something random or a result of the name field being too long. Your system just happens to use 0A and 0B as the hex bus IDs. You can see the bus ID listed in the output of your second command, and it is exactly the same as in the normal nvidia-smi output (i.e., 00000000:0A:00.0). The 535-series drivers actually expanded the width of the output so you can see the full device name rather than having it cut off with "(...)".
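The hex-to-decimal relationship is easy to confirm (a quick illustration only, not part of any driver tooling):

```python
# The bus field of PCI address 00000000:0A:00.0 is hexadecimal:
bus_field = "00000000:0A:00.0".split(":")[1]
print(int(bus_field, 16))  # 10 -- so bus "0A" is simply bus 10 in decimal
```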
I had not realized that so that's more good to know stuff... again. Thank you.
Ian&Steve C. wrote:
GPU-Util = GPU utilization (not utility): how much the GPU is being used, expressed as a percentage.
Oh... I knew that... just had a brain fart! Utilization! Got it! ;*)
How (and why) are 5 tasks running? Please state precisely how you achieve this. You used to run 6 and the instructions I gave were to run 4 (x2 on each GPU).
Allen wrote:
I am thinking of cutting tasks to 4 and see what happens.
And how exactly do you propose to do that? For how long have you been running 5? The only way I can think of for 5 to run (without manual intervention) is if you actually have gpu_usage set for 6 but BOINC is restricted to a number of CPU cores that can only support 5. What % of cores is BOINC allowed to use?
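The guess above can be sketched as a toy calculation. This is purely illustrative — the function name and the exact rule are assumptions for this thread, not BOINC's real scheduler logic — but it shows how a tight core allowance could cap GPU tasks below the gpu_usage slot count:

```python
import math

# Illustrative only: BOINC reserves cpu_usage of a core per GPU task, so a
# restricted core budget can support fewer tasks than the GPU slots allow.
def runnable_gpu_tasks(gpu_count, gpu_usage, cores_allowed, cpu_usage):
    gpu_slots = math.floor(gpu_count / gpu_usage + 1e-9)   # e.g. 2 / 0.33 -> 6
    cpu_slots = math.floor(cores_allowed / cpu_usage)      # CPU-budget limit
    return min(gpu_slots, cpu_slots)

# gpu_usage 0.33 on 2 GPUs asks for 6 slots, but a budget of only 1.25 cores
# at 0.25 CPUs per task supports just 5 of them:
print(runnable_gpu_tasks(2, 0.33, 1.25, 0.25))  # 5
```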
Allen wrote:
The two 560's machine is in fact running 4 total tasks at this time. I am using local prefs and I do have an app_config file for it. Times are still in the 50's.
Can you please post a copy of that app_config.xml file? You are clearly doing something incorrectly if the time hasn't dropped for 4 concurrent tasks when you were already getting times "in the 50's" running 6.
When did you last change it and what was the change you made?
Each time you make a change to that file, do you force a "Options->Read config files" through BOINC Manager?
Do you look in the event log to see the response indicating that the change actually happened? You should see "Found app_config.xml".
If you are indeed correctly editing the file (gpu_usage should show 0.5 and not 0.33), exactly where did you install it? It should be in the einstein project folder.
Something is wrong but it's not having a bad effect since the 'in progress' tasks are getting completed the best part of a day before deadline now so you are still winning the race.
All completed tasks continue to show ~55 min run times. The most recent task was nearly 2 days ahead of its deadline so things are continuing to improve. I'm still wondering why you sometimes mention 5 tasks running and then 4. The crunch times would seem to indicate 6, so I'm now wondering if you are seeing a variable number because the complete list of tasks isn't in ascending order of deadline, and there might be tasks running on a second page that you're not seeing.
Please click the "Show active tasks" button on the tasks tab and you will see the short list of running tasks. How many show as active? Just click the button again to show all.
When I spoke of the 5 tasks running, that was in reference to my 580 machine, not the dual 560's.
I am on my laptop right now and it's not easy to get to my sys file from here, but I do all the things you asked about. Everything shows up in the log.
All is running quite well now and I will soon be going down stairs to see how all the other machines are doing. If I find something that might interest you, I will write you back.
When I spoke of the 5 tasks running, that was in reference to my 580 machine, not the dual 560's.
OK, absolutely my bad. I've got the website tasks list for the dual 560s permanently open and I'm focused on that machine and trying to work out why tasks are pretty rock solid at 55 min when that was the exact same as it used to be for x3 on each GPU. So when I saw references to x4 and x5 I completely missed the fact you were talking about a different machine. Sorry about that. If you don't mind, let's try to solve the mysteries for the dual 560s machine. Once that is solved, you'll be all 'skilled up' and able to tweak the others for their best performance.
The fact is that task times on the dual 560s are still 55 mins and that is far too long if each GPU is only running x2. Did you absolutely confirm only 4 tasks running using the 'Show active tasks' control? Are you intending to paste a copy of app_config.xml in your next message? Is that file the "sys file" you refer to? If not, what is the name of the sys file and why do you need to "get to my sys file"?
Things indeed are improving deadline-wise. The most recently returned result that I just saw is getting on towards 2 days ahead of deadline.
The thing that bothers me the most is this quote from your second message in this thread:-
"I'm running 3 WUs on each of them and timing is about every 50 minutes when nothing fails"
I looked at your tasks list when I first read that and saw some quite variable times with the lowest around 3300 secs (55mins). However there were quite a few higher values as well - a clear sign of a machine under stress. From my own single GPU hosts with RX 460 GPUs, running x2 gives maybe 34 mins so for x3 it should translate to 34 x 1.5 = 51 mins. Some of your tasks at the 55 min mark were close to the expected speed. The others (I saw some up to 4800 secs) were clearly in distress.
With all your current tasks showing ~3300 secs (55 mins) the real mystery is why that hasn't reduced to 2200 secs (37 mins) with the reduction to x2. Either you were really only running x2 when you started this thread and you just have extremely slow 560s for some reason, or (much more likely) there is some specific thing to find that is causing very sub-optimal behaviour. Are you sure the GPUs aren't running at some much reduced clock rate?
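The scaling arithmetic above (34 min per task at x2 implying ~51 min at x3, and 55 min at x2 implying ~37 min should be possible) amounts to a simple saturation model — a back-of-envelope sketch, not anything BOINC itself computes:

```python
# Back-of-envelope model: once the GPU is saturated, per-task run time grows
# roughly linearly with multiplicity, so t(xN) ~ (t(x2) / 2) * N.
def expected_minutes(t_at_x2: float, multiplicity: int) -> float:
    return (t_at_x2 / 2) * multiplicity

print(expected_minutes(34, 3))  # RX 460 example from the thread -> 51.0
print(expected_minutes(34, 2))  # -> 34.0
```

On this model, rock-solid 55 min times at both x3 and x2 are exactly the anomaly being discussed: dropping the multiplicity should have dropped the per-task time.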
Everything here points to the fact that I am running 4 total tasks for the 2 560's.
Here is an output from one of the recent tasks near completion.
Computer: Alpha-8
Project Einstein@Home
Name LATeah4021L30_1196.0_0_0.0_14299131_0
Application Gamma-ray pulsar binary search #1 on GPUs 1.22 (FGRPopencl1K-ati)
Workunit name LATeah4021L30_1196.0_0_0.0_14299131
State Running High P.
Received 8/29/2023 6:25:37 AM
Report deadline 9/12/2023 6:25:35 AM
Estimated app speed 101.51 GFLOPs/sec
Estimated task size 525,000 GFLOPs
Resources 0.25 CPUs + 0.5 AMD/ATI GPUs (device 0)
CPU time at last checkpoint 00:03:21
CPU time 00:03:25
Elapsed time 00:54:48
Estimated time remaining 00:05:38
Fraction done 89.791%
Virtual memory size 186.05 MB
Working set size 163.23 MB
Directory slots/3
Process ID 5992
Allen wrote: George, Does [...]
No, it only works with Nvidia cards. It's a tool packaged with the Nvidia driver.
Thanks Ian&Steve C., that's what I thought. *:>(
Hi Gary,
And here is my app_config.xml file, from my Einstein directory:
9/7/2023
<app_config>
<app>
<name>einsteinbinary_BRP7</name>
<gpu_versions>
<gpu_usage>0.33</gpu_usage>
<cpu_usage>.20</cpu_usage>
</gpu_versions>
</app>
<app>
<name>hsgamma_FGRPB1G</name>
<gpu_versions>
<gpu_usage>0.50</gpu_usage>
<cpu_usage>.25</cpu_usage>
</gpu_versions>
</app>
<app>
<name>einsteinbinary_BRP4G</name>
<non_cpu_intensive>0</non_cpu_intensive>
</app>
</app_config>
Too bad we're on opposite sides of the World. Makes for fewer communications.
Hope I have answered your questions for now.
Allen
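A quick way to sanity-check what an app_config.xml like the one above actually requests is to parse it and report tasks per GPU (which is just 1 / gpu_usage). A minimal sketch using Python's standard library, with a trimmed copy of the file inlined for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the FGRPB1G entry from the file above, for illustration.
APP_CONFIG = """<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>"""

root = ET.fromstring(APP_CONFIG)
for app in root.findall("app"):
    gpu_usage = float(app.findtext("gpu_versions/gpu_usage"))
    # gpu_usage 0.50 means each task claims half a GPU -> 2 tasks per GPU.
    print(app.findtext("name"), "->", round(1 / gpu_usage), "tasks per GPU")
```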
Gary,
Just in case it's important and you may see something odd, here is a recent log created after restarting BOINC. Tasks are still showing 55:15 each.
9/10/2023 2:05:29 PM | | cc_config.xml not found - using defaults
9/10/2023 2:05:29 PM | | Starting BOINC client version 7.22.2 for windows_x86_64
9/10/2023 2:05:29 PM | | log flags: file_xfer, sched_ops, task
9/10/2023 2:05:29 PM | | Libraries: libcurl/8.0.1-DEV Schannel zlib/1.2.13
9/10/2023 2:05:29 PM | | Data directory: C:\ProgramData\BOINC
9/10/2023 2:05:29 PM | | Running under account Allen
9/10/2023 2:05:30 PM | | OpenCL: AMD/ATI GPU 0: Radeon RX 560 Series (driver version 2906.10, device version OpenCL 2.0 AMD-APP (2906.10), 4096MB, 4096MB available, 2392 GFLOPS peak)
9/10/2023 2:05:30 PM | | OpenCL: AMD/ATI GPU 1: Radeon RX 560 Series (driver version 2906.10, device version OpenCL 2.0 AMD-APP (2906.10), 4096MB, 4096MB available, 438 GFLOPS peak)
9/10/2023 2:05:30 PM | | Windows processor group 0: 8 processors
9/10/2023 2:05:30 PM | | Host name: Alpha-8
9/10/2023 2:05:30 PM | | Processor: 8 AuthenticAMD AMD FX(tm)-8300 Eight-Core Processor [Family 21 Model 2 Stepping 0]
9/10/2023 2:05:30 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rdtscp bmi1
9/10/2023 2:05:30 PM | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
9/10/2023 2:05:30 PM | | Memory: 7.97 GB physical, 15.93 GB virtual
9/10/2023 2:05:30 PM | | Disk: 931.29 GB total, 778.33 GB free
9/10/2023 2:05:30 PM | | Local time is UTC -4 hours
9/10/2023 2:05:30 PM | | No WSL found.
9/10/2023 2:05:30 PM | Einstein@Home | Found app_config.xml
9/10/2023 2:05:30 PM | Milkyway@Home | Found app_config.xml
9/10/2023 2:05:30 PM | Milkyway@Home | milkyway_nbody: Max 1 concurrent jobs
9/10/2023 2:05:30 PM | Einstein@Home | General prefs: from Einstein@Home (last modified ---)
9/10/2023 2:05:30 PM | Einstein@Home | Computer location: work
9/10/2023 2:05:30 PM | | General prefs: using separate prefs for work
9/10/2023 2:05:30 PM | | Reading preferences override file
9/10/2023 2:05:30 PM | | Preferences:
9/10/2023 2:05:30 PM | | - When computer is in use
9/10/2023 2:05:30 PM | | - 'In use' means mouse/keyboard input in last 3.0 minutes
9/10/2023 2:05:30 PM | | - max CPUs used: 8
9/10/2023 2:05:30 PM | | - Use at most 100% of the CPU time
9/10/2023 2:05:30 PM | | - suspend if non-BOINC CPU load exceeds 80%
9/10/2023 2:05:30 PM | | - max memory usage: 5.58 GB
9/10/2023 2:05:30 PM | | - When computer is not in use
9/10/2023 2:05:30 PM | | - max CPUs used: 8
9/10/2023 2:05:30 PM | | - Use at most 100% of the CPU time
9/10/2023 2:05:30 PM | | - suspend if non-BOINC CPU load exceeds 80%
9/10/2023 2:05:30 PM | | - max memory usage: 7.97 GB
9/10/2023 2:05:30 PM | | - Suspend if running on batteries
9/10/2023 2:05:30 PM | | - Leave apps in memory if not running
9/10/2023 2:05:30 PM | | - Store at least 2.00 days of work
9/10/2023 2:05:30 PM | | - Store up to an additional 0.00 days of work
9/10/2023 2:05:30 PM | | - max disk usage: 30.00 GB
9/10/2023 2:05:30 PM | | - (to change preferences, visit a project web site or select Preferences in the Manager)
9/10/2023 2:05:30 PM | | Setting up project and slot directories
9/10/2023 2:05:30 PM | | Checking active tasks
9/10/2023 2:05:30 PM | Asteroids@home | URL http://asteroidsathome.net/boinc/; Computer ID 677129; resource share 100
9/10/2023 2:05:30 PM | Einstein@Home | URL https://einstein.phys.uwm.edu/; Computer ID 12883914; resource share 100
9/10/2023 2:05:30 PM | Einstein@Home | Not using CPU: project preferences
9/10/2023 2:05:30 PM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 873674; resource share 100
9/10/2023 2:05:30 PM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8411641; resource share 100
9/10/2023 2:05:30 PM | Universe@Home | URL https://universeathome.pl/universe/; Computer ID 594819; resource share 100
9/10/2023 2:05:30 PM | | Setting up GUI RPC socket
9/10/2023 2:05:30 PM | | Checking presence of 1817 project files
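One oddity in the log above is the mismatched peak GFLOPS reported for two identical GPUs (2392 vs 438). A quick way to pull those figures out of a saved event log — a sketch that assumes BOINC's usual "OpenCL: ... GPU n: ... GFLOPS peak" line format:

```python
import re

# Extract "GPU <n>: ... <peak> GFLOPS peak" figures from BOINC startup lines.
log_lines = [
    "OpenCL: AMD/ATI GPU 0: Radeon RX 560 Series (... 4096MB available, 2392 GFLOPS peak)",
    "OpenCL: AMD/ATI GPU 1: Radeon RX 560 Series (... 4096MB available, 438 GFLOPS peak)",
]
peaks = {}
for line in log_lines:
    m = re.search(r"GPU (\d+):.*?(\d+) GFLOPS peak", line)
    if m:
        peaks[int(m.group(1))] = int(m.group(2))
print(peaks)  # {0: 2392, 1: 438} -- same card model, very different reported peak
```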
Allen wrote:
Everything here points to the fact that I am running 4 total tasks for the 2 560's.
Except for the run times, which are strangely longer than they should be. I have the same model of GPUs and, seemingly, they crunch much faster than yours. There has to be a reason.
Thanks for sending both the app_config.xml and the task properties for a running task as produced by the manager. The former looks OK for FGRPB1G, and there are also entries for BRP7 and BRP4G, which you are not currently running. When FGRPB1G finishes, the logical replacement is BRP7, but have you run it previously, and are you sure that running x3 with it (gpu_usage 0.33) gives an improvement over running x2? My experience is that going from x1 to x2 gives a relatively small improvement and that going to x3 doesn't add any further benefit. It's risky to add complexity if there is no additional benefit.
The other info you sent (task properties output) shows clearly 0.5 of the GPU being used so the reason for the slow times lies elsewhere. It also shows that BOINC is running the task in high priority mode (State Running High P.). That's one step short of panic mode :-).
I've just looked again at the latest completed tasks. Previously, all were very close to 3300s (55m) and all have been immediately validating. The very latest tasks (as I write this) are starting to show better times - take a look on the website for yourself. They now show 3000s (50m) or better. The fastest one I saw was 2698s (45m). This suggests that something that was impacting GPU performance has now lessened.
Have you changed anything that might have been responsible for this? If not, maybe it's just a result of tasks not being run at higher priority quite as much, although that's just grasping at straws. I've never noticed this sort of behaviour before, although (for my own sanity) I don't try to push for every last bit of performance so tend to always use settings that don't cause issues.
That's not bad - I think it's very good :-). It takes quite a while to compose responses. If you could fire back an instant reply, I'd never get any of my own stuff done :-). Just kidding :-).
If you haven't checked recently, what started out as 1000+ tasks in progress with auto-aborts happening has now reduced to ~500 and I haven't seen any tasks being canceled. There is a nice cushion between completion time and the deadline. You're doing a good job.
Cheers,
Gary.
Allen wrote:
Just in case it's important and you may see something odd, here is a recent log created after restarting BOINC.
The startup messages show the host is attached to several other projects and has app_config.xml files for some of those as well. I only run Einstein, so I don't know if any of that could be having an impact.
You stated in an earlier message that the only tasks that were running were Einstein tasks, so the other projects being listed shouldn't be having any impact. If you ever start accepting work from other projects (even non-GPU work) you should do so slowly and carefully until you are sure there are no adverse side effects. That means don't try to run lots of tasks at some level of multiplicity until you are sure that it's safe to do so.
Since you're running Windows 7 (I know nothing about Windows), are graphics drivers and OpenCL libraries still being maintained for it? In Linux there doesn't seem to be any issue with older versions of OpenCL, but I don't know if that applies to Windows as well.
One other thing I've tried is to see if the GPU that crunches a task can be easily identified. By clicking the TaskID link for any task of interest, and scrolling down through the stderr output, you can find the actual command string used to launch the task. The task with the shortest crunch time (2698s) mentioned in the previous message had --device 1 as a command line option. That is the second GPU - the device with the low GFLOPS value you originally asked about. I also looked at this task - a --device 0 task that took 3312s (and had the higher GFLOPS value) so you can see that GFLOPS is being misreported for some reason and is not having a negative impact on crunch times. In other words, you can't claim that a higher GFLOPS correlates with faster crunching :-).
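The stderr check described above is easy to script if you copy the stderr text from the task page. A minimal sketch — `device_from_stderr` is a hypothetical helper, and the only assumption is that the launch command contains the "--device N" option as shown on the task pages:

```python
import re

# Find which GPU ran a task by locating "--device N" in its stderr output.
def device_from_stderr(stderr_text: str):
    m = re.search(r"--device\s+(\d+)", stderr_text)
    return int(m.group(1)) if m else None

print(device_from_stderr("... launch command: ... --device 1 ..."))  # 1
```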
Cheers,
Gary.