Problem with info posted by Event Log

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4045

Credit: 48047952933

RAC: 34960806

8 Sep 2023 19:25:44 UTC

Message 216733 in response to message 216732

Allen wrote:

George,

Does this work with the ATI cards too? I only have Linux on one machine and it has an RX580 in it.

Nice bunch of info. Btw, I am reducing the number of tasks on this machine to see what happens with the timing, thanks!

Allen

No, it only works with Nvidia cards. It's a tool packaged with the Nvidia driver.

_________________________________________________________________________

Allen

Joined: 23 Jan 06

Posts: 75

Credit: 690010199

RAC: 1033958

8 Sep 2023 19:37:05 UTC

Message 216735

Thanks IAN&STEVE C, that's what I thought. *:>(

GWGeorge007

Joined: 8 Jan 18

Posts: 3118

Credit: 5010063412

RAC: 1630385

9 Sep 2023 0:53:32 UTC

Message 216750 in response to message 216730

Ian&Steve C. wrote:

they are listed as "A" and "B" because that is the actual busid in hex, not just random or because the name field is too long. your system just happens to use 0A and 0B as the hex busids. you can see the bus id listed in your second command which is exactly the same as the normal nvidia-smi command. (ie, 00000000:0A:00.0). the 535 series drivers actually expanded the width of the output so you can see the full device name rather than cutting it off with (...).

I had not realized that so that's more good to know stuff... again. Thank you.
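For anyone else puzzled by the "A" and "B", here is a minimal Python sketch of how a PCI address such as 00000000:0A:00.0 breaks down, assuming the usual domain:bus:device.function layout with every field in hex. The function name parse_pci_bus_id is made up for illustration and is not part of nvidia-smi.

```python
def parse_pci_bus_id(bus_id):
    """Split 'domain:bus:device.function' (all hex fields) into integers."""
    domain, bus, dev_fn = bus_id.split(":")
    device, function = dev_fn.split(".")
    return {
        "domain": int(domain, 16),
        "bus": int(bus, 16),        # "0A" is hex for decimal 10
        "device": int(device, 16),
        "function": int(function, 16),
    }


info = parse_pci_bus_id("00000000:0A:00.0")
print(info["bus"])
```

So a card listed under bus "0A" is simply sitting on bus number 10, and "0B" is bus 11.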

Ian&Steve C. wrote:

GPU-Util = GPU utilization (not utility), how much it is being used in terms of a percentage.

Oh... I knew that... just had a brain fart! Utilization! Got it! ;*)

George

Proud member of the Old Farts Association

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5874

Credit: 118341179039

RAC: 25419763

9 Sep 2023 7:23:00 UTC

Message 216758 in response to message 216731

Allen wrote:

...Running five tasks, they run out pretty fast.

How (and why) are 5 tasks running? Please state precisely how you achieve this. You used to run 6 and the instructions I gave were to run 4 (x2 on each GPU).

Allen wrote:

I am thinking of cutting tasks to 4 and see what happens.

And how exactly do you propose to do that? For how long have you been running 5? The only way I can think of to end up running 5 (without manual intervention) is if you actually have gpu_usage set for 6 but BOINC is only allowed to use a reduced number of CPU cores, enough to support only 5 tasks. What % of cores is BOINC allowed to use?

Allen wrote:

The two 560's machine is in fact running 4 total tasks at this time. I am using local prefs and I do have an app_config file for it. Times are still in the 50's.

Can you please post a copy of that app_config.xml file? You are clearly doing something incorrect if the time hasn't dropped with 4 concurrent tasks when you were getting "in the 50's" running 6.

When did you last change it and what was the change you made?

Each time you make a change to that file, do you force an "Options->Read config files" through BOINC Manager?

Do you look in the event log to see the response indicating that the change actually happened? You should see "Found app_config.xml".

If you are indeed correctly editing the file (gpu_usage should show 0.5 and not 0.33), exactly where did you install it? It should be in the einstein project folder.
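As a rough cross-check of what a given gpu_usage value implies, here is a minimal Python sketch that reads an app_config.xml and reports how many tasks fit on one GPU, taken as the whole number of times gpu_usage goes into 1. The sample XML is an assumption modelled on the settings discussed in this thread, the function name tasks_per_gpu is made up, and BOINC's real scheduling takes more into account (CPU limits, priorities) than this simple division.

```python
import xml.etree.ElementTree as ET

# Assumed sample, modelled on the values discussed in this thread.
SAMPLE = """<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>"""


def tasks_per_gpu(xml_text):
    """Map each app name to the number of tasks that fit on one GPU."""
    root = ET.fromstring(xml_text)
    result = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        usage = app.findtext("gpu_versions/gpu_usage")
        if name is None or usage is None:
            continue  # app entry without a GPU section
        # The small epsilon guards against float rounding, so 0.5 gives 2
        # and 0.33 gives 3.
        result[name] = int(1.0 / float(usage) + 1e-9)
    return result


print(tasks_per_gpu(SAMPLE))
```

To run it against a real file, read the file's text and pass it to tasks_per_gpu in place of SAMPLE.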

Something is wrong but it's not having a bad effect since the 'in progress' tasks are getting completed the best part of a day before deadline now so you are still winning the race.

EDIT @ 10:30 AM Sunday 10th Sep (UTC+10) (~00:30 UTC)

All completed tasks continue to show ~55 min run times. The most recent task was nearly 2 days ahead of its deadline so things are continuing to improve. I'm continuing to wonder why you sometimes mention 5 tasks running, then 4 tasks running. The crunch times would seem to indicate 6 but I'm now wondering if you are seeing a variable number because the complete list of tasks isn't in ascending order of deadline so that maybe there might be tasks running on the second page that you're not seeing.

Please click the "Show active tasks" button on the tasks tab and you will see the short list of running tasks. How many show as active? Just click the button again to show all.

Cheers,
Gary.

Allen

Joined: 23 Jan 06

Posts: 75

Credit: 690010199

RAC: 1033958

10 Sep 2023 3:35:55 UTC

Message 216796

Gary,

When I spoke of the 5 tasks running, that was in reference to my 580 machine, not the dual 560's.

I am on my laptop right now and it's not easy to get to my sys file from here, but I do all the things you asked about. Everything shows up in the log.

All is running quite well now and I will soon be going downstairs to see how all the other machines are doing. If I find something that might interest you, I will write you back.

Thanks again for checking on me.

Allen

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5874

Credit: 118341179039

RAC: 25419763

10 Sep 2023 7:26:58 UTC

Message 216799 in response to message 216796

Allen wrote:

When I spoke of the 5 tasks running, that was in reference to my 580 machine, not the dual 560's.

OK, absolutely my bad. I've got the website tasks list for the dual 560s permanently open and I'm focused on that machine and trying to work out why tasks are pretty rock solid at 55 min when that was the exact same as it used to be for x3 on each GPU. So when I saw references to x4 and x5 I completely missed the fact you were talking about a different machine. Sorry about that. If you don't mind, let's try to solve the mysteries for the dual 560s machine. Once that is solved, you'll be all 'skilled up' and able to tweak the others for their best performance.

The fact is that task times on the dual 560s are still 55 mins and that is far too long if each GPU is only running x2. Did you absolutely confirm only 4 tasks running using the 'Show active tasks' control? Are you intending to paste a copy of app_config.xml in your next message? Is that file the "sys file" you refer to? If not, what is the name of the sys file and why do you need to "get to my sys file"?

Things indeed are improving deadline-wise. The most recently returned result that I just saw is getting on towards 2 days ahead of deadline.

The thing that bothers me the most is this quote from your second message in this thread:-

"I'm running 3 WUs on each of them and timing is about every 50 minutes when nothing fails"

I looked at your tasks list when I first read that and saw some quite variable times with the lowest around 3300 secs (55mins). However there were quite a few higher values as well - a clear sign of a machine under stress. From my own single GPU hosts with RX 460 GPUs, running x2 gives maybe 34 mins so for x3 it should translate to 34 x 1.5 = 51 mins. Some of your tasks at the 55 min mark were close to the expected speed. The others (I saw some up to 4800 secs) were clearly in distress.

With all your current tasks showing ~3300 secs (55 mins) the real mystery is why that hasn't reduced to 2200 secs (37 mins) with the reduction to x2. Either you were really only running x2 when you started this thread and you just have extremely slow 560s for some reason, or (much more likely) there is some specific thing to find that is causing very sub-optimal behaviour. Are you sure the GPUs aren't running at some much reduced clock rate?
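The arithmetic behind those expectations can be written out as a small Python sketch. It assumes that total GPU throughput stays constant, so the elapsed time per task scales linearly with the number of tasks sharing the GPU. That is only a rule of thumb, and the function name scaled_elapsed is made up for illustration.

```python
def scaled_elapsed(elapsed, from_mult, to_mult):
    """Scale a per-task elapsed time from one multiplicity to another,
    assuming the GPU's total throughput does not change."""
    return elapsed * to_mult / from_mult


# RX 460 reference: about 34 min per task at x2, projected to x3.
print(scaled_elapsed(34, 2, 3))    # 51.0 (minutes)

# Dual 560s: about 3300 s per task at x3, projected to x2.
print(scaled_elapsed(3300, 3, 2))  # 2200.0 (seconds, roughly 37 minutes)
```

Tasks that stay near 3300 s after the change to x2 are therefore running about 1.5 times slower than this simple model predicts.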

Cheers,
Gary.

Allen

Joined: 23 Jan 06

Posts: 75

Credit: 690010199

RAC: 1033958

10 Sep 2023 17:53:19 UTC

Message 216817

Hi Gary,

Everything here points to the fact that I am running 4 total tasks for the 2 560's.

Here is an output from one of the recent tasks near completion.

Computer:   Alpha-8
Project   Einstein@Home

Name   LATeah4021L30_1196.0_0_0.0_14299131_0

Application   Gamma-ray pulsar binary search #1 on GPUs 1.22 (FGRPopencl1K-ati)
Workunit name   LATeah4021L30_1196.0_0_0.0_14299131
State   Running High P.
Received   8/29/2023 6:25:37 AM
Report deadline   9/12/2023 6:25:35 AM
Estimated app speed   101.51 GFLOPs/sec
Estimated task size   525,000 GFLOPs
Resources   0.25 CPUs + 0.5 AMD/ATI GPUs (device 0)
CPU time at last checkpoint   00:03:21
CPU time   00:03:25
Elapsed time   00:54:48
Estimated time remaining   00:05:38
Fraction done   89.791%
Virtual memory size   186.05 MB
Working set size   163.23 MB
Directory   slots/3
Process ID   5992

And here is my app_config.xml file, from my Einstein directory.......

9/7/2023

<app_config>
  <app>
    <name>einsteinbinary_BRP7</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>.20</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>.25</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einsteinbinary_BRP4G</name>
    <non_cpu_intensive>0</non_cpu_intensive>
  </app>
</app_config>

Too bad we're on opposite sides of the World. Makes for fewer communications.

Hope I have answered your questions for now.

Allen

Joined: 23 Jan 06

Posts: 75

Credit: 690010199

RAC: 1033958

10 Sep 2023 18:08:10 UTC

Message 216818

Gary,

Just in case it's important and you may see something odd, here is a recent log file created after restarting BOINC. Still showing 55:15 per task.

9/10/2023 2:05:29 PM | | cc_config.xml not found - using defaults
9/10/2023 2:05:29 PM | | Starting BOINC client version 7.22.2 for windows_x86_64
9/10/2023 2:05:29 PM | | log flags: file_xfer, sched_ops, task
9/10/2023 2:05:29 PM | | Libraries: libcurl/8.0.1-DEV Schannel zlib/1.2.13
9/10/2023 2:05:29 PM | | Data directory: C:\ProgramData\BOINC
9/10/2023 2:05:29 PM | | Running under account Allen
9/10/2023 2:05:30 PM | | OpenCL: AMD/ATI GPU 0: Radeon RX 560 Series (driver version 2906.10, device version OpenCL 2.0 AMD-APP (2906.10), 4096MB, 4096MB available, 2392 GFLOPS peak)
9/10/2023 2:05:30 PM | | OpenCL: AMD/ATI GPU 1: Radeon RX 560 Series (driver version 2906.10, device version OpenCL 2.0 AMD-APP (2906.10), 4096MB, 4096MB available, 438 GFLOPS peak)
9/10/2023 2:05:30 PM | | Windows processor group 0: 8 processors
9/10/2023 2:05:30 PM | | Host name: Alpha-8
9/10/2023 2:05:30 PM | | Processor: 8 AuthenticAMD AMD FX(tm)-8300 Eight-Core Processor [Family 21 Model 2 Stepping 0]
9/10/2023 2:05:30 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rdtscp bmi1
9/10/2023 2:05:30 PM | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
9/10/2023 2:05:30 PM | | Memory: 7.97 GB physical, 15.93 GB virtual
9/10/2023 2:05:30 PM | | Disk: 931.29 GB total, 778.33 GB free
9/10/2023 2:05:30 PM | | Local time is UTC -4 hours
9/10/2023 2:05:30 PM | | No WSL found.
9/10/2023 2:05:30 PM | Einstein@Home | Found app_config.xml
9/10/2023 2:05:30 PM | Milkyway@Home | Found app_config.xml
9/10/2023 2:05:30 PM | Milkyway@Home | milkyway_nbody: Max 1 concurrent jobs
9/10/2023 2:05:30 PM | Einstein@Home | General prefs: from Einstein@Home (last modified ---)
9/10/2023 2:05:30 PM | Einstein@Home | Computer location: work
9/10/2023 2:05:30 PM | | General prefs: using separate prefs for work
9/10/2023 2:05:30 PM | | Reading preferences override file
9/10/2023 2:05:30 PM | | Preferences:
9/10/2023 2:05:30 PM | | - When computer is in use
9/10/2023 2:05:30 PM | | -     'In use' means mouse/keyboard input in last 3.0 minutes
9/10/2023 2:05:30 PM | | -     max CPUs used: 8
9/10/2023 2:05:30 PM | | -     Use at most 100% of the CPU time
9/10/2023 2:05:30 PM | | -     suspend if non-BOINC CPU load exceeds 80%
9/10/2023 2:05:30 PM | | -     max memory usage: 5.58 GB
9/10/2023 2:05:30 PM | | - When computer is not in use
9/10/2023 2:05:30 PM | | -     max CPUs used: 8
9/10/2023 2:05:30 PM | | -     Use at most 100% of the CPU time
9/10/2023 2:05:30 PM | | -     suspend if non-BOINC CPU load exceeds 80%
9/10/2023 2:05:30 PM | | -     max memory usage: 7.97 GB
9/10/2023 2:05:30 PM | | - Suspend if running on batteries
9/10/2023 2:05:30 PM | | - Leave apps in memory if not running
9/10/2023 2:05:30 PM | | - Store at least 2.00 days of work
9/10/2023 2:05:30 PM | | - Store up to an additional 0.00 days of work
9/10/2023 2:05:30 PM | | - max disk usage: 30.00 GB
9/10/2023 2:05:30 PM | | - (to change preferences, visit a project web site or select Preferences in the Manager)
9/10/2023 2:05:30 PM | | Setting up project and slot directories
9/10/2023 2:05:30 PM | | Checking active tasks
9/10/2023 2:05:30 PM | Asteroids@home | URL http://asteroidsathome.net/boinc/; Computer ID 677129; resource share 100
9/10/2023 2:05:30 PM | Einstein@Home | URL https://einstein.phys.uwm.edu/; Computer ID 12883914; resource share 100
9/10/2023 2:05:30 PM | Einstein@Home | Not using CPU: project preferences
9/10/2023 2:05:30 PM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 873674; resource share 100
9/10/2023 2:05:30 PM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8411641; resource share 100
9/10/2023 2:05:30 PM | Universe@Home | URL https://universeathome.pl/universe/; Computer ID 594819; resource share 100
9/10/2023 2:05:30 PM | | Setting up GUI RPC socket
9/10/2023 2:05:30 PM | | Checking presence of 1817 project files
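The startup lines above follow a "timestamp | project | message" layout, so the per-GPU peak GFLOPS figures and the projects that reported an app_config.xml can be pulled out with a few lines of Python. This is a minimal sketch. The three sample lines are copied from the log above, the regular expression is only matched against this particular message wording, and the function names are made up for illustration.

```python
import re

LOG = """\
9/10/2023 2:05:30 PM | | OpenCL: AMD/ATI GPU 0: Radeon RX 560 Series (driver version 2906.10, device version OpenCL 2.0 AMD-APP (2906.10), 4096MB, 4096MB available, 2392 GFLOPS peak)
9/10/2023 2:05:30 PM | | OpenCL: AMD/ATI GPU 1: Radeon RX 560 Series (driver version 2906.10, device version OpenCL 2.0 AMD-APP (2906.10), 4096MB, 4096MB available, 438 GFLOPS peak)
9/10/2023 2:05:30 PM | Einstein@Home | Found app_config.xml
"""

# Device number, model name, and the peak figure at the end of the line.
GPU_RE = re.compile(r"GPU (\d+): (.+?) \(driver version .*?, (\d+) GFLOPS peak\)")


def parse_line(line):
    """Split one event log line into (timestamp, project, message)."""
    timestamp, project, message = (part.strip() for part in line.split("|", 2))
    return timestamp, project, message


def gpu_peaks(log_text):
    """Return {device number: reported peak GFLOPS} for each GPU line."""
    peaks = {}
    for line in log_text.splitlines():
        if line.count("|") < 2:
            continue  # not an event log line
        _, _, message = parse_line(line)
        match = GPU_RE.search(message)
        if match:
            peaks[int(match.group(1))] = int(match.group(3))
    return peaks


def projects_with_app_config(log_text):
    """Return the projects whose log line says 'Found app_config.xml'."""
    found = []
    for line in log_text.splitlines():
        if line.count("|") < 2:
            continue
        _, project, message = parse_line(line)
        if message == "Found app_config.xml":
            found.append(project)
    return found


print(gpu_peaks(LOG))
print(projects_with_app_config(LOG))
```

For the sample lines this reports 2392 GFLOPS for device 0 and 438 GFLOPS for device 1, the mismatch between two identical cards that the thread is about.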

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5874

Credit: 118341179039

RAC: 25419763

11 Sep 2023 2:18:35 UTC

Message 216840 in response to message 216817

Allen wrote:

Everything here points to the fact that I am running 4 total tasks for the 2 560's.

Except for the run times which are strangely longer than they should be. I have the same model of GPUs and they crunch much faster than yours, seemingly. There has to be a reason.

Thanks for sending both the app_config.xml and the task properties for a running task as produced by the manager. The former looks OK for FGRPB1G and there are also entries for BRP7 and BRP4G which you are not currently running. When FGRPB1G finishes, the logical replacement is BRP7 but have you run it previously and are you sure that running x3 with it (gpu_usage 0.33) gives an improvement over running x2? My experience is that the x1 -> x2 gives a relatively small improvement and that going to x3 doesn't add any further benefit. It's risky to add complexity if there is no additional benefit.

The other info you sent (task properties output) shows clearly 0.5 of the GPU being used so the reason for the slow times lies elsewhere. It also shows that BOINC is running the task in high priority mode (State Running High P.). That's one step short of panic mode :-).

I've just looked again at the latest completed tasks. Previously, all were very close to 3300s (55m) and all have been immediately validating. The very latest tasks (as I write this) are starting to show better times - take a look on the website for yourself. They now show 3000s (50m) or better. The fastest one I saw was 2698s (45m). This suggests that something that was impacting GPU performance has now lessened.

Have you changed anything that might have been responsible for this? If not, maybe it's just a result of tasks not being run at higher priority quite as much, although that's just grasping at straws. I've never noticed this sort of behaviour before, although (for my own sanity) I don't try to push for every last bit of performance so tend to always use settings that don't cause issues.

Allen wrote:

Too bad we're on opposite sides of the World. Makes for fewer communications.

That's not bad - I think it's very good :-). It takes quite a while to compose responses. If you could fire back an instant reply, I'd never get any of my own stuff done :-). Just kidding :-).

If you haven't checked recently, what started out as 1000+ tasks in progress with auto-aborts happening has now reduced to ~500 and I haven't seen any tasks being canceled. There is a nice cushion between completion time and the deadline. You're doing a good job.

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5874

Credit: 118341179039

RAC: 25419763

11 Sep 2023 3:24:50 UTC

Message 216842 in response to message 216818

Allen wrote:

Just in case it's important and you may see something odd, here is a recent log file created after restarting BOINC. Still showing 55:15 per task.

The startup messages show the host is attached to several other projects and has app_config.xml files for some of those as well. I only run Einstein so I don't know if any of that could be having an impact.

You stated in an earlier message that the only tasks that were running were Einstein tasks, so the other projects being listed shouldn't be having any impact. If you ever start accepting work from other projects (even non-GPU work) you should do so slowly and carefully until you are sure there are no adverse side effects. That means don't try to run lots of tasks at some level of multiplicity until you are sure that it's safe to do so.

Since you're running Windows 7 (I know nothing about Windows), are graphics drivers and OpenCL libraries still being maintained for it? In Linux, there doesn't seem to be any issue with older versions of OpenCL but I don't know if that applies to Windows as well.

One other thing I've tried is to see if the GPU that crunches a task can be easily identified. By clicking the TaskID link for any task of interest, and scrolling down through the stderr output, you can find the actual command string used to launch the task. The task with the shortest crunch time (2698s) mentioned in the previous message had --device 1 as a command line option. That is the second GPU, the device with the low GFLOPS value you originally asked about. I also looked at another task, a --device 0 task that took 3312s (on the GPU reported with the higher GFLOPS value), so you can see that GFLOPS is being misreported for some reason and is not having a negative impact on crunch times. In other words, you can't claim that a higher GFLOPS correlates with faster crunching :-).
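If you want to do that check for more than a couple of tasks, the device number can be pulled out of a copied command string with a short Python sketch like the one below. The command strings in the example are hypothetical placeholders, not real stderr output, and the only thing assumed about the real string is that it contains --device followed by a number, as described above.

```python
import re


def device_index(command_line):
    """Return the integer that follows --device, or None if it is absent."""
    match = re.search(r"--device[ =](\d+)", command_line)
    return int(match.group(1)) if match else None


# Hypothetical command strings, for illustration only.
print(device_index("some_app --other-option value --device 1"))
print(device_index("some_app --other-option value"))
```

Pairing each task's device number with its run time then shows directly whether one GPU is consistently slower than the other.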

Cheers,
Gary.
