Hey Guys,
I recently picked up a new AMD Radeon VII as an upgrade from my previous two RX 480s. As soon as Einstein starts work on the GPU, it locks up the whole machine. I think it may be a driver issue, as the GPU does fine under other workloads, like gaming, that use a different driver. I'm not new to Einstein@Home, but I am new to posting on the forums, so let me know what information I can provide. Below is some information I hope someone will find helpful.
OS: Arch Linux
Kernel: 4.20.13
CPU: AMD Ryzen 7 1700X
RAM: 32GB
GPU: AMD Radeon VII
Drivers: Mesa 18.3.4
Occasionally, I'll see an error that mentions the GPU is not configured to be reset, or something like that. I've been doing some research online, and it sounds like this reset option will be enabled by default in kernel 4.21 or 5.0. Here is the end of the system log.
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [Einstein@Home] General prefs: from Einstein@Home (last modified 13->
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [Einstein@Home] Host location: none
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [Einstein@Home] General prefs: using your defaults
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] Reading preferences override file
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] Preferences:
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] max memory usage when active: 16081.74 MB
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] max memory usage when idle: 25730.79 MB
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] max disk usage: 32.00 GB
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] Number of usable CPUs has changed from 16 to 13.
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] max CPUs used: 13
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] don't use GPU while active
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] suspend work if non-BOINC CPU load exceeds 25%
Mar 02 19:44:00 kludge boinc[1836]: 02-Mar-2019 19:44:00 [---] (to change preferences, visit a project web site or select >
Mar 02 19:44:16 kludge boinc[1836]: No protocol specified
Mar 02 19:44:16 kludge boinc[1836]: 02-Mar-2019 19:44:16 [---] Resuming GPU computation
Mar 02 19:44:16 kludge boinc[1836]: No protocol specified
Mar 02 19:44:17 kludge boinc[1836]: No protocol specified
Mar 02 19:44:18 kludge boinc[1836]: No protocol specified
Mar 02 19:44:19 kludge boinc[1836]: No protocol specified
Mar 02 19:44:21 kludge boinc[1836]: No protocol specified
Mar 02 19:44:22 kludge boinc[1836]: No protocol specified
Mar 02 19:44:23 kludge boinc[1836]: No protocol specified
Mar 02 19:44:24 kludge boinc[1836]: No protocol specified
Hi Derek,
The best way to post log snips is to enclose them in BBCode code tags. You can also use font and size tags to control what the log excerpt looks like - particularly useful for longer lines of data in columns. As an example, I've added the tags to your original message and have reproduced it in full so you can see how much easier it is to read.
I presume that the above is part of the startup messages you see in the event log when you launch the client. The more useful information will be the very start of the log where the GPU detection is performed and the OpenCL detection confirms that usable OpenCL libs are installed. Perhaps you could post everything that comes before the above in the log.
The file stdoutdae.txt in the client directory should contain all those lines. The lines from any startup of the client will do. I've never seen "No protocol specified" messages before. I don't know if those are part of 'standard' BOINC or if perhaps it's something added to the Arch version of the client or perhaps it's something else entirely. Have you tried running clinfo to see what that says about the GPU? How does Arch handle the installation of the OpenCL libs? Am I correct in presuming that you had work from when you were running the RX 480s and it's this same work that is failing after you have swapped to the new card? If so, that would suggest there is a missing driver component that you might have to do some research on.
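As a rough illustration of which startup lines matter here, the sketch below filters a client log (from stdoutdae.txt or from the system journal) down to the coprocessor detection lines. The function name and the list of marker strings are assumptions made for this example, not part of BOINC itself.

```python
def gpu_detection_lines(log_text):
    """Return the log lines that look like BOINC GPU/OpenCL detection output.

    The marker strings are an assumption based on typical client startup
    messages; adjust them if your client words things differently.
    """
    markers = ("OpenCL:", "CUDA:", "No usable GPUs found")
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(marker in line for marker in markers)
    ]


if __name__ == "__main__":
    # Two sample lines in the style of a BOINC startup log.
    sample = (
        "03-Mar-2019 09:13:10 [---] Data directory: /var/lib/boinc\n"
        "03-Mar-2019 09:13:10 [---] OpenCL: AMD/ATI GPU 0: AMD VEGA20\n"
    )
    for line in gpu_detection_lines(sample):
        print(line)
```

If this returns nothing for a full startup log, the client probably did not find a usable OpenCL device at all.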
I just want to confirm that the OpenCL capabilities of the card are being properly detected. Perhaps the problem might be related to that. I run a lot of AMD GPUs on Linux (not Arch) so I'm quite interested to see how the new Radeon VII goes. It seems to be quite a hit here under Windows :-) - (based only on a very early report). Maybe you might get better help on the Arch forums. I don't think anyone here has got this card crunching under Linux.
Cheers,
Gary.
Thanks Gary,
Here is the output from clinfo.
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.3.4
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 18.3.4
  Driver Version                                  18.3.4
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               60
  Max clock frequency                             1802MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         8 / 8
    int                                           4 / 4
    long                                          2 / 2
    half                                          8 / 8 (cl_khr_fp16)
    float                                         4 / 4
    double                                        2 / 2 (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              17163091968 (15.98GiB)
  Error Correction support                        No
  Max memory allocation                           13730473574 (12.79GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        2147483647 (2GiB)
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
The contents of stdoutgpudetect.txt are as follows:
cc_config.xml not found - using defaults
Here is the journal log of the client starting showing it recognizes the Radeon VII (Vega 20) GPU.
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Starting BOINC client version 7.12.1 for x86_64-pc-linux-gnu
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] log flags: file_xfer, sched_ops, task
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Libraries: libcurl/7.64.0 OpenSSL/1.1.1b zlib/1.2.11 libidn2/2>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Data directory: /var/lib/boinc
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] OpenCL: AMD/ATI GPU 0: AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] [libc detection] gathered: 2.28, GNU libc
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Host name: kludge
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Processor: 16 AuthenticAMD AMD Ryzen 7 1700X Eight-Core Proces>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic se>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] OS: Linux Arch Linux: Arch Linux [4.20.13-arch1-1-ARCH|libc 2.>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Memory: 31.41 GB physical, 0 bytes virtual
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Disk: 227.75 GB total, 198.72 GB free
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Local time is UTC -6 hours
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 12763>
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [Einstein@Home] General prefs: from Einstein@Home (last modified 13->
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [Einstein@Home] Host location: none
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [Einstein@Home] General prefs: using your defaults
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Reading preferences override file
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Preferences:
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] max memory usage when active: 16081.74 MB
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] max memory usage when idle: 25730.78 MB
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] max disk usage: 32.00 GB
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] max CPUs used: 13
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] don't use GPU while active
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] suspend work if non-BOINC CPU load exceeds 25%
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] (to change preferences, visit a project web site or select >
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Setting up project and slot directories
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Checking active tasks
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Setting up GUI RPC socket
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Checking presence of 114 project files
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 Initialization completed
Mar 03 09:13:10 kludge boinc[6486]: 03-Mar-2019 09:13:10 [---] Suspending GPU computation - computer is in use
Mar 03 09:14:17 kludge org.gnome.Shell.desktop[5729]: Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message >
Mar 03 09:14:17 kludge org.gnome.Shell.desktop[5729]: Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message >
Mar 03 09:14:19 kludge systemd-timesyncd[719]: Synchronized to time server for the first time 107.155.79.108:123 (3.arch.pool>
Mar 03 09:14:19 kludge gnome-software[5969]: libostree pull from 'flathub' for appstream2/x86_64 complete security: GPG: summary+commit http: TLS non-delta: meta: 2 content: 0 transfer: secs: 0 size: 791 bytes
Mar 03 09:14:20 kludge gnome-software[5969]: libostree pull from 'flathub' for appstream2/x86_64 complete security: GPG: summary+commit http: TLS non-delta: meta: 5 content: 5 transfer: secs: 0 size: 1.7 MB
Mar 03 09:14:20 kludge gnome-software[5969]: /var/tmp/flatpak-cache-DGGOXZ/repo-r5UPIy: Pulled appstream2/x86_64 from flathub
Mar 03 09:14:20 kludge dbus-daemon[724]: [system] Activating via systemd: service name='org.freedesktop.Flatpak.SystemHelper'>
Mar 03 09:14:20 kludge systemd[1]: Starting flatpak system helper...
Mar 03 09:14:20 kludge dbus-daemon[724]: [system] Successfully activated service 'org.freedesktop.Flatpak.SystemHelper'
Mar 03 09:14:20 kludge systemd[1]: Started flatpak system helper.
Mar 03 09:14:20 kludge audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=flatpak-system-helper com>
Mar 03 09:14:20 kludge kernel: audit: type=1130 audit(1551626060.371:67): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='uni>
Mar 03 09:14:20 kludge flatpak-system-helper[10613]: system: Pulled appstream2/x86_64 from /var/tmp/flatpak-cache-DGGOXZ/repo>
Mar 03 09:14:42 kludge boinc[6486]: No protocol specified
Mar 03 09:14:42 kludge boinc[6486]: 03-Mar-2019 09:14:42 [---] Resuming GPU computation
Mar 03 09:14:43 kludge boinc[6486]: No protocol specified
I've always read about the BOINC client's standard output file, like the one you mentioned, but I've never actually seen it in real life. I listed the contents of /var/lib/boinc and, as you can see, I don't have that file.
-rw-r--r-- 1 boinc boinc   3667 Feb 20 21:10 account_einstein.phys.uwm.edu.xml
-rw-r--r-- 1 boinc boinc  57883 Feb 20 20:49 all_projects_list.xml
-rw-r--r-- 1 boinc boinc 184089 Mar  3 09:13 client_state_prev.xml
-rw-r--r-- 1 boinc boinc 184089 Mar  3 09:13 client_state.xml
-rw-r--r-- 1 boinc boinc   1897 Mar  3 09:13 coproc_info.xml
-rw-r--r-- 1 boinc boinc    364 Mar  3 09:11 daily_xfer_history.xml
-rw-r--r-- 1 boinc boinc  12592 Feb 20 21:03 get_current_version.xml
-rw-r--r-- 1 boinc boinc  14064 Feb 20 21:02 get_project_config.xml
-rw-r--r-- 1 boinc boinc   1497 Mar  2 19:44 global_prefs_override.xml
-rw-r--r-- 1 boinc boinc   1649 Feb 20 21:03 global_prefs.xml
-rw-r----- 1 boinc boinc     32 Feb 20 20:49 gui_rpc_auth.cfg
-rw-r--r-- 1 boinc boinc      0 Mar  3 09:12 lockfile
-rw-r--r-- 1 boinc boinc    138 Feb 20 21:02 lookup_account.xml
-rw-r--r-- 1 boinc boinc  14460 Feb 20 21:03 master_einstein.phys.uwm.edu.xml
drwxrwx--x 2 boinc boinc   4096 Mar  3 09:13 notices
drwxrwx--x 3 boinc boinc   4096 Feb 20 21:02 projects
-rw-r--r-- 1 boinc boinc  90694 Feb 20 21:10 sched_reply_einstein.phys.uwm.edu.xml
-rw-r--r-- 1 boinc boinc  13811 Feb 20 21:10 sched_request_einstein.phys.uwm.edu.xml
drwxrwx--x 2 boinc boinc   4096 Mar  3 09:11 slots
-rw-r--r-- 1 boinc boinc    435 Feb 20 21:10 statistics_einstein.phys.uwm.edu.xml
-rw-r--r-- 1 boinc boinc      0 Feb 20 20:49 stderrgpudetect.txt
-rw-r--r-- 1 boinc boinc    748 Mar  3 09:13 stdoutgpudetect.txt
-rw-r--r-- 1 boinc boinc   2403 Mar  3 09:13 time_stats_log
This is the event log from the BOINC Manager GUI. Generally, I find it mirrors what I can find with journalctl.
Sun 03 Mar 2019 09:12:46 AM CST | | cc_config.xml not found - using defaults
Sun 03 Mar 2019 09:13:10 AM CST | | Starting BOINC client version 7.12.1 for x86_64-pc-linux-gnu
Sun 03 Mar 2019 09:13:10 AM CST | | log flags: file_xfer, sched_ops, task
Sun 03 Mar 2019 09:13:10 AM CST | | Libraries: libcurl/7.64.0 OpenSSL/1.1.1b zlib/1.2.11 libidn2/2.1.1 libpsl/0.20.2 (+libidn2/2.1.1) libssh2/1.8.0 nghttp2/1.36.0
Sun 03 Mar 2019 09:13:10 AM CST | | Data directory: /var/lib/boinc
Sun 03 Mar 2019 09:13:10 AM CST | | OpenCL: AMD/ATI GPU 0: AMD VEGA20 (DRM 3.27.0, 4.20.13-arch1-1-ARCH, LLVM 7.0.1) (driver version 18.3.4, device version OpenCL 1.1 Mesa 18.3.4, 16368MB, 16368MB available, 8650 GFLOPS peak)
Sun 03 Mar 2019 09:13:10 AM CST | | [libc detection] gathered: 2.28, GNU libc
Sun 03 Mar 2019 09:13:10 AM CST | | Host name: kludge
Sun 03 Mar 2019 09:13:10 AM CST | | Processor: 16 AuthenticAMD AMD Ryzen 7 1700X Eight-Core Processor [Family 23 Model 1 Stepping 1]
Sun 03 Mar 2019 09:13:10 AM CST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
Sun 03 Mar 2019 09:13:10 AM CST | | OS: Linux Arch Linux: Arch Linux [4.20.13-arch1-1-ARCH|libc 2.28 (GNU libc)]
Sun 03 Mar 2019 09:13:10 AM CST | | Memory: 31.41 GB physical, 0 bytes virtual
Sun 03 Mar 2019 09:13:10 AM CST | | Disk: 227.75 GB total, 198.72 GB free
Sun 03 Mar 2019 09:13:10 AM CST | | Local time is UTC -6 hours
Sun 03 Mar 2019 09:13:10 AM CST | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12763354; resource share 100
Sun 03 Mar 2019 09:13:10 AM CST | Einstein@Home | General prefs: from Einstein@Home (last modified 13-Oct-2016 10:37:51)
Sun 03 Mar 2019 09:13:10 AM CST | Einstein@Home | Host location: none
Sun 03 Mar 2019 09:13:10 AM CST | Einstein@Home | General prefs: using your defaults
Sun 03 Mar 2019 09:13:10 AM CST | | Reading preferences override file
Sun 03 Mar 2019 09:13:10 AM CST | | Preferences:
Sun 03 Mar 2019 09:13:10 AM CST | | max memory usage when active: 16081.74 MB
Sun 03 Mar 2019 09:13:10 AM CST | | max memory usage when idle: 25730.78 MB
Sun 03 Mar 2019 09:13:10 AM CST | | max disk usage: 32.00 GB
Sun 03 Mar 2019 09:13:10 AM CST | | max CPUs used: 13
Sun 03 Mar 2019 09:13:10 AM CST | | don't use GPU while active
Sun 03 Mar 2019 09:13:10 AM CST | | suspend work if non-BOINC CPU load exceeds 25%
Sun 03 Mar 2019 09:13:10 AM CST | | (to change preferences, visit a project web site or select Preferences in the Manager)
Sun 03 Mar 2019 09:13:10 AM CST | | Setting up project and slot directories
Sun 03 Mar 2019 09:13:10 AM CST | | Checking active tasks
Sun 03 Mar 2019 09:13:10 AM CST | | Setting up GUI RPC socket
Sun 03 Mar 2019 09:13:10 AM CST | | Checking presence of 114 project files
Sun 03 Mar 2019 09:13:10 AM CST | | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:14:42 AM CST | | Resuming GPU computation
Sun 03 Mar 2019 09:16:50 AM CST | | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:17:51 AM CST | | Resuming GPU computation
Sun 03 Mar 2019 09:18:57 AM CST | | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:20:37 AM CST | | Resuming GPU computation
Sun 03 Mar 2019 09:20:39 AM CST | | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:22:45 AM CST | | Resuming GPU computation
Sun 03 Mar 2019 09:24:04 AM CST | | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:25:09 AM CST | | Resuming GPU computation
Sun 03 Mar 2019 09:25:16 AM CST | | Suspending GPU computation - computer is in use
Sun 03 Mar 2019 09:26:17 AM CST | | Resuming GPU computation
To answer your question: yes, I was working previously with my RX 480s. I am genuinely excited to see what the new card can do for computation, considering it is basically a consumer-grade MI50. I have seen the "No protocol specified" message for a very long time. We're kind of snowed in today, so maybe I'll put my Polaris cards back in and confirm I was getting that message with those GPUs as well. In Arch, the OpenCL libraries are installed right from the Arch repos. I too have thought it was a driver error; from the system logs, it looks like the driver may be crashing. I have updated all of my drivers. As I mentioned, other workloads, like games using the standard Mesa or Mesa-Vulkan drivers, have no problems, but applications using the Mesa-OpenCL drivers (BOINC and a few other apps, like LibreOffice of all things) have had some issues.
Thank you for your help Gary. I really do appreciate you taking the time to help me.
Derek
Derek,
I'm rather busy right now. After a substantial outage, this project now has fresh work and I need to get my fleet back to work and under control again :-). This may take me a while.
I'm wondering if (from the clinfo output) the platform name of 'Clover' and the OpenCL version of 1.1 might be the problem. Whilst Mesa drivers are installed on my machines, I don't use the Clover implementation of OpenCL. I found that to get my Polaris GPUs to work I needed to install OpenCL components from the AMDGPU-PRO package available from AMD. As my distro is RPM based, I used the Red Hat versions of that package, starting with version 16.60 in early 2017 and currently on 18.30 from late in 2018. I was able to work out a small subset of files from the full package which allowed the Einstein app to work without problems.
If you're interested, and when I have things under control, I'll go through exactly what I did. I have no idea if this will work for you. I don't know anything about Arch or the packaging format it uses.
Cheers,
Gary.
Thanks Gary, I understand that you're busy. This isn't an urgent issue.
I too have the proprietary OpenCL bits from AMDGPU-PRO installed, currently on 18.50.
derek wrote: ... I too have the proprietary OpenCL bits from the AMDGPU-PRO currently on 18.50.
But you're not using those bits. The clinfo output only identifies a single platform, Clover. The first line of output clearly says, "Number of platforms 1".
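To make that check repeatable, here is a minimal sketch that pulls the distinct platform names out of saved clinfo text. It assumes the human-readable layout shown earlier in this thread, where each platform is reported on a line starting with "Platform Name"; the function name is made up for this example.

```python
import re


def platform_names(clinfo_text):
    """Return the distinct OpenCL platform names found in clinfo output.

    clinfo repeats "Platform Name" in several sections, so duplicates
    are dropped while the order of first appearance is kept.
    """
    names = []
    for line in clinfo_text.splitlines():
        match = re.match(r"\s*Platform Name\s+(.+?)\s*$", line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    # A shortened excerpt in the style of the clinfo output posted above.
    excerpt = (
        "Number of platforms    1\n"
        "  Platform Name        Clover\n"
        "  Platform Vendor      Mesa\n"
        "\n"
        "  Platform Name        Clover\n"
    )
    print(platform_names(excerpt))
```

If the AMDGPU-PRO OpenCL platform were registered, a second name would be expected in that list alongside Clover.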
I know nothing about Clover, other than it is older and seemingly not well supported. I've seen comments quite a while ago that crunching didn't work using Clover. For all I know, it could be different now.
My guess is that, with the AMDGPU-PRO 18.50 bits installed, you need to investigate why that platform is not being shown by clinfo. You can have multiple platforms installed and clinfo should show them all - that's my understanding. You should be able to select the platform to use from all those installed.
Perhaps you should ask on the Arch forums why clinfo doesn't show anything except Clover. Maybe it's just a matter of tweaking an environment variable like LD_LIBRARY_PATH so the libs can be found. Where did your AMDGPU-PRO stuff get installed? The standard place is under /opt/amdgpu/ for some things and /opt/amdgpu-pro/ for others. The OpenCL libs are under /opt/amdgpu-pro/lib64/.
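One thing that may be worth checking alongside LD_LIBRARY_PATH is which vendor libraries are registered with the OpenCL ICD loader. The sketch below assumes the loader reads *.icd files from /etc/OpenCL/vendors, with each file naming a library on its first line; that is the common default but it has not been confirmed for this Arch setup. The function name is hypothetical.

```python
from pathlib import Path


def registered_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the library named on its first line.

    The default directory is an assumption (the usual ICD loader
    location); pass a different path if your distro uses another one.
    """
    result = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return result
    for icd_file in sorted(directory.glob("*.icd")):
        lines = icd_file.read_text().splitlines()
        result[icd_file.name] = lines[0].strip() if lines else ""
    return result


if __name__ == "__main__":
    # Print whatever is registered on this machine (possibly nothing).
    for name, library in registered_icds().items():
        print(f"{name}: {library}")
```

If only a Mesa entry shows up, that would be consistent with clinfo reporting Clover alone, and the AMDGPU-PRO OpenCL library would still need to be registered or made findable.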
Cheers,
Gary.
Thanks for the lead Gary!
Interesting development... If I don't start my desktop environment and just run BOINC from the CLI console, then everything seems to work fine. So I think it is definitely a driver issue. Hopefully kernel 5.0 and Mesa 19 will fix my issues. Until then, I'll compute with my system running headless for a few days until Arch provides kernel and Mesa updates.