Hello everyone,
I am having problems running GPU tasks on my AMD Radeon 5700 XT under Arch Linux. The log shows:
<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
[20:28:07][7710][INFO ] Application startup - thank you for supporting Einstein@Home!
[20:28:07][7710][INFO ] Starting data processing...
[20:28:07][7710][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[20:28:07][7710][INFO ] Using OpenCL device "gfx1010" by: Advanced Micro Devices, Inc.
[20:28:07][7710][ERROR] Couldn't create OpenCL command queue (error: -6)!
[20:28:07][7710][INFO ] OpenCL shutdown complete!
[20:28:07][7710][ERROR] Demodulation failed (error: 2013)!
[20:28:07][7710][WARN ] Sorry, at the moment your system doesn't have enough free CPU/GPU memory to run this task!
------> Returning control to BOINC, delaying next attempt for at least 15 minutes...
------> If this problem persists you should consider aborting this task...
</stderr_txt>
]]>
This is a little surprising, because I have 32 GB RAM, the GPU has 8 GB, and this error is reported even if BOINC is the only significant process running, with no CPU tasks.
The GPU stack itself seems to be OK; I have tried it with the Geekbench OpenCL benchmark and SuperTuxKart :). I use the kernel (amdgpu) driver, kernel version 5.19.7.
The GPU used to work in BOINC (E@H) about a year ago, but obviously my system has undergone many updates since then...
Please tell me how I can further diagnose and ultimately fix this issue.
Copyright © 2024 Einstein@Home. All rights reserved.
Przemysław Kowalczyk
You might want to read through a few pages of this to see if any of it helps. Keith Myers is an EXCELLENT Linux person.
mikey wrote: You might want
Mikey, you may want to actually give him the link so that he CAN read through it.
And I concur, Keith is an EXCELLENT Linux person.
Proud member of the Old Farts Association
Hard to help troubleshoot when the host is hidden. AMD drivers are a bitch to figure out. And I have never run any AMD cards. Just have seen many, many posts about AMD driver issues and have seen a few things that are usually involved.
The main one is whether the correct OpenCL components are installed. Also, the driver version is usually tied specifically to a kernel version. Different kernels don't work with different driver versions.
And I know very little about Arch Linux, which appears to be a very technical distro that does things very differently compared to more mainstream distros.
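One way to look at the OpenCL components question is to list the ICD files that the OpenCL loader reads. This is a minimal sketch, assuming the loader uses the conventional /etc/OpenCL/vendors directory. The function name is made up for illustration, and the result only shows what is registered, not whether the library behind it actually works:

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name to the library name or path it registers.

    vendors_dir defaults to the conventional ICD loader directory on Linux;
    that location is an assumption and may differ on some installs.
    """
    icds = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return icds  # no ICD directory means no registered OpenCL vendors
    for icd_file in sorted(directory.glob("*.icd")):
        # An .icd file is a small text file naming the vendor library.
        content = icd_file.read_text().strip()
        icds[icd_file.name] = content.splitlines()[0] if content else ""
    return icds


if __name__ == "__main__":
    for name, library in list_opencl_icds().items():
        print(f"{name}: {library}")
```

An empty result, or more than one AMD entry left over from different packages, would be worth sorting out before looking at BOINC itself.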
I don't believe the not-enough-memory is the real problem. A few lines up in stderr:
[20:28:07][7710][ERROR] Couldn't create OpenCL command queue (error: -6)!
is likely the underlying problem, with not-enough-memory as an exit condition/result.
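For reference, a rough sketch of pulling the error codes out of a stderr log and naming the OpenCL ones. The table is a small subset of the status codes from the Khronos cl.h header; the function name and the regular expression are my own assumptions about the log format shown above, not part of the Einstein@Home application:

```python
import re

# Subset of OpenCL status codes as defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches entries such as "[ERROR] Some message (error: -6)".
ERROR_ENTRY = re.compile(
    r"\[ERROR\]\s*(?P<msg>.*?)\s*\(error:\s*(?P<code>-?\d+)\)"
)


def decode_errors(stderr_text):
    """Return (message, code, name) for each [ERROR] entry in the log.

    name is None when the code is not in the OpenCL table above, which is
    the case for application-specific codes.
    """
    found = []
    for match in ERROR_ENTRY.finditer(stderr_text):
        code = int(match.group("code"))
        found.append((match.group("msg"), code, OPENCL_ERRORS.get(code)))
    return found


if __name__ == "__main__":
    log = "[20:28:07][7710][ERROR] Couldn't create OpenCL command queue (error: -6)!"
    for message, code, name in decode_errors(log):
        print(code, name or "application-specific", "-", message)
```

Code -6 is CL_OUT_OF_HOST_MEMORY, which the OpenCL specification describes as a failure to allocate resources the implementation needs on the host. That is about the driver/runtime side, not the card's 8 GB.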
I had the same problem with ppa boinc 7.18 six months ago - opencl showed valid info, but boinc couldn't create the command queue. With guidance from @Wedge009, who had similar problems with 7.18, rolling back boinc to 7.16.17 solved the problem for me, without having to make any changes to amdgpu or the Ubuntu 20.04 install.
Btw, stderr shows boinc 7.18.1, whereas the linked host shows boinc 7.20.2. There may be some inconsistencies in your boinc install. Are you getting boinc from an Arch Linux repo or ppa?
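If you want to compare versions across several task results, here is a small sketch that reads the client version out of the result text. The function name is hypothetical, and it assumes the version appears in a core_client_version tag as in the log posted above:

```python
import re

VERSION_TAG = re.compile(
    r"<core_client_version>\s*(\d+(?:\.\d+)*)\s*</core_client_version>"
)


def core_client_version(result_text):
    """Return the client version as a tuple of ints, or None if absent."""
    match = VERSION_TAG.search(result_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


if __name__ == "__main__":
    reported = core_client_version(
        "<core_client_version>7.18.1</core_client_version>"
    )
    # Tuples compare element by element, so 7.18.1 sorts before 7.20.2.
    print(reported, reported < (7, 20, 2))
```

A result produced before a client update would still carry the older version, so a mismatch by itself does not prove a broken install.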
GWGeorge007 wrote: mikey
ROFL!!
https://einsteinathome.org/content/em-searches-brp-raidiopulsar-and-fgrp-gamma-ray-pulsar
mikey wrote: GWGeorge007
Ataboy!! It's about time... !!!
GWGeorge007 wrote: mikey
LOL!!
cecht was having similar issues getting his 5600XT crunching on a newer kernel and drivers.
he ended up reverting his system to older drivers and older kernel to resolve the issue.
https://einsteinathome.org/content/troubleshooting-ubuntu-20-and-fresh-install-amd-drivers?page=4#comment-200833
you might also try the ROCm drivers instead of the amdgpu drivers.
Keith Myers wrote: Hard to
Info disclosed.
To clarify: latest kernel, latest Mesa, opencl-amd package (opencl-mesa does not support my card yet, learnt it the hard way...). I also tried installing opencl-amd-dev, but the error remains the same. Arch is a rolling-release distro, and the opencl-amd package, although outside the main repositories, seems to be up to date with AMD's Debian packages, which are its source.
This is because I first ran BOINC 7.18.1 after a rather long period of not using it, saw the error, and then updated the whole OS; now it's BOINC 7.20.2.
Thanks, will have a look. Unfortunately it is for Ubuntu; fortunately, the Arch opencl-amd package is built from AMD's Debian packages.
Not sure the link is correct... No user named cecht there...
Not sure I have a choice with drivers. In the Arch world there is opencl-mesa (incompatible) or opencl-amd, which uses the open-source AMD driver and consists of two OpenCL implementations: ROCr and orca (legacy) (more info here). I'm not entirely sure, but the other options look like variations of those two. I'll try nevertheless.
While trying to find a solution I found this thread. It has been marked as "solved" because of a "faulty card", but the whole thread contains some info about the transition from "PAL" to "ROCm" - these terms are foreign to me for now, but may be helpful for someone else...
Przemysław Kowalczyk
the link is correct and takes you directly to his comment on his issue/resolution.
I'm not familiar with using Arch, but a quick google landed me at this github repository for ROCm on Arch. maybe it's useful.
https://github.com/rocm-arch/rocm-arch