[SOLVED] ATI/AMD OpenCL tasks won't run because not enough mem?

Przemysław Kowalczyk
Joined: 29 Dec 20
Posts: 3
Credit: 2265074
RAC: 73
Topic 228120

Hello everyone,

I am having problems running GPU tasks on my AMD Radeon 5700 XT under Arch Linux. The log shows:

<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
[20:28:07][7710][INFO ] Application startup - thank you for supporting Einstein@Home!
[20:28:07][7710][INFO ] Starting data processing...
[20:28:07][7710][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[20:28:07][7710][INFO ] Using OpenCL device "gfx1010" by: Advanced Micro Devices, Inc.
[20:28:07][7710][ERROR] Couldn't create OpenCL command queue (error: -6)!
[20:28:07][7710][INFO ] OpenCL shutdown complete!
[20:28:07][7710][ERROR] Demodulation failed (error: 2013)!
[20:28:07][7710][WARN ] Sorry, at the moment your system doesn't have enough free CPU/GPU memory to run this task!
------> Returning control to BOINC, delaying next attempt for at least 15 minutes...
------> If this problem persists you should consider aborting this task...

</stderr_txt>
]]>


This is a little surprising, because I have 32 GB of RAM, the GPU has 8 GB, and this error is reported even when BOINC is the only significant process running, with no CPU tasks.

The GPU stack itself seems to be OK; I have tried it with the Geekbench OpenCL benchmark and SuperTuxKart :). I use the in-kernel (amdgpu) driver, kernel version 5.19.7.

The GPU was working in BOINC (E@H) about a year ago, but my system has obviously gone through many updates since then...

Please tell me how I can further diagnose and ultimately fix this issue.

mikey
Joined: 22 Jan 05
Posts: 12776
Credit: 1861084874
RAC: 1446843


Przemysław Kowalczyk wrote:

Hello everyone,

I am having problems running GPU tasks on my AMD Radeon 5700 XT, Arch Linux. [...]

Please tell me how I can further diagnose and ultimately fix this issue.

You might want to read thru a few pages of this to see if any of it helps, Keith Myers is an EXCELLENT Linux person

GWGeorge007
Joined: 8 Jan 18
Posts: 3117
Credit: 5008726749
RAC: 1570133


mikey wrote:

You might want to read thru a few pages of this to see if any of it helps, Keith Myers is an EXCELLENT Linux person

Mikey, you may want to actually give him the link so that he CAN read through it.

And I concur, Keith is an EXCELLENT Linux person.

George

Proud member of the Old Farts Association

Keith Myers
Joined: 11 Feb 11
Posts: 5020
Credit: 18921071207
RAC: 6508770


It's hard to help troubleshoot when the host is hidden.  AMD drivers are a bitch to figure out, and I have never run any AMD cards myself.  I have just seen many, many posts about AMD driver issues, and a few things come up again and again.

The main one is whether the correct OpenCL components are installed.  Also, the driver version is usually tied to a specific kernel version; a driver built for one kernel won't necessarily work with another.

And I know very little about Arch Linux, which appears to be a very technical distro that does things very differently from the more mainstream distros.
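A quick way to see which OpenCL runtimes are actually registered: on Linux, the Khronos ICD loader normally looks for vendor `.icd` files in `/etc/OpenCL/vendors`, each of which names a driver library. The Python sketch below just lists them; `list_opencl_icds` is a hypothetical helper, the directory is assumed to be the default one, and the `clinfo` utility gives a far fuller picture if it is installed.

```python
from pathlib import Path

# Assumed default directory where the Khronos ICD loader looks for vendor .icd files.
ICD_DIR = Path("/etc/OpenCL/vendors")


def list_opencl_icds(icd_dir: Path = ICD_DIR) -> dict[str, str]:
    """Map each .icd file in icd_dir to the driver library path written inside it."""
    if not icd_dir.is_dir():
        # No vendor directory at all usually means no OpenCL runtime is registered.
        return {}
    return {
        icd.name: icd.read_text().strip()
        for icd in sorted(icd_dir.glob("*.icd"))
    }


if __name__ == "__main__":
    icds = list_opencl_icds()
    if not icds:
        print("No OpenCL ICD files found; is an OpenCL runtime installed?")
    for name, library in icds.items():
        print(f"{name}: {library}")
```

If the AMD runtime's `.icd` file is missing here, the OpenCL components are probably not installed the way the loader expects.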

 

mountkidd
Joined: 14 Jun 12
Posts: 177
Credit: 12732981936
RAC: 4600617


I don't believe the not-enough-memory is the real problem.  A few lines up in stderr:

[ERROR] Couldn't create OpenCL command queue (error: -6)

is likely the underlying problem, with not-enough-memory as an exit condition/result. 
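For reference, the Khronos OpenCL header (`CL/cl.h`) defines status code -6 as `CL_OUT_OF_HOST_MEMORY`. That may be why the app turns a failed command-queue creation into a generic "not enough memory" warning even on a machine with 32 GB free; this is an inference from the code value, not documented behaviour of the E@H app. The Python sketch below is a hypothetical lookup helper (`describe_cl_error` is a name made up for illustration) covering only a hand-copied subset of the codes.

```python
# Hand-copied subset of OpenCL status codes from the Khronos CL/cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -35: "CL_INVALID_QUEUE_PROPERTIES",
    -36: "CL_INVALID_COMMAND_QUEUE",
}


def describe_cl_error(code: int) -> str:
    """Return the symbolic name for an OpenCL status code, if it is in the table."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


print(describe_cl_error(-6))  # CL_OUT_OF_HOST_MEMORY
```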

I had the same problem with the PPA build of BOINC 7.18 six months ago: OpenCL showed valid info, but BOINC couldn't create the command queue.  With guidance from @Wedge009, who had similar problems with 7.18, rolling BOINC back to 7.16.17 solved the problem for me, without any changes to amdgpu or the Ubuntu 20.04 install.

Btw, stderr shows BOINC 7.18.1, whereas the linked host shows BOINC 7.20.2, so there may be some inconsistencies in your BOINC install.  Are you getting BOINC from an Arch Linux repo or a PPA?
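The version question can be answered from the task's own stderr report. A minimal Python sketch, assuming the report text has been copied from the task page into a string; `summarize_task_log` is a hypothetical helper, and treating the first `[ERROR]` line as the most telling one is a rule of thumb, not something the app guarantees.

```python
import re


def summarize_task_log(report: str) -> dict:
    """Pull the BOINC client version and the [ERROR] lines out of a task's stderr report."""
    version = re.search(r"<core_client_version>([^<]+)</core_client_version>", report)
    errors = [line.strip() for line in report.splitlines() if "[ERROR]" in line]
    return {
        "client_version": version.group(1) if version else None,
        # Rule of thumb: the earliest error is often the cause, later ones the fallout.
        "first_error": errors[0] if errors else None,
        "all_errors": errors,
    }


sample = """<core_client_version>7.18.1</core_client_version>
[20:28:07][7710][ERROR] Couldn't create OpenCL command queue (error: -6)!
[20:28:07][7710][ERROR] Demodulation failed (error: 2013)!
"""

summary = summarize_task_log(sample)
print(summary["client_version"])  # 7.18.1
print(summary["first_error"])
```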

mikey


GWGeorge007 wrote:

mikey wrote:

You might want to read thru a few pages of this to see if any of it helps, Keith Myers is an EXCELLENT Linux person

Mikey, you may want to actually give him the link so that he CAN read through it.

And I concur, Keith is an EXCELLENT Linux person. 

ROFL!!

https://einsteinathome.org/content/em-searches-brp-raidiopulsar-and-fgrp-gamma-ray-pulsar

GWGeorge007


mikey wrote:

ROFL!!

https://einsteinathome.org/content/em-searches-brp-raidiopulsar-and-fgrp-gamma-ray-pulsar

Ataboy!!  It's about time... !!!

George


mikey


GWGeorge007 wrote:

Ataboy!!  It's about time... !!! 

LOL!!

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48036147057
RAC: 35294114


cecht was having similar issues getting his 5600XT crunching on a newer kernel and drivers.

He ended up reverting his system to older drivers and an older kernel to resolve the issue.

 

https://einsteinathome.org/content/troubleshooting-ubuntu-20-and-fresh-install-amd-drivers?page=4#comment-200833

 

You might also try the ROCm drivers instead of the amdgpu drivers.

_________________________________________________________________________

Przemysław Kowalczyk


Keith Myers wrote:

Hard to help troubleshoot when the host is hidden. [...]

Info disclosed.

Keith Myers wrote:

[...] Main one being are the correct OpenCL components installed.  Also, the driver version is usually tied specifically to a kernel version.  Different kernels don't work with different driver versions. [...]

To clarify: latest kernel, latest Mesa, and the opencl-amd package (opencl-mesa does not support my card yet; I learnt that the hard way...). I also tried installing opencl-amd-dev, but the error remains the same. Arch is a rolling-release distro, and the opencl-amd package, although outside the mainstream repositories, seems to be up to date with AMD's Debian packages, which are its source.

mountkidd wrote:

[...] Btw, stderr shows boinc 7.18.1, whereas the linked host shows boinc 7.20.2. There may be some inconsistencies in your boinc install.  Are you getting boinc from a Arch Linux repo or ppa?

That's because I first ran BOINC 7.18.1 after a rather long period of not using it, saw the error, and then updated the whole OS; now it's BOINC 7.20.2.

Thanks, I will have a look. Unfortunately it is for Ubuntu; fortunately, Arch uses Debian packages as a basis for its own.

Ian&Steve C. wrote:

cecht was having similar issues getting his 5600XT crunching on a newer kernel and drivers.

he ended up reverting his system to older drivers and older kernel to resolve the issue.

 

https://einsteinathome.org/content/troubleshooting-ubuntu-20-and-fresh-install-amd-drivers?page=4#comment-200833

 

you might also try the ROCm drivers instead of the amdgpu drivers.

Not sure the link is correct... No user named cecht there...

I'm not sure I have a choice of drivers. In the Arch world there is opencl-mesa (incompatible) or opencl-amd, which uses the open-source AMD driver and consists of two OpenCL implementations: ROCr and orca (legacy) (more info here). I'm not entirely sure, but the other options look like variations of those two. I'll try nevertheless.

While trying to find a solution I found this thread. It was marked as "solved" because of a "faulty card", but the thread as a whole contains some info about the transition from "PAL" to "ROCm". These terms are foreign to me for now, but maybe the thread will help someone else...

Ian&Steve C.


Przemysław Kowalczyk wrote:

Not sure the link is correct... No user named cecht there...

Not sure I have a choice with drivers. In the Arch world there is opencl-mesa (incompatible) or opencl-amd, which uses open-source AMD driver and consists of two OpenCL implementations: ROCr and orca (legacy) (more info here). Not entirely sure, but other options look like a variation of those two. I'll try nevertheless.

While trying to find a solution I found this thread. It has been marked as "solved" because "faulty card", but the whole thread contains some info about transition from "PAL" to "ROCm" - these terms are foreign to me for now, but maybe helpful for someone else...

The link is correct and takes you directly to his comment on his issue and its resolution.

 

I'm not familiar with using Arch, but a quick Google search landed me at this GitHub repository for ROCm on Arch. Maybe it's useful.

 

https://github.com/rocm-arch/rocm-arch
