Radeon RX 7900 XTX Linux (Fedora) Boinc Doesn't Detect Usable Driver

mikey
Joined: 22 Jan 05
Posts: 12694
Credit: 1839100474
RAC: 3703

Paul wrote:

mikey wrote:

Have you tried loading up a copy of Ubuntu in a VM box to see if it works there? It might give you some ideas for libraries you aren't using right now.

I've been thinking about this over the course of the day. I thought of a couple of problems with this idea, but I also thought of solutions for all of them. The question is how to get out of it what you are saying I could get. Trying Ubuntu means I can use amdgpu-install. But it installs a bunch of packages, silently I think, so then I need to figure out how to get the apt log, which I assume I can do, but I forget how. Then I need to look up the 'provides' list and filter that for libs. I think that would help, a little. I guess I could compare that to the same list on my system. Seems like a lot of work, but I don't see anything wrong with that approach, in theory.

It could also be useful just to stress test it that way. At this point, I assume it's not a bad card on delivery, but I also cannot be sure it's not. I suppose this would be one way to test. Certainly a better test bed than, say, Windows, for me.

I think what I'm going to do is run down a couple other ideas I have, first.  But, I might return to this one.  Thanks.

I also have the "opportunity" to talk to the manufacturer's support. I do feel like they owe me a little help, considering the price I paid.

So, is anyone using the 7900XT/XTX on Linux, now, for OpenCL crunching? I would love to connect with someone who actually is doing that in this project.  I didn't see any on the top 50 machines list, which is where I expected to find them.

Now, if you want a quick and easy way to see whether the card is bad, load up a copy of Windows, download the right driver from amd dot com, and crunch away. Windows is easy peasy; the downside is that it's Windows. But at least you will know the card works, and then you can figure it out in Fedora too!!!

I am not a Linux guru so knowing where all the 'provides' and 'libs' and all the rest of that stuff is or how to get to it is beyond me.

mountkidd
Joined: 14 Jun 12
Posts: 176
Credit: 12603892555
RAC: 8030986

Paul wrote:

...

  So, trying Ubuntu means I can use the amdgpu-install. But, it installs a bunch of packages, silently, I think, so then I need to figure out how to get the apt log, which I assume I can do, but I forget how.  Then I need to look up the 'provides' list, and filter that for libs. 

...

You can also look through the amdgpu repo to see all the pieces that are used in each of the drivers for Ubuntu/SLE/RHEL.

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18754755972
RAC: 7141566

Just open up the apt history.log file in /var/log/apt directory to see what any package installation throws into the system.
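As a minimal sketch, assuming the usual history.log layout where each transaction has an `Install:` line of comma-separated `name:arch (version[, automatic])` entries, the packages an installer such as amdgpu-install pulled in can be listed and narrowed to libraries. The `lib` prefix filter is only a naming-convention heuristic, the function names are hypothetical, and rotated logs (history.log.1.gz and so on) are not read here.

```python
import re

# One "name:arch (version[, automatic])" entry on an apt history line.
ENTRY = re.compile(r"([^\s,:]+)(?::([\w-]+))? \(([^)]*)\)")


def installed_packages(log_text: str) -> list[tuple[str, str]]:
    """Return (package, version) pairs from every 'Install:' line."""
    packages = []
    for line in log_text.splitlines():
        if not line.startswith("Install: "):
            continue
        for name, _arch, details in ENTRY.findall(line[len("Install: "):]):
            # details is either "version" or "version, automatic"
            packages.append((name, details.split(",")[0].strip()))
    return packages


def library_packages(packages: list[tuple[str, str]]) -> list[str]:
    """Heuristic: Debian library packages are conventionally named lib*."""
    return sorted({name for name, _version in packages if name.startswith("lib")})
```

Usage would be something like `library_packages(installed_packages(open("/var/log/apt/history.log").read()))`.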

 

Paul
Joined: 3 May 07
Posts: 123
Credit: 1785714253
RAC: 260186

All good stuff, thank you.  Keep it coming with these ideas as they occur, please.

@Mikey: yes, true...except, at the present time, I'm hearing even worse things about the AMD driver on Windows. Did you all not hear about that? I don't really listen to that stuff carefully ('cause, Linux), but I try to keep my ear to the ground. But, yes yes, that would be very strong evidence. Since I think Ubuntu, if it works, would give me additional information, I think I'll go that route if I decide to install an alternate OS. On the other hand, Windows would give me a chance to modify the on-card fan curve, which I might need to do...hmm, second thoughts.

@MountKidd: I'm looking at the ROCm repos, yes. The problem is there are more than 100 pkgs, and there is no list I can go to that will tell me what is needed for what purpose, or, what I really want, an explanation of all the pieces so I can understand what the **** is going on! So frustrating to be in the dark all the time. It takes so much time and research to learn these critical things about the software stack. It's usually not too hard to figure out just from pkg names, descriptions, and dependencies. But in situations like this, we see the limitation of trusting those alone. As best I can tell, I have *all the pieces* required. They just aren't working well for the 7900. Boot is fine, looks great, seems fast, OpenGL seems fine, but I didn't test my game other than to start it up. Under OpenCL load, it wouldn't stay up for more than an hour. But it also didn't crash immediately when I started crunching. <shrugs> Symptoms point to a bug in the code or the wrong version of the code, not completely missing pieces. It only becomes a missing-pieces issue when I'm trying to find *better* pieces, because of dependencies, only some of which are enforced by the pkgs; not ALL dependencies are in the pkgs, because I'm getting some from my distribution and some from third parties. See the complication?

I know *I'm* the cause of some of these complications.  So, apologies to everyone.  Switching to Ubuntu is just not an option, like, tomorrow.  But...

@Keith Myers: Okay, good tip. Thanks. Since pkg names are NOT the same between RPM and .deb, this is not as helpful as it might seem, but yes, they should be similar. When I ran Ubuntu on this system for about 2.5 years, I remember thinking they were different enough that I had to learn a new naming scheme; I couldn't just apply what I had learned managing pkgs on RHEL. But thank you, I didn't remember /var/log/apt/history.log.
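Package names differ between .deb and RPM, but shared-library sonames (libOpenCL.so.1 and the like) usually do not, so comparing sonames may sidestep the naming problem. A minimal sketch, assuming you save `ldconfig -p` output from the Ubuntu test system and from the Fedora box as text: it lists sonames present in the reference listing but missing locally, optionally filtered by a keyword. The function names are hypothetical.

```python
def sonames(ldconfig_output: str) -> set[str]:
    """Collect sonames from `ldconfig -p` lines like 'libfoo.so.1 (libc6,x86-64) => /path'."""
    names = set()
    for line in ldconfig_output.splitlines():
        line = line.strip()
        if " => " in line:
            names.add(line.split(" ", 1)[0])
    return names


def missing_libraries(reference: str, local: str, keyword: str = "") -> list[str]:
    """Sonames in the reference listing that the local listing lacks (case-insensitive filter)."""
    missing = sonames(reference) - sonames(local)
    return sorted(name for name in missing if keyword.lower() in name.lower())
```

For example, `missing_libraries(ubuntu_text, fedora_text, keyword="amd")`, where the two strings hold the saved listings. A soname that exists on both systems can still come from different builds, so an empty result does not rule out a version mismatch.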

Some potentially good news, though. Check out this line from the updates report I run on installed pkgs every Friday:

Description: Update to ROCm 5.4.1
           :
           : Notice: GFX 11 hardware might not work yet. <---!!!!!!
   Severity: None

So! The version of ROCm I used that worked best, but not well, says right on it that gfx11xx is not quite here yet. Now, that's the *old* pkg description, for the pkg I tried. But the update report is telling me there is an update available for that pkg. The change log for that new pkg doesn't say "fixed gfx11", but it's worth a try, and it wasn't available when I started this thread, so, hopeful!!! (I think gfx11 is the 7900. I can't get official AMD docs to say this, but I see RDNA3 and gfx11 being used interchangeably in some Phoronix articles, so I think this is right. I also think I remember seeing 'gfx1100' in clinfo/rocminfo output when I had it installed.)

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18754755972
RAC: 7141566

You can still control fan curves and clocks on AMD cards with Ricks-Lab gpu-utils utilities.

Ricks-Lab gpu-utils

Best thing for AMD cards; as good as, or almost as good as, any Windows AMD utility.

Works for displaying graphs and stats for Nvidia cards too.  Just no control over them.

 

mountkidd
Joined: 14 Jun 12
Posts: 176
Credit: 12603892555
RAC: 8030986

Paul wrote:

The change log for that new pkg doesn't say "fixed gfx11", but it's worth a try, and it wasn't available when I started this thread, so, hopeful!!! (I think gfx11 is the 7900. I can't get official AMD docs to say this, but I see RDNA3 and gfx11 being used interchangeably in some Phoronix articles, so I think this is right. I also think I remember seeing 'gfx1100' in clinfo/rocminfo output when I had it installed.)

Your stderr logs show gfx1100 for the 7900…
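A minimal sketch for pulling gfx target names out of BOINC stderr, clinfo or rocminfo text, assuming the lowercase `gfxNNNN` form those tools print. The prefix-to-generation table is an assumption that covers only the three generations mentioned in this thread (5700XT, 6800XT, 7900): gfx101x as RDNA1, gfx103x as RDNA2, gfx11xx as RDNA3.

```python
import re

# Lowercase gfx targets such as gfx1010, gfx1030, gfx1100 or gfx90a.
GFX_TARGET = re.compile(r"\bgfx[0-9a-f]+\b")

# Assumed mapping, limited to the generations discussed here; longest prefixes first.
FAMILIES = (("gfx103", "RDNA2"), ("gfx101", "RDNA1"), ("gfx11", "RDNA3"))


def gfx_targets(text: str) -> list[str]:
    """Distinct gfx target names found anywhere in the text, sorted."""
    return sorted(set(GFX_TARGET.findall(text)))


def family(target: str) -> str:
    """Map a gfx target to its RDNA generation, or 'unknown' if not in the table."""
    for prefix, name in FAMILIES:
        if target.startswith(prefix):
            return name
    return "unknown"
```

For example, running `gfx_targets` over a saved stderr log and passing each result to `family` should report RDNA3 for a gfx1100 device.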

Paul
Joined: 3 May 07
Posts: 123
Credit: 1785714253
RAC: 260186

Thanks for bringing me back to gpu-utils, Keith. I have that installed, but haven't really played with it. I find gpu-pac doesn't work the way I expected, and I just haven't bothered to figure it out. But it seems very capable and I should use it. Yes. This is a great idea. Thanks for pointing it out.

Now, let's hope it works with the 7900. radeon-profile works fine for the others, but not the 7900, so maybe gpu-utils will have the same problem, but I should try it. In the past, there was just a file under /sys you could interact with to do this. It seems silly to even have to rely on these utilities. But the fact that the one I was using didn't work for the 7900 makes me worried they changed something for the newest cards.

Also, I may have misunderstood what I saw in the update report I mentioned this morning.  I likely already tried that version of the package I thought was just released.  I think I actually downgraded a few pkgs to get to what I have now.  Anyway, when I get some time to play with it, I will try these things you've been suggesting.  More soon.

 

Paul
Joined: 3 May 07
Posts: 123
Credit: 1785714253
RAC: 260186

Hmm, well, the Fedora community has "solved" it.

https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/Prerequisites.html

So, ROCm just doesn't support gfx11...at least not yet. Apparently, it took a long time to support gfx10. I don't remember; I didn't get my 6800 until later, so I had no reason to suspect anything. I always expect some delay, but this seems like more than just a Linux thing. Isn't ROCm how it works on Windows too? So you can't crunch on Windows with gfx11 yet, either?

Maybe I should just return the card?

...research...

Okay, so I'm still less confident than I was last week, but check out this quote from Phoronix:

source: https://www.phoronix.com/news/Radeon-ROCm-5.4.1 (15 Dec 2022 <-- !!)

Quote:

Here's to hoping that ROCm on the Radeon RX 7900 series will be in good shape in full soon and not having to wait for any Radeon Pro hardware based on RDNA3. But at least we know already HIP seems to be in good shape with the Blender 3.4 support in order, OpenCL is working, and I'll be working to evaluate other ROCm components in the coming days to see how it goes with the Radeon RX 7900 series.

Also, AMD just announced Pro hardware offerings for Q3.  Plus people are saying ROCm 5.5 (latest is 5.4.3) has hints that it will support gfx11 and will be out "long before" Q3.

So, in sum, I think this is just totally normal for Linux, and it is NOT at all clear that OpenCL shouldn't work right now. It seems that the consumer cards are not really intended to be officially supported, ever; the 6800XT received "official" support long after its release, so that isn't really the goal. There are many, many pieces at play, and support is highly application-specific, so all the pieces required for our application may already be there. Eternal optimism, I suppose. Keep the faith.

More soon.

mikey
Joined: 22 Jan 05
Posts: 12694
Credit: 1839100474
RAC: 3703

Paul wrote:

Hmm, well, the Fedora community has "solved" it.

https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/Prerequisites.html

 Isn't ROCm how it works on Windows too?  So, you can't crunch on Windows with gfx11, yet, either?

No, it isn't, unless it's built into AMD's drivers. For Windows, and some Linux distros, you just go to this page and download the file; on Windows it just installs, you reboot the PC, and it should show up in BOINC. What those install files install is hidden, because of course it's Windows, so I don't know if ROCm is even a part of it at all.

https://www.amd.com/en/support/graphics/amd-radeon-rx-7000-series/amd-radeon-rx-7900-series/amd-radeon-rx-7900xtx

Paul
Joined: 3 May 07
Posts: 123
Credit: 1785714253
RAC: 260186

Success!

And there really wasn't anything wrong to begin with.  So sorry, everyone.  Thank you for all your excellent advice, as usual. I learned a lot:

1) I was right, there isn't anything missing in Fedora (for ROCm OpenCL).  OpenCL works "out of the box".  None of the AMD pkgs from the "amdgpu-install" installer are required.  I knew this from my long history with AMD and my experience with recent cards 5700XT & 6800XT, but second guessed myself when I had a bunch of crashes.  After trying again, last night, it worked exactly as expected.  I must have messed it up the first time.

2) For the record, it also appears from my earlier quote that *this has worked since (nearly?) the original release date*. That's great news. I have the impression it hasn't always been this way in the past, but I cannot be sure; I usually don't buy new hardware this soon after release.

3) *Other* ROCm-enabled features are different and are still not yet available. This can be confusing when researching. When you read "ROCm still not working for X card", it refers to the full ROCm stack, including BLAS, PRIM (primitives), FFT, and other library integrations.

4) When you read "X architecture is now officially supported on ROCm", it means the professional cards, not the consumer cards.  Now, the consumer cards on the same architecture likely also work, but the consumer cards basically *are never officially supported*.  It's just not a promise that AMD is making.  Consumer cards usually work, too, but it's just a side effect of getting support working for their workstation/datacenter products.

5) 3+4 mean that you might not need to wait for "officially supported". When the stack you want to use will be completed and pushed out depends on your application, and that will be *before* the professional cards are available for purchase.
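To confirm that the distribution's OpenCL runtime is registered, as in point 1, here is a minimal sketch assuming the Khronos ICD loader convention on Linux, where each vendor driver installs a `*.icd` file naming its library into /etc/OpenCL/vendors. `clinfo` remains the more thorough check; this only shows which drivers the loader would try to load, and `list_icds` is a hypothetical helper name.

```python
from pathlib import Path


def list_icds(vendor_dir: str = "/etc/OpenCL/vendors") -> dict[str, str]:
    """Map each *.icd file name to the driver library it names; empty if the directory is absent."""
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return {}
    return {icd.name: icd.read_text().strip() for icd in sorted(directory.glob("*.icd"))}


if __name__ == "__main__":
    for name, library in list_icds().items():
        print(f"{name}: {library}")
```

An AMD entry showing up here, with no AMD packages from amdgpu-install on the system, would be consistent with OpenCL working "out of the box".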

 

THE REAL PROBLEM IS:

6) gfx11 fan control is broken.  Now, this is extra weird, I think, because it doesn't have anything to do with ROCm or software features.  I'm not sure, but I'm assuming this is controlled by the amdgpu kernel module.  Does anyone know how this works?

Thanks, Keith, for the pointer to Rick's gpu-utils, but even that doesn't work for fans. *Everyone* is complaining that fans are broken; all the fan controller tools I've seen are broken. But the strange part is that the files in /sys are there; they just don't work.

This is wild. I get that implementing, say, OpenCL and BLAS on these cards has to be a separate effort on the OSS side from the one on the proprietary, pro side. But why weren't the fan control bits contributed by AMD to amdgpu on day one? Like, it works on Windows. Fan control is not a secret. So why isn't this just something AMD contributes? I'm so confused. This is the last thing I thought would be a problem 6 months after launch.
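To see what the fan files in /sys actually report, here is a minimal read-only sketch. It assumes the standard hwmon attributes amdgpu exposes under /sys/class/drm/cardN/device/hwmon/hwmonM (pwm1_enable, pwm1, pwm1_max, fan1_input, fan1_target). It writes nothing. A missing attribute comes back as None, while one that exists but fails to read comes back as an "error: ..." string, which separates "not there" from "there but not working". Which attributes are actually functional on gfx11 depends on the kernel version, and this sketch does not settle that. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

# Standard hwmon attribute names; not every card or kernel implements all of them.
# Generic hwmon convention for pwm1_enable: 0 = no control (full speed),
# 1 = manual (pwm1 is honored), 2 = automatic.
FAN_ATTRS = ("pwm1_enable", "pwm1", "pwm1_max", "fan1_input", "fan1_target")


def read_attr(path: Path) -> Optional[str]:
    """Return the attribute's text, None if absent, or an error marker if unreadable."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None
    except OSError as exc:
        # The file exists, but the read failed (e.g. the driver rejects the operation).
        return f"error: {exc.strerror or exc}"


def fan_status(drm_root: str = "/sys/class/drm") -> Dict[str, Dict[str, Optional[str]]]:
    """Read fan attributes for every DRM card's hwmon directory (read-only)."""
    status: Dict[str, Dict[str, Optional[str]]] = {}
    for card in sorted(Path(drm_root).glob("card*")):
        # Skip connector entries such as card0-DP-1; keep only card0, card1, ...
        if not card.name[len("card"):].isdigit():
            continue
        for hwmon in sorted(card.glob("device/hwmon/hwmon*")):
            status[f"{card.name}/{hwmon.name}"] = {
                attr: read_attr(hwmon / attr) for attr in FAN_ATTRS
            }
    return status


if __name__ == "__main__":
    for device, attrs in fan_status().items():
        print(device, attrs)
```

Running it once with the 6800XT and once with the 7900 installed, and comparing the output, might show which attributes the newer card exposes but refuses.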

 
