found bug, can someone duplicate it or suggest a fix?

Joseph Stateson
Topic 220517

I assume the problem is the AMD OpenCL driver, because I can duplicate the bug only on a system with AMD boards; it does not occur on NVIDIA systems. I looked into it when I found I was occasionally running out of disk space for no good reason. I went to /var/log and looked at syslog. The first thing I noticed was the huge log sizes. I grepped for "error" and found millions of errors, all the same:
 
    Jan 23 23:07:17 jysdualxeon kernel: [  381.590086] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
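
For anyone who wants to check their own log, here is a minimal sketch of counting these messages. The regular expression is built from the single sample line above, so it is an assumption that other driver versions word the message the same way; `count_parser_errors` is just a name picked for this example.

```python
import re
from collections import Counter

# Pattern derived from the one sample line quoted in this post (assumption:
# the driver always uses this exact wording). The group captures the code.
PARSER_ERROR = re.compile(
    r"\[drm:amdgpu_cs_ioctl \[amdgpu\]\] \*ERROR\* "
    r"Failed to initialize parser (-?\d+)"
)

def count_parser_errors(lines):
    """Return a Counter mapping error code -> number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = PARSER_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

To use it on a real log, open the file (reading /var/log/syslog usually needs root or membership in the adm group) and pass the file object in: `count_parser_errors(open("/var/log/syslog", errors="replace"))`.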
 
A reboot stops the errors from being written. Other than everything being slow, the system seems to work OK, except for the board that was running Einstein: all subsequent tasks on it error out.
 
I can duplicate the error as follows: while a program is running an OpenCL app, I run the tool "clinfo". It may have to be run as many as 3 times before the bug kicks in. Normally clinfo takes 1-2 seconds to complete. With the Einstein app using OpenCL, the clinfo program hangs. It may take 2 minutes to complete and give a report. There are no problems in the report; it just takes too long to generate. Looking at syslog, I see it growing by a huge amount in only seconds:
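
The steps above could be scripted roughly as follows. This is only a sketch: the helper names (`time_runs`, `size_or_zero`, `reproduce`), the 300-second timeout, and the default log path are choices made for this example, and it assumes `clinfo` is on the PATH and an OpenCL app is already crunching on the GPU.

```python
import os
import subprocess
import time

def time_runs(cmd, runs=3, timeout=300):
    """Run cmd `runs` times; return a list of (seconds, returncode).

    returncode is None when a run exceeded `timeout` seconds.
    """
    results = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            code = proc.returncode
        except subprocess.TimeoutExpired:
            code = None
        results.append((time.monotonic() - start, code))
    return results

def size_or_zero(path):
    """Return the file size in bytes, or 0 if the file cannot be stat'ed."""
    try:
        return os.path.getsize(path)
    except OSError:
        return 0

def reproduce(cmd=("clinfo",), log="/var/log/syslog", runs=3):
    """Time repeated runs of cmd and report how much the log grew meanwhile."""
    before = size_or_zero(log)
    for i, (secs, code) in enumerate(time_runs(list(cmd), runs), 1):
        print(f"run {i}: {secs:.1f}s, exit code {code}")
    print(f"{log} grew by {size_or_zero(log) - before} bytes")
```

Calling `reproduce()` while the Einstein app is running should show whether a clinfo run takes minutes instead of seconds and whether syslog grows at the same time.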
 

    -rw-r----- 1 syslog adm 15171032 Jan 23 23:08 syslog
    -rw-r----- 1 syslog adm 22775421 Jan 23 23:13 syslog
    -rw-r----- 1 syslog adm 29911745 Jan 23 23:14 syslog
    -rw-r----- 1 syslog adm 30435769 Jan 23 23:14 syslog
    -rw-r----- 1 syslog adm 30711297 Jan 23 23:15 syslog
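
To put a number on that growth, here is a small sketch that works out the average rate from the first and last entries of the listing. The sizes and times are copied from the listing; since the timestamps only have minute resolution, the result is approximate. `growth_rate` is a name made up for this example.

```python
from datetime import datetime

# (size in bytes, HH:MM) pairs copied from the listing; all on Jan 23.
SAMPLES = [
    (15171032, "23:08"),
    (22775421, "23:13"),
    (29911745, "23:14"),
    (30435769, "23:14"),
    (30711297, "23:15"),
]

def growth_rate(samples):
    """Return (bytes_grown, minutes_elapsed, bytes_per_minute)
    between the first and last sample."""
    (first_size, first_t), (last_size, last_t) = samples[0], samples[-1]
    fmt = "%H:%M"
    elapsed = datetime.strptime(last_t, fmt) - datetime.strptime(first_t, fmt)
    minutes = elapsed.total_seconds() / 60
    grown = last_size - first_size
    return grown, minutes, grown / minutes

grown, minutes, rate = growth_rate(SAMPLES)
print(f"{grown} bytes in {minutes:.0f} min = {rate / 1e6:.2f} MB/min")
```

That works out to roughly 15.5 MB in 7 minutes, a bit over 2 MB per minute on average during this window.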
 

With my small 128 GB SSD, the system dies in a few days unless I delete the files. Also, the Einstein app reports a computation error at the same time the clinfo program hangs. All subsequent tasks for that board fail quickly, and a reboot is needed.
 
 
If you are running Linux, you can try duplicating this bug by executing clinfo in a terminal several times while an OpenCL app is running. This was Ubuntu 18.04 with three RX 570 boards.