GPU WU fails in two seconds. No detailed error.

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0
Topic 226757

Greetings!!

Ubuntu MATE released some new OpenCL (OCL) drivers, and the E@H WU now fails in two seconds.

Here are the newer drivers:

Start-Date: 2022-01-10  14:38:17
Requested-By: jay (1000)
Install: linux-modules-extra-5.13.0-25-generic:amd64 (5.13.0-25.26, automatic), linux-image-5.13.0-25-generic:amd64 (5.13.0-25.26, automatic), linux-headers-5.13.0-25-generic:amd64 (5.13.0-25.26, automatic), linux-modules-5.13.0-25-generic:amd64 (5.13.0-25.26, automatic), linux-headers-5.13.0-25:amd64 (5.13.0-25.26, automatic)
Upgrade: mesa-opencl-icd:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), libglx-mesa0:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), linux-headers-generic:amd64 (5.13.0.24.35, 5.13.0.25.36), mesa-common-dev:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), libgbm1:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), libxatracker2:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), linux-generic:amd64 (5.13.0.24.35, 5.13.0.25.36), mesa-va-drivers:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), libgl1-mesa-dri:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), mesa-vulkan-drivers:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), linux-image-generic:amd64 (5.13.0.24.35, 5.13.0.25.36), libglapi-mesa:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), libegl-mesa0:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), mesa-vdpau-drivers:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), linux-libc-dev:amd64 (5.13.0-24.24, 5.13.0-25.26)
End-Date: 2022-01-10  14:40:00
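A transaction like this is easier to read once the "Upgrade:" line is split into per-package version changes. The following is only a rough sketch: the function name `parse_upgrades` and the regular expression are illustrative and not part of apt, and it assumes each entry has the `name:arch (old, new)` shape seen in the log above.

```python
import re

# One "name:arch (old_version, new_version)" entry, as it appears in the log above.
UPGRADE_ENTRY = re.compile(r"([^\s,:]+):(\w+) \(([^,()]+), ([^,()]+)\)")


def parse_upgrades(upgrade_line: str) -> dict:
    """Return {package: (old_version, new_version)} for one 'Upgrade:' line."""
    # Drop the leading "Upgrade" label; everything after the first colon is entries.
    body = upgrade_line.split(":", 1)[1]
    return {m.group(1): (m.group(3), m.group(4)) for m in UPGRADE_ENTRY.finditer(body)}


# Shortened sample taken from the transaction above.
sample = (
    "Upgrade: mesa-opencl-icd:amd64 (21.2.2-1ubuntu1, 21.2.6-0ubuntu0.1), "
    "linux-generic:amd64 (5.13.0.24.35, 5.13.0.25.36)"
)

for package, (old, new) in parse_upgrades(sample).items():
    print(f"{package}: {old} -> {new}")
```

Filtering the result for package names that start with "mesa" would narrow it to the OpenCL-related changes, as opposed to the kernel update that arrived in the same transaction.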

 

Below is from the server log. I noticed that one output file (cohfu) does not have a path that matches the others. Is that significant?

Task 1215921806

Name: LATeah3011L06_668.0_0_0.0_26931933_0
Workunit ID: 599429744
Created: 11 Jan 2022 0:22:23 UTC
Sent: 11 Jan 2022 1:42:53 UTC
Report deadline: 25 Jan 2022 1:42:53 UTC
Received: 11 Jan 2022 1:44:32 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 11 (0x0000000B) Unknown error code
Computer: 12201025
Run time (sec): 2.33
CPU time (sec): 0.07
Peak working set size (MB): 0
Peak swap size (MB): 0
Peak disk usage (MB): 0.02
Validation state: Invalid
Granted credit: 0
Application: Gamma-ray pulsar binary search #1 on GPUs v1.18 (FGRPopencl1K-ati) x86_64-pc-linux-gnu

Stderr output

7.16.17
process exited with code 11 (0xb, -245)
20:43:04 (4158): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16
20:43:04 (4158): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
20:43:04 (4158): [debug]: 1e+16 fp, 1e+09 fp/s, 10500000 s, 2916h40m00s00
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3011L06.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 660.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3011L06_0668_26931933.dat --debug 0 --device 0 -o LATeah3011L06_668.0_0_0.0_26931933_0_0.out

output files:

'LATeah3011L06_668.0_0_0.0_26931933_0_0.out'
'../../projects/einstein.phys.uwm.edu/LATeah3011L06_668.0_0_0.0_26931933_0_0'
'LATeah3011L06_668.0_0_0.0_26931933_0_0.out.cohfu'
'../../projects/einstein.phys.uwm.edu/LATeah3011L06_668.0_0_0.0_26931933_0_1'
20:43:04 (4158): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
20:43:04 (4158): [debug]: glibc version/release: 2.34/stable
20:43:04 (4158): [debug]: Set up communication with graphics process.
-- signal handler called: signal 1
2 stack frames obtained for this thread:
Frame 2: Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48b101) Source file: hs_boinc_extras.c (Function: sighandler / Line: 291)
End of stcaktrace
20:43:04 (4158): called boinc_finish
Frame 1: Binary file: [ ((nil)) Offset info: nil '[': No such file
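The stderr output reports the exit status three ways: 11, 0xb, and -245. Here is a minimal sketch showing that these are the same 8-bit value (11 - 256 = -245). The function name `describe_exit_status` is illustrative. Mapping the value to a signal name is an assumption: it only applies if the application passed a signal number through as its exit status, which the log does not confirm.

```python
import signal


def describe_exit_status(status: int) -> dict:
    """Show one exit status as unsigned byte, hex, and the 'minus 256' form."""
    low_byte = status & 0xFF  # exit statuses are 8-bit values
    try:
        # ASSUMPTION: the status mirrors a signal number. Numbering is platform-specific.
        possible_signal = signal.Signals(low_byte).name
    except ValueError:
        possible_signal = None
    return {
        "low_byte": low_byte,
        "hex": hex(low_byte),
        "minus_256": low_byte - 256,
        "possible_signal": possible_signal,
    }


info = describe_exit_status(11)
print(info["low_byte"], info["hex"], info["minus_256"], info["possible_signal"])
```

On x86-64 Linux, signal number 11 is SIGSEGV (a segmentation fault), which would fit an application crashing right after start-up, but the log alone does not prove that reading.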

 

Any ideas?

My next step is to go back to the 2020 release.

THANKS!!, Jay

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48061682038
RAC: 34526921

Try the AMDGPU-Pro drivers or the ROCm drivers.

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48061682038
RAC: 34526921

The drivers I suggested are distribution-independent and can be compiled for any install, particularly Ubuntu flavors like the one you're using.

 

AMD provides an install script with their drivers, so they install with a single command. Make sure you use the --opencl=legacy argument for your legacy GPU.

 

The ROCm install is a little more involved, but they also provide good documentation. It's possible that your CPU isn't supported for the features ROCm needs to work on such an old GPU, though. The oldest platform that I got working with ROCm was a Polaris-based GPU (gfx8) with a supported CPU (PCIe gen 3 with PCIe atomics support).

 

AMD OpenCL support on Linux is a mess in general: very spotty and inconsistent, particularly with legacy hardware. People seem to have the most success with AMDGPU-Pro driver installs when they stay with the kernels the drivers are designed for and don't venture too far out of the box. Mesa has the worst OpenCL support of any of the options, which is why almost no one uses it.
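Since several OpenCL implementations (Mesa, AMDGPU-Pro, ROCm) can be installed side by side, it can help to see which ones are registered before and after switching drivers. The sketch below is not an official tool. It assumes the conventional Linux ICD loader directory /etc/OpenCL/vendors, where each installed implementation drops a small .icd file naming its library. The function name `list_opencl_icds` is made up for illustration.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir: str = "/etc/OpenCL/vendors") -> dict:
    """Map each *.icd file name to the library it names (first non-empty line)."""
    registered = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        # ASSUMPTION: no such directory means no ICDs registered in the usual place.
        return registered
    for icd_file in sorted(directory.glob("*.icd")):
        lines = [ln.strip() for ln in icd_file.read_text().splitlines() if ln.strip()]
        registered[icd_file.name] = lines[0] if lines else ""
    return registered
```

Calling `list_opencl_icds()` with no arguments and printing the result shows which implementations the loader would offer to an application. If only a Mesa entry appears, the project app is running on Mesa's OpenCL rather than on AMD's own driver stack.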

_________________________________________________________________________
