next problem fft array index error

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0
Topic 227108

All was working well earlier this month.

and now  Error in computing index of fft input array....

----

Running Linux:
   Release 20.04.4 LTS (Focal Fossa) 64-bit
   Kernel Linux 5.13.0-30-generic x86_64
   MATE 1.24.0

boinc
 Package boinc:           i   7.16.6+dfsg-1

boinc-client
   Package boinc-client:  i A 7.16.6+dfsg-1

 

boinc environment....
Mon 28 Feb 2022 04:11:03 PM EST |  | Starting BOINC client version 7.16.6 for x86_64-pc-linux-gnu
Mon 28 Feb 2022 04:11:03 PM EST |  | log flags: file_xfer, sched_ops, task
Mon 28 Feb 2022 04:11:03 PM EST |  | Libraries: libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Mon 28 Feb 2022 04:11:03 PM EST |  | Data directory: /var/lib/boinc-client
Mon 28 Feb 2022 04:11:03 PM EST |  | OpenCL: AMD/ATI GPU 0: AMD VERDE (DRM 2.50.0, 5.13.0-30-generic, LLVM 12.0.0) (driver version 21.2.6, device version OpenCL 1.1 Mesa 21.2.6, 2048MB, 2048MB available, 512 GFLOPS peak)
Mon 28 Feb 2022 04:11:03 PM EST |  | libc: Ubuntu GLIBC 2.31-0ubuntu9.2 version 2.31
Mon 28 Feb 2022 04:11:03 PM EST |  | Host name: pc-14
Mon 28 Feb 2022 04:11:03 PM EST |  | Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2]
Mon 28 Feb 2022 04:11:03 PM EST |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Mon 28 Feb 2022 04:11:03 PM EST |  | OS: Linux Ubuntu: Ubuntu 20.04.4 LTS [5.13.0-30-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.2)]
Mon 28 Feb 2022 04:11:03 PM EST |  | Memory: 11.53 GB physical, 9.31 GB virtual
Mon 28 Feb 2022 04:11:03 PM EST |  | Disk: 91.17 GB total, 82.46 GB free
Mon 28 Feb 2022 04:11:03 PM EST |  | Local time is UTC -5 hours
Mon 28 Feb 2022 04:11:03 PM EST |  | Config: GUI RPCs allowed from:


https://einsteinathome.org/task/1238348142

Name:

LATeah3012L02_852.0_0_0.0_22696641_1
Workunit ID: 610756068
Created: 28 Feb 2022 19:04:01 UTC
Sent: 28 Feb 2022 21:12:09 UTC
Report deadline: 14 Mar 2022 21:12:09 UTC
Received: 28 Feb 2022 21:14:12 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 65 (0x00000041) Unknown error code
Computer: 12201025
Run time (sec): 25.60
CPU time (sec): 12.47
Peak working set size (MB): 132.19
Peak swap size (MB): 1392.74
Peak disk usage (MB): 0.02
Validation state: Invalid
Granted credit: 0
Application: Gamma-ray pulsar binary search #1 on GPUs v1.18 (FGRPopencl1K-ati) x86_64-pc-linux-gnu

 and

Stderr output

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 65 (0x41, -191)</message>
<stderr_txt>
16:12:19 (6512): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16

16:12:19 (6512): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
16:12:19 (6512): [debug]: 1e+16 fp, 1e+09 fp/s, 10500000 s, 2916h40m00s00
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3012L02.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 844.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3012L02_0852_22696641.dat --debug 0 --device 0 -o LATeah3012L02_852.0_0_0.0_22696641_1_0.out
output files: 'LATeah3012L02_852.0_0_0.0_22696641_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah3012L02_852.0_0_0.0_22696641_1_0' 'LATeah3012L02_852.0_0_0.0_22696641_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah3012L02_852.0_0_0.0_22696641_1_1'
16:12:19 (6512): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
16:12:19 (6512): [debug]: glibc version/release: 2.31/stable
16:12:19 (6512): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x1941d88 , 0x7ffba6c22ae0]
Using OpenCL platform provided by: Mesa
Using OpenCL device "AMD VERDE (DRM 2.50.0, 5.13.0-30-generic, LLVM 12.0.0)" by: AMD
Max allocation limit: 1503238553Global mem size: 2147483648
OpenCL device has FP64 support
read_checkpoint(): Couldn't open file 'LATeah3012L02_852.0_0_0.0_22696641_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880


Error in computing index of fft input array, i:-12829897 pair:33278
ERROR: prepare_ts_2_phase_diff_sorted() returned with error 15
16:12:32 (6512): [CRITICAL]: ERROR: MAIN() returned with error '1'

FPU status flags:
mv: cannot stat 'LATeah3012L02_852.0_0_0.0_22696641_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L02_852.0_0_0.0_22696641_1_0.out': No such file or directory
mv: cannot stat 'LATeah3012L02_852.0_0_0.0_22696641_1_0.out': No such file or directory

So:

This may not be E@H related.

Question 1: Has anyone else seen this with a Radeon/ATI GPU?

Question 2: How, and to whom among the package developers, should I report the error?

Thanks in advance, Jay

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118344572357
RAC: 25515864

jay wrote:

... Error in computing index of fft input array....

Question 1: Has anyone else seen this with a Radeon/ATI GPU?

Yes, I've seen that message quite a few times over the years.

In my experience, it's a symptom of old/failing hardware.  I find that if I fix/replace the hardware, the problem goes away.  This is easy for me since I have lots of machines and lots of spare bits to swap in order to identify exactly what is causing the issue.

jay wrote:
Question 2: How, and to whom among the package developers, should I report the error?

It seems unlikely to be a software issue.  The particular app you're running has been used for many years and hasn't changed lately so with no other similar reports, it's not likely to be the Einstein app.

If you've been running successfully until recently and this has suddenly shown up, it's extremely likely to be your hardware rather than anything else, unfortunately.  You say that things were running well and you don't mention any software update immediately prior to the problem, so hardware is most likely what you need to investigate.

Cheers,
Gary.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4045
Credit: 48050639431
RAC: 34878192

What happens if you use the AMDGPU-pro drivers instead of Mesa? 

_________________________________________________________________________

jay

@ Gary,

Thanks for your input.

I went into my log archives and found:

Start-Date: 2022-02-10  15:07:29
Requested-By: jay (1000)
Install: libclc-12:amd64 (1:12.0.0-3ubuntu1~20.04.4, automatic), libclang-common-12-dev:amd64 (1:12.0.0-3ubuntu1~20.04.4, automatic), libclc-12-dev:amd64 (1:12.0.0-3ubuntu1~20.04.4, automatic)
Upgrade: mesa-opencl-icd:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), libdrm-nouveau2:amd64 (2.4.105-3~20.04.2, 2.4.107-8ubuntu1~20.04.1), libegl-mesa0:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), libglapi-mesa:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), ubuntu-advantage-tools:amd64 (27.5~20.04.1, 27.6~20.04.1), libxatracker2:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), libgbm1:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), libdrm-amdgpu1:amd64 (2.4.105-3~20.04.2, 2.4.107-8ubuntu1~20.04.1), libspeex1:amd64 (1.2~rc1.2-1.1ubuntu1, 1.2~rc1.2-1.1ubuntu1.20.04.1), libdrm2:amd64 (2.4.105-3~20.04.2, 2.4.107-8ubuntu1~20.04.1), libgl1-mesa-dri:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), libspeexdsp1:amd64 (1.2~rc1.2-1.1ubuntu1, 1.2~rc1.2-1.1ubuntu1.20.04.1), libdrm-intel1:amd64 (2.4.105-3~20.04.2, 2.4.107-8ubuntu1~20.04.1), libdrm-radeon1:amd64 (2.4.105-3~20.04.2, 2.4.107-8ubuntu1~20.04.1), mesa-vdpau-drivers:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), mesa-vulkan-drivers:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), mesa-va-drivers:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), libglx-mesa0:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), libdrm-common:amd64 (2.4.105-3~20.04.2, 2.4.107-8ubuntu1~20.04.1)
Remove: libclc-amdgcn:amd64 (0.2.0+git20190827-7~ubuntu0.20.04.1), libclang-common-11-dev:amd64 (1:11.0.0-2~ubuntu20.04.1), libclc-dev:amd64 (0.2.0+git20190827-7~ubuntu0.20.04.1), libclc-r600:amd64 (0.2.0+git20190827-7~ubuntu0.20.04.1), libllvm11:amd64 (1:11.0.0-2~ubuntu20.04.1)
End-Date: 2022-02-10  15:08:03
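
For anyone wanting to pick individual packages out of an apt history entry like this one, here is a minimal Python sketch. It is not an official tool: the function name parse_upgrades and the shortened sample string are illustrative assumptions, and only the "name:arch (old, new)" layout is taken from the log entry above.

```python
import re

# Matches one "name:arch (old_version, new_version)" entry from an
# "Upgrade:" line of an apt history log.
ENTRY = re.compile(r"([^\s:,]+):(\w+) \(([^,()]+), ([^,()]+)\)")


def parse_upgrades(line):
    """Return {package: (old_version, new_version)} for one Upgrade: line."""
    if not line.startswith("Upgrade:"):
        return {}
    body = line[len("Upgrade:"):]
    return {m.group(1): (m.group(3), m.group(4)) for m in ENTRY.finditer(body)}


# Shortened sample; both entries are copied from the log entry above.
sample = (
    "Upgrade: mesa-opencl-icd:amd64 (21.0.3-0ubuntu0.3~20.04.5, 21.2.6-0ubuntu0.1~20.04.1), "
    "libdrm-radeon1:amd64 (2.4.105-3~20.04.2, 2.4.107-8ubuntu1~20.04.1)"
)

upgrades = parse_upgrades(sample)
old, new = upgrades["mesa-opencl-icd"]
print(f"mesa-opencl-icd: {old} -> {new}")
```

Filtering the result for package names starting with "mesa" or "libdrm" would narrow the list to the graphics stack.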

---

The last successful task was: LATeah3012L00_876.0_0_0.0_29395278_0

Sent: 5 Feb 2022 23:54:33 UTC;   Reported: 6 Feb 2022 4:00:38 UTC

Another task, still pending, was reported: 6 Feb 2022 20:52:57 UTC

It took me about a week before I screwed up my courage to work out the mangled address.

So the time fits for a bad upgrade.
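
The timing argument can be checked mechanically. This is a minimal Python sketch using only timestamps quoted in this thread. It assumes the apt log timestamp is local time at UTC-5 (the BOINC startup log reports "Local time is UTC -5 hours"), and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: the apt history log records local time, which is UTC-5 here.
LOCAL = timezone(timedelta(hours=-5))

# Timestamps quoted in this thread.
last_good_report = datetime(2022, 2, 6, 20, 52, 57, tzinfo=UTC)  # last task reported without error
mesa_upgrade = datetime(2022, 2, 10, 15, 7, 29, tzinfo=LOCAL)    # apt Start-Date of the Mesa upgrade
failed_task_sent = datetime(2022, 2, 28, 21, 12, 9, tzinfo=UTC)  # failing task sent

# The upgrade is a plausible cause only if it happened after the last
# good result and before the failing task was sent.
upgrade_in_window = last_good_report < mesa_upgrade < failed_task_sent
days_after_last_good = (mesa_upgrade - last_good_report).days

print("Upgrade between last good task and failure:", upgrade_in_window)
print("Whole days from last good report to upgrade:", days_after_last_good)
```

This only shows that the ordering is consistent with a bad upgrade; it does not rule out other causes.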

I will see if I can roll back the upgrade to the previous version and re-test.

Currently the available versions are:

Package mesa-opencl-icd:                              
p   20.0.4-2ubuntu1               
i (installed)   21.2.6-0ubuntu0.1~20.04.1

_______________________________________

@ Ian and Steve:

I do not have an amdCPU gpu. Those drivers are not precompiled and included in my distribution. I am not running windows.

:-)

jay

 

Gary Roberts

jay wrote:
... I went into my log archives and found: ...

OK, your last successful task was almost a month ago.  I misunderstood your previous message when you said, "and now  Error in computing index of fft input array...." which I thought meant that this had just suddenly happened when all had been working correctly.

With the situation clarified, I agree that you probably don't have a working set of OpenCL libs that the gamma-ray pulsar app can use.  Ian&Steve C. has given you the best advice - ditch the Mesa OpenCL stuff and install the OpenCL components from a compatible version of AMDGPU-PRO.  You can get this straight from AMD and it will come with an installation script for your supported OS.  The main thing seems to be to make sure you properly understand the various installation options for just installing OpenCL.

You are incorrect to assert that you don't have a compatible GPU.  Just go check on the Einstein website to see how BOINC describes it.  It is a Southern Islands series (Cape Verde) which is GCN 1st generation and it will work with the amdgpu video driver when you install the OpenCL components from AMDGPU-PRO.  You just need to be careful to pick a version of AMDGPU-PRO that is compatible with the version of your OS and use the appropriate switches when invoking the install script.

There are full installation instructions somewhere on the AMD website and there would be dozens of previous messages from Ubuntu users here on the Einstein website (just do a search) who have described what they had to do to get OpenCL successfully installed.  I have never used Ubuntu.  I tried it once (probably more than 10 years ago) and was so horrified at how painful the installation and setup experience was compared to what I was used to that I've never been back to see if anything has changed.

I have many GCN 1st gen cards all using the amdgpu video driver with the OpenCL libs from AMDGPU-PRO (Red Hat version) installed on top.  They all work just fine.  With Southern Islands cards (which yours is) there are two extra boot parameters I use to make sure the OS uses the amdgpu kernel driver rather than the radeon (ati) driver.  Those options are 'radeon.si_support=0' and 'amdgpu.si_support=1'.  I have no idea if this is necessary or not with Ubuntu.
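
On a running Linux system the active boot parameters can be read from /proc/cmdline. The following is a minimal Python sketch for checking the two options mentioned above. The helper names and the sample command line are illustrative assumptions, and whether these options are needed on Ubuntu is not established in this thread.

```python
def si_driver_params(cmdline):
    """Return the si_support settings found in a kernel command line string."""
    wanted = ("radeon.si_support", "amdgpu.si_support")
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            found[key] = value
    return found


def uses_amdgpu_for_si(cmdline):
    """True if the command line hands Southern Islands cards to amdgpu."""
    params = si_driver_params(cmdline)
    return (params.get("radeon.si_support") == "0"
            and params.get("amdgpu.si_support") == "1")


# Hypothetical command line for illustration. On a real machine, read it with:
#   with open("/proc/cmdline") as f:
#       cmdline = f.read()
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet radeon.si_support=0 amdgpu.si_support=1"

print(si_driver_params(sample))
print(uses_amdgpu_for_si(sample))
```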

I notice that your two completed tasks from early Feb took about 5.5K secs.  This host of mine has a Cape Verde GPU (a HD 7750) and it's doing tasks in about 3.2K secs.  It's a HP tower machine which was donated to me so I fired it up, deleted windows and installed my standard PCLinuxOS remaster.  Unfortunately, I didn't really want to run the old GPU it came with but there is only enough space in the case for a single width card and the better ones I have are all double width.  So I just ran with the original card and the thing has been working fine right from the get go.  Currently, it has a RAC of around 93K.

Cheers,
Gary.

Ian&Steve C.

jay wrote:

I do not have an amdCPU gpu.


It doesn’t matter whether you have an AMD APU or not; the AMDGPU-PRO drivers are for any AMD device. You have an AMD GPU and they will work. 99% of AMD GPU users are using them.

 

jay wrote:

Those drivers are not precompiled and included in my distribution.

You mean the precompiled drivers in your distribution that don’t work, require extensive hackery to work around, and then break again after every update?

 

jay wrote:

I am not running windows.

This also doesn’t matter. AMDGPU-PRO drivers are available for both Windows and Linux.

You should really just use the AMD drivers and stop trying to use broken drivers.

_________________________________________________________________________

jay

The goal: to share info on how to 'fix' this.

It was an error in the Mesa builds for Southern Islands Radeon/ATI GPUs.

There is a Debian/Ubuntu PPA that should fix it, I hope. I am testing it now (7 minutes and no errors; 4 hours to go).

Google "PPA Kisak" to follow the breadcrumbs. :-)

Jay

jay

OK, good driver working.

See https://einsteinathome.org/workunit/611986362

@ Ian & Steve:

NO!! I mean the AMDGPU-PRO drivers have never worked for me.

I do not appreciate your comments!

Jay

 
