Tasks erroring out on Nvidia P620 2GB

SJC_Steve
Joined: 20 Jul 11
Posts: 28
Credit: 550564797
RAC: 854036
Topic 224775

I'm using an Nvidia Quadro P620 card, and most of its results are errors (195 of 312), with 76 valid and 41 pending.

Here is the stderr output from one of the failed tasks. Is this a problem with my PC, or is the server sending tasks that are inappropriate for this card?

Thanks,

Steve

 

TASK 1069913851

Name: h1_0532.10_O2C02Cl4In0__O2MDFS3_Spotlight_532.60Hz_351_2
Workunit ID: 521692585
Created: 9 Feb 2021 2:39:26 UTC
Sent: 9 Feb 2021 2:45:07 UTC
Report deadline: 16 Feb 2021 2:45:07 UTC
Received: 9 Feb 2021 4:00:55 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 38 (0x00000026) Unknown error code
Computer: 12603057
Run time (sec): 134.00
CPU time (sec): 130.27
Peak working set size (MB): 386.93
Peak swap size (MB): 11065.57
Peak disk usage (MB): 0.03
Validation state: Invalid
Granted credit: 0
Application: Gravitational Wave search O2 Multi-Directional GPU v2.09 (GW-opencl-nvidia) x86_64-pc-linux-gnu


Stderr output

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 38 (0x26, -218)</message>
<stderr_txt>
putenv 'LAL_DEBUG_LEVEL=3'
2021-02-08 20:01:31.4584 (17703) [normal]: This program is published under the GNU General Public License, version 2
2021-02-08 20:01:31.4584 (17703) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2021-02-08 20:01:31.4584 (17703) [normal]: This Einstein@home App was built at: Jul 29 2020 12:47:10

2021-02-08 20:01:31.4585 (17703) [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/einstein_O2MDF_2.09_x86_64-pc-linux-gnu__GW-opencl-nvidia'.
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2021-02-08 20:01:34.9786 (17703) [debug]: BSGL output files
2021-02-08 20:01:34.9789 (17703) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2021-02-08 20:01:34.9789 (17703) [debug]: glibc version/release: 2.31/stable
2021-02-08 20:01:34.978988 - mytime()
2021-02-08 20:01:34.9790 (17703) [debug]: Set up communication with graphics process.
2021-02-08 20:01:34.9807 (17703) [normal]: Parsed user input successfully

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.21.0.1 (CLEAN 2d5112416ed80b559c941e5fa76095b3fd4e61a8)
%% LALPulsar: 1.18.2.1 (CLEAN 2d5112416ed80b559c941e5fa76095b3fd4e61a8)
%% LALApps: 6.25.1.1 (CLEAN 2d5112416ed80b559c941e5fa76095b3fd4e61a8)

2021-02-08 20:01:35.1449 (17703) [normal]: Reading input data ...
2021-02-08 20:01:35.1449 (17703) [normal]: Loading SFTs matching '../../projects/einstein.phys.uwm.edu/h1_0532.10_O2C02Cl4In0.6VXR;../../projects/einstein.phys.uwm.edu/l1_0532.10_O2C02Cl4In0.6VXR;../../projects/einstein.phys.uwm.edu/h1_0532.15_O2C02Cl4In0.fy2e;../../projects/einstein.phys.uwm.edu/l1_0532.15_O2C02Cl4In0.fy2e;../../projects/einstein.phys.uwm.edu/h1_0532.20_O2C02Cl4In0.8d_0;../../projects/einstein.phys.uwm.edu/l1_0532.20_O2C02Cl4In0.8d_0;../../projects/einstein.phys.uwm.edu/h1_0532.25_O2C02Cl4In0.axOW;../../projects/einstein.phys.uwm.edu/l1_0532.25_O2C02Cl4In0.axOW;../../projects/einstein.phys.uwm.edu/h1_0532.30_O2C02Cl4In0.h7t9;../../projects/einstein.phys.uwm.edu/l1_0532.30_O2C02Cl4In0.h7t9;../../projects/einstein.phys.uwm.edu/h1_0532.35_O2C02Cl4In0.EBxy;../../projects/einstein.phys.uwm.edu/l1_0532.35_O2C02Cl4In0.EBxy;../../projects/einstein.phys.uwm.edu/h1_0532.40_O2C02Cl4In0.ohNr;../../projects/einstein.phys.uwm.edu/l1_0532.40_O2C02Cl4In0.ohNr;../../projects/einstein.phys.uwm.edu/h1_0532.45_O2C02Cl4In0.DShC;../../projects/einstein.phys.uwm.edu/l1_0532.45_O2C02Cl4In0.DShC;../../projects/einstein.phys.uwm.edu/h1_0532.50_O2C02Cl4In0.LZ8e;../../projects/einstein.phys.uwm.edu/l1_0532.50_O2C02Cl4In0.LZ8e;../../projects/einstein.phys.uwm.edu/h1_0532.55_O2C02Cl4In0.NN7b;../../projects/einstein.phys.uwm.edu/l1_0532.55_O2C02Cl4In0.NN7b;../../projects/einstein.phys.uwm.edu/h1_0532.60_O2C02Cl4In0.5Swe;../../projects/einstein.phys.uwm.edu/l1_0532.60_O2C02Cl4In0.5Swe;../../projects/einstein.phys.uwm.edu/h1_0532.65_O2C02Cl4In0.jRnX;../../projects/einstein.phys.uwm.edu/l1_0532.65_O2C02Cl4In0.jRnX;../../projects/einstein.phys.uwm.edu/h1_0532.70_O2C02Cl4In0.nyjF;../../projects/einstein.phys.uwm.edu/l1_0532.70_O2C02Cl4In0.nyjF;../../projects/einstein.phys.uwm.edu/h1_0532.75_O2C02Cl4In0.zHke;../../projects/einstein.phys.uwm.edu/l1_0532.75_O2C02Cl4In0.zHke;../../projects/einstein.phys.uwm.edu/h1_0532.80_O2C02Cl4In0.h7qX;../../projects/einstein.phys.uwm.edu/l1_0532.80_O
2C02Cl4In0.h7qX;../../projects/einstein.phys.uwm.edu/h1_0532.85_O2C02Cl4In0.kOc1;../../projects/einstein.phys.uwm.edu/l1_0532.85_O2C02Cl4In0.kOc1;../../projects/einstein.phys.uwm.edu/h1_0532.90_O2C02Cl4In0.UJs2;../../projects/einstein.phys.uwm.edu/l1_0532.90_O2C02Cl4In0.UJs2;../../projects/einstein.phys.uwm.edu/h1_0532.95_O2C02Cl4In0.9OBh;../../projects/einstein.phys.uwm.edu/l1_0532.95_O2C02Cl4In0.9OBh;../../projects/einstein.phys.uwm.edu/h1_0533.00_O2C02Cl4In0.pp-9;../../projects/einstein.phys.uwm.edu/l1_0533.00_O2C02Cl4In0.pp-9;../../projects/einstein.phys.uwm.edu/h1_0533.05_O2C02Cl4In0.cBnZ;../../projects/einstein.phys.uwm.edu/l1_0533.05_O2C02Cl4In0.cBnZ;../../projects/einstein.phys.uwm.edu/h1_0533.10_O2C02Cl4In0.Vwz4;../../projects/einstein.phys.uwm.edu/l1_0533.10_O2C02Cl4In0.Vwz4;../../projects/einstein.phys.uwm.edu/h1_0533.15_O2C02Cl4In0.OcRk;../../projects/einstein.phys.uwm.edu/l1_0533.15_O2C02Cl4In0.OcRk' into catalog ...2021-02-08 20:01:35.8443 (17703) [normal]: done.
2021-02-08 20:01:35.8443 (17703) [normal]: Validating SFTs ... 2021-02-08 20:01:38.2340 (17703) [normal]: success.
2021-02-08 20:01:38.8603 (17703) [normal]: Search FstatMethod used: 'ResampOpenCL'
2021-02-08 20:01:38.8603 (17703) [normal]: Recalc FstatMethod used: 'DemodSSE'
2021-02-08 20:01:38.8606 (17703) [normal]: OpenCL Device used for Search/Recalc and/or semi coherent step: 'Quadro P620 (Platform: NVIDIA CUDA, global memory: 1998 MiB)'
2021-02-08 20:01:38.8606 (17703) [normal]: OpenCL version is used for the semi-coherent step!
2021-02-08 20:01:50.0802 (17703) [normal]: Number of segments: 22, total number of SFTs in segments: 9902
2021-02-08 20:01:50.1276 (17703) [normal]: Finished reading input data.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177806642.5000
2021-02-08 20:01:50.1277 (17703) [normal]: dFreqStack = 7.061150e-07, df1dot = 4.521800e-12, df2dot = 2.284100e-18, df3dot = 0.000000e+00
% --- Setup, N = 22, T = 604800 s, Tobs = 19646545 s, gammaRefine = 37, gamma2Refine = 45, gamma3Refine = 1

DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2021-02-08 20:01:50.1301 (17703) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:42, sky:1/1, f1dot:1/42

0.% --- CG:1573264 FG:70824 f1dotmin_fg:-4.062266800079e-08 df1dot_fg:1.222108108108e-13 f2dotmin_fg:-1.116671111111e-18 df2dot_fg:5.075777777778e-20 f3dotmin_fg:0 df3dot_fg:1
........................................................
2021-02-08 20:03:42.794262
-- signal handler called: signal 8
FPU status word: 0, flags:
7 stack frames obtained for this thread:
Use gdb command: 'info line *0xADDRESS' to print corresponding line numbers.
../../projects/einstein.phys.uwm.edu/einstein_O2MDF_2.09_x86_64-pc-linux-gnu__GW-opencl-nvidia[0x4229d7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f947af293c0]
../../projects/einstein.phys.uwm.edu/einstein_O2MDF_2.09_x86_64-pc-linux-gnu__GW-opencl-nvidia[0x416ee8]
../../projects/einstein.phys.uwm.edu/einstein_O2MDF_2.09_x86_64-pc-linux-gnu__GW-opencl-nvidia[0x41cf7a]
../../projects/einstein.phys.uwm.edu/einstein_O2MDF_2.09_x86_64-pc-linux-gnu__GW-opencl-nvidia[0x40995c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f947a9ef0b3]
../../projects/einstein.phys.uwm.edu/einstein_O2MDF_2.09_x86_64-pc-linux-gnu__GW-opencl-nvidia[0x40b741]

End of stacktrace
Stack trace of LAL functions in worker thread:
InitDopplerSkyScan at /home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/LIBC215/TARGET/linux-x86_64/EinsteinAtHome/source/lalsuite/lalpulsar/lib/DopplerScan.c:291
At lowest level status code = 0: NO LAL ERROR REGISTERED
20:03:42 (17703): called boinc_finish

</stderr_txt>
]]>
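
When comparing many failed tasks, it can help to pull the same two facts out of each stderr dump: the signal that ended the run and the GPU memory the application reported. The following is a minimal Python sketch, not part of BOINC or the Einstein@Home application; the function name `summarize_stderr` is made up for illustration, and the regular expressions assume the exact wording seen in the log above. On Linux, signal 8 is SIGFPE (an arithmetic exception), which by itself does not identify the root cause.

```python
import re
import signal


def summarize_stderr(text):
    """Extract the crash signal and the reported GPU memory from a stderr dump.

    Hypothetical helper: the patterns assume the log wording shown in this thread.
    """
    summary = {}

    # Example line: "-- signal handler called: signal 8"
    sig = re.search(r"signal handler called: signal (\d+)", text)
    if sig:
        number = int(sig.group(1))
        summary["signal_number"] = number
        try:
            # Map the number to the platform's symbolic name (SIGFPE for 8 on Linux).
            summary["signal_name"] = signal.Signals(number).name
        except ValueError:
            summary["signal_name"] = "unknown"

    # Example fragment: "global memory: 1998 MiB"
    mem = re.search(r"global memory: (\d+) MiB", text)
    if mem:
        summary["gpu_memory_mib"] = int(mem.group(1))

    return summary


# Two lines copied from the log above, shortened for the example.
sample = (
    "OpenCL Device used for Search/Recalc and/or semi coherent step: "
    "'Quadro P620 (Platform: NVIDIA CUDA, global memory: 1998 MiB)'\n"
    "-- signal handler called: signal 8\n"
)
print(summarize_stderr(sample))
```

Running this over the stderr of several errored tasks would show whether they all end with the same signal, which is a quick way to tell one recurring failure from several unrelated ones.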


Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

SJC_Steve wrote:
Nvidia P620 Quadro

Hi!

The 2 GB of memory is one thing that may cause problems with GW GPU tasks. At least for a while, GPUs with 2 GB did not have enough memory for some of these tasks, and those tasks then crashed.
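
One way to test the memory theory is to watch how close the card gets to its limit while a GW task is running. The sketch below is only an illustration under stated assumptions: it relies on the `nvidia-smi` tool that ships with the Nvidia driver and its `--query-gpu` CSV output, the helper names `parse_gpu_memory` and `read_gpu_memory` are made up for this example, and the sample numbers are invented rather than measured on this host.

```python
import subprocess

# nvidia-smi query for the card name and memory figures, as plain CSV in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV lines into dicts with used and total memory in MiB."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the card name would not break parsing.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        rows.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return rows


def read_gpu_memory():
    """Run nvidia-smi once and parse its output (needs the Nvidia driver installed)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Illustrative input in nvidia-smi's CSV format; the values are invented.
sample = "Quadro P620, 1850, 1998\n"
for gpu in parse_gpu_memory(sample):
    free = gpu["total_mib"] - gpu["used_mib"]
    print(f"{gpu['name']}: {gpu['used_mib']} of {gpu['total_mib']} MiB used, {free} MiB free")
```

Calling `read_gpu_memory()` every few seconds while a task runs, and noting whether usage climbs toward the total just before the task fails, would support or weaken the out-of-memory explanation.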

mikey
Joined: 22 Jan 05
Posts: 12780
Credit: 1867349561
RAC: 1831130

SJC_Steve wrote:

I'm using an Nvidia Quadro P620 card, and most of its results are errors (195 of 312), with 76 valid and 41 pending.

Here is the stderr output from one of the failed tasks. Is this a problem with my PC, or is the server sending tasks that are inappropriate for this card?

Thanks,

Steve

Run the Gamma Ray Pulsar Search #1 GPU tasks instead.
