Hi, All:
All jobs that attempt to use the GTX-650 GPU on my eight-core Win10 machine fail immediately with "unknown error". CPU-only tasks work fine. I'm not able to interpret the log file, but if someone else can see what the problem is, I would love to know!
Recent example:
Name: h1_0323.80_O3aC01Cl1In0__O3AS1_324.00Hz_4747_1
Workunit ID: 586973982
Created: 14 Nov 2021 3:01:14 UTC
Sent: 14 Nov 2021 3:18:02 UTC
Report deadline: 21 Nov 2021 3:18:02 UTC
Received: 14 Nov 2021 8:17:38 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -1 (0xFFFFFFFF) Unknown error code
Computer: 12901980
Run time (sec): 37.07
CPU time (sec): 34.20
Peak working set size (MB): 270.64
Peak swap size (MB): 1047.95
Peak disk usage (MB): 0.01
Validation state: Invalid
Granted credit: 0
Application: Gravitational Wave search O3 All-Sky #1 v1.01 (GW-opencl-nvidia) windows_x86_64
<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 4294967295 (0xffffffff)</message>
<stderr_txt>
putenv 'LAL_DEBUG_LEVEL=3'
2021-11-14 00:07:41.3030 (916) [normal]: This program is published under the GNU General Public License, version 2
2021-11-14 00:07:41.3059 (916) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2021-11-14 00:07:41.3089 (916) [normal]: This Einstein@home App was built at: Aug 5 2021 15:20:43
2021-11-14 00:07:41.3108 (916) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O3AS_1.01_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2021-11-14 00:07:41.3635 (916) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2021-11-14 00:07:41.3723 (916) [debug]: Set up communication with graphics process.
2021-11-14 00:07:41.3743 (916) [normal]: Parsed user input successfully
DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.21.0.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALPulsar: 1.18.2.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALApps: 6.25.1.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
2021-11-14 00:07:41.3782 (916) [normal]: Initialise compartments with freqWidth = 0.05 and candidates per compartment = 3000.
2021-11-14 00:07:42.3283 (916) [normal]: Reading input data ...
2021-11-14 00:07:42.3293 (916) [normal]: Loading SFTs matching '..\..\projects\einstein.phys.uwm.edu\h1_0323.80_O3aC01Cl1In0;..\..\projects\einstein.phys.uwm.edu\l1_0323.80_O3aC01Cl1In0;..\..\projects\einstein.phys.uwm.edu\h1_0324.00_O3aC01Cl1In0;..\..\projects\einstein.phys.uwm.edu\l1_0324.00_O3aC01Cl1In0;..\..\projects\einstein.phys.uwm.edu\h1_0324.20_O3aC01Cl1In0;..\..\projects\einstein.phys.uwm.edu\l1_0324.20_O3aC01Cl1In0;..\..\projects\einstein.phys.uwm.edu\h1_0324.40_O3aC01Cl1In0;..\..\projects\einstein.phys.uwm.edu\l1_0324.40_O3aC01Cl1In0;..\..\projects\einstein.phys.uwm.edu\h1_0324.60_O3aC01Cl1In0;..\..\projects\einstein.phys.uwm.edu\l1_0324.60_O3aC01Cl1In0' into catalog ...2021-11-14 00:07:44.9833 (916) [normal]: done.
2021-11-14 00:07:44.9843 (916) [normal]: Validating SFTs (detectors: H1, L1, ) ... success.
2021-11-14 00:07:53.4428 (916) [normal]: Search FstatMethod used: 'ResampGPU'
2021-11-14 00:07:53.4428 (916) [normal]: Recalc FstatMethod used: 'DemodSSE'
2021-11-14 00:07:53.4438 (916) [normal]: GPU Device used for Search/Recalc and/or semi coherent step: 'NVIDIA GeForce GTX 650 ( Platform: NVIDIA CUDA )'
2021-11-14 00:07:53.4457 (916) [normal]: GPU Backend used for Search/Recalc and/or semi coherent step: 'OpenCL'
2021-11-14 00:07:53.4467 (916) [normal]: GPU version is used for the semi-coherent step!
2021-11-14 00:08:08.9044 (916) [normal]: Number of segments: 37, total number of SFTs in segments: 11745
2021-11-14 00:08:08.9229 (916) [normal]: Finished reading input data.
% --- GPS reference time = 1246070525.0000 , GPS data mid time = 1246070525.0000
2021-11-14 00:08:08.9239 (916) [normal]: dFreqStack = 2.000000e-006, df1dot = 1.500000e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 37, T = 432000 s, Tobs = 15809012 s, gammaRefine = 250, gamma2Refine = 4653, gamma3Refine = 1
DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2021-11-14 00:08:15.0298 (916) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:2000, sky:1/100, f1dot:1/20
0.% --- CG:9272015 FG:250000 f1dotmin_fg:-2.717183860529e-009 df1dot_fg:5.97609561753e-013 f2dotmin_fg:0 df2dot_fg:0 f3dotmin_fg:0 df3dot_fg:1
XLAL Error - XLALOpenCLExecuteKernel (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/lib/GPUUtils/OpenCLUtils.c:506): Enqueue OpenCL kernel failed with OpenCL error: CL_MEM_OBJECT_ALLOCATION_FAILURE
XLAL Error - XLALOpenCLExecuteKernel (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/lib/GPUUtils/OpenCLUtils.c:506): Generic failure
XLAL Error - XLALSemiCohStep_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT_OpenCL.c:137): Check failed: XLALOpenCLExecuteKernel ( &(GCTOpenCLKernels.kernel_SemiCohStep), &size, 1 ) == XLAL_SUCCESS
XLAL Error - XLALSemiCohStep_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT_OpenCL.c:137): Internal function call failed: Generic failure
XLAL Error - XLALSemiCohStep_GPU (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:4396): Check failed: (*usefulparams->gct_gpu_funcs->SemiCohStep) ( coarsegrid, finegrid, stacks, NSegmentsInv, toplists_sortby, usefulparams->BSGLsetupGPU, usefulparams->computeBSGL, usefulparams->getMaxFperSeg, toplist1_last_entryGPU->data, toplist2_last_entryGPU->data, toplist3_last_entryGPU->data ) == XLAL_SUCCESS
XLAL Error - XLALSemiCohStep_GPU (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:4396): Internal function call failed: Generic failure
XLAL Error - MAIN (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:2214): Check failed: XLALSemiCohStep_GPU( &coarsegrid, &finegrid, nStacks, &usefulParams, NSegmentsInv, uvar->SortToplist, compartment, compartment2, compartment3) == XLAL_SUCCESS
XLAL Error - MAIN (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:2214): Internal function call failed: Generic failure
2021-11-14 00:08:15.6958 (916) [CRITICAL]: ERROR: MAIN() returned with error '-1'
Code-version: %% LAL: 6.21.0.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALPulsar: 1.18.2.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALApps: 6.25.1.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
FPU status flags: PRECISION
2021-11-14 00:08:15.7113 (916) [debug]: worker done. return(-1) to caller
2021-11-14 00:08:15.7123 (916) [normal]: done. calling boinc_finish(-1).
00:08:15 (916): called boinc_finish
</stderr_txt>
]]>
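The "exit code 4294967295 (0xffffffff)" in the `<message>` line and the "Exit status: -1 (0xFFFFFFFF)" in the task summary are the same value: the log shows `MAIN()` returning -1 and the app calling `boinc_finish(-1)`, and -1 read back as an unsigned 32-bit integer is 4294967295. So "unknown error" only says that the app exited with -1; the actual reason has to be found in the XLAL Error lines above it. A minimal Python sketch of that two's-complement reinterpretation (the helper names are illustrative, not from BOINC):

```python
import struct

def as_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as unsigned (two's complement)."""
    return struct.unpack("<I", struct.pack("<i", code))[0]

def as_signed32(code: int) -> int:
    """Reinterpret an unsigned 32-bit exit code as signed."""
    return struct.unpack("<i", struct.pack("<I", code))[0]

# The value returned by MAIN() / passed to boinc_finish() ...
print(as_unsigned32(-1), hex(as_unsigned32(-1)))
# ... and the value shown in the <message> line, converted back.
print(as_signed32(4294967295))
```

Running it prints `4294967295 0xffffffff` and then `-1`.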
Copyright © 2024 Einstein@Home. All rights reserved.
After reading through earlier forum posts, I now understand that the problem is most likely insufficient GPU memory, so I have disabled the Gravitational Wave (GW) searches.
Probably the key entry in your stderr is this:
CL_MEM_OBJECT_ALLOCATION_FAILURE
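In this log the XLAL errors are printed innermost call first (OpenCLUtils.c, then HierarchSearchGCT_OpenCL.c, then HierarchSearchGCT.c, then MAIN), so the first "XLAL Error" line carries the root cause and the later ones, ending in "Generic failure", are the call stack unwinding. A small Python sketch that pulls that first line out of a pasted stderr; the function name, the regexes, and the assumption that every error line has the `XLAL Error - FUNCTION (file:line): message` shape seen here are illustrative assumptions, not part of LALSuite or BOINC:

```python
import re

# Assumed shape of an error line: XLAL Error - FUNCTION (path/to/file.c:LINE): message
XLAL_LINE = re.compile(r"^XLAL Error - (\w+) \((.+?):(\d+)\): (.*)$")
OPENCL_CODE = re.compile(r"OpenCL error: (CL_[A-Z_]+)")

def root_cause(stderr_text: str):
    """Return (function, message, opencl_code) for the first XLAL error, or None."""
    for line in stderr_text.splitlines():
        m = XLAL_LINE.match(line.strip())
        if m:
            func, _path, _lineno, msg = m.groups()
            code = OPENCL_CODE.search(msg)
            return func, msg, code.group(1) if code else None
    return None

# Three lines from the log above, with the build paths shortened.
sample = """\
XLAL Error - XLALOpenCLExecuteKernel (/path/OpenCLUtils.c:506): Enqueue OpenCL kernel failed with OpenCL error: CL_MEM_OBJECT_ALLOCATION_FAILURE
XLAL Error - XLALOpenCLExecuteKernel (/path/OpenCLUtils.c:506): Generic failure
XLAL Error - MAIN (/path/HierarchSearchGCT.c:2214): Internal function call failed: Generic failure
"""
print(root_cause(sample))
```

On the sample this returns the `XLALOpenCLExecuteKernel` line, with `CL_MEM_OBJECT_ALLOCATION_FAILURE` as the OpenCL code.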
If you do a search on the Einsteinathome.org site, you'll find that many people report seeing this message when the GPU they are providing lacks sufficient RAM for the task(s) sent to it.
Your system is reported as providing:
GeForce GTX 650 (1024MB)
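For a rough sense of scale: the OpenCL specification lists `CL_MEM_OBJECT_ALLOCATION_FAILURE` for `clEnqueueNDRangeKernel` as a failure to allocate memory for the buffer or image objects passed as kernel arguments, which matches the "Enqueue OpenCL kernel failed" line. The OpenCL 1.x specification also only guarantees a single buffer up to `CL_DEVICE_MAX_MEM_ALLOC_SIZE`, whose required minimum is max(1/4 of `CL_DEVICE_GLOBAL_MEM_SIZE`, 128 MiB). The sketch below applies that rule to the 1024MB reported for this card, treated as 1024 MiB. The buffer sizes are hypothetical (the log does not say how much the GW task requests), the real limits come from the driver and may be larger, and memory already used by the display is ignored:

```python
from typing import Sequence

MIB = 1024 * 1024

def min_max_alloc_bytes(global_mem_bytes: int) -> int:
    """Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE the OpenCL 1.x spec allows
    for a non-custom device: max(global memory / 4, 128 MiB)."""
    return max(global_mem_bytes // 4, 128 * MIB)

def fits_on_device(buffer_sizes: Sequence[int], global_mem_bytes: int,
                   max_alloc_bytes: int) -> bool:
    """Optimistic check: every buffer within the per-allocation limit and the
    total within global memory. Ignores memory already used by the display
    driver or other programs, so True does not guarantee success."""
    return (all(size <= max_alloc_bytes for size in buffer_sizes)
            and sum(buffer_sizes) <= global_mem_bytes)

gtx650 = 1024 * MIB                     # as reported for this host
limit = min_max_alloc_bytes(gtx650)
print(limit // MIB)                                  # guaranteed per-buffer size in MiB
print(fits_on_device([300 * MIB], gtx650, limit))    # one hypothetical 300 MiB buffer
print(fits_on_device([200 * MIB] * 5, gtx650, limit))
```

This prints `256`, `False`, `True`: a single 300 MiB buffer already exceeds the guaranteed per-buffer limit even though it is well under the card's total memory.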
If you restrict your GPU work to the Gamma-Ray Pulsar tasks, you may find the card copes with them more consistently than with the Gravitational Wave tasks.
Thank you!