Computation Error - Output file Absent

Larry Hubble
Joined: 13 Mar 05
Posts: 3
Credit: 324275624
RAC: 143456
Topic 222195

Lately, I frequently see computation errors reported. Here is a recent example.

4/27/2020 8:15:45 AM | Einstein@Home | Output file h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1_0 for task h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1 absent

4/27/2020 8:15:45 AM | Einstein@Home | Output file h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1_1 for task h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1 absent
4/27/2020 8:15:45 AM | Einstein@Home | Output file h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1_2 for task h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1 absent
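When many of these lines pile up in the event log, it can help to group them by task before looking up each task's details. Below is a minimal sketch in Python, assuming the log lines have been copied into a list of strings; the function name `missing_outputs_by_task` and the regular expression are illustrative only and are not part of BOINC or Einstein@Home.

```python
import re
from collections import defaultdict

# Matches the "Output file <file> for task <task> absent" messages quoted above.
# The pattern is an assumption based on those three lines only.
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def missing_outputs_by_task(log_lines):
    """Map each task name to the list of output files reported absent."""
    missing = defaultdict(list)
    for line in log_lines:
        match = ABSENT_RE.search(line)
        if match:
            output_file, task = match.groups()
            missing[task].append(output_file)
    return dict(missing)


task_name = "h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1"
sample = [
    f"4/27/2020 8:15:45 AM | Einstein@Home | Output file {task_name}_{i} "
    f"for task {task_name} absent"
    for i in range(3)
]

result = missing_outputs_by_task(sample)
for task, files in result.items():
    print(f"{task}: {len(files)} output file(s) absent")
```

All three messages above belong to the same task, so they point to a single failed task rather than three separate failures.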

Is there a way to correct this?

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

To see the real error reported by the application go to your task list and then click on the Task ID for one of the failed tasks.

Here's an example from one of the failed tasks:

XLALExecuteKernel_OpenCL failed: CL_MEM_OBJECT_ALLOCATION_FAILURE

Are you running more than one task at a time on the GPU?
Try reducing the number of tasks you run at a time to see if that fixes the problem.
There have been discussions in other threads indicating that some of the Gravitational Wave tasks require quite a lot of GPU memory.

Larry Hubble
Joined: 13 Mar 05
Posts: 3
Credit: 324275624
RAC: 143456

I've reduced GPU tasks to 1 rather than 2, but I'm still getting errors:

XLALExecuteKernel_OpenCL failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
XLAL Error - XLALExecuteKernel_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:565): Internal function call failed
XLAL Error - XLALCLMEMVectorMemsetCOMPLEX8 (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:181): Check failed: XLALExecuteKernel_OpenCL ( &openclObj.kernel.kernel_MemsetCOMPLEX8 , &in->length, 1, last ) ==

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118391795384
RAC: 25657099

Larry Hubble wrote:
I've reduced GPU tasks to 1 rather than 2 But still getting errors:

Whilst you would think that a 3GB GPU should have no trouble, apparently nvidia chose to 'encourage' the use of professional (and more expensive) GPUs for compute use by severely restricting the available memory on consumer grade cards being used for that purpose. There have been quite a few other reports of exactly this same issue with nvidia cards of 3GB or less VRAM.

The amount of RAM needed is variable, so some tasks will succeed. If you look at your stats for the GW search (O2MDF), you currently have nearly 570 failed tasks and around 110 that are pending. None have yet validated. Chances are that the pendings will validate, but a 1 in 6 ratio means that until the current VelaJr1 tasks are gone and memory requirements are lower, you should consider changing your preferences to opt out of the O2MDF search and just choose the gamma-ray pulsar search (FGRPB1G). For that search you have 64 valid and 30 pending and there certainly shouldn't be a similar problem, as the memory requirements are much lower. You should be able to run 2 concurrent tasks if you were doing that previously. Each concurrent task will need at least a full CPU core for support.
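As a rough check of that "1 in 6" figure, here is a small sketch in Python. It assumes the approximate counts quoted above (about 570 failed, about 110 pending) and that every pending task eventually validates; the variable names are illustrative only.

```python
# Approximate counts quoted in the post above (assumed, not exact).
failed = 570
pending = 110

total = failed + pending
# Best case: every pending task validates, so "pending" counts as success.
success_fraction = pending / total
one_in = total / pending

print(f"Best-case success rate: {success_fraction:.1%} (about 1 in {one_in:.1f})")
```

Even in the best case, roughly five of every six O2MDF tasks on this card end in an error, which is why switching searches is the more productive option.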

If you do decide to opt out of O2MDF, you should also set the pref for "Allow non-preferred apps" to NO. This makes sure the scheduler has no excuse to send you the 'wrong' type of task :-).

Cheers,
Gary.

Larry Hubble
Joined: 13 Mar 05
Posts: 3
Credit: 324275624
RAC: 143456

Thanks Gary,

I will take that approach.

Cheers,
Larry
