GPU calculation error

Jim Remington
Joined: 14 Sep 21
Posts: 5
Credit: 526185101
RAC: 471478
Topic 226594

Hi, All:

I tried running binary pulsar search #1 (the FGRPB1G app) on an old but seemingly fully functional NVIDIA GT 420 graphics card. Both jobs that have run so far failed with the error log below. PRECISION seems to be the real sticking point, although I also wonder about the checkpoint read error. I ran OCCT on the card, and it detected no errors in a one-hour run. The card is claimed to support 64-bit floats.

The GPU is so slow that it is probably not worth using, but I wondered if anyone might know or could guess what the actual problem is.

Thanks in advance for your thoughts!

 

<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
The printer is out of paper.
 (0x1c) - exit code 28 (0x1c)</message>
<stderr_txt>
23:32:38 (3624): [normal]: This Einstein@home App was built at: May  8 2019 13:29:27

23:32:38 (3624): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl-nvidia.exe'.
23:32:38 (3624): [debug]: 1e+016 fp, 3.3e+009 fp/s, 3218738 s, 894h05m37s69
23:32:38 (3624): [normal]: % CPU usage: 1.000000, GPU usage: 1.000000
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl-nvidia.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3011L03.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 508.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3011L03_0516_30139713.dat --debug 0 --device 1 -o LATeah3011L03_516.0_0_0.0_30139713_1_0.out
output files: 'LATeah3011L03_516.0_0_0.0_30139713_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah3011L03_516.0_0_0.0_30139713_1_0' 'LATeah3011L03_516.0_0_0.0_30139713_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah3011L03_516.0_0_0.0_30139713_1_1'
23:32:38 (3624): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
23:32:38 (3624): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0000000003d33770 , 0000000003d342b0]
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "GeForce GT 420" by: NVIDIA Corporation
Max allocation limit: 536870912
Global mem size: 2147483648
OpenCL device has FP64 support
read_checkpoint(): Couldn't open file 'LATeah3011L03_516.0_0_0.0_30139713_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
INFO: Major Windows version: 6
% C 0 2
% C 0 4
% C 0 6
% C 0 8
% C 0 10
% C 0 12
% C 0 14
% C 0 16
% C 0 18
% C 0 20
% C 0 22
% C 0 24
% C 0 26
% C 0 28
% C 0 30
% C 0 32
% C 0 34
% C 0 36
% C 0 38
% C 0 40
% C 0 42
% C 0 44
% C 0 46
% C 0 48
% C 0 50
% C 0 52
% C 0 54
% C 0 56
% C 0 58
% C 0 60
% C 0 62
% C 0 64
% C 0 66
% C 0 68
% C 0 70
% C 0 72
% C 0 74
% C 0 76
% C 0 78
% C 0 80
% C 0 82
% C 0 84
% C 0 86
% C 0 88
% C 0 90
% C 0 92
% C 0 94
% C 0 96
% C 0 98
% C 0 100
% C 0 102
% C 0 104
% C 0 106
% C 0 108
% C 0 110
% C 0 112
% C 0 114
% C 0 116
% C 0 118
% C 0 120
% C 0 122
% C 0 124
% C 0 126
% C 0 128
% C 0 130
% C 0 132
% C 0 134
% C 0 136
% C 0 138
% C 0 140
% C 0 142
% C 0 144
% C 0 146
% C 0 148
% C 0 150
% C 0 152
% C 0 154
% C 0 156
% C 0 158
% C 0 160
% C 0 162
% C 0 164
% C 0 166
% C 0 168
% C 0 170
% C 0 172
% C 0 174
% C 0 176
% C 0 178
% C 0 180
% C 0 182
% C 0 184
% C 0 186
% C 0 188
% C 0 190
% C 0 192
% C 0 194
% C 0 196
% C 0 198
% C 0 200
% C 0 202
% C 0 204
% C 0 206
% C 0 208
% C 0 210
% C 0 212
% C 0 214
% C 0 216
% C 0 218
% C 0 220
% C 0 222
% C 0 224
% C 0 226
% C 0 228
% C 0 230
% C 0 232
% C 0 234
% C 0 236
% C 0 238
% C 0 240
% C 0 242
% C 0 244
% C 0 246
% C 0 248
% C 0 250
% C 0 252
% C 0 254
% C 0 256
% C 0 258
% C 0 260
% C 0 262
% C 0 264
% C 0 266
% C 0 268
% C 0 270
% C 0 272
% C 0 274
% C 0 276
% C 0 278
% C 0 280
% C 0 282
% C 0 284
% C 0 286
% C 0 288
% C 0 290
% C 0 292
% C 0 294
% C 0 296
% C 0 298
% C 0 300
% C 0 302
% C 0 304
% C 0 306
% C 0 308
% C 0 310
% C 0 312
% C 0 314
% C 0 316
% C 0 318
% C 0 320
% C 0 322
% C 0 324
% C 0 326
% C 0 328
% C 0 330
% C 0 332
% C 0 334
% C 0 336
% C 0 338
% C 0 340
% C 0 342
% C 0 344
% C 0 346
% C 0 348
% C 0 350
% C 0 352
% C 0 354
% C 0 356
% C 0 358
% C 0 360
% C 0 362
% C 0 364
% C 0 366
% C 0 368
% C 0 370
% C 0 372
% C 0 374
% C 0 376
% C 0 378
% C 0 380
% C 0 382
% C 0 384
% C 0 386
% C 0 388
% C 0 390
% C 0 392
% C 0 394
% C 0 396
% C 0 398
% C 0 400
% C 0 402
% C 0 404
% C 0 406
% C 0 408
% C 0 410
% C 0 412
% C 0 414
% C 0 416
% C 0 418
% C 0 420
% C 0 422
% C 0 424
% C 0 426
% C 0 428
% C 0 430
% C 0 432
% C 0 434
% C 0 436
% C 0 438
% C 0 440
% C 0 442
% C 0 444
% C 0 446
% C 0 448
% C 0 450
% C 0 452
% C 0 454
% C 0 456
% C 0 458
% C 0 460
% C 0 462
% C 0 464
% C 0 466
% C 0 468
% C 0 470
% C 0 472
% C 0 474
% C 0 476
% C 0 478
% C 0 480
% C 0 482
% C 0 484
% C 0 486
% C 0 488
% C 0 490
% C 0 492
% C 0 494
% C 0 496
% C 0 498
% C 0 500
% C 0 502
% C 0 504
% C 0 506
% C 0 508
% C 0 510
% C 0 512
% C 0 514
% C 0 516
% C 0 518
% C 0 520
% C 0 522
% C 0 524
% C 0 526
% C 0 528
% C 0 530
% C 0 532
% C 0 534
% C 0 536
% C 0 538
% C 0 540
% C 0 542
% C 0 544
% C 0 546
% C 0 548
% C 0 550
% C 0 552
% C 0 554
% C 0 556
% C 0 558
% C 0 560
% C 0 562
% C 0 564
% C 0 566
% C 0 568
% C 0 570
% C 0 572
% C 0 574
% C 0 576
% C 0 578
% C 0 580
% C 0 582
% C 0 584
% C 0 586
% C 0 588
% C 0 590
% C 0 592
% C 0 594
% C 0 596
% C 0 598
% C 0 600
% C 0 602
% C 0 604
% C 0 606
% C 0 608
% C 0 610
% C 0 612
% C 0 614
% C 0 616
% C 0 618
% C 0 620
% C 0 622
% C 0 624
% C 0 626
% C 0 628
% C 0 630
% C 0 632
% C 0 634
% C 0 636
% C 0 638
ERROR: /home/bema/source/fermilat/src/bridge_fft_clfft.c:1073: clFinish failed. status=-36
06:29:52 (3624): [CRITICAL]: ERROR: MAIN() returned with error '-36'
FPU status flags: COND_1 PRECISION
06:29:58 (3624): [normal]: done. calling boinc_finish(28).
06:29:58 (3624): called boinc_finish

</stderr_txt>
]]>

 

Keith Myers
Joined: 11 Feb 11
Posts: 4955
Credit: 18613370669
RAC: 5701626

It's obvious: "the printer is out of paper".

Ha ha ha LOL.  Just kidding.  Love BOINC's attempt to classify the error.

Seriously, my guess is that there is not enough memory: the job is trying to fit into 1 GB of VRAM.

As seen in the stderr log, the real error code is -36, which is OpenCL-specific and translates to CL_INVALID_COMMAND_QUEUE.

Again, that is likely because it ran out of memory.
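
For reference, here is a minimal Python sketch of the lookup from a numeric OpenCL status to its symbolic name, applied to the clFinish line in the log above. The table is a small hand-copied subset of the constants in the Khronos cl.h header. The names OPENCL_STATUS_NAMES and status_name are made up for this illustration and are not part of BOINC or the Einstein@Home app.

```python
import re

# Hand-copied subset of the status constants defined in the Khronos cl.h header.
OPENCL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -34: "CL_INVALID_CONTEXT",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
}


def status_name(log_line):
    """Return the symbolic name for a 'status=<n>' value found in a log line.

    Returns None when the line carries no status value at all.
    """
    match = re.search(r"status=(-?\d+)", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return OPENCL_STATUS_NAMES.get(code, f"unknown OpenCL status {code}")


# The failing line, copied from the stderr output in the first post.
line = ("ERROR: /home/bema/source/fermilat/src/bridge_fft_clfft.c:1073: "
        "clFinish failed. status=-36")
print(status_name(line))
```

The table also lists the memory-related codes (-4, -5, -6) for comparison. The log reports -36 rather than one of those, which on its own neither proves nor rules out a memory shortage.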

 

Jim Remington
Joined: 14 Sep 21
Posts: 5
Credit: 526185101
RAC: 471478

Thanks, but I'm not convinced that memory is the problem. The GT 420 has 2 GB of RAM, and the task has run fine many times on the other, much faster GPU in the computer (a GTX 760), which also has 2 GB of RAM.
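
The memory figures that the app itself printed in the log above can be converted to MiB as a check. Here is a minimal Python sketch, with the three lines copied from the stderr output. The helper name sizes_in_mib is just for illustration.

```python
import re

# Three lines copied verbatim from the stderr output in the first post.
LOG = """\
Max allocation limit: 536870912
Global mem size: 2147483648
% Scratch buffer size: 136314880
"""


def sizes_in_mib(log_text):
    """Map each 'label: <bytes>' line to its size in MiB (1 MiB = 2**20 bytes)."""
    sizes = {}
    for raw in log_text.splitlines():
        # An optional leading '%' marks some of the app's info lines.
        match = re.match(r"%?\s*(.+?):\s*(\d+)\s*$", raw)
        if match:
            sizes[match.group(1)] = int(match.group(2)) / 2**20
    return sizes


for label, mib in sizes_in_mib(LOG).items():
    print(f"{label}: {mib:.0f} MiB")
```

By this reading, the driver reports 2048 MiB of global memory, a 512 MiB limit on any single allocation, and a 130 MiB scratch buffer.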

Incidentally, the task ran for about 7.2 hours before failing, so it seemed to be running OK.

Edit: I just ran across this thread, which suggests that the GPU is being overtaxed and that reducing the clock rates might help: https://einsteinathome.org/content/new-user-most-gpu-tasks-failing-status-36-clinvalidcommandqueue

But, as mentioned, the minuscule gain is probably not worth the effort of fixing it.

 

Keith Myers
Joined: 11 Feb 11
Posts: 4955
Credit: 18613370669
RAC: 5701626

OK, when I looked up the GT 420 in the TechPowerUp database, it listed the card with either 512 MB or 1024 MB of memory, NOT 2 GB.

https://www.techpowerup.com/gpu-specs/

 

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

The site sometimes lists only OEM versions when you search for a model. At the bottom of the OEM version page there was a section titled "Retail boards based on this design". It looks like Asus has been selling a hot-rodded model.

https://www.techpowerup.com/gpu-specs/asus-gt-420-2-gb.b1545
