Windows 10 - R5700 throwing computation errors

Tom M
Joined: 2 Feb 06
Posts: 6582
Credit: 9655864693
RAC: 2826374
Topic 223826

I am getting "extreme backoff" because of GPU errors like the following. Any ideas?

 

"...

Name: LATeah2049Lad_196.0_0_0.0_49973840_1
Workunit ID: 498220066
Created: 27 Oct 2020 18:42:29 UTC
Sent: 27 Oct 2020 18:53:52 UTC
Report deadline: 10 Nov 2020 18:53:52 UTC
Received: 28 Oct 2020 5:57:09 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 65 (0x00000041) Unknown error code
Computer: 12835716
Run time (sec): 31.11
CPU time (sec): 15.72
Peak working set size (MB): 96.01
Peak swap size (MB): 604.1
Peak disk usage (MB): 0.01
Validation state: Invalid
Granted credit: 0
Application: Gamma-ray pulsar binary search #1 on GPUs v1.22 (FGRPopencl1K-ati) windows_x86_64


Stderr output

<core_client_version>7.16.7</core_client_version>
<![CDATA[
<message>
Network access is denied.
 (0x41) - exit code 65 (0x41)</message>
<stderr_txt>
15:54:05 (8460): [normal]: This Einstein@home App was built at: May  8 2019 13:29:27

15:54:05 (8460): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl1K-ati.exe'.
15:54:05 (8460): [debug]: 1e+016 fp, 4.7e+009 fp/s, 2246913 s, 624h08m33s25
15:54:05 (8460): [normal]: % CPU usage: 0.500000, GPU usage: 0.333000
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl1K-ati.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah2049Lad.dat --alpha 1.41058464281 --delta -0.444366280137 --skyRadius 5.090540e-07 --ldiBins 30 --f0start 188.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 2.512676418e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah2049Lad_0196_49973840.dat --debug 1 --device 0 -o LATeah2049Lad_196.0_0_0.0_49973840_1_0.out
output files: 'LATeah2049Lad_196.0_0_0.0_49973840_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah2049Lad_196.0_0_0.0_49973840_1_0' 'LATeah2049Lad_196.0_0_0.0_49973840_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah2049Lad_196.0_0_0.0_49973840_1_1'
15:54:05 (8460): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
15:54:05 (8460): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0000000003895fb0 , 00007ffdf95b0000]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx1010" by: Advanced Micro Devices, Inc.
Max allocation limit: 2764046336
Global mem size: 4278190080
OpenCL device has FP64 support
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah2049Lad.dat
% Total amount of photon times: 8950
% Preparing toplist of length: 10
% Read 1631 binary points
read_checkpoint(): Couldn't open file 'LATeah2049Lad_196.0_0_0.0_49973840_1_0.out.cpt': No such file or directory (2)
% fft_size: 16777216 (0x1000000); alloc: 67108872
% Sky point 1/1
% Binary point 1/1631
% Creating FFT plan.
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% Starting semicoherent search over f0 and f1.
% nf1dots: 41 df1dot: 2.512676418e-015 f1dot_start: -1e-013 f1dot_band: 1e-013
% Filling array of photon pairs
Error in computing index of fft input array, i:1038085824 pair:281367
ERROR: prepare_ts_2_phase_diff_sorted() returned with error 20115224
15:54:22 (8460): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags: PRECISION
15:54:34 (8460): [normal]: done. calling boinc_finish(65).
15:54:34 (8460): called boinc_finish

</stderr_txt>
]]>..."
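
One thing that stands out in the stderr above: the failing index (1038085824) is far larger than the reported fft_size (16777216). As a rough sketch only, the Python below pulls the error lines and those two numbers out of a pasted stderr block and flags the mismatch. The function name `summarize_stderr` is hypothetical, and the idea that a valid index must lie in the range 0 to fft_size - 1 is an assumption for illustration; the application's actual bounds check is not shown in the log.

```python
import re


def summarize_stderr(stderr_text):
    """Collect error lines plus the FFT size and failing index from a task's stderr.

    Assumption (not confirmed by the log): a valid index lies in [0, fft_size).
    """
    # Keep any line that mentions "error" as a whole word or is tagged [CRITICAL].
    errors = [
        line.strip()
        for line in stderr_text.splitlines()
        if re.search(r"\berror\b", line, re.IGNORECASE) or "[CRITICAL]" in line
    ]

    fft_match = re.search(r"fft_size:\s*(\d+)", stderr_text)
    idx_match = re.search(r"fft input array, i:\s*(-?\d+)", stderr_text)
    fft_size = int(fft_match.group(1)) if fft_match else None
    index = int(idx_match.group(1)) if idx_match else None

    # Only judge the index when both numbers were found in the text.
    out_of_range = None
    if fft_size is not None and index is not None:
        out_of_range = not (0 <= index < fft_size)

    return {
        "errors": errors,
        "fft_size": fft_size,
        "index": index,
        "index_out_of_range": out_of_range,
    }


# Lines copied from the stderr output in this post.
sample = """% fft_size: 16777216 (0x1000000); alloc: 67108872
% Filling array of photon pairs
Error in computing index of fft input array, i:1038085824 pair:281367
ERROR: prepare_ts_2_phase_diff_sorted() returned with error 20115224
15:54:22 (8460): [CRITICAL]: ERROR: MAIN() returned with error '1'
"""

report = summarize_stderr(sample)
for line in report["errors"]:
    print(line)
print("index out of range:", report["index_out_of_range"])
```

An out-of-range index by itself does not say whether the driver, the card, or something else is at fault; it only narrows down where the application gave up.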


A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

Keith Myers
Joined: 11 Feb 11
Posts: 5023
Credit: 18931235441
RAC: 6468875

Possible causes: missing drivers, an overheated card, too much overclocking, or a failing card.

 

Tom M
Joined: 2 Feb 06
Posts: 6582
Credit: 9655864693
RAC: 2826374

It "finally" occurred to me that while I was fighting with my 4th GPU over a "Windows shut it down because it reported an error" problem (that GPU is still dead), I had used the AMD Windows update tool and installed the 2nd latest "gamers" driver.

Since I was also getting computation errors on the GPU from PrimeGrid, it had to be "my" problem. I just downloaded the "content creator" version of the latest driver.

Things seem to be working better.  I am not (yet) seeing computation errors.

Tom M

 

Tom M
Joined: 2 Feb 06
Posts: 6582
Credit: 9655864693
RAC: 2826374

Tom M wrote:

It "finally" occurred to me that while I was fighting with my 4th GPU over a "Windows shut it down because it reported an error" problem (that GPU is still dead), I had used the AMD Windows update tool and installed the 2nd latest "gamers" driver.

Since I was also getting computation errors on the GPU from PrimeGrid, it had to be "my" problem. I just downloaded the "content creator" version of the latest driver.

Things seem to be working better.  I am not (yet) seeing computation errors.

Tom M

I don't know if I want to cuss or swear. It appears that the error from the 4th GPU was caused by a dying 1-to-4 expansion card. Using individual PCIe to USB 3.0 adapters, all 4 GPUs are now running again.

Tom M :)
