I am getting "extreme backoff" because of GPU errors like the following. Any ideas?
"...
Name:LATeah2049Lad_196.0_0_0.0_49973840_1
Workunit ID:498220066
Created:27 Oct 2020 18:42:29 UTC
Sent:27 Oct 2020 18:53:52 UTC
Report deadline:10 Nov 2020 18:53:52 UTC
Received:28 Oct 2020 5:57:09 UTC
Server state:Over
Outcome:Computation error
Client state:Compute error
Exit status:65 (0x00000041) Unknown error code
Computer:12835716
Run time (sec):31.11
CPU time (sec):15.72
Peak working set size (MB):96.01
Peak swap size (MB):604.1
Peak disk usage (MB):0.01
Validation state:Invalid
Granted credit:0
Application:Gamma-ray pulsar binary search #1 on GPUs v1.22 (FGRPopencl1K-ati)
<core_client_version>7.16.7</core_client_version>
<![CDATA[
<message>
Network access is denied. (0x41) - exit code 65 (0x41)</message>
<stderr_txt>
15:54:05 (8460): [normal]: This Einstein@home App was built at: May 8 2019 13:29:27
15:54:05 (8460): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl1K-ati.exe'.
15:54:05 (8460): [debug]: 1e+016 fp, 4.7e+009 fp/s, 2246913 s, 624h08m33s25
15:54:05 (8460): [normal]: % CPU usage: 0.500000, GPU usage: 0.333000
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl1K-ati.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah2049Lad.dat --alpha 1.41058464281 --delta -0.444366280137 --skyRadius 5.090540e-07 --ldiBins 30 --f0start 188.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 2.512676418e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah2049Lad_0196_49973840.dat --debug 1 --device 0 -o LATeah2049Lad_196.0_0_0.0_49973840_1_0.out
output files: 'LATeah2049Lad_196.0_0_0.0_49973840_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah2049Lad_196.0_0_0.0_49973840_1_0' 'LATeah2049Lad_196.0_0_0.0_49973840_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah2049Lad_196.0_0_0.0_49973840_1_1'
15:54:05 (8460): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
15:54:05 (8460): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0000000003895fb0 , 00007ffdf95b0000]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx1010" by: Advanced Micro Devices, Inc.
Max allocation limit: 2764046336
Global mem size: 4278190080
OpenCL device has FP64 support
% Opening inputfile: ../../projects/einstein.phys.uwm.edu/LATeah2049Lad.dat
% Total amount of photon times: 8950
% Preparing toplist of length: 10
% Read 1631 binary points
read_checkpoint(): Couldn't open file 'LATeah2049Lad_196.0_0_0.0_49973840_1_0.out.cpt': No such file or directory (2)
% fft_size: 16777216 (0x1000000); alloc: 67108872
% Sky point 1/1
% Binary point 1/1631
% Creating FFT plan.
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% Starting semicoherent search over f0 and f1.
% nf1dots: 41 df1dot: 2.512676418e-015 f1dot_start: -1e-013 f1dot_band: 1e-013
% Filling array of photon pairs
Error in computing index of fft input array, i:1038085824 pair:281367
ERROR: prepare_ts_2_phase_diff_sorted() returned with error 20115224
15:54:22 (8460): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags: PRECISION
15:54:34 (8460): [normal]: done. calling boinc_finish(65).
15:54:34 (8460): called boinc_finish
</stderr_txt>
]]>..."
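For anyone skimming the log above, here is a rough Python sketch that pulls out the three numbers that seem to matter: the FFT size, the index the app rejected, and the code passed to boinc_finish. It is only a reading aid. The `summarize` helper is hypothetical, and the idea that the index is supposed to be smaller than fft_size is my assumption, not something the log or the application confirms.

```python
import re

# Lines copied verbatim from the stderr output quoted above.
STDERR_EXCERPT = """\
% fft_size: 16777216 (0x1000000); alloc: 67108872
% Filling array of photon pairs
Error in computing index of fft input array, i:1038085824 pair:281367
ERROR: prepare_ts_2_phase_diff_sorted() returned with error 20115224
15:54:34 (8460): [normal]: done. calling boinc_finish(65).
"""


def summarize(stderr_text):
    """Extract the FFT size, the rejected index, and the exit code (hypothetical helper)."""
    fft = re.search(r"fft_size:\s*(\d+)", stderr_text)
    idx = re.search(r"fft input array, i:(\d+)", stderr_text)
    code = re.search(r"boinc_finish\((\d+)\)", stderr_text)
    return {
        "fft_size": int(fft.group(1)) if fft else None,
        "bad_index": int(idx.group(1)) if idx else None,
        "exit_code": int(code.group(1)) if code else None,
    }


info = summarize(STDERR_EXCERPT)

# Assumption: a valid index would lie in [0, fft_size). The app's real rule may differ.
in_range = 0 <= info["bad_index"] < info["fft_size"]

print(info)
print("index within fft_size:", in_range)
print("exit code in hex:", hex(info["exit_code"]))
```

Under that assumption the rejected index is far larger than the FFT length, and the exit code 65 is the same 0x41 shown in the "Exit status" line.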
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Copyright © 2024 Einstein@Home. All rights reserved.
Drivers missing. Card overheated. Too much overclocking. Card failure.
It "finally" occurred to me that while I was fighting with my 4th GPU over a "Windows shut it down because it reported an error" problem (that GPU is still dead), I used the AMD Windows update tool and installed the second-latest "gamers" driver.
Since I was also getting computation errors on the GPU from PrimeGrid, it had to be "my" problem. I just downloaded the "content creator" version of the latest driver.
Things seem to be working better. I am not (yet) seeing computation errors.
Tom M
I don't know if I want to cuss or swear. It appears that the 4th GPU throwing an error was really a dying 1-to-4 expansion card. Using individual PCIe to USB 3.0 adapters, all 4 GPUs are now running again.
Tom M :)