All things Nvidia GPU

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18743222502
RAC: 7012666

Are you sure you didn't change the defaults at the beginning and forget you did?

I have never seen a new BOINC installation with those defaults. Never.

Unless something has changed in the code that I am not aware of.

I regularly read the commits and merges at the BOINC GitHub repository, and I don't remember reading anything about changing the client to default to 0.01 days of work and 0.01 days of additional work.

I guess I'll have to visit the site again and search for this.

 

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 409
Credit: 10205253455
RAC: 23303095

... didn't want to "create" work for you ...

Maybe I'm getting too old for this kind of fiddling around.

I tend to believe what you are saying; sometimes I have trouble remembering the basics ...

What do I conclude from this?

Well, NO MORE POSTS from me, I guess!

Cheers

petri33
Joined: 4 Mar 20
Posts: 123
Credit: 4052905819
RAC: 6966228

Hi,

My normal runtime for a task is 124 to 179 seconds, depending on which GPU it runs on.

Tonight I got a task that took 1200+ seconds to finish.

I have never seen a line like any of these:

% C 0 154

% C 0 309

% C 0 463

% C 0 615

% C 0 768

% C 0 921

 

Are they "candidates" to be verified or processing errors? Why the extra time?

--

petri33

 

P.S.

Here is the complete output of the task:

"

Task 1191519630

Name: LATeah4013L01_1124.0_0_0.0_9033365_1

Workunit ID: 587140640

Created: 14 Nov 2021 21:43:52 UTC

Sent: 14 Nov 2021 21:58:15 UTC

Report deadline: 28 Nov 2021 21:58:15 UTC

Received: 15 Nov 2021 10:33:39 UTC

Server state: Over

Outcome: Success

Client state: Done

Exit status: 0 (0x00000000)

Computer: 12836077

Run time (sec): 1,266.08

CPU time (sec): 269.10

Peak working set size (MB): 420.33

Peak swap size (MB): 19050.06

Peak disk usage (MB): 0.02

Validation state: Valid

Granted credit: 3,465

Application: Gamma-ray pulsar binary search #1 on GPUs v1.28 (FGRPopencl2Pup-nvidia)
x86_64-pc-linux-gnu


Stderr output

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
12:12:26 (26533): [normal]: This Einstein@home App was built at: Aug 17 2021 16:19:40

12:12:26 (26533): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.28_x86_64-pc-linux-gnu__FGRPopencl2Pup-nvidia'.
12:12:26 (26533): [debug]: 1e+16 fp, 6.1e+09 fp/s, 1710426 s, 475h07m05s57
12:12:26 (26533): [normal]: % CPU usage: 1.000000, GPU usage: 0.130000
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.28_x86_64-pc-linux-gnu__FGRPopencl2Pup-nvidia --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4013L01.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 2.617990e-08 --ldiBins 30 --f0start 1116.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.713401242e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4013L01_1124_9033365.dat --debug 0 -o LATeah4013L01_1124.0_0_0.0_9033365_1_0.out
output files: 'LATeah4013L01_1124.0_0_0.0_9033365_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4013L01_1124.0_0_0.0_9033365_1_0' 'LATeah4013L01_1124.0_0_0.0_9033365_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4013L01_1124.0_0_0.0_9033365_1_1'
12:12:26 (26533): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
12:12:26 (26533): [debug]: glibc version/release: 2.31/stable
12:12:26 (26533): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x2c72ea0 , 0x2c72ca0]
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "NVIDIA GeForce RTX 2080 Ti" by: NVIDIA Corporation
Max allocation limit: 2888679424
Global mem size: 11554717696
read_checkpoint(): Couldn't open file 'LATeah4013L01_1124.0_0_0.0_9033365_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% C 0 154
% C 0 309
% C 0 463
% C 0 615
% C 0 768
% C 0 921
FPU status flags:
12:33:30 (26533): [normal]: done. calling boinc_finish(0).
12:33:30 (26533): called boinc_finish(0)

</stderr_txt>
]]>




"
Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18743222502
RAC: 7012666

The % C lines are checkpoints written at various stages of the computation. On the faster devices a task normally goes straight through to the end and prints a single checkpoint line.

But on slower devices you get multiple checkpoints. I believe each checkpoint is written when BOINC steps or switches away from crunching the task.

For example, on my 3080 I got one checkpoint:

Using OpenCL device "NVIDIA GeForce RTX 3080" by: NVIDIA Corporation
Max allocation limit: 2626174976
Global mem size: 10504699904
read_checkpoint(): Couldn't open file 'LATeah4013L02_940.0_0_0.0_5767805_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% C 0 939
FPU status flags: 
17:06:47 (3969000): [normal]: done. calling boinc_finish(0).
17:06:47 (3969000): called boinc_finish(0)

Task 1193178428

But on my Raspberry Pi 4 I got dozens of checkpoints.

% checkpoint read: skypoint 67 binarypoint 9
% C 68 10
% C 69 11
05:31:34 (6565): [normal]: done. calling boinc_finish(0).
05:31:34 (6565): called boinc_finish(0)

Task 1198454623
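To compare the `% C` lines across several tasks, a small parser helps. What follows is a minimal Python sketch, not anything taken from the Einstein@home app: it assumes the two numbers on a `% C` line are the sky point and binary point indices (inferred from the `% checkpoint read: skypoint 67 binarypoint 9` line above, not from documentation), and the function names are hypothetical. It only counts checkpoints and measures their spacing; it says nothing about what triggered each one.

```python
import re

# Assumption: the two numbers on a "% C" line are (skypoint, binarypoint),
# inferred from the "% checkpoint read: skypoint 67 binarypoint 9" line.
CKPT = re.compile(r"^% C (\d+) (\d+)$")


def checkpoints(stderr_text):
    """Return the (skypoint, binarypoint) pair of every '% C' line, in order."""
    points = []
    for line in stderr_text.splitlines():
        match = CKPT.match(line.strip())
        if match:
            points.append((int(match.group(1)), int(match.group(2))))
    return points


def binarypoint_steps(points):
    """Binary points advanced between consecutive checkpoints on the same skypoint."""
    return [b2 - b1
            for (s1, b1), (s2, b2) in zip(points, points[1:])
            if s1 == s2]


# Lines copied from petri33's stderr output above.
petri_log = """% fft length: 16777216 (0x1000000)
% C 0 154
% C 0 309
% C 0 463
% C 0 615
% C 0 768
% C 0 921
FPU status flags:"""

pts = checkpoints(petri_log)
print(len(pts), binarypoint_steps(pts))  # 6 [155, 154, 152, 153, 153]
```

On petri33's log it finds six checkpoints, all on sky point 0 and 152 to 155 binary points apart; on the 3080 log it finds the single pair `(0, 939)`.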

 

Tomahawk4196
Joined: 31 Jan 14
Posts: 11
Credit: 2250718702
RAC: 2465588

RTX 3080 - memory temps

CPUID HW Monitor says the memory chips on my 3080 are hitting 108 °C, which of course is not good for the life of the card.

There is much discussion on various sites about replacing the thermal transfer pads between the memory modules and the heat sink. Is this something I should have done a while ago? Or can I tweak Einstein@home somehow to reduce the load on the card?

Thanks again


Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18743222502
RAC: 7012666

No, there is not much you can tweak on the card other than to declock it so it doesn't work so hard on Einstein tasks.

If you were worried about the memory temps, you should have chosen a different card, or done as you said: take off the heat sink and replace the thermal pads with better-quality ones than the OEM pads.

Or gone with a water-cooled card, either an AIO hybrid or a custom cooling model.

I'm surprised by the memory temps, since AFAIK the 3080 does not have any memory on the back of the PCB, unlike the 3090, which has had these high temps on its backside modules. I hadn't seen many reports of high temps on the front-side modules.

It might be worth taking the air cooler off the card and checking the fit of the cooler against the die and RAM modules. You should see obvious indents in the pads. Better-quality thermal pads with better heat transfer characteristics are available. I am a fan of FujiPoly pads myself.
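Declocking is one way to make the card work less hard on Einstein tasks. A related but different knob is the board power limit, which `nvidia-smi` can lower with `-pl` from a root shell (an administrator prompt on Windows). The minimal Python sketch below assumes the standard `power.limit`, `power.min_limit` and `power.max_limit` query fields; the helper names and the 0.8 fraction are hypothetical, and whether a lower power limit brings the memory temperatures down on a particular card is not something this thread establishes.

```python
import subprocess

# Note: this lowers the power limit, which is not the same as declocking.
QUERY = ["--query-gpu=power.limit,power.min_limit,power.max_limit",
         "--format=csv,noheader,nounits"]


def parse_limits(csv_line):
    """Parse 'current, min, max' (watts) from one nvidia-smi CSV row."""
    current, low, high = (float(field) for field in csv_line.split(","))
    return current, low, high


def clamp_watts(target, low, high):
    """Keep the requested limit inside the range the board reports."""
    return max(low, min(high, target))


def set_limit_cmd(gpu_index, watts):
    """Command that sets the power limit in whole watts (needs root/admin)."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(round(watts))]


def reduce_power(gpu_index=0, fraction=0.8):
    """Hypothetical helper: query the limits, then apply fraction * current limit."""
    out = subprocess.run(["nvidia-smi", "-i", str(gpu_index), *QUERY],
                         capture_output=True, text=True, check=True).stdout
    current, low, high = parse_limits(out.strip().splitlines()[0])
    watts = clamp_watts(current * fraction, low, high)
    subprocess.run(set_limit_cmd(gpu_index, watts), check=True)
    return watts
```

Calling `reduce_power(0)` applies the new limit to GPU 0, and `nvidia-smi -q -d POWER` shows the result. A limit set this way typically has to be reapplied after a reboot.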

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3956
Credit: 46964962642
RAC: 64684989

Which model of 3080 do you have? Nvidia Founders Edition? Or some AIB (3rd party) model? 
 

The Nvidia FE cards are known for memory temp issues, even on the 3080.

I have two EVGA 3070 Ti cards with GDDR6X memory, and I don’t seem to be having any issues, but I can’t really check memory temps under Linux. My water-cooled 3080 Ti showed about 60 °C memory temps when booted into Windows under memory-intensive loads.

_________________________________________________________________________

Tomahawk4196
Joined: 31 Jan 14
Posts: 11
Credit: 2250718702
RAC: 2465588

The machine in question is a Dell Alienware R11 with liquid cooling, and I use it only for crunching: no mining or gaming at all. The CPU is also running too hot (crunching WCG), so I will replace the thermal paste on it to see if things improve.

This video discusses changing the thermal pads on the GPU, which the author claims is essentially a Founders Edition: https://www.youtube.com/watch?v=bpmYlk4dnys

Edit:  This video is better, it includes the addition of a Noctua case fan: https://www.youtube.com/watch?v=YklybEdoKIM

Wish me luck, the good kind, plz

Thank you

 


Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18743222502
RAC: 7012666

All the reviews I have read of that case and system say that it runs hot and loud.

It's a crappy case that restricts airflow.

If you are tearing apart the GPU to repaste it anyway, you should move the components to a better case that lets them shed their heat outside the case.

 

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18743222502
RAC: 7012666

That YT video was for the R12, released a year later. I hope your R11 build is identical where your modifications are concerned.

 
