Einstein FGRPB1G Linux/Nvidia Special app "AIO"

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3952
Credit: 46789022642
RAC: 64210417

Where is the app_info file located? It needs to be in the Einstein project folder.

I think something is wrong with the file, or it is in the wrong place. The latest files on your host are trying to use the project's default app, and I can see that the host is not running as anonymous platform, which means BOINC doesn't see the file. Either the file has an error in it, or it's not in the right place.
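
A minimal sketch of the two checks described above (is the file in the project folder, and does it parse as XML). The function name `check_app_info` is made up for illustration, and this only tests that the file is well-formed XML, not that BOINC will accept its contents.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_app_info(project_dir):
    """Return 'missing', 'malformed', or 'ok' for app_info.xml in project_dir."""
    path = Path(project_dir).expanduser() / "app_info.xml"
    if not path.is_file():
        # BOINC cannot use a file that is not in the project folder.
        return "missing"
    try:
        # Parsing only proves the XML is well-formed.
        ET.parse(str(path))
    except ET.ParseError:
        return "malformed"
    return "ok"
```

Call it with the path to your own Einstein@Home project folder. A result of "ok" still needs to be confirmed in the BOINC event log.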

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 240
Credit: 10560115586
RAC: 24710508

Ian&Steve C. wrote:

Where is the app_info file located? It needs to be in the Einstein project folder.

I think something is wrong with the file, or it is in the wrong place. The latest files on your host are trying to use the project's default app, and I can see that the host is not running as anonymous platform, which means BOINC doesn't see the file. Either the file has an error in it, or it's not in the right place.

I have it in:

~/.var/app/edu.berkeley.BOINC/projects/einstein.phys.uwm.edu

Ian&Steve C.

I have no idea if that's the right location for your system, since I don't run BOINC from a repository install (I compiled a standalone version).

I would restart BOINC for good measure and check your event log to make sure it reports that it picked up the app_info file for Einstein. If it does, wait out your penalty box time and try again tomorrow.

Boca Raton Community HS

I did a few things. I started over to make sure I hadn't made a change that I just didn't remember. With the first error, the host was reporting as anonymous platform, but with the last error it wasn't (after I had made some additional changes).

I now completely understand how the .zip file is organized (which is helpful) and just retried all of the additions and edits.

I restarted everything. Checked the event log and see:

Thu 16 Feb 2023 11:27:54 AM EST | Einstein@Home | Found app_info.xml; using anonymous platform

So, I think that is showing up again. I won't know until tomorrow if that original error is still happening.


Thanks again
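
A small sketch of the event-log check mentioned above, assuming the log lines are pasted in the same "timestamp | project | message" layout as the line quoted in this post. The function name `uses_anonymous_platform` is made up for illustration. Other log formats (for example BOINC's own log files) would need a different split.

```python
MARKER = "Found app_info.xml; using anonymous platform"


def uses_anonymous_platform(log_lines, project="Einstein@Home"):
    """Return True if a log line for `project` reports that app_info.xml was found."""
    for line in log_lines:
        # Split into at most three fields: timestamp, project, message.
        fields = [field.strip() for field in line.split("|", 2)]
        if len(fields) == 3 and fields[1] == project and MARKER in fields[2]:
            return True
    return False
```

If this returns False after a restart, BOINC most likely did not pick up the file, which matches the "default app" symptom described earlier in the thread.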

DF1DX
Joined: 14 Aug 10
Posts: 105
Credit: 3854276854
RAC: 4916811

Gary Roberts wrote:

Out of interest, I had a quick look at the full tasks list for that machine and saw that (at the time I looked) there were 1466 pendings, 5009 valids, 588 invalids along with 3 errors.  The high invalids ratio (~1 for every 8.5 valid) quite surprised me.

I'm wondering if other users of the AIO app also get high invalid rates or if the above example is some sort of other issue?  I don't have any nvidia GPUs so I haven't been paying much attention to AIO app results.  My hosts using GCN generation GPUs tend to get about 1 invalid for something like 100 valid.  My host under test has the above single invalid along with 86 valid and 49 pending and no errors, so it looks OK at the moment.

I've been testing my new RTX 4090 with the AIO app for a few days now, and I'm also seeing about 10% invalid WUs, mostly when the other two results are calculated by AMD GPUs.

Not much difference with version 0.95 or 1.0.
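
To put the two ways of quoting invalid rates on the same footing, here is a short sketch using the task counts from the quoted post. The function names are made up for illustration, and leaving pending tasks out of the fraction is my own assumption.

```python
def valids_per_invalid(valid, invalid):
    """Number of valid results for each invalid one."""
    if invalid == 0:
        return float("inf")
    return valid / invalid


def invalid_fraction(valid, invalid):
    """Share of decided results (valid + invalid) that were invalid."""
    decided = valid + invalid
    return invalid / decided if decided else 0.0


# Counts quoted above: 5009 valid and 588 invalid (pending tasks ignored).
print(round(valids_per_invalid(5009, 588), 1))  # 8.5
print(f"{invalid_fraction(5009, 588):.1%}")     # 10.5%
# Gary Roberts' host under test: 86 valid and 1 invalid.
print(f"{invalid_fraction(86, 1):.1%}")         # 1.1%
```

So "1 for every 8.5 valid" and "about 10% invalid" describe roughly the same rate.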

The GPU is currently running with a 200 W power limit and a maximum clock of 3000 MHz. Here in Germany, electric power is very expensive.

sudo nvidia-smi -pl 200

sudo nvidia-smi -lgc 210,3000

My host.

Ian&Steve C.

I've noticed high invalids with other users who have 4090s also.

I don't have one, so I'm not sure of the exact reason. Maybe some of the specific tuning variables aren't optimal for the 4090's architecture.

You might try downclocking the memory a bit to see if that helps. Maybe back it off to ~20 Gbps? Just a guess.

You can try playing with some of the kernel tuning variables in the EAH_SLEEP file, but do so at your own risk. I have no idea which way they should be adjusted, or what will cause more or fewer errors. You'd have to find out by trial and error.

As a reminder, development of this application stopped some time ago. There will not be any updates. Use it as-is until the FGRPB1G tasks are gone (~8 months by current estimates).

Boca Raton Community HS

I retried this morning and am getting a different error (host: this host).

It looks like the application starts but then hits an error with the output?

Any help would be appreciated.

Thanks!

Stderr output

<core_client_version>7.22.0</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
07:36:27 (178): [normal]: This Einstein@home App (v1.0 by petri33) was built at: Apr 28 2022 18:47:15

07:36:27 (178): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0'.
07:36:27 (178): [debug]: 1e+16 fp, 5.3e+09 fp/s, 1992545 s, 553h29m04s87
07:36:27 (178): [normal]: % CPU usage: 1.000000, GPU usage: 1.000000
command line: ../../projects/einstein.phys.uwm.edu/HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0 --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4021L04.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 8.726650e-08 --ldiBins 30 --f0start 572.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.413729381e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4021L04_0580_6167049.dat --debug 0 -o LATeah4021L04_580.0_0_0.0_6167049_0_0.out
output files: 'LATeah4021L04_580.0_0_0.0_6167049_0_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4021L04_580.0_0_0.0_6167049_0_0' 'LATeah4021L04_580.0_0_0.0_6167049_0_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4021L04_580.0_0_0.0_6167049_0_1'
07:36:27 (178): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
07:36:27 (178): [debug]: glibc version/release: 2.35/stable
07:36:27 (178): [debug]: Set up communication with graphics process.
EAH_SLEEP file found, value 0

kernel_compact 256 threads
kernel_raz 256 threads
kernel_ts_2_phase_diff_sorted 64 threads
kernel_prepare_power_toplist 256 threads
kernel_prepareSort 1024 threads
kernel_SortedPhoton 64 threads
kernel_setupPhotonPairsArray 64 threads
kernel_extractPhotonIndex 512 threads
Eah sleep true, 0
boinc_get_opencl_ids returned [0x55e457182a20 , 0x55e457182950]
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "NVIDIA GeForce RTX 4090" by: NVIDIA Corporation
Max allocation limit: 6347669504
Global mem size: 25390678016
OpenCL device has FP64 support
SemiCoh mode 0 start
skypoints(1)read_checkpoint(): Couldn't open file 'LATeah4021L04_580.0_0_0.0_6167049_0_0.out.cpt': No such file or directory (2)
skypoint loop(1)
S0:dpleph[initephem]: Cannot open file .405, result = 104
dpleph[state]: Time 2454683.289515 outside range of ephemeris
dpleph[state]: Time 2454683.289515 outside range of ephemeris

-- signal handler called: signal 1
9 stack frames obtained for this thread:

End of stcaktrace
07:36:27 (178): called boinc_finish(11)

</stderr_txt>
]]>

Ian&Steve C.

Can you run ldd against the executable to make sure you're not missing any dependencies?

In a terminal, go to the folder where the binary is located and run this:

ldd HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0

and post the output.

Ian&Steve C.

Oh, actually, you have some other problem. I've seen this before. Your JPLEPH.405 file seems to be damaged or corrupted. Try copying that file from another system to this one. It's in the Einstein project folder.
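
One way to check whether the local copy really differs from a known-good one is to compare checksums before and after copying. This is only a sketch: the function names are made up for illustration, and SHA-256 is my own choice of hash, not something the project publishes.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True if both files have identical contents (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A matching checksum only rules out a damaged copy. It does not prove that the app can find or open the file.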

Boca Raton Community HS

I replaced the file, but I get the same problem.

Ran ldd:

ldd HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0
    linux-vdso.so.1 (0x00007ffd6ef6c000)
    libOpenCL.so.1 => /lib/x86_64-linux-gnu/libOpenCL.so.1 (0x00007f9cbeda1000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9cbed9c000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9cbed97000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9cbecb0000)
    libmvec.so.1 => /lib/x86_64-linux-gnu/libmvec.so.1 (0x00007f9cbebb3000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9cbe989000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f9cbf997000)
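
For longer ldd listings, a quick way to spot missing dependencies is to look for lines ending in "not found", which is how ldd reports a library it cannot resolve. A minimal sketch (the function name `missing_libraries` is made up for illustration):

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        if "=>" in line and line.strip().endswith("not found"):
            missing.append(line.split("=>")[0].strip())
    return missing
```

Applied to the output above it returns an empty list, so every listed library was resolved on this host.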
