Can someone explain this to me...

ExaGroup
Joined: 3 Apr 20
Posts: 14
Credit: 861161110
RAC: 1
Topic 224552

Hello everyone, some of my PCs are giving me this problem:

Stderr output

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 68 (0x44, -188)
</message>
<stderr_txt>
21:15:40 (9474): [normal]: This Einstein@home App was built at: Jul 26 2017 11:32:40

21:15:40 (9474): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_x86_64-pc-linux-gnu__FGRPSSE'.
21:15:40 (9474): [debug]: 2.1e+15 fp, 2.1e+09 fp/s, 1022507 s, 284h01m47s25
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_x86_64-pc-linux-gnu__FGRPSSE --inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 --alpha 2.1039176188 --delta -0.9808959836 --skyRadius 0.001361356817 --ldiBins 15 --f0start 1048 --f0Band 16 --firstSkyPoint 400954 --numSkyPoints 58 --f1dot -1.0e-13 --f1dotBand 1.0e-13 --df1dot 1.344493449e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 4194304.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 56757.0 --f0orbit 0.005 --freeRadiusFactor 2 --mismatch 0.15 --debug 0 -o LATeah1075F_1064.0_400954_0.0_0_0.out
output files: 'LATeah1075F_1064.0_400954_0.0_0_0.out' '../../projects/einstein.phys.uwm.edu/LATeah1075F_1064.0_400954_0.0_0_0' 'LATeah1075F_1064.0_400954_0.0_0_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah1075F_1064.0_400954_0.0_0_1'
21:15:40 (9474): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
21:15:40 (9474): [debug]: glibc version/release: 2.24/stable
21:15:40 (9474): [debug]: Set up communication with graphics process.
Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged.
21:15:40 (9474): [CRITICAL]: ERROR: MAIN() returned with error '4'
FPU status flags:
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out.cohfu': No such file or directory
mv: cannot stat 'LATeah1075F_1064.0_400954_0.0_0_0.out.cohfu': No such file or directory
21:15:51 (9474): [normal]: done. calling boinc_finish(68).
21:15:51 (9474): called boinc_finish

</stderr_txt>
]]>

I have re-downloaded "http://einstein6.aei.uni-hannover.de/EinsteinAtHome/download/29/JPLEPH.405" and verified its MD5 against the value in "/var/lib/boinc-client/client_state.xml". The file is good. What can I do to solve the problem?
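For anyone wanting to repeat the MD5 check described above, here is a minimal Python sketch. It only hashes a file and compares the result with a hex string that you supply yourself (for example, the value you read from client_state.xml). The function names `file_md5` and `matches_expected` are made up for this illustration and are not part of BOINC.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_md5: str) -> bool:
    """Compare the file's digest with an expected hex string (case-insensitive)."""
    return file_md5(path) == expected_md5.strip().lower()
```

Call it as `matches_expected("JPLEPH.405", expected)` with `expected` set to the checksum string you copied by hand. It is best to stop BOINC first so the file is not in use while you check it.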

mikey
Joined: 22 Jan 05
Posts: 12780
Credit: 1867389624
RAC: 1832505

ExaGroup wrote:

Hello everyone, some of my PCs are giving me this problem:

Stderr output

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 68 (0x44, -188)
</message>
<stderr_txt>
21:15:40 (9474): [normal]: This Einstein@home App was built at: Jul 26 2017 11:32:40

I have re-downloaded "http://einstein6.aei.uni-hannover.de/EinsteinAtHome/download/29/JPLEPH.405" and verified its MD5 against the value in "/var/lib/boinc-client/client_state.xml". The file is good. What can I do to solve the problem?


First, it would help a lot if you would unhide your computers so the experts can see what is really going on. Click on my name and then 'show computers' and you will see everything that anyone could see about yours: no names, just the facts, along with the tasks you are running and the outcome of each of them.

ExaGroup
Joined: 3 Apr 20
Posts: 14
Credit: 861161110
RAC: 1

Oh yes, sure. I have unhidden them right now. Sorry, mikey.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118369062192
RAC: 25563906

ExaGroup wrote:

Hello everyone, some of my PCs are giving me this problem:

....

Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged.

You have 91 hosts in your full list so it would help if you provide a link to the particular host that gave this error message.  Does more than one host show the same problem?  Do the failures happen intermittently or does every task fail?

Many years ago, I had an example where BOINC would decide (intermittently) that this same file was corrupt.  With BOINC stopped, running an MD5 check said that the file was OK.  I decided to replace it anyway.  My initial assumption was that perhaps it was sitting on a particular part of the disk that contained an intermittent bad block.  I renamed the file JPLEPH.BAD so it wouldn't move and that way the replacement would occupy a different part of the disk.  For a while, everything seemed OK but then the intermittent failures returned.

After much tearing of (non-existent) hair, I eventually did a full and exhaustive RAM test which revealed a bad memory location.  After replacing the stick, the problem went away completely.  My final assumption was that when BOINC does the MD5 checks of important files like this one, it (by chance) happened to hit the bad memory location intermittently, causing the check to fail.

I seem to remember that my example actually mentioned (in the log) that the MD5 check failed.  The above message doesn't actually say that so it could be something other than the MD5 check.  For reference, here is the full ls -l listing for that file from one of my hosts.  Is yours exactly the same number of bytes?

-rw-r--r-- 1 gary gary 9319680 Oct  5  2014 JPLEPH.405

Cheers,
Gary.
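The two checks suggested in this post (same byte count, and a checksum that may fail only intermittently) can be sketched in Python as follows. This is only an illustration: `size_matches` and `repeated_digests` are hypothetical helper names, the expected size is the one from the ls -l listing above, and hashing in a loop is no substitute for the full RAM test described in the post.

```python
import hashlib
import os
from collections import Counter

EXPECTED_SIZE = 9319680  # bytes, taken from the ls -l listing above


def size_matches(path: str, expected: int = EXPECTED_SIZE) -> bool:
    """True if the file has exactly the expected number of bytes."""
    return os.path.getsize(path) == expected


def repeated_digests(path: str, runs: int = 20) -> Counter:
    """Hash the file `runs` times and count how often each digest appears."""
    seen = Counter()
    for _ in range(runs):
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        seen[digest.hexdigest()] += 1
    return seen
```

If `repeated_digests("JPLEPH.405")` ever returns more than one distinct digest for a file that nothing is writing to, that would point towards a hardware problem (disk or memory) rather than a bad download.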

ExaGroup
Joined: 3 Apr 20
Posts: 14
Credit: 861161110
RAC: 1

I have more than one host with the problem, and unfortunately the problem is intermittent...

I have 71 VMs and another 20 physical machines working on this project (plus another 20 outside the project that run other things ;-) ).

I haven't checked how many have problems (no free time to do it at the moment). Here are the first three:
https://einsteinathome.org/it-it/task/1047078615
https://einsteinathome.org/it-it/task/1047078719
https://einsteinathome.org/it-it/task/1047079004

On those three I have verified the JPLEPH.405 file: it has the correct MD5 and "exactly the same number of bytes".

Thank you, Gary, for the help.

ExaGroup
Joined: 3 Apr 20
Posts: 14
Credit: 861161110
RAC: 1

I may have found something. From my list, I picked 3 Linux VMs and 3 physical Win2003 machines that have the problem.

All of them report "Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged."

I checked the MD5 and byte count of JPLEPH.405 on each, and they are OK.

But the Stderr output reads:
<message>
process exited with code 68 (0x44, -188)
</message>
for the Linux boxes,

while for the Win2003 boxes it reads:
<message>
The name limit for the local computer network adapter card was exceeded.
 (0x44) - exit code 68 (0x44)</message>
<stderr_txt>

The messages seem to be equivalent,

so I have increased the dynamic port range on all 6 machines, and now I wait.. ;-)
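A note on the two messages above: both show the same exit code, 68 (0x44). On Windows, system error number 68 happens to carry the text "The name limit for the local computer network adapter card was exceeded", so the network wording is most likely just a generic lookup of the number 68 and not a sign of a real network problem (this is an assumption, the log itself doesn't say so). The small Python sketch below only shows how the three numbers in the Linux message relate to each other; `describe_exit_code` is a made-up name for this illustration.

```python
def describe_exit_code(code: int) -> str:
    """Format an exit code as decimal, hex, and its low byte minus 256."""
    low_byte = code & 0xFF
    return f"{code} (0x{low_byte:x}, {low_byte - 256})"


print(describe_exit_code(68))  # prints: 68 (0x44, -188)
```

In other words, "68", "0x44" and "-188" are three views of one value, and the application, not the network stack, chose that value when it hit the "Line 1 in inputfile ... seems to be damaged" error.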

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4463
Credit: 3264386494
RAC: 1897879

We had a similar incident just before Christmas. See here: https://einsteinathome.org/content/cpu-tasks-error-out-after-12-seconds

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118369062192
RAC: 25563906

Harri Liljeroos wrote:

We had a similar incident just before Christmas. See here: https://einsteinathome.org/content/cpu-tasks-error-out-after-12-seconds

Thanks for posting the link.  That incident didn't affect me and I'd completely forgotten about it.

I'd been checking the tasks on one of the OP's hosts and noticed that all the failures dated back to around December 20 - so obviously the same issue.  All the current results were being completed successfully so the OP mustn't have noticed how long ago the problem actually was and that everything was now fine :-).

Cheers,
Gary.

ExaGroup
Joined: 3 Apr 20
Posts: 14
Credit: 861161110
RAC: 1

Well, thanks Harri and Gary. Yup, I hadn't noticed the dates. Sorry.
