* The problem with missing result files was also addressed and should not happen for tasks that are sent out today.
2016-02-13 02:28:45.4898 (5888) [normal]: Finished main analysis.
2016-02-13 02:28:45.4898 (5888) [normal]: Recalculating statistics for the final toplist...
2016-02-13 02:37:09.0030 (4548) [normal]: This program is published under the GNU General Public License, version 2
2016-02-13 02:37:09.0030 (4548) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2016-02-13 02:37:09.0030 (4548) [normal]: This Einstein@home App was built at: Feb 11 2016 16:21:10
I browsed around a little among quorum partners, and soon found that my Stoll8 host is not alone in having the new (to me) 114 (0x72) exit status with data gap or overlap.
somebody else error task 1
somebody else error task 2
somebody else error task 3
These three are all from one host owned by user Thomander
Also, a single WU has generated this type of error on at least three different hosts:
same WU on host 1
same WU on host 2
same WU on host 3
Possibly this hints that a batch of WUs was formed in a way that is incompatible with at least a subset of currently active hosts.
archae86, see Christian's second bullet point in the message right before your two consecutive messages about error 114: it was caused by a misconfiguration in the work generator that has now been fixed. Already generated tasks have to go through the system but should clear pretty quickly.
Holmis--regarding error 114, I see now. Unfortunately too late to edit my useless posts.
Thanks.
The first of my three trial tuning files run today appeared to progress normally for ten hours, paused briefly at the indicated 99.000% completion point, then errored.
Perhaps this is just the known missing result file problem.
The WU was created 11 Feb 2016, 12:56:55 UTC
The Task was created 12 Feb 2016, 8:59:41 UTC
Here is some text from the end of stderr:
h1_0029.00_O1C01Cl1In1__O1AS20-100T_29.05Hz_246_3_2
-161 (not found)
Here is a link to the task page
The second of my three hosts to finish a trial tuning job obtained after the first batch of fixes went in ran to 99% with normal-looking progress, then hung at exactly 99.000% claimed completion for a little under five minutes, then completed with error status, with two files indicated as "not found":
h1_0021.80_O1C01Cl1In1__O1AS20-100T_21.85Hz_171_2_2
-161 (not found)
Over on the Technical News thread, both Betreger and robl have reported failures that look similar, and I found more just by back-tracing quorum partners to find hosts actually running this beta. What I did not find were any apparently successful completions, though it is early days. For myself, I plan to wait out the weekend before trying again, unless I see a post offering better hope.
The problem with the missing result files persisted for an unknown reason. I stopped distribution of O1AS20-100T tasks again until we can assess the situation on Monday.
Yes, my first task just errored out
Here is the link
https://einsteinathome.org/task/545187574
and here is the last portion of the stderr
1 more in progress but I think it will probably do the same.
Thanks Christian
Zalster
Zalster: The result log you quoted shows a problem on your host, or at least a very unlucky coincidence. The app was preempted just while creating the output files. When it restarted, it couldn't resume from this state, so it failed. I'll send this along to the app developers and they'll look into making this more robust.
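For readers wondering what "more robust" could look like, here is a minimal sketch of one common general technique: write the output to a temporary file and rename it into place only when it is complete, so a preemption never leaves a half-written result behind. This is an illustration only and an assumption on my part; it is not taken from the Einstein@Home app, and the function name write_atomically is hypothetical.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to `path` so a partial file is never visible under that name.

    Hypothetical helper for illustration; not part of any BOINC or
    Einstein@Home code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # os.replace swaps the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temporary file and re-raise.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process is stopped before the rename, the next start sees either no output file or the old complete one, and can simply redo the write.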
I will repost here since my first post was in the wrong thread:
FYI:
I had an "01" job with runtime/cputime ~38500
It failed with:
116.......c
.....................................c
................................c
.................................c
..................................c
..............................c
.....................................c
........
2016-02-11 22:19:22.1356 (19492) [normal]: Finished main analysis.
2016-02-11 22:19:22.1356 (19492) [normal]: Recalculating statistics for the final toplist...
2016-02-11 22:21:58.9230 (19492) [normal]: Finished recalculating toplist statistics.
2016-02-11 22:21:58.9230 (19492) [debug]: Writing output ... toplist2 ... toplist3 ... done.
FPU status flags: COND_3 PRECISION
2016-02-11 22:21:59.6677 (19492) [normal]: done. calling boinc_finish(0).
22:21:59 (19492): called boinc_finish
upload failure:
h1_0024.55_O1C01Cl1In1__O1AS20-100T_24.6Hz_171_1_1
-161 (not found)
h1_0024.55_O1C01Cl1In1__O1AS20-100T_24.6Hz_171_1_2
-161 (not found)
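For anyone comparing several failed tasks, a small script can pull the missing file names out of a saved stderr dump. This is only a sketch: it assumes the layout seen in the logs quoted in this thread, where each file name is followed on the next line by "-161 (not found)", and the function name missing_uploads is hypothetical.

```python
def missing_uploads(stderr_text):
    """Return output file names that are followed by '-161 (not found)'.

    Assumes the two-line layout from the stderr excerpts in this thread:
    a file name on one line, the status on the next.
    """
    lines = [line.strip() for line in stderr_text.splitlines()]
    missing = []
    # Walk over consecutive line pairs: (candidate name, candidate status).
    for name, status in zip(lines, lines[1:]):
        if status == "-161 (not found)" and name and not name.startswith("-"):
            missing.append(name)
    return missing
```

Run on the "upload failure:" section above, it would return the two h1_0024.55 file names.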
This should be fixed with the new app versions 1.02 that I built and published yesterday. The remaining "tasks to send" will be distributed, more work will be created on Monday.
BM