Immediate Computation error with Gravitational Wave search O1 all-sky tuning v1.00

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 196990024
RAC: 186975

Another Update:

The 1.02 apps solve the missing result file problem (upload failure -161), and we are now receiving all of the result files. The validator is already running, and we are keeping a lookout for any validation errors.

We will grant credit to all those who suffered from the upload failure later this week.

There will be an update to 1.03 shortly that fixes some problems with checkpointing that we found.

I'm also going to generate more work after the apps are updated so your machines can keep busy.

We are aware that runtimes seem to be "off the scale". But this was somewhat expected, and it lets us tune the main search. The runtimes on a given host seem to be consistent. Why some hosts take 6h and some 24h, we don't know yet. I will dig into that when there are enough successful results available to compile proper statistics.

If you find new problems with the 1.03 version please open a new thread in Problems and Bug Reports.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 777255030
RAC: 1213021

Hi,

Zalster wrote:

Quote:

Yes, my first task just errored out

Here is the link

https://einsteinathome.org/task/545187574

and here is the last portion of the stderr

Quote:

[...]
2016-02-13 02:37:16.8655 (4548) [CRITICAL]: Checksum error: -6272615
% --- Cpt:25506, total:25506, sky:118/117, f1dot:1/218
[...]

This was caused by a bug in the checkpointing code that we will fix in the 1.03 app version, as Christian just mentioned.

It is actually unusual for this bug to cause a computation error. Unfortunately, it also affects many results that ran to completion: they were uploaded but will fail to validate. If some of your results are now showing an "inconclusive validation", this is the most likely reason. Sorry for the inconvenience.
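For anyone who wants to check their own stderr output for the same symptom, here is a minimal Python sketch that pulls `[CRITICAL]` messages and checkpoint progress lines out of a log. It assumes the two line layouts seen in the excerpt quoted above; the function name `scan_stderr` and the dictionary keys are made up for illustration and are not the application's own names for these fields.

```python
import re

# Sample lines copied from the stderr excerpt quoted above.
STDERR_SAMPLE = """\
2016-02-13 02:37:16.8655 (4548) [CRITICAL]: Checksum error: -6272615
% --- Cpt:25506, total:25506, sky:118/117, f1dot:1/218
"""

# "<date> <time> (<pid>) [<level>]: <message>"
LOG_RE = re.compile(r"^(\S+ \S+) \((\d+)\) \[(\w+)\]: (.*)$")
# "% --- Cpt:N, total:N, sky:N/N, f1dot:N/N"
CPT_RE = re.compile(
    r"^% --- Cpt:(\d+), total:(\d+), sky:(\d+)/(\d+), f1dot:(\d+)/(\d+)$"
)

# Hypothetical labels for the six numbers on a checkpoint line.
CPT_KEYS = ("cpt", "total", "sky", "sky_total", "f1dot", "f1dot_total")


def scan_stderr(text):
    """Return (critical_messages, checkpoint_records) found in stderr text."""
    criticals, checkpoints = [], []
    for raw in text.splitlines():
        line = raw.strip()
        m = LOG_RE.match(line)
        if m and m.group(3) == "CRITICAL":
            criticals.append(m.group(4))
            continue
        m = CPT_RE.match(line)
        if m:
            checkpoints.append(dict(zip(CPT_KEYS, map(int, m.groups()))))
    return criticals, checkpoints


criticals, checkpoints = scan_stderr(STDERR_SAMPLE)
print(criticals)
print(checkpoints)
```

To use it on a real task, paste the stderr text from the task page into a string (or read it from a saved file) and pass it to `scan_stderr`.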

HB

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

The first 1.03 task crashed on my Windows 10 PC. I now have two waiting on SUN WS with Opteron 1210 CPU and SuSE Leap 42.1 64-bit OS.
Tullio

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7303798356
RAC: 2278155

Quote:
The first 1.03 task crashed on my Windows 10 PC. I now have two waiting on SUN WS with Opteron 1210 CPU and SuSE Leap 42.1 64-bit OS.
Tullio


Tullio,
In the stderr file for your failed task, the first few lines after task startup end with something possibly interesting that differs from my successful ones:
yours

2016-02-17 20:41:14.2774 (6556) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O1AS20-100T_1.03_windows_x86_64__AVX.exe'.
Activated exception handling...
2016-02-17 20:41:14.2793 (6556) [debug]: BSGL output files
2016-02-17 20:41:14.4336 (6556) [debug]: Flags: LAL_NDEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2016-02-17 20:41:14.4346 (6556) [normal]: WARNING: Resultfile 'GCT.out' present - doing nothing

mine

2016-02-17 00:29:48.8779 (57768) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O1AS20-100T_1.03_windows_x86_64__AVX.exe'.
Activated exception handling...
2016-02-17 00:29:48.8817 (57768) [debug]: BSGL output files
2016-02-17 00:29:48.9017 (57768) [debug]: Flags: LAL_NDEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2016-02-17 00:29:48.9042 (57768) [debug]: Set up communication with graphics process.

I have no idea what a GCT.out file might be doing there, but possibly this is a clue to your problem. It may also be completely insignificant to the problem.
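The same comparison can be done mechanically. Below is a minimal Python sketch, assuming every timestamped line starts with `date time (pid) ` as in the two excerpts above; it strips that prefix and reports the lines that appear in only one of the logs. The helper names `normalize` and `startup_diff` are hypothetical.

```python
import re

# Startup lines copied from the two stderr excerpts above.
FAILED_LOG = """\
2016-02-17 20:41:14.2774 (6556) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O1AS20-100T_1.03_windows_x86_64__AVX.exe'.
Activated exception handling...
2016-02-17 20:41:14.2793 (6556) [debug]: BSGL output files
2016-02-17 20:41:14.4336 (6556) [debug]: Flags: LAL_NDEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2016-02-17 20:41:14.4346 (6556) [normal]: WARNING: Resultfile 'GCT.out' present - doing nothing
"""

GOOD_LOG = """\
2016-02-17 00:29:48.8779 (57768) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O1AS20-100T_1.03_windows_x86_64__AVX.exe'.
Activated exception handling...
2016-02-17 00:29:48.8817 (57768) [debug]: BSGL output files
2016-02-17 00:29:48.9017 (57768) [debug]: Flags: LAL_NDEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2016-02-17 00:29:48.9042 (57768) [debug]: Set up communication with graphics process.
"""

# Timestamp and process ID differ on every run, so remove them first.
PREFIX_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \(\d+\) ")


def normalize(log_text):
    """Strip the 'date time (pid) ' prefix from each non-empty line."""
    return [
        PREFIX_RE.sub("", line.strip())
        for line in log_text.splitlines()
        if line.strip()
    ]


def startup_diff(failed_log, good_log):
    """Return (lines only in the failed log, lines only in the good log)."""
    failed, good = normalize(failed_log), normalize(good_log)
    only_failed = [line for line in failed if line not in good]
    only_good = [line for line in good if line not in failed]
    return only_failed, only_good


only_failed, only_good = startup_diff(FAILED_LOG, GOOD_LOG)
print("only in failed log:", only_failed)
print("only in good log:  ", only_good)
```

This only shows where the logs differ; it says nothing about whether the difference is the cause of the crash.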

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

The first Linux task is running OK on my old Opteron 1210 (vintage 2008), while the task on the Windows PC, with its more recent A10-6700, crashed.
Tullio

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Just going to say I had 6 v1.4 tasks error out today. That was due to an issue with my computer and its RAM. It had nothing to do with the work units. Just an FYI.
