Severe trouble with LAT-files

astro-marwil
Joined: 28 May 05
Posts: 536
Credit: 680456543
RAC: 478886
Topic 196202

Hello!
For a long time I have had trouble crunching LAT files. Most of them end in a computation error.

Quote:
27.02.2012 15:19:41 | Einstein@Home | Aborting task LATeah2222A_96.0_21600_-2.7e-11_0: exceeded disk limit: 52.37MB > 19.07MB
27.02.2012 15:19:41 | Einstein@Home | Aborting task LATeah2222A_96.0_21550_-4.8e-11_0: exceeded disk limit: 41.61MB > 19.07MB
27.02.2012 15:19:42 | Einstein@Home | Computation for task LATeah2222A_96.0_21600_-2.7e-11_0 finished
27.02.2012 15:19:42 | Einstein@Home | Output file LATeah2222A_96.0_21600_-2.7e-11_0_0 for task LATeah2222A_96.0_21600_-2.7e-11_0 absent
27.02.2012 15:19:42 | Einstein@Home | Output file LATeah2222A_96.0_21600_-2.7e-11_0_1 for task LATeah2222A_96.0_21600_-2.7e-11_0 absent
27.02.2012 15:19:42 | Einstein@Home | Computation for task LATeah2222A_96.0_21550_-4.8e-11_0 finished
27.02.2012 15:19:42 | Einstein@Home | Output file LATeah2222A_96.0_21550_-4.8e-11_0_0 for task LATeah2222A_96.0_21550_-4.8e-11_0 absent
27.02.2012 15:19:42 | Einstein@Home | Output file LATeah2222A_96.0_21550_-4.8e-11_0_1 for task LATeah2222A_96.0_21550_-4.8e-11_0 absent
27.02.2012 15:19:42 | Einstein@Home | Starting task LATeah2222A_96.0_21700_-1.8e-11_0 using hsgamma_FGRP1 version 23
27.02.2012 15:19:42 | Einstein@Home | Starting task LATeah2222A_96.0_21700_-1.6e-11_0 using hsgamma_FGRP1 version 23


The error message "exceeded disk limit: 41.61MB > 19.07MB" is peculiar, as more than 50GB is free and available to BOINC and EaH. It is also peculiar that 2 LAT tasks can be crunched in parallel (processor i3-2100, dual core with HT, so four cores are emulated). With S6Bucket I had no problems at all.
I carried out a repair of BOINC, let all EaH tasks run out in the regular way, deleted all EaH files from the directory and had fresh files downloaded from the server, with no effect. Furthermore, I made sure that the 4 LAT tasks did not start close together in time, also with no effect.
It seems to me that the tasks end in a computation error when the result is written to the output files on disk for the first time. On my machine this is set to happen every 5 minutes.
It is also noticeable that the progress bar for LAT tasks steps up only in multiples of 2.000%.
In the Task Manager everything seems to run properly.

As the LAT application has been running for such a long time without other participants reporting such difficulties, I suspect that my processor has an individual malfunction. But S6Bucket tasks ran well! So something must also be going on in the programming. The S6Bucket application is obviously more tolerant of this error.

I would be pleased to get some ideas for overcoming this malfunction.

Kind regards
Martin

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

I gave up processing LAT files on my Linux box since they all end in validate errors. I am processing Arecibo pulsar data and gravitational wave searches with no problems.
Tullio

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Quote:
The error message "exceeded disk limit: 41.61MB > 19.07MB" is peculiar, as more than 50GB is free and available to BOINC and EaH.


Where do you see those 50 GiB? Did you check the startup messages in your BOINC event log?
(example from mine: 26/02/2012 01:19:44||Preferences limit disk usage to 1.86GB)

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3001405274
RAC: 699888

19.07 MB is the binary equivalent of the maximum amount of disk space a single LAT workunit is allowed to occupy on disk:

    20000000.000000


That limit is set by the project when the workunits are made: it's nothing to do with your choices of how much disk space is allocated to BOINC generally.
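
The conversion can be checked with a few lines of Python. This is only an arithmetic sketch: the byte value is the one quoted above, and the variable names are made up for illustration.

```python
# Convert the per-workunit byte limit quoted above into binary megabytes.
LIMIT_BYTES = 20_000_000        # value quoted in the post above
BYTES_PER_MIB = 1024 * 1024     # one binary megabyte (MiB)

limit_mib = LIMIT_BYTES / BYTES_PER_MIB
print(f"{limit_mib:.2f} MB")    # prints "19.07 MB"
```

So the "19.07MB" in the event log is the same 20 million bytes, just expressed in units of 1024 x 1024 bytes.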

It sounds more likely that the LAT application is generating garbage output (a large amount of it!) on your system, which might be a clue to the validate errors (garbage in result files) too.

astro-marwil
Joined: 28 May 05
Posts: 536
Credit: 680456543
RAC: 478886

Hello to all helpers!
Hello Gundolf! I know my limits are much higher than logically needed. I set them that high on purpose, for testing. The event log says

Quote:
27.02.2012 22:04:04 | | max disk usage: 64.00GB


Only 6 to 11GB are really needed.
Furthermore, why this error message:

Quote:
27.02.2012 15:19:42 | Einstein@Home | Output file LATeah2222A_96.0_21600_-2.7e-11_0_0 for task LATeah2222A_96.0_21600_-2.7e-11_0 absent


Why is this obviously needed output file not generated by the program? Is this an indication for a programming error, or what?

Hello Tullio! I had also deselected LAT. But S6Bucket tasks are now becoming rare, and unchecking LAT no longer works.

I would be very pleased to get your comments and suggestions.

Kind regards
Martin

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3001405274
RAC: 699888

Quote:
Why is this obviously needed output file not generated by the program? Is this an indication for a programming error, or what?


It's not generated, because the application has crashed - or, in your particular case, been forcibly terminated by BOINC for generating vastly more output than expected for a valid run.
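
To see how far over the limit each task went, the abort lines from the event log can be pulled apart with a regular expression. This is a rough sketch that assumes the log lines look exactly like the two quoted at the top of the thread; the pattern and the function name `overruns` are illustrative, not part of BOINC.

```python
import re

# Two lines copied from the event log quoted at the top of the thread.
LOG = """\
27.02.2012 15:19:41 | Einstein@Home | Aborting task LATeah2222A_96.0_21600_-2.7e-11_0: exceeded disk limit: 52.37MB > 19.07MB
27.02.2012 15:19:41 | Einstein@Home | Aborting task LATeah2222A_96.0_21550_-4.8e-11_0: exceeded disk limit: 41.61MB > 19.07MB
"""

# Assumed line format: "Aborting task <name>: exceeded disk limit: <used>MB > <limit>MB"
PATTERN = re.compile(
    r"Aborting task (?P<task>\S+): exceeded disk limit: "
    r"(?P<used>[\d.]+)MB > (?P<limit>[\d.]+)MB"
)

def overruns(log_text):
    """Return (task, used_mb, limit_mb, ratio) for each disk-limit abort."""
    rows = []
    for m in PATTERN.finditer(log_text):
        used = float(m.group("used"))
        limit = float(m.group("limit"))
        rows.append((m.group("task"), used, limit, used / limit))
    return rows

for task, used, limit, ratio in overruns(LOG):
    print(f"{task}: {used} MB used, {ratio:.1f}x the {limit} MB limit")
```

For the two tasks above this reports roughly 2.7x and 2.2x the allowed size.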

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 798332993
RAC: 1197028

It would be interesting to see how the wingman was doing with this particular WU that had disk quota problems.

The per-task disk limit is a safeguard against a task running wild (because of a programming bug or a malfunction of the host) and trashing all the remaining allowed disk space, so it makes sense that it is enforced. If this particular WU happened to generate huge legitimate output files, it would fail with the same error for all wingmen. At least here: http://einsteinathome.org/workunit/116558405 this is not the case. It could still be a sporadic bug, though, that is mysteriously more likely to happen on your host.
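
As an illustration of the idea only (this is not the BOINC client's actual code), such a safeguard amounts to adding up the size of everything a task has written and comparing the total with the limit. The function names and the temporary directory standing in for a task's working directory are hypothetical.

```python
import os
import tempfile

def dir_size_bytes(path):
    """Total size in bytes of all files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def exceeds_limit(task_dir, limit_bytes):
    """True if the task's files occupy more than the per-task limit."""
    return dir_size_bytes(task_dir) > limit_bytes

# Small demonstration with a throwaway directory and a 3000-byte file.
with tempfile.TemporaryDirectory() as task_dir:
    with open(os.path.join(task_dir, "out.dat"), "wb") as fh:
        fh.write(b"\0" * 3000)
    print(exceeds_limit(task_dir, 2000))        # True: 3000 > 2000
    print(exceeds_limit(task_dir, 20_000_000))  # False: well under the limit
```

A task that keeps writing garbage would trip such a check long before it could fill the disk space allotted to BOINC as a whole.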

CU
HB
