ERROR: could not parse line 1114397 in skyGrid-file '../../projects/einstein.phys.uwm.edu/O3ASHF1_skygrid_1442Hz_m0.008.dat'
The error in your stderr points to a problem reading the file. It may be random, or it may be file corruption or a disk problem. It looks like you had a lot of these errors a few days ago, along with other errors showing "I/O error".
I would first try resetting the project to erase all project files and download new versions.
If it happens again, examine the disk for signs of failure or other issues, and replace it if necessary.
_________________________________________________________________________
... will try and report back ...
many thanks for your reply
have a nice sunday !
S-F-V
I had the exact same problem several weeks ago - a random line in a random skygrid file could (randomly) not be parsed causing all results that depended on the skygrid to fail, even though earlier results for the same skygrid were OK. It happened multiple times with different lines being reported and a few different skygrids. After trying quite a few things (disk checks, replacing files, etc.) I started checking the MD5 checksum of a skygrid against what is stored in the state file and found there was no problem with any file that I looked at.
Since these files are quite large, I wondered if the problem might be something to do with loading the whole file in memory and transmitting relevant parts of it over the PCIe bus to the GPU. I don't know how this all works but I imagined the GPU might call for parts of the file quite regularly and that if the part being called happened to be stored in memory that (intermittently) was not reading correctly, maybe this would explain why things would work for a while and then randomly crash.
I had this thought after days of trying everything else. The system RAM was 2x4GB sticks so I just replaced them both. That was a few weeks ago and there hasn't been a single crash since. If you haven't resolved your problem, replacing the RAM might be worth a try.
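For anyone wanting to repeat the checksum comparison, here is a minimal sketch of pulling the stored value out of the state file text. It assumes each file entry lists a `<name>` tag followed by an `<md5_cksum>` tag; those tag names are an assumption on my part, so check your own state file and pass different names if needed. The function name `stored_md5` is just something made up for this example.

```python
import re

def stored_md5(state_text, file_name, name_tag="name", md5_tag="md5_cksum"):
    """Return the checksum recorded for file_name, or None if not found.

    Assumption: the entry for a file has <name> before <md5_cksum>.
    The tag names are parameters because they are not confirmed here.
    """
    pattern = re.compile(
        r"<{n}>\s*{f}\s*</{n}>(.*?)<{m}>\s*([0-9a-fA-F]{{32}})\s*</{m}>".format(
            n=re.escape(name_tag),
            f=re.escape(file_name),
            m=re.escape(md5_tag),
        ),
        re.DOTALL,
    )
    match = pattern.search(state_text)
    if match is None:
        return None
    # If another name tag sits between the file name and the checksum,
    # the checksum belongs to a different entry, so report nothing.
    if "<{}>".format(name_tag) in match.group(1):
        return None
    return match.group(2).lower()
```

Read the state file into a string, call `stored_md5(text, "O3ASHF1_skygrid_1442Hz_m0.008.dat")`, and compare the result with what your md5sum utility reports for the file on disk.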
Cheers,
Gary.
Seems to have been a bad drive.
I will watch the behavior.
If it reappears, new or different memory sticks are next on my list.
Thanks.
Reset the project and also switched to a new NVMe - error persists.
Will check more.
S-F-V
I am at a loss.
All kinds of different errors in connection with the resulting XCAL error.
Parsing error -- argument missing --
Replaced DIMMs.
Deleted BOINC + files completely and re-installed it.
Reduced GPU load.
Changed percent usage ...
No overheating, no overloading, just a boring setup.
Any ideas would be appreciated.
https://einsteinathome.org/task/1615028363
https://einsteinathome.org/task/1615026855
https://einsteinathome.org/task/1612790312
Thanks S-F-V
you have the same root cause error in all tasks (just different lines and skygrid files)
_________________________________________________________________________
I looked at the third link you provided and saw a task that had started successfully and had also stopped and restarted and eventually gave the following error message on about the third startup:-
ERROR: could not parse line 5968662 in skyGrid-file '../../projects/einstein.phys.uwm.edu/O3ASHF1_skygrid_1464Hz_m0.008.dat'
As I mentioned in my previous message, I had this exact same behaviour which was eventually resolved by replacing the RAM after confirming (using an md5sum check utility - multiple times) that the skygrid files involved did indeed always give the correct MD5 checksum values. The correct values are stored in the state file.
If you have replaced your RAM, have you actually confirmed that the file on disk does always give the correct checksum? Perhaps you have some intermittently bad sectors on disk and perhaps the errors only happen when there is a disk read of that part of the skygrid file that contains a flaky sector? Because the example I looked at had quite a lot of successful computation prior to the error, it seems to suggest that the problem shows intermittently so perhaps you need to run multiple checksum scans to see if you get an occasional failure.
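As a rough sketch of what I mean by multiple checksum scans: the snippet below hashes the same file a number of times and records any pass whose MD5 differs from the expected value. The function names `file_md5` and `repeated_scan` and the default of 20 passes are just choices for this example; take the expected value from the state file.

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def repeated_scan(path, expected_md5, passes=20):
    """Hash the file `passes` times and return the passes that mismatched.

    Each list entry is (pass_number, digest_seen). An empty list means
    every pass matched the expected checksum.
    """
    expected = expected_md5.strip().lower()
    mismatches = []
    for pass_number in range(1, passes + 1):
        actual = file_md5(path)
        if actual != expected:
            mismatches.append((pass_number, actual))
    return mismatches
```

One caveat: after the first pass the operating system will usually serve the file from its in-memory cache rather than the disk, so an occasional mismatch here does not by itself tell you whether the disk or the RAM is at fault. Rebooting between runs, or running on a different machine with the same disk, would help separate the two.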
Cheers,
Gary.
Finally solved the problem.
The used RAM I had substituted was also faulty.
Bought a set of brand-new RAM sticks and all is fine.
Thanks for your help.
S-F-V