I've just found a (minor?) glitch with 4.14: it does not record CPU time consumed on one of my hosts. Some further details about host:
* Red Hat Linux 7.3 (yep, I know it's ancient)
* glibc version: 2.2.5 (ditto)
* BOINC CC: 5.10.21 official
* CPU: dual Pentium III (Coppermine) @ 1GHz
* RAM: 2GB
* Disk 9.5GB free on BOINC installation partition
It used to show CPU consumed just fine while it was running 4.09. It shows CPU time with S@H.
[edit2]
Not that it's sensible to bother with this problem too much if it only shows on my box: I can't even run official S@H apps on this box ...
[/edit2]
[edit]
Just noticed it has plenty of "Couldn't sync" errors. The filesystem in question is XFS.
These also happen on my other hosts that have XFS file systems.
[/edit]
Metod ...
Good to know.
Can you try to switch off syncing on your XFS machines by putting an EAH_NO_SYNC file in your BOINC directory? In principle a "can't sync" message shouldn't be tragic, but it may unnecessarily fill up the stderr log.
BM
Howdy,
I've had a problem similar to the one reported by Metod: results were returned and validated, but they report only a small (but non-zero) CPU time consumed, e.g.:
* 7.42 seconds
* 3.62 seconds
* 1.77 seconds
* 2.94 seconds
* 4.34 seconds
* 3.67 seconds
* 2.99 seconds
These are from 4 different computers. The first example was on a P4 2.8GHz and the rest were on P4 3.6GHz boxes; the last 4 were all on the same computer.
They all have the same setup:
* OS: RHEL 3, 2.4.21 kernel
* BOINC version: 5.8.16
* glibc: 2.3.2-95.50
I have successfully run 4.14 on a RHEL4 machine (69,335 seconds). So I'm thinking this has something to do with the kernel or glibc version. Hope this info helps find an explanation for this behavior.
Chris
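For anyone who wants to rule out the host itself, here is a rough sanity check of whether the OS reports CPU time for a busy process at all. It is only a sketch: `burn_cpu` and `measure_cpu_seconds` are made-up helper names, and it says nothing about how BOINC or the 4.14 App actually collect CPU time. It just compares two standard Python sources (`resource.getrusage` and `time.process_time`) around a fixed amount of work.

```python
import resource
import time


def burn_cpu(iterations=2_000_000):
    """Do a fixed amount of pure CPU work (hypothetical test load)."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def measure_cpu_seconds(work):
    """Return (rusage_cpu, process_time_cpu) in seconds consumed by work()."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.process_time()
    work()
    elapsed = time.process_time() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    # User + system time charged to this process while work() ran.
    rusage_cpu = (after.ru_utime - before.ru_utime) + (
        after.ru_stime - before.ru_stime
    )
    return rusage_cpu, elapsed


if __name__ == "__main__":
    rusage_cpu, process_cpu = measure_cpu_seconds(burn_cpu)
    print(f"getrusage:    {rusage_cpu:.3f} s")
    print(f"process_time: {process_cpu:.3f} s")
```

If both numbers come out clearly above zero, basic CPU accounting works for a single-threaded process on that host, and the near-zero times are more likely specific to how the App and the client account for it on that kernel/glibc combination.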
Actually, you should do that (switch off syncing with EAH_NO_SYNC). I just found that in the code of the 4.14 App the checkpoint is not written at all if a sync error occurs (this has been fixed in the current code).
BM
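To illustrate the difference between "sync failed, checkpoint lost" and "sync failed, checkpoint still written", here is a minimal sketch. It is not the Einstein@Home code: `write_checkpoint` is a hypothetical helper, and the temp-file-plus-rename pattern is my own assumption about a reasonable way to write a checkpoint. The point is only that a failed `fsync` gets logged to stderr and the file is kept anyway.

```python
import os
import sys
import tempfile


def write_checkpoint(path, data, sync=True):
    """Write data (bytes) to path; a failed fsync is reported but not fatal.

    Returns True if the data was synced to disk, False if syncing was
    skipped or failed. Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    synced = False
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            if sync:
                try:
                    os.fsync(tmp.fileno())
                    synced = True
                except OSError as err:
                    # Report the problem, but keep the checkpoint anyway.
                    print(f"Couldn't sync {path}: {err}", file=sys.stderr)
        # Replace the old checkpoint in one step (atomic rename on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        # Any other failure: do not leave a stray temp file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return synced
```

With this shape, a filesystem that refuses to sync only costs a line in the stderr log; the checkpoint itself still lands on disk.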
[pre]
Outcome Client error
Client state Compute error
Exit status 11 (0xb)[/pre]
Got one of these as well (Signal 11 error).
On this WU
One of the very few errors I have had on this project.
Running Beta 4.14, Linux on AMD Opteron.
Where exactly should the EAH_NO_SYNC file go, and is there anything else to do? I can't seem to get it right. I've tried putting it in both the BOINC installation folder and the EAH project folder, but it still shows "can't sync" errors. Should I restart BOINC or something?
Metod ...
Okay guys... seems like the client errors were basically my own fault, or maybe that of my laptop manufacturer. Just getting back to let you know...
Apparently the WUs got killed by a problem which also caused frequent system freezes on this box (or maybe by the freezes themselves, when the file system got damaged; dunno if this can happen on Linux boxes). I hunted the problem down over the weekend and it turned out to be old and very buggy firmware on my DVD burner. No idea why it was included in the first place, since the laptop is almost new, but never mind. Luckily I was able to download a newer firmware version and update. Since then, the laptop has been rock solid, and miraculously the client errors have also stopped. I'm quite certain there is a connection between those things and I'll be fine now.
Sorry for indicating it might be a problem with BOINC.
Probably the easiest fix for the sync errors is to try the new 4.16 App. It will stop syncing automatically after 5 failures.
BM
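The "stop syncing after 5 failures" behaviour, together with the EAH_NO_SYNC marker file, could look roughly like the sketch below. Everything here is an assumption made for illustration: the class name `SyncPolicy`, the choice to count total (not consecutive) failures, and the directory in which the marker file is looked up, which is deliberately left as a parameter because the thread does not settle where the App expects it.

```python
import os


class SyncPolicy:
    """Decide whether checkpoint writes should still try to sync.

    Hypothetical sketch, not the 4.16 App's code. Syncing is off if a
    marker file exists in marker_dir, or once max_failures sync errors
    have been recorded.
    """

    def __init__(self, marker_dir, max_failures=5, marker_name="EAH_NO_SYNC"):
        self.max_failures = max_failures
        self.failures = 0
        # Checked once at start-up in this sketch.
        marker_path = os.path.join(marker_dir, marker_name)
        self.disabled_by_marker = os.path.exists(marker_path)

    def should_sync(self):
        """True while syncing is still worth attempting."""
        return not self.disabled_by_marker and self.failures < self.max_failures

    def record_failure(self):
        """Call this each time a sync attempt fails."""
        self.failures += 1


if __name__ == "__main__":
    policy = SyncPolicy(marker_dir=".")
    for attempt in range(1, 8):
        if not policy.should_sync():
            print(f"attempt {attempt}: syncing disabled")
            continue
        # Pretend every sync fails, as on the XFS hosts in this thread.
        policy.record_failure()
        print(f"attempt {attempt}: sync failed ({policy.failures} so far)")
```

Because the marker is only read in the constructor here, a sketch like this would need a restart to notice a newly created file; whether the real App behaves that way is not stated in the thread.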
As I am still running App version 4.14 on this computer, it was interesting to see that when I lost my ADSL connection last night I then lost 5 WUs in a row with Signal 11 errors.
This only happened on one host; no others lost WUs.
I think it is time to move up to a later application.
Work units are:
* 91403900
* 91435772
* 91600974
* 91601257
* 91607767