Hit a sig 11 on a WU with
Hit a sig 11 on a WU with this host; here are the details:
http://einsteinathome.org/task/177507852
RE: Hit a sig 11 on a WU
Excellent!! Thanks for sharing this info. I forwarded it to the devs, and it seems there might be a smoking gun in the log output.
Thanks again
HB
The reason for this segfault
The cause of this segfault has been fixed in BOINC, and App 1.05 was built with the new BOINC version. It looks like this wasn't the only problem that caused a signal 11, though; we still get these.
BM
Computers with (intermittent)
Computers with (intermittent) memory problems, or problems which only show up when the memory is filled to capacity, may make up part of what you see. I doubt you can build an application that takes everything into account. :-)
RE: Computers with
We have always had 'signal 11' errors on Linux that were supposedly caused by the 'optimistic memory allocation' of the OS, but those were less than 1% of all returned tasks. Currently we get ~7% 'signal 11' errors, about 10 times as many as all other errors combined. So far I can't see that the 1.05 App behaves significantly better in that respect than 1.04.
BM
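For anyone who wants to check how their own Linux host is configured for the 'optimistic memory allocation' mentioned above, here is a minimal sketch. It only reads the kernel's `vm.overcommit_memory` setting from `/proc`; the helper names `read_overcommit_mode` and `describe_overcommit` are made up for this example, and the setting by itself does not prove or rule out that overcommit is behind a given signal 11.

```python
from pathlib import Path

# Documented values of /proc/sys/vm/overcommit_memory on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(mode):
    """Translate the numeric mode into a short description."""
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the overcommit mode as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or unexpected file content.
        return None


if __name__ == "__main__":
    mode = read_overcommit_mode()
    if mode is None:
        print("overcommit setting not available on this system")
    else:
        print(f"vm.overcommit_memory = {mode}: {describe_overcommit(mode)}")
```

With mode 0 or 1 an allocation can succeed even though the memory is not actually available, and the process may only fail later when it touches the pages.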
I can see how that's a
I can see how that's a problem. So what changed then (besides the applications and BOINC)? Additional compiler flags? Something in the tasks that's interfering? Is it only on Linux? If so, only on a specific distro or all over the place? 32-bit or 64-bit CPUs? Is ABP2 also seeing some of this effect?
Data point: Ubuntu Linux
Data point:
Ubuntu Linux on older PCs (under 1 GHz CPU speed).
If the GCS5 WU completes while the Update Manager is running, or while I am doing system updates: signal 11.
Maybe the WU API isn't waiting long enough for I/O to complete?
Claude
Thanks! RE: Ubuntu
Thanks!
Could you be more specific? Which version of Ubuntu, what's your PC (CPU type, possibly clock speed, single CPU, Hyperthreading)?
Well, what I can say is that the error happens in kernel mode (or else we would get a stack dump by the signal handler of the App).
Could you install gdb and try the EAH_GDB_DEBUG file?
I'm pretty sure this is something in the BOINC library. The system-specific part of the App code is the same as we used for the HierarchicalSearch (S5R2-S5R6).
BM
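To make it easier to answer the questions about distro version and CPU, a small script along these lines could collect the basics in one go. This is only a sketch using the Python standard library; the function names `cpu_model` and `host_report` are invented for this example, and the logical CPU count merely hints at Hyperthreading without confirming it.

```python
import os
import platform


def cpu_model(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU model string from /proc/cpuinfo, with a fallback."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.lower().startswith("model name"):
                    return line.partition(":")[2].strip()
    except OSError:
        pass  # not Linux, or /proc not readable
    return platform.processor() or "unknown"


def host_report():
    """Collect OS, architecture and CPU details for a bug report."""
    return {
        "os": platform.platform(),        # kernel and distro hints
        "machine": platform.machine(),    # e.g. i686 (32-bit) or x86_64
        "cpu_model": cpu_model(),
        "logical_cpus": os.cpu_count(),   # counts hyperthreads as CPUs
    }


if __name__ == "__main__":
    for key, value in host_report().items():
        print(f"{key}: {value}")
```

Pasting the output into a post covers most of what was asked for; the exact Ubuntu release may still need to be added by hand.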
Ubuntu V 9.10 Celeron
Ubuntu 9.10
Celeron (Coppermine) 650 MHz, 3/4 GB memory
Pentium III (Coppermine) 850 MHz, 3/4 GB memory
Pentium III (Coppermine) 850 MHz, 3/4 GB memory
I'll put the debug file in the BOINC directory(s) and let you know if I get anything.
Claude
Thanks. With 1.05 the
Thanks.
With 1.05 the Linux segfault rate went down from ~7% (1.04) to ~5%. Judging from this I'd say two more stackdumps and we're done!
BM
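As a quick sanity check on the percentages quoted in this thread, the arithmetic can be sketched as below. The only inputs are the approximate figures already given (~7% for 1.04, ~5% for 1.05, and signal 11 being about 10 times all other errors combined); the function names are invented for this example and the results are as rough as the inputs.

```python
def other_error_rate(sig11_rate, ratio):
    """Rate of all other errors if signal 11 is `ratio` times as frequent."""
    return sig11_rate / ratio


def relative_drop(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


sig11_104 = 0.07  # ~7% of returned tasks with App 1.04
sig11_105 = 0.05  # ~5% of returned tasks with App 1.05

print(f"all other errors combined: ~{other_error_rate(sig11_104, 10):.1%}")
print(f"relative drop 1.04 -> 1.05: ~{relative_drop(sig11_104, sig11_105):.0%}")
```

So the other errors sit at roughly 0.7% of returned tasks, and the step from 1.04 to 1.05 removed a bit under a third of the signal 11 failures.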