Linux version generates SIGPIPE

Mike Morgan
Joined: 11 Nov 04
Posts: 5
Credit: 243560
RAC: 0
Topic 187109

Is this a known issue on Linux? I receive the following when I run boinc:

2004-11-17 14:51:00 [---] May run out of work in 5.00 days; requesting more
2004-11-17 14:51:00 [Einstein@Home] Requesting 1549354 seconds of work
2004-11-17 14:51:00 [Einstein@Home] Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
2004-11-17 14:51:00 [Einstein@Home] Started upload of ft12_I2_f184.5_b0.1_sg05_0_0
SIGPIPE: write on a pipe with no reader
Exiting...

Boinc (4.13) receives the SIGPIPE and exits with error code 125.

I'm not sure if this is an issue with the Einstein@Home client or boinc, but the problem does not happen with the SETI@Home client.

Mike Morgan

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

We've just posted a new linux executable (details on top E@h page). Try deleting everything from your working directory apart from the boinc core client and starting over. Hopefully the new Linux executable will be less problematic on your system.

Director, Einstein@Home

Shaktai
Joined: 8 Nov 04
Posts: 183
Credit: 426451
RAC: 0

Message 349 in response to message 348

> We've just posted a new linux executable (details on top E@h page). Try
> deleting everything from your working directory apart from the boinc core
> client and starting over. Hopefully the new Linux executable will be less
> problematic on your system.

I would be interested in hearing how the new Linux core compares to the Windows core, when someone has a chance.

Steffen Grunewald, for Merlin/Morgane
Joined: 18 Oct 04
Posts: 39
Credit: 592286604
RAC: 0

Mike,

can you tell us a bit more about your setup?
We have seen this here, but only under certain circumstances.
If it appeared, the only cure seemed to be a re-initialization
of your client (detach, remove subdirectories, reattach).

Are you using NFS for your working directory? If not, are you
using ext[23], xfs, or reiser?

Since this is now an Einstein problem in the first place, we'd
like to give feedback to the BOINC devel team...

Thanks, Steffen

logboyd
Joined: 11 Nov 04
Posts: 6
Credit: 760012
RAC: 0

Message 351 in response to message 348

Using software downloaded one week ago, run_cpu_benchmark times out on my 2.6.8 system and assigns default float and integer values approximately half those determined when the same hardware runs under Windows XP. Is this the type of problem resolved by the new executable? Where is the new executable located, and what is its version number?

thanks

logboyd

Mike Morgan
Joined: 11 Nov 04
Posts: 5
Credit: 243560
RAC: 0

Message 352 in response to message 350

> Mike,
>
> can you tell us a bit more about your setup?
> We have seen this here, but only under certain circumstances.
> If it appeared, the only cure seemed to be a re-initialization
> of your client (detach, remove subdirectories, reattach).
>
> Are you using NFS for your working directory? If not, are you
> using ext[23], xfs, or reiser?
>
> Since this is now an Einstein problem in the first place, we'd
> like to give feedback to the BOINC devel team...
>
> Thanks, Steffen
>

Steffen

All disk access is local to an ext3 partition (option data=writeback). I'm running kernel 2.4.9-e.25smp (Red Hat ES 2.1). The system has 2 Xeon CPUs so boinc thinks I have a total of 4 processors.

The problem did not go away after re-initialization. More precisely, it went away until the first set of results was complete; now it's back.

I'll try the new Linux executable today.

Mike

Mike Morgan
Joined: 11 Nov 04
Posts: 5
Credit: 243560
RAC: 0

One note on Linux version 4.36 (I can't verify whether this also happened with version 4.30): the process is making a lot of mmap/munmap/brk calls.

I guess that the clients check their resource utilization once per second (seti and einstein both make a getrusage() call once per second). However, the einstein client seems to be doing a lot more work in each check period.

During each check it allocates three anonymous memory segments and then immediately releases them. strace reports:

mmap2(NULL, 233472, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40a41000
mmap2(NULL, 233472, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40a7a000
mmap2(NULL, 233472, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40ab3000
brk(0x8189000) = 0x8189000
brk(0x81a6000) = 0x81a6000
brk(0x816c000) = 0x816c000
munmap(0x40a41000, 233472) = 0
munmap(0x40a7a000, 233472) = 0
munmap(0x40ab3000, 233472) = 0
brk(0x8189000) = 0x8189000
brk(0x81a6000) = 0x81a6000
brk(0x816c000) = 0x816c000

Would it be possible for the client either to keep the segments mapped or not to map them in the first place? It seems like unnecessary overhead.
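
One possible explanation, offered as an assumption and not a diagnosis: each 233472-byte request is larger than glibc's default mmap threshold (128 KiB), so a malloc()/free() pair of that size typically turns into an mmap()/munmap() pair. If that is what is happening, keeping one buffer alive between checks would avoid the churn. Below is a minimal sketch; `struct scratch`, `scratch_get()` and `scratch_release()` are hypothetical names, not taken from the client's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical scratch buffer that outlives a single check period. */
struct scratch {
    void  *ptr;  /* current allocation, or NULL */
    size_t cap;  /* size of the allocation in bytes */
};

/* Return a buffer of at least `need` bytes, growing only when required.
 * The buffer is not zeroed between calls. Returns NULL if growing fails,
 * in which case the previous buffer is left untouched. */
void *scratch_get(struct scratch *s, size_t need)
{
    assert(s != NULL);
    if (need > s->cap) {
        void *p = realloc(s->ptr, need);
        if (p == NULL)
            return NULL;
        s->ptr = p;
        s->cap = need;
    }
    return s->ptr;
}

/* Free the buffer once, e.g. at shutdown, instead of after every check. */
void scratch_release(struct scratch *s)
{
    assert(s != NULL);
    free(s->ptr);
    s->ptr = NULL;
    s->cap = 0;
}
```

A caller would keep one `struct scratch` per segment, initialised to `{NULL, 0}`, and call `scratch_get()` at the start of each check. After the first call no further allocation takes place unless a larger size is requested.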

Mike

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250362713
RAC: 35188

Thanks for reporting this. I think that's related to a BOINC bug we are currently hunting and fixing. There will be a new Linux binary on the server soon; please tell us if it shows the same problem.

BM

Mike Morgan
Joined: 11 Nov 04
Posts: 5
Credit: 243560
RAC: 0

Using version 4.43, I'm still getting the SIGPIPE error. Was any issue regarding this fixed in 4.46? Here's the log and strace output:

2004-11-29 11:05:18 [Einstein@Home] Deferring computation for result ft14_I1_f225.5_b0.1_sg04_1
2004-11-29 11:05:18 [---] GUI RPC bind failed: -1
2004-11-29 11:05:18 [---] GUI RPC bind failed: -1
2004-11-29 11:05:18 [Einstein@Home] Started upload of ft14_I1_f224.5_b0.1_sg00_1_0
SIGPIPE: write on a pipe with no reader
Exiting...

strace:
[pid 14642] --- SIGALRM (Alarm clock) @ 0 (0) ---
[pid 14642] gettimeofday({1101744350, 445701}, NULL) = 0
[pid 14642] write(2, "No heartbeat from core client - "..., 40) = 40
[pid 14642] munmap(0xb75d9000, 4096) = 0
[pid 14642] exit_group(0) = ?
Process 14642 detached
{35, 0}) = 0
open("boinc_lockfile", O_WRONLY|O_CREAT, 0644) = 5
flock(5, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "Can't acquire lockfile - exiting"..., 33) = 33
munmap(0xb75d9000, 4096) = 0
exit_group(0) = ?
Process 14640 detached

Mike Morgan
Joined: 11 Nov 04
Posts: 5
Credit: 243560
RAC: 0

It looks like the application is not checking for closed connections:

[pid 14616] connect(4, {sa_family=AF_INET, sin_port=htons(3128), sin_addr=inet_addr("198.200.138.207")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 14616] select(1024, [], [4], [4], {0, 0}) = 1 (out [4], left {0, 0})
[pid 14616] getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 14616] send(4, "POST http://einstein.phys.uwm.ed"..., 236, 0) = 236
[pid 14616] send(4, "\n
[pid 14616] select(1024, [], [4], [4], {0, 0}) = 1 (out [4], left {0, 0})
[pid 14616] write(4, "224.500197916667 3.14159265 -1.5"..., 16384) = 11584
[pid 14616] select(1024, [], [4], [4], {0, 0}) = 1 (out [4], left {0, 0})
[pid 14616] write(4, " 27.68174n224.542399305556 3.7"..., 4800) = -1 EAGAIN (Resource temporarily unavailable)
[pid 14616] select(1024, [], [4], [4], {0, 0}) = 0 (Timeout)
[pid 14616] select(1024, [], [4], [4], {0, 0}) = 0 (Timeout)
[pid 14616] select(1024, [], [4], [4], {1, 0}) = 1 (out [4], left {0, 980000})
[pid 14616] write(4, " 27.68174n224.542399305556 3.7"..., 4800) = -1 EPIPE (Broken pipe)
[pid 14616] --- SIGPIPE (Broken pipe) @ 0 (0) ---
[pid 14616] write(2, "SIGPIPE: write on a pipe with no"..., 39) = 39
[pid 14616] write(2, "\nExiting...\n", 12) = 12
[pid 14616] exit_group(-125) = ?

It does not include the socket in the read descriptor set when calling select() (the read set is empty in every call above), so it cannot detect that the server (or proxy) has preemptively closed the connection.
