ssh-agent terminates

Bug #1446448 reported by Andrej Ricnik
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
openssh (Ubuntu)
Confirmed
Low
Unassigned

Bug Description

I have set-up password less ssh connectivity for my workstation. ssh-agent gets started via keychain when fluxbox boots.

When I use ssh to connect to a server, and then use scp to shuffle a file from that server to my workstation the ssh-agent on my workstation dies after the file was transferred.

This has started happening after the upgrade from 12.04 to 14.04.

I have wrapped strace into the fluxbox startup to keep an eye on ssh-agent (I initially didn't see a pattern to the sudden death), here's an example of the breakdown post transfer.

select(4, [3], [], NULL, NULL) = 1 (in [3])
accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 4
getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=21130, uid=1000, gid=1000}, [12]) = 0
getuid() = 1000
fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
select(5, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "\0\0\0\1\v", 1024) = 5
select(5, [3 4], [4], NULL, NULL) = 1 (out [4])
write(4, "\0\0\1<\f\0\0\0\1\0\0\1\27\0\0\0\7ssh-rsa\0\0\0\3\1\0\1\0"..., 320) = 320
select(5, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "", 1024) = 0
close(4) = 0
select(4, [3], [], NULL, NULL) = 1 (in [3])
accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 4
getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=21284, uid=1000, gid=1000}, [12]) = 0
getuid() = 1000
fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
select(5, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "", 1024) = 0
close(4) = 0
select(4, [3], [], NULL, NULL) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=21284, si_uid=0} ---
unlink("/tmp/ssh-DEwJJTGLUETC/agent.19556") = 0
rmdir("/tmp/ssh-DEwJJTGLUETC") = 0
close(-1) = -1 EBADF (Bad file descriptor)
exit_group(2) = ?
+++ exited with 2 +++

1) lsb_release -rd
Description: Ubuntu 14.04.2 LTS
Release: 14.04

2) apt-cache policy openssh-client
openssh-client:
  Installed: 1:6.6p1-2ubuntu2
  Candidate: 1:6.6p1-2ubuntu2
  Version table:
 *** 1:6.6p1-2ubuntu2 0
        500 http://nz.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:6.6p1-2ubuntu1 0
        500 http://nz.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

3) I expect ssh-agent to carry on running as it did in previous releases.

4) It dies when connecting back from a machine I ssh into to my workstation.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: openssh-client 1:6.6p1-2ubuntu2
ProcVersionSignature: Ubuntu 3.13.0-49.81-generic 3.13.11-ckt17
Uname: Linux 3.13.0-49-generic x86_64
NonfreeKernelModules: wl
ApportVersion: 2.14.1-0ubuntu3.10
Architecture: amd64
Date: Tue Apr 21 15:28:02 2015
InstallationDate: Installed on 2015-02-03 (77 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
RelatedPackageVersions:
 ssh-askpass N/A
 libpam-ssh 2.0-1
 keychain 2.7.1-1
 ssh-askpass-gnome 1:6.6p1-2ubuntu2
SSHClientVersion: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2, OpenSSL 1.0.1f 6 Jan 2014
SourcePackage: openssh
UpgradeStatus: No upgrade log present (probably fresh install)
upstart.ssh-agent.log: ssh-agent stop/pre-start, process 1973

Revision history for this message
Andrej Ricnik (a-ricnik) wrote :
Revision history for this message
Robie Basak (racb) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

Your strace suggests that the process is being killed by a SIGTERM from pid 21284 which is running as root. Can you reproduce again and try to determine which process your SIGTERM is coming from?

 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=21284, si_uid=0} ---

It sounds like perhaps your session management is creating a new agent process and killing the old one?

Changed in openssh (Ubuntu):
status: New → Incomplete
Revision history for this message
Andrej Ricnik (a-ricnik) wrote : Re: [Bug 1446448] Re: ssh-agent terminates
Download full text (5.2 KiB)

Hi Robie,

The pid I posted didn't exist on the system anymore when you asked.

I took a "snapshot" of current pids, ran the scp from remote again, this
time the pid in question was 31397, and that, too, appears to be transient.
After ssh-agent died that pid didn't exist anymore. Any cunning plan on how
to track this down?

Cheers,
Andrej

On 21 April 2015 at 22:21, Robie Basak <email address hidden> wrote:

> Thank you for taking the time to report this bug and helping to make
> Ubuntu better.
>
> Your strace suggests that the process is being killed by a SIGTERM from
> pid 21284 which is running as root. Can you reproduce again and try to
> determine which process your SIGTERM is coming from?
>
> --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=21284, si_uid=0}
> ---
>
> It sounds like perhaps your session management is creating a new agent
> process and killing the old one?
>
> ** Changed in: openssh (Ubuntu)
> Status: New => Incomplete
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1446448
>
> Title:
> ssh-agent terminates
>
> Status in openssh package in Ubuntu:
> Incomplete
>
> Bug description:
> I have set-up password less ssh connectivity for my workstation. ssh-
> agent gets started via keychain when fluxbox boots.
>
> When I use ssh to connect to a server, and then use scp to shuffle a
> file from that server to my workstation the ssh-agent on my
> workstation dies after the file was transferred.
>
> This has started happening after the upgrade from 12.04 to 14.04.
>
> I have wrapped strace into the fluxbox startup to keep an eye on ssh-
> agent (I initially didn't see a pattern to the sudden death), here's
> an example of the breakdown post transfer.
>
> select(4, [3], [], NULL, NULL) = 1 (in [3])
> accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 4
> getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=21130, uid=1000, gid=1000},
> [12]) = 0
> getuid() = 1000
> fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> select(5, [3 4], [], NULL, NULL) = 1 (in [4])
> read(4, "\0\0\0\1\v", 1024) = 5
> select(5, [3 4], [4], NULL, NULL) = 1 (out [4])
> write(4, "\0\0\1<\f\0\0\0\1\0\0\1\27\0\0\0\7ssh-rsa\0\0\0\3\1\0\1\0"...,
> 320) = 320
> select(5, [3 4], [], NULL, NULL) = 1 (in [4])
> read(4, "", 1024) = 0
> close(4) = 0
> select(4, [3], [], NULL, NULL) = 1 (in [3])
> accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 4
> getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=21284, uid=1000, gid=1000},
> [12]) = 0
> getuid() = 1000
> fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> select(5, [3 4], [], NULL, NULL) = 1 (in [4])
> read(4, "", 1024) = 0
> close(4) = 0
> select(4, [3], [], NULL, NULL) = ? ERESTARTNOHAND (To be
> restarted if no handler)
> --- ...

Read more...

Revision history for this message
Robie Basak (racb) wrote :

Try looking into "process accounting".

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Brendan Gregg has an awesome execsnoop tool that can report systemwide execs in his perf-tools package, the whole thing is a goldmine of amazing tools:

http://www.brendangregg.com/blog/2014-07-28/execsnoop-for-linux.html

Probably this is easier than the process accounting.

Thanks

Revision history for this message
Andrej Ricnik (a-ricnik) wrote :
Download full text (4.6 KiB)

Aight, thanks Seth for the execsnoop link, excellent stuff to have in your tool-box. I just re-ran my test-case, and this is the new strace output, with the correlating execsnoop snippet.

select(5, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "\0\0\0\1\v", 1024) = 5
select(5, [3 4], [4], NULL, NULL) = 1 (out [4])
write(4, "\0\0\1<\f\0\0\0\1\0\0\1\27\0\0\0\7ssh-rsa\0\0\0\3\1\0\1\0"..., 320) = 320
select(5, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "\0\0\2\231\r\0\0\1\27\0\0\0\7ssh-rsa\0\0\0\3\1\0\1\0\0\1\1\0"..., 1024) = 669
select(5, [3 4], [4], NULL, NULL) = 1 (out [4])
write(4, "\0\0\1\24\16\0\0\1\17\0\0\0\7ssh-rsa\0\0\1\0z4DFM\312_\377"..., 280) = 280
select(5, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "", 1024) = 0
close(4) = 0
select(4, [3], [], NULL, NULL) = 1 (in [3])
accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 4
getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=24507, uid=1000, gid=1000}, [12]) = 0
getuid() = 1000
fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
select(5, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "\0\0\0\1\v", 1024) = 5
select(5, [3 4], [4], NULL, NULL) = 1 (out [4])
write(4, "\0\0\1<\f\0\0\0\1\0\0\1\27\0\0\0\7ssh-rsa\0\0\0\3\1\0\1\0"..., 320) = 320
select(5, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "", 1024) = 0
close(4) = 0
select(4, [3], [], NULL, NULL) = 1 (in [3])
accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 4
getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=24900, uid=1000, gid=1000}, [12]) = 0
getuid() = 1000
fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
select(5, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "", 1024) = 0
close(4) = 0
select(4, [3], [], NULL, NULL) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=24900, si_uid=0} ---
unlink("/tmp/ssh-vj0FsYTMJ3y0/agent.23119") = 0
rmdir("/tmp/ssh-vj0FsYTMJ3y0") = 0
close(-1) = -1 EBADF (Bad file descriptor)
exit_group(2) = ?
+++ exited with 2 +++

./execsnoop
Tracing exec()s. Ctrl-C to end.
Instrumenting sys_execve
   PID PPID ARGS
 24896 24894 cat -v trace_pipe
 24895 24891 gawk -v o=1 -v opt_name=0 -v name= -v opt_duration=0 [...]
 24897 24515 ssh playpen
 24898 24897 sh [?]
 24899 24898 /usr/bin/xauth list :0.0
 24900 1326 /usr/sbin/sshd -D -R
 24903 24902 /sbin/pam-tmpdir-helper
 24904 24902 /bin/sh -c command -v debian-sa1 > /dev/null && debian-sa1 1 1
 24905 24904 debian-sa1 1 1
 24907 24900 /sbin/pam-tmpdir-helper
 24908 1770 /usr/lib/ConsoleKit/ck-collect-session-info --uid 0 --pid 24900
 24909 1770 /usr/lib/ConsoleKit/run-session.d/pam-foreground-compat.ck
 24910 24900 sh [?]
 24911 24910 /usr/bin/env -i PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin run-...

Read more...

Revision history for this message
Andrej Ricnik (a-ricnik) wrote :

Still incomplete?

Revision history for this message
Andrej Ricnik (a-ricnik) wrote :

knock knock

Revision history for this message
Andrej Ricnik (a-ricnik) wrote :

Can you please let me know in which way this is still incomplete?

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Andrej, sorry, normally the person who supplies the additional information sets the status back to 'new' or 'confirmed' as needed. We don't say that nearly often enough. sorry.

Changed in openssh (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Andrej Ricnik (a-ricnik) wrote :
Download full text (4.6 KiB)

Ooops ... sorry ... thanks for the update!

On 12 May 2015 at 08:44, Seth Arnold <email address hidden> wrote:

> Andrej, sorry, normally the person who supplies the additional
> information sets the status back to 'new' or 'confirmed' as needed. We
> don't say that nearly often enough. sorry.
>
> ** Changed in: openssh (Ubuntu)
> Status: Incomplete => Confirmed
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1446448
>
> Title:
> ssh-agent terminates
>
> Status in openssh package in Ubuntu:
> Confirmed
>
> Bug description:
> I have set-up password less ssh connectivity for my workstation. ssh-
> agent gets started via keychain when fluxbox boots.
>
> When I use ssh to connect to a server, and then use scp to shuffle a
> file from that server to my workstation the ssh-agent on my
> workstation dies after the file was transferred.
>
> This has started happening after the upgrade from 12.04 to 14.04.
>
> I have wrapped strace into the fluxbox startup to keep an eye on ssh-
> agent (I initially didn't see a pattern to the sudden death), here's
> an example of the breakdown post transfer.
>
> select(4, [3], [], NULL, NULL) = 1 (in [3])
> accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 4
> getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=21130, uid=1000, gid=1000},
> [12]) = 0
> getuid() = 1000
> fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> select(5, [3 4], [], NULL, NULL) = 1 (in [4])
> read(4, "\0\0\0\1\v", 1024) = 5
> select(5, [3 4], [4], NULL, NULL) = 1 (out [4])
> write(4, "\0\0\1<\f\0\0\0\1\0\0\1\27\0\0\0\7ssh-rsa\0\0\0\3\1\0\1\0"...,
> 320) = 320
> select(5, [3 4], [], NULL, NULL) = 1 (in [4])
> read(4, "", 1024) = 0
> close(4) = 0
> select(4, [3], [], NULL, NULL) = 1 (in [3])
> accept(3, {sa_family=AF_LOCAL, NULL}, [2]) = 4
> getsockopt(4, SOL_SOCKET, SO_PEERCRED, {pid=21284, uid=1000, gid=1000},
> [12]) = 0
> getuid() = 1000
> fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
> fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> select(5, [3 4], [], NULL, NULL) = 1 (in [4])
> read(4, "", 1024) = 0
> close(4) = 0
> select(4, [3], [], NULL, NULL) = ? ERESTARTNOHAND (To be
> restarted if no handler)
> --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=21284, si_uid=0}
> ---
> unlink("/tmp/ssh-DEwJJTGLUETC/agent.19556") = 0
> rmdir("/tmp/ssh-DEwJJTGLUETC") = 0
> close(-1) = -1 EBADF (Bad file descriptor)
> exit_group(2) = ?
> +++ exited with 2 +++
>
>
> 1) lsb_release -rd
> Description: Ubuntu 14.04.2 LTS
> Release: 14.04
>
>
> 2) apt-cache policy openssh-client
> openssh-client:
> Installed: 1:6.6p1-2ubuntu2
> Candidate: 1:6.6p1-2ubuntu2
> Version table:
> *** 1:6.6p1-2ubuntu2 0
>...

Read more...

Revision history for this message
Andrej Ricnik (a-ricnik) wrote :

Over a year gone, and the problem still exists?

Am I the only one who occasinally logs into a machine running ssh-agent?
Are others happy to log out of the running X session and log back in every
time the ssh'ed to a machine?
No one really gives a rats arse?

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Sorry Andrej; you may be the only one experiencing this issue, and it sounds extremely frustrating. But I don't know what to suggest next to debug why it's happening.

About all I can suggest is that I gave up on the keychain script myself ~five years ago, but I can't recall why. You may reconsider if it's helpful or not in your environment.

Thanks

Revision history for this message
Robie Basak (racb) wrote :

Let's set Importance to Low to make it clear that this won't be worked on any time soon. Sorry Andrej - with only one person marked as affected we can't prioritise this.

Changed in openssh (Ubuntu):
importance: Undecided → Low
Revision history for this message
Andrej Ricnik (a-ricnik) wrote :

Thanks Seth, I doubt that keychain is (even in part) responsible for this behaviour; I'll try to confirm Tuesday - after the long weekend - when I'm back at work.

Thanks for responding Robie.

I apologise - was on a rampage because the other bug that I'm sitting on (not as the original reporter, but as an impatient watcher) has been open since 2012 ... and it bites many people, anyone who has a) resonably frequent changes to network interfaces and b) runs an iSCSI connected SAN.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.