ControlPath unix socket is created in a weird way

Bug #1593657 reported by Uqbar
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
Medium
Unassigned
openssh (Ubuntu)
Invalid
Medium
Unassigned

Bug Description

STEPS.

1. Create a ControlPath socket with something like "-o ControlPath=/tmp/sshcontrol-%C" during an SSH connection.
2. Check with ls the socket is there. I am getting

srw------- 1 enzo enzo 0 2016-06-17 10:17:21 sshcontrol-369d1187d3900c559f063f947218528001d0e705

3. Ask lsof who's that socket belonging to:

lsof /tmp/sshcontrol-369d1187d3900c559f063f947218528001d0e705

4. Ask lsof the same but with grep

lsof | grep /tmp/sshcontrol-369d1187d3900c559f063f947218528001d0e705

ACTUAL RESULTS.
For step no. 3 I am getting nothing (but the WARNING about /sys/kernel/debug/tracing)
For step no. 4 I am getting:

[enzo@Feynman ~] lsof | grep /tmp/sshcontrol-369d1187d3900c559f063f947218528001d0e705
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
ssh 10667 enzo 4u unix 0x0000000000000000 0t0 173813 /tmp/sshcontrol-369d1187d3900c559f063f947218528001d0e705.NDScoG3mgs0mWc4q type=STREAM

Where you can see clearly that the socket name has a trailing part ".NDScoG3mgs0mWc4q".

EXTRA STEP.
5. Ask lsof about the socket with the extended name

EXTRA RESULT

[enzo@Feynman ~] lsof /tmp/sshcontrol-369d1187d3900c559f063f947218528001d0e705.NDScoG3mgs0mWc4q
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
lsof: status error on /tmp/sshcontrol-369d1187d3900c559f063f947218528001d0e705.NDScoG3mgs0mWc4q: No such file or directory
lsof 4.89
 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
 latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
 latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
 usage: [-?abhKlnNoOPRtUvVX] [+|-c c] [+|-d s] [+D D] [+|-E] [+|-e s] [+|-f[gG]]
 [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s]
 [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]
Use the ``-h'' option to get more help information.

EXPECTED RESULT
A reasonable output from lsof detailing the real and actual unix socket.

SOURCES

man 5 ssh_config at the section "ControlPath"

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: openssh-client 1:7.2p2-4ubuntu1
Uname: Linux 4.4.13-040413-lowlatency x86_64
NonfreeKernelModules: rfcomm ipt_REJECT nf_reject_ipv4 pci_stub vboxpci vboxnetadp vboxnetflt vboxdrv bnep btusb btrtl btbcm btintel xt_multiport iptable_filter ip_tables x_tables 8812au ath3k cfg80211 bluetooth uas usb_storage input_leds hp_wmi sparse_keymap snd_hda_codec_realtek snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp snd_hda_codec kvm_intel snd_hda_core kvm snd_hwdep snd_pcm irqbypass crct10dif_pclmul crc32_pclmul aesni_intel snd_seq_midi aes_x86_64 snd_seq_midi_event lrw snd_rawmidi gf128mul glue_helper ablk_helper cryptd snd_seq snd_seq_device serio_raw snd_timer mei_me snd mei soundcore shpchp lpc_ich 8250_fintek mac_hid parport_pc ppdev lp parport autofs4 xfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 multipath linear hid_generic usbhid hid raid0 i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt psmouse ahci fb_sys_fops libahci drm r8169 mii wmi fjes video
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
CurrentDesktop: KDE
Date: Fri Jun 17 11:49:04 2016
InstallationDate: Installed on 2016-04-22 (55 days ago)
InstallationMedia: Kubuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
RelatedPackageVersions:
 ssh-askpass N/A
 libpam-ssh N/A
 keychain N/A
 ssh-askpass-gnome N/A
SSHClientVersion: OpenSSH_7.2p2 Ubuntu-4ubuntu1, OpenSSL 1.0.2g-fips 1 Mar 2016
SourcePackage: openssh
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Uqbar (uqbar) wrote :
Revision history for this message
Seth Arnold (seth-arnold) wrote :

Interesting find. This appears to be a kernel issue, the extra chars are coming from the kernel's /proc/net/unix file.

0000000000000000: 00000002 00000000 00010000 0001 01 1362751 /home/sarnold/.ssh/sockets/hunt-localhost-22-sarnold.uNTuKprtBZbYW0k9

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1593657

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Seth Arnold (seth-arnold) wrote :

Reproduced on:

Linux wopr 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Linux dean 3.2.0-1480-omap4 #106-Ubuntu SMP PREEMPT Thu Apr 21 22:23:42 UTC 2016 armv7l armv7l armv7l GNU/Linux

May also need "ControlMaster auto" to use ssh to reproduce this.

Changed in linux (Ubuntu):
status: Incomplete → Opinion
status: Opinion → Confirmed
Changed in openssh (Ubuntu):
status: New → Invalid
Revision history for this message
Uqbar (uqbar) wrote :

I am reporting here the debugging logs I've got with "-vvv" and also expunging all stuff not related to the bug:

OpenSSH_7.2p2 Ubuntu-4ubuntu1, OpenSSL 1.0.2g-fips 1 Mar 2016
debug1: auto-mux: Trying existing master
debug1: Control socket "/tmp/sshcontrol-f1b458a85cd2aa9d5c115abc12afee9f485ad6b0" does not exist
debug1: setting up multiplex master socket
debug3: muxserver_listen: temporary control path /tmp/sshcontrol-f1b458a85cd2aa9d5c115abc12afee9f485ad6b0.pC6BCXUsqE7aLbzh
debug2: fd 5 setting O_NONBLOCK
debug3: fd 5 is O_NONBLOCK
debug3: fd 5 is O_NONBLOCK
debug1: channel 1: new [/tmp/sshcontrol-f1b458a85cd2aa9d5c115abc12afee9f485ad6b0]
debug3: muxserver_listen: mux listener channel 1 fd 5
debug2: fd 5 setting O_NONBLOCK
debug1: forking to background
debug1: channel 2: new [mux-control]
debug3: channel_post_mux_listener: new mux channel 2 fd 6
debug3: mux_master_read_cb: channel 2: hello sent
debug3: mux_master_read_cb: channel 2 packet type 0x00000001 len 4
debug2: process_mux_master_hello: channel 2 slave version 4
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 1 local, 0 remote
debug3: mux_master_read_cb: channel 2 packet type 0x10000006 len 39
debug2: process_mux_open_fwd: channel 2: request dynamic forward 127.72.1.2:8080 -> *
debug2: process_mux_open_fwd: found existing forwarding
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering

Please, note that the first "debug3" log line is quoting a temporary socket with expanded name.

Revision history for this message
Uqbar (uqbar) wrote :

I am not really sure this is not OpenSSH.
The logs above are quoting both socket names. And the name, afaik, is defined by the program (openssh-client) not by the kernel.
But I could be badly wrong as I have no time to inspect the software source code.

Revision history for this message
Uqbar (uqbar) wrote :

OK. I did just a bare minimum checks.
The file is called mux.c, the function is "void muxserver_listen(void)" at line 1241.
Two comments seem quite important:

        /*
         * Use a temporary path before listen so we can pseudo-atomically
         * establish the listening socket in its final location to avoid
         * other processes racing in between bind() and listen() and hitting
         * an unready socket.
         */

and

        /* Now atomically "move" the mux socket into position */

Unluckily I haven't enough time to go on. Maybe later...

Revision history for this message
Seth Arnold (seth-arnold) wrote :

I gave misc.c a quick skim before looking into the kernel:

int
unix_listener(const char *path, int backlog, int unlink_first)
{
        struct sockaddr_un sunaddr;
        int saved_errno, sock;

        memset(&sunaddr, 0, sizeof(sunaddr));
        sunaddr.sun_family = AF_UNIX;
        if (strlcpy(sunaddr.sun_path, path, sizeof(sunaddr.sun_path)) >= sizeof(sunaddr.sun_path)) {
                error("%s: \"%s\" too long for Unix domain socket", __func__,
                    path);
                errno = ENAMETOOLONG;
                return -1;
        }

        sock = socket(PF_UNIX, SOCK_STREAM, 0);
        ...
        if (bind(sock, (struct sockaddr *)&sunaddr, sizeof(sunaddr)) < 0) {

They memset 0 the struct, and pass the size of the struct. So I suspect OpenSSH is fine.

Thanks

Revision history for this message
Uqbar (uqbar) wrote :

I am not 100% sure it's not the OpenSSH client. These two debug lines make me think there's something more to check:

debug3: muxserver_listen: temporary control path /tmp/sshcontrol-f1b458a85cd2aa9d5c115abc12afee9f485ad6b0.OqWT4bhQP7Hgit3A
debug1: channel 1: new [/tmp/sshcontrol-f1b458a85cd2aa9d5c115abc12afee9f485ad6b0]

The former path matches what lsof reports (without arguments), the latter matches what the file system does.
At line 1262 the path appendix is created in a for loop.
The expanded path (options.control_path) is then passed to the unix_listener function.

That path extension is clearly created by OpenSSH client that way.
But maybe there's something going wrong with the subsequent move, as lsof is still reporting that expanded path!

Changed in openssh (Ubuntu):
status: Invalid → Opinion
Changed in linux (Ubuntu):
status: Confirmed → Opinion
Revision history for this message
Uqbar (uqbar) wrote :

I added a couple of extra debugging lines in mux.c, recompiled and ran it.

debug3: link( "/tmp/sshcontrol-f1b458a85cd2aa9d5c115abc12afee9f485ad6b0.XSLUj2BsBSDnTkhZ","/tmp/sshcontrol-f1b458a85cd2aa9d5c115abc12afee9f485ad6b0" ): 0
debug3: unlink( "/tmp/sshcontrol-f1b458a85cd2aa9d5c115abc12afee9f485ad6b0.XSLUj2BsBSDnTkhZ" ): 0

The ": 0" is the value of the int returned by the two calls. They are successful according to LINK(3POSIX) and UNLINK(3POSIX).

Seth Arnold is right. OpenSSH is doing the things in the right way.

But somehow the kernel (or could it be the libc?) is messing stuff around with link() and/or unlink().

Changed in linux (Ubuntu):
status: Opinion → Confirmed
Changed in openssh (Ubuntu):
status: Opinion → Invalid
Changed in openssh (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.