Comment 17 for bug 1807615

Dmitrii Shcherbakov (dmitriis) wrote : Re: gce: network broken, no apt update/install with LXD container

Back to the behavior in #16. I think there is something wrong in how bootcmd is generated by Juju or processed by cloud-init.

1) LXD container user.user-data config: https://paste.ubuntu.com/p/cXn75hS8J9/
2) user-data generated from it (/var/lib/cloud/seed/nocloud-net/user-data): https://paste.ubuntu.com/p/6B4KXJ6xvd/

see https://github.com/lxc/lxd/blob/master/doc/cloud-init.md

Notice that the command to write out /etc/apt/apt.conf.d/95-juju-proxy-settings is the last bootcmd in both (1) and (2).

3) If I disable and mask the Juju services:

sudo systemctl mask jujud-unit-openstack-dashboard-4.service
sudo systemctl disable jujud-unit-openstack-dashboard-4.service

sudo systemctl disable jujud-machine-3-lxd-3.service
sudo systemctl mask jujud-machine-3-lxd-3.service

And then do

rm /etc/apt/apt.conf.d/95-juju-proxy-settings
reboot

95-juju-proxy-settings does not appear, which means that this bootcmd never runs and the file never gets created.

4) If I move the bootcmd that writes out the proxy file (95-juju-proxy-settings) to the top of the list of bootcmds (https://paste.ubuntu.com/p/tHMyrNWQYK/) and run `rm -r /var/lib/cloud/instances/juju-6d9aac-3-lxd-3/ ; reboot`, then I get 95-juju-proxy-settings at the right place and time (the crtime reported via debugfs is before the bootcmd stage ends).

In other words, there is something wrong either in the format of one of the bootcmds or the way cloud-init processes this list of commands.
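To illustrate how the position of a bootcmd in the list can matter: cc_bootcmd writes the whole list into one temporary script and runs it with /bin/sh, so an earlier entry can in principle stop later ones from running. The sketch below is only an illustration of that class of failure. The commands are made up, `run_as_one_script` is a simplified stand-in rather than cloud-init's real script generation, and I am not claiming that an `exit` is what actually happens in the Juju-generated list.

```python
import os
import subprocess
import tempfile


def run_as_one_script(commands):
    """Join commands into one script and run it with /bin/sh.

    Hypothetical, simplified stand-in for illustration only;
    this is not cloud-init's own code.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as tmpf:
        tmpf.write("#!/bin/sh\n" + "\n".join(commands) + "\n")
    try:
        return subprocess.call(["/bin/sh", tmpf.name])
    finally:
        os.unlink(tmpf.name)


workdir = tempfile.mkdtemp()
marker = os.path.join(workdir, "95-juju-proxy-settings")
write_marker = "echo placeholder > '%s'" % marker

# Made-up list: the second entry ends the script early, so the last
# entry (the one that would write the file) is never reached.
rc_last = run_as_one_script(["true", "exit 0", write_marker])
ran_when_last = os.path.exists(marker)

# The same command moved to the top of the list does run.
rc_first = run_as_one_script([write_marker, "true", "exit 0"])
ran_when_first = os.path.exists(marker)

print(ran_when_last, ran_when_first)
```

Note that the script exits with status 0 in both cases, so a failure of this kind would not necessarily show up as a bootcmd error.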

The behavior in #16 occurs because the bootcmd that writes 95-juju-proxy-settings never runs, while the machine agent has its own code (in the proxyupdater goroutine) that writes 95-juju-proxy-settings, which necessarily happens after the bootcmd stage finishes:

https://github.com/juju/juju/blob/juju-2.5.0/worker/proxyupdater/proxyupdater.go#L250-L261
  content := paccmder.ProxyConfigContents(w.aptProxy) + "\n"
  err = ioutil.WriteFile(config.AptProxyConfigFile, []byte(content), 0644)

https://github.com/juju/packaging/blob/ba21344fff207f4fa06b0a08bac386b179a2e6a1/config/apt_constants.go#L37-L40
 AptProxyConfigFile = AptConfigDirectory + "/95-juju-proxy-settings"

I also checked whether the bootcmd code could be handling child processes incorrectly, but it waits for the child process to exit by calling "communicate", which means that all of the child's file descriptors are closed by the time the parent gets unblocked (http://man7.org/linux/man-pages/man2/exit.2.html "... _exit() does close open file descriptors ..."):

https://github.com/cloud-init/cloud-init/blob/ubuntu/18.4-0ubuntu1_18.04.1/cloudinit/config/cc_bootcmd.py#L79-L102
def handle(name, cfg, cloud, log, _args):
# ...
            cmd = ['/bin/sh', tmpf.name]
            util.subp(cmd, env=env, capture=False)
# ...

https://github.com/cloud-init/cloud-init/blob/ubuntu/18.4-0ubuntu1_18.04.1/cloudinit/util.py#L1919-L2033
def subp(args, data=None, rcs=None, env=None, capture=True,
         combine_capture=False, shell=False,
         logstring=False, decode="replace", target=None, update_env=None,
         status_cb=None):
# ...
    stdin = None
    stdout = None
    stderr = None
    if capture:
        stdout = subprocess.PIPE
        stderr = subprocess.PIPE

        sp = subprocess.Popen(bytes_args, stdout=stdout,
                              stderr=stderr, stdin=stdin,
                              env=env, shell=shell)
        (out, err) = sp.communicate(data)
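
The waiting behavior can be checked in isolation with a small sketch. It mirrors the capture=False case above (stdout, stderr and stdin all None); the file name and the shell command are made up for the test, and it only shows that communicate() does not return before the child has exited.

```python
import os
import subprocess
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "written-by-child")

# The child sleeps before writing, so the file can only exist afterwards
# if the parent really blocked until the child exited.
script = "sleep 1; echo done > '%s'" % target

sp = subprocess.Popen(["/bin/sh", "-c", script],
                      stdout=None, stderr=None, stdin=None)
existed_before = os.path.exists(target)
(out, err) = sp.communicate()  # blocks until the child has exited
existed_after = os.path.exists(target)

print(existed_before, existed_after, sp.returncode)
```

With no pipes requested, out and err are both None, and the file written by the child is present as soon as communicate() returns.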