usage of /tmp during boot is not safe due to systemd-tmpfiles-clean

Bug #1707222 reported by Scott Moser
This bug affects 4 people
Affects              Status         Importance   Assigned to   Milestone
cloud-init           Fix Released   High         Unassigned
cloud-init (Ubuntu)  Fix Released   High         Unassigned
systemd (Ubuntu)     Won't Fix      Undecided    Unassigned

Bug Description

Earlier this week on Zesty on Azure I saw a cloud-init failure in its 'mount_cb' function.
That function essentially does:
 a.) make a tmp directory for a mount point
 b.) mount some filesystem to that mount point
 c.) call a function
 d.) unmount the directory
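
For readers unfamiliar with the pattern, steps a through d can be sketched roughly as follows. This is a minimal illustration, not cloud-init's actual code: the name mount_cb_sketch, the tmp_base parameter, and the injectable mount/umount callables are all assumptions made for the example. The race described in this bug sits between step a and steps b/c, when something else removes the directory created in step a.

```python
import contextlib
import os
import subprocess
import tempfile


def _mount(device, mount_point):
    # Real mount; requires root. Replaceable so the flow can be tried safely.
    subprocess.check_call(["mount", "-o", "ro", device, mount_point])


def _umount(mount_point):
    subprocess.check_call(["umount", mount_point])


def mount_cb_sketch(device, callback, tmp_base=None, mount=_mount, umount=_umount):
    """Hypothetical outline of the a-d steps; not the cloud-init implementation."""
    # a.) make a tmp directory for a mount point (tmp_base=None means the
    #     platform default, typically /tmp)
    mount_point = tempfile.mkdtemp(dir=tmp_base)
    try:
        # b.) mount some filesystem to that mount point
        mount(device, mount_point)
        try:
            # c.) call a function with the mount point
            return callback(mount_point)
        finally:
            # d.) unmount the directory
            umount(mount_point)
    finally:
        # rmdir (not a recursive delete) so a still-mounted filesystem is
        # never walked and removed if the unmount failed.
        with contextlib.suppress(OSError):
            os.rmdir(mount_point)
```

Passing stand-in mount and umount callables lets the sequence be exercised without root; passing tmp_base lets the caller move the mount point out of /tmp.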

What I recall was that access to a file inside the mount point failed during 'c'.
This seems possible as systemd-tmpfiles-clean may be running at the same time as cloud-init (cloud-init.service in this example).

It seems that this service basically inhibits *any* other service from using tmp files.
Its ordering statements are only:

  After=local-fs.target time-sync.target
  Before=shutdown.target

So while in most cases only services that run early in the boot process, like cloud-init, will be affected, any service could have its tmp files removed. This service could take quite a long time to run if /tmp/ had been filled with lots of files in the previous boot.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

systemd-tmpfiles-clean is racy, but it only cleans paths listed in the tmpfiles.d/ configs under /run, /etc and /usr/lib, and only for entries that explicitly specify an age beyond which they should be cleaned.

For /tmp the affected paths are older than 10 days only:
d /tmp/.X11-unix 1777 root root 10d
d /tmp/.ICE-unix 1777 root root 10d
d /tmp/.XIM-unix 1777 root root 10d
d /tmp/.font-unix 1777 root root 10d
d /tmp/.Test-unix 1777 root root 10d

To figure out what actually happened, we need a reproducer or detailed logs, including the journal and the contents of /run/tmpfiles.d, /etc/tmpfiles.d and /usr/lib/tmpfiles.d.
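
As a rough aid for collecting that information, the sketch below lists tmpfiles.d entries whose path is /tmp or lies under it. It is only an illustration: the function name tmpfiles_entries and its parameters are made up for this example, and the parsing is a simple whitespace split on the "type path mode user group age" layout shown above, so it does not handle quoting or every feature of the format.

```python
import glob
import os

# The three configuration directories named in the comment above.
DEFAULT_DIRS = ("/run/tmpfiles.d", "/etc/tmpfiles.d", "/usr/lib/tmpfiles.d")


def tmpfiles_entries(config_dirs=DEFAULT_DIRS, path_prefix="/tmp"):
    """Return (file, type, path, age) for entries at or below path_prefix."""
    entries = []
    for conf_dir in config_dirs:
        for conf_file in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
            with open(conf_file, encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue  # skip blanks and comments
                    fields = line.split()
                    if len(fields) < 2:
                        continue  # not a "type path ..." line
                    path = fields[1]
                    if path == path_prefix or path.startswith(path_prefix + "/"):
                        # The age is the sixth field when present; "-" means unset.
                        age = fields[5] if len(fields) > 5 else "-"
                        entries.append((conf_file, fields[0], path, age))
    return entries


if __name__ == "__main__":
    for conf_file, kind, path, age in tmpfiles_entries():
        print(f"{conf_file}: {kind} {path} age={age}")
```

Missing directories are simply skipped, so the script can be run as-is on a system that lacks one of the three locations.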

I do not recommend using /tmp on security grounds, but I do recommend setting PrivateTmp=true in the systemd units to get a secure /tmp and /var/tmp for your service.

Changed in systemd (Ubuntu):
status: New → Incomplete
Revision history for this message
Scott Moser (smoser) wrote :

So something is definitely cleaning on boot.
Maybe I just misunderstood your statement, but above it seems like you were saying that only old files named like those would be removed.

Try this:
 $ lxc launch ubuntu-daily:artful a1
 $ lxc exec a1 -- touch /tmp/foo
 $ lxc restart a1
 $ lxc exec a1 -- ls -l /tmp/foo
 ls: cannot access '/tmp/foo': No such file or directory

Changed in systemd (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Currently, /tmp is only guaranteed to be in a usable state after sysinit.target or, more generally, only after the system is fully up.

During boot (initramfs, post-initramfs, early boot, post-boot), system services should use /run, a private runtime dir, or a private tmp dir.

This is not a bug in the Ubuntu boot sequence. Please fix cloud-init to use /run.

Changed in systemd (Ubuntu):
status: Confirmed → Won't Fix
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Revision history for this message
Scott Moser (smoser) wrote :

From another bug with the same root cause, a failure might look like this:
Apr 12 07:56:33 ubuntu [CLOUDINIT] util.py[DEBUG]: Running command ['mount', '-o', 'ro,sync', '-t', 'auto', '/dev/sr0', '/tmp/tmpzq70nqyi'] with allowed return codes [0] (shell=False, capture=True)
Apr 12 07:56:33 ubuntu [CLOUDINIT] util.py[DEBUG]: Failed mount of '/dev/sr0' as 'auto': Unexpected error while running command.#012Command: ['mount', '-o', 'ro,sync', '-t', 'auto', '/dev/sr0', '/tmp/tmpzq70nqyi']#012Exit code: 32#012Reason: -#012Stdout: ''#012Stderr: 'mount: mount point /tmp/tmpzq70nqyi does not exist\n'
Apr 12 07:56:33 ubuntu [CLOUDINIT] util.py[DEBUG]: Recursively deleting /tmp/tmpzq70nqyi

Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Changed in cloud-init (Ubuntu):
importance: Undecided → High
Changed in cloud-init:
importance: Medium → High
Revision history for this message
Scott Moser (smoser) wrote :

Note there is some discussion of this bug, and of other options, on IRC at https://irclogs.ubuntu.com/2017/07/28/%23ubuntu-devel.html#t16:24

Revision history for this message
Scott Moser (smoser) wrote :

Some solutions:
 a.) Use systemd PrivateTmp, which mounts a filesystem over /tmp for the process. The issue here is that it would only solve the problem for systemd boot.
 b.) Set up our own tmp directory and use it.
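
Option b could look something like the sketch below. It is an assumption-laden illustration rather than the change that was eventually made: the helper names ensure_private_tmp and make_work_dir, the "tmp" subdirectory, and the 0o700 mode are invented here; only the idea of keeping temp files under /run/cloud-init comes from this bug.

```python
import os
import tempfile


def ensure_private_tmp(run_dir="/run/cloud-init"):
    """Create (if needed) and return a tmp directory under run_dir.

    Hypothetical helper: the "tmp" subdirectory name and 0o700 mode are
    choices made for this example only.
    """
    tmp_dir = os.path.join(run_dir, "tmp")
    os.makedirs(tmp_dir, mode=0o700, exist_ok=True)
    return tmp_dir


def make_work_dir(run_dir="/run/cloud-init", prefix="tmp"):
    """Create a unique working directory outside of /tmp and return its path."""
    return tempfile.mkdtemp(prefix=prefix, dir=ensure_private_tmp(run_dir))
```

Because the directory lives under /run rather than /tmp, it does not depend on /tmp being usable during early boot, and unlike option a it does not depend on systemd being the init system. Writing to the default /run/cloud-init would normally require root; pass another run_dir to try the helpers as an unprivileged user.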

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 0.7.9-267-g922c3c5c-0ubuntu1

---------------
cloud-init (0.7.9-267-g922c3c5c-0ubuntu1) artful; urgency=medium

  * New upstream snapshot.
    - Ec2: only attempt to operate at local mode on known platforms.
      (LP: #1715128)
    - Use /run/cloud-init for tempfile operations. (LP: #1707222)
    - ds-identify: Make OpenStack return maybe on arch other than intel.
      (LP: #1715241)
    - tests: mock missed openstack metadata uri network_data.json
      [Chad Smith] (LP: #1714376)
    - relocate tests/unittests/helpers.py to cloudinit/tests
      [Lars Kellogg-Stedman]
    - tox: add nose timer output [Joshua Powers]
    - upstart: do not package upstart jobs, drop ubuntu-init-switch module.
    - tests: Stop leaking calls through unmocked metadata addresses
      [Chad Smith] (LP: #1714117)

 -- Scott Moser <email address hidden> Thu, 07 Sep 2017 16:59:04 -0400

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Scott Moser (smoser)
Changed in cloud-init:
status: Confirmed → Fix Committed
Revision history for this message
Scott Moser (smoser) wrote : Fixed in Cloud-init 17.1

This bug is believed to be fixed in cloud-init in 17.1. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released
Revision history for this message
norman shen (jshen28) wrote :

2021-06-17 01:37:56,633 - util.py[WARNING]: Failed: growpart /dev/vda 2
2021-06-17 01:37:56,634 - util.py[DEBUG]: Failed: growpart /dev/vda 2
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 156, in resize
    subp.subp(["growpart", diskdev, partnum])
  File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 295, in subp
    cmd=args)
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['growpart', '/dev/vda', '2']
Exit code: 2
Reason: -
Stdout: FAILED: pt_resize failed
Stderr: /usr/bin/growpart: 554: /usr/bin/growpart: cannot create /tmp/growpart.g8m8DI/pt_update.err: Directory nonexistent
        failed [pt_update:2] pt_update /dev/vda 2
        cat: /tmp/growpart.g8m8DI/pt_update.err: No such file or directory
2021-06-17 01:37:56,652 - util.py[DEBUG]: resize_devices took 4.224 seconds
2021-06-17 01:37:56,652 - cc_growpart.py[DEBUG]: '/' FAILED: failed to resize: disk=/dev/vda, ptnum=2: Unexpected error while running command.
Command: ['growpart', '/dev/vda', '2']
Exit code: 2
Reason: -
Stdout: FAILED: pt_resize failed
Stderr: /usr/bin/growpart: 554: /usr/bin/growpart: cannot create /tmp/growpart.g8m8DI/pt_update.err: Directory nonexistent
        failed [pt_update:2] pt_update /dev/vda 2
        cat: /tmp/growpart.g8m8DI/pt_update.err: No such file or directory

Not sure if this is related, but I still saw similar logs using cloud-init 20.4.1...

Revision history for this message
Scott Moser (smoser) wrote :

Norman,
Thanks for the comment. On first pass, it looks like you've diagnosed a failure correctly.
Please open another bug and add output of 'cloud-init collect-logs'.

thanks.

Revision history for this message
James Falcon (falcojr) wrote :