apache2 fails to start if installed via cloud config (on Xenial)

Bug #1575572 reported by Dan Watkins on 2016-04-27
This bug affects 10 people
Affects                         Status         Importance  Assigned to   Milestone
init-system-helpers (Debian)    Fix Released   Unknown
init-system-helpers (Ubuntu)    Fix Released   High        Martin Pitt
  Xenial                        Fix Released   High        Martin Pitt
systemd (Debian)                Fix Released   Unknown

Bug Description

SRU TEST CASE:

With the following cloud config, apache2 fails to start after installation on Xenial:

#cloud-config
packages:
- apache2

See for example:

$ gcloud compute instances create xenial-$(date +%y%m%d-%H%M) --image ubuntu-1604-xenial-v20160420c --image-project ubuntu-os-cloud --metadata-from-file user-data=cloud-config
NAME                ZONE            MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
xenial-160427-1050  europe-west1-d  n1-standard-1               10.240.0.7   104.155.86.94  RUNNING

$ ssh ubuntu@104.155.86.94 systemctl status apache2.service
● apache2.service - LSB: Apache2 web server
   Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
  Drop-In: /lib/systemd/system/apache2.service.d
           └─apache2-systemd.conf
   Active: inactive (dead)
     Docs: man:systemd-sysv-generator(8)

With the fixed init-system-helpers, apache2.service (or any other service you install via "packages:") should start correctly.

Related Bugs:
 * bug 1576692: [cloud-init] fully support package installation in systemd

This issue was originally reported by a customer, who investigated it as follows:

#####

"If you run up a 16.04 cloud image using EC2 compatible user-data containing a simple shell script.

#!/bin/sh

sudo DEBIAN_FRONTEND=noninteractive apt-get -q -y update
sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y apache2
sudo nc -k -l -d 443&

then the apache2 daemon will rarely, if ever, start.

(You get the same problem running the equivalent 'cloud-config' script).

This is because 'cloud-init' runs the script when the run level is 'unknown', and the 'invoke-rc.d' script has no case for handling 'unknown' run levels, so it defaults to not starting anything.

I've seen this a few times with packages that have old-style SysV init scripts and that start a daemon automatically in the 'postinst' script.

cloud-init should defer running scripts and cloud-config installs until systemd has achieved a defined run level - or the SysV backward compatibility needs improving so that it can deal with systemd being between run levels."

###

I could reproduce this consistently across 20+ tests using the user data supplied by the customer, in either of these forms:

#cloud-config
packages:
- apache2
runcmd:
- "nc -k -l -d 443&"

or:

#!/bin/sh

output_runlevel() {
echo -n "Current runlevel is "
sudo /sbin/runlevel
}
output_runlevel
sudo DEBIAN_FRONTEND=noninteractive apt-get -q -y update
#sudo sed -ie 's/set +e$/& -x/' /usr/sbin/invoke-rc.d
output_runlevel
sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y apache2
output_runlevel
sudo nc -k -l -d 443&

Apache 2 is correctly installed, but never started (nc is started as expected):

$ sudo systemctl status apache2; sudo netstat -A inet -lnp | grep 443
● apache2.service - LSB: Apache2 web server
   Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
  Drop-In: /lib/systemd/system/apache2.service.d
           └─apache2-systemd.conf
   Active: inactive (dead)
     Docs: man:systemd-sysv-generator(8)
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      2657/nc

The runlevel output matches the customer's description ("unknown"). Edited excerpt from cloud-init-output.log on a VM that ran the script:

[snip]
Reading package lists...
Current runlevel is unknown # <-- output of output_runlevel(), as per above
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  apache2-bin apache2-data apache2-utils libapr1 libaprutil1
[snip]

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Scott Moser (smoser) wrote :

Ok, here is what is happening:
 a.) dpkg (or apt) install of apache2 (or just about any service) starts services by calling to /usr/sbin/invoke-rc.d (like 'invoke-rc.d apache2 start').
 b.) invoke-rc.d calls 'runlevel' to get the current runlevel and see if this service should be started.
 c.) during systemd boot, runlevel prints 'unknown' (a single token rather than the previous and current runlevel) and exits 1.
       The test for failure of RUNLEVELHELPER is bogus in at least two ways: 'test ! $?' is false whether $? is 0 or 1 (with a single operand after '!', test only checks that the string is non-empty, so the error branch never runs), and in any case $? there is the exit status of 'sed', which is 0 because sed did its job.
 d.) output of runlevel is stored in RL and then it goes looking for an SLINK or SSLINK (/etc/rc$RL.d/S??apache2 or /etc/rcS.d/S??apache2). Neither of these exist so it exits without starting the service.

The problem seems to be that any package install during systemd boot of a package that only provides sysvinit scripts will fail.
Reading invoke-rc.d, I'm not really sure why a proper systemd service wouldn't fail as well.
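Both flaws in the failure check described in (c) can be reproduced in isolation. This is a minimal sketch, not invoke-rc.d's actual code: `fake_runlevel` is a hypothetical stand-in for /sbin/runlevel during systemd boot, and the `RL=` line only follows the pattern described above (runlevel output piped through sed).

```shell
#!/bin/sh
# Hypothetical stand-in for /sbin/runlevel during systemd boot:
# prints a single token and exits 1, as described in (c).
fake_runlevel() {
    echo unknown
    return 1
}

# Flaw 1: the status of a pipeline is that of its LAST command (sed),
# so the failure of fake_runlevel is lost; $? is 0 here.
RL=$(fake_runlevel | sed 's/.* //')
pipeline_status=$?

# Flaw 2: 'test ! STRING' is false for any non-empty STRING, so a
# 'test ! $?' check is false for 0 and 1 alike and never fires.
if test ! 1; then fires_on_1=yes; else fires_on_1=no; fi
if test ! 0; then fires_on_0=yes; else fires_on_0=no; fi

echo "RL=$RL pipeline_status=$pipeline_status fires_on_0=$fires_on_0 fires_on_1=$fires_on_1"
# prints: RL=unknown pipeline_status=0 fires_on_0=no fires_on_1=no

# A check that does catch the failure runs the helper on its own and
# tests its exit status directly.
if ! out=$(fake_runlevel); then
    echo "runlevel helper failed (output: $out)"
fi
```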

Changed in cloud-init (Ubuntu):
importance: Undecided → Medium
Changed in init-system-helpers (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Steve Langasek (vorlon) wrote :

> The problem seems to be that any package install during systemd boot of
> a package that only provides sysvinit scripts will fail. Reading invoke-rc.d,
> I'm not really sure why a proper systemd service wouldn't fail as well.

The RUNLEVELHELPER check is buggy, yes. However, the behavior of invoke-rc.d is by design; if the runlevel can't be determined (which according to systemd it still would not be, even if the bug in invoke-rc.d were fixed), invoke-rc.d should default to doing nothing because there's no system policy to say whether the service should be started.

I think there's a case to be made that this is a bug in systemd, for returning 'unknown' for the runlevel in these cases. The historical behavior of sysvinit is that everything triggered from /etc/init.d/rc inherits the value of the current runlevel in the RUNLEVEL environment variable; and running e.g. 'RUNLEVEL=2 runlevel' - including on systemd - will report this env value as the target runlevel.

$ runlevel
N 5
$ PREVLEVEL=5 RUNLEVEL=2 runlevel
5 2
$

Unless cloud-init is being called so early in boot that the runlevel target is not known - which seems very unlikely - then I think the correct thing for systemd to do is to honor this historical behavior by making the 'runlevel' command return the target runlevel when called from a systemd unit that's run at boot, even if that target has not yet been reached. Opening a task on systemd for this.

Note, BTW, that the above also suggests a workaround for cloud-init to employ. If for whatever reason systemd will not set itself up to return the proper runlevel value, cloud-init can simply set RUNLEVEL=2 in the environment before calling dpkg, and invoke-rc.d will pick this up and do the right thing with it.
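A sketch of why the RUNLEVEL workaround helps, assuming the `runlevel` behaviour shown in the transcript above. `emulated_runlevel` and `current_rl` are hypothetical names; the emulation only models the environment-variable case and the early-boot "unknown" case, not how systemd actually computes the runlevel.

```shell
#!/bin/sh
# Hypothetical emulation of 'runlevel' while systemd is still booting,
# modelled on the behaviour shown above: an inherited RUNLEVEL (and
# PREVLEVEL, defaulting to N) wins; otherwise the answer is 'unknown'.
emulated_runlevel() {
    if [ -n "${RUNLEVEL:-}" ]; then
        echo "${PREVLEVEL:-N} $RUNLEVEL"
    else
        echo unknown
    fi
}

# Keep only the last field of the helper's output.
current_rl() {
    emulated_runlevel | sed 's/.* //'
}

unset RUNLEVEL PREVLEVEL
without_env=$(current_rl)          # no runlevel policy to consult
with_env=$(RUNLEVEL=2 current_rl)  # what the workaround provides
echo "without RUNLEVEL: $without_env; with RUNLEVEL=2: $with_env"
# prints: without RUNLEVEL: unknown; with RUNLEVEL=2: 2

# On a real instance (user-data runs as root), the workaround would be:
#   RUNLEVEL=2 DEBIAN_FRONTEND=noninteractive apt-get install -q -y apache2
```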

Changed in systemd (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → High
status: New → Triaged
Changed in init-system-helpers (Ubuntu):
importance: High → Low
Martin Pitt (pitti) wrote :

Indeed this isn't related to native systemd units vs. sysvinit scripts.

> honor this historical behavior by making the 'runlevel' command return the target runlevel when called from a systemd unit that's run at boot, even if that target has not yet been reached.

OK, I wasn't aware of that. I'll look into that.

> d.) output of runlevel is stored in RL and then it goes looking for an SLINK or SSLINK (/etc/rc$RL.d/S??apache2 or /etc/rcS.d/S??apache2). Neither of these exist so it exits without starting the service.

To be sure we are talking about the same thing: Surely update-rc.d should run before invoke-rc.d in the postinst, so the links should exist in rc[2345].d/. This is solely about $RL having the wrong value here, not about the links not existing at all, right?

Martin Pitt (pitti) wrote :

I tested the output of "runlevel" and the env under Debian sid with sysvinit-core:

 - In an rcS script: runlevel says "unknown" (env has RUNLEVEL=S and PREVLEVEL=N)
 - In an rc[2345] script: runlevel says "N 2" (env has RUNLEVEL=2 and PREVLEVEL=N)

With systemd, runlevel says "unknown" until the boot has finished, and indeed systemd-update-utmp only ran once. That's the bug we need to fix here.

Martin Pitt (pitti) wrote :

I had a closer look at how systemd emulates the old runlevels. systemd-update-utmp-runlevel.service runs after graphical.target, multi-user.target etc. have started (After=runlevel1.target runlevel2.target runlevel3.target runlevel4.target runlevel5.target; those are activated in multi-user.target etc.). src/update-utmp/update-utmp.c determines which runlevel to report in get_current_runlevel() using this table:

        static const struct {
                const int runlevel;
                const char *special;
        } table[] = {
                /* The first target of this list that is active or has
                 * a job scheduled wins. We prefer runlevels 5 and 3
                 * here over the others, since these are the main
                 * runlevels used on Fedora. It might make sense to
                 * change the order on some distributions. */
                { '5', SPECIAL_GRAPHICAL_TARGET },
                { '3', SPECIAL_MULTI_USER_TARGET },
                { '1', SPECIAL_RESCUE_TARGET },
        };

But targets only have the states "dead" and "active"; unlike other unit types, there is no "activating" state. So we cannot use this approach to see whether multi-user.target *will* be active at some point in the future if we have just made it past basic.target. So fixing this in systemd itself would require larger architectural changes, which we presumably don't want in an SRU (and maybe don't want at all, as sysvinit and its runlevel concept haven't mapped well to parallel/dynamic init systems such as upstart or systemd).
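The lookup quoted above can be modelled in a few lines to show why early boot yields no runlevel: while only basic.target and earlier targets are active, no table entry matches. This is a minimal sketch; `pick_runlevel`, `TABLE` and the input format (a space-separated list of active targets) are hypothetical, it only models the "is active" part of the rule, and the real C code queries each target's state from systemd rather than taking a list. The "unknown" result stands for "no runlevel recorded", which is what `runlevel` then reports.

```shell
#!/bin/sh
# Hypothetical sketch of the table lookup in get_current_runlevel():
# entries in priority order, written as "runlevel:target".
TABLE="5:graphical.target 3:multi-user.target 1:rescue.target"

# $1: space-separated list of targets that are currently active.
# Prints the runlevel of the first table entry whose target is active,
# or "unknown" if none is.
pick_runlevel() {
    active=" $1 "
    for entry in $TABLE; do
        level=${entry%%:*}
        target=${entry#*:}
        case "$active" in
            *" $target "*) echo "$level"; return 0 ;;
        esac
    done
    echo unknown
}

pick_runlevel "sysinit.target basic.target"          # prints: unknown
pick_runlevel "basic.target multi-user.target"       # prints: 3
pick_runlevel "multi-user.target graphical.target"   # prints: 5
```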

Therefore I'd rather fix this in invoke-rc.d -- use runlevel only for SysVinit, and drop the whole runlevel and /etc/rc?.d/ parsing stuff under systemctl and just use "systemctl is-enabled" to determine whether to start a service or not. That avoids the guesswork on both sides, is structurally much simpler, and more robust.

Changed in systemd (Ubuntu):
assignee: Martin Pitt (pitti) → nobody
status: Triaged → Won't Fix
Changed in init-system-helpers (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
importance: Low → High
status: Confirmed → In Progress
Martin Pitt (pitti) wrote :

For the record, even after fixing this, installing packages from current cloud-init will still not be 100% reliable in its current form.

The design problem here is that cloud-init runs *in* the boot sequence, i.e. in the transaction where default.target and its dependencies get started. You can't place/start new services into the boot dependency tree (i.e. dependencies of default.target) while default.target is still being started. Any new service that wants to start needs to wait until the initial boot has finished, and will then start afterwards.

invoke-rc.d has some code to deal with that situation, to avoid deadlocks: You can't run the blocking "systemctl start" in a postinst that you run within the boot sequence -- you can at most try to start it without any dependencies and in non-blocking mode, which is why invoke-rc.d has this code:

                # avoid deadlocks during bootup and shutdown from units/hooks
                # which call "invoke-rc.d service reload" and similar, since
                # the synchronous wait plus systemd's normal behaviour of
                # transactionally processing all dependencies first easily
                # causes dependency loops
                if ! OUT=$(systemctl is-system-running 2>/dev/null) && [ "$OUT" != "degraded" ]; then
                    sctl_args="--job-mode=ignore-dependencies"
                fi

This was mostly added for things like /etc/network/ifup.d/ scripts that start stuff, to avoid getting deadlocks on boot. But it would apply here too.
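The condition in the invoke-rc.d excerpt above adds --job-mode=ignore-dependencies whenever `systemctl is-system-running` exits non-zero (it exits 0 only in the "running" state), except when the state is "degraded". A minimal sketch of that decision; `stub_is_system_running` and `job_mode_args` are hypothetical names so the logic can be exercised without systemd:

```shell
#!/bin/sh
# Hypothetical stand-in for 'systemctl is-system-running': prints the
# state given in $1 and, like the real command, exits 0 only for "running".
stub_is_system_running() {
    echo "$1"
    [ "$1" = running ]
}

# Same decision as the invoke-rc.d excerpt: while the system is still
# booting (or shutting down), start units without pulling in dependencies.
job_mode_args() {
    sctl_args=""
    if ! OUT=$(stub_is_system_running "$1") && [ "$OUT" != "degraded" ]; then
        sctl_args="--job-mode=ignore-dependencies"
    fi
    printf '%s\n' "$sctl_args"
}

for state in starting running degraded stopping; do
    printf "%s -> '%s'\n" "$state" "$(job_mode_args "$state")"
done
# prints:
# starting -> '--job-mode=ignore-dependencies'
# running -> ''
# degraded -> ''
# stopping -> '--job-mode=ignore-dependencies'
```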

For the most part this should work fine, unless apache expects that any of its dependencies actually get started in that situation (they can't). But if you *do* install a package with an init.d script or service that has dependencies that are not already running, they will *not* be started.

This behaviour of installing packages or configuring your system while the system isn't booted yet might also bite you in other cases (it certainly bit me in a number of cases). Maybe we should discuss how to move that after the boot? I do that in https://git.launchpad.net/~ubuntu-release/+git/autopkgtest-cloud/tree/tools/armf-lxd-slave.userdata but this doesn't look very pretty.

Martin Pitt (pitti) on 2016-04-29
Changed in init-system-helpers (Ubuntu Xenial):
status: New → Triaged
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → High
Changed in systemd (Ubuntu Xenial):
status: New → Won't Fix
Scott Moser (smoser) wrote :

Pitti, in response to comment 5 above:
> > d.) output of runlevel is stored in RL and then it goes looking for an
> > SLINK or SSLINK (/etc/rc$RL.d/S??apache2 or /etc/rcS.d/S??apache2). Neither
> > of these exist so it exits without starting the service.

> To be sure we are talking about the same thing: Surely update-rc.d should run
> before invoke-rc.d in the postinst, so the links should exist in rc[2345].d/.
> This is solely about $RL having the wrong value here, not about the links not
> existing at all, right?

Yes, the links are present and that part works. But if RL=unknown, then there is no file /etc/rcunknown.d/S??apache2. My statement wasn't terribly clear.

If RL=5 all would be good.
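To make (d) concrete: the start links are looked up in a directory named after $RL, so with RL=unknown the lookup path simply does not exist, even though the rc[2345].d links are in place. A minimal sketch, assuming a hypothetical `find_start_link` helper and a scratch directory standing in for /etc; the real invoke-rc.d also checks /etc/rcS.d and handles more cases:

```shell
#!/bin/sh
# Hypothetical helper: print the first S??<name> start link for runlevel
# $2 under prefix $1 (normally /etc); return 1 if there is none.
find_start_link() {
    prefix=$1 rl=$2 name=$3
    for link in "$prefix/rc$rl.d"/S??"$name"; do
        # An unmatched glob stays literal, so check the link really exists.
        [ -L "$link" ] || return 1
        echo "$link"
        return 0
    done
}

# Scratch tree resembling what update-rc.d leaves behind in /etc.
root=$(mktemp -d)
mkdir -p "$root/init.d"
: > "$root/init.d/apache2"
for n in 2 3 4 5; do
    mkdir -p "$root/rc$n.d"
    ln -s ../init.d/apache2 "$root/rc$n.d/S02apache2"
done

with_rl5=$(find_start_link "$root" 5 apache2)
with_unknown=$(find_start_link "$root" unknown apache2) || with_unknown="(none)"
echo "RL=5: $with_rl5"
echo "RL=unknown: $with_unknown"   # (none): the service is not started
rm -rf "$root"
```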

Scott Moser (smoser) wrote :

> This behaviour of installing packages or configuring your system while the
> system isn't booted yet might also bite you in other cases (it certainly bit
> me in a number of cases). Maybe we should discuss how to move that after the
> boot? I do that in
> https://git.launchpad.net/~ubuntu-release/+git/autopkgtest-cloud/tree/tools/armf-lxd-slave.userdata
> but this doesn't look very pretty.

The goal of cloud-init is basically to allow the user to do whatever they could have done by creating their own image, but to do it by feeding data to the system instead.

The portion of cloud-init that installs packages for a user can be moved to a later stage, but can't really be moved *out* of boot.

sysvinit and upstart support this functionality well. upstart *really* does the right thing in that you can add new jobs and when conditions are met those jobs will be handled.

Also note that "in the boot sequence" is not really the problem. The problem is that the package installation is happening before the system is completely booted. As an example of the difference, consider that any hung or blocked systemd service will break installation of services. I can show this by:
 a.) new xenial container
 b.) modify /etc/rc.local to have 'sleep 5m'
 c.) start container
 d.) lxc exec name apt-get install apache2
 e.) apt-get install apache2

The same problem can be shown with 'd' being invoked via ssh, or *any* other mechanism on the system.

Scott Moser (smoser) on 2016-04-29
description: updated
description: updated
no longer affects: cloud-init (Ubuntu Xenial)
no longer affects: cloud-init (Ubuntu)
Martin Pitt (pitti) wrote :

This commit fixes the bogus "runlevel fails" check: http://anonscm.debian.org/cgit/collab-maint/init-system-helpers.git/commit/?id=2c444f0cd . It does modify behaviour during early boot, but I think this is a correct change; nevertheless I would *not* want to SRU it, as this check has been broken forever, and thus invoke-rc.d has always happily started stuff in early boot (i.e. during rcS in SysV terms, before sysinit.target in systemd terms).

This is an SRUable bandaid for the main bug here: http://anonscm.debian.org/cgit/collab-maint/init-system-helpers.git/commit/?id=161b76221 . Indeed it would be nicer to fix that in systemd itself, but as I explained above this will be a lot more intrusive and much less adequate for an SRU.

Changed in init-system-helpers (Ubuntu):
status: In Progress → Fix Committed
Martin Pitt (pitti) wrote :
Changed in systemd (Debian):
status: Unknown → Confirmed
Martin Pitt (pitti) on 2016-05-06
Changed in systemd (Ubuntu):
status: Won't Fix → Triaged
importance: High → Low
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package init-system-helpers - 1.32ubuntu1

---------------
init-system-helpers (1.32ubuntu1) yakkety; urgency=medium

  * Merge from Debian unstable. Remaining changes:
    - init: Drop sysvinit-core as alternative pre-depends, and add
      upstart-sysv instead.

init-system-helpers (1.32) unstable; urgency=medium

  * dh_systemd_{enable,start}: Quiesce "No such file or directory" error
    messages when calling on a package without /lib/systemd/system/.
    (Closes: #822710)
  * invoke-rc.d: Fix check for failing "runlevel" command.
  * invoke-rc.d: Under systemd, "runlevel" only switches to 3 or 5 when
    multi-user.target/graphical.target have been reached, not before.
    Adjust the runlevel check accordingly. This is only relevant for the check
    for wrong/dangling rcN.d/ symlinks, so just pin it to "5" (the precise
    value does not matter much). Fixing this in systemd requires bigger
    architectural changes, so use this tiny (and backportable) bandaid for the
    time being. (LP: #1575572, see #608456)

 -- Martin Pitt <email address hidden> Thu, 05 May 2016 22:35:39 -0500
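The changelog describes the bandaid only in prose. A minimal sketch of the idea, assuming a hypothetical `effective_runlevel` function that receives the raw `runlevel` output and whether systemd is the running init. That the pin applies only when runlevel reports "unknown" is an assumption, and the actual patch (commit 161b76221 linked above) may differ in detail:

```shell
#!/bin/sh
# Hypothetical sketch of the bandaid described in the changelog entry.
# $1: raw output of 'runlevel'; $2: "yes" if systemd is the running init.
# Assumption: the pin applies only when runlevel reports "unknown".
effective_runlevel() {
    rl=$(printf '%s\n' "$1" | sed 's/.* //')
    if [ "$2" = yes ] && [ "$rl" = unknown ]; then
        # Before multi-user.target/graphical.target is reached, systemd
        # reports "unknown"; per the changelog the precise value does not
        # matter much for the rcN.d/ symlink check, so use 5.
        rl=5
    fi
    echo "$rl"
}

effective_runlevel unknown yes   # prints: 5
effective_runlevel "N 2" no      # prints: 2
effective_runlevel unknown no    # prints: unknown (sysvinit rcS case, unchanged)

# One common way to detect systemd as init is the check sd_booted(3)
# documents: the directory /run/systemd/system exists.
if [ -d /run/systemd/system ]; then on_systemd=yes; else on_systemd=no; fi
echo "systemd on this host: $on_systemd"
```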

Changed in init-system-helpers (Ubuntu):
status: Fix Committed → Fix Released
Martin Pitt (pitti) on 2016-05-07
Changed in systemd (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
Martin Pitt (pitti) on 2016-05-07
Changed in init-system-helpers (Ubuntu Xenial):
status: Triaged → In Progress
description: updated
Robie Basak (racb) wrote :

Bug 1577596 is diagnosed as having a similar "unknown" runlevel invoke-rc.d problem. Is that a dupe of this bug?

Hello Dan, or anyone else affected,

Accepted init-system-helpers into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/init-system-helpers/1.29ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in init-system-helpers (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed

The fix seems to work for me. Here's what I did to check:

I used a VM (xenial_test) booted from a pristine copy of the image at <https://uec-images.ubuntu.com/xenial/20160528/xenial-server-cloudimg-amd64-disk1.img> in a local OpenStack environment to verify the fix. I modified the VM to enable -proposed, and upgraded init-system-helpers to the -proposed package:

sudo apt-get install init-system-helpers=1.29ubuntu2

I then shut down the VM and generated a new image from it:

nova image-create xenial_test lp1575572-test-img

Finally, I tested with both Cloud Config data and User-Data script syntax:

1) Cloud Config

$ cat cloud_conf.yaml
#cloud-config
packages:
- apache2
runcmd:
- "nc -k -l -d 443&"

$ nova boot --key-name $KEY --flavor m1.small --image lp1575572-test-img --user-data cloud_conf.yaml lp1575572-test-vm

SSH-ing to the instance:

$ sudo systemctl status apache2
● apache2.service - LSB: Apache2 web server
   Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
  Drop-In: /lib/systemd/system/apache2.service.d
           └─apache2-systemd.conf
   Active: active (running) since Tue 2016-05-31 07:23:45 UTC; 40s ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 55
   Memory: 6.5M
      CPU: 87ms
   CGroup: /system.slice/apache2.service
           ├─2321 /usr/sbin/apache2 -k start
           ├─2324 /usr/sbin/apache2 -k start
           └─2325 /usr/sbin/apache2 -k start

May 31 07:23:44 lp1575572-test-vm systemd[1]: Starting LSB: Apache2 web server...
May 31 07:23:44 lp1575572-test-vm apache2[2297]: * Starting Apache httpd web server apache2
May 31 07:23:45 lp1575572-test-vm apache2[2297]: *
May 31 07:23:45 lp1575572-test-vm systemd[1]: Started LSB: Apache2 web server.

# ==> pass

2) User Data

$ cat user-data.sh
#!/bin/sh

output_runlevel() {
echo -n "Current runlevel is "
sudo /sbin/runlevel
}
output_runlevel
sudo DEBIAN_FRONTEND=noninteractive apt-get -q -y update
#sudo sed -ie 's/set +e$/& -x/' /usr/sbin/invoke-rc.d
output_runlevel
sudo DEBIAN_FRONTEND=noninteractive apt-get install -q -y apache2
output_runlevel
sudo nc -k -l -d 443&

$ nova boot --key-name $KEY --flavor m1.small --image lp1575572-test-img --user-data user-data.sh lp1575572-test-vm

SSH-ing to the instance:

$ sudo systemctl status apache2
● apache2.service - LSB: Apache2 web server
   Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
  Drop-In: /lib/systemd/system/apache2.service.d
           └─apache2-systemd.conf
   Active: active (running) since Tue 2016-05-31 07:29:40 UTC; 1min 20s ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 55
   Memory: 6.6M
      CPU: 119ms
   CGroup: /system.slice/apache2.service
           ├─2332 /usr/sbin/apache2 -k start
           ├─2335 /usr/sbin/apache2 -k start
           └─2336 /usr/sbin/apache2 -k start

May 31 07:29:38 lp1575572-test-vm systemd[1]: Starting LSB: Apache2 web server...
May 31 07:29:38 lp1575572-test-vm apache2[2308]: * Starting Apache httpd web server apache2
May 31 07:29:40 lp1575572-test-vm apache2[2308]: *
May 31 07:29:40 lp1575572-test-vm systemd[1]: Started LSB: Apache2 web server.

# ==> pass

Martin Pitt (pitti) wrote :

Thanks Dominique for the careful testing!

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package init-system-helpers - 1.29ubuntu2

---------------
init-system-helpers (1.29ubuntu2) xenial; urgency=medium

  * invoke-rc.d: Under systemd, "runlevel" only switches to 3 or 5 when
    multi-user.target/graphical.target have been reached, not before.
    Adjust the runlevel check accordingly. This is only relevant for the check
    for wrong/dangling rcN.d/ symlinks, so just pin it to "5" (the precise
    value does not matter much). Fixing this in systemd requires bigger
    architectural changes, so use this tiny (and backportable) bandaid for the
    time being. (LP: #1575572)

 -- Martin Pitt <email address hidden> Fri, 06 May 2016 20:51:40 -0500

Changed in init-system-helpers (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for init-system-helpers has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in init-system-helpers (Debian):
status: Unknown → Fix Released
Martin Pitt (pitti) on 2016-07-14
no longer affects: systemd (Ubuntu)
no longer affects: systemd (Ubuntu Xenial)
Changed in systemd (Debian):
status: Confirmed → Fix Released