php7.0-fpm_7.0.33-0ubuntu0.16.04.2 update stop fpm service for 10 minutes

Bug #1819033 reported by Cicatrice on 2019-03-07
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
php7.0 (Ubuntu)
Undecided
Karl Stenerud
Xenial
Undecided
Unassigned

Bug Description

[Impact]

Since php7.0 still follows the compat level 9 defaults of stopping the service at the start of an upgrade, and then starting it again at the end of the upgrade, there is potentially a long period of the service being stopped when there are many packages to upgrade, leading to unwanted downtime.

[Test Case]

A simple apt install --reinstall will exhibit the behavior in /var/log/syslog:

Command:

apt install --reinstall -y php7.0-fpm

/var/log/syslog:

Mar 12 13:40:17 tester systemd[1]: Reloading.
Mar 12 13:40:17 tester systemd[1]: Stopping The PHP 7.0 FastCGI Process Manager...
Mar 12 13:40:17 tester systemd[1]: Stopped The PHP 7.0 FastCGI Process Manager.
Mar 12 13:40:17 tester systemd[1]: Reloading.
Mar 12 13:40:18 tester systemd[1]: Reloading.
Mar 12 13:40:19 tester systemd[1]: Reloading.
Mar 12 13:40:19 tester systemd[1]: Reloading.
Mar 12 13:40:19 tester systemd[1]: Starting The PHP 7.0 FastCGI Process Manager...
Mar 12 13:40:19 tester systemd[1]: Started The PHP 7.0 FastCGI Process Manager.

php-fpm is down during the entire update process.

When upgrading to the fixed PPA, it will still shut down at the start due to the old script being run.

Command:

add-apt-repository -y ppa:kstenerud/xenial-php7.0-restart-after-upgrade-1819033 && apt update && apt dist-upgrade -y

/var/log/syslog:

Mar 12 13:41:40 tester systemd[1]: Reloading.
Mar 12 13:41:41 tester systemd[1]: Stopping The PHP 7.0 FastCGI Process Manager...
Mar 12 13:41:41 tester systemd[1]: Stopped The PHP 7.0 FastCGI Process Manager.
Mar 12 13:41:41 tester systemd[1]: Reloading.
Mar 12 13:41:43 tester systemd[1]: Reloading.
Mar 12 13:41:52 tester systemd[1]: Reloading.
Mar 12 13:41:52 tester systemd[1]: Reloading.
Mar 12 13:41:52 tester systemd[1]: Stopped The PHP 7.0 FastCGI Process Manager.
Mar 12 13:41:52 tester systemd[1]: Starting The PHP 7.0 FastCGI Process Manager...
Mar 12 13:41:52 tester systemd[1]: Started The PHP 7.0 FastCGI Process Manager.

The extra "Stopped" at 13:41:52 is due to the new behavior calling restart.

With the new package installed, it no longer stops when beginning an upgrade, and does a restart at the end (which php-fpm accomplishes with a stop and start).

Command:

apt install --reinstall -y php7.0-fpm

/var/log/syslog:

Mar 12 13:43:00 tester systemd[1]: Reloading.
Mar 12 13:43:00 tester systemd[1]: Reloading.
Mar 12 13:43:01 tester systemd[1]: Reloading.
Mar 12 13:43:02 tester systemd[1]: Reloading.
Mar 12 13:43:02 tester systemd[1]: Stopping The PHP 7.0 FastCGI Process Manager...
Mar 12 13:43:02 tester systemd[1]: Stopped The PHP 7.0 FastCGI Process Manager.
Mar 12 13:43:02 tester systemd[1]: Starting The PHP 7.0 FastCGI Process Manager...
Mar 12 13:43:02 tester systemd[1]: Started The PHP 7.0 FastCGI Process Manager.

[Regression Potential]

If the service tries to load a file that has been replaced during upgrade, but before it was restarted, it could cause a failure. However, the service will soon be restarted anyway.

If the restart takes too long it could fail.

[Original Description]

The unattended update has launch the update of php7.0-fpm that trigger the stop of php7.0-fpm service (timestamp match that operation). The service does not restart within 10 minutes.

# systemctl status php7.0-fpm
● php7.0-fpm.service - The PHP 7.0 FastCGI Process Manager
   Loaded: loaded (/lib/systemd/system/php7.0-fpm.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2019-03-07 15:21:00 UTC; 8min ago
 Main PID: 3505 (code=exited, status=0/SUCCESS)
   Status: "Processes active: 0, idle: 32, Requests: 0, slow: 0, Traffic: 0req/sec"

Mar 07 15:14:55 xxx systemd[1]: Starting The PHP 7.0 FastCGI Process Manager...
Mar 07 15:14:55 xxx systemd[1]: Started The PHP 7.0 FastCGI Process Manager.
Mar 07 15:21:00 xxx systemd[1]: Stopping The PHP 7.0 FastCGI Process Manager...
Mar 07 15:21:00 xxx systemd[1]: Stopped The PHP 7.0 FastCGI Process Manager.

# cat /var/log/dpkg.log | grep php | grep fpm
[...]
2019-03-07 15:20:58 status triggers-pending php7.0-fpm:amd64 7.0.30-0ubuntu0.16.04.1
2019-03-07 15:21:00 upgrade php7.0-fpm:amd64 7.0.30-0ubuntu0.16.04.1 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:21:00 status half-configured php7.0-fpm:amd64 7.0.30-0ubuntu0.16.04.1
2019-03-07 15:21:00 status unpacked php7.0-fpm:amd64 7.0.30-0ubuntu0.16.04.1
2019-03-07 15:21:00 status half-installed php7.0-fpm:amd64 7.0.30-0ubuntu0.16.04.1
2019-03-07 15:21:00 status half-installed php7.0-fpm:amd64 7.0.30-0ubuntu0.16.04.1
2019-03-07 15:21:00 status unpacked php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:21:00 status unpacked php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:31 configure php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2 <none>
2019-03-07 15:31:31 status unpacked php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:31 status unpacked php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:31 status unpacked php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:31 status unpacked php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:31 status unpacked php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:31 status unpacked php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:31 status unpacked php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:31 status half-configured php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:32 status installed php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2
2019-03-07 15:31:32 status triggers-pending php7.0-fpm:amd64 7.0.33-0ubuntu0.16.04.2

$ lsb_release -rd
Description: Ubuntu 16.04.4 LTS
Release: 16.04

$ apt-cache policy php7.0-fpm
php7.0-fpm:
  Installed: 7.0.33-0ubuntu0.16.04.2
  Candidate: 7.0.33-0ubuntu0.16.04.2
  Version table:
 *** 7.0.33-0ubuntu0.16.04.2 500
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages
        100 /var/lib/dpkg/status
     7.0.4-7ubuntu2 500
        500 http://us-east-1.ec2.archive.ubuntu.com/ubuntu xenial/universe amd64 Packages

Related branches

Odd, in a quick repro I haven't seen any delay.

There are prerm/postrm/postinst steps in this package.
In addition as we have seen with other packages it has sysV and systemd units which sometimes can cause double actions in maintainer scripts.

Prerm of the old version will stop it (sysV)
  invoke-rc.d php7.0-fpm stop
Postrm will do nothind on upgrade
Postinst of the new version will then start it
  invoke-rc.d php7.0-fpm start

I checked - even 7.2.15 in Disco fixed it as it correctly adds
  dh_installinit --restart-after-upgrade
  dh_systemd_start --restart-after-upgrade
Which will make the stop in prerm only happen if you are removing the package.

Without that if the upgrades take long (for whatever reason overload/slow-disk/...) the described issue can happen. That was the main point when introducing --restart-after-upgrade.
Although there has to be taken extra care as the behaviro of dh_installinit and co changed in different compat levels.

That said - the fix seems clear, lets check when it was added [1].

Hmm that is pre-xenial for sure 7.0.6-13~2 to be specific.
And latter 7.1 / 7.2 hold it as well.

What happened to our 7.0?
E.g. the version we had in Artful had this for sure
+php7.0 (7.0.6-12) unstable; urgency=medium
+
+ * Enable --restart-after-upgrade for both dh_installinit and
+ dh_systemd_start to minimize downtimes

Xenial was originally on 7.0.4-7ubuntu2 and since then got only security updates.
That means the fix from 7.0.6-12 isn't in there.

IMHO One would need to pick and package [2] for Xenial.

[1]: https://salsa.debian.org/php-team/php/commit/8e34b064ccdcb9736668dec2e184a577a202880f
[2]: https://git.launchpad.net/ubuntu/+source/php7.0/diff/debian/rules?h=import/7.0.6-12&id=ee74f97a368a45eb8254ae9a26cb001732b85b6f

Changed in php7.0 (Ubuntu):
status: New → Triaged
assignee: nobody → Karl Stenerud (kstenerud)
tags: added: server-next
Cicatrice (cicatrice) wrote :

Thanks you very much for your analysis.

To give a bit more context, we are using this package on several AWS instances, and we got them randomly stop working all suddenly. As killing the instances and waiting on Auto-scaling group to spawn new one (with the same base AMI and php-fpm at the same original version) didn't help, I immediately suspected security updates.

When I track the logs, I go to that conclusion. This "bug" is also affecting another environment and we are thinking about a workaround (i.e. rebuilding base AMI with up to date php-fpm inside).

Many thanks again !

description: updated
description: updated
Robie Basak (racb) wrote :

Thank you for working on this, figuring out the root cause and proposing the fix for 16.04.

I'd like to question though whether an SRU is really appropriate in this case. It seems to me that this kind of change risks unintended consequences more so than usual, that the number of users affected is small and a better general solution is possible.

I wondered how it is that a package upgrade takes ten minutes, and realised that based on Cicatrice's description, this is only happening because of the use of a "baked" image that is missing the latest updates, doesn't install them before entering production, and does get them all during the first unattended-upgrades run which will take place the first night after deployment.

This seems like a poor approach to me, exactly because the instance is being put into production in a way that is set to immediately change overnight, and additionally because the instance is being put into production with outstanding security updates. Better would be to ask cloud-init to automatically upgrade on first boot, perhaps limiting it to security updates only if that is required. I think that would be a more sensible way of doing it regardless of this particular issue, and one consequence of doing this would be that this issue would then be moot, because unattended-upgrades would be expected to run much faster when it doesn't ever have more than a day of security updates catchup to perform.

Apart from this very specific use case, I'm not sure that this issue is likely to impact any others. Given my solution above appears to be better for everyone, I'm soft-declining this SRU, pending further discussion, to minimise regression risk to existing unaffected users. If you disagree, please do discuss that here, and we can reconsider. Otherwise, I'll mark this "Won't Fix" for php7.0 and reject from the queue when I next see this again after a couple of weeks.

Separately, I think this scenario is a trap for those deploying like Cicatrice did. Ubuntu images should do the right thing by default. I think cloud-init might be the right place to fix this, and filed bug 1827204 to track the "it should do the right thing by default" problem.

Robie Basak (racb) wrote :

> Otherwise, I'll mark this "Won't Fix" for php7.0 and reject from the queue when I next see this again after a couple of weeks.

As there have been no objections, I'm rejecting this from the queue and marking the SRU "Won't Fix" for the reasons above.

Changed in php7.0 (Ubuntu):
status: Triaged → Fix Released
Changed in php7.0 (Ubuntu Xenial):
status: New → Won't Fix
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers