Model upgrade terminates daemons spawned by hooks

Bug #1772601 reported by Stuart Bishop
This bug affects 2 people
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

Upgrading a model to 2.3.7 caused an unexpected outage when PostgreSQL was shut down. The charm thankfully recovered when the first hook was run post-upgrade, so the outage was limited to 60 seconds.

It seems that upgrading jujud also sends kill signals to all processes spawned by its subprocesses, such as the PostgreSQL daemon started by running 'pg_ctlcluster 9.5 main start' from a hook.

We think everything in the cgroup gets affected, so even though pg_ctlcluster switches user and group IDs, the daemon is still killed:

# cat /proc/4776/cgroup # The root PostgreSQL pid
11:cpu,cpuacct:/system.slice/jujud-unit-postgresql-1.service
10:pids:/system.slice/jujud-unit-postgresql-1.service
9:devices:/system.slice/jujud-unit-postgresql-1.service
8:perf_event:/
7:memory:/system.slice/jujud-unit-postgresql-1.service
6:freezer:/
5:net_cls,net_prio:/
4:hugetlb:/
3:blkio:/system.slice/jujud-unit-postgresql-1.service
2:cpuset:/
1:name=systemd:/system.slice/jujud-unit-postgresql-1.service

To reproduce, deploy a cs:postgresql unit with an older version of juju and upgrade to 2.3.7. /var/log/postgresql-9.5-main.log will show the shutdown signal being received and the cluster starting up again when the next hook is run.

systemd services seem unaffected, so pgbouncer (started with 'systemctl start pgbouncer') did not get signaled.

Stuart Bishop (stub)
description: updated
Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1772601] Re: Model upgrade terminates daemons spawned by hooks

Is there a reason it is spawned directly by the hook rather than something
like creating a systemd job?

Offhand, I didn't think we used a signal to stop, so I'm not sure why it
would be killing other processes, but I suppose you have evidence to the
contrary.

Revision history for this message
William Grant (wgrant) wrote :

It's likely that systemd notices the agent shutting down and cleans up the whole cgroup.

Revision history for this message
Stuart Bishop (stub) wrote :

On 22 May 2018 at 18:34, John A Meinel <email address hidden> wrote:

> Is there a reason it is spawned directly by the hook rather than something
> like creating a systemd job?

The charm predates systemd, and until recently supported trusty. Yes,
the charm could work around the issue that way, now that we know what
the issue is. I have no idea what other charms are affected, but
grepping for /etc/init.d would be a start.

Revision history for this message
Stuart Bishop (stub) wrote :

(erm, the PostgreSQL charm still supports trusty. Ignore my wishful thinking ;) )

Stuart Bishop (stub)
tags: added: canonical-is
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I think I know what is happening here.

On systemd-booted systems, pg_ctlcluster usually re-execs into systemctl and starts/stops the appropriate postgresql@ instance unit.

The unit itself, however, executes pg_ctlcluster with _SYSTEMCTL_SKIP_REDIRECT set, so that the actual code to spawn and manage the PostgreSQL cluster is used.

However, there is extra logic deciding when to use systemctl and when to bypass it.

Ideally, in this case, the charm should be resilient and should ensure that the systemctl redirection is not skipped, so that the PostgreSQL cluster is spawned as a systemd unit and not as a child of the hook.

E.g. note this excerpt from pg_ctlcluster:
    # if extra options are given, proceed with pg_ctlcluster with a warning
    if (@postgres_auxoptions != 0 or @pg_ctl_opts_from_cli != 0 or $bindir) { # extra options given
        print "Notice: extra pg_ctl/postgres options given, bypassing systemctl for $action operation\n" if (-t 1);
    # if we are root, redirect to systemctl unless stopping a cluster running outside systemd

I would love to check the charm details to see what exactly it invokes. Ideally it should figure out the default cluster, check whether pid 1 is upstart or whether /run/systemd/system exists, and execute the most native command possible, e.g. 'systemctl start postgresql@9.5-main.service' rather than 'pg_ctlcluster 9.5 main start'.
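
For illustration, here is a minimal sketch (a hypothetical helper, not the charm's actual code) of that dispatch, in Python as used by the charm:

    import os
    import subprocess

    def start_cluster(version, cluster='main'):
        # On systemd-booted hosts, manage the cluster via its instance unit,
        # so it runs in its own cgroup rather than under the unit agent.
        if os.path.isdir('/run/systemd/system'):
            subprocess.check_call(
                ['systemctl', 'start',
                 'postgresql@{}-{}.service'.format(version, cluster)])
        else:
            # upstart/sysvinit hosts (e.g. trusty): fall back to pg_ctlcluster.
            subprocess.check_call(['pg_ctlcluster', version, cluster, 'start'])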

Revision history for this message
Joel Sing (jsing) wrote :

The charm is indeed executing pg_ctlcluster, which in turn runs pg_ctl - this means that it is starting and running the PostgreSQL daemons outside of a systemd service. Even though pg_ctl (or something in this chain) takes care to daemonise and detach (running under a separate process group, etc), it is still part of the same cgroup.

The default systemd behaviour is to kill all processes in the cgroup on service stop/restart, which means it takes down any other processes that were started under this cgroup. In the PostgreSQL case this is reproducible by running 'systemctl restart jujud-unit-postgresql-0', which results in the postgres processes also being terminated and restarted.

While there is an argument that this should be fixed by making the charm use systemd semantics (i.e. 'systemctl start postgresql' or 'service postgresql start'), there is also an argument that this could have far-reaching implications for many existing charms.

The easy solution is probably to add KillMode=process to the jujud systemd service configuration:

  https://www.freedesktop.org/software/systemd/man/systemd.kill.html#KillMode=

This results in systemd only terminating the jujud process and not all processes within that cgroup. I've tested and confirmed that this prevents 'systemctl restart jujud-unit-postgresql-0' from restarting the postgres processes.
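
As a sketch, this could be applied as a drop-in for the unit agent's service (file name and unit name here are illustrative; the real unit name varies per Juju unit):

    # /etc/systemd/system/jujud-unit-postgresql-0.service.d/override.conf
    [Service]
    KillMode=process

followed by 'systemctl daemon-reload'. With KillMode=process only the main jujud process receives the stop signal, and the rest of the cgroup is left alone.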

Revision history for this message
Nobuto Murata (nobuto) wrote :

FWIW, another instance that surfaced in the past:
https://launchpad.net/bugs/1664025

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

It depends on whether the intention was to fork a custom daemon within the same unit, or to manage the system-configured PostgreSQL cluster through the package's cluster-management integration (the ctl wrapper that calls systemctl).

No, there is no plain postgresql unit; there are only instance units per version/cluster. Thus one should be invoking pg_ctlcluster in a way that does trigger the systemd unit. If one is using pg_ctlcluster, the intention is to use the system unit infrastructure and e.g. have these clusters start up and recover on reboots. I would have expected a jujud unit to create its own pg cluster and have it running as postgresql@X.Y-$unit.service instances.

I would advise against adding KillMode=process, as this opens the avenue for left-over lingering processes, which may end up blocking shutdown for up to 30 minutes, and means that systemctl restarts would no longer actually kill and restart all of the processes, as they do today.

The pg_ctlcluster integration with systemctl is not at all common to all daemons; it is specific to Debian's packaging of PostgreSQL and its systemd integration on Debian & Ubuntu.

Where is the code for the charm in question?

Revision history for this message
Stuart Bishop (stub) wrote :

This bug is not filed against the PostgreSQL charm, and the Juju upgrade behavior has nothing specifically to do with PostgreSQL. It is a general problem that is so far known to have affected two of the most popular charms and is expected to affect many more.

To avoid derailing this bug report further with PostgreSQL specifics, here is a simple test case not involving PostgreSQL:

juju deploy cs:ubuntu
juju run --unit=ubuntu/0 'sshd -p 2222'
juju run --unit=ubuntu/0 'ps | grep sshd | wc -l' # Returns 2
juju ssh ubuntu/0 -- sudo systemctl restart jujud-unit-ubuntu-0 # Simulate juju upgrade
juju run --unit=ubuntu/0 'ps | grep sshd | wc -l' # Returns 1, daemon killed

Fixing all charms so that they only launch background processes via systemd on systemd-based systems would be impractical, so even if this becomes documented behavior it will not save us from future production outages with our older charms.

Revision history for this message
John A Meinel (jameinel) wrote :

So I'd like to try to summarize what I think Dimitri's thoughts are.
All processes that are running should ultimately be managed as part of some service under init (in this case systemd). If they aren't, then you run into potential problems around shutdown/reboot, as there is no way to properly indicate that the process should stop at that point.
If we defaulted to KillMode=process, then *all* processes that are started from charms would end up decoupled from the init system.
Whereas, *if* a charm wants to start a process whose lifecycle is decoupled from Juju's lifecycle, it can do so by creating a new system service and putting the process under that. Otherwise, it should be coupled to Juju's lifecycle so that the process is coupled to *some* lifecycle.
Put another way, all processes should be coupled to *a* managed cgroup, and charms are responsible if they want to decouple it from Juju's cgroup.
Juju should not default to decoupling, because that just leaves lots of unmanaged processes.
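
As a rough sketch of the "decouple it yourself" option (unit name and binary path are hypothetical), a hook could write its own service file and start it, so the daemon lands in its own cgroup rather than the unit agent's:

    # /etc/systemd/system/my-charm-daemon.service (written by the hook, then
    # 'systemctl daemon-reload && systemctl enable --now my-charm-daemon')
    [Unit]
    Description=Daemon spawned by the charm, decoupled from the unit agent

    [Service]
    ExecStart=/usr/local/bin/my-charm-daemon
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target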

Changed in juju:
importance: Undecided → High
status: New → Incomplete
Revision history for this message
Richard Harding (rharding) wrote :

@xnox the start happens in this Python subprocess call, in the function:

    def start(ignore_failure=False):

https://api.jujucharms.com/charmstore/v5/postgresql/archive/reactive/postgresql/postgresql.py

        subprocess.check_call(['pg_ctlcluster',
                               version(), 'main', 'start',
                               # These extra options cause pg_ctl to wait
                               # for startup to finish, so we don't have to.
                               '--', '-w', '-t', str(STARTUP_TIMEOUT)],
                              universal_newlines=True)

Revision history for this message
Stuart Bishop (stub) wrote :

How do I move this off incomplete?

I think KillMode=process is exactly what we want. We *want* all processes that are started directly by the charm hooks to be decoupled from the init system, and I imagine this is exactly the sort of case the systemd flag exists for and how similar services like sshd or crond need to handle the problem. In what cases is it at all acceptable for a Juju upgrade to trigger termination of a background process spawned by a charm? The charm spawned it, the charm is responsible for it, and if Juju touches it then it is a bug.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@rharding
Thanks for following up on our corner sofa meeting.

pg_ctlcluster, version(), 'main', 'start' alone would do the right thing and divert to the systemd instance unit. The extra options, however, cause this not to happen, even though all of them are active by default (-w -t 60)...

... But the STARTUP_TIMEOUT is increased for this cluster. Instead of upping this timeout as an argument when calling pg_ctlcluster, the charm should probably adjust the /etc/postgresql/version()/main/pg_ctl.conf configuration file for the cluster to raise the timeout there (see the sketch after the list below).

Moving the timeout bump from cmdline args to the conf-file will:

- be compatible across all supported ubuntu releases (probably going back to even pre-precise)

- increase the timeouts applied to any pg_ctlcluster invocation against that postgres cluster, including system boot and shutdown.

- result in a systemd-managed postgres cluster on xenial and up, always.
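
A sketch of that change (the timeout value is illustrative; STARTUP_TIMEOUT is whatever the charm configures, and pg_ctl.conf is the per-cluster file shipped by Debian's postgresql-common):

    # /etc/postgresql/9.5/main/pg_ctl.conf
    # Options passed to pg_ctl on every invocation, including boot and shutdown.
    pg_ctl_options = '-w -t 600'

The charm can then call plain 'pg_ctlcluster 9.5 main start' with no extra arguments, which redirects to the postgresql@9.5-main.service instance unit on systemd hosts.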

Revision history for this message
Stuart Bishop (stub) wrote :

Moving back to 'new' since all requested information has been provided, as well as a minimal reproducible test case.

Changed in juju:
status: Incomplete → New
Changed in juju:
status: New → Triaged
Revision history for this message
Richard Harding (rharding) wrote :

Thanks for the added info @xnox. That's useful to understand.

@stub, I'm trying to think this through. To me the hooks are just scripts that fire off due to events. It's a bit like triggering a lambda function or something. They're fired often and are meant to just respond, get done, and get out of the way. If those scripts do something meaningful/long-term, it should be up to the charm to figure out the best way to run things on the system for the long term. Charms have to figure out whether they're putting in place an upstart script vs a systemd one, how restarting happens during config changes/upgrades, and so on.

If anything, it seems the "correct" fix is to find a way for the hook calls to start, run, and stop, with anything done in them being killed off. Hooks leaving things around simply by being hooks, versus the system being updated to handle it, seems dirty. I'm not sure whether we can do that the way they're actually triggered via the unit agents, though. We'll investigate and see.

Unfortunately, I don't think that will help in this situation: when the hook completes, the pg cluster command would be killed off just as it is now when the agent bounces, except it would happen at the end of the hook run rather than on agent restart.

Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: High → Low
tags: added: expirebugs-bot