All containers restart after docker.service has been restarted

Bug #2065168 reported by Victor Chembaev
This bug affects 4 people
Affects: kolla-ansible
Status: In Progress
Importance: Undecided
Assigned to: Unassigned

Bug Description

Starting with the 2023.1 (Antelope) release, which introduced the "Add systemd container control" feature, all containers restart whenever the docker.service systemd service restarts, ignoring the "live-restore": true option. The cause is this statement in the container unit files:

[Unit]
....
Requires=docker.service
...

We should remove this statement and keep only "After=docker.service"; that should be enough.
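
For context, the systemd.unit documentation states that a unit with Requires= on another unit is stopped (or restarted) when that other unit is explicitly stopped (or restarted), whereas After= only controls ordering. A minimal Python sketch for checking whether a unit file carries the hard dependency is shown here. The function names are hypothetical, and the sample text in the usage note is illustrative rather than the actual kolla-ansible unit template.

```python
def unit_dependencies(unit_text):
    """Collect Requires= and After= values from the [Unit] section of a unit file."""
    deps = {"Requires": [], "After": []}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        # Track which section we are in, e.g. [Unit] or [Service].
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in deps:
            # A directive may list several space-separated units.
            deps[key].extend(value.split())
    return deps


def restarts_with_docker(unit_text):
    """True if the unit has a hard Requires= dependency on docker.service."""
    return "docker.service" in unit_dependencies(unit_text)["Requires"]
```

Called on a unit text containing both Requires=docker.service and After=docker.service, restarts_with_docker returns True; with only After=docker.service it returns False.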

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)
Changed in kolla-ansible:
status: New → In Progress
Sven Kieske (s-kieske) wrote :

It's not clear to me from the description what the actual bug is supposed to be here. Could you elaborate on what you mean by containers restarting with the docker.service systemd service and ignoring "live-restore"?

Do you mean containers are not restarted when the docker.service itself is restarted?

For reference, here is the documentation on what "Requires=" actually does:

https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#Requires=

Thanks!

Victor Chembaev (chembervint) wrote :

Hi, Sven

The containers are restarted when docker.service itself is restarted, even if the live-restore option is set to true in daemon.json. This happens because of the Requires=docker.service statement in the systemd unit files of the kolla services.

So this is a critical bug for production: any restart of docker.service will restart the whole OpenStack deployment.

Maksim Malchuk (mmalchuk) wrote :

I think Sven means that this is not related to the recently added systemd container control. Even in the old/standard configuration, restarting docker.service would restart all containers. The live-restore option configures what to do with the containers after a service restart: when live-restore is enabled, all the containers are started again.

Victor Chembaev (chembervint) wrote :

Actually, not exactly - "You can configure the daemon so that containers remain running if the daemon becomes unavailable. This functionality is called live restore" (https://docs.docker.com/config/containers/live-restore/)

In the old/standard configuration (without systemd units for kolla containers), when I configured "live-restore": true in docker/daemon.json, all containers remained up and running while docker.service was restarting. Now, because of the "Requires=docker.service" statement in the unit files, all the container systemd services are restarted together with docker.service.

So now I can't control this behaviour. This is the subject of the bug, and it is really important for production deployments.
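
For reference, the setting under discussion is a top-level boolean key in the Docker daemon configuration file. A small sketch, assuming a hypothetical helper name, that reports whether a given daemon.json text enables it:

```python
import json


def live_restore_enabled(daemon_json_text):
    """Return True only if the config text sets "live-restore" to true."""
    config = json.loads(daemon_json_text)
    # A missing key means the option is not enabled.
    return config.get("live-restore") is True
```

On a host this could be fed the contents of the daemon configuration file, which by default lives at /etc/docker/daemon.json on Linux.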

Maksim Malchuk (mmalchuk) wrote :

Actually, "daemon becomes unavailable" != "daemon restarted by service" ;)

A quote from the link you provided: "Restart the Docker daemon. On Linux, you can avoid a restart (and avoid any downtime for your containers) by reloading the Docker daemon. If you use systemd, then use the command systemctl reload docker. Otherwise, send a SIGHUP signal to the dockerd process."

So this is not the subject of the bug you described.

Maksim Malchuk (mmalchuk) wrote :
Victor Chembaev (chembervint) wrote :

You just proved my point here (https://paste.openstack.org/show/b8D9tp4XqW8jcAOgAnRu/): "Up 3 seconds".
And if we remove "Requires=docker.service" and keep "live-restore", we will be able to restart docker.service without affecting the containers.

Maksim Malchuk (mmalchuk) wrote :

Not proved: as you can see, these are not Kolla containers and there are no systemd units for the containers.

Victor Chembaev (chembervint) wrote :

Do you have "live-restore": true in your docker/daemon.json on this host?

Maksim Malchuk (mmalchuk) wrote :

Sure. Did you read lines 6 and 7 of https://paste.openstack.org/show/b8D9tp4XqW8jcAOgAnRu/ ?

Victor Chembaev (chembervint) wrote :

https://paste.openstack.org/show/824069/

Here is an example.

Before 2023.1, on Zed and earlier deployments without systemd units for kolla containers, we used "live-restore": true, and it worked fine for us.

Maksim Malchuk (mmalchuk) wrote :

Let's rewind a little and describe this more precisely:

1. Without the systemd container control (https://review.opendev.org/c/openstack/kolla-ansible/+/816724) added in stable/2023.1, all containers are restarted by the command 'systemctl restart docker.service' even with ("live-restore": true) added to docker/daemon.json.

2. the documentation (https://docs.docker.com/config/containers/live-restore/) says:

2.1. "You can configure the daemon so that containers remain running if the daemon becomes unavailable."

and also says:

2.2. "Restart the Docker daemon. On Linux, you can avoid a restart (and avoid any downtime for your containers) by reloading the Docker daemon. If you use systemd, then use the command systemctl reload docker. Otherwise, send a SIGHUP signal to the dockerd process."

Please read 2.2 carefully. To safely restart the Docker daemon you should use 'systemctl reload', not 'systemctl restart', which will restart all your containers. But in Kolla-Ansible with systemd container control [1] this behaviour has changed, so don't quote the Docker documentation.

The behaviour of the container restart is now controlled by 'restart_policy' and 'docker_restart_policy'.
So maybe you should check your current deployment?

Victor Chembaev (chembervint) wrote :

1. No, they stayed alive while docker.service was restarted.

2. The documentation says that you have to reload the Docker daemon to enable the live-restore function after you put it into the daemon.json config. After that you can restart Docker any way you like and all the containers will stay alive and well.

Maksim Malchuk (mmalchuk) wrote :

Okay, then please tell me why my containers restarted (https://paste.openstack.org/show/b8D9tp4XqW8jcAOgAnRu/)?
This is a Ceph node for Xena, deployed without (https://review.opendev.org/c/openstack/kolla-ansible/+/816724). The docker/daemon.json contains ("live-restore": true).

Victor Chembaev (chembervint) wrote :

https://paste.openstack.org/show/bi07YFwsjfifbgSvt9In/

As we can see, Ceph is also deployed with systemd units, which also include

Requires=docker.service

That is why your Ceph containers were restarted together with docker.service.

Maksim Malchuk (mmalchuk) wrote :

Oh, really, Ceph containers with systemd units are a bad example. Sorry.
Anyway, to reload the Docker daemon so that live-restore is enabled after you put it into the daemon.json config, you should run the 'systemctl daemon-reload' command.

Victor Chembaev (chembervint) wrote :

Hi,

systemctl daemon-reload should be issued after any systemd unit file has been changed.

If you configure the service itself, for example Docker (daemon.json is just a config file for the Docker daemon), you do not have to run systemctl daemon-reload. You only have to reload the service you have just configured, for example with systemctl reload docker.
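
The distinction drawn here can be summarised in a short sketch: unit files are read by the systemd manager, so editing one calls for systemctl daemon-reload, while daemon.json is read by dockerd, so editing it calls for systemctl reload docker. The helper below is hypothetical and only classifies a path; it does not run anything.

```python
def reload_command_for(changed_path):
    """Suggest which reload applies after editing the given file (illustrative only)."""
    if changed_path.endswith((".service", ".socket", ".timer", ".mount", ".target")):
        # Unit files are parsed by systemd itself.
        return ["systemctl", "daemon-reload"]
    if changed_path.endswith("daemon.json"):
        # daemon.json is parsed by dockerd; a reload asks the running daemon to re-read it.
        return ["systemctl", "reload", "docker"]
    # Anything else is outside the scope of this sketch.
    return None
```

The returned list could be passed to subprocess.run by a caller that actually wants to execute it.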

Maksim Malchuk (mmalchuk) wrote :

Let's discuss this on IRC.
