kolla-ansible

Bug #2065168
Comment #22

Sven Kieske (s-kieske) wrote on 2024-06-27 (last edit on 2024-06-28):

Did anybody read this part of the Docker docs on this topic and somehow conclude it's not a problem?

To quote:

> Impact of live restore on running containers

> If the daemon is down for a long time, running containers may fill up the FIFO log the daemon normally reads. A full log blocks containers from logging more data. The default buffer size is 64K. If the buffers fill, you must restart the Docker daemon to flush them.

https://docs.docker.com/config/containers/live-restore/#impact-of-live-restore-on-running-containers
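To illustrate the failure mode the docs describe, here is a minimal sketch that fills an ordinary OS pipe which nobody reads from. It is only an analogy: it uses a plain `os.pipe()`, not Docker's actual logging FIFO, and the helper name `fill_pipe_until_blocked` is made up for this example. The write end is set to non-blocking so the script stops instead of hanging; a real container process writing with blocking I/O would simply stall at that point.

```python
import os


def fill_pipe_until_blocked(chunk_size: int = 1024) -> int:
    """Write into a pipe that is never read and return the bytes accepted.

    Hypothetical helper for illustration; this is a plain OS pipe,
    not the FIFO Docker uses for container logs.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode makes a full pipe raise BlockingIOError
    # instead of stalling the writer forever.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    written = 0
    try:
        while True:
            written += os.write(write_fd, chunk)
    except BlockingIOError:
        # The pipe is full: without a reader, no further write succeeds.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


if __name__ == "__main__":
    print(f"pipe accepted {fill_pipe_until_blocked()} bytes before blocking")
```

The exact capacity depends on the platform and kernel settings, so the sketch reports whatever the host allows rather than assuming a fixed number.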

And also:

> Live restore allows you to keep containers running across Docker daemon updates, but is only supported when installing patch releases (YY.MM.x), not for major (YY.MM) daemon upgrades.

> If you skip releases during an upgrade, the daemon may not restore its connection to the containers. If the daemon can't restore the connection, it can't manage the running containers and you must stop them manually.

https://docs.docker.com/config/containers/live-restore/#live-restore-during-upgrades
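The rule quoted above could be checked before an upgrade with something like the following sketch. It assumes version strings of the form `YY.MM.x` as described in the docs, without any distribution package prefix or suffix; the function names and the example version numbers are purely illustrative and not part of kolla-ansible.

```python
from typing import Tuple


def release_series(version: str) -> Tuple[int, int]:
    """Return the first two numeric components (YY, MM) of a version string.

    Assumes a plain "YY.MM.x" string; package-style versions with
    epochs or suffixes would need extra parsing.
    """
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_patch_upgrade(current: str, target: str) -> bool:
    """True when both versions belong to the same YY.MM release series."""
    return release_series(current) == release_series(target)


# Illustrative version numbers only.
print(is_patch_upgrade("24.0.7", "24.0.9"))  # True: same series, patch release
print(is_patch_upgrade("24.0.7", "26.1.4"))  # False: crosses release series
```

If such a check returned False, the quoted docs imply that live restore should not be relied upon for that upgrade.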

So did someone check what we do when upgrading Docker across major versions? Are we aware that we now need to manually restart the containers, and do we do this?

Did someone test that the pipes filling up is not an issue for our deployment model?

In my experience, filled-up log pipes in the Docker daemon are followed rather soon by filled-up RAM inside the container, and then by crashes of the containers or complete OOM situations on the host.

I hope someone can confirm that this is not a problem.

# Update with my comment from the Gerrit code review:

I had no idea that anybody was already using this in production! So if you have experience with it, I'm glad it works. It seems it's even enabled in our downstream as well, which I somehow missed (wrong grep, I guess).

So apologies for making a fuss.

Nevertheless, it might cause problems if the Docker daemon is down for an extended period, e.g. when a daemon upgrade goes wrong for users of live-restore and the containers keep running for a long time without being able to move log data through the Docker pipe.

So it would've been nice if somebody had tested that prior to merging it.