Reduce kolla containers image size by moving off puppet config only bits and its dependencies we override for tripleo

Bug #1804822 reported by Bogdan Dobrelya on 2018-11-23
Affects: tripleo
Importance: Medium
Assigned to: Unassigned

Bug Description

A separate bug has been created for the systemd dependencies: https://bugs.launchpad.net/tripleo/+bug/1818484

Currently, we include puppet-tripleo (which pulls in puppet, which in turn adds systemd, ruby and more) into the base container image. This affects [0] the size of all containers for all services, adds more packages to track for CVE handling, and widens the potential attack surface. A typical deployment uses ~101 images out of 146 total. For edge scenarios, where there are potentially (tens of) thousands of nodes distributed over high-latency, limited-bandwidth WAN networks, that poses a real problem.

The proposed solution is to create a side-car container and consume volumes from it when configuring containerized services via the puppet* deployment steps (docker-puppet.py). Note that we cannot simply use a single config image containing all the puppet bits for every container configured via puppet, because there are service-specific config actions, such as calling cinder-manage from puppet. Containers need not keep those puppet packages for the later deployment steps, including the runtime/operational stages. Nor should any containerized service require systemd, which drags in a long tail of dependencies that are useless in containers, for ~190MB in total.
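As a rough, hypothetical sketch of that side-car pattern (the image names, container name and manifest path below are invented for illustration; this is not the actual docker-puppet.py change), the commands are only assembled and printed here:

```shell
# Hypothetical sketch only: all names below are invented for illustration.
CONFIG_IMAGE=tripleo/puppet-config-sidecar   # hypothetical image carrying the puppet bits
SERVICE_IMAGE=kolla/centos-binary-keystone   # service image built without puppet

# 1) create (but never start) a side-car container that exposes the puppet tree
SIDECAR_CREATE="docker create --name puppet-src ${CONFIG_IMAGE} /bin/true"

# 2) run the config step in the service container, mounting the side-car's
#    volumes so the puppet bits appear without being baked into the image
CONFIG_RUN="docker run --rm --volumes-from puppet-src ${SERVICE_IMAGE} \
  puppet apply /etc/puppet/manifests/keystone.pp"

echo "$SIDECAR_CREATE"
echo "$CONFIG_RUN"
```

With this pattern the puppet/ruby/systemd packages would live only in the one config image, while every service image stays lean at runtime.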

So we can save approximately 16MB + 61MB + 190MB in the base layer of the container images (verified with the commands below):

$ repoquery -R --resolve puppet-tripleo | xargs -n1 -I{} bash -c "rpm -qi {}" 2>&1 | awk '/Size/ {print $NF}' | paste -sd+ - | bc
16610038

$ repoquery -R --resolve puppet | xargs -n1 -I{} bash -c "rpm -qi {}" 2>&1 | awk '/Size/ {print $NF}' | paste -sd+ - | bc
61145246

$ repoquery -R --resolve systemd | xargs -n1 -I{} bash -c "rpm -qi {}" 2>&1 | awk '/Size/ {print $NF}' | paste -sd+ - | bc
170945969

We do not want to maintain CVE fixes for those extra components, as they have nothing to do with containerized OpenStack services, nor do we need them lying around in containers as dead weight.

With these numbers, that is an extra ~270MB per remote edge compute host. For 5000 distributed computes deployed over edge WAN connections, avoiding it saves 270MB * 5000 ≈ 1.35TB of traffic (and the time it takes to transfer those bytes from the control plane to remote edge sites and/or the local registries sitting there).
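The arithmetic behind that figure can be checked directly in plain POSIX shell:

```shell
# Back-of-the-envelope check of the savings claimed above.
PER_HOST_MB=270            # ~16MB puppet-tripleo + ~61MB puppet + ~190MB systemd deps
HOSTS=5000
TOTAL_MB=$((PER_HOST_MB * HOSTS))
TOTAL_GB=$((TOTAL_MB / 1000))
echo "${TOTAL_MB} MB = ${TOTAL_GB} GB"   # 1350000 MB = 1350 GB, i.e. ~1.3TB
```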

Note that some other components' packages, like openstack-heat*, still require systemd; that should be fixed as well so that it is not added back in the upper layers sitting on top of the base one.

[0] http://lists.openstack.org/pipermail/openstack-dev/2018-October/136191.html
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-November/136272.html

Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → stein-2
tags: added: containers queens-backport-potential rocky-backport-potential tech-debt
summary: - Reduce kolla containers image size by moving off puppet bits
+ Reduce kolla containers image size by moving off puppet bits we override
+ for tripleo
description: updated

Fix proposed to branch: master
Review: https://review.openstack.org/620061

Changed in tripleo:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: Triaged → In Progress
description: updated

A couple of points which are relevant that I think you are missing:

1) The network numbers you list above assume you are downloading all the files over the external network. This is very inefficient and isn't what we recommend today. Optimizing container layers for all the cases (storage, and logical deployment of services, which files go where etc) is very difficult. Rather than guess at things why not simply use a container registry to eliminate the network bandwidth and localize the traffic. In short you should only download a container into each remote edge site once. This is perhaps the primary problem you are dealing with here?
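A hypothetical sketch of that per-site localization (the hostnames below are invented; registry:2's pull-through proxy mode via REGISTRY_PROXY_REMOTEURL and docker's "registry-mirrors" daemon option are the real features referred to; the commands are only printed, not run):

```shell
# Hypothetical sketch: hostnames are invented for illustration.
DAEMON_JSON='{ "registry-mirrors": ["http://registry-cache.edge.local:5000"] }'
printf '%s\n' "$DAEMON_JSON"   # would go into /etc/docker/daemon.json on each edge host

# the pull-through cache itself, started once per edge site:
START_CACHE='docker run -d -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry.central.example.com \
  registry:2'
printf '%s\n' "$START_CACHE"
```

With such a cache in place, each layer crosses the WAN once per site rather than once per host.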

2) In order for our puppet modules to work correctly they have to be executed in a container where the packages exist for the service being configured. Typically this means that keystone config files can only be generated inside of the openstack-keystone container.

3) The way the container images are currently layered you only pull down Puppet once in the base layer.

4) Puppet-TripleO is a packaging problem, but I don't think your solution is a good end game. Rather than use a "side" container as you suggest here, I would rather see us invest the effort in a slight packaging modification to puppet-tripleo so that it can optionally be installed with only the profiles themselves. Then we could layer our service containers so that each one contains only its relevant puppet manifests. For example, puppet-keystone would exist only in the openstack-keystone container. NOTE: this will actually duplicate some of the puppet modules and thus require more space, but I think it is the logically correct way to package containers for deployment, which outweighs the minor disk space usage a couple of duplicated puppet modules would cause.

Dan Prince (dan-prince) wrote:

And if you don't want container registries in the edge sites, perhaps a reverse caching proxy layer at each site would eliminate the extra bandwidth. These can be tuned according to the available space and don't require much, if any, cleanup.

tags: added: edge

On 11/26/18 8:09 PM, Dan Prince wrote:
> A couple of points which are relevant that I think you are missing:
>
> 1) The network numbers you list above assume you are downloading all the
> files over the external network. This is very inefficient and isn't what
> we recommend today. Optimizing container layers for all the cases
> (storage, and logical deployment of services, which files go where etc)
> is very difficult. Rather than guess at things why not simply use a

It is very difficult indeed. But now it's time to address that tech debt
for the bright future of Edge :)

> container registry to eliminate the network bandwidth and localize the
> traffic. In short you should only download a container into each remote
> edge site once. This is perhaps the primary problem you are dealing with
> here?

Yes, it is. And "only" downloading the base container layer into each
remote edge site once still poses a problem because of the extra layer
size, when we'll have 30,000-40,000 (and that's for real) remote edge
sites as distributed computes! At some point, though, I started thinking
that patching security CVEs in unrelated included packages, like puppet,
which also brings in ruby, systemd and more, is the more important issue
to address. And it is not limited to edge cases.

>
> 2) In order for our puppet modules to work correctly they have to be
> executed in a container where the packages exist for the service being
> configured. Typically this means that keystone config files can only be
> generated inside of the openstack-keystone container.

Ack; that is proposed to be covered by using --volumes-from with a
side-car source container.

>
> 3) The way the container images are currently layered you only pull down
> Puppet once in the base layer.
>
> 4) Puppet-TripleO is a packaging problem but your solution isn't a good
> end game I think. Rather than do what you are suggesting here with a
> "side" container I would rather see us invest the effort in a slight
> packaging modification to puppet-tripleo such that it can optionally be
> installed with only the profiles themselves. Then we could layer our
> service containers so that each one contains only its relevant puppet
> manifests. For example puppet-keystone would only exist in the
> openstack-keystone container. NOTE: this will actually duplicate some of
> the puppet modules and thus require more space but is the correct
> logical way to package containers I think for deployment and thus
> outweighs the minor disk space usage a couple of puppet modules would cause.

I agree we should decouple puppet from puppet-tripleo, though I do not
yet have a full picture of all the docker-puppet.py et al changes
required to stop including puppet, with its systemd/ruby world, in the
base layers of containers. I cannot think of other ways than still using
--volumes-from with a side-car...

>

--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/620081

Additionally, puppet and cronie pull in the protected systemd package, which is not needed in containers yet consumes 170MB:

$ repoquery -R --resolve systemd | xargs -n1 -I{} bash -c "rpm -qi {}" 2>&1 | awk '/Size/ {print $NF}' | paste -sd+ - | bc
170945969

summary: - Reduce kolla containers image size by moving off puppet bits we override
- for tripleo
+ Reduce kolla containers image size by moving off puppet/systemd config
+ only dependencies we override for tripleo
description: updated
summary: Reduce kolla containers image size by moving off puppet/systemd config
- only dependencies we override for tripleo
+ only bits and its dependencies we override for tripleo

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/619744
Reason: This goes against the spirit of containers (which should be self contained)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/620310
Reason: This goes against the spirit of containers (which should be self contained)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/620061
Reason: This goes against the spirit of containers (which should be self contained)

Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: In Progress → Opinion
assignee: Bogdan Dobrelya (bogdando) → nobody

Because of _tmpfiles used in spec files, we cannot remove systemd & deps from:

* dnf https://bugzilla.redhat.com/show_bug.cgi?id=1671362
* puppet https://bugzilla.redhat.com/show_bug.cgi?id=1654672
* iscsi-initiator-utils https://bugzilla.redhat.com/show_bug.cgi?id=167137
* and kuryr-kubernetes-distgit in RDO

Something to consider for future attempts to fix/improve that miserable situation: https://github.com/rpm-software-management/dnf/pull/1315#issuecomment-462326283

Bogdan Dobrelya (bogdando) wrote:

For systemd and the kernel, let's perhaps track those in separate bugs.

python-libguestfs from CentOS pulls in the kernel.

I can confirm it is currently being pulled into
kolla/centos-binary-nova-compute-ironic, but likely into others as well.

Looking at a recent CI run from kolla, I see the kernel package
installed in the following containers:

nova-libvirt
nova-compute-ironic
nova-compute
sahara-base

RPM dep tree
------------------
openstack-nova-compute -> python-libguestfs -> libguestfs -> kernel
nova-libvirt (container) explicitly pulls in libguestfs -> kernel
python2-sahara -> python-libguestfs -> libguestfs -> kernel
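Since live repoquery output depends on the enabled repos, the chain above can be illustrated with a tiny hardcoded walk (illustration only; the mapping is copied from the dep tree quoted here, not queried live):

```shell
# Illustration only: the dependency map is hardcoded from the tree above.
dep_of() {
  case "$1" in
    openstack-nova-compute) echo python-libguestfs ;;
    python-libguestfs)      echo libguestfs ;;
    libguestfs)             echo kernel ;;
    *)                      echo "" ;;
  esac
}

# walk the chain from a package down to its leaf dependency
chain() {
  pkg=$1
  printf '%s' "$pkg"
  while next=$(dep_of "$pkg"); [ -n "$next" ]; do
    printf ' -> %s' "$next"
    pkg=$next
  done
  echo
}

chain openstack-nova-compute
# openstack-nova-compute -> python-libguestfs -> libguestfs -> kernel
```

On a real host, the equivalent check would be repeated `repoquery --requires --resolve` calls for each package in turn.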

summary: - Reduce kolla containers image size by moving off puppet/systemd config
- only bits and its dependencies we override for tripleo
+ Reduce kolla containers image size by moving off puppet config only bits
+ and its dependencies we override for tripleo
description: updated
Changed in tripleo:
importance: High → Medium