Adding a new compute starts new service containers on controller

Bug #1805410 reported by Cédric Jeanneret on 2018-11-27
Affects: tripleo | Status: Fix Released | Importance: High | Assigned to: Bogdan Dobrelya | Milestone: stein-2

Bug Description

Hello,

Infra:
- 1 undercloud
- 1 controller
- 1 compute , then 2
- tripleo master
- podman as container engine

The initial deploy works as expected, the overcloud is deployed on the two selected machines.
But when I add the second compute, I have the following issue:

If I do not set --skip-deploy-identifier, a new mysql container is started on the controller, and this breaks the whole thing due to locking on the DB (namely, keystone crashed).

Apparently, something doesn't properly detect that the controller is already deployed, or something like that, and we end up with new containers while the original ones are still here, running and happy to live.

Changed in tripleo:
status: Triaged → Incomplete
Cédric Jeanneret (cjeanner) wrote :

Some logs and info (took time to re-run the whole thing)

In the deploy logs:
        "Running container: keystone_bootstrap",
        "$ podman ps -a --filter label=container_name=keystone --filter label=config_id=tripleo_step3 --format {{.Names}}",
        "keystone-vmn4c9po",
        "keystone",
        "$ podman exec --user=root keystone-vmn4c9po /usr/bin/bootstrap_host_exec keystone keystone-manage bootstrap --bootstrap-password vFViRs1DFjwJrmDz4I1ltcpgZ",
        "cannot exec into container that is not running",
        "Error running ['podman', 'exec', '--user=root', u'keystone-vmn4c9po', '/usr/bin/bootstrap_host_exec', 'keystone', 'keystone-manage', 'bootstrap', '--bootstrap-password', 'vFViRs1DFjwJrmDz4I1ltcpgZ']. [125]",
        "stderr: cannot exec into container that is not running",

On controller-0, we can see those NEW containers:
d64303343f36 docker.io/tripleomaster/centos-binary-keystone:current-tripleo /bin/bash -c /usr... 5 minutes ago Up 5 minutes ago keystone_cron-8p9o9k0h
b674af359a06 docker.io/tripleomaster/centos-binary-mariadb:current-tripleo kolla_start 8 minutes ago Up About a minute ago mysql-kw4r3w5x
069b6ca20a83 docker.io/tripleomaster/centos-binary-haproxy:current-tripleo kolla_start 10 minutes ago Up 10 minutes ago haproxy-nse1lud1
c2f11cba5a32 docker.io/tripleomaster/centos-binary-keepalived:current-tripleo /usr/local/bin/ko... 10 minutes ago Up 10 minutes ago keepalived-qapg6ssa

While the older ones are still running, and apparently in good shape:
e59cdeacdce9 docker.io/tripleomaster/centos-binary-keystone:current-tripleo /bin/bash -c /usr... About an hour ago Up About an hour ago keystone_cron
25d9bb139ed4 docker.io/tripleomaster/centos-binary-mariadb:current-tripleo kolla_start About an hour ago Up About an hour ago mysql
96919bdb2ac2 docker.io/tripleomaster/centos-binary-haproxy:current-tripleo kolla_start About an hour ago Up About an hour ago haproxy
8cca1f0d47cc docker.io/tripleomaster/centos-binary-keepalived:current-tripleo /usr/local/bin/ko... About an hour ago Up About an hour ago keepalived

I'm wondering if I4386b155a4bdba430dc350914db7a6b6fdf92ac0[1] could do that kind of thing?

Having multiple mysqld processes hitting the very same DB creates this issue in service log:
2018-11-27 15:41:37 140499793242304 [ERROR] InnoDB: Unable to lock ./ibdata1, error: 11
2018-11-27 15:41:37 140499793242304 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.

Also, regarding the keystone container, if we `podman logs keystone-vmn4c9po` we can see this:
[Tue Nov 27 15:32:23.781274 2018] [alias:warn] [pid 9] AH00671: The Alias directive in /etc/httpd/conf.d/autoindex.conf at line 21 will probably never match because it overlaps an earlier Alias.
(98)Address already in use: AH00072: make_sock: could not bind to address 192.168.24.8:35357
no listening...


summary: - Adding a new compute starts a new "mysql" container on controller
+ Adding a new compute starts a new service containers on controller
summary: - Adding a new compute starts a new service containers on controller
+ Adding a new compute starts new service containers on controller
Alex Schultz (alex-schultz) wrote :

This happens on redeploys if the existing container was run (but is dead). This seems to be a bug in paunch where the old container is not relaunched or cleaned up, so paunch is creating a new container instance that follows the <containername>-<randomchars> pattern. The latter name comes from the paunch code https://github.com/openstack/paunch/blob/master/paunch/runner.py#L98-L104
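[Editor's note] The `<containername>-<randomchars>` pattern referenced above can be sketched roughly as follows. This is a minimal illustration of the idea, not the actual paunch runner code; the function name and suffix length are hypothetical:

```python
import random
import string


def unique_container_name(base_name, suffix_len=8):
    """Append a short random suffix to a container name, so a new
    container can be started even though one with the plain name
    already exists (e.g. "keystone" -> "keystone-vmn4c9po").
    Illustrative sketch only; details are not copied from paunch."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = ''.join(random.choice(alphabet) for _ in range(suffix_len))
    return '{}-{}'.format(base_name, suffix)
```

The side effect described in this bug follows directly: if the tool falls back to a fresh unique name instead of renaming or removing the existing container, both containers end up running side by side.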

Cédric Jeanneret (cjeanner) wrote :

@Alex: nope, in my case, the "old" container was running as expected. Proof: the keystone port was already occupied, preventing the keystone-<blah> container from starting and using that very same port.

Cédric Jeanneret (cjeanner) wrote :

Another thing: I just deployed an undercloud, then ran an `openstack undercloud upgrade`, and I end up with the following situation:
sudo podman ps -a | grep keystone
58d0db49a66b docker.io/tripleomaster/centos-binary-keystone:618d3ab83cd319e03fac86c1d6de510ef4a5134b_be9e0d5c /bin/bash -c /usr... 13 minutes ago Up 13 minutes ago keystone_cron-8tprne4k
49cccd9d16b2 docker.io/tripleomaster/centos-binary-keystone:618d3ab83cd319e03fac86c1d6de510ef4a5134b_be9e0d5c kolla_start 13 minutes ago Exited (1) 12 minutes ago keystone-s99dch8g
17382f319bb2 docker.io/tripleomaster/centos-binary-keystone:618d3ab83cd319e03fac86c1d6de510ef4a5134b_be9e0d5c /usr/bin/bootstra... 13 minutes ago Exited (0) 13 minutes ago keystone_db_sync-srgb7fip
6ee00f10c833 docker.io/tripleomaster/centos-binary-keystone:618d3ab83cd319e03fac86c1d6de510ef4a5134b_be9e0d5c /bin/bash -c chow... 16 minutes ago Exited (0) 16 minutes ago keystone_init_log-9wayk053
8b075f747698 docker.io/tripleomaster/centos-binary-keystone:618d3ab83cd319e03fac86c1d6de510ef4a5134b_be9e0d5c /bin/bash -c /usr... About an hour ago Up About an hour ago keystone_cron
c19a540e2a3e docker.io/tripleomaster/centos-binary-keystone:618d3ab83cd319e03fac86c1d6de510ef4a5134b_be9e0d5c kolla_start About an hour ago Up About an hour ago keystone
925cc11c5d9c docker.io/tripleomaster/centos-binary-keystone:618d3ab83cd319e03fac86c1d6de510ef4a5134b_be9e0d5c /usr/bin/bootstra... About an hour ago Exited (0) About an hour ago keystone_db_sync
50ef09040c85 docker.io/tripleomaster/centos-binary-keystone:618d3ab83cd319e03fac86c1d6de510ef4a5134b_be9e0d5c /bin/bash -c chow... About an hour ago Exited (0) About an hour ago keystone_init_log

In short: I now have 2 keystone_cron containers:
13 minutes ago Up 13 minutes ago keystone_cron-8tprne4k
About an hour ago Up About an hour ago keystone_cron

Also, I have multiple, duplicated containers:
keystone_db_sync vs keystone_db_sync-srgb7fip
keystone_init_log vs keystone_init_log-9wayk053
keystone vs keystone-s99dch8g (exited 1 btw)

So yeah. We have a big, big issue, and idempotency is broken for some reason. I actually see two locations where we get the <random> suffix at the end of the container name:
- the one pointed out by Alex
- the other one in t-h-t "docker-puppet.py".
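[Editor's note] The idempotency failure described above comes down to the apply step not recognizing already-deployed containers, whether they run under their plain config name or under an ephemeral random-suffixed variant. A hedged sketch of the kind of check involved (the function is hypothetical, not the paunch code):

```python
def is_already_applied(existing_names, config_name):
    """Return True if a container for `config_name` already exists,
    either under the plain name ("keystone") or under an ephemeral
    "<name>-<random>" variant ("keystone-vmn4c9po"). An idempotent
    apply step would skip creation in both cases instead of starting
    a duplicate. Illustrative only."""
    for name in existing_names:
        if name == config_name or name.startswith(config_name + '-'):
            return True
    return False
```

Note the underscore vs hyphen distinction matters here: "keystone_cron" is a different service than "keystone", so only the "<name>-<random>" pattern should match.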

Changed in tripleo:
status: Incomplete → Triaged
Changed in tripleo:
importance: Medium → High
Changed in tripleo:
milestone: stein-3 → stein-2
Dan Prince (dan-prince) wrote :

Have there been any recent changes to the deployment identifier code? If so I'd start there.

Emilien Macchi (emilienm) wrote :

probably not https://review.openstack.org/#/c/619759/ - I just reproduced on the undercloud, where the stack isn't updated but recreated every time. I wonder if it's because we need https://review.openstack.org/#/c/614290/. Trying the patch now.

Emilien Macchi (emilienm) wrote :

so with https://review.openstack.org/#/c/614290/ I managed to redeploy without error.

Emilien Macchi (emilienm) wrote :

so when testing with current (and not current-tripleo), it doesn't work, even with https://review.openstack.org/#/c/614290/ - so something really broke lately.

Cédric Jeanneret (cjeanner) wrote :

nope, I doubt 602969 has any side effect... I also tested with the label workaround, but it didn't work.

Moreover, I didn't see that issue while using the docker engine - it's limited to podman only.

Bogdan Dobrelya (bogdando) wrote :

It is likely related to the missing rename_container implementation for podman. I can't think of other docker vs podman differences we have in paunch.

Fix proposed to branch: master
Review: https://review.openstack.org/621607

Changed in tripleo:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: Triaged → In Progress
Emilien Macchi (emilienm) wrote :

I couldn't reproduce the bug with https://review.openstack.org/#/c/614290/. Closing it.

Changed in tripleo:
importance: High → Medium
status: In Progress → Fix Released
Bogdan Dobrelya (bogdando) wrote :

@Emilien, the bug is a race condition and may only happen on consecutive executions of 'paunch apply', which attempts to rename containers first. It needs to be executed hundreds of times to really confirm there is no race any more.

Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → nobody
importance: Medium → High

Reviewed: https://review.openstack.org/621607
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=510f0913539e92d2e874ae97efe0606fd277ad4b
Submitter: Zuul
Branch: master

commit 510f0913539e92d2e874ae97efe0606fd277ad4b
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Dec 3 16:23:32 2018 +0100

    Implement podman rename via re-apply of containers

    To work around the missing container rename feature of podman, implement
    renaming by removing the original container and re-applying it from
    the same configs but using the new name.

    This fixes idempotency issues when service containers are executed
    under ephemeral names created via paunch's unique container name
    generator, while they are expected to be executed under their wanted
    config names.

    Change-Id: If851604d25b6c7982d950bb9e13dceada3bfc161
    Closes-Bug: #1805410
    Signed-off-by: Bogdan Dobrelya <email address hidden>
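[Editor's note] The rename-via-re-apply workaround in the commit above can be sketched as a podman command sequence. This helper and its argument handling are illustrative assumptions, not the actual paunch implementation:

```python
def podman_rename_commands(old_name, new_name, image, run_args=None):
    """Emulate a container rename for podman versions lacking one:
    stop and remove the container running under the ephemeral name,
    then re-create it from the same config under the wanted name.
    Returns the command lists rather than executing them, so the
    sequence can be inspected. Illustrative sketch only."""
    run_args = run_args or []
    return [
        ['podman', 'stop', old_name],
        ['podman', 'rm', old_name],
        ['podman', 'run', '--detach', '--name', new_name] + run_args + [image],
    ]
```

Because the container is removed and re-created rather than renamed in place, any state not kept in volumes is lost, which is why the fix re-applies the container from the same configs.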

Changed in tripleo:
assignee: nobody → Bogdan Dobrelya (bogdando)

This issue was fixed in the openstack/paunch 4.3.0 release.
