undercloud_upgrade container job is failing with error stopping containers
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Critical | Unassigned |
Bug Description
The undercloud container upgrade job fails after pulling the container and trying to stop it.
stderr: Trying to pull docker.
2019-02-01 01:29:26 | "Copying blob b940d0165d05: 0 B / 7.65 MiB ",
2019-02-01 01:29:26 | "Copying blob 2c7cc8e8e73b: 0 B / 900 B ",
2019-02-01 01:29:26 | "Copying blob b940d0165d05: 2.90 MiB / 7.65 MiB ",
2019-02-01 01:29:26 | "Copying blob 2c7cc8e8e73b: 900 B / 900 B 0s",
2019-02-01 01:29:26 | "Copying blob b940d0165d05: 5.91 MiB / 7.65 MiB ",
2019-02-01 01:29:26 | "Copying blob b940d0165d05: 7.65 MiB / 7.65 MiB 0s",
2019-02-01 01:29:26 | "Copying config 6f018604a575: 0 B / 26.64 KiB ",
2019-02-01 01:29:26 | "Copying config 6f018604a575: 26.64 KiB / 26.64 KiB 0s",
2019-02-01 01:29:26 | "Error stopping container: swift_rsync_fix",
2019-02-01 01:29:26 | "stdout: ",
2019-02-01 01:29:26 | "stderr: ",
2019-02-01 01:29:26 | "Error stopping container: mistral_db_sync",
2019-02-01 01:29:26 | "stderr: INFO [alembic.
2019-02-01 01:29:26 | "INFO [alembic.
2019-02-01 01:29:26 | "INFO [alembic.
2019-02-01 01:29:26 | "Error stopping container: heat_engine_
2019-02-01 01:29:26 | "Error stopping container: neutron_
2019-02-01 01:29:26 | "stdout: \u001b[0;32mInfo: Loading facts\u001b[0m",
2019-02-01 01:29:26 | "\u001b[0;32mInfo: Loading facts\u001b[0m",
2019-02-01 01:29:26 | "\u001b[mNotice: Compiled catalog for centos-
2019-02-01 01:29:26 | "\u001b[0;32mInfo: Applying configuration version '1548984489'
2019-02-01 01:29:26 | "\u001b[mNotice: /Stage[
2019-02-01 01:29:26 | "\u001b[0;32mInfo: Class[Neutron:
2019-02-01 01:29:26 | "\u001b[0;32mInfo: Creating state file /var/lib/
2019-02-01 01:29:26 | "\u001b[mNotice: Applied catalog in 0.87 seconds\u001b[0m",
2019-02-01 01:29:26 | "stderr: \u001b[
2019-02-01 01:29:26 | " (location: /usr/share/
2019-02-01 01:29:26 | "PMD: net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory",
2019-02-01 01:29:26 | "PMD: net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)",
2019-02-01 01:29:26 | "PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory",
2019-02-01 01:29:26 | "PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)",
2019-02-01 01:29:26 | "\u001b[
2019-02-01 01:29:26 | " (file: /etc/puppet/
2019-02-01 01:29:26 | "\u001b[
2019-02-01 01:29:26 | "\u001b[
2019-02-01 01:29:26 | "\u001b[
2019-02-01 01:29:26 | "\u001b[
2019-02-01 01:29:26 | " with Stdlib:
2019-02-01 01:29:26 | " (location: /etc/puppet/
2019-02-01 01:29:26 | "Error stopping container: glance_
2019-02-01 01:29:26 | "stdout: Database is up to date. No migrations needed.",
2019-02-01 01:29:26 | "stderr: + sudo -E kolla_set_configs",
This needs to be investigated.
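For triage, the names of the containers that failed to stop can be pulled out of the job output. The following is only a minimal sketch: it assumes the failure marker always has the form `Error stopping container: <name>` as in the log above, and the function name `containers_that_failed_to_stop` is hypothetical, not part of any TripleO tooling.

```python
import re

# Matches the failure marker seen in the job output and captures the container name.
STOP_ERROR = re.compile(r"Error stopping container: ([\w-]+)")


def containers_that_failed_to_stop(log_text):
    """Return container names from 'Error stopping container' lines.

    Names are returned in order of first appearance, without duplicates.
    """
    seen = []
    for match in STOP_ERROR.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Two lines copied from the job output above.
    sample = (
        '2019-02-01 01:29:26 | "Error stopping container: swift_rsync_fix",\n'
        '2019-02-01 01:29:26 | "stdout: ",\n'
        '2019-02-01 01:29:26 | "Error stopping container: mistral_db_sync",\n'
    )
    print(containers_that_failed_to_stop(sample))
```

Names that are truncated in the pasted log would be reported in their truncated form, so the full job log is the better input.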
Changed in tripleo:
milestone: stein-3 → stein-rc1
I can see two errors in these logs. The first one is in mistral:
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: ERROR:__main__:Failed to change ownership of /var/lib/mistral/undercloud.conf to 42430:42430
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: Traceback (most recent call last):
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: File "/usr/local/bin/kolla_set_configs", line 345, in set_perms
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: os.chown(path, uid, gid)
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: OSError: [Errno 30] Read-only file system: '/var/lib/mistral/undercloud.conf'
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: ++ cat /run_command
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: + CMD='/usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/executor.log --server=executor'
http://logs.openstack.org/30/634330/1/check/tripleo-ci-centos-7-containerized-undercloud-upgrades/94198d2/logs/undercloud/var/log/journal.txt.gz#_Jan_31_23_20_03
As we're loading a mapping of "/home/zuul/undercloud.conf:/var/lib/mistral/undercloud.conf:ro", the question is why mistral-executor tries to change the ownership of that file. I guess it is because of https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/mistral/mistral-executor-container-puppet.yaml#L142, as 42430 matches the mistral user.
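The traceback boils down to `os.chown` raising `OSError` with errno 30 (`EROFS`) because the file is bind-mounted with `:ro`. The sketch below only illustrates that failure mode and one way a caller could tolerate it. It is not the code in `kolla_set_configs` and not necessarily the fix applied for this bug; the helper name `chown_unless_read_only` and its injectable `chown` parameter are hypothetical.

```python
import errno
import os


def chown_unless_read_only(path, uid, gid, chown=os.chown):
    """Try to change ownership of path.

    Returns True when the chown succeeded and False when the file lives on a
    read-only filesystem (for example a ':ro' bind mount). Any other OSError
    is re-raised unchanged.
    """
    try:
        chown(path, uid, gid)
    except OSError as exc:
        if exc.errno == errno.EROFS:  # "[Errno 30] Read-only file system" on Linux
            return False
        raise
    return True
```

The `chown` parameter exists only so the behaviour can be exercised without an actual read-only mount, by passing a stand-in function that raises `OSError(errno.EROFS, ...)`.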
The second issue seems to be that podman and docker are conflicting over the rabbitmq container:
Jan 31 23:58:01 centos-7-rax-dfw-0002312926 podman[200740]: ERROR: node with name "rabbit" already running on "centos-7-rax-dfw-0002312926"
Jan 31 23:58:01 centos-7-rax-dfw-0002312926 dockerd-current[19403]: time="2019-01-31T23:58:01.980279805Z" level=debug msg="containerd: process exited" id=1d330e1798a0f95029dce5b52fd1367db3dec5ee808c8e13261ba7d98af64f06 pid=496924af7b1db220ef6baeb0e744c55a33e67ea42f41b725dcee06cfe24a8171 status=0 systemPid=201218
Jan 31 23:58:01 centos-7-rax-dfw-0002312926 systemd[1]: tripleo_memcached.service holdoff time over, scheduling restart.
Jan 31 23:58:01 centos-7-rax-dfw-0002312926 dockerd-current[19403]: time="2019-01-31T23:58:01.980835032Z" level=debug msg="containerd: process exited" id=051724265a0d867891a3bb5a7d909733863843aa4e8f7b7e2200c39ca970f72d pid=03f035a46a41df768ffa6d4865b51b0b8385a0502b80cc356282c3d33d01b24b status=0 systemPid=201258
Jan 31 23:58:01 centos-7-rax-dfw-0002312926 systemd[1]: Stopped memcached container.
Jan 31 23:58:02 centos-7-rax-dfw-0002312926 dockerd-current[19403]: time="2019-01-31T23:58:01.980933265Z" level=debug msg="libcontainerd: received containerd event: &types.Event{Type:\"start-process\", Id:\"7dbb11a39d0123039336b8b104352a45f9d2f95a848a973eecd44bfff80a0435\", Status:0x0, Pid:\"48c7ce4ba708ec4b835fc2661fcb9b0c4222d27922f449489da1b5fa0d78004b\", Timestamp:(*timestamp.Timestamp)(0xc420626370)}"
Jan 31 23:58:02 centos-7-rax-dfw-0002312926 dockerd-current[19403]: time="2019-01-31T23:58:01.981064999Z" level=debug msg="libcontainerd: event unhandled: type:\"start-process\" id:\"7dbb11a39d0123039336b...
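One way to check whether both runtimes are managing the same service on a host is to compare the container names each of them reports as running. The following is a rough sketch under stated assumptions: both the `docker` and `podman` CLIs are installed, the same container name is used under both runtimes, and the helper names `running_names` and `names_in_both` are hypothetical.

```python
import subprocess


def running_names(runtime):
    """Return the names of running containers for 'docker' or 'podman'."""
    result = subprocess.run(
        [runtime, "ps", "--format", "{{.Names}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def names_in_both(docker_names, podman_names):
    """Return the sorted container names that appear under both runtimes."""
    return sorted(set(docker_names) & set(podman_names))


if __name__ == "__main__":
    # Requires both CLIs on the host; any name printed here is running twice.
    for name in names_in_both(running_names("docker"), running_names("podman")):
        print(name)
```

A name appearing in both lists would be consistent with the "already running" error above, but it would not by itself explain why the docker-managed container was not stopped first.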