undercloud_upgrade container job is failing with error stopping containers

Bug #1814223 reported by chandan kumar
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Critical     Unassigned

Bug Description

From the logs http://logs.openstack.org/98/604298/207/check/tripleo-ci-centos-7-containerized-undercloud-upgrades/09a2b9d/logs/undercloud/home/zuul/undercloud_upgrade.log.txt.gz#_2019-02-01_01_29_26

The undercloud container upgrade job fails after pulling the container image, when it tries to stop the containers:
stderr: Trying to pull docker.io/tripleomaster/centos-binary-swift-object:e4a542b9a3ea6f459605ffbaa3c8af97eb81921f_a9a57ed8...Getting image source signatures",
2019-02-01 01:29:26 | "Copying blob b940d0165d05: 0 B / 7.65 MiB ",
2019-02-01 01:29:26 | "Copying blob 2c7cc8e8e73b: 0 B / 900 B ",
2019-02-01 01:29:26 | "Copying blob b940d0165d05: 2.90 MiB / 7.65 MiB ",
2019-02-01 01:29:26 | "Copying blob 2c7cc8e8e73b: 900 B / 900 B 0s",
2019-02-01 01:29:26 | "Copying blob b940d0165d05: 5.91 MiB / 7.65 MiB ",
2019-02-01 01:29:26 | "Copying blob b940d0165d05: 7.65 MiB / 7.65 MiB 0s",
2019-02-01 01:29:26 | "Copying config 6f018604a575: 0 B / 26.64 KiB ",
2019-02-01 01:29:26 | "Copying config 6f018604a575: 26.64 KiB / 26.64 KiB 0s",
2019-02-01 01:29:26 | "Error stopping container: swift_rsync_fix",
2019-02-01 01:29:26 | "stdout: ",
2019-02-01 01:29:26 | "stderr: ",
2019-02-01 01:29:26 | "Error stopping container: mistral_db_sync",
2019-02-01 01:29:26 | "stderr: INFO [alembic.runtime.migration] Context impl MySQLImpl.",
2019-02-01 01:29:26 | "INFO [alembic.runtime.migration] Will assume non-transactional DDL.",
2019-02-01 01:29:26 | "INFO [alembic.runtime.migration] Running upgrade 030 -> 031, Add started_at and finished_at to task execution",
2019-02-01 01:29:26 | "Error stopping container: heat_engine_db_sync",
2019-02-01 01:29:26 | "Error stopping container: neutron_ovs_bridge",
2019-02-01 01:29:26 | "stdout: \u001b[0;32mInfo: Loading facts\u001b[0m",
2019-02-01 01:29:26 | "\u001b[0;32mInfo: Loading facts\u001b[0m",
2019-02-01 01:29:26 | "\u001b[mNotice: Compiled catalog for centos-7-rax-ord-0002314744.localdomain in environment production in 1.69 seconds\u001b[0m",
2019-02-01 01:29:26 | "\u001b[0;32mInfo: Applying configuration version '1548984489'\u001b[0m",
2019-02-01 01:29:26 | "\u001b[mNotice: /Stage[main]/Neutron::Agents::Ml2::Ovs/Neutron::Plugins::Ovs::Bridge[ctlplane:br-ctlplane]/Vs_bridge[br-ctlplane]/external_ids: external_ids changed 'PMD: net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory,PMD: net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5),PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory,PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4),bridge-id=br-ctlplane' to 'bridge-id=br-ctlplane'\u001b[0m",
2019-02-01 01:29:26 | "\u001b[0;32mInfo: Class[Neutron::Agents::Ml2::Ovs]: Unscheduling all events on Class[Neutron::Agents::Ml2::Ovs]\u001b[0m",
2019-02-01 01:29:26 | "\u001b[0;32mInfo: Creating state file /var/lib/puppet/state/state.yaml\u001b[0m",
2019-02-01 01:29:26 | "\u001b[mNotice: Applied catalog in 0.87 seconds\u001b[0m",
2019-02-01 01:29:26 | "stderr: \u001b[1;33mWarning: Support for ruby version 2.0.0 is deprecated and will be removed in a future release. See https://puppet.com/docs/puppet/latest/system_requirements.html for a list of supported ruby versions.",
2019-02-01 01:29:26 | " (location: /usr/share/ruby/vendor_ruby/puppet.rb:130:in `<module:Puppet>')\u001b[0m",
2019-02-01 01:29:26 | "PMD: net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory",
2019-02-01 01:29:26 | "PMD: net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)",
2019-02-01 01:29:26 | "PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory",
2019-02-01 01:29:26 | "PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)",
2019-02-01 01:29:26 | "\u001b[1;33mWarning: /etc/puppet/hiera.yaml: Use of 'hiera.yaml' version 3 is deprecated. It should be converted to version 5",
2019-02-01 01:29:26 | " (file: /etc/puppet/hiera.yaml)\u001b[0m",
2019-02-01 01:29:26 | "\u001b[1;33mWarning: Undefined variable '::deploy_config_name'; \\n (file & line not available)\u001b[0m",
2019-02-01 01:29:26 | "\u001b[1;33mWarning: ModuleLoader: module 'neutron' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\\n (file & line not available)\u001b[0m",
2019-02-01 01:29:26 | "\u001b[1;33mWarning: ModuleLoader: module 'openstacklib' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\\n (file & line not available)\u001b[0m",
2019-02-01 01:29:26 | "\u001b[1;33mWarning: This method is deprecated, please use the stdlib validate_legacy function,",
2019-02-01 01:29:26 | " with Stdlib::Compat::Array. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/neutron/manifests/agents/ml2/ovs.pp\", 214]:[\"unknown\", 1]",
2019-02-01 01:29:26 | " (location: /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:28:in `deprecation')\u001b[0m",
2019-02-01 01:29:26 | "Error stopping container: glance_api_db_sync",
2019-02-01 01:29:26 | "stdout: Database is up to date. No migrations needed.",
2019-02-01 01:29:26 | "stderr: + sudo -E kolla_set_configs",

This needs to be investigated.
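
To see at a glance which containers are affected, the names can be pulled out of the upgrade log. This is only a minimal sketch: it assumes the message format "Error stopping container: <name>" seen in the excerpt above, and the function name `containers_failing_to_stop` is made up for illustration, not part of any tripleo tool.

```python
import re

# Assumption: the message format matches the log excerpt in this report.
ERROR_RE = re.compile(r"Error stopping container: ([\w.-]+)")


def containers_failing_to_stop(log_text):
    """Return container names from 'Error stopping container' lines.

    Names are returned in order of first appearance, without duplicates.
    """
    seen = []
    for name in ERROR_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# A few lines copied from the log excerpt in this report.
sample = '''2019-02-01 01:29:26 | "Error stopping container: swift_rsync_fix",
2019-02-01 01:29:26 | "stdout: ",
2019-02-01 01:29:26 | "Error stopping container: mistral_db_sync",
2019-02-01 01:29:26 | "Error stopping container: heat_engine_db_sync",
'''

if __name__ == "__main__":
    for name in containers_failing_to_stop(sample):
        print(name)
```

Running the same function over the full undercloud_upgrade.log would list every container named in such a message, which may help tell one-shot containers (db_sync, puppet runs) apart from long-running services.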

Jose Luis Franco (jfrancoa) wrote :

I can see two errors in these logs. The first one is in mistral:

Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: ERROR:__main__:Failed to change ownership of /var/lib/mistral/undercloud.conf to 42430:42430
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: Traceback (most recent call last):
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: File "/usr/local/bin/kolla_set_configs", line 345, in set_perms
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: os.chown(path, uid, gid)
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: OSError: [Errno 30] Read-only file system: '/var/lib/mistral/undercloud.conf'
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: ++ cat /run_command
Jan 31 23:20:03 centos-7-rax-dfw-0002312926 dockerd-current[19403]: + CMD='/usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/executor.log --server=executor'

http://logs.openstack.org/30/634330/1/check/tripleo-ci-centos-7-containerized-undercloud-upgrades/94198d2/logs/undercloud/var/log/journal.txt.gz#_Jan_31_23_20_03

As we're loading a mapping of "/home/zuul/undercloud.conf:/var/lib/mistral/undercloud.conf:ro", the question is, why does mistral-executor try to modify the ownership of that file? I guess because of https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/mistral/mistral-executor-container-puppet.yaml#L142 as 42430 matches the mistral user.
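
As a rough illustration of why the chown fails, the sketch below splits the volume mapping quoted above and checks for the "ro" option. It is a simplified parser written for this comment (the helper `parse_bind` is hypothetical and is not how docker or podman parse volume specs); it only shows that the target path is mounted read-only, which is consistent with the "[Errno 30] Read-only file system" in the traceback.

```python
import errno


def parse_bind(spec):
    """Split 'host:container[:options]' into (host, container, read_only).

    Simplified on purpose: assumes no colons inside the paths.
    """
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError("expected host:container[:options], got %r" % spec)
    host, container = parts[0], parts[1]
    # Options are comma-separated, e.g. "ro" or "ro,z".
    options = parts[2].split(",") if len(parts) > 2 else []
    return host, container, "ro" in options


if __name__ == "__main__":
    mapping = "/home/zuul/undercloud.conf:/var/lib/mistral/undercloud.conf:ro"
    host, target, read_only = parse_bind(mapping)
    print(target, "read-only:", read_only)
    # Symbolic name for errno 30, the code reported in the traceback.
    print(errno.errorcode.get(30))
```

Any os.chown() on a path under a read-only bind mount is rejected by the kernel regardless of the uid/gid requested, so the error does not depend on 42430 being the right user.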

The second issue seems to be that podman and docker are conflicting with the rabbitmq container:

Jan 31 23:58:01 centos-7-rax-dfw-0002312926 podman[200740]: ERROR: node with name "rabbit" already running on "centos-7-rax-dfw-0002312926"
Jan 31 23:58:01 centos-7-rax-dfw-0002312926 dockerd-current[19403]: time="2019-01-31T23:58:01.980279805Z" level=debug msg="containerd: process exited" id=1d330e1798a0f95029dce5b52fd1367db3dec5ee808c8e13261ba7d98af64f06 pid=496924af7b1db220ef6baeb0e744c55a33e67ea42f41b725dcee06cfe24a8171 status=0 systemPid=201218
Jan 31 23:58:01 centos-7-rax-dfw-0002312926 systemd[1]: tripleo_memcached.service holdoff time over, scheduling restart.
Jan 31 23:58:01 centos-7-rax-dfw-0002312926 dockerd-current[19403]: time="2019-01-31T23:58:01.980835032Z" level=debug msg="containerd: process exited" id=051724265a0d867891a3bb5a7d909733863843aa4e8f7b7e2200c39ca970f72d pid=03f035a46a41df768ffa6d4865b51b0b8385a0502b80cc356282c3d33d01b24b status=0 systemPid=201258
Jan 31 23:58:01 centos-7-rax-dfw-0002312926 systemd[1]: Stopped memcached container.
Jan 31 23:58:02 centos-7-rax-dfw-0002312926 dockerd-current[19403]: time="2019-01-31T23:58:01.980933265Z" level=debug msg="libcontainerd: received containerd event: &types.Event{Type:\"start-process\", Id:\"7dbb11a39d0123039336b8b104352a45f9d2f95a848a973eecd44bfff80a0435\", Status:0x0, Pid:\"48c7ce4ba708ec4b835fc2661fcb9b0c4222d27922f449489da1b5fa0d78004b\", Timestamp:(*timestamp.Timestamp)(0xc420626370)}"
Jan 31 23:58:02 centos-7-rax-dfw-0002312926 dockerd-current[19403]: time="2019-01-31T23:58:01.981064999Z" level=debug msg="libcontainerd: event unhandled: type:\"start-process\" id:\"7dbb11a39d0123039336b...

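
To check which daemon emitted which message around that timestamp, the journal lines can be grouped by process name. This is a sketch under assumptions: it expects the "Mon DD HH:MM:SS host process[pid]: message" layout seen in the journal excerpt above, and `group_by_process` is an illustrative helper, not an existing tool.

```python
import re
from collections import defaultdict

# Assumption: lines follow the layout of the journal excerpt in this report:
#   "Jan 31 23:58:01 <host> <process>[<pid>]: <message>"
JOURNAL_RE = re.compile(
    r"^\w{3} +\d+ [\d:]{8} \S+ ([^\[\s:]+)(?:\[\d+\])?: (.*)$"
)


def group_by_process(lines):
    """Map each process name to the list of messages it logged."""
    grouped = defaultdict(list)
    for line in lines:
        match = JOURNAL_RE.match(line)
        if match:
            grouped[match.group(1)].append(match.group(2))
    return dict(grouped)


# Lines copied from the journal excerpt in this report.
sample = [
    'Jan 31 23:58:01 centos-7-rax-dfw-0002312926 podman[200740]: '
    'ERROR: node with name "rabbit" already running on "centos-7-rax-dfw-0002312926"',
    'Jan 31 23:58:01 centos-7-rax-dfw-0002312926 systemd[1]: '
    'tripleo_memcached.service holdoff time over, scheduling restart.',
    'Jan 31 23:58:01 centos-7-rax-dfw-0002312926 systemd[1]: Stopped memcached container.',
]

if __name__ == "__main__":
    for process, messages in group_by_process(sample).items():
        print(process, len(messages))
```

Seeing podman and dockerd-current both logging container activity in the same second would support the idea that the two runtimes are managing the same services at once.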

Jose Luis Franco (jfrancoa) wrote :

I opened another bug for the undercloud.conf issue mentioned in Comment #1: https://bugs.launchpad.net/tripleo/+bug/1814275

Changed in tripleo:
milestone: stein-3 → stein-rc1
wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: Triaged → Fix Released