openstack overcloud deploy failed with docker error "not a shared mount"

Bug #1789836 reported by Diego Abelenda
This bug affects 1 person
Affects: tripleo
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none shown)

Bug Description

Description
===========
I tried to deploy a new compute node on a containerized TripleO Pike setup. At ComputeDeployment_Step3 the deployment fails with an error that comes from Docker. Apparently the container definition uses several bind mounts, and Docker does not accept the way they are set up.

Steps to reproduce
==================

* add a new node to ironic and run introspection
* increase ComputeCount to include the new node (it becomes the node with index 3)
* execute openstack --verbose overcloud deploy --templates ./openstack-tripleo-heat-templates -e ./openstack-tripleo-heat-templates/environments/network-isolation.yaml -e ./openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml -e ./openstack-tripleo-heat-templates/environments/storage/enable-ceph.yaml -e ./openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml -e ./openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-mds.yaml -e ./openstack-tripleo-heat-templates/environments/manila-cephfsnative-config.yaml -e ./lab-specifics.yaml -e ./openstack-tripleo-heat-templates/environments/docker.yaml -e ./openstack-tripleo-heat-templates/environments/docker-ha.yaml -e ./openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml -e ./docker_registry.yaml -e ./environment.yaml

Expected result
===============

The new compute node is deployed.

Actual result
=============

[...]
2018-08-30 07:02:39Z [overcloud]: UPDATE_FAILED resources.AllNodesDeploySteps: resources.ComputeDeployment_Step3: Error: resources[3]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2

 Stack overcloud UPDATE_FAILED

START with options: ['stack', 'failures', 'list', 'overcloud']
command: stack failures list -> heatclient.osc.v1.stack_failures.ListStackFailures (auth=True)
Using auth plugin: password
overcloud.AllNodesDeploySteps.ComputeDeployment_Step3.3:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 60be40f9-2f21-4028-9e07-5d302d3328b7
  status: CREATE_FAILED
  status_reason: |
    Error: resources[3]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "stderr: /usr/bin/docker-current: Error response from daemon: linux mounts: Path /var/lib/nova is mounted on / but it is not a shared mount..",
            "stdout: cc736b406ca33c5492b0a61ed2350649efa15c9318d56c2dcf924fb0676e1158",
            "stderr: "
        ]
    }
     to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/b879d494-7f55-426c-8449-849aec4def89_playbook.retry

    PLAY RECAP *********************************************************************
    localhost : ok=7 changed=2 unreachable=0 failed=1

    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.ControllerDeployment_Step3:
  resource_type: OS::TripleO::DeploymentSteps
  physical_resource_id: 64a6209e-72c4-4867-9394-5b734d2efa7f
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted
END return value: 0
Heat Stack update failed.
Heat Stack update failed.
END return value: 1
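
The daemon's message is about mount propagation: a bind mount requested as shared needs its source path to live on a host mount that is itself marked shared. As a minimal sketch, assuming the standard Linux /proc/self/mountinfo layout (where a shared mount carries a "shared:N" tag in the optional fields before the "-" separator), the state of the mount holding /var/lib/nova could be inspected as below. The function names parse_mountinfo and propagation_of are hypothetical and not part of TripleO or Docker.

```python
def parse_mountinfo(text):
    """Parse mountinfo text into (mount_point, optional_fields) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or "-" not in fields[6:]:
            continue  # not a well-formed mountinfo line
        # Optional fields (e.g. "shared:1") sit between field 6 and "-".
        separator = fields.index("-", 6)
        mounts.append((fields[4], fields[6:separator]))
    return mounts


def propagation_of(path, mountinfo_text):
    """Return (mount_point, is_shared) for the mount that contains path."""
    best = None
    target = path.rstrip("/") + "/"
    for mount_point, optional in parse_mountinfo(mountinfo_text):
        prefix = mount_point.rstrip("/") + "/"
        if target.startswith(prefix):
            # Prefer the deepest mount; later entries win on ties because
            # they are stacked on top of earlier ones.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, optional)
    if best is None:
        return None
    shared = any(tag.startswith("shared:") for tag in best[1])
    return best[0], shared
```

On a node this could be called as propagation_of("/var/lib/nova", open("/proc/self/mountinfo").read()). A result of ("/", False) would match the situation the error describes: the path is served by the root mount, and that mount is not shared.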

Environment
===========

TripleO Pike, containerized.

Package versions on the undercloud:
$ rpm -qa | grep openstack
  openstack-utils-2017.1-1.el7.noarch
  openstack-mistral-common-5.2.3-0.20180228103248.0070b25.el7.centos.noarch
  openstack-nova-compute-16.1.1-0.20180308172728.52c0b58.el7.centos.noarch
  openstack-tripleo-common-containers-7.6.11-0.20180308221203.f4db591.el7.centos.noarch
  openstack-nova-api-16.1.1-0.20180308172728.52c0b58.el7.centos.noarch
  openstack-neutron-11.0.3-0.20180308143810.7cb197f.el7.centos.noarch
  openstack-heat-api-cfn-9.0.4-0.20180219193445.2d865ff.el7.centos.noarch
  openstack-swift-account-2.15.2-0.20171207090718.0ff2d5e.el7.centos.noarch
  openstack-mistral-api-5.2.3-0.20180228103248.0070b25.el7.centos.noarch
  openstack-zaqar-5.0.1-0.20180308005637.56dcd81.el7.centos.noarch
  openstack-neutron-openvswitch-11.0.3-0.20180308143810.7cb197f.el7.centos.noarch
  openstack-heat-api-9.0.4-0.20180219193445.2d865ff.el7.centos.noarch
  openstack-swift-object-2.15.2-0.20171207090718.0ff2d5e.el7.centos.noarch
  openstack-tripleo-puppet-elements-7.0.6-0.20180307121447.246581f.el7.centos.noarch
  openstack-puppet-modules-11.0.0-0.20170828113153.71ad01c.el7.centos.noarch
  openstack-glance-15.0.1-1.el7.noarch
  openstack-mistral-executor-5.2.3-0.20180228103248.0070b25.el7.centos.noarch
  openstack-ironic-conductor-9.1.4-0.20180309154721.93012b2.el7.centos.noarch
  openstack-selinux-0.8.14-0.20180207211247.4e6703e.el7.centos.noarch
  openstack-tripleo-ui-7.4.8-0.20180308120932.155d0ef.el7.centos.noarch
  openstack-nova-common-16.1.1-0.20180308172728.52c0b58.el7.centos.noarch
  openstack-mistral-engine-5.2.3-0.20180228103248.0070b25.el7.centos.noarch
  puppet-openstack_extras-11.5.1-0.20180129213833.5c35c74.el7.centos.noarch
  openstack-ironic-common-9.1.4-0.20180309154721.93012b2.el7.centos.noarch
  puppet-openstacklib-11.5.1-0.20180127182713.2908b1f.el7.centos.noarch
  openstack-keystone-12.0.1-0.20180220081935.6de0a14.el7.centos.noarch
  openstack-nova-conductor-16.1.1-0.20180308172728.52c0b58.el7.centos.noarch
  openstack-ironic-inspector-6.0.2-0.20180213213641.3582878.el7.centos.noarch
  openstack-kolla-5.0.2-0.20180307182749.8c82e84.el7.centos.noarch
  python2-openstackclient-3.12.0-1.el7.noarch
  openstack-swift-container-2.15.2-0.20171207090718.0ff2d5e.el7.centos.noarch
  openstack-tempest-17.2.0-4.el7.noarch
  openstack-ironic-api-9.1.4-0.20180309154721.93012b2.el7.centos.noarch
  openstack-tripleo-image-elements-7.0.4-0.20180307130233.a008812.el7.centos.noarch
  openstack-tripleo-common-7.6.11-0.20180308221203.f4db591.el7.centos.noarch
  openstack-neutron-ml2-11.0.3-0.20180308143810.7cb197f.el7.centos.noarch
  python-openstackclient-lang-3.12.0-1.el7.noarch
  openstack-heat-engine-9.0.4-0.20180219193445.2d865ff.el7.centos.noarch
  openstack-swift-proxy-2.15.2-0.20171207090718.0ff2d5e.el7.centos.noarch
  openstack-neutron-common-11.0.3-0.20180308143810.7cb197f.el7.centos.noarch
  openstack-nova-placement-api-16.1.1-0.20180308172728.52c0b58.el7.centos.noarch
  openstack-heat-common-9.0.4-0.20180219193445.2d865ff.el7.centos.noarch
  python2-openstacksdk-0.9.17-1.el7.noarch
  openstack-tripleo-heat-templates-7.0.11-0.20180308114108.5ecab0c.el7.centos.noarch
  openstack-tripleo-validations-7.4.8-0.20180309163320.bc5452a.el7.centos.noarch
  openstack-nova-scheduler-16.1.1-0.20180308172728.52c0b58.el7.centos.noarch

Network is Neutron with OVS
Local storage for compute nodes has no LVM layer
Ceph is deployed through tripleo.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Low
milestone: none → stein-1
Diego Abelenda (aaj6xu7ugcbx75sq) wrote :

I tried updating the undercloud and the overcloud, but I still get the same error when adding the new node.

Package versions:

$ rpm -qa | grep openstack
  openstack-tripleo-common-7.6.16-0.20180829163240.65bb298.el7.noarch
  openstack-utils-2017.1-1.el7.noarch
  openstack-nova-compute-16.1.5-0.20180825055616.b612a63.el7.noarch
  openstack-swift-object-2.15.2-0.20180803103947.5da8adb.el7.noarch
  python2-openstackclient-3.12.1-1.el7.noarch
  openstack-nova-scheduler-16.1.5-0.20180825055616.b612a63.el7.noarch
  openstack-selinux-0.8.15-0.20180524133807.b63283a.el7.noarch
  openstack-swift-account-2.15.2-0.20180803103947.5da8adb.el7.noarch
  openstack-mistral-common-5.2.5-0.20180820122407.2b7e529.el7.noarch
  openstack-neutron-openvswitch-11.0.6-0.20180813130436.b87eb48.el7.noarch
  openstack-mistral-executor-5.2.5-0.20180820122407.2b7e529.el7.noarch
  openstack-zaqar-5.0.1-0.20180308005637.56dcd81.el7.centos.noarch
  python-openstackclient-lang-3.12.1-1.el7.noarch
  openstack-neutron-ml2-11.0.6-0.20180813130436.b87eb48.el7.noarch
  openstack-mistral-api-5.2.5-0.20180820122407.2b7e529.el7.noarch
  openstack-tripleo-image-elements-7.0.6-0.20180830012750.0e177c3.el7.noarch
  openstack-puppet-modules-11.0.0-0.20170828113153.71ad01c.el7.centos.noarch
  openstack-ironic-common-9.1.6-0.20180822064811.6222091.el7.noarch
  openstack-nova-common-16.1.5-0.20180825055616.b612a63.el7.noarch
  openstack-heat-api-9.0.5-0.20180808225814.e4312ec.el7.noarch
  openstack-nova-placement-api-16.1.5-0.20180825055616.b612a63.el7.noarch
  openstack-ironic-api-9.1.6-0.20180822064811.6222091.el7.noarch
  puppet-openstack_extras-11.5.1-0.20180129213833.5c35c74.el7.centos.noarch
  openstack-swift-container-2.15.2-0.20180803103947.5da8adb.el7.noarch
  openstack-keystone-12.0.1-1.el7.noarch
  openstack-neutron-11.0.6-0.20180813130436.b87eb48.el7.noarch
  openstack-kolla-5.0.4-0.20180824060508.98276e8.el7.noarch
  puppet-openstacklib-11.5.1-0.20180127182713.2908b1f.el7.centos.noarch
  openstack-neutron-common-11.0.6-0.20180813130436.b87eb48.el7.noarch
  openstack-tripleo-heat-templates-7.0.16-0.20180830013518.a207af7.el7.noarch
  openstack-mistral-engine-5.2.5-0.20180820122407.2b7e529.el7.noarch
  openstack-tempest-17.2.0-4.el7.noarch
  openstack-heat-engine-9.0.5-0.20180808225814.e4312ec.el7.noarch
  openstack-ironic-conductor-9.1.6-0.20180822064811.6222091.el7.noarch
  openstack-heat-common-9.0.5-0.20180808225814.e4312ec.el7.noarch
  openstack-glance-15.0.2-0.20180627162607.a4562ab.el7.noarch
  openstack-tripleo-puppet-elements-7.0.8-1.el7.noarch
  openstack-nova-conductor-16.1.5-0.20180825055616.b612a63.el7.noarch
  openstack-swift-proxy-2.15.2-0.20180803103947.5da8adb.el7.noarch
  python2-openstacksdk-0.9.17-1.el7.noarch
  openstack-heat-api-cfn-9.0.5-0.20180808225814.e4312ec.el7.noarch
  openstack-tripleo-common-containers-7.6.16-0.20180829163240.65bb298.el7.noarch
  openstack-nova-api-16.1.5-0.20180825055616.b612a63.el7.noarch
  openstack-ironic-inspector-6.0.4-0.20180822161323.ef25a98.el7.noarch
  openstack-tripleo-validations-7.4.11-0.20180829215232.050a26f.el7.noarch
  openstack-tripleo-ui-7.4.8-0.20180830012954.bb8db4b.el7.noarch
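
Comparing the two listings by eye is tedious. A small sketch of how they could be diffed, assuming every entry follows the usual name-version-release.arch shape seen above; split_nvra and changed_packages are hypothetical helper names:

```python
def split_nvra(package):
    """Split 'name-version-release.arch' into (name, 'version-release')."""
    nvr, _, _arch = package.rpartition(".")
    name, version, release = nvr.rsplit("-", 2)
    return name, f"{version}-{release}"


def changed_packages(before, after):
    """Map package name to (old, new) for packages whose version changed.

    Both arguments are whitespace-separated 'rpm -qa' output.
    """
    old = dict(split_nvra(p) for p in before.split())
    new = dict(split_nvra(p) for p in after.split())
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }
```

Fed with the two listings from this report, it would show, for example, openstack-nova-compute going from 16.1.1 to 16.1.5 while openstack-utils stays unchanged. Note that it only covers the undercloud packages listed here, not what is installed on the compute nodes.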

Diego Abelenda (aaj6xu7ugcbx75sq) wrote :

Here is the paunch configuration for step 3 (it seems to include the :shared flag on every bind mount of /var/lib/nova):

{"neutron_ovs_bridge": {"image": "satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-neutron-server:current-tripleo", "pid": "host", "environment": ["KOLLA_CONFIG_STRATEGY=COPY_ALWAYS", "TRIPLEO_CONFIG_HASH=07d328cce962ed18e71edfc4b5e0f1e5"], "command": ["puppet", "apply", "--modulepath", "/etc/puppet/modules:/usr/share/openstack-puppet/modules", "--tags", "file,file_line,concat,augeas,neutron::plugins::ovs::bridge", "-v", "-e", "include neutron::agents::ml2::ovs"], "user": "root", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/kolla/config_files/neutron_ovs_agent.json:/var/lib/kolla/config_files/config.json:ro", "/var/lib/config-data/puppet-generated/neutron/:/var/lib/kolla/config_files/src:ro", "/lib/modules:/lib/modules:ro", "/run/openvswitch:/run/openvswitch", "/etc/puppet:/etc/puppet:ro", "/usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro", "/var/run/openvswitch/db.sock:/var/run/openvswitch/db.sock"], "net": "host", "detach": false, "privileged": true}, "nova_libvirt": {"start_order": 1, "image": "satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-nova-libvirt:current-tripleo", "pid": "host", "environment": ["KOLLA_CONFIG_STRATEGY=COPY_ALWAYS", "TRIPLEO_CONFIG_HASH=173a8fa792aecdcfa940e748d575c78c"], "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", 
"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/kolla/config_files/nova_libvirt.json:/var/lib/kolla/config_files/config.json:ro", "/var/lib/config-data/puppet-generated/nova_libvirt/:/var/lib/kolla/config_files/src:ro", "/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro", "/lib/modules:/lib/modules:ro", "/dev:/dev", "/run:/run", "/sys/fs/cgroup:/sys/fs/cgroup", "/var/lib/nova:/var/lib/nova:shared", "/etc/libvirt:/etc/libvirt", "/var/run/libvirt:/var/run/libvirt", "/var/lib/libvirt:/var/lib/libvirt", "/var/log/libvirt/qemu:/var/log/libvirt/qemu:ro", "/var/log/containers/nova:/var/log/nova", "/var/lib/vhost_sockets:/var/lib/vhost_sockets", "/sys/fs/selinux:/sys/fs/selinux"], "net": "host", "privileged": true, "restart": "always"}, "iscsi...
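
To check which containers in such a configuration request a propagation mode, the volume strings can be scanned mechanically. This is only a sketch, assuming the JSON has the shape shown above (container name mapped to a dict with a "volumes" list of "source:target[:options]" strings); propagation_volumes is a hypothetical helper, not a paunch API:

```python
import json

# Docker's bind-propagation option names.
PROPAGATION_FLAGS = {"shared", "rshared", "slave", "rslave", "private", "rprivate"}


def propagation_volumes(config):
    """Return {container: [(source, [flags])]} for volumes with a propagation flag."""
    result = {}
    for name, container in config.items():
        for volume in container.get("volumes", []):
            parts = volume.split(":")
            if len(parts) < 3:
                continue  # no options field on this volume
            hits = set(parts[2].split(",")) & PROPAGATION_FLAGS
            if hits:
                result.setdefault(name, []).append((parts[0], sorted(hits)))
    return result


# Trimmed-down excerpt of the configuration quoted in this comment.
EXCERPT = json.loads("""
{"neutron_ovs_bridge": {"volumes": ["/etc/hosts:/etc/hosts:ro", "/dev/log:/dev/log"]},
 "nova_libvirt": {"volumes": ["/etc/hosts:/etc/hosts:ro",
                              "/var/lib/nova:/var/lib/nova:shared"]}}
""")

if __name__ == "__main__":
    print(propagation_volumes(EXCERPT))
```

For the excerpt, only nova_libvirt is reported, with /var/lib/nova flagged as shared, which is the path named in the Docker error.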

Diego Abelenda (aaj6xu7ugcbx75sq) wrote :

Does not work on the new compute node:
[root@lab-compute-3 heat-admin]# mount -t xfs
/dev/vda3 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

[root@lab-compute-3 heat-admin]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
dc856c2df1d7 satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-nova-libvirt:current-tripleo "kolla_start" 7 minutes ago Created nova_libvirt
7a30cee89559 satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-nova-libvirt:current-tripleo "kolla_start" 7 minutes ago Created nova_virtlogd
3e82871d270d satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-iscsid:current-tripleo "kolla_start" 22 hours ago Up 22 hours iscsid
a85683d2e60c satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-neutron-server:current-tripleo "puppet apply --modul" 22 hours ago Exited (0) 22 hours ago neutron_ovs_bridge

Works on an old compute node:
[root@lab-compute-2 heat-admin]# mount -t xfs
/dev/vda3 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/vda3 on /var/lib/docker/containers type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/vda3 on /var/lib/docker/overlay2 type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

[root@lab-compute-2 heat-admin]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d0fdbd7ec038 satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-nova-libvirt:current-tripleo "kolla_start" 22 hours ago Up 22 hours nova_libvirt
ea99d2ec8dbf satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-nova-libvirt:current-tripleo "kolla_start" 22 hours ago Up 22 hours nova_virtlogd
6c2679ee6815 satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-neutron-openvswitch-agent:current-tripleo "kolla_start" 3 days ago Up 25 hours (healthy) neutron_ovs_agent
67846c8e9b06 satellite.cloud.camptocamp.com:5000/default_organization-staging-centos-7-pike-docker-tripleopike-centos-binary-cron:current-tripleo ...

Diego Abelenda (aaj6xu7ugcbx75sq) wrote :

"docker info" has some suspect things:
[root@lab-compute-3 heat-admin]# docker info
Containers: 4
 Running: 1
 Paused: 0
 Stopped: 3
Images: 7
Server Version: 1.12.6
Storage Driver: overlay2
 Backing Filesystem: xfs
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: overlay null host bridge
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp
Kernel Version: 3.10.0-693.2.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 6
Total Memory: 23.39 GiB
Name: lab-compute-3
ID: CVHV:HFK5:WV6P:N2BV:AUTW:TDBW:JLKX:AEJN:XR5T:SARL:5ZVC:2XMO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
Registries: docker.io (secure)

[root@lab-compute-2 heat-admin]# docker info
Containers: 10
 Running: 8
 Paused: 0
 Stopped: 2
Images: 18
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: 5eda6f6fd0c2884c2c8e78a6e7119e8d0ecedb77 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-862.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 6
Total Memory: 23.39 GiB
Name: lab-compute-2.cloud.camptocamp.com
ID: XKMD:ZPU7:XA35:NSV7:TH35:WBTX:2UKO:JMMB:NZAU:THOV:UP7T:WWSX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true
Registries: docker.io (secure)
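
The two outputs differ most visibly in Server Version (1.12.6 vs 1.13.1) and Kernel Version. A minimal sketch for extracting such differences automatically, assuming the "Key: value" layout shown above and ignoring the indented sub-entries; parse_docker_info and differing_keys are hypothetical names:

```python
def parse_docker_info(text):
    """Parse top-level 'Key: value' lines of 'docker info' output into a dict."""
    info = {}
    for line in text.splitlines():
        if line.startswith(" ") or ":" not in line:
            continue  # skip indented sub-entries and lines without a key
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


def differing_keys(first, second, keys):
    """Return {key: (first_value, second_value)} for keys whose values differ."""
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }
```

Called with the parsed output of both nodes and the keys "Server Version", "Storage Driver" and "Kernel Version", it would report the Docker and kernel versions as different and leave out the storage driver, which is overlay2 on both.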

Diego Abelenda (aaj6xu7ugcbx75sq) wrote :

So apparently I needed to update the deployment images, because the deployment process of a new node does NOT run yum update. I will try again with updated deployment images.

Diego Abelenda (aaj6xu7ugcbx75sq) wrote :

OK, after updating the deployment images I finally managed to add the new node to the cluster.

You can close this bug.

Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
milestone: stein-3 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3