docker pull from ceph.io fails on overcloud nodes

Bug #1751319 reported by Tom Barron
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   High         Unassigned

Bug Description

Description
===========
Overcloud deploy fails with Step2_Execution ERROR with resource_type: OS::Mistral::ExternalResource.
/var/log/mistral/ceph-install-workflow.log shows that the overcloud nodes fail when they try to docker pull docker.io/ceph/daemon:tag-stable-3.0-luminous-centos-7. On an overcloud controller, the nameservers listed in /etc/resolv.conf are not reachable.

Steps to reproduce
==================
Deploy undercloud using quickstart.sh downloaded as of yesterday on a classic virthost setup.

I ran:

  THT=/home/stack/tht
  openstack overcloud deploy --templates $THT \
  --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph \
  --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 \
  -e /home/stack/custom.yaml \
  -e $THT/environments/disable-telemetry.yaml \
  --disable-validations \
  -e $THT/environments/docker.yaml -e docker_registry.yaml \
  -e $THT/environments/docker-ha.yaml \
  -e $THT/environments/network-isolation.yaml \
  -e $THT/environments/net-single-nic-with-vlans.yaml \
  -e /home/stack/network-environment.yaml \
  -e $THT/environments/low-memory-usage.yaml \
  -e $THT/environments/ceph-ansible/ceph-ansible.yaml \
  -e $THT/environments/ceph-ansible/ceph-mds.yaml \
  --ntp-server pool.ntp.org \
  -e $THT/environments/debug.yaml \
  -e /home/stack/deploy-env.yaml \
  --no-cleanup

/home/stack/tht is freshly cloned and has no customizations.

Expected result
===============

Overcloud will deploy successfully.

Actual result
=============

Failure as noted in the description.

Environment
===========
1. master with everything fresh as of yesterday.

2. [root@overcloud-controller-2 ~]# docker --version
Docker version 1.13.1, build 3f45913-unsupported

3. [root@overcloud-controller-2 ~]# cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search localdomain
nameserver 192.168.23.1
nameserver 10.10.160.2

192.168.23.1 is the IP on br-ex on the virthost
10.10.160.2 is a Red Hat nameserver; to reach it you have to transit 192.168.23.1.

4. 192.168.23.1 is not reachable. Indeed there is no route to it.

[root@overcloud-controller-2 ~]# ip r
10.0.0.0/24 dev vlan10 proto kernel scope link src 10.0.0.6
169.254.169.254 via 192.168.24.1 dev br-ex
172.16.0.0/24 dev vlan50 proto kernel scope link src 172.16.0.19
172.17.0.0/24 dev vlan20 proto kernel scope link src 172.17.0.19
172.18.0.0/24 dev vlan30 proto kernel scope link src 172.18.0.12
172.19.0.0/24 dev vlan40 proto kernel scope link src 172.19.0.18
172.31.0.0/24 dev docker0 proto kernel scope link src 172.31.0.1
192.168.24.0/24 dev br-ex proto kernel scope link src 192.168.24.14

5. Adding a default route as found with other deployments is not by itself sufficient:

[root@overcloud-controller-2 ~]# ip route add default via 10.0.0.1 dev vlan10

6. However, the default route combined with a big hammer on iptables on the undercloud

[root@undercloud]# iptables --policy FORWARD ACCEPT

is sufficient. Afterwards:

[root@overcloud-controller-2 ~]# ip r
default via 10.0.0.1 dev vlan10
10.0.0.0/24 dev vlan10 proto kernel scope link src 10.0.0.6
169.254.169.254 via 192.168.24.1 dev br-ex
172.16.0.0/24 dev vlan50 proto kernel scope link src 172.16.0.19
172.17.0.0/24 dev vlan20 proto kernel scope link src 172.17.0.19
172.18.0.0/24 dev vlan30 proto kernel scope link src 172.18.0.12
172.19.0.0/24 dev vlan40 proto kernel scope link src 172.19.0.18
172.31.0.0/24 dev docker0 proto kernel scope link src 172.31.0.1

[root@overcloud-controller-2 ~]# docker pull docker.io/ceph/daemon:tag-stable-3.0-luminous-centos-7
Trying to pull repository docker.io/ceph/daemon ...
tag-stable-3.0-luminous-centos-7: Pulling from docker.io/ceph/daemon
af4b0a2388c6: Already exists
e7c4d76f7e7f: Pull complete
399a492126ac: Pull complete
2f1a9602903e: Pull complete
ebd6704c1ee1: Pull complete
90badf46365f: Pull complete
959af06d9a0d: Pull complete
7bca9530c0bf: Pull complete
Digest: sha256:bf56b756eb0bf0aa37c9335fb2eca16ca5198c13a29e3f3b267c14e6473df3f3
Status: Downloaded newer image for docker.io/ceph/daemon:tag-stable-3.0-luminous-centos-7
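
The checks in items 3 to 5 above can be sketched as two small shell helpers. This is only an illustration, not part of TripleO: the function names has_default_route and list_nameservers are hypothetical, and both read text on stdin so they can be fed the output of "ip r" or the contents of /etc/resolv.conf.

```shell
#!/bin/sh
# Hypothetical helper (not part of TripleO): succeeds if the routing-table
# text on stdin, in "ip r" format, contains a default route.
has_default_route() {
    grep -Eq '^default[[:space:]]'
}

# Hypothetical helper: prints the nameserver addresses found in
# resolv.conf-style text on stdin, one per line.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}
```

On a node this could be used as "ip r | has_default_route || echo 'no default route'" and "list_nameservers < /etc/resolv.conf". A present default route does not by itself prove the nameservers are reachable, as item 5 shows.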

Tom Barron (tpb)
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Tom Barron (tpb) wrote :

The docker pull commands on the overcloud nodes failed because overcloud nodes in a virthost setup lacked a default route. The missing route turned out to be a temporary problem in tripleo-heat-templates, fixed, I think, by commit ad8446ecc0d1058060b9734a65c19d797255678a.

Tom Barron (tpb) wrote :

https://review.openstack.org/#/c/551402/ fixes this issue in master.

Tom Barron (tpb) wrote :

https://review.openstack.org/#/c/552449/ fixes this issue in stable/queens.

Changed in tripleo:
status: Triaged → Fix Committed
Changed in tripleo:
status: Fix Committed → Fix Released