OVB featureset039 all-TLS job fails in master

Bug #1843422 reported by Sagi (Sergey) Shnaidman on 2019-09-10
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Kevin Carter
tags: added: alert
tags: added: promotion-blocker
wes hayutin (weshayutin) wrote :

How is it possible that neutron fails to deploy on the controllers and the deployment still passes?

2019-09-11 16:07:45.051 7 ERROR neutron ConfigFileValueError: Value for option bind_host from LocationInfo(location=<Locations.user: (4, True)>, detail='/etc/neutron/conf.d/neutron-server/networking-sfc.conf') is not valid: is not a valid host address

http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/c828452/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz

http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/c828452/job-output.txt.gz

2019-09-11 16:12:23.713814 | primary | TASK [did the deployment pass or fail?] ****************************************
2019-09-11 16:12:23.731844 | primary | Wednesday 11 September 2019 16:12:23 +0000 (0:00:00.070) 1:26:07.932 ***
2019-09-11 16:12:23.752518 | primary | ok: [localhost] => {
2019-09-11 16:12:23.752644 | primary | "failed_when_result": false,
2019-09-11 16:12:23.752687 | primary | "overcloud_deploy_result": "passed"
2019-09-11 16:12:23.752700 | primary | }

This is not a tempest fault; it means that not all services are working after the deployment.
For example, these are the failing tempest requests:

openstack floating ip list -c ID -f value
2019-09-11 04:39:02 | HttpException: 503: Server Error for url: https://overcloud.ooo.test:13696/v2.0/floatingips, No server is available to handle this request.: 503 Service Unavailable
2019-09-11 04:39:02 | ++ openstack router list -c ID -f value
2019-09-11 04:39:04 | HttpException: 503: Server Error for url: https://overcloud.ooo.test:13696/v2.0/routers, No server is available to handle this request.: 503 Service Unavailable
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/undercloud/home/zuul/tempest.log.txt.gz

The Neutron API log on the undercloud is full of tracebacks:
MissingAuthPlugin: An auth plugin is required to determine endpoint URL
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/undercloud/var/log/extra/podman/containers/neutron_api/stdout.log.txt.gz
But it may be a red herring, as in bug: https://bugs.launchpad.net/tripleo/+bug/1818943

On the overcloud (where the tempest request is failing) the neutron container is restarting in a loop:

2019-09-11 04:37:23.676 7 ERROR neutron ConfigFileValueError: Value for option bind_host from LocationInfo(location=<Locations.user: (4, True)>, detail='/etc/neutron/conf.d/neutron-server/networking-sfc.conf') is not valid: is not a valid host address
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz

Error response from daemon: Container ef1ca5227d8ec83c1fb2b5a40f789925f020decb88862c1f54914d7d19113c1b is restarting, wait until the container is running
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/extra/docker/containers/neutron_api/docker_info.log.txt.gz

Actually, a few more containers are restarting, not only neutron: http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/extra/failed_containers.log.txt.gz

839928a2a601 192.168.24.1:8787/tripleomaster/centos-binary-glance-api:38cd8d3cd712e205722bb77639a5a1b08d167e4a_f2e9ca6b-updated-20190911020534 "dumb-init --singl..." 8 minutes ago Restarting (5) About a minute ago glance_api
81d708a19e69 192.168.24.1:8787/tripleomaster/centos-binary-swift-proxy-server:38cd8d3cd712e205722bb77639a5a1b08d167e4a_f2e9ca6b-updated-20190911020534 "dumb-init --singl..." 8 minutes ago Restarting (1) About a minute ago swift_proxy
ef1ca5227d8e 192.168.24.1:8787/tripleomaster/centos-binary-neutron-server-ovn:38cd8d3cd712e205722bb77639a5a1b08d167e4a_f2e9ca6b-updated-20190911020534 "dumb-init --singl..." 8 minutes ago Restarting (1) About a minute ago
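
A listing like the one above can be scanned for restart loops with a few lines of Python. This is only a rough sketch: the function name restarting_services is made up for illustration, the sample lines are abbreviated (the image tag is replaced by a <tag> placeholder), and the service name is taken from the image reference because the trailing name column may be missing.

```python
import re

# Matches the "Restarting (<exit code>)" status printed in the listing.
RESTARTING = re.compile(r"Restarting \((\d+)\)")
# Matches the service part of an image reference such as
# ".../centos-binary-glance-api:<tag>" (naming pattern assumed from the log).
IMAGE = re.compile(r"/centos-binary-([\w-]+):")


def restarting_services(ps_output):
    """Return (service, exit_code) for every line that reports a restart loop."""
    found = []
    for line in ps_output.splitlines():
        status = RESTARTING.search(line)
        image = IMAGE.search(line)
        if status and image:
            found.append((image.group(1), int(status.group(1))))
    return found


# Abbreviated copies of the three lines in the listing; <tag> is a placeholder.
sample = """\
839928a2a601 192.168.24.1:8787/tripleomaster/centos-binary-glance-api:<tag> "dumb-init --singl..." 8 minutes ago Restarting (5) About a minute ago glance_api
81d708a19e69 192.168.24.1:8787/tripleomaster/centos-binary-swift-proxy-server:<tag> "dumb-init --singl..." 8 minutes ago Restarting (1) About a minute ago swift_proxy
ef1ca5227d8e 192.168.24.1:8787/tripleomaster/centos-binary-neutron-server-ovn:<tag> "dumb-init --singl..." 8 minutes ago Restarting (1) About a minute ago
"""

for service, exit_code in restarting_services(sample):
    print(service, exit_code)
```

A check of this kind, run right after the deployment step, would flag the job as failed even when the deploy command itself returns success.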

Glance API fails with:
ERROR: Value for option bind_host from LocationInfo(location=<Locations.user: (4, True)>, detail='/etc/glance/glance-image-import.conf') is not valid: is not a valid host address
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/extra/docker/containers/glance_api/stdout.log.txt.gz

Swift proxy server fails with:
No handlers could be found for logger "keystonemiddleware._common.config"
Traceback (most recent call last):
  File "/usr/bin/swift-proxy-server", line 23, in <module>
    sys.exit(run_wsgi(conf_file, 'proxy-server', **options))
  File "/usr/lib/python2.7/site-packages/swift/common/wsgi.py", line 1086, in run_wsgi
    error_msg = strategy.do_bind_ports()
  File "/usr/lib/python2.7/site-packages/swift/common/wsgi.py", line 665, in do_bind_ports
    self.sock = get_socket(self.conf)
  File "/usr/lib/python2.7/site-packages/swift/common/wsgi.py", line 163, in get_socket
    bind_addr[0], bind_addr[1], socket.AF_UNSPEC, socket.SOCK_STREAM)
  File "/usr/lib/python2.7/site-packages/eventlet/support/greendns.py", line 527, in getaddrinfo
    socktype, proto, aiflags)
socket.gaierror: [Errno -2] Name or service not known
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/extra/docker/containers/swift_proxy/stdout.log.txt.gz

Nathan Kinder (nkinder) wrote :

It looks like this patch introduced the failure:

https://review.opendev.org/#/c/674060/

The issue is that the older kernel-baremetal-puppet.yaml implementation was setting localhost_address, which was introduced in this patch that merged back in July:

https://review.opendev.org/#/c/668957

The new kernel-baremetal-ansible.yaml that is used now doesn't set localhost_address at all, but services such as neutron and glance require it. This causes those services to fail as seen in these logs:

https://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/c828452/logs/overcloud-controller-0/var/log/extra/errors.txt.txt.gz
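
To illustrate why an unset value breaks bind_host, here is a small sketch of a host-address check. It assumes the option was rendered as an empty string, which the blank in the "is not valid:  is not a valid host address" message suggests but the logs do not prove. The function is_valid_host_address is hypothetical and is not the validation code the services actually use.

```python
import ipaddress
import re

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending with "-".
HOSTNAME_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_host_address(value):
    """Accept an IPv4/IPv6 literal or a syntactically valid hostname."""
    if not value:
        # An empty or missing value can never be bound to.
        return False
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        pass  # Not an IP literal; fall through to the hostname check.
    labels = value.rstrip(".").split(".")
    return all(HOSTNAME_LABEL.match(label) for label in labels)


for candidate in ("", "127.0.0.1", "::1", "localhost"):
    print(repr(candidate), is_valid_host_address(candidate))
```

Under that assumption, any service that templates its bind_host from the missing key would be rejected at startup and restarted by the container runtime, which matches the restart loops seen above.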

Changed in tripleo:
assignee: nobody → Kevin Carter (kevin-carter)
Changed in tripleo:
status: Triaged → In Progress

Reviewed: https://review.opendev.org/681933
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=63380569382855770e3c284621031715c98542fa
Submitter: Zuul
Branch: master

commit 63380569382855770e3c284621031715c98542fa
Author: Kevin Carter <email address hidden>
Date: Thu Sep 12 16:15:55 2019 -0500

    Add local_address to the default hieradata on all hosts

    This change will ensure a local_address is always defined within our
    generated hieradata. This change imports existing logic into our config
    to ensure that the value of the local_address matches our expected
    values. We ensure compatibility with legacy installs by inspecting if a
    host has IPv6 enabled on the loopback device. In the event that IPv6 is
    enabled, the value of local_address will be set to "localhost" otherwise
    it will be defined as "127.0.0.1".

    Closes-Bug: 1843422
    Change-Id: I20e69315bacdded4bc2d5b47e18609f130f8abc5
    Signed-off-by: Kevin Carter <email address hidden>
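
The selection rule described in the commit message can be sketched roughly as follows. This is not the code from the tripleo-ansible change. Both function names are hypothetical, and reading /proc/net/if_inet6 is an assumed way of detecting IPv6 on the loopback device on Linux.

```python
def loopback_has_ipv6(if_inet6_text, device="lo"):
    """Return True if the given /proc/net/if_inet6 content lists `device`.

    Each line has six whitespace-separated fields; the last one is the
    interface name.
    """
    for line in if_inet6_text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[5] == device:
            return True
    return False


def choose_local_address(ipv6_on_loopback):
    """Apply the rule from the commit message."""
    return "localhost" if ipv6_on_loopback else "127.0.0.1"


# Example content with ::1 configured on the loopback device.
sample = "00000000000000000000000000000001 01 80 10 80       lo\n"

print(choose_local_address(loopback_has_ipv6(sample)))
print(choose_local_address(loopback_has_ipv6("")))
```

On a real host the text would come from reading /proc/net/if_inet6, which may be absent when IPv6 is disabled entirely; in that case an empty string gives the 127.0.0.1 fallback.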

Changed in tripleo:
status: In Progress → Fix Released