OVB featureset039 all-TLS job fails in master

Bug #1843422 reported by Sagi (Sergey) Shnaidman
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Kevin Carter
Milestone: (none)
tags: added: alert
tags: added: promotion-blocker
wes hayutin (weshayutin) wrote :

How is it possible that neutron fails to deploy on the controllers and the deployment still passes?

2019-09-11 16:07:45.051 7 ERROR neutron ConfigFileValueError: Value for option bind_host from LocationInfo(location=<Locations.user: (4, True)>, detail='/etc/neutron/conf.d/neutron-server/networking-sfc.conf') is not valid: is not a valid host address

http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/c828452/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz

http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/c828452/job-output.txt.gz

2019-09-11 16:12:23.713814 | primary | TASK [did the deployment pass or fail?] ****************************************
2019-09-11 16:12:23.731844 | primary | Wednesday 11 September 2019 16:12:23 +0000 (0:00:00.070) 1:26:07.932 ***
2019-09-11 16:12:23.752518 | primary | ok: [localhost] => {
2019-09-11 16:12:23.752644 | primary | "failed_when_result": false,
2019-09-11 16:12:23.752687 | primary | "overcloud_deploy_result": "passed"
2019-09-11 16:12:23.752700 | primary | }

Sagi (Sergey) Shnaidman (sshnaidm) wrote :

This is not tempest's fault; it means that not all services work after the deployment.
For example, these tempest requests fail:

openstack floating ip list -c ID -f value
2019-09-11 04:39:02 | HttpException: 503: Server Error for url: https://overcloud.ooo.test:13696/v2.0/floatingips, No server is available to handle this request.: 503 Service Unavailable
2019-09-11 04:39:02 | ++ openstack router list -c ID -f value
2019-09-11 04:39:04 | HttpException: 503: Server Error for url: https://overcloud.ooo.test:13696/v2.0/routers, No server is available to handle this request.: 503 Service Unavailable
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/undercloud/home/zuul/tempest.log.txt.gz

The Neutron API log on the undercloud is full of tracebacks:
MissingAuthPlugin: An auth plugin is required to determine endpoint URL
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/undercloud/var/log/extra/podman/containers/neutron_api/stdout.log.txt.gz
But that may be a red herring, as in bug: https://bugs.launchpad.net/tripleo/+bug/1818943

On the overcloud (where the tempest requests are failing), the neutron container is restarting in a loop:

2019-09-11 04:37:23.676 7 ERROR neutron ConfigFileValueError: Value for option bind_host from LocationInfo(location=<Locations.user: (4, True)>, detail='/etc/neutron/conf.d/neutron-server/networking-sfc.conf') is not valid: is not a valid host address
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/containers/neutron/server.log.txt.gz

Error response from daemon: Container ef1ca5227d8ec83c1fb2b5a40f789925f020decb88862c1f54914d7d19113c1b is restarting, wait until the container is running
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/extra/docker/containers/neutron_api/docker_info.log.txt.gz

Sagi (Sergey) Shnaidman (sshnaidm) wrote :

Actually, a few more containers are restarting, not only neutron: http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/extra/failed_containers.log.txt.gz

839928a2a601 192.168.24.1:8787/tripleomaster/centos-binary-glance-api:38cd8d3cd712e205722bb77639a5a1b08d167e4a_f2e9ca6b-updated-20190911020534 "dumb-init --singl..." 8 minutes ago Restarting (5) About a minute ago glance_api
81d708a19e69 192.168.24.1:8787/tripleomaster/centos-binary-swift-proxy-server:38cd8d3cd712e205722bb77639a5a1b08d167e4a_f2e9ca6b-updated-20190911020534 "dumb-init --singl..." 8 minutes ago Restarting (1) About a minute ago swift_proxy
ef1ca5227d8e 192.168.24.1:8787/tripleomaster/centos-binary-neutron-server-ovn:38cd8d3cd712e205722bb77639a5a1b08d167e4a_f2e9ca6b-updated-20190911020534 "dumb-init --singl..." 8 minutes ago Restarting (1) About a minute ago
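
Since the deployment was reported as passed while these containers were crash-looping, a post-deploy check over a listing like the one above could flag the problem early. The following is only a minimal sketch: `restarting_containers` is a hypothetical helper, not part of tripleo-ci, and it assumes rows shaped like `docker ps` output, where the number in parentheses after "Restarting" is the last exit code.

```python
import re
from typing import List, Tuple

# Status column of a `docker ps`-style row, e.g. "Restarting (5) About a minute ago".
RESTARTING = re.compile(r"\bRestarting \((\d+)\)")


def restarting_containers(listing: str) -> List[Tuple[str, int]]:
    """Return (container_id, exit_code) for every row whose status is Restarting."""
    found = []
    for line in listing.splitlines():
        match = RESTARTING.search(line)
        if match:
            # The container ID is the first whitespace-separated field of the row.
            container_id = line.split()[0]
            found.append((container_id, int(match.group(1))))
    return found
```

A job could fail the run whenever this list is non-empty, rather than relying only on the deploy command's result.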

Glance API fails with:
ERROR: Value for option bind_host from LocationInfo(location=<Locations.user: (4, True)>, detail='/etc/glance/glance-image-import.conf') is not valid: is not a valid host address
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/extra/docker/containers/glance_api/stdout.log.txt.gz
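
Both the neutron and glance errors name no value before "is not a valid host address", which is consistent with bind_host being rendered empty. The sketch below approximates that kind of check with the Python standard library only; it is not oslo.config's actual implementation, and `is_valid_host_address` is a hypothetical name used for illustration.

```python
import ipaddress
import re

# One hostname label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_host_address(value: str) -> bool:
    """Approximate check: accept an IPv4/IPv6 literal or a plausible hostname."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        pass  # Not an IP literal; fall through to the hostname check.
    if not value or len(value) > 253:
        return False
    return all(_LABEL.match(label) for label in value.rstrip(".").split("."))
```

Under this approximation, "127.0.0.1", "::1", and "localhost" are accepted, while an empty string is rejected, which matches the symptom in the logs.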

Swift proxy server fails with:
No handlers could be found for logger "keystonemiddleware._common.config"
Traceback (most recent call last):
  File "/usr/bin/swift-proxy-server", line 23, in <module>
    sys.exit(run_wsgi(conf_file, 'proxy-server', **options))
  File "/usr/lib/python2.7/site-packages/swift/common/wsgi.py", line 1086, in run_wsgi
    error_msg = strategy.do_bind_ports()
  File "/usr/lib/python2.7/site-packages/swift/common/wsgi.py", line 665, in do_bind_ports
    self.sock = get_socket(self.conf)
  File "/usr/lib/python2.7/site-packages/swift/common/wsgi.py", line 163, in get_socket
    bind_addr[0], bind_addr[1], socket.AF_UNSPEC, socket.SOCK_STREAM)
  File "/usr/lib/python2.7/site-packages/eventlet/support/greendns.py", line 527, in getaddrinfo
    socktype, proto, aiflags)
socket.gaierror: [Errno -2] Name or service not known
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/ed3dbd8/logs/overcloud-controller-0/var/log/extra/docker/containers/swift_proxy/stdout.log.txt.gz

Nathan Kinder (nkinder) wrote :

It looks like this patch introduced the failure:

https://review.opendev.org/#/c/674060/

The issue is that the older kernel-baremetal-puppet.yaml implementation was setting localhost_address, which was introduced in this patch that merged back in July:

https://review.opendev.org/#/c/668957

The new kernel-baremetal-ansible.yaml that is used now doesn't set localhost_address at all, but services such as neutron and glance require it. This causes those services to fail as seen in these logs:

https://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039-master/c828452/logs/overcloud-controller-0/var/log/extra/errors.txt.txt.gz

Changed in tripleo:
assignee: nobody → Kevin Carter (kevin-carter)
Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/681933
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=63380569382855770e3c284621031715c98542fa
Submitter: Zuul
Branch: master

commit 63380569382855770e3c284621031715c98542fa
Author: Kevin Carter <email address hidden>
Date: Thu Sep 12 16:15:55 2019 -0500

    Add local_address to the default hieradata on all hosts

    This change will ensure a local_address is always defined within our
    generated hieradata. This change imports existing logic into our config
    to ensure that the value of the local_address matches our expected
    values. We ensure compatibility with legacy installs by inspecting if a
    host has IPv6 enabled on the loopback device. In the event that IPv6 is
    enabled, the value of local_address will be set to "localhost" otherwise
    it will defined as "127.0.0.1".

    Closes-Bug: 1843422
    Change-Id: I20e69315bacdded4bc2d5b47e18609f130f8abc5
    Signed-off-by: Kevin Carter <email address hidden>
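
The selection rule described in the commit message can be illustrated in Python. This is a sketch under stated assumptions, not the tripleo-ansible change itself: the thread does not show how the fix detects IPv6 on the loopback device, so reading the Linux `disable_ipv6` sysctl file is an assumption, and both function names are hypothetical.

```python
from pathlib import Path

# Assumption: use the per-interface Linux sysctl; "1" means IPv6 is disabled on lo.
LO_DISABLE_IPV6 = Path("/proc/sys/net/ipv6/conf/lo/disable_ipv6")


def loopback_ipv6_enabled(flag_path: Path = LO_DISABLE_IPV6) -> bool:
    """True when the flag file is readable and contains "0" (IPv6 not disabled)."""
    try:
        return flag_path.read_text().strip() == "0"
    except OSError:
        # Flag file missing or unreadable: treat IPv6 as unavailable.
        return False


def pick_local_address(ipv6_enabled: bool) -> str:
    """Apply the rule from the commit message: "localhost" with IPv6, else 127.0.0.1."""
    return "localhost" if ipv6_enabled else "127.0.0.1"
```

Calling `pick_local_address(loopback_ipv6_enabled())` on a host would yield the value this rule assigns there.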

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 0.3.0

This issue was fixed in the openstack/tripleo-ansible 0.3.0 release.
