"503 Service Unavailable: No server is available to handle this request." fails on standalone-on-multinode-ipa and fs039

Bug #1936776 reported by Bhagyashri Shewale
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Rabi Mishra

Bug Description

2021-07-19 02:19:43.395014 | primary | TASK [os_tempest : Ensure private network exists] ******************************
2021-07-19 02:19:43.398202 | primary | Monday 19 July 2021 02:19:43 +0000 (0:00:00.056) 0:35:41.057 ***********
2021-07-19 02:19:45.333047 | primary | FAILED - RETRYING: Ensure private network exists (5 retries left).
2021-07-19 02:19:57.598356 | primary | FAILED - RETRYING: Ensure private network exists (4 retries left).
2021-07-19 02:20:09.556963 | primary | FAILED - RETRYING: Ensure private network exists (3 retries left).
2021-07-19 02:20:21.591773 | primary | FAILED - RETRYING: Ensure private network exists (2 retries left).
2021-07-19 02:20:34.122803 | primary | FAILED - RETRYING: Ensure private network exists (1 retries left).
2021-07-19 02:20:46.335296 | primary | fatal: [undercloud -> 127.0.0.2]: FAILED! => {
2021-07-19 02:20:46.335464 | primary | "attempts": 5,
2021-07-19 02:20:46.335486 | primary | "changed": false,
2021-07-19 02:20:46.335492 | primary | "extra_data": {
2021-07-19 02:20:46.335497 | primary | "data": null,
2021-07-19 02:20:46.335502 | primary | "details": "503 Service Unavailable: No server is available to handle this request.",
2021-07-19 02:20:46.335528 | primary | "response": "<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n"
2021-07-19 02:20:46.335544 | primary | }
2021-07-19 02:20:46.335551 | primary | }
2021-07-19 02:20:46.335556 | primary |
2021-07-19 02:20:46.335561 | primary | MSG:
2021-07-19 02:20:46.335565 | primary |
2021-07-19 02:20:46.335577 | primary | HttpException: 503: Server Error for url: https://overcloud.ctlplane.ooo.test:9696/v2.0/networks?tenant_id=c7443545b5134be4b883feaad1852e0b, 503 Service Unavailable: No server is available to handle this request.
2021-07-19 02:20:46.335591 | primary |
2021-07-19 02:20:46.335604 | primary | NO MORE HOSTS LEFT *************************************************************
2021-07-19 02:20:46.342960 | primary |
2021-07-19 02:20:46.342997 | primary | PLAY RECAP *********************************************************************
2021-07-19 02:20:46.343011 | primary | undercloud : ok=110 changed=44 unreachable=0 failed=1 skipped=158 rescued=0 ignored=0

Affected jobs:

periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master
periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master

# links:
[1]: https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master&pipeline=openstack-periodic-integration-main
[2]: https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master/da17a8a/job-output.txt
[3]: https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master/a85f3e8/job-output.txt
[4]: https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master&pipeline=openstack-periodic-integration-main
[5]: https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master/2a6aff4/logs/undercloud/var/log/extra/errors.txt.gz

Revision history for this message
Bhagyashri Shewale (bhagyashri-shewale) wrote :

Hi all,

There is a related bug [1], but its fix [2] was merged on 14 July 2021, and we have been seeing this issue since 17 July 2021.

[1]: https://bugs.launchpad.net/tripleo/+bug/1935974
[2]: https://review.rdoproject.org/r/c/openstack/neutron-distgit/+/34515

Revision history for this message
Carlos Goncalves (cgoncalves) wrote :

This is the error I am seeing in my nightly CI builds since July 18:

(overcloud) [stack@undercloud-0 ~]$ openstack network list
HttpException: 503: Server Error for url: https://overcloud.main.bgp.ftw:13696/v2.0/networks, No server is available to handle this request.: 503 Service Unavailable

Container neutron_server_tls_proxy is stopped with error:

[root@ctrl-1-0 heat-admin]# podman logs neutron_server_tls_proxy
[...]
+ . kolla_extend_start
+ echo 'Running command: '\''/usr/sbin/httpd -DFOREGROUND'\'''
+ exec /usr/sbin/httpd -DFOREGROUND
httpd: Syntax error on line 40 of /etc/httpd/conf/httpd.conf: Syntax error on line 1 of /etc/httpd/conf.modules.d/wsgi.load: Cannot load modules/mod_wsgi_python3.so into server: /etc/httpd/modules/mod_wsgi_python3.so: cannot open shared object file: No such file or directory

This seems related to https://review.opendev.org/q/topic:neutron_mod_wsgi. One patch in that topic is still unmerged, and it does not appear to fix the issue, as job tripleo-ci-centos-8-standalone-on-multinode-ipa is failing on it.

(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo
python3-tripleo-repos-0.1.1-0.20210714130758.06f41a6.el8.noarch
openstack-tripleo-validations-15.0.1-0.20210716165219.93ed780.el8.noarch
openstack-tripleo-common-containers-16.1.1-0.20210717110252.748ace6.el8.noarch
python3-tripleoclient-17.0.1-0.20210716164418.efdefd9.el8.noarch
puppet-tripleo-15.0.1-0.20210717102753.a2d8079.el8.noarch
ansible-tripleo-ipsec-11.0.1-0.20210304160420.b5559c8.el8.noarch
python3-tripleo-common-16.1.1-0.20210717110252.748ace6.el8.noarch
openstack-tripleo-heat-templates-15.0.1-0.20210715161752.5ba7f08.el8.noarch
openstack-tripleo-common-16.1.1-0.20210717110252.748ace6.el8.noarch
ansible-role-tripleo-modify-image-1.2.3-0.20210522035422.b304c89.el8.noarch
tripleo-ansible-4.0.1-0.20210715051321.a4cca04.el8.noarch
ansible-tripleo-ipa-0.2.2-0.20210422191945.9159108.el8.noarch

Revision history for this message
Grzegorz Grasza (xek) wrote :

In periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master I see an issue with binding to port 9696:

2021-07-19T18:25:18.927633554+00:00 stderr F + echo 'Running command: '\''/usr/sbin/httpd -DFOREGROUND'\'''
2021-07-19T18:25:18.927658965+00:00 stdout F Running command: '/usr/sbin/httpd -DFOREGROUND'
2021-07-19T18:25:18.927697226+00:00 stderr F + exec /usr/sbin/httpd -DFOREGROUND
2021-07-19T18:25:19.096930981+00:00 stderr F (98)Address already in use: AH00072: make_sock: could not bind to address [::]:9696
2021-07-19T18:25:19.096930981+00:00 stderr F (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:9696
2021-07-19T18:25:19.096930981+00:00 stderr F no listening sockets available, shutting down
2021-07-19T18:25:19.096930981+00:00 stderr F AH00015: Unable to open logs

Revision history for this message
Grzegorz Grasza (xek) wrote :

This is probably a conflict with haproxy, which contains the following configuration:

listen neutron
  bind 192.168.24.210:13696 transparent ssl crt /etc/pki/tls/private/overcloud_endpoint.pem
  bind 192.168.24.210:9696 transparent ssl crt /etc/pki/tls/certs/haproxy/overcloud-haproxy-ctlplane.pem
  mode http
  balance leastconn
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk GET /healthcheck
  option httplog
  redirect scheme https code 301 if { hdr(host) -i 192.168.24.210 } !{ ssl_fc }
  rsprep ^Location:\ http://(.*) Location:\ https://\1
  server standalone-0.ctlplane.ooo.test 192.168.24.1:9696 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost standalone-0.ctlplane.ooo.test
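
To make the suspected overlap easier to see, here is a minimal Python sketch that lists which ports appear both on a haproxy `bind` line and on a `server` line. It only understands the simple IPv4 `ip:port` forms shown in the config above; the names `ENDPOINT`, `endpoints` and `ports_on_both_sides` are made up for this illustration and are not part of haproxy or TripleO.

```python
import re

# Matches "bind <ip>:<port> ..." and "server <name> <ip>:<port> ..." lines.
# Assumption: plain IPv4 endpoints only, as in the config quoted above.
ENDPOINT = re.compile(
    r"^\s*(?P<kind>bind|server)\s+(?:\S+\s+)?"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)"
)


def endpoints(config_text):
    """Return a list of (kind, ip, port) for every bind/server line found."""
    found = []
    for line in config_text.splitlines():
        match = ENDPOINT.match(line)
        if match:
            found.append((match["kind"], match["ip"], int(match["port"])))
    return found


def ports_on_both_sides(config_text):
    """Return the sorted ports used by both a frontend bind and a backend server."""
    eps = endpoints(config_text)
    bind_ports = {port for kind, _, port in eps if kind == "bind"}
    server_ports = {port for kind, _, port in eps if kind == "server"}
    return sorted(bind_ports & server_ports)
```

For the config above this reports port 9696: haproxy binds it on 192.168.24.210 and forwards to 192.168.24.1. A backend that tries to listen on every address for that port, rather than on 192.168.24.1 only, would also be asking for 192.168.24.210:9696.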

Revision history for this message
Grzegorz Grasza (xek) wrote :

The neutron_server_tls_proxy container doesn't fail in https://review.opendev.org/c/openstack/tripleo-heat-templates/+/800422

I think the difference is in how neutron/etc/httpd/conf.d/10-neutron_wsgi.conf is configured.

The current version configures it as <VirtualHost *:9696>, while the version in the above patch sets neutron::wsgi::apache::bind_host and ends up with <VirtualHost 192.168.24.1:9696>, the same binding that is configured in neutron/etc/httpd/conf.d/25-neutron-api-proxy.conf.

Apache merges the configurations when both use the same address and port, but produces a port conflict when they differ.

failing configuration:

https://review.rdoproject.org/zuul/build/5cc8d0ba48174cb2963d9129b64e207b/log/logs/undercloud/var/lib/config-data/puppet-generated/neutron/etc/httpd/conf.d/10-neutron_wsgi.conf.txt.gz

https://zuul.opendev.org/t/openstack/build/ad8c6f17e84b4b958ea8fddb7f53b5f9/log/logs/undercloud/var/lib/config-data/puppet-generated/neutron/etc/httpd/conf.d/25-neutron-api-proxy.conf

not failing:

https://zuul.opendev.org/t/openstack/build/ad8c6f17e84b4b958ea8fddb7f53b5f9/log/logs/undercloud/var/lib/config-data/puppet-generated/neutron/etc/httpd/conf.d/10-neutron_wsgi.conf

https://review.rdoproject.org/zuul/build/5cc8d0ba48174cb2963d9129b64e207b/log/logs/undercloud/var/lib/config-data/puppet-generated/neutron/etc/httpd/conf.d/25-neutron-api-proxy.conf.txt.gz
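
The wildcard-versus-specific-address conflict can be reproduced outside Apache. The following Python sketch is only an illustration of general socket behaviour on Linux and does not identify which process held port 9696 in these jobs. It binds a listener on one specific address, then tries to bind the wildcard address on the same port; the function name is hypothetical and a kernel-chosen free port stands in for 9696.

```python
import errno
import socket


def try_wildcard_bind_after_specific(host="127.0.0.1"):
    """Bind host:<port>, then try 0.0.0.0:<port>.

    Returns the errno raised by the second bind, or 0 if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the kernel pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # Wildcard bind on the same port, like <VirtualHost *:PORT>
            second.bind(("0.0.0.0", port))
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = try_wildcard_bind_after_specific()
    print(code, errno.errorcode.get(code, "OK"))
```

The second bind is expected to fail with EADDRINUSE, which is the error httpd reports as "Address already in use" in the logs above.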

Revision history for this message
Michele Baldessari (michele) wrote :

So the error:
+ exec /usr/sbin/httpd -DFOREGROUND
httpd: Syntax error on line 40 of /etc/httpd/conf/httpd.conf: Syntax error on line 1 of /etc/httpd/conf.modules.d/wsgi.load: Cannot load modules/mod_wsgi_python3.so into server: /etc/httpd/modules/mod_wsgi_python3.so: cannot open shared object file: No such file or directory

is because the quay.io/tripleomaster containers are three days old and do not have review 800421 ("Add python3-mod-wsgi to neutron-server image") applied. Once the images are fixed to include it, httpd fails as xek describes, with these errors:
2021-07-20T15:18:43.838144481+00:00 stderr F (98)Address already in use: AH00072: make_sock: could not bind to address [::]:9696
2021-07-20T15:18:43.838144481+00:00 stderr F (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:9696
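
When checking many job logs for this signature, the bind failures can be pulled out mechanically. This is a small Python sketch, assuming the log lines have the same shape as the ones quoted in this bug; `BIND_ERROR` and `find_bind_errors` are names invented for the example.

```python
import re

# Matches lines such as:
#   <timestamp> stderr F (98)Address already in use: AH00072: make_sock:
#   could not bind to address 0.0.0.0:9696
BIND_ERROR = re.compile(
    r"\((?P<errno>\d+)\)(?P<reason>[^:]+): "
    r"(?P<code>AH\d{5}): make_sock: could not bind to address "
    r"(?P<address>\S+):(?P<port>\d+)\s*$"
)


def find_bind_errors(lines):
    """Return (errno, apache_code, address, port) for each bind failure line."""
    results = []
    for line in lines:
        match = BIND_ERROR.search(line)
        if match:
            results.append(
                (
                    int(match["errno"]),
                    match["code"],
                    match["address"],
                    int(match["port"]),
                )
            )
    return results
```

Lines that do not describe a bind failure, such as "no listening sockets available, shutting down", are ignored.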

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)
Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
Rabi Mishra (rabi) wrote :

The Wallaby fs002 error in comment #8 is a different issue.

Changed in tripleo:
assignee: nobody → Rabi Mishra (rabi)
Changed in tripleo:
milestone: xena-2 → xena-3
Revision history for this message
Ronelle Landy (rlandy) wrote :

Testing https://review.opendev.org/c/openstack/puppet-tripleo/+/801557 with tripleo-ci-centos-8-standalone-on-multinode-ipa

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.opendev.org/c/openstack/puppet-tripleo/+/801557
Committed: https://opendev.org/openstack/puppet-tripleo/commit/fb9ba4b89cc1ec44c61bb9afbc5a80f25429c039
Submitter: "Zuul (22348)"
Branch: master

commit fb9ba4b89cc1ec44c61bb9afbc5a80f25429c039
Author: ramishra <email address hidden>
Date: Wed Jul 21 08:42:04 2021 +0530

    Don't generate 10-neutron_wsgi.conf with internal tls

    When internal TLS is enabled we use a proxy in front of
    neutron server. Config generated in change
    I302558e718ce35c4d632137c5efa08f502939b40 conflicts with
    the one generated for tls_proxy. Till we convert neutron_api
    to be deployed with httpd, let's generate the wsgi config
    only when enable_internal_tls is false.

    Closes-Bug: #1936776
    Change-Id: I2901ea548332a043a8ffeb268f3a0ccbca265377

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 16.0.0

This issue was fixed in the openstack/puppet-tripleo 16.0.0 release.
