nova_placement in unhealthy state in containerized overcloud

Bug #1781623 reported by Jose Luis Franco
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Jose Luis Franco
Milestone: (none)

Bug Description

After deploying a containerized overcloud (Queens and master), the nova_placement container appears in an "unhealthy" state (observed in a local installation as well as in the CI jobs):

babc9cbb7fe3 192.168.24.1:8787/tripleoqueens/centos-binary-nova-placement-api:current-tripleo "kolla_start" 24 minutes ago Up 24 minutes (unhealthy) nova_placement 382 kB (virtual 1.21 GB)

Queens job log: http://logs.openstack.org/46/579346/1/check/tripleo-ci-centos-7-containers-multinode/90f0643/logs/subnode-2/var/log/extra/docker/docker_allinfo.log.txt.gz
Master job log: http://logs.openstack.org/64/580464/1/check/tripleo-ci-centos-7-containers-multinode/b19e84d/logs/subnode-2/var/log/extra/docker/containers/nova_placement/docker_info.log.txt.gz

Inside the container, the vhost error log /var/log/httpd/placement_wsgi_error.log shows the following entry repeated many times:

[Wed Jul 11 11:53:14.091902 2018] [autoindex:error] [pid 20] [client 172.16.2.5:58606] AH01276: Cannot serve directory /var/www/cgi-bin/nova/: No matching DirectoryIndex (index.html,index.html.var,index.cgi,index.pl,index.php,index.xhtml) found, and server-generated directory index forbidden by Options directive
[Wed Jul 11 11:53:44.207633 2018] [autoindex:error] [pid 20] [client 172.16.2.5:58954] AH01276: Cannot serve directory /var/www/cgi-bin/nova/: No matching DirectoryIndex (index.html,index.html.var,index.cgi,index.pl,index.php,index.xhtml) found, and server-generated directory index forbidden by Options directive

And the /openstack/healthcheck calls receive a 403 Forbidden reply, as the access log shows:

172.16.2.5 - - [11/Jul/2018:11:53:10 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.16.2.5 - - [11/Jul/2018:11:53:12 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.16.2.5 - - [11/Jul/2018:11:53:14 +0000] "GET / HTTP/1.1" 403 4897 "-" "curl-healthcheck"
172.16.2.5 - - [11/Jul/2018:11:53:14 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.16.2.5 - - [11/Jul/2018:11:53:16 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
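To pick the failing healthcheck requests out of the surrounding OPTIONS noise, something like the following could be used. This is only a rough sketch: it assumes the combined log format shown above and the "curl-healthcheck" user agent seen in this log, and the function name failed_healthchecks is made up for illustration.

```sh
# Read an Apache "combined" access log on stdin and print "<status> <request>"
# for every request sent by the healthcheck user agent that got a 4xx/5xx.
failed_healthchecks() {
  # Splitting on double quotes gives: $2 = request line,
  # $3 = " <status> <bytes> ", $6 = user agent.
  awk -F'"' '$6 == "curl-healthcheck" {
    split($3, f, " ")
    if (f[1] + 0 >= 400) print f[1], $2
  }'
}
```

For example: failed_healthchecks < /var/log/httpd/placement_wsgi_access.log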

The content of the /etc/httpd/conf.d/10-placement_wsgi.conf file is:

<VirtualHost 172.16.2.5:8778>
  ServerName overcloud-controller-0.internalapi.localdomain

  ## Vhost docroot
  DocumentRoot "/var/www/cgi-bin/nova"

  ## Directories, there should at least be a declaration for /var/www/cgi-bin/nova

  <Directory "/var/www/cgi-bin/nova">
    Options Indexes FollowSymLinks MultiViews
    AllowOverride None
    Require all granted
  </Directory>

  ## Logging
  ErrorLog "/var/log/httpd/placement_wsgi_error.log"
  ServerSignature Off
  CustomLog "/var/log/httpd/placement_wsgi_access.log" combined
  SetEnvIf X-Forwarded-Proto https HTTPS=1
  WSGIApplicationGroup %{GLOBAL}
  WSGIDaemonProcess placement-api display-name=placement_wsgi group=nova processes=1 threads=1 user=nova
  WSGIProcessGroup placement-api
  WSGIScriptAlias /placement "/var/www/cgi-bin/nova/nova-placement-api"
</VirtualHost>
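Reading this vhost, only paths under /placement are handed to the WSGI application; a request for / falls through to the DocumentRoot, which has no index file. A simplified sketch of that prefix rule follows. It approximates how the alias is matched rather than reproducing mod_wsgi's actual logic, and the function name under_alias is hypothetical.

```sh
# Return success if request path $1 falls under the alias mount point $2.
# Simplified: matches the mount point itself or anything below "<mount>/".
under_alias() {
  mount=${2%/}   # drop one trailing slash so "/placement/" and "/placement" agree
  case "$1" in
    "$mount"|"$mount"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# under_alias /           /placement   -> fails   (served from DocumentRoot)
# under_alias /placement/ /placement   -> succeeds (served by the WSGI script)
```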

Jose Luis Franco (jfrancoa) wrote :

When sending the same HTTP request done by /openstack/healthcheck we get a 403:

()[root@overcloud-controller-0 /]# curl -v -g -k -q --fail --max-time 10 --user-agent curl-healthceck http://overcloud-controller-0.internalapi.localdomain:8778
* About to connect() to overcloud-controller-0.internalapi.localdomain port 8778 (#0)
* Trying 172.16.2.5...
* Connected to overcloud-controller-0.internalapi.localdomain (172.16.2.5) port 8778 (#0)
> GET / HTTP/1.1
> User-Agent: curl-healthceck
> Host: overcloud-controller-0.internalapi.localdomain:8778
> Accept: */*
>
* The requested URL returned error: 403 Forbidden
* Closing connection 0
curl (http://overcloud-controller-0.internalapi.localdomain:8778/): response: 403, time: 0.006, size: 0
curl: (22) The requested URL returned error: 403 Forbidden

But appending the /placement/ path to the URI, as defined in the WSGIScriptAlias, returns a 200:

()[root@overcloud-controller-0 /]# curl -v -g -k -q --fail --max-time 10 --user-agent curl-healthceck http://overcloud-controller-0.internalapi.localdomain:8778/placement/
* About to connect() to overcloud-controller-0.internalapi.localdomain port 8778 (#0)
* Trying 172.16.2.5...
* Connected to overcloud-controller-0.internalapi.localdomain (172.16.2.5) port 8778 (#0)
> GET /placement/ HTTP/1.1
> User-Agent: curl-healthceck
> Host: overcloud-controller-0.internalapi.localdomain:8778
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 16 Jul 2018 06:44:14 GMT
< Server: Apache
< OpenStack-API-Version: placement 1.0
< vary: OpenStack-API-Version,Accept-Encoding
< x-openstack-request-id: req-90eaa18e-5e4c-47b1-9ab1-932e0e98bb38
< Content-Length: 75
< Content-Type: application/json
<
* Connection #0 to host overcloud-controller-0.internalapi.localdomain left intact
{"versions": [{"min_version": "1.0", "max_version": "1.17", "id": "v1.0"}]}curl (http://overcloud-controller-0.internalapi.localdomain:8778/placement/): response: 200, time: 0.009, size: 75

The error is in the WSGIScriptAlias value defined in puppet-nova.
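One way a healthcheck could avoid depending on a fixed path is to read the alias from the vhost file and build the URL from it. The following is a minimal sketch under that assumption, not the change that was merged; the function names wsgi_mount and healthcheck_url are invented here, and it only looks at the first WSGIScriptAlias in the file.

```sh
# Print the mount point of the first WSGIScriptAlias in an Apache vhost file.
wsgi_mount() {
  awk '$1 == "WSGIScriptAlias" { print $2; exit }' "$1"
}

# Print the URL to probe: http://<host>:<port><mount>/
# Falls back to "/" when the vhost defines no WSGIScriptAlias.
healthcheck_url() {
  vhost_conf=$1 hc_host=$2 hc_port=$3
  hc_mount=$(wsgi_mount "$vhost_conf")
  # Strip one trailing slash from the mount, then always end with a single "/".
  printf 'http://%s:%s%s/\n' "$hc_host" "$hc_port" "${hc_mount%/}"
}

# Possible use with the same curl options as in the transcript above:
#   url=$(healthcheck_url /etc/httpd/conf.d/10-placement_wsgi.conf \
#         overcloud-controller-0.internalapi.localdomain 8778)
#   curl -g -k -q --fail --max-time 10 --user-agent curl-healthcheck "$url"
```

With the vhost shown in the bug description this yields http://overcloud-controller-0.internalapi.localdomain:8778/placement/, the URL that returned 200 above.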

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/582883

Changed in tripleo:
assignee: nobody → Jose Luis Franco (jfrancoa)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/582883
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=f1a1c324fb149d40c960f1eaba650f0e8c71d91e
Submitter: Zuul
Branch: master

commit f1a1c324fb149d40c960f1eaba650f0e8c71d91e
Author: Jose Luis Franco Arza <email address hidden>
Date: Mon Jul 16 09:40:11 2018 +0200

    Take WSGIScriptAlias into account in docker healthcheck.

    Some services running as vhost, as nova_placement make
    use of an alias in which the wsgi service is mapped.
    Currently, the nova_placement healthcheck has hardcoded
    the alias in the curl [0] request, but if this alias
    changes or other service modifies its from / to another,
    then the check will start failing again.

    [0] - https://github.com/openstack/tripleo-common/blob/master/healthcheck/nova-placement#L6

    Change-Id: I5a325be424740e80eafd6b4abd8eb4b3111740aa
    Closes-Bug: #1781623

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 9.2.0

This issue was fixed in the openstack/tripleo-common 9.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/586826

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/queens)

Reviewed: https://review.openstack.org/586826
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=7f9606ff47f03162e5d7305aaca2af5d1faecd87
Submitter: Zuul
Branch: stable/queens

commit 7f9606ff47f03162e5d7305aaca2af5d1faecd87
Author: Jose Luis Franco Arza <email address hidden>
Date: Mon Jul 16 09:40:11 2018 +0200

    Take WSGIScriptAlias into account in docker healthcheck.

    Some services running as vhost, as nova_placement make
    use of an alias in which the wsgi service is mapped.
    Currently, the nova_placement healthcheck has hardcoded
    the alias in the curl [0] request, but if this alias
    changes or other service modifies its from / to another,
    then the check will start failing again.

    [0] - https://github.com/openstack/tripleo-common/blob/master/healthcheck/nova-placement#L6

    Change-Id: I5a325be424740e80eafd6b4abd8eb4b3111740aa
    Closes-Bug: #1781623
    (cherry picked from commit f1a1c324fb149d40c960f1eaba650f0e8c71d91e)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 8.6.5

This issue was fixed in the openstack/tripleo-common 8.6.5 release.
