tripleo-ci: periodic nonha job failing to promote due to SSL

Bug #1613088 reported by James Slagle
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Emilien Macchi

Bug Description

The nonha job is failing in the periodic test due to an SSL error triggered by ironic-python-agent during instance deployment. I was unable to capture the log, but it's a CERTIFICATE_VERIFY_FAILED error.

I believe it could be due to this patch:
https://review.openstack.org/#/c/332774/

which switched to using certmonger to auto-generate certs. That actually merged when we were not testing ssl in the check-tripleo queue. Now that we've turned the nonha job back on in the check queue and for the periodic job, we're hitting the error.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → newton-3
OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (master)

Fix proposed to branch: master
Review: https://review.openstack.org/355261

Changed in tripleo:
assignee: nobody → James Slagle (james-slagle)
status: Triaged → In Progress
James Slagle (james-slagle) wrote :

I've proposed a revert of the patch here:
https://review.openstack.org/#/c/355261/

And a test of a periodic job that depends on that patch:
https://review.openstack.org/#/c/346949/

If that passes, then we'll know it's the certmonger patch causing the issue.

James Slagle (james-slagle) wrote :
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/355484

James Slagle (james-slagle) wrote :

Comparing my 2 environments, current-tripleo (working) and consistent (broken), the difference is how the ipxe configuration is created.

In the working env, http is used for the kernel parameter ipa-api-url. In the broken env, ssl is used.

So, something changed in how this value gets generated in the last 2 weeks. I haven't been able to identify the patch that caused this switch (could be ironic/keystone/instack-undercloud/puppet-*).
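A quick way to compare the two environments is to pull the ipa-api-url value out of the kernel command line and look at its scheme. The following is only a minimal sketch: the function name, the sample command lines, and the addresses and ports in them are hypothetical placeholders, not values captured from either environment.

```python
from urllib.parse import urlparse


def ipa_api_url_scheme(kernel_cmdline):
    """Return the URL scheme of the ipa-api-url kernel parameter, or None."""
    for token in kernel_cmdline.split():
        key, sep, value = token.partition("=")
        if key == "ipa-api-url" and sep:
            return urlparse(value).scheme
    return None


# Hypothetical command lines; hosts and ports are placeholders.
working = "selinux=0 ipa-api-url=http://192.0.2.1:6385 boot_option=local"
broken = "selinux=0 ipa-api-url=https://192.0.2.2:13385 boot_option=local"

print(ipa_api_url_scheme(working))
print(ipa_api_url_scheme(broken))
```

An https scheme here means ironic-python-agent will try to verify the undercloud certificate when it calls back to Ironic.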

Regardless, since we were never using ssl for deployment previously, the quick fix seems to be to configure conductor/api_url in ironic.conf to just use http instead of querying keystone for the endpoint.

This patch does that: https://review.openstack.org/#/c/355484/

and the periodic test: https://review.openstack.org/#/c/346949/
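To illustrate what setting conductor/api_url amounts to, here is a rough sketch that writes the option into an ironic.conf-style INI text with Python's configparser. It is not how the proposed patch applies the setting; the helper name, the sample config text, and the http URL (address and port) are assumptions made for the example.

```python
import configparser
import io


def set_conductor_api_url(conf_text, api_url):
    """Return conf_text with [conductor] api_url set to api_url."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("conductor"):
        parser.add_section("conductor")
    parser.set("conductor", "api_url", api_url)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical config fragment and URL, for illustration only.
example = "[DEFAULT]\ndebug = True\n\n[conductor]\n"
updated = set_conductor_api_url(example, "http://192.0.2.1:6385")
print(updated)
```

With api_url set explicitly, Ironic no longer needs to look the endpoint up in keystone, so the agent is handed the http URL directly.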

Changed in tripleo:
assignee: James Slagle (james-slagle) → Emilien Macchi (emilienm)
assignee: Emilien Macchi (emilienm) → James Slagle (james-slagle)
Changed in tripleo:
assignee: James Slagle (james-slagle) → Emilien Macchi (emilienm)
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

Might be good to point out that this is being fixed on the ironic side.

Emilien Macchi (emilienm) wrote :

Ironic patch is here: https://review.openstack.org/#/c/355537/

Please close this bug when patch is merged & we know it's fixed.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on instack-undercloud (master)

Change abandoned by James Slagle (<email address hidden>) on branch: master
Review: https://review.openstack.org/355261

OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (master)

Reviewed: https://review.openstack.org/355484
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=41ef77528da3ff0dd61f781877b3d0c0e6551069
Submitter: Jenkins
Branch: master

commit 41ef77528da3ff0dd61f781877b3d0c0e6551069
Author: James Slagle <email address hidden>
Date: Mon Aug 15 10:12:50 2016 -0400

    Use http for Ironic deployments

    When conductor/api_url is not configured in ironic.conf, Ironic queries
    keystone for the url. When using Undercloud ssl, this results in
    deployments using ssl and IPA is not able to talk to Ironic over ssl
    because it does not trust the certificate.

    Previously, this was not the case, and it would choose the internal url
    instead of the public url. But something has apparently changed
    somewhere and the tripleo-ci promote jobs using undercloud ssl are now
    failing to promote due to this issue.

    To restore the previous behavior, this patch configures
    conductor/api_url to use the internal endpoint.

    Depends-On: I558b53591b14ed43c725a4d0e0a67401adc7d2f0
    Co-Authored-By: James Slagle <email address hidden>
    Co-Authored-By: Emilien Macchi <email address hidden>

    Change-Id: Ib99b8a0bec3b8235a32dab4a67a448ec89707f8a
    Closes-Bug: #1613088

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 5.0.0.0b3

This issue was fixed in the openstack/instack-undercloud 5.0.0.0b3 development milestone.
