Changing DNS hostname for public TLS endpoints post-install breaks neutron-nova rabbitmq communication

Bug #1649886 reported by Andres Toomsalu
This bug affects 1 person
Affects             Status         Importance  Assigned to         Milestone
Fuel for OpenStack  Fix Committed  High        Stanislaw Bogatkin
Mitaka              Fix Released   High        Vladimir Kuklin
Newton              Fix Committed  High        Stanislaw Bogatkin

Bug Description

We needed to reconfigure "DNS hostname for public TLS endpoints" after the initial environment deployment, which resulted in VM creation getting stuck in the "spawning" phase.

Message: Build of instance 0344b141-a328-4e0a-a8c1-aaf37b055b3f aborted: Failed to allocate the network(s), not rescheduling.
Code: 500
Details:
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance
    filter_properties)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2102, in _build_and_run_instance
    reason=msg)
Created: Dec. 13, 2016, 7:35 p.m.

We traced the problem down to a rabbitmq/neutron/nova communication issue: basically, neutron's replies for VIF plugging cannot be delivered to nova.

From compute host:

cat /var/log/neutron-all.log | grep ERROR
<163>Dec 14 09:50:43 node-5 neutron-openvswitch-agent: 2016-12-14 09:50:43.975 170619 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 10.80.20.4:5673 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 30 seconds.
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/utils/__init__.py", line 246, in retry_over_time
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit return fun(*args, **kwargs)
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 237, in connect
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit return self.connection
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 741, in connection
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit self._connection = self._establish_connection()
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 696, in _establish_connection
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit conn = self.transport.establish_connection()
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 116, in establish_connection
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit conn = self.Connection(**opts)
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 165, in __init__
2016-12-14 09:50:48.603 170619 ERROR oslo.messaging._drivers.impl_rabbit self.transport = self.Transport(host, connect_timeout, ssl)

The connection to the rabbitmq server seems to be working:

telnet 10.80.20.4 5673
Trying 10.80.20.4...
Connected to 10.80.20.4.
Escape character is '^]'.
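
The telnet session shows only that the port accepts TCP connections; it does not exercise the AMQP handshake that oslo.messaging performs afterwards. A minimal Python sketch of the same reachability test, convenient for repeating it from several nodes, could look like this. The helper name `tcp_reachable` is made up for illustration; the address and port in the comment are the ones from the log above.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    This mirrors the telnet check: it proves only that something accepts
    connections on the port, not that AMQP negotiation succeeds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call with the values from the report (hypothetical usage):
#     tcp_reachable("10.80.20.4", 5673)
```

A True result here combined with continuing errors in the agent log would point at a problem above the TCP layer.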

From controller side:

cat /<email address hidden>
=ERROR REPORT==== 14-Dec-2016::09:51:08 ===
Channel error on connection <0.8651.0> (10.80.20.4:54597 -> 10.80.20.4:5673, vhost: '/', user: 'nova'), channel 1:
operation basic.publish caused a channel exception not_found: "no exchange 'reply_9e5b4bad40f24ac58e1cdae5d2a1a013' in vhost '/'"

cat /var/log/neutron-all.log | grep ERROR
<163>Dec 14 10:33:59 node-6 neutron-server: 2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova [-] Failed to notify nova on events: [{'status': 'completed', 'tag': u'62947860-9e84-493f-84e7-1b7afc6b7990', 'name': 'network-vif-plugged', 'server_uuid': u'358fba85-9efc-4766-ab2b-7c50fb574fdb'}]
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova Traceback (most recent call last):
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova File "/usr/lib/python2.7/dist-packages/neutron/notifiers/nova.py", line 241, in send_events
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova batched_events)
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova File "/usr/lib/python2.7/dist-packages/novaclient/v2/contrib/server_external_events.py", line 39, in create
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova return_raw=True)
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova File "/usr/lib/python2.7/dist-packages/novaclient/base.py", line 345, in _create
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova resp, body = self.api.client.post(url, body=body)
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 179, in post
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova return self.request(url, 'POST', **kwargs)
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 89, in request
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova **kwargs)
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 331, in request
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
2016-12-14 10:33:59.761 32530 ERROR neutron.notifiers.nova File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 98, in request
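
To get a quick overview of which components are logging errors, rather than reading the raw `grep ERROR` output, the lines can be tallied per logger. This is only a sketch: it assumes the log line layout visible in the excerpts above (timestamp, PID, level, logger name), and `count_errors_by_logger` is a hypothetical helper, not part of any OpenStack tool. Every line of a traceback carries the ERROR level, so the numbers count lines, not distinct failures.

```python
import re
from collections import Counter

# Layout assumed from the excerpts above:
# "<date> <time> <pid> <LEVEL> <logger> <message>"
# A syslog prefix in front of the timestamp is tolerated because search() is used.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) ?(?P<msg>.*)"
)


def count_errors_by_logger(lines):
    """Count ERROR-level lines per logger name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("logger")] += 1
    return counts


# Hypothetical usage against the file named in the report:
#     with open("/var/log/neutron-all.log") as log:
#         print(count_errors_by_logger(log).most_common())
```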

Tags: area-library
Andres Toomsalu (andres-active) wrote :

fuel --version
9.0.0

Changed in fuel:
milestone: none → 9.2
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
importance: Undecided → High
status: New → Confirmed
tags: added: area-library
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Stanislaw Bogatkin (sbogatkin)
Changed in fuel:
milestone: 9.2 → 11.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/413039

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/414103

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/414106

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/ocata
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/413039
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=924bdc1fee7285ea9147ea92509f288688241789
Submitter: Jenkins
Branch: master

commit 924bdc1fee7285ea9147ea92509f288688241789
Author: Stanislaw Bogatkin <email address hidden>
Date: Tue Dec 20 14:40:28 2016 +0300

    Add DNS name change opportunity

    When change DNS hostname in TLS certificate for OpenStack endpoints,
    make additional conditions to allow services use new certificate.

    Change-Id: Ia2724eb397962f569b8360e684b599c472a891e2
    Closes-Bug: #1649886

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/414106
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f3e7a70a98aa71ef55d58e77aa44d715616843f2
Submitter: Jenkins
Branch: stable/mitaka

commit f3e7a70a98aa71ef55d58e77aa44d715616843f2
Author: Stanislaw Bogatkin <email address hidden>
Date: Tue Dec 20 14:40:28 2016 +0300

    Add DNS name change opportunity

    When change DNS hostname in TLS certificate for OpenStack endpoints,
    make additional conditions to allow services use new certificate.

    Change-Id: Ia2724eb397962f569b8360e684b599c472a891e2
    Closes-Bug: #1649886

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/newton)

Reviewed: https://review.openstack.org/414103
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=dafff41a95c1c5d202adc12878901f17473e5458
Submitter: Jenkins
Branch: stable/newton

commit dafff41a95c1c5d202adc12878901f17473e5458
Author: Stanislaw Bogatkin <email address hidden>
Date: Tue Dec 20 14:40:28 2016 +0300

    Add DNS name change opportunity

    When change DNS hostname in TLS certificate for OpenStack endpoints,
    make additional conditions to allow services use new certificate.

    Change-Id: Ia2724eb397962f569b8360e684b599c472a891e2
    Closes-Bug: #1649886

tags: added: on-verification
TatyanaGladysheva (tgladysheva) wrote :

This bug was checked on 9.2 snapshot #753, but the instance is still created in ERROR status.

Steps to verify:
1. Deploy environment with enabled Public TLS, "DNS hostname for public TLS endpoints" field contains "public.fuel.local"
2. When deployment is finished, change "DNS hostname for public TLS endpoints" field to "public.roark80.tld"
3. When deployment is finished, create instance

Actual results:
Instance is created in ERROR status, fault message and errors from nova-compute.log: http://paste.openstack.org/show/595138/
neutron-all.log from controller: http://paste.openstack.org/show/594995/
neutron-all.log from compute: http://paste.openstack.org/show/594996/

/<email address hidden> doesn't contain any error reports.

A diagnostic snapshot is also attached; could you please take a look at it?
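
One additional check that may help with the verification steps above is whether the certificate presented on the public endpoint actually names the new hostname. The sketch below works on the dictionary returned by Python's `ssl.SSLSocket.getpeercert()` and assumes the hostname is carried in the certificate's subjectAltName DNS entries; certificates that only set the Common Name are not handled, and the wildcard rule is a simplification. `cert_covers_hostname` is a hypothetical helper.

```python
def dns_names(cert):
    """Extract DNS subjectAltName entries from a getpeercert()-style dict."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def cert_covers_hostname(cert, hostname):
    """Return True if one of the certificate's DNS names matches hostname.

    Simplified rule: exact match, or a "*." wildcard standing in for
    exactly one leftmost label.
    """
    hostname = hostname.lower().rstrip(".")
    for name in dns_names(cert):
        name = name.lower().rstrip(".")
        if name == hostname:
            return True
        if name.startswith("*."):
            suffix = name[1:]  # e.g. ".roark80.tld"
            head, sep, tail = hostname.partition(".")
            if sep and head and "." + tail == suffix:
                return True
    return False


# Hostnames taken from the verification steps; the dict mimics getpeercert().
old_cert = {"subjectAltName": (("DNS", "public.fuel.local"),)}
print(cert_covers_hostname(old_cert, "public.roark80.tld"))
```

If the check fails for the new hostname after redeployment, the services are likely still being offered the old certificate.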

TatyanaGladysheva (tgladysheva) wrote :
tags: removed: on-verification
Stanislaw Bogatkin (sbogatkin) wrote :

But how is disabled OVS related to the changed TLS hostname? Did the endpoint addresses change in the keystone DB? Were the service configs changed too? You should check connectivity for this fix first.
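
The question about keystone endpoint addresses can be checked mechanically once the public endpoint URLs have been listed. A small sketch, assuming the URLs have already been collected into a list (for example, copied from the endpoint listing); `endpoints_with_other_host` and the sample URLs with their ports are illustrative, only the two hostnames come from this report.

```python
from urllib.parse import urlsplit


def endpoints_with_other_host(endpoint_urls, expected_host):
    """Return the URLs whose host part differs from expected_host."""
    expected = expected_host.lower()
    return [
        url
        for url in endpoint_urls
        if (urlsplit(url).hostname or "").lower() != expected
    ]


# Sample data: the ports and paths are made up for the example.
public_urls = [
    "https://public.roark80.tld:8774/v2.1",
    "https://public.fuel.local:9696",
]
print(endpoints_with_other_host(public_urls, "public.roark80.tld"))
```

An empty result would mean every listed endpoint already uses the new hostname.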

Stanislaw Bogatkin (sbogatkin) wrote :

According to the logs, the ssl_add_trust_chain task was never executed during the second deploy. I believe that is the root problem.

Stanislaw Bogatkin (sbogatkin) wrote :

So, after some time it turns out that there is a bug in other components which leads to bad yaql evaluation. I'll prepare one more workaround to fix this problem.

Changed in fuel:
status: Fix Committed → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/421225

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/421851

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/422046

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/421851
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=cb7e9fe7df8c2f95f0a9c230039e3e8b3a936d3d
Submitter: Jenkins
Branch: stable/mitaka

commit cb7e9fe7df8c2f95f0a9c230039e3e8b3a936d3d
Author: Stanislaw Bogatkin <email address hidden>
Date: Tue Jan 17 14:40:35 2017 +0300

    Add more conditions to restart-haproxy yaql to avoid ambiguity

    Some time we have a problem with evaluating yaql expressions on
    nailgun side. Before nailgun fix will be landed, introduce easy
    workaround to get evaluation work properly.

    Change-Id: I0df5c1fa18d012a7ef7aa9c1f627965791dee5d8
    Closes-Bug: #1649886

tags: added: on-verification
TatyanaGladysheva (tgladysheva) wrote :

Verified on 9.2 snapshot #779.

Actual results:
Instance is created in ACTIVE status using steps to verify from comment 8.

tags: removed: on-verification
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/421225
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=0549ba9a1223dbfcdd33afe142c855de34e91809
Submitter: Jenkins
Branch: master

commit 0549ba9a1223dbfcdd33afe142c855de34e91809
Author: Stanislaw Bogatkin <email address hidden>
Date: Tue Jan 17 14:40:35 2017 +0300

    Add more conditions to restart-haproxy yaql to avoid ambiguity

    Some time we have a problem with evaluating yaql expressions on
    nailgun side. Before nailgun fix will be landed, introduce easy
    workaround to get evaluation work properly.

    Change-Id: I0df5c1fa18d012a7ef7aa9c1f627965791dee5d8
    Closes-Bug: #1649886

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/newton)

Reviewed: https://review.openstack.org/422046
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=45f7cfee8671d43d83c2a60b4e7eefaab9df970d
Submitter: Jenkins
Branch: stable/newton

commit 45f7cfee8671d43d83c2a60b4e7eefaab9df970d
Author: Stanislaw Bogatkin <email address hidden>
Date: Tue Jan 17 14:40:35 2017 +0300

    Add more conditions to restart-haproxy yaql to avoid ambiguity

    Some time we have a problem with evaluating yaql expressions on
    nailgun side. Before nailgun fix will be landed, introduce easy
    workaround to get evaluation work properly.

    Change-Id: I0df5c1fa18d012a7ef7aa9c1f627965791dee5d8
    Closes-Bug: #1649886

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 11.0.0.0rc1

This issue was fixed in the openstack/fuel-library 11.0.0.0rc1 release candidate.
