nova services and transport_url, cannot connect to vhost if specified

Bug #1717915 reported by George Paraskevas
This bug affects 2 people
Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Expired  Undecided   Unassigned
oslo.messaging            Expired  High        Unassigned

Bug Description

Centos 7
RDO installation

After upgrading from Newton to Ocata, I configured transport_url as:
transport_url = rabbit://nova:nova_pass@192.168.110.1:5672,nova:nova_pass@192.168.110.2:5672,nova:nova_pass@192.168.110.3:5672/nova_vhost

and I removed [oslo_messaging_rabbit]

But then the services didn't start and, of course, I couldn't launch an instance.

Rabbitmq-logs:

=ERROR REPORT==== 18-Sep-2017::14:30:17 ===
Error on AMQP connection <0.24592.64> (192.168.110.2:51662 -> 192.168.110.2:5672 - nova-conductor:13053:a5fd524a-b479-4dbb-b8e5-4dd46bf8da74, user: 'nova', state: opening):
access to vhost 'nova' refused for user 'nova'

nova-conductor logs:

2017-09-18 13:59:58.600 27799 ERROR oslo_service.service NotAllowed: Connection.open: (530) NOT_ALLOWED - access to vhost 'nova' refused for user 'nova'

When I removed the transport_url and reconfigured the [oslo_messaging_rabbit] section, the services started, but again I couldn't launch an instance, with the same error logs:

[oslo_messaging_rabbit]
rabbit_hosts = 192.168.110.1:5672,192.168.110.2:5672,192.168.110.3:5672
rabbit_userid = "nova"
rabbit_password = "nova_pass"
rabbit_virtual_host = "/nova_vhost"

Finally, I reconfigured transport_url, removed the /nova_vhost from the end, gave the nova user access to the / vhost, and everything worked. Keep in mind that the nova user of course already had access to /nova_vhost before. I also ran the cell_v2 update to update the transport_url in the DB.

tags: added: openstack-version.ocata
Revision history for this message
Matt Riedemann (mriedem) wrote :

transport_url is a StrOpt, not a ListOpt, so I've never seen this specified as a list before.

Revision history for this message
Matt Riedemann (mriedem) wrote :

OK I see that [oslo_messaging_rabbit]/rabbit_hosts is deprecated in favor of [DEFAULT]/transport_url and the former is a ListOpt while the latter is a StrOpt, and the help for transport_url doesn't mention anything about how to format it for clustered rabbit.

I've added oslo.messaging to this bug since it seems at least their docs need to be updated.

Revision history for this message
Matt Riedemann (mriedem) wrote :

From the oslo.messaging code, it looks like transport_url is split on semi-colon (;). So try that instead of commas.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looks like this is the oslo.messaging change that deprecated the rabbit_hosts option:

https://github.com/openstack/oslo.messaging/commit/2f0d53b

But it doesn't add any testing to make sure that transport_url will work with clustered hosts and doesn't update the transport_url option help docs in any way to clarify this.

Changed in nova:
status: New → Invalid
Revision history for this message
Matt Riedemann (mriedem) wrote :

I think this is the code that parses the transport_url option:

https://github.com/openstack/oslo.messaging/blob/393ecff3451091404832dd6b8a088e1bec760101/oslo_messaging/transport.py#L492

And that wouldn't work with how it's specified above as a comma-delimited list of URLs:

user@ubuntu:~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from six.moves.urllib import parse
>>> transport_url = "rabbit://nova:nova_pass@192.168.110.1:5672,nova:nova_pass@192.168.110.2:5672,nova:nova_pass@192.168.110.3:5672/nova_vhost"
>>> url = parse.urlparse(transport_url)
>>> url
ParseResult(scheme='rabbit', netloc='nova:nova_pass@192.168.110.1:5672,nova:nova_pass@192.168.110.2:5672,nova:nova_pass@192.168.110.3:5672', path='/nova_vhost', params='', query='', fragment='')
>>>

Revision history for this message
Matt Riedemann (mriedem) wrote :

This is the parsed TransportURL object (using git hash 393ecff3451091404832dd6b8a088e1bec760101):

user@ubuntu:~/git/oslo.messaging$ source .tox/venv/bin/activate
(venv) user@ubuntu:~/git/oslo.messaging$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> transport_url = "rabbit://nova:nova_pass@192.168.110.1:5672,nova:nova_pass@192.168.110.2:5672,nova:nova_pass@192.168.110.3:5672/nova_vhost"
>>> from oslo_messaging import transport
>>> import mock
>>> conf = mock.MagicMock()
>>> trans_url = transport.TransportURL.parse(conf, transport_url)
>>> trans_url
<TransportURL transport='rabbit', virtual_host='nova_vhost', hosts=[<TransportHost hostname='192.168.110.1', port=5672, username='nova', password='nova_pass'>, <TransportHost hostname='192.168.110.2', port=5672, username='nova', password='nova_pass'>, <TransportHost hostname='192.168.110.3', port=5672, username='nova', password='nova_pass'>]>
>>>

Should the virtual_host be applied to all 3 TransportHosts?

Revision history for this message
Matt Riedemann (mriedem) wrote :

What is the value of the transport_url in the nova_api.cell_mappings database table when this fails?

Is it "rabbit://nova:nova_pass@192.168.110.1:5672,nova:nova_pass@192.168.110.2:5672,nova:nova_pass@192.168.110.3:5672/nova_vhost"?

Changed in nova:
status: Invalid → New
Revision history for this message
George Paraskevas (gparaskevas) wrote :

Yes, that is exactly what it is. Actually, it's whatever the setting in nova.conf is after you run nova-manage cell_v2 update_cell --cell_uuid <uuid>. Now it's "rabbit://nova:nova_pass@192.168.110.1:5672,nova:nova_pass@192.168.110.2:5672,nova:nova_pass@192.168.110.3:5672" because I didn't set a vhost due to the problems I mentioned. But before, it was exactly as you wrote it above.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Matt,

Looks legit, see

http://git.openstack.org/cgit/openstack/oslo.messaging/tree/oslo_messaging/transport.py#n454

transport://user:pass@host:port[,userN:passN@hostN:portN]/virtual_host?query
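As a rough illustration of that format (a minimal sketch using only the Python 3 standard library, not the oslo.messaging parser itself), the comma-separated part can be read as a list of host entries that all share the single virtual host taken from the path. The function name split_transport_url and the dict layout are hypothetical, and IPv6 literals and percent-encoded credentials are deliberately not handled:

```python
from urllib.parse import urlparse


def split_transport_url(url):
    """Return (virtual_host, hosts) for a multi-host transport URL.

    Illustrative only: this is NOT oslo.messaging's implementation.
    """
    parsed = urlparse(url)
    # Exactly one leading '/' separates the host list from the vhost name.
    virtual_host = parsed.path[1:] if parsed.path.startswith("/") else None

    hosts = []
    for entry in parsed.netloc.split(","):
        # Each entry looks like user:pass@host:port; credentials are optional.
        creds, _, hostport = entry.rpartition("@")
        username, _, password = creds.partition(":")
        hostname, _, port = hostport.partition(":")
        hosts.append({
            "hostname": hostname,
            "port": int(port) if port else None,
            "username": username or None,
            "password": password or None,
        })
    return virtual_host, hosts


if __name__ == "__main__":
    vhost, hosts = split_transport_url(
        "rabbit://nova:nova_pass@192.168.110.1:5672,"
        "nova:nova_pass@192.168.110.2:5672,"
        "nova:nova_pass@192.168.110.3:5672/nova_vhost"
    )
    print(vhost)
    for host in hosts:
        print(host)
```

In this sketch the virtual host is a property of the URL as a whole rather than of each host entry, which matches the shape of the TransportURL object shown earlier in the thread.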

So:
"rabbit://nova:nova_pass@192.168.110.1:5672,nova:nova_pass@192.168.110.2:5672,nova:nova_pass@192.168.110.3:5672/nova_vhost"

turns to:
"<TransportURL transport='rabbit', virtual_host='nova_vhost', hosts=[<TransportHost hostname='192.168.110.1', port=5672, username='nova', password='nova_pass'>, <TransportHost hostname='192.168.110.2', port=5672, username='nova', password='nova_pass'>, <TransportHost hostname='192.168.110.3', port=5672, username='nova', password='nova_pass'>]>"

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Matt,

Note that George says, even after he went back to the old style oslo_messaging_rabbit section, the rabbit_virtual_host = "/nova_vhost" didn't work.

-- Dims

Revision history for this message
George Paraskevas (gparaskevas) wrote :

Yes, that is correct. After the migration I retained the [oslo_messaging_rabbit] config from Newton, and it didn't work, failing with the error messages mentioned above.

Revision history for this message
Matt Riedemann (mriedem) wrote :

I suggest asking the TripleO team what they do about configuring nova for cells with the transport_url as I believe TripleO supports clustered rabbit. I'm told Ollie Walsh (owalsh) from Red Hat should know about this.

Revision history for this message
Oliver Walsh (owalsh) wrote :

Yea, looks like a legit issue if the vhost isn't being applied to all 3 TransportHosts.

We happen to avoid this in TripleO as we don't specify a vhost:

E.g., from the nova.conf of an HA CI job in TripleO:
transport_url=rabbit://guest:<email address hidden>:5672,guest:<email address hidden>:5672,guest:<email address hidden>:5672/?ssl=0

When setting up cells v2, the transport_url value from nova.conf is used (by puppet-nova).

I don't see a workaround that allows a non-default vhost to be used. I assume this applies when handling both the nova.conf transport_url and the cell v2 transport_url. Reverting to the deprecated rabbit_virtual_host option would only address this in nova.conf.

Revision history for this message
George Paraskevas (gparaskevas) wrote :

I am pretty sure that the openstack-ansible team also uses clustered RabbitMQ, and they use exactly the same configuration as the one I tried to use. So maybe they can give us a hint on that too.

Revision history for this message
Hendrik Frenzel (hfrenzel) wrote :

With 'transport_url = rabbit://user1:pass1@host1:port1[,userN:passN@hostN:portN]//vhost' it works. Probably something wrong with parsing the URL?

Matt Riedemann (mriedem)
tags: added: cells
Revision history for this message
Oliver Walsh (owalsh) wrote :

Noticed something the other day that is potentially related...

In nova.conf, transport_url was set correctly, and some, but not all, of the legacy rabbit_* options were set too. nova would not connect to rabbitmq with this conf. IIRC, if I set rabbit_hosts it worked, or if I removed all of the legacy rabbit_* options it worked. I'm wondering if a similar interaction could affect the vhost config.

Revision history for this message
Ken Giusti (kgiusti) wrote :

I wonder if this code:
https://github.com/openstack/oslo.messaging/blob/393ecff3451091404832dd6b8a088e1bec760101/oslo_messaging/transport.py#L505

causes the vhost value to be dropped unless there's an extra '/'?

That would explain Hendrik's observation.

I'm looking into this now.

Changed in oslo.messaging:
assignee: nobody → Ken Giusti (kgiusti)
Revision history for this message
Ken Giusti (kgiusti) wrote :

I've run the rabbit functional tests in both clustered and non-clustered setups, using a single / and a double // prefix for the virtual host value, and the tests pass.

One thing I did notice: rabbitmq treats any leading '/' as part of the virtual host name.
For example:

sudo rabbitmqctl add_vhost hostv
sudo rabbitmqctl add_vhost /hostv
sudo rabbitmqctl add_vhost //hostv

creates three distinct virtual hosts on the broker:

sudo rabbitmqctl list_vhosts
Listing vhosts
hostv
/hostv
/
//hostv

not too surprising - rabbit is basically doing exactly what we ask of it.

This implies the following transport_urls address different virtual hosts:

rabbit://guest:guest@localhost:5672///hostv
rabbit://guest:guest@localhost:5672//hostv
rabbit://guest:guest@localhost:5672/hostv

And oslo.messaging always strips the first / after the port. So for the above values of transport_url the actual virtual host name as configured on rabbitmq would be:

//hostv
/hostv
hostv

respectively.
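The stripping rule described above can be sketched in a few lines of Python 3. This is only an approximation of the behaviour described in this comment, not the oslo.messaging code, and the helper name broker_vhost is made up for the example:

```python
from urllib.parse import urlparse


def broker_vhost(transport_url):
    """Return the vhost name the broker would be asked for.

    Sketch of the rule described in this thread: only the first '/'
    after the host:port part is dropped; any further leading '/'
    characters are part of the virtual host name itself.
    """
    path = urlparse(transport_url).path
    return path[1:] if path.startswith("/") else path


if __name__ == "__main__":
    for url in (
        "rabbit://guest:guest@localhost:5672///hostv",
        "rabbit://guest:guest@localhost:5672//hostv",
        "rabbit://guest:guest@localhost:5672/hostv",
    ):
        print(url, "->", repr(broker_vhost(url)))
```

Under this rule, a broker vhost created as /nova_vhost would need a transport_url ending in //nova_vhost, while a URL ending in /nova_vhost would address a vhost named nova_vhost.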

To be sure there's no mismatch between the transport_url value and the virtual host names as configured on rabbitmq can you try the following:

1) remove any use of the deprecated rabbit options from your configuration files - using both transport_url and the deprecated options can lead to unpredictable results (see https://bugs.launchpad.net/oslo.messaging/+bug/1761787)

2) dump the value of transport_url across all involved configuration files - verify that the URLs corresponding to the same virtual host use exactly the same number of / characters leading the virtual host portion

3) run sudo rabbitmqctl list_vhosts and verify that the configured virtual hosts have the correct number of leading / characters (one less than in the transport_url)
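Steps 2 and 3 could be checked with a small script along these lines. It is a hedged sketch: the helper names are hypothetical, the parser only understands the plain list_vhosts output shown earlier in this comment (newer rabbitmqctl versions may print extra header lines), and treating an empty vhost as the default "/" is an assumption rather than something confirmed in this thread.

```python
from urllib.parse import urlparse


def parse_list_vhosts(output):
    """Extract vhost names from `rabbitmqctl list_vhosts` text output.

    Assumes the simple format shown in this thread: a "Listing vhosts"
    banner followed by one vhost name per line.
    """
    return {
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.startswith("Listing vhosts")
    }


def missing_vhosts(transport_urls, broker_vhosts):
    """Return {url: expected_vhost} for URLs whose vhost is not on the broker."""
    missing = {}
    for url in transport_urls:
        path = urlparse(url).path
        # Drop only the first '/'; the rest is the vhost name.
        expected = path[1:] if path.startswith("/") else path
        # Assumption: an empty vhost means the default vhost "/".
        expected = expected or "/"
        if expected not in broker_vhosts:
            missing[url] = expected
    return missing


if __name__ == "__main__":
    vhosts = parse_list_vhosts("Listing vhosts\n/nova_vhost\n/\n")
    urls = [
        "rabbit://nova:nova_pass@192.168.110.1:5672/nova_vhost",
        "rabbit://nova:nova_pass@192.168.110.1:5672//nova_vhost",
    ]
    for url, vhost in missing_vhosts(urls, vhosts).items():
        print("no vhost named %r on the broker for %s" % (vhost, url))
```

With a broker that only has /nova_vhost and /, the first URL is reported (it asks for nova_vhost) and the second is not.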

Let me know what you find.

Changed in oslo.messaging:
status: New → Incomplete
importance: Undecided → High
Ken Giusti (kgiusti)
Changed in oslo.messaging:
assignee: Ken Giusti (kgiusti) → nobody
Revision history for this message
melanie witt (melwitt) wrote :

It's hard for me to follow what the actual problem is in this bug after reading all of the comments.

If the problem is that it wasn't possible to have a different vhost per controller in a cell, we do have the ability to use templated URLs in cell mappings as of Rocky 18.0.0.0b3 from this patch:

https://review.openstack.org/578163

Does that solve your issue? Can this bug be considered fixed by this patch?

Changed in nova:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for oslo.messaging because there has been no activity for 60 days.]

Changed in oslo.messaging:
status: Incomplete → Expired