Using generate_service_certificate and undercloud_public_vip in undercloud.conf breaks nova

Bug #1632538 reported by Dan Trainor
This bug affects 8 people
Affects                     Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)    Incomplete     Undecided    Unassigned
  Newton                    Incomplete     Undecided    Unassigned
tripleo                     Invalid        Medium       Unassigned
python-rfc3986 (Ubuntu)     Fix Released   Medium       Unassigned
  Xenial                    Fix Released   Medium       Unassigned
  Yakkety                   Fix Released   Medium       Unassigned
  Zesty                     Fix Released   Medium       Unassigned

Bug Description

Enabling SSL on the Undercloud using generate_service_certificate causes all Nova services on the undercloud (api, cert, compute, conductor, scheduler) to fail with errors similar to the following:

2016-10-11 22:28:27.327 66082 CRITICAL nova [req-b5f37af3-96fc-42e2-aaa6-52815aca07fe - - - - -] ConfigFileValueError: Value for option url is not valid: invalid URI: 'https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696'
2016-10-11 22:28:27.327 66082 ERROR nova Traceback (most recent call last):
2016-10-11 22:28:27.327 66082 ERROR nova File "/usr/bin/nova-cert", line 10, in <module>
2016-10-11 22:28:27.327 66082 ERROR nova sys.exit(main())
2016-10-11 22:28:27.327 66082 ERROR nova File "/usr/lib/python2.7/site-packages/nova/cmd/cert.py", line 49, in main
2016-10-11 22:28:27.327 66082 ERROR nova service.wait()
2016-10-11 22:28:27.327 66082 ERROR nova File "/usr/lib/python2.7/site-packages/nova/service.py", line 415, in wait
2016-10-11 22:28:27.327 66082 ERROR nova _launcher.wait()
2016-10-11 22:28:27.327 66082 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 328, in wait
2016-10-11 22:28:27.327 66082 ERROR nova status, signo = self._wait_for_exit_or_signal()
2016-10-11 22:28:27.327 66082 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 303, in _wait_for_exit_or_signal
2016-10-11 22:28:27.327 66082 ERROR nova self.conf.log_opt_values(LOG, logging.DEBUG)
2016-10-11 22:28:27.327 66082 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2630, in log_opt_values
2016-10-11 22:28:27.327 66082 ERROR nova _sanitize(opt, getattr(group_attr, opt_name)))
2016-10-11 22:28:27.327 66082 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 3061, in __getattr__
2016-10-11 22:28:27.327 66082 ERROR nova return self._conf._get(name, self._group)
2016-10-11 22:28:27.327 66082 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2672, in _get
2016-10-11 22:28:27.327 66082 ERROR nova value = self._do_get(name, group, namespace)
2016-10-11 22:28:27.327 66082 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2715, in _do_get
2016-10-11 22:28:27.327 66082 ERROR nova % (opt.name, str(ve)))
2016-10-11 22:28:27.327 66082 ERROR nova ConfigFileValueError: Value for option url is not valid: invalid URI: 'https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696'
2016-10-11 22:28:27.327 66082 ERROR nova

I believe the failure happens inside the [neutron] section of /etc/nova/nova.conf.

This does not look related to the scheme (https) that enabling SSL introduces: a one-off test with the openstack-nova-conductor service after changing the scheme to http results in the same startup failure.

Another one-off test, substituting an IP address for the FQDN in nova.conf and again using the openstack-nova-conductor service, results in openstack-nova-conductor starting properly but eventually failing with a connection-related error caused by the placeholder IP address used (1.2.3.4).

Revision history for this message
Dan Trainor (dtrainor) wrote :

Additional information:

This may be caused by the combination of enabling SSL (generate_service_certificate=true) and using an FQDN for the Undercloud Public VIP (undercloud_public_vip=rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com) in undercloud.conf.

Revision history for this message
Dan Trainor (dtrainor) wrote :

Confirmed that updating nova.conf's url=https://<ipaddress>:13696 from url=https://<fqdn>:13696 returns all nova services to a nominal startup state, but haven't yet tested any more than just attempting to start nova services again.
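
To see which endpoint nova is configured with before and after such an edit, the url option can be read back from the [neutron] section. The following is a minimal sketch using only the Python 3 standard library. The helper name neutron_endpoint and the sample text are illustrative assumptions, 192.0.2.10 is a documentation placeholder address, and a real nova.conf may need different parser settings.

```python
import configparser
from urllib.parse import urlsplit

# Hypothetical stand-in for the relevant part of /etc/nova/nova.conf.
SAMPLE = """\
[neutron]
url = https://192.0.2.10:13696
"""


def neutron_endpoint(conf_text):
    """Return (host, port) from the url option of the [neutron] section."""
    # interpolation=None keeps '%' characters in values literal;
    # strict=False tolerates repeated options in the same section.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    parts = urlsplit(parser.get("neutron", "url"))
    return parts.hostname, parts.port


host, port = neutron_endpoint(SAMPLE)
print(host, port)
```

This only reports what is configured; it does not validate the URI the way oslo.config does.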

summary: - Enabling SSL on Undercloud with generate_service_certificate breaks nova
+ Using generate_service_certificate and undercloud_public_vip in
+ undercloud.conf breaks nova
Revision history for this message
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

Well, I just tried this out with a much shorter URI and it worked, so I think it might have to do with the length. This might actually be an issue with oslo.config's URI type (http://docs.openstack.org/developer/oslo.config/types.html#oslo_config.types.URI); the constructor used for that option in nova is URIOpt (http://docs.openstack.org/developer/oslo.config/opts.html#oslo_config.cfg.URIOpt).

Revision history for this message
Dan Trainor (dtrainor) wrote :

Just an update to note that using a canonical name instead of the FQDN of the system still generates this error. We also used a test script invoking oslo_config.types.URI() to test the string length implementation of oslo-config on this system (python-oslo-config-3.17.0-0.20160920171017.8db0b7c) and confirmed that it is a newer release than the one in which the string length parameters were introduced into oslo_config.

Revision history for this message
Diana Clarke (diana-clarke) wrote :

If it were an issue with max_length, I would expect to see an error message like the following: "Value 'https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696' exceeds maximum length 10".

- Happy path (no max length)

>>> uri_type = types.URI()
>>> uri_type('https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696')
'https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696'

- Happy path (large max length)

>>> uri_type = types.URI(max_length=1000)
>>> uri_type('https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696')
'https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696'

- Error path (exceeds max length)

>>> uri_type = types.URI(max_length=10)
>>> uri_type('https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<truncated>/python2.7/site-packages/oslo_config/types.py", line 724, in __call__
    (value, self.max_length))
ValueError: Value 'https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696' exceeds maximum length 10

Instead that error message appears to be coming from here, but it's not obvious to me what error condition it's hitting in rfc3986.

https://github.com/openstack/oslo.config/blob/4a691931f378109fd9e1ec97bff0f26dde591e89/oslo_config/types.py#L781

I tried various versions of rfc3986, but I can't reproduce an error with that uri.

$ python
>>> import rfc3986
>>> rfc3986.is_valid_uri('https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696', require_scheme=True, require_authority=True)
True

Can you perhaps attach an example config file that causes this error?

Extra whitespace would cause an error message like that one, but whitespace doesn't appear to be the issue based on the trace provided.

>>> uri_type(' https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<truncated>/python2.7/site-packages/oslo_config/types.py", line 720, in __call__
    raise ValueError('invalid URI: %r' % value)
ValueError: invalid URI: ' https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696'

I'll keep looking, but those are my first-pass, quick thoughts.

Revision history for this message
Dan Trainor (dtrainor) wrote :

Diana, attached is the nova.conf that becomes the product of an Undercloud installation.

Revision history for this message
Dan Trainor (dtrainor) wrote :

Diana, attached is the undercloud.conf used for Undercloud installation.

Changed in tripleo:
milestone: none → ocata-1
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Diana Clarke (diana-clarke) wrote :

I'm also not seeing this error when I use log_opt_values (like the original traceback) with the config file provided by dtrainor (nova.conf):

- log_opt_values happy path:

********************************************************************************
Configuration options gathered from:
command line args: ()
config files: ['nova.conf']
================================================================================
config_dir = None
config_file = ['nova.conf']
neutron.extension_sync_interval = 600
neutron.ovs_bridge = br-int
neutron.region_name =
neutron.url = https://rdo-ci-fx2-06-s4.v103.rdoci.lab.eng.rdu.redhat.com:13696

...

********************************************************************************

- An example non-happy path (I removed "https" from neutron.url)

Traceback (most recent call last):
  File "foo.py", line 76, in <module>
    conf.log_opt_values(logger, logging.DEBUG)
  File ".../oslo_config/cfg.py", line 2626, in log_opt_values
    _sanitize(opt, getattr(group_attr, opt_name)))
  File ".../oslo_config/cfg.py", line 3057, in __getattr__
    return self._conf._get(name, self._group)
  File ".../oslo_config/cfg.py", line 2668, in _get
    value = self._do_get(name, group, namespace)
  File ".../oslo_config/cfg.py", line 2711, in _do_get
    % (opt.name, str(ve)))
oslo_config.cfg.ConfigFileValueError: Value for option url is not valid: invalid URI: '://rdo-ci-fx2-06-s4.v103.rdoci.lab.eng.rdu.redhat.com:13696'

So... I'm still not sure how you would end up in an error state with that uri. Does anyone else have an isolated way of reproducing this with the uri: "https://rdo-ci-fx2-06-s4.v103.rdoci.lab.eng.rdu.redhat.com:13696"?

Revision history for this message
Dan Trainor (dtrainor) wrote :

Diana, can I have a copy of the test code you ran, so that I can attempt to run it in my environment and hopefully drum up some more information?

Revision history for this message
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

I haven't been able to reproduce it in my environment :/

Revision history for this message
Dan Trainor (dtrainor) wrote :

I did some more tests and found that I can set the url parameter to anything I want (any scheme, FQDN or IP address, or port), but as soon as I put a hyphen anywhere in the value, ConfigFileValueError is thrown.
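
For context, RFC 3986 lists the hyphen among its unreserved characters, so a hyphenated host name is a legal reg-name and rejecting it is a validator bug rather than a configuration mistake. The following is a minimal, stdlib-only sketch of the reg-name rule. The name is_reg_name is hypothetical, and the pattern is a simplified reading of the RFC grammar, not the code used by the rfc3986 library or oslo.config.

```python
import re

# RFC 3986: reg-name = *( unreserved / pct-encoded / sub-delims )
UNRESERVED = r"A-Za-z0-9\-._~"   # note: the hyphen is unreserved
SUB_DELIMS = r"!$&'()*+,;="
REG_NAME = re.compile(
    r"(?:[" + UNRESERVED + SUB_DELIMS + r"]|%[0-9A-Fa-f]{2})*"
)


def is_reg_name(host):
    """Return True if host consists only of characters allowed in a reg-name."""
    return REG_NAME.fullmatch(host) is not None


print(is_reg_name("rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com"))
print(is_reg_name("bad host"))
```

The sketch checks the host component only; it does not validate the scheme, port, or IP literals.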

Revision history for this message
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

We found out that it's actually an issue with rfc3986, which has been fixed already, but we need to update this library in RHEL

Revision history for this message
YaZug (jon-schlueter) wrote :

What version of rfc3986 was causing the issue, and do you have a good way to reproduce this? If so, let's get global-requirements.txt updated in the requirements repo to reflect the new minimum version that handles the cases we expect, with a useful description of the problem.

Revision history for this message
YaZug (jon-schlueter) wrote :

I was able to reproduce this with what I think is the source of the bug, rfc3986 == 0.2.0 (RDO had this version for a while, but it was updated to 0.3.1 on October 6th).

with 0.2.0

>>> import rfc3986
>>> rfc3986.is_valid_uri('https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696')
False

with 0.2.2

>>> import rfc3986
>>> rfc3986.is_valid_uri('https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696')
True

Revision history for this message
Matt Riedemann (mriedem) wrote :

FWIW the minimum required version of rfc3986 in global-requirements is 0.2.2:

http://git.openstack.org/cgit/openstack/requirements/tree/global-requirements.txt#n241

Revision history for this message
Matt Riedemann (mriedem) wrote :

This is also a problem though:

http://git.openstack.org/cgit/openstack/requirements/tree/global-requirements.txt#n108

oslo.config>=3.14.0 # Apache-2.0

https://github.com/openstack/oslo.config/blob/3.14.0/requirements.txt#L10

rfc3986>=0.2.0 # Apache-2.0

So we need to update the minimum required version of oslo.config in global-requirements too.
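
The concern here is the declared lower bound: a requirement line that permits rfc3986 0.2.0 allows the broken release to be installed. The following is a rough sketch of pulling the lower bound out of a requirements line and comparing it with 0.2.2, the version reported as working earlier in this thread. The helper lower_bound and the KNOWN_GOOD constant are illustrative assumptions; the sketch handles only simple "name>=x.y.z" lines with numeric versions and is not how the requirements tooling itself works.

```python
import re

# Version reported as working in this thread (0.2.0 was reported as failing).
KNOWN_GOOD = (0, 2, 2)

# Matches lines such as "rfc3986>=0.2.0 # Apache-2.0".
LOWER_BOUND = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*>=\s*(\d+(?:\.\d+)*)")


def lower_bound(requirement_line):
    """Return (name, version_tuple) for a 'name>=version' line, else None."""
    match = LOWER_BOUND.match(requirement_line)
    if match is None:
        return None
    name, version = match.groups()
    return name, tuple(int(part) for part in version.split("."))


name, minimum = lower_bound("rfc3986>=0.2.0 # Apache-2.0")
print(name, minimum, "permits a broken release:", minimum < KNOWN_GOOD)
```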

Revision history for this message
Matt Riedemann (mriedem) wrote :

Added nova to this bug because this change in newton:

https://github.com/openstack/nova/commit/6091de77eda12286786e28ae4f0779e7efc54634

Changed novncproxy_base_url to be a URIOpt in oslo.config which uses:

https://github.com/openstack/oslo.config/blob/master/oslo_config/types.py#L779

        if not rfc3986.is_valid_uri(value, require_scheme=True,
                                    require_authority=True):
            raise ValueError('invalid URI: %r' % value)

And as noted above this fails with rfc3986 0.2.0.

For nova in newton we could either:

1. Change the novncproxy_base_url option back to StrOpt
2. Put out a release note for the known issue that the minimum oslo.config/rfc3986 is bad.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Actually now I'm confused because in stable/newton nova requires rfc3986>=0.2.2:

https://github.com/openstack/nova/blob/14.0.0/requirements.txt#L52

Changed in nova:
status: New → Incomplete
Revision history for this message
Matt Kassawara (ionosphere80) wrote :

The Ubuntu package python-nova for Newton installs python-rfc3986 0.2.0 as a dependency.

James Page (james-page)
Changed in nova (Ubuntu Yakkety):
status: New → Incomplete
status: Incomplete → Triaged
Changed in nova (Ubuntu Zesty):
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
James Page (james-page) wrote :

Dropped UCA targets; this package comes directly from Ubuntu, so we should fix it there.

no longer affects: cloud-archive
no longer affects: cloud-archive/newton
affects: nova (Ubuntu Yakkety) → python-rfc3986 (Ubuntu Yakkety)
Changed in python-rfc3986 (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Medium
Changed in python-rfc3986 (Ubuntu Yakkety):
importance: Undecided → Medium
Revision history for this message
James Page (james-page) wrote :

Ubuntu SRU information

[impact]
Hostnames containing '-' are incorrectly rejected as invalid URIs.

[test case]
>>> import rfc3986
>>> rfc3986.is_valid_uri('https://rdo-ci-fx2-06-s5.v103.rdoci.lab.eng.rdu.redhat.com:13696')
False

[regression potential]
0.2.0 -> 0.2.2 contains the fix for this and two other unrelated fixes; as this has been in upstream openstack gate, I'd prefer to align on this version.

Revision history for this message
James Page (james-page) wrote :

Uploaded 0.2.2 to yakkety and xenial for SRU team review; sync pending approval for zesty (archive currently frozen) - 0.3.x from Debian unstable has the same fixes.

Revision history for this message
Robie Basak (racb) wrote :

https://sources.debian.net/src/python-rfc3986/0.3.1-2/HISTORY.rst/ lists the fixes from 0.2.2 and Zesty is synced from Debian's 0.3.1-2, so this is fixed in Zesty.

Changed in python-rfc3986 (Ubuntu Zesty):
status: Triaged → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Dan, or anyone else affected,

Accepted python-rfc3986 into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/python-rfc3986/0.2.2-0ubuntu0.16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in python-rfc3986 (Ubuntu Yakkety):
status: Triaged → Fix Committed
tags: added: verification-needed
Changed in python-rfc3986 (Ubuntu Xenial):
status: Triaged → Fix Committed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Dan, or anyone else affected,

Accepted python-rfc3986 into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/python-rfc3986/0.2.2-0ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Rich Art (riccardo-patane-ch) wrote :

I have the same bug. How can I fix this problem?

Revision history for this message
Thiago Martins (martinx) wrote :

Mmmm... Nova, from OpenStack Newton for Xenial, with UCA, doesn't work... I'm seeing in my Nova logs:

* nova-api.log:

https://paste.ubuntu.com/23472691/

* nova-consoleauth.log:

ERROR nova ConfigFileValueError: Value for option url is not valid: invalid URI:

* nova-conductor:

ERROR nova ConfigFileValueError: Value for option url is not valid: invalid URI:

* nova-compute:

Timed out waiting for nova-conductor. Is it running? Or did this service start before nova-conductor? Reattempting establishment of nova-conductor connection...

I have:

---
Package: python-rfc3986
Version: 0.2.0-2
---

I'll try to manually backport it from Debian, since it is 0.3.1 there:

https://tracker.debian.org/pkg/python-rfc3986

So, Newton doesn't work on Xenial... Hmmmm... ;-(

Revision history for this message
Thiago Martins (martinx) wrote :

Oh! There is a new python-rfc3986 version on xenial-proposed! Trying that... :-P

Revision history for this message
Thiago Martins (martinx) wrote :

It is working now!

python-rfc3986 from xenial-proposed fixes the problem, move it to main! :-D

Steven Hardy (shardy)
Changed in tripleo:
milestone: ocata-1 → ocata-2
Changed in tripleo:
milestone: ocata-2 → ocata-3
Revision history for this message
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

Emilien, this wasn't a tripleo issue in itself, but an issue with a library version. This has been fixed already.

Changed in tripleo:
status: Triaged → Invalid
Revision history for this message
Dr. Jens Harbott (j-harbott) wrote :

Verified the proposed packages for xenial and yakkety.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-rfc3986 - 0.2.2-0ubuntu0.16.04.1

---------------
python-rfc3986 (0.2.2-0ubuntu0.16.04.1) xenial; urgency=medium

  * New upstream point release, resolving issue which causes valid
    URLS to be rejected (LP: #1632538).

 -- James Page <email address hidden> Thu, 20 Oct 2016 09:55:32 +0100

Changed in python-rfc3986 (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Robie Basak (racb) wrote : Update Released

The verification of the Stable Release Update for python-rfc3986 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-rfc3986 - 0.2.2-0ubuntu0.16.10.1

---------------
python-rfc3986 (0.2.2-0ubuntu0.16.10.1) yakkety; urgency=medium

  * New upstream point release, resolving issue which causes valid
    URLS to be rejected (LP: #1632538).

 -- James Page <email address hidden> Thu, 20 Oct 2016 09:55:32 +0100

Changed in python-rfc3986 (Ubuntu Yakkety):
status: Fix Committed → Fix Released