nova populates incomplete cell_mappings

Bug #1717545 reported by Arx Cruz
This bug affects 1 person
Affects              Status        Importance  Assigned to       Milestone
networking-bgpvpn    Fix Released  Undecided   Unassigned
puppet-nova          Fix Released  Critical    Mohammed Naser
tripleo              Fix Released  Critical    Unassigned
Arx Cruz (arxcruz)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
tags: added: alert ci
description: updated
Revision history for this message
wes hayutin (weshayutin) wrote :

This issue is evolving while we continue to debug it. Additional relevant log is
here.
http://logs.openstack.org/49/489749/4/check/gate-tripleo-ci-centos-7-scenario001-multinode-oooq/5ebfa03/logs/subnode-2/var/log/nova/nova-compute.log.txt.gz#_2017-09-15_10_40_05_291

This looks related to nova, and possibly the change to CentOS 7.4

Revision history for this message
Emilien Macchi (emilienm) wrote : Re: python 2.7.5-50 breaks PyMySQL in TripleO CI

so I think this is something that broke in Python.

See: https://centos.pkgs.org/7/centos-x86_64/python-2.7.5-58.el7.x86_64.rpm.html

2017-01-18 - Charalampos Stratakis <email address hidden> - 2.7.5-50
- Fix incorrect parsing of certain regular expressions
Resolves: rhbz#1373363

https://bugzilla.redhat.com/show_bug.cgi?id=1373363

So when upgrading from centos 7.3 to centos 7.4, we upgraded Python from python-2.7.5-48.el7.x86_64 to python-2.7.5-58.el7.x86_64, which contains the patch in 2.7.5-50.

summary: - TestNetworkBasicOps failing due puppet-nova change
+ python 2.7.5-50 breaks PyMySQL in TripleO CI
Changed in tripleo:
milestone: none → queens-1
Revision history for this message
Alan Pevec (apevec) wrote :

Initial comment:
1) file a RHEL/python BZ
2) let's see if we can work around it in pymysql

Revision history for this message
Damien Ciabrini (dciabrin) wrote :

Emilien, Alan, I'm not sure it's linked to python-2.7.5-58.

Downstream, we're using python-2.7.5-58 and python-PyMySQL-0.7.11-1 both in container images (e.g. Nova) and on bare metal (e.g. Cinder), and the overcloud deployments work.

Maybe it's due to something else that changed lately?

What makes us think it would be related to PyMySQL?

Revision history for this message
Alan Pevec (apevec) wrote :

@Damien, I don't know; I just had a quick chat with Emilien last night. For now we don't have any other reproducer except the error Wes linked in comment 1.

Revision history for this message
Alan Pevec (apevec) wrote :

@Arx, @Wes: can we get the DB dump from the overcloud where this is failing? Then we could try re-running the query on 7.3?

Revision history for this message
wes hayutin (weshayutin) wrote :

bringing up an environment now

Revision history for this message
Mike Bayer (zzzeek) wrote :

Just curious: I'm not seeing how the nature of the breakage is specific to pymysql. Is there a test case using only pymysql that illustrates what it's doing wrong?

Revision history for this message
Paul Graydon (twirrim) wrote :

Just a quick comment on the situation: I was the one who filed the bug with Red Hat. It was in turn based on a bug, previously fixed upstream in Python, that caused valid regexes to fail to parse. My understanding of the problem and the fix is that it's unlikely to break anything; it just fixes that particular case.

From my perspective, if this regex change has broken the program, the problem is as likely to be in the application as in the patch that was applied, and you should expect the program to start breaking on other distributions anyway, since the fix is in upstream Python. In fact, I know for sure that the version of Python that ships with Ubuntu 16.04 has the fix, as I stumbled across this bug when I moved code I'd developed on Ubuntu over to an instance based on a RHEL derivative and it broke.

Revision history for this message
wes hayutin (weshayutin) wrote :
Revision history for this message
Michele Baldessari (michele) wrote :

Folks, the subject here is misleading. PyMySQL was never able to parse our my.cnf* config files. See this older bug here https://github.com/PyMySQL/PyMySQL/issues/548. It is just not robust enough to do that. So, simply put, something has started forcing this parsing of the file and we need to figure out which component it is.
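For context, here is a minimal sketch of why the stock ConfigParser chokes on MariaDB-style include directives; the file content below is an assumption (not taken from the report), but the failure mode matches the traceback in the logs that follow.

# Minimal sketch (assumed file content, not from the report): an "!includedir"
# line is neither a section header nor a "key = value" option, so ConfigParser
# raises ParsingError, exactly as in the nova-api log below.
try:
    import configparser                  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2, as used in the CI jobs
import tempfile

MY_CNF = "[mysqld]\nmax_connections = 4096\n!includedir /etc/my.cnf.d\n"

with tempfile.NamedTemporaryFile(mode="w", suffix=".cnf", delete=False) as f:
    f.write(MY_CNF)
    path = f.name

parser = configparser.ConfigParser()
try:
    parser.read(path)
except configparser.ParsingError as exc:
    # e.g. "... contains parsing errors: ... '!includedir /etc/my.cnf.d\n'"
    print(exc)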

From http://logs.openstack.org/27/495827/3/gate/gate-tripleo-ci-centos-7-scenario003-multinode-oooq/fe03dfb/logs/subnode-2/var/log/nova/nova-api.log.txt.gz#_2017-09-16_20_42_14_190 we have:
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions [req-3af7bcd6-f34b-4be2-b272-0f50c75ce2d1 47aca4316a9a40ca8ac91f3c61897fc7 b1b6b239fc014287bf01d0e9ba0d3df9 - default default] Unexpected exception in API method: ParsingError: File contains parsing errors: /etc/my.cnf
 [line 17]: '!includedir /etc/my.cnf.d\n'
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/openstack/extensions.py", line 336, in wrapped
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions return f(*args, **kwargs)
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 108, in wrapper
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2017-09-16 20:42:14.190 56954 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 494, in create...

Revision history for this message
Michele Baldessari (michele) wrote :

So one potential fix is to get a pymysql version that has commit https://github.com/PyMySQL/PyMySQL/commit/af73b039ebbf93c3185b5e0fdef8cf591b943f24

With that commit I can parse our my.cnf file. And without it I get:
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    conn = pymysql.connect(host='localhost', port=3306, user='root', passwd='', db='mysql', read_default_file="my.cnf")
  File "/home/michele/Devel/openstack/PyMySQL/pymysql/__init__.py", line 90, in Connect
    return Connection(*args, **kwargs)
  File "/home/michele/Devel/openstack/PyMySQL/pymysql/connections.py", line 619, in __init__
    cfg.read(os.path.expanduser(read_default_file))
  File "/usr/lib64/python2.7/ConfigParser.py", line 305, in read
    self._read(fp, filename)
  File "/usr/lib64/python2.7/ConfigParser.py", line 546, in _read
    raise e
ConfigParser.ParsingError: File contains parsing errors: my.cnf
        [line 17]: '!includedir /etc/my.cnf.d\n'

While we *could* add that commit to our pymysql packages (or rebase them to include it), I think we should understand what has changed so that we now do parse my.cnf. I poked around nova & oslo.db but could not spot anything relevant yet.
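For reference, a standalone version of the reproducer implied by the traceback above (reconstructed from that traceback, not copied from an attachment); it needs a local MySQL/MariaDB server and a my.cnf in the working directory containing an "!includedir" line.

# Reconstructed reproducer; connection parameters mirror the traceback above
# and should be adjusted for the local environment.
import pymysql

conn = pymysql.connect(host='localhost', port=3306, user='root', passwd='',
                       db='mysql', read_default_file="my.cnf")
print(conn.get_server_info())
conn.close()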

Revision history for this message
Damien Ciabrini (dciabrin) wrote :

A quick summary of what we know and what we found so far.

Facts:
- The PyMySQL code has never read /etc/my.cnf because of what Michele said in comment #11.
- All the DB URIs that we configure in TripleO are of the form:

   connection=mysql+pymysql://user:pass@IP/service?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf

So all CI jobs error out trying to parse /etc/my.cnf. Looking at the PyMySQL code [1], it looks like PyMySQL is being passed only half of the options, namely only "read_default_group".
In that case, PyMySQL uses a default value of /etc/my.cnf as read_default_file, and the parsing breaks as explained in comment #11.

PyMySQL's read_default_* values are initialized via optional arguments of the Connection class.

SQLAlchemy is in charge of passing those arguments to PyMySQL, but SQLAlchemy itself has no knowledge of those values; it just forwards the key/values it is given by oslo.db, or whichever component parsed and validated the configuration file in the first place.

So our current guess is that something changed lately which causes the DB URI to be incorrectly parsed/validated.

Still investigating the reasons why

[1] https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/connections.py#L608-L612
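To make that failure mode concrete, here is a paraphrase of the defaulting behaviour referenced in [1] (simplified, Linux path only, not the actual library source): when only read_default_group is supplied, PyMySQL falls back to a hard-coded default file, which on Linux is /etc/my.cnf.

# Paraphrase of the PyMySQL defaulting logic referenced in [1]; simplified sketch.
def effective_default_file(read_default_group=None, read_default_file=None):
    if read_default_group and not read_default_file:
        # Only the group was passed, so the global file gets parsed instead.
        read_default_file = "/etc/my.cnf"
    return read_default_file

print(effective_default_file(read_default_group="tripleo"))
# /etc/my.cnf  -> the file ConfigParser cannot parse
print(effective_default_file(read_default_group="tripleo",
                             read_default_file="/etc/my.cnf.d/tripleo.cnf"))
# /etc/my.cnf.d/tripleo.cnf  -> the file TripleO intends to be read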

Revision history for this message
Michele Baldessari (michele) wrote :

Thanks to Gabriele, who gave us an environment, we believe we have found the culprit.
Namely, after dumping the nova_api DB we see the following:
INSERT INTO `cell_mappings` VALUES ('2017-09-18 03:12:27','2017-09-18 03:12:35',1,'00000000-0000-0000-0000-000000000000','cell0','none:///','mysql+pymysql://nova:trQurN7uwwDbhQWTnN8dCnVxQ@192.168.24.7/nova_cell0?read_default_group=tripleo'),('2017-09-18 03:12:35',NULL,2,'22db2ca7-51db-4957-b811-765791f10436','default','rabbit://guest:R8nBKQpPGpWY6aeqWmBR8nDs4@multinode-featureset007-overcloud-0.internalapi.localdomain:5672/?ssl=0','mysql+pymysql://nova:trQurN7uwwDbhQWTnN8dCnVxQ@192.168.24.7/nova?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf');

Notice how the cell0 entry only has 'read_default_group=tripleo' and does not have 'read_default_file=/etc/my.cnf.d/tripleo.cnf'. Since we are missing read_default_file, we hit the following code in pymysql https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/connections.py#L608-L612 which will set the read_default_file to '/etc/my.cnf'.

We need to understand why this cell0 row is created with only part of the connection string.
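One quick way to see the difference (an illustrative check, not something run in the report; the password is a placeholder) is to parse both connection URLs from the dump with SQLAlchemy and compare the query options they carry.

# Illustrative check: compare the query options carried by the two connection
# URLs in the cell_mappings dump above.
from sqlalchemy.engine.url import make_url

cell0 = make_url("mysql+pymysql://nova:PASSWORD@192.168.24.7/nova_cell0"
                 "?read_default_group=tripleo")
default = make_url("mysql+pymysql://nova:PASSWORD@192.168.24.7/nova"
                   "?read_default_group=tripleo"
                   "&read_default_file=/etc/my.cnf.d/tripleo.cnf")

print(dict(cell0.query))    # {'read_default_group': 'tripleo'}  <- no file option
print(dict(default.query))  # both read_default_group and read_default_file present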

summary: - python 2.7.5-50 breaks PyMySQL in TripleO CI
+ nova populates incomplete cell_mappings
Revision history for this message
Michele Baldessari (michele) wrote :

I'd put my two cents on this change being the one that broke us (since we do not do tempest runs on the standard gates, that would explain the green CI on the tripleo job?):
https://review.openstack.org/#/c/501669/

Maybe after this change under certain conditions we strip off parameters after the '&' sign in the string?

Revision history for this message
Arx Cruz (arxcruz) wrote :

@Michele

We do run tempest on these gates, as you can see in [1]; TestNetworkBasicOps is one of the last tests on the list (second from the bottom), and it passes.

1 - http://logs.openstack.org/69/501669/4/gate/gate-puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/244fbb2/logs/testr_results.html.gz

Revision history for this message
Michele Baldessari (michele) wrote :

@Arx, right, but that job does not test TripleO (see http://logs.openstack.org/69/501669/4/gate/gate-puppet-openstack-integration-4-scenario001-tempest-ubuntu-xenial/244fbb2/logs/etc/nova/nova.conf.txt.gz and note the missing extra parameters in the connection strings), so it is not relevant to this discussion.

Revision history for this message
wes hayutin (weshayutin) wrote :

Here is the review to update the checks we have on puppet-* to use tempest:

https://review.openstack.org/#/c/504444/

Revision history for this message
Mohammed Naser (mnaser) wrote :

https://review.openstack.org/#/c/504977/

I think this might solve it. It has a -1 for an unrelated failure, but the following is being used to test it:

https://review.openstack.org/#/c/504947/

I'll update it with Closes-Bug once the t-h-t change passes, so it can go through CI again without resetting the stack of changes now.

Revision history for this message
Mohammed Naser (mnaser) wrote :

I didn't want to wait all the way until the long upgrade job finished, but it did indeed pass:

http://logs.openstack.org/47/504947/3/check/

Please prioritize merging the following to resolve the issue:

https://review.openstack.org/#/c/504977/

In addition, I suggest that we add gate-tripleo-ci-centos-7-scenario jobs to puppet-nova so that we can catch this kind of issue before merge next time.

Revision history for this message
Michele Baldessari (michele) wrote :
tags: removed: alert
Changed in tripleo:
status: Triaged → Fix Released
Changed in puppet-nova:
status: New → Fix Released
importance: Undecided → Medium
assignee: nobody → Mohammed Naser (mnaser)
importance: Medium → Critical
Changed in bgpvpn:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-nova 11.4.0

This issue was fixed in the openstack/puppet-nova 11.4.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-nova 12.0.0

This issue was fixed in the openstack/puppet-nova 12.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/521945

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-nova (stable/ocata)

Reviewed: https://review.openstack.org/521945
Committed: https://git.openstack.org/cgit/openstack/puppet-nova/commit/?id=4bd704f3bb8b04de761ac6cd177f6685a783693c
Submitter: Zuul
Branch: stable/ocata

commit 4bd704f3bb8b04de761ac6cd177f6685a783693c
Author: Mohammed Naser <email address hidden>
Date: Mon Sep 18 20:07:57 2017 -0400

    Pass arguments as array for nova-manage provider

    The arguments were provided as a string which would result in
    mangling if there are any symbols in the arguments. This
    patch changes the behaviour to pass an array which will
    prevent any of those issues.

    Closes-Bug: #1717545

    Change-Id: I9c87072aaa218658b943c1ee30caa448aae8bdd7
    (cherry picked from commit 379e601c6873501818a5ce6033abefb9afde121b)
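As a rough illustration of the mangling described in the commit message (an analogy in Python, not the puppet-nova provider code): when a command is handed to a shell as a single string, everything after an unquoted '&' is treated as a separate background job, which silently truncates a connection URI exactly where read_default_file would follow; passing an argv array avoids the shell entirely.

# Analogy only (Python subprocess, not the Puppet provider). Password is a placeholder.
import subprocess

uri = ("mysql+pymysql://nova:PASSWORD@192.168.24.7/nova_cell0"
       "?read_default_group=tripleo&read_default_file=/etc/my.cnf.d/tripleo.cnf")

# Command built as one string and run through a shell: output stops at the '&'.
subprocess.call("echo " + uri, shell=True)

# Same command passed as an array: no shell involved, the full URI survives.
subprocess.call(["echo", uri])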

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-nova 10.5.0

This issue was fixed in the openstack/puppet-nova 10.5.0 release.
