n->o: cinder upgrade with pacemaker can fail.

Bug #1709315 reported by Sofer Athlan-Guyot on 2017-08-08
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
tripleo
Critical
Sofer Athlan-Guyot

Bug Description

Hi, originally reported there https://bugzilla.redhat.com/show_bug.cgi?id=1479327.

When doing upgrade of newton to ocata upgrade: major upgrade composable step fails on composable roles deployment while running cinder-manage db sync:

[stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerUpgrade_Step5.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 2c5739e7-6099-4830-98ef-eb610d747866
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [Gathering Facts] *********************************************************
    ok: [localhost]

    TASK [Sync cinder DB] **********************************************************
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["cinder-manage", "db", "sync"], "delta": "0:00:01.579537", "end": "2017-08-08 11:10:19.235581", "failed": true, "rc": 1, "start": "2017-08-08 11:10:17.656044", "stderr": "Option \"logdir\" from group \"DEFAULT\" is deprecated. Use option \"log-dir\" from group \"DEFAULT\".", "stderr_lines": ["Option \"logdir\" from group \"DEFAULT\" is deprecated. Use option \"log-dir\" from group \"DEFAULT\"."], "stdout": "", "stdout_lines": []}
     to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/84534f12-7925-4c8c-96c8-b2fd69857f54_playbook.retry

    PLAY RECAP *********************************************************************
    localhost : ok=1 changed=0 unreachable=0 failed=1

    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.ServiceApiUpgrade_Step5:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: f3714032-4cce-4f76-9797-18303b065d5e
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.AllNodesDeploySteps.CephStorageUpgrade_Step5:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: ae7b3aab-8b18-433a-bbff-97aa4148ce83
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted

We have this exception in the cinder-manage.log:

2017-08-08 11:10:19.121 359275 INFO migrate.versioning.api [-] 89 -> 90...
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (pymysql.err.InternalError) (1060, u"Duplicate column name 'race_preventer'") [SQL: u'\nALTER TABLE workers ADD race_preventer INTEGER NOT NULL DEFAULT 0']
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters context)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters result = self._query(query)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters conn.query(q)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 841, in query
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1029, in _read_query_result
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters result.read()
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1312, in read
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters first_packet = self.connection._read_packet()
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters packet.check_error()
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters err.raise_mysql_exception(self._data)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters raise errorclass(errno, errval)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1060, u"Duplicate column name 'race_preventer'")
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters
2017-08-08 11:10:19.140 359275 CRITICAL cinder [-] DBError: (pymysql.err.InternalError) (1060, u"Duplicate column name 'race_preventer'") [SQL: u'\nALTER TABLE workers ADD race_preventer INTEGER NOT NULL DEFAULT 0']

on ctl0, and we can see that the migration already happened on ctl1:

2017-08-08 11:10:18.959 349866 DEBUG migrate.versioning.repository [-] Config: OrderedDict([('db_settings', OrderedDict([('__name__', 'db_settings'), ('repository_id', 'cinder'), ('version_table', 'migrate_version'), ('required_dbs', '[]')]))]) __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:83
2017-08-08 11:10:18.969 349866 INFO migrate.versioning.api [-] 79 -> 80...
2017-08-08 11:10:18.977 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.977 349866 INFO migrate.versioning.api [-] 80 -> 81...
2017-08-08 11:10:18.982 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.983 349866 INFO migrate.versioning.api [-] 81 -> 82...
2017-08-08 11:10:18.988 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.989 349866 INFO migrate.versioning.api [-] 82 -> 83...
2017-08-08 11:10:18.994 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.995 349866 INFO migrate.versioning.api [-] 83 -> 84...
2017-08-08 11:10:19.000 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.000 349866 INFO migrate.versioning.api [-] 84 -> 85...
2017-08-08 11:10:19.025 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.025 349866 INFO migrate.versioning.api [-] 85 -> 86...
2017-08-08 11:10:19.053 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.053 349866 INFO migrate.versioning.api [-] 86 -> 87...
2017-08-08 11:10:19.068 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.068 349866 INFO migrate.versioning.api [-] 87 -> 88...
2017-08-08 11:10:19.101 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.101 349866 INFO migrate.versioning.api [-] 88 -> 89...
2017-08-08 11:10:19.117 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.117 349866 INFO migrate.versioning.api [-] 89 -> 90...
2017-08-08 11:10:19.142 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.142 349866 INFO migrate.versioning.api [-] 90 -> 91...
2017-08-08 11:10:19.175 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.175 349866 INFO migrate.versioning.api [-] 91 -> 92...
2017-08-08 11:10:19.181 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.182 349866 INFO migrate.versioning.api [-] 92 -> 93...
2017-08-08 11:10:19.187 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.188 349866 INFO migrate.versioning.api [-] 93 -> 94...
2017-08-08 11:10:19.194 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.194 349866 INFO migrate.versioning.api [-] 94 -> 95...
2017-08-08 11:10:19.202 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.202 349866 INFO migrate.versioning.api [-] 95 -> 96...
2017-08-08 11:10:19.210 349866 INFO migrate.versioning.api [-] done

hence the duplicate error.

Fix proposed to branch: master
Review: https://review.openstack.org/491799

Changed in tripleo:
assignee: nobody → Sofer Athlan-Guyot (sofer-athlan-guyot)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/491799
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=42d8a1c9440d4f37830bdcbda9c88a6bf39265f5
Submitter: Jenkins
Branch: master

commit 42d8a1c9440d4f37830bdcbda9c88a6bf39265f5
Author: Sofer Athlan-Guyot <email address hidden>
Date: Tue Aug 8 15:18:42 2017 +0200

    Make cinder-manage db sync run on only one controller during upgrade

    We got to ensure that the cinder-manage db sync is run on only one
    controller.

    Change-Id: I88a6aa4c49d893b95a26795fbfcf163a780fd0bc
    Closes-Bug: #1709315

Changed in tripleo:
status: In Progress → Fix Released

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0rc1 release candidate.

Reviewed: https://review.openstack.org/494976
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=5c3cbe721dcc443b7b7891c3a94e4b8617ecca0b
Submitter: Jenkins
Branch: stable/ocata

commit 5c3cbe721dcc443b7b7891c3a94e4b8617ecca0b
Author: Sofer Athlan-Guyot <email address hidden>
Date: Tue Aug 8 15:18:42 2017 +0200

    Make cinder-manage db sync run on only one controller during upgrade

    We got to ensure that the cinder-manage db sync is run on only one
    controller.

    Change-Id: I88a6aa4c49d893b95a26795fbfcf163a780fd0bc
    Closes-Bug: #1709315
    (cherry picked from commit 42d8a1c9440d4f37830bdcbda9c88a6bf39265f5)

tags: added: in-stable-ocata

This issue was fixed in the openstack/tripleo-heat-templates 6.2.2 release.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers