n->o: cinder upgrade with pacemaker can fail.

Bug #1709315 reported by Sofer Athlan-Guyot
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Sofer Athlan-Guyot
Milestone: (none listed)

Bug Description

Hi, this was originally reported at https://bugzilla.redhat.com/show_bug.cgi?id=1479327.

During a Newton to Ocata upgrade, the major upgrade composable step fails on a composable roles deployment while running cinder-manage db sync:

[stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerUpgrade_Step5.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 2c5739e7-6099-4830-98ef-eb610d747866
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [Gathering Facts] *********************************************************
    ok: [localhost]

    TASK [Sync cinder DB] **********************************************************
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["cinder-manage", "db", "sync"], "delta": "0:00:01.579537", "end": "2017-08-08 11:10:19.235581", "failed": true, "rc": 1, "start": "2017-08-08 11:10:17.656044", "stderr": "Option \"logdir\" from group \"DEFAULT\" is deprecated. Use option \"log-dir\" from group \"DEFAULT\".", "stderr_lines": ["Option \"logdir\" from group \"DEFAULT\" is deprecated. Use option \"log-dir\" from group \"DEFAULT\"."], "stdout": "", "stdout_lines": []}
     to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/84534f12-7925-4c8c-96c8-b2fd69857f54_playbook.retry

    PLAY RECAP *********************************************************************
    localhost : ok=1 changed=0 unreachable=0 failed=1

    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.ServiceApiUpgrade_Step5:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: f3714032-4cce-4f76-9797-18303b065d5e
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.AllNodesDeploySteps.CephStorageUpgrade_Step5:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: ae7b3aab-8b18-433a-bbff-97aa4148ce83
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted

We have this exception in the cinder-manage.log:

2017-08-08 11:10:19.121 359275 INFO migrate.versioning.api [-] 89 -> 90...
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (pymysql.err.InternalError) (1060, u"Duplicate column name 'race_preventer'") [SQL: u'\nALTER TABLE workers ADD race_preventer INTEGER NOT NULL DEFAULT 0']
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters context)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters result = self._query(query)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters conn.query(q)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 841, in query
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1029, in _read_query_result
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters result.read()
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1312, in read
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters first_packet = self.connection._read_packet()
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters packet.check_error()
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters err.raise_mysql_exception(self._data)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters raise errorclass(errno, errval)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1060, u"Duplicate column name 'race_preventer'")
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters
2017-08-08 11:10:19.140 359275 CRITICAL cinder [-] DBError: (pymysql.err.InternalError) (1060, u"Duplicate column name 'race_preventer'") [SQL: u'\nALTER TABLE workers ADD race_preventer INTEGER NOT NULL DEFAULT 0']

The exception above is from ctl0. The log on ctl1 shows that the same migrations were already running there:

2017-08-08 11:10:18.959 349866 DEBUG migrate.versioning.repository [-] Config: OrderedDict([('db_settings', OrderedDict([('__name__', 'db_settings'), ('repository_id', 'cinder'), ('version_table', 'migrate_version'), ('required_dbs', '[]')]))]) __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:83
2017-08-08 11:10:18.969 349866 INFO migrate.versioning.api [-] 79 -> 80...
2017-08-08 11:10:18.977 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.977 349866 INFO migrate.versioning.api [-] 80 -> 81...
2017-08-08 11:10:18.982 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.983 349866 INFO migrate.versioning.api [-] 81 -> 82...
2017-08-08 11:10:18.988 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.989 349866 INFO migrate.versioning.api [-] 82 -> 83...
2017-08-08 11:10:18.994 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.995 349866 INFO migrate.versioning.api [-] 83 -> 84...
2017-08-08 11:10:19.000 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.000 349866 INFO migrate.versioning.api [-] 84 -> 85...
2017-08-08 11:10:19.025 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.025 349866 INFO migrate.versioning.api [-] 85 -> 86...
2017-08-08 11:10:19.053 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.053 349866 INFO migrate.versioning.api [-] 86 -> 87...
2017-08-08 11:10:19.068 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.068 349866 INFO migrate.versioning.api [-] 87 -> 88...
2017-08-08 11:10:19.101 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.101 349866 INFO migrate.versioning.api [-] 88 -> 89...
2017-08-08 11:10:19.117 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.117 349866 INFO migrate.versioning.api [-] 89 -> 90...
2017-08-08 11:10:19.142 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.142 349866 INFO migrate.versioning.api [-] 90 -> 91...
2017-08-08 11:10:19.175 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.175 349866 INFO migrate.versioning.api [-] 91 -> 92...
2017-08-08 11:10:19.181 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.182 349866 INFO migrate.versioning.api [-] 92 -> 93...
2017-08-08 11:10:19.187 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.188 349866 INFO migrate.versioning.api [-] 93 -> 94...
2017-08-08 11:10:19.194 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.194 349866 INFO migrate.versioning.api [-] 94 -> 95...
2017-08-08 11:10:19.202 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.202 349866 INFO migrate.versioning.api [-] 95 -> 96...
2017-08-08 11:10:19.210 349866 INFO migrate.versioning.api [-] done

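Hence the duplicate column error: both controllers ran cinder-manage db sync at the same time, and ctl1 added the race_preventer column first.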
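
The overlap can be checked mechanically from the two logs. What follows is only a minimal sketch, assuming the log line format shown above. The names parse_steps and overlapping_steps and the one-second window are illustrative assumptions, not part of any cinder or tripleo tool, and the sample lines are copied from the excerpts in this report:

```python
import re
from datetime import datetime

# Matches the start of a sqlalchemy-migrate step, for example:
# "2017-08-08 11:10:19.117 349866 INFO migrate.versioning.api [-] 89 -> 90..."
STEP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<pid>\d+) "
    r"INFO migrate\.versioning\.api \[-\] (?P<src>\d+) -> (?P<dst>\d+)\.\.\.$"
)


def parse_steps(log_text):
    """Return {(src, dst): start_time} for each migration step started in the log."""
    steps = {}
    for line in log_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            started = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            steps[(int(match.group("src")), int(match.group("dst")))] = started
    return steps


def overlapping_steps(log_a, log_b, window_seconds=1.0):
    """List (step, gap) for steps both logs started within window_seconds of each other."""
    steps_a, steps_b = parse_steps(log_a), parse_steps(log_b)
    hits = []
    for step in sorted(set(steps_a) & set(steps_b)):
        gap = abs((steps_a[step] - steps_b[step]).total_seconds())
        if gap <= window_seconds:
            hits.append((step, gap))
    return hits


# Sample lines taken from the ctl0 and ctl1 excerpts in this report.
CTL0 = "2017-08-08 11:10:19.121 359275 INFO migrate.versioning.api [-] 89 -> 90..."
CTL1 = """\
2017-08-08 11:10:19.101 349866 INFO migrate.versioning.api [-] 88 -> 89...
2017-08-08 11:10:19.117 349866 INFO migrate.versioning.api [-] 89 -> 90...
"""

for (src, dst), gap in overlapping_steps(CTL0, CTL1):
    print("step %d -> %d started on both nodes %.3f s apart" % (src, dst, gap))
```

With these sample lines, the only step reported is 89 -> 90, which is the migration that adds the race_preventer column.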

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/491799

Changed in tripleo:
assignee: nobody → Sofer Athlan-Guyot (sofer-athlan-guyot)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/491799
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=42d8a1c9440d4f37830bdcbda9c88a6bf39265f5
Submitter: Jenkins
Branch: master

commit 42d8a1c9440d4f37830bdcbda9c88a6bf39265f5
Author: Sofer Athlan-Guyot <email address hidden>
Date: Tue Aug 8 15:18:42 2017 +0200

    Make cinder-manage db sync run on only one controller during upgrade

    We got to ensure that the cinder-manage db sync is run on only one
    controller.

    Change-Id: I88a6aa4c49d893b95a26795fbfcf163a780fd0bc
    Closes-Bug: #1709315
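
The general idea of running the sync on a single controller can be sketched as follows. This is an illustration only, not the content of the change under review (see the review link for the actual template change). The helper names is_bootstrap_node and sync_db_once, the notion of a designated "bootstrap" hostname, and the localdomain suffix in the example are assumptions made for the sketch:

```python
import subprocess


def is_bootstrap_node(local_hostname, bootstrap_hostname):
    """True when this host is the one designated to run one-off tasks.

    Compares short hostnames case-insensitively (hypothetical convention).
    """
    local_short = local_hostname.split(".")[0].lower()
    bootstrap_short = bootstrap_hostname.split(".")[0].lower()
    return local_short == bootstrap_short


def sync_db_once(local_hostname, bootstrap_hostname, run=subprocess.check_call):
    """Run `cinder-manage db sync` only on the designated node.

    Returns True if the sync was run here, False if it was skipped.
    `run` is injectable so the logic can be exercised without cinder installed.
    """
    if not is_bootstrap_node(local_hostname, bootstrap_hostname):
        return False
    run(["cinder-manage", "db", "sync"])
    return True


if __name__ == "__main__":
    # Dry run with a recording stub instead of the real command.
    recorded = []
    for host in ("ctl0.localdomain", "ctl1.localdomain"):
        ran = sync_db_once(host, "ctl0", run=recorded.append)
        print("%s: %s" % (host, "synced" if ran else "skipped"))
    print("commands issued: %d" % len(recorded))
```

With only one node issuing the command, the schema migrations are applied by a single process, so the second ALTER TABLE that caused the failure above would not be attempted.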

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/494976

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.0.0rc1

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ocata)

Reviewed: https://review.openstack.org/494976
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=5c3cbe721dcc443b7b7891c3a94e4b8617ecca0b
Submitter: Jenkins
Branch: stable/ocata

commit 5c3cbe721dcc443b7b7891c3a94e4b8617ecca0b
Author: Sofer Athlan-Guyot <email address hidden>
Date: Tue Aug 8 15:18:42 2017 +0200

    Make cinder-manage db sync run on only one controller during upgrade

    We got to ensure that the cinder-manage db sync is run on only one
    controller.

    Change-Id: I88a6aa4c49d893b95a26795fbfcf163a780fd0bc
    Closes-Bug: #1709315
    (cherry picked from commit 42d8a1c9440d4f37830bdcbda9c88a6bf39265f5)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 6.2.2

This issue was fixed in the openstack/tripleo-heat-templates 6.2.2 release.
