M/N upgrade: blockstorage fails to converge.

Bug #1633073 reported by Sofer Athlan-Guyot
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

Hi,

During a full upgrade from Mitaka to Newton, the convergence step on the blockstorage node fails with:

    Error: Could not find dependency Exec[wait-for-settle] for Pacemaker::Resource::Systemd[openstack-cinder-volume]

This is on the blockstorage node! It believes that it's the bootstrap
node and tries to create the pacemaker resource.

The reason seems to be linked to the commit
b345dbea16ad3edd600c62848d8ee116f4df16ee.

The bootstrap_nodeid now looks like this on each node:

[stack@instack ~]$ nova list
+--------------------------------------+---------------------------+--------+------------+-------------+---------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+---------------------------+--------+------------+-------------+---------------------+
| 750b6855-e59f-4556-9a3f-0cde0b6dfe71 | overcloud-blockstorage-0 | ACTIVE | - | Running | ctlplane=192.0.2.11 |
| 522c9203-e906-47c4-a94b-b00cf8b20397 | overcloud-cephstorage-0 | ACTIVE | - | Running | ctlplane=192.0.2.7 |
| afdb238d-fea5-4ca0-911f-eea7bee9fca5 | overcloud-controller-0 | ACTIVE | - | Running | ctlplane=192.0.2.13 |
| 365f459d-a377-4335-b5d6-06df69811c9f | overcloud-controller-1 | ACTIVE | - | Running | ctlplane=192.0.2.10 |
| 8d23bc79-864b-4c8f-bb8d-0e9419568041 | overcloud-controller-2 | ACTIVE | - | Running | ctlplane=192.0.2.12 |
| 5c752ac3-b219-4036-9986-a1c2554d2a7b | overcloud-novacompute-0 | ACTIVE | - | Running | ctlplane=192.0.2.9 |
| 4801bde7-2f2c-429c-94cf-bfec580374f4 | overcloud-objectstorage-0 | ACTIVE | - | Running | ctlplane=192.0.2.8 |
+--------------------------------------+---------------------------+--------+------------+-------------+---------------------+

    ==> 192.0.2.11
    bootstrap_nodeid: overcloud-blockstorage-0
    bootstrap_nodeid_ip: 192.0.2.11
    ==> 192.0.2.7
    bootstrap_nodeid: overcloud-cephstorage-0
    bootstrap_nodeid_ip: 192.0.2.7
    ==> 192.0.2.13
    bootstrap_nodeid: overcloud-controller-0
    bootstrap_nodeid_ip: 192.0.2.13
    ==> 192.0.2.10
    bootstrap_nodeid: overcloud-controller-0
    bootstrap_nodeid_ip: 192.0.2.13
    ==> 192.0.2.12
    bootstrap_nodeid: overcloud-controller-0
    bootstrap_nodeid_ip: 192.0.2.13
    ==> 192.0.2.9
    bootstrap_nodeid: overcloud-novacompute-0
    bootstrap_nodeid_ip: 192.0.2.9
    ==> 192.0.2.8
    bootstrap_nodeid: overcloud-objectstorage-0
    bootstrap_nodeid_ip: 192.0.2.8

So every role in puppet-tripleo that checks the bootstrap_node, assuming
that a match means the node is the pacemaker master, gets the wrong
answer on everything but the controller nodes.

In the case of this bug, this is the kind of code I'm talking about:

  $bootstrap_node = hiera('bootstrap_nodeid'),

  if $::hostname == downcase($bootstrap_node) {
    $pacemaker_master = true
  } else {
    $pacemaker_master = false
  }
  ...
  if $step >= 5 and $pacemaker_master {
    pacemaker::resource::service { $::cinder::params::volume_service :
      op_params => 'start timeout=200s stop timeout=200s',
    }
  }

in manifests/profile/pacemaker/cinder/volume.pp

So the bootstrap_nodeid is assumed to identify the pacemaker master, which
is not true anymore.

The tripleo code base is filled with this idiom.
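
To make the effect concrete, here is a minimal sketch that replays the hostname comparison against the bootstrap_nodeid values listed above. It is only a Python model of the Puppet check, not tripleo code: the dictionary and the function name are made up for illustration, and it assumes that $::hostname on each node equals the name shown by nova list.

```python
# Hypothetical model of the per-node hiera data quoted in this report.
# Key: node hostname (assumed equal to the nova instance name).
# Value: the bootstrap_nodeid that node sees.
bootstrap_nodeid_by_host = {
    "overcloud-blockstorage-0": "overcloud-blockstorage-0",
    "overcloud-cephstorage-0": "overcloud-cephstorage-0",
    "overcloud-controller-0": "overcloud-controller-0",
    "overcloud-controller-1": "overcloud-controller-0",
    "overcloud-controller-2": "overcloud-controller-0",
    "overcloud-novacompute-0": "overcloud-novacompute-0",
    "overcloud-objectstorage-0": "overcloud-objectstorage-0",
}


def is_pacemaker_master(hostname, bootstrap_nodeid):
    """Model of the Puppet test: $::hostname == downcase($bootstrap_node)."""
    return hostname == bootstrap_nodeid.lower()


# Every node whose own name matches its bootstrap_nodeid would set
# $pacemaker_master = true and try to create pacemaker resources.
self_declared_masters = sorted(
    host
    for host, nodeid in bootstrap_nodeid_by_host.items()
    if is_pacemaker_master(host, nodeid)
)

for host in self_declared_masters:
    print(host)
```

With the data above, every non-controller node passes the test along with overcloud-controller-0, which matches the failure seen on overcloud-blockstorage-0.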

Changed in tripleo:
importance: Undecided → Critical
milestone: none → newton-rc3
Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

This is a duplicate of https://bugs.launchpad.net/tripleo/+bug/1628912 which has been fixed.

Changed in tripleo:
status: New → Fix Released