tripleo::profile::base classes that need to sync db do not work with multiple controllers

Bug #1600149 reported by Michele Baldessari
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
puppet-tripleo   Fix Released   Undecided    Unassigned

Bug Description

Within the Next Generation HA architecture there will be a bunch of services running via systemd
on the controllers (see https://review.openstack.org/#/c/299628/ for more info).

This means that we simply switch the services not managed by pacemaker to their corresponding
tripleo::profile::base::<service> class. The problem is that in these classes sync_db defaults
to true, and (it seems?) that parameter cannot easily be changed via heat so that it is true only on the bootstrap node.

The following happens when we simply switch some services to systemd on the controllers:
1) The neutron db sync fails because the table already exists:
Jul 8 00:50:39 localhost os-collect-config: [neutron-db-sync]/returns: File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1138, in read
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: first_packet = self.connection._read_packet()
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 906, in _read_packet
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: packet.check_error()
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 367, in check_error
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: err.raise_mysql_exception(self._data)
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 120, in raise_mysql_exception
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: _check_mysql_exception(errinfo)
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 115, in _check_mysql_exception
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: raise InternalError(errno, errorvalue)
Notice: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns: oslo_db.exception.DBError: (pymysql.err.InternalError) (1050, u"Table 'lbaas_healthmonitors' already exists") [SQL: u"\nCREATE TABLE lbaas_healthmonitors (\n\ttenant_id VARCHAR(255), \n\tid VARCHAR(36) NOT NULL, \n\ttype ENUM('PING','TCP','HTTP','HTTPS') NOT NULL, \n\tdelay INTEGER NOT NULL, \n\ttimeout INTEGER NOT NULL, \n\tmax_retries INTEGER NOT NULL, \n\thttp_method VARCHAR(16), \n\turl_path VARCHAR(255), \n\texpected_codes VARCHAR(64), \n\tstatus

2) It will fail on the creation of the specific service user because the user already exists.

For now our workaround in https://review.openstack.org/#/c/338387/ is the following:
 class tripleo::profile::base::nova::api (
-  $step = hiera('step'),
-  $sync_db = true,
+  $bootstrap_node = hiera('bootstrap_nodeid'),
+  $step = hiera('step'),
 ) {
+  if $::hostname == downcase($bootstrap_node) {
+    $sync_db = true
+  } else {
+    $sync_db = false
+  }

We were wondering if there are better approaches to solve this issue though.

Giulio Fidente (gfidente) wrote :

Right :(

I think we need to move the logic from the pacemaker roles into the base roles, because it's used by both!

Changed in puppet-tripleo:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/339423

OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master)

Change abandoned by Giulio Fidente (<email address hidden>) on branch: master
Review: https://review.openstack.org/339423
Reason: we can close this and implement the same change for all services in a single change at https://review.openstack.org/#/c/338387/

Michele Baldessari (michele) wrote :

We are currently working via https://review.openstack.org/#/c/338387/ to fix this.

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/338387
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=5927148c4b5813180204c2983b5c95b69a2ad265
Submitter: Jenkins
Branch: master

commit 5927148c4b5813180204c2983b5c95b69a2ad265
Author: Michele Baldessari <email address hidden>
Date: Wed Jul 13 16:30:45 2016 -0400

    Make ::tripleo::profile::base classes work with multiple nodes

    In the Next Generation HA architecture a number of active/active services
    will be run via systemd. In order for this to work we need to make sure that
    the sync_db operation only takes place on the bootstrap node, just like it is
    done today for the pacemaker profiles.

    We do this by removing sync_db as a parameter and instead set it to true
    or false depending if the hostname matches the bootstrap_node as it is done
    today in the pacemaker role.

    Note that we call hiera('bootstrap_nodeid', undef) because if a profile
    is included on a non controller node that variable will be undefined.

    The following testing was done:
    - HA puppet-pacemaker.yaml scenario with three computes
    - NonHA with one controller
    - NonHA with three controllers

    Fixes-Bug: 1600149

    Co-Author: <email address hidden>

    Change-Id: I04a7b9e3c18627ea512000a34357acb7f27d6e0e
    Implements: blueprint ha-lightweight-architecture
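
For illustration only, the pattern described in the commit message might look roughly like the sketch below. This is not the merged code: the profile name tripleo::profile::base::example::api and the wrapped ::example::api class are hypothetical placeholders. As in the diff from the bug description, downcase() is assumed to come from puppetlabs-stdlib, and the claim that the lookup is undefined on non-controller nodes is taken from the commit message.

```puppet
# Hypothetical profile used only to illustrate the pattern; neither
# "example::api" nor "::example::api" is a real puppet-tripleo class.
class tripleo::profile::base::example::api (
  # The second argument to hiera() is the default value. Per the commit
  # message, bootstrap_nodeid is undefined on non-controller nodes, so
  # undef is used there instead of failing the lookup.
  $bootstrap_node = hiera('bootstrap_nodeid', undef),
  $step           = hiera('step'),
) {
  # sync_db is no longer a class parameter: it is derived from whether
  # this host is the bootstrap node.
  if $::hostname == downcase($bootstrap_node) {
    $sync_db = true
  } else {
    $sync_db = false
  }

  # Pass the derived value to the (placeholder) service class so that
  # only the bootstrap node runs the database sync.
  class { '::example::api':
    sync_db => $sync_db,
  }
}
```

With this shape, every node includes the same profile and no per-node heat parameter is needed. Only the host whose name matches bootstrap_nodeid ends up with sync_db set to true.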

Changed in puppet-tripleo:
status: Confirmed → Fix Released