Removing default mysql accounts fails during step1 when using pacemaker HA

Bug #1633113 reported by James Slagle
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Emilien Macchi

Bug Description

We have remove_default_accounts=true in our Pacemaker HA jobs, and that causes puppetlabs-mysql to access the database and remove the default accounts.

However, this fails during step1 of the deployment because the galera ocf resource is not defined in pacemaker until step2, so there is no database running during step1.

Interestingly, this does not fail the puppet run during step1, which is why our ovb HA jobs are not failing. You just end up with this error message in the puppet logs:

Oct 13 09:04:42 overcloud-controller-0 os-collect-config[2326]: 0m\n\u001b[mNotice: /File[/etc/haproxy/haproxy.cfg]/seluser: seluser changed 'unconfined_u' to 'system_u'\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile:
:Base::Haproxy/Exec[haproxy-reload]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[runtime]/Tripleo::Firewall::Post/Firewall[998 log all]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[runtime]/Tripleo::
Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all]/ensure: created\u001b[0m\n\u001b[mNotice: Finished catalog run in 75.63 seconds\u001b[0m\n", "deploy_stderr": "exception: connect failed\n\u001b[1;31m
Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mError: Could not
 prefetch mysql_user provider 'mysql': Execution of '/usr/bin/mysql -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql
/mysql.sock' (2 \"No such file or directory\")\u001b[0m\n\u001b[1;31mError: Could not prefetch mysql_database provider 'mysql': Execution of '/usr/bin/mysql -NBe show databases' returned 1: ERROR 2002 (HY000): Can't connect to
 local MySQL server through socket '/var/lib/mysql/mysql.sock' (2 \"No such file or directory\")\u001b[0m\n", "deploy_status_code": 0}
Oct 13 09:04:42 overcloud-controller-0 os-collect-config[2326]: [2016-10-13 13:04:42,401] (heat-config) [DEBUG] [2016-10-13 13:03:13,032] (heat-config) [DEBUG] Running FACTER_heat_outputs_path="/var/run/heat-config/heat-config
-puppet/865b62ce-7bbd-4c15-8f34-6c8092d6d88a" FACTER_fqdn="overcloud-controller-0.localdomain" FACTER_deploy_config_name="ControllerDeployment_Step1" puppet apply --detailed-exitcodes --modulepath /etc/puppet/modules:/opt/s
tack/puppet-modules:/usr/share/openstack-puppet/modules /var/lib/heat-config/heat-config-puppet/865b62ce-7bbd-4c15-8f34-6c8092d6d88a.pp
Oct 13 09:04:42 overcloud-controller-0 os-collect-config[2326]: [2016-10-13 13:04:42,397] (heat-config) [INFO] Return code 2

You can see the error above, but puppet still returned 2.

This can be quite misleading to people trying to debug failed deployments: they may go chasing the cause of this error without realizing that a puppet return code of 2 actually means "success". In reality, a later failure may be the real cause of their problem.
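
For reference, the log above shows puppet being run with --detailed-exitcodes, under which 2 means the run succeeded and changed some resources. A minimal Python sketch of how those codes read follows. The helper name is hypothetical, and treating only 0 and 2 as success is an assumption based on the behaviour described in this report, not on heat-config's actual code.

```python
# Meanings of `puppet apply --detailed-exitcodes` return codes.
DETAILED_EXITCODES = {
    0: "succeeded, no changes",
    1: "run failed",
    2: "succeeded, changes applied",
    4: "succeeded, some resources failed",
    6: "succeeded, changes applied and some resources failed",
}


def describe_puppet_exit(code):
    """Return (ok, meaning) for a detailed exit code (hypothetical helper)."""
    meaning = DETAILED_EXITCODES.get(code, "unknown exit code")
    # Assumption: only 0 and 2 count as a successful deployment step.
    return code in (0, 2), meaning


if __name__ == "__main__":
    # The step1 run in this report ended with "Return code 2".
    print(describe_puppet_exit(2))
```

The prefetch errors in the log did not change the return code, which is why the step1 run still reported 2.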

We should fix our manifests so that we do not attempt to access the database during step1 of pacemaker deployments.

For reference, the above log comes from this job:
http://logs.openstack.org/78/385078/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/e7ad019/

Changed in tripleo:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Emilien Macchi (emilienm)
milestone: none → ocata-1
David Hill (david-hill-ubisoft) wrote :

Does this affect Newton?

Michele Baldessari (michele) wrote :

Yes

James Slagle (james-slagle) wrote :

Emilien has provided a patch: https://review.openstack.org/#/c/386042/

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/386042
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=381e59f6e15ab4cb5316e0be3ba831fe61b318d2
Submitter: Jenkins
Branch: master

commit 381e59f6e15ab4cb5316e0be3ba831fe61b318d2
Author: Emilien Macchi <email address hidden>
Date: Thu Oct 13 11:21:02 2016 -0400

    pacemaker/mysql: wait step 2 to remove default accounts

    remove_default_accounts is a mysql::server parameter that, set to True,
    will execute some MySQL commands to cleanup MySQL defaults accounts
    created by packaging.
    In order to successfully run the commands, we need MySQL up and running,
    which is not the case at step 1 but at step 2.

    This patch make sure we run the commands at step 2 on pacemaker master
    only.

    No change for scenarios without Pacemaker.

    Change-Id: Ifad3cb40fd958d7ea606b9cd2ba4c8ec22a8e94e
    Closes-Bug: #1633113
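
The gating described in the commit message above can be sketched in Python as a truth function. This is only an illustration of the stated conditions, not the actual Puppet manifest: the function and parameter names are hypothetical, and using `step == 2` takes "at step 2" literally, since the message does not say whether later steps repeat the cleanup.

```python
def should_remove_default_accounts(step, pacemaker_enabled, is_pacemaker_master):
    """Decide whether a deployment step should clean up default MySQL accounts.

    Hypothetical helper that mirrors the commit message only.
    """
    if not pacemaker_enabled:
        # "No change for scenarios without Pacemaker": behaviour is untouched,
        # assuming remove_default_accounts is set to true as in the report.
        return True
    # With Pacemaker, MySQL is not running at step 1, so the cleanup
    # waits for step 2 and runs on the pacemaker master only.
    return step == 2 and is_pacemaker_master


if __name__ == "__main__":
    for step in (1, 2):
        print(step, should_remove_default_accounts(step, True, True))
```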

Changed in tripleo:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/393317

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/newton)

Reviewed: https://review.openstack.org/393317
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=3ae0d241f2bd26123c83017cb51c2cc5df2cb726
Submitter: Jenkins
Branch: stable/newton

commit 3ae0d241f2bd26123c83017cb51c2cc5df2cb726
Author: Emilien Macchi <email address hidden>
Date: Thu Oct 13 11:21:02 2016 -0400

    pacemaker/mysql: wait step 2 to remove default accounts

    remove_default_accounts is a mysql::server parameter that, set to True,
    will execute some MySQL commands to cleanup MySQL defaults accounts
    created by packaging.
    In order to successfully run the commands, we need MySQL up and running,
    which is not the case at step 1 but at step 2.

    This patch make sure we run the commands at step 2 on pacemaker master
    only.

    No change for scenarios without Pacemaker.

    Change-Id: Ifad3cb40fd958d7ea606b9cd2ba4c8ec22a8e94e
    Closes-Bug: #1633113
    (cherry picked from commit 381e59f6e15ab4cb5316e0be3ba831fe61b318d2)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 5.4.0

This issue was fixed in the openstack/puppet-tripleo 5.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 6.0.0

This issue was fixed in the openstack/puppet-tripleo 6.0.0 release.
