M/N upgrades - Start mongod before calling ceilometer-dbsync

Bug #1627453 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Michele Baldessari
Milestone: (none listed)

Bug Description

In major_upgrade_controller_pacemaker_2.sh we currently call ceilometer-dbsync before mongod is actually started (only galera is started at this point). This makes the dbsync hang indefinitely until the heat stack times out.
This is the process tree we see during the db-sync while mongod is down:
root 4387 0.4 0.5 315644 41812 ? Ss Sep23 1:52 /usr/bin/python /usr/bin/os-collect-config
root 6810 0.0 0.1 195516 9912 ? S 06:03 0:00 \_ /usr/bin/python /usr/bin/os-refresh-config --timeout 14400
root 6823 0.0 0.0 115248 1568 ? S 06:03 0:00 \_ /bin/bash /usr/local/bin/dib-run-parts /usr/libexec/os-refresh-config/configure.d
root 14714 0.0 0.1 227008 14876 ? S 06:03 0:00 \_ python /usr/libexec/os-refresh-config/configure.d/55-heat-config
root 14719 0.0 0.0 149708 6184 ? S 06:03 0:00 \_ python /var/lib/heat-config/hooks/script
root 14720 0.0 0.0 115380 1676 ? S 06:03 0:00 \_ /bin/bash /var/lib/heat-config/heat-config-script/49acf119-9e34-409a-8280-ddddf17e3500
root 19205 1.1 0.6 627596 51988 ? Sl 06:05 0:14 \_ /usr/bin/python2 /usr/bin/ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf

And we can see the process constantly trying to talk to a mongo server:
[pid 19205] connect(4, {sa_family=AF_INET, sin_port=htons(27017), sin_addr=inet_addr("172.16.2.7")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 19257] poll([{fd=4, events=POLLOUT}], 1, 20000) = 1 ([{fd=4, revents=POLLOUT|POLLERR|POLLHUP}])
[pid 19205] getsockopt(4, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
[pid 19205] close(4) = 0
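
For anyone wanting to check the same thing by hand, the trace above is a non-blocking connect followed by a read of SO_ERROR, which comes back as 111 (ECONNREFUSED on Linux): nothing is listening on port 27017. Below is a minimal sketch of that sequence in Python. The function name `probe_tcp` is made up for illustration and is not part of ceilometer or the upgrade scripts; the 20 second default only mirrors the poll timeout seen in the trace.

```python
import errno
import select
import socket


def probe_tcp(host, port, timeout=20.0):
    """Return 0 if a TCP connect to (host, port) succeeds, else an errno value.

    Hypothetical helper: it repeats the connect / poll / getsockopt(SO_ERROR)
    sequence visible in the strace output.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)
        rc = sock.connect_ex((host, port))
        if rc not in (0, errno.EINPROGRESS):
            # The kernel already knows the outcome (e.g. refused on loopback).
            return rc
        # Wait until the socket is writable, i.e. the connect has finished.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return errno.ETIMEDOUT
        # 0 means connected; otherwise this is the pending socket error.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()
```

On a controller in the state described here, `probe_tcp("172.16.2.7", 27017)` would be expected to return `errno.ECONNREFUSED`, assuming the address from the trace is still the configured mongo endpoint.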

Tags: upgrade
summary: - M/N upgrades - Stop the services to be moved to systemd directly in the
- migration function
+ M/N upgrades - Start mongod before calling ceilometer-dbsync
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/375979

Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/375979
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=9593981149aededef621ebda17962b1e02318328
Submitter: Jenkins
Branch: master

commit 9593981149aededef621ebda17962b1e02318328
Author: Michele Baldessari <email address hidden>
Date: Sun Sep 25 10:49:15 2016 +0200

    Start mongod before calling ceilometer-dbsync

    Currently we in major_upgrade_controller_pacemaker_2.sh we are calling
    ceilometer-dbsync before mongod is actually started (only galera is
    started at this point). This will make the dbsync hang indefinitely
    until the heat stack times out.

    Now this approach should be okay, but do note that when we start mongod
    via systemctl we are not guaranteed that it will be up on all nodes
    before we call ceilometer-dbsync. This *should* be okay because
    ceilometer-dbsync keeps retrying and eventually one of the nodes will
    be available. A completely clean fix here would be to add another
    step in heat to have the guarantee that all mongo servers are up and
    running before the dbsync call.

    Change-Id: I10c960b1e0efdeb1e55d77c25aebf1e3e67f17ca
    Closes-Bug: #1627453
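
The commit message notes that a fully clean fix would guarantee that all mongo servers are up before the dbsync call. The merged change does not do this; it relies on ceilometer-dbsync retrying. Purely as an illustration of what such a guard could look like, here is a sketch that waits until every endpoint accepts TCP connections. The function name, parameters and defaults are assumptions, and an accepted TCP connection only shows that the port is open, not that mongod is ready to serve queries.

```python
import socket
import time


def wait_for_all(endpoints, deadline_s=300.0, interval_s=2.0,
                 connect_timeout_s=5.0):
    """Wait until every (host, port) in endpoints accepts a TCP connection.

    Hypothetical helper. Returns the endpoints still unreachable when the
    deadline passes; an empty list means all of them came up.
    """
    pending = list(endpoints)
    give_up_at = time.monotonic() + deadline_s
    while pending:
        still_down = []
        for host, port in pending:
            try:
                # Open and immediately close a connection as a liveness check.
                with socket.create_connection((host, port),
                                              timeout=connect_timeout_s):
                    pass
            except OSError:
                still_down.append((host, port))
        pending = still_down
        if not pending or time.monotonic() >= give_up_at:
            break
        time.sleep(interval_s)
    return pending
```

A caller would pass the list of controller addresses on port 27017 and only run the dbsync if the returned list is empty.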

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 5.0.0.0rc2

This issue was fixed in the openstack/tripleo-heat-templates 5.0.0.0rc2 release candidate.
