The cron job update-lma-configuration lma_infrastructure_alerting removes configuration files when we apply the nagios.pp

Bug #1622628 reported by guillaume thouvenin
This bug affects 1 person
Affects      Status         Importance   Assigned to           Milestone
StackLight   Fix Released   High         guillaume thouvenin
0.10         Fix Released   High         guillaume thouvenin

Bug Description

Something very strange happened on a MOS 8 environment after an upgrade from SL 0.9.0 to 0.10.1. The upgrade followed this document: https://docs.google.com/document/d/1BE74LOkAlK5Hy1RJAtrSsfrmw8EVwR56_ucaCenN3z4

After the upgrade, nagios3 was unable to start. The configuration files prefixed with "lma_" in /etc/nagios3 were missing on all three SL nodes. To reconfigure nagios, we ran the following command several times:

# puppet apply --debug --modulepath=/etc/fuel/plugins/lma_infrastructure_alerting-1.0/puppet/modules:/etc/puppet/modules /etc/fuel/plugins/lma_infrastructure_alerting-1.0/puppet/manifests/nagios.pp

Each time, the configuration files were created but were deleted right afterwards.

We modified the file /etc/fuel/plugins/lma_infrastructure_alerting-1.0/puppet/modules/lma_infra_alerting/manifests/nagios.pp so that the cron job is not installed, and with that change the files were kept.

If we run the cron job manually on the node, we get the following error:

root@lma-2:/var/tmp# /usr/bin/flock -n /tmp/lma.lock -c "/usr/local/bin/update-lma-configuration lma_infrastructure_alerting"
Error: Could not run: Could not find file /etc/fuel/plugins/lma_infrastructure_alerting-0.10/puppet
/etc/fuel/plugins/lma_infrastructure_alerting-0.9/puppet/manifests/nagios.pp
root@lma-2:/var/tmp# ls /etc/nagios3/
apache2.conf cgi.cfg commands.cfg conf.d htpasswd.users nagios.cfg resource.cfg stylesheets
root@lma-2:/var/tmp# ls /etc/nagios3/conf.d/
cmd_notify-service-by-email-with-long-service-output.cfg generic-host_nagios2.cfg localhost_nagios2.cfg.puppet-bak
contacts_nagios2.cfg generic-service_nagios2.cfg services_nagios2.cfg.puppet-bak
extinfo_nagios2.cfg.puppet-bak hostgroups_nagios2.cfg.puppet-bak timeperiods_nagios2.cfg
root@lma-2:/var/tmp#

The lma_ configuration files have been removed.

Revision history for this message
guillaume thouvenin (guillaume-thouvenin) wrote :

There are two problems:

1) The script sets PLUGIN_PUPPET_DIR=$(ls -d /etc/fuel/plugins/"$PLUGIN_NAME"*/puppet). This works when only one version of the plugin is present, but with two versions puppet apply fails because PLUGIN_PUPPET_DIR="lma_infrastructure_alerting-0.9 lma_infrastructure_alerting-0.10". This is the error shown in the description.
2) Because the script fails, the files LAST_CHECK and LAST_CHECK_NODES are never updated, so the cron job runs over and over.
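The first problem can be reproduced in isolation. The following is a minimal sketch, not the plugin's script: it uses a temporary directory (demo_root, a hypothetical stand-in for /etc/fuel/plugins) and assumes the manifest path is built by appending /manifests/nagios.pp to PLUGIN_PUPPET_DIR, which matches the shape of the error message in the description.

```shell
# Hypothetical sandbox standing in for /etc/fuel/plugins;
# nothing on the real system is touched.
demo_root=$(mktemp -d)
PLUGIN_NAME=lma_infrastructure_alerting
mkdir -p "$demo_root/$PLUGIN_NAME-0.9/puppet" "$demo_root/$PLUGIN_NAME-0.10/puppet"

# Same pattern as in the comment above: the glob matches every installed version.
PLUGIN_PUPPET_DIR=$(ls -d "$demo_root"/"$PLUGIN_NAME"*/puppet)

# Count the matched directories (ls prints one per line when captured).
count=$(printf '%s\n' "$PLUGIN_PUPPET_DIR" | wc -l | tr -d ' ')
echo "directories matched: $count"

# Assumption: the manifest path is derived from PLUGIN_PUPPET_DIR like this.
# With two matches it becomes a single string holding two paths,
# which is not an existing file.
manifest="$PLUGIN_PUPPET_DIR/manifests/nagios.pp"
if [ ! -e "$manifest" ]; then
    echo "manifest path does not exist:"
    printf '%s\n' "$manifest"
fi

rm -rf "$demo_root"
```

With two version directories present, the variable holds both paths, so any file name derived from it points nowhere.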

We need to fix the script to check what version of the plugin is running.
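One way to avoid the ambiguity is to resolve a single directory explicitly and refuse to continue otherwise. The sketch below is an illustration only and is not the merged patch: the function name resolve_plugin_puppet_dir, its arguments, and the idea of passing the version in are all assumptions made for the example.

```shell
# resolve_plugin_puppet_dir ROOT PLUGIN_NAME [VERSION]
# Hypothetical helper: prints exactly one puppet directory,
# or fails with a message on stderr.
resolve_plugin_puppet_dir() {
    rp_root=$1
    rp_name=$2
    rp_version=${3:-}

    # An explicit version selects one directory without any globbing.
    if [ -n "$rp_version" ]; then
        rp_dir="$rp_root/$rp_name-$rp_version/puppet"
        if [ ! -d "$rp_dir" ]; then
            echo "no such directory: $rp_dir" >&2
            return 1
        fi
        printf '%s\n' "$rp_dir"
        return 0
    fi

    # Without a version, accept the glob only if it matches exactly one
    # existing directory (an unmatched glob stays literal and fails -d).
    set -- "$rp_root/$rp_name"-*/puppet
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "cannot pick a single $rp_name directory under $rp_root" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

A caller could then write PLUGIN_PUPPET_DIR=$(resolve_plugin_puppet_dir /etc/fuel/plugins lma_infrastructure_alerting 0.10) || exit 1, so that a leftover directory from an older version no longer changes which manifest is applied.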

Revision history for this message
guillaume thouvenin (guillaume-thouvenin) wrote :

I'm still testing, but if two versions of the plugin are available we should hit the bug, so I am changing the severity to High.

Changed in lma-toolchain:
importance: Medium → High
Revision history for this message
guillaume thouvenin (guillaume-thouvenin) wrote :

In fact, the problem occurs only after an upgrade. In other cases, only the version that you are installing is copied to the nodes. After an upgrade, both versions are present and you hit the bug.

Revision history for this message
guillaume thouvenin (guillaume-thouvenin) wrote :

To reproduce the bug:

- You can simulate the result of an upgrade by copying the installed plugin directory under /etc/fuel/plugins/ to a second version name. For example, if you have 0.10 installed:

cp -r /etc/fuel/plugins/lma_infrastructure_alerting-0.10/ /etc/fuel/plugins/lma_infrastructure_alerting-0.9

Then modify /var/cache/lma_last_astute_yaml.md5sum to force the execution of the cron job.

After one minute you will see that all "lma_*" files have disappeared from the /etc/nagios3/conf.d directory.

Changed in lma-toolchain:
assignee: LMA-Toolchain Fuel Plugins (mos-lma-toolchain) → guillaume thouvenin (guillaume-thouvenin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-plugin-lma-infrastructure-alerting (master)

Fix proposed to branch: master
Review: https://review.openstack.org/369211

Changed in lma-toolchain:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-plugin-lma-infrastructure-alerting (master)

Reviewed: https://review.openstack.org/369211
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-infrastructure-alerting/commit/?id=bccb10031c7ad120bbdd939067a0af3097c57d84
Submitter: Jenkins
Branch: master

commit bccb10031c7ad120bbdd939067a0af3097c57d84
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Sep 13 10:06:33 2016 +0200

    Modify the cron job to use specific version of the plugin

    This patch modifies the cron job used to update the nagios configuration
    to solve a bug that occurs when several version of the plugin are
    available. This happens after an upgrade procedure.

    Change-Id: I41831e43707e7ca5e88fd0a329508ad4813d26bb
    Closes-Bug: #1622628

Changed in lma-toolchain:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-plugin-lma-infrastructure-alerting (stable/0.10)

Fix proposed to branch: stable/0.10
Review: https://review.openstack.org/373959

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-plugin-lma-infrastructure-alerting (stable/0.10)

Reviewed: https://review.openstack.org/373959
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-lma-infrastructure-alerting/commit/?id=68ecd65fc21b4ac1677a589206b41f3a59469f7a
Submitter: Jenkins
Branch: stable/0.10

commit 68ecd65fc21b4ac1677a589206b41f3a59469f7a
Author: Guillaume Thouvenin <email address hidden>
Date: Tue Sep 13 10:06:33 2016 +0200

    Modify the cron job to use specific version of the plugin

    This patch modifies the cron job used to update the nagios configuration
    to solve a bug that occurs when several version of the plugin are
    available. This happens after an upgrade procedure.

    Change-Id: I41831e43707e7ca5e88fd0a329508ad4813d26bb
    Closes-Bug: #1622628
    (cherry picked from commit bccb10031c7ad120bbdd939067a0af3097c57d84)

Changed in lma-toolchain:
status: Fix Committed → Fix Released