The LMA collector can't be installed and deployed after the initial deployment

Bug #1573081 reported by Simon Pasquier
This bug affects 2 people
Affects             Status      Importance  Assigned to                 Milestone
Fuel for OpenStack  Invalid     High        Roman Prykhodchenko
8.0.x               Incomplete  High        Simon Pasquier
Mitaka              Invalid     High        Roman Prykhodchenko
StackLight          Invalid     High        LMA-Toolchain Fuel Plugins

Bug Description

Environment:

MOS 8 with LMA 0.9.0

Steps to reproduce:

1. Create an environment with 1 controller (node-1) & 1 compute (node-2)
2. Deploy it and wait for completion
3. Install the LMA collector, ES and InfluxDB plugins.
4. Enable and configure the plugins for the existing environment.
5. Assign the ES and InfluxDB roles to a new node (node-3).
6. Deploy it and wait for completion.
7. Re-execute the post-deployment tasks for the 2 original nodes to complete the installation of the LMA collector.
  fuel nodes --env 1 --node 1,2 --start post_deployment_start

Expected result:

The environment is ready.

Actual result:

Step 7 fails with:

> Could not find data item lma_collector in any Hiera data file and no default supplied at /etc/fuel/plugins/lma_collector-0.9/puppet/manifests/check_environment_configuration.pp:18 on node node-1

The initial investigation shows that there's no lma_collector entry in the /etc/astute.yaml file on node-1. For some unknown reason, Nailgun/Astute doesn't push this file.
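
A quick way to confirm this on a node is to check whether the plugin sections exist as top-level keys in the file. The following is only a minimal sketch: the helper names `has_top_level_key` and `missing_sections` are hypothetical, and it assumes the plugin data appears as unindented `key:` lines in /etc/astute.yaml.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE ($1) has a top-level key KEY ($2).
# Assumes block-style YAML where top-level keys start in column 0.
has_top_level_key() {
    grep -q "^$2:" "$1"
}

# Hypothetical helper: print every requested key that is missing from FILE.
missing_sections() {
    file=$1
    shift
    for key in "$@"; do
        has_top_level_key "$file" "$key" || printf '%s\n' "$key"
    done
}
```

For example, `missing_sections /etc/astute.yaml lma_collector elasticsearch_kibana influxdb_grafana` run on node-1 would list the sections that were not delivered to that node.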

Workaround:

1. Isolate the 'lma_collector', 'elasticsearch_kibana', 'influxdb_grafana' and 'lma_infrastructure_alerting' parts from /etc/astute.yaml on node-3.
2. On node-1 and node-2, create a file named /etc/override/plugins.yaml with the content from step 1.
3. Run the post_deployment tasks on node-1 and node-2.
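
Step 1 of the workaround can be sketched as a small shell function. This is an illustration under stated assumptions, not a tested procedure: the function name `extract_sections` is hypothetical, and it assumes that /etc/astute.yaml is block-style YAML in which each plugin section starts with an unindented `key:` line.

```shell
#!/bin/sh
# Hypothetical helper: print only the named top-level sections of a YAML file.
# Usage: extract_sections FILE KEY [KEY...]
# Assumes block-style YAML where top-level keys start in column 0 as "key:".
extract_sections() {
    file=$1
    shift
    keys=$*
    awk -v keys="$keys" '
        BEGIN {
            n = split(keys, k, " ")
            for (i = 1; i <= n; i++) want[k[i]] = 1
        }
        # A line starting in column 0 (not a comment or list item)
        # opens a new top-level section.
        /^[^ \t#-]/ {
            key = $0
            sub(/:.*/, "", key)
            keep = (key in want)
        }
        # Print every line that belongs to a wanted section.
        keep { print }
    ' "$file"
}
```

On node-3 this could be used as `extract_sections /etc/astute.yaml lma_collector elasticsearch_kibana influxdb_grafana lma_infrastructure_alerting > plugins.yaml`, with the resulting file then copied to /etc/override/plugins.yaml on node-1 and node-2 as described in step 2.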

Changed in lma-toolchain:
importance: Undecided → High
description: updated
Marcin Iwinski (iwi) wrote :

It looks like astute.yaml actually gets pushed (its timestamp changes), but the new (plugin-specific) data is not there.

Simon Pasquier (simon-pasquier) wrote :

Comments from IRC:

6:14 PM <ikalnitsky> pasquier-s: hm.. I'm afraid reexecution of post-deployment tasks won't trigger serialization.
6:32 PM <ikalnitsky> pasquier-s: i've made some brief investigation, and yeah, it seems like astute uploads facts to nodes only if there're some tasks in main deployment graph. could you please check it out by running some task from the main graph?

description: updated
Marcin Iwinski (iwi) wrote :

I ran the hiera and globals tasks. The timestamp of astute.yaml was updated, but the data still wasn't there.

Dmitry Klenov (dklenov)
tags: added: area-python
Dmitry Klenov (dklenov)
Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
milestone: none → 10.0
importance: Undecided → High
status: New → Confirmed
milestone: 10.0 → 9.0
Marcin Iwinski (iwi)
tags: added: customer-found
Simon Pasquier (simon-pasquier) wrote :

I couldn't reproduce the issue with MOS 8 #570 and the plugins compiled from the master branch. I'll try with the "official" 0.9 plugins.

description: updated
Dmitry Pyzhov (dpyzhov) wrote :

Please explain how this relates to the Fuel engine.

Changed in fuel:
status: Confirmed → Incomplete
Simon Pasquier (simon-pasquier) wrote :

I couldn't reproduce it either with the official MOS 8 ISO and the 0.9.0 plugins...

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel Sustaining (fuel-sustaining-team)
Changed in lma-toolchain:
status: Confirmed → Incomplete
Changed in fuel:
milestone: 9.0 → 10.0
Dmitry Pyzhov (dpyzhov) wrote :

Please reopen the issue in the Fuel project if you have evidence that it is a Fuel fault.

Changed in fuel:
status: Incomplete → Invalid
Bartłomiej Piotrowski (bpiotrowski) wrote :

This is unrelated to the LMA plugin. Let me describe the environment at the customer site:

Fuel 8.0
Neutron with tunneling segmentation
Additional network group used for Ceph cluster
Network template to take the network above into account
Virt role assigned to node; MongoDB nodes deployed as virtual machines
+ 2 plugins, customized zabbix and detach-db

Nodes with roles coming from plugins have the new data in /etc/astute.yaml. Already deployed nodes (both physical computes/controllers and virtualized Mongo) weren't updated to reflect it. We also added new nodes, and the hash was missing there as well. Running tasks like hiera or globals from the main deployment graph doesn't change anything.

We cannot share a diagnostic snapshot here as it contains customer data. Please contact me if you want to take a look at the logs or need help with recreating the environment.

Changed in fuel:
status: Invalid → New
Dmitry Klenov (dklenov)
Changed in fuel:
status: New → Confirmed
Simon Pasquier (simon-pasquier) wrote :

Marking as invalid for LMA since the problem seems to be located in Fuel.

Changed in lma-toolchain:
status: Incomplete → Invalid
Bartłomiej Piotrowski (bpiotrowski) wrote :

We just added a completely new node, never managed with Fuel before, and the same problem happened.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Roman Prykhodchenko (romcheg)
Bulat Gaifullin (bulat.gaifullin) wrote :

Since 9.0, uploading facts to the nodes (which creates or updates the astute.yaml file) is done by the "upload_configuration" task, and this logic was removed from Astute.

That means that if this task is not executed, astute.yaml will not be updated.

It is possible to launch the whole deployment after installing a new plugin: all core tasks are idempotent, and Nailgun has logic to skip tasks that are not affected by the new deployment, so only the tasks from the new plugin will be executed.

If you need to launch only selected tasks, there are two ways: specify the task IDs via the CLI or use a 'custom graph' for this purpose.

Changed in fuel:
status: Confirmed → Invalid
Simon Pasquier (simon-pasquier) wrote :

@Bulat, sorry but I don't see how your comment applies to this bug since it has been seen on MOS 8. Maybe it won't be fixed in MOS 8 but I don't think that this makes the bug invalid.

Vadim Rovachev (vrovachev) wrote :

This problem was not reproduced with the 8.0 ISO and the [MOS 8.0] versions of the plugins from https://www.mirantis.com/validated-solution-integrations/fuel-plugins/.
Steps:
1. Install the master node.
2. Create and deploy an environment with the following parameters:
controller node
compute+cinder node
Neutron with tunneling segmentation
KVM hypervisor
Cinder LVM
3. Download and install the monitoring plugins:
wget https://3a98d2877cb62a6e6b14-93babe93196056fe375611ed4c1716dd.ssl.cf5.rackcdn.com/l/c/lma_collector-0.9-0.9.0-1.noarch.rpm
wget https://3a98d2877cb62a6e6b14-93babe93196056fe375611ed4c1716dd.ssl.cf5.rackcdn.com/i/g/influxdb_grafana-0.9-0.9.0-1.noarch.rpm
wget https://3a98d2877cb62a6e6b14-93babe93196056fe375611ed4c1716dd.ssl.cf5.rackcdn.com/e/k/elasticsearch_kibana-0.9-0.9.0-1.noarch.rpm
fuel plugins --install lma_collector-0.9-0.9.0-1.noarch.rpm
fuel plugins --install influxdb_grafana-0.9-0.9.0-1.noarch.rpm
fuel plugins --install elasticsearch_kibana-0.9-0.9.0-1.noarch.rpm
Plugin list after installation: https://paste.mirantis.net/show/2314/
4. Enable and configure the plugins for the existing environment.
5. Assign the ES and InfluxDB roles to a new node (node-3).
6. Deploy it and wait for completion.
7. Re-execute the post-deployment tasks for the 2 original nodes to complete the installation of the LMA collector:
fuel nodes --env 1 --node 1,3 --start post_deployment_start
8. Wait for the task to finish:
https://paste.mirantis.net/show/2315/

Bartłomiej Piotrowski (bpiotrowski) wrote :

Vadim,

Please take a look at comment #8. In order to reproduce it, you should also use a networking template and the virt role. No wonder you couldn't reproduce it.

Vadim Rovachev (vrovachev) wrote :

Bartłomiej,
Could you please add these steps to the bug description?
