[Fuel Plugins] Updating core repos triggers plugin restart

Bug #1527330 reported by Oleksandr Savatieiev
This bug affects 3 people
Affects               Status          Importance   Assigned to                Milestone
Fuel for OpenStack    Invalid         Critical     Fuel Python (Deprecated)
6.1.x                 Fix Committed   High         okosse
7.0.x                 Invalid         Critical     MOS Maintenance
8.0.x                 Invalid         Critical     Fuel Python (Deprecated)

Bug Description

Top-level configuration is:
Fuel 6.1
Contrail 2.0.0

Steps to reproduce:
1. Create environment using upstream repos
2. Run `fuel-createmirror'
3. Change repos for deployed env
4. Try to update repos on all deployed nodes
        "fuel --env <ENV_ID> node --node-id <NODE_ID1>, <NODE_ID2>, <NODE_ID_N> --tasks upload_core_repos"
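
When step 4 has to be repeated for many nodes, the command line can be assembled programmatically. The following is only a minimal sketch: the helper name build_repo_update_command is hypothetical, and joining the node IDs with commas (no spaces) is an assumption. The command, its flags and the task name are taken from step 4 as reported. The sketch only builds the argument list and does not run anything.

```python
def build_repo_update_command(env_id, node_ids, task="upload_core_repos"):
    """Return the argument list for the repo-update command from step 4.

    Hypothetical helper for illustration; it does not execute the command.
    """
    if not node_ids:
        raise ValueError("at least one node ID is required")
    # Assumption: node IDs are passed as a single comma-separated value.
    node_arg = ",".join(str(node_id) for node_id in node_ids)
    return [
        "fuel", "--env", str(env_id),
        "node", "--node-id", node_arg,
        "--tasks", task,
    ]


if __name__ == "__main__":
    # Example with made-up environment and node IDs.
    print(" ".join(build_repo_update_command(1, [4, 5, 6])))
```

Returning a list rather than a single string keeps each argument separate, so it can be handed to a process launcher without shell quoting issues.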

Expected result:
Repos are updated. Cluster functionality is not affected (the cluster stays operational).

Actual result:
Repos are updated successfully on nodes with standard roles.
On base-os nodes with the plugin installed, the repo update is unstable: the repos may or may not be updated. In our reproduction, the problem was triggered on the third attempt: the repos were not updated, and tasks that restart ALL Contrail services were started. Note that the restart of the services is not strictly related to the outcome of the repo update procedure.

information type: Public → Private Security
tags: added: customer-found
tags: removed: customer-found
tags: added: customer-found
Denis Klepikov (dklepikov) wrote :

Fuel 6.1
Contrail 2.0.0

1 create environment using upstream repos
2 run `fuel-createmirror'
3 change repos for deployed env
4 try to update repos on all deployed nodes "fuel --env <ENV_ID> node --node-id <NODE_ID1>, <NODE_ID2>, <NODE_ID_N> --tasks upload_core_repos"

expected result - repos will be updated

actual result - repos were updated on nodes with standard roles; on base-os nodes with the plugin installed, the repos were not updated and a restart of ALL contrail services was triggered.

Oleksandr Savatieiev (osavatieiev) wrote :

Updated description

description: updated
Oleksandr Martsyniuk (omartsyniuk) wrote :

As a plugin developer, I would like to mention that this issue may affect not only the contrail plugin, but all plugins that contain post-deployment tasks.
Updating core repos can trigger puppet tasks which may reconfigure and restart services at a moment when they are not supposed to be restarted.

Dmitry Pyzhov (dpyzhov)
information type: Private Security → Private
Roman Rufanov (rrufanov)
tags: added: support
Oleksandr Martsyniuk (omartsyniuk) wrote :

I've investigated the problem and found details that may be useful for other plugin developers.

I investigated the astute logs provided by the support team and noticed that running the upload_core_repos task triggered the execution of not only the plugin's post-deployment tasks, but its pre-deployment tasks too.

In our case the problem was with the netconfig pre-deployment task.
Plugin pre-deployment tasks for base-os nodes include calling the netconfig task. It looks like the keepalived service, which provides the custom contrail VIPs, was not able to work properly after the netconfig run. The contrail cluster was broken because the VIPs with the contrail service endpoints were unavailable.

To work around this problem, it was decided to restart keepalived after each netconfig run.
A change to the plugin code was introduced, which updates the plugin's puppet manifests to restart keepalived on each run. This ensures that keepalived will continue to serve the VIPs after the netconfig task is re-run. You may review the code at https://review.openstack.org/#/c/260469. This change is proposed to the plugin's stable/2.1 branch.

Dmitry Pyzhov (dpyzhov)
tags: added: area-plugins
Changed in fuel:
assignee: MOS Maintenance (mos-maintenance) → nobody
Denis Meltsaykin (dmeltsaykin) wrote :

The MOS-Maintenance team does not fix Fuel's or plugins' bugs; we only backport fixes. As the issue is not caused by our tooling, we are not the correct assignee.

Irina Povolotskaya (ipovolotskaya) wrote :

Should fuel-library be the primary assignee then?

I'm also worried about the following:
what if plugin developers produce fixes for 7.0-compatible plugins and then we have the fix in the product itself -- could that cause yet another failure?

Matthew Mosesohn (raytrac3r) wrote :

My comments from https://bugs.launchpad.net/fuel/+bug/1528608 (now duplicate of this bug)

Irina, why is this bug marked as private? There is no sensitive information here.

Secondly, you can't actually deploy an environment and then reconfigure repositories. That's not possible as far as I am aware.

Lastly, can you provide any logs that support this issue? We can't provide any fixes or solutions without any supporting logs or steps to reproduce.

Moving to python team to continue investigating.

Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
importance: Undecided → Critical
milestone: 8.0 → 9.0
status: New → Incomplete
Dmitry Pyzhov (dpyzhov)
information type: Private → Public
information type: Public → Private
information type: Private → Public
Oleksandr Martsyniuk (omartsyniuk) wrote :

Logs from the affected environment are attached.
NB: the deployment status is success here, because the keepalived issue was resolved while re-running the post-deployment tasks.

Oleksandr Martsyniuk (omartsyniuk) wrote :

The list of hooks that were run is attached.

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/future
tags: added: team-bugfix
Illia Polliul (ipolliul) wrote :

We fixed this within the scope of the Contrail plugin; Fuel itself wasn't changed.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-plugin-contrail (stable/2.1)

Reviewed: https://review.openstack.org/269708
Committed: https://git.openstack.org/cgit/openstack/fuel-plugin-contrail/commit/?id=07552c74602152a29c54e6dbb3444bbb104e326e
Submitter: Jenkins
Branch: stable/2.1

commit 07552c74602152a29c54e6dbb3444bbb104e326e
Author: Illia Polliul <email address hidden>
Date: Tue Jan 19 17:25:11 2016 +0200

    Fix for keepalived restart

    On some systems you need explicitly specify puppet service provider.

    Closes-Bug: 1535782
    Related-Bug: 1527330
    Change-Id: Iec147b3033166788a2157007956e2a79df4f30c8
    Signed-off-by: Illia Polliul <email address hidden>

Ihor Kalnytskyi (ikalnytskyi) wrote :

Illia Polliul, so is it fixed for 7.0 and 8.0 as well? Maybe move it from Incomplete to Fix Committed?

Also, I want to note that there's no "upload_core_repos" task in Fuel 8.0 anymore. There's another set of tasks called "setup_repositories" or something. Please reach out on the #fuel-dev IRC channel if you need any help with the new naming.

Changed in fuel:
status: Incomplete → Invalid