[upgrade][8.0] Mcollective on slaves can hang after upgrade

Bug #1561092 reported by Vladimir Khlyunev
Affects             Status         Importance  Assigned to     Milestone
Fuel for OpenStack  Fix Committed  High        Sergey Abramov
8.0.x               Fix Released   High        Sergey Abramov
Mitaka              Fix Released   High        Sergey Abramov
Newton              Fix Committed  High        Sergey Abramov

Bug Description

This bug occurs rarely and has a simple workaround.

Steps to reproduce:
1. Deploy any 7.0 cluster.
2. Upgrade the Fuel master to 8.0 using fuel-octane.
3. Run any of: verify networks, change the cluster and deploy, or generate a diagnostic snapshot.

Expected result:
No errors

Actual Result:
Nodes can become unavailable via mcollective. The following lines can be found in /var/log/mcollective:
 Failed to handle message: incompatible marshal file format (can't be read)
        format version 4.8 required; 89.111 given - TypeError
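
A note on reading this error: as far as I understand Ruby's Marshal, the "given" version is simply the first two bytes of the received payload, so "89.111" probably means the message did not start with a Marshal header at all. The sketch below only parses the quoted line and renders those two bytes as characters. The function names are hypothetical and this is not part of mcollective.

```python
import re

# Matches the second line of the Ruby Marshal error quoted in this report.
_MARSHAL_RE = re.compile(
    r"format version (\d+)\.(\d+) required; (\d+)\.(\d+) given"
)


def parse_marshal_error(line):
    """Return ((req_major, req_minor), (got_major, got_minor)) or None."""
    match = _MARSHAL_RE.search(line)
    if match is None:
        return None
    req_major, req_minor, got_major, got_minor = map(int, match.groups())
    return (req_major, req_minor), (got_major, got_minor)


def given_bytes_as_text(given):
    """Render the two 'version' bytes as printable characters where possible."""
    return "".join(
        chr(b) if 32 <= b < 127 else "\\x%02x" % b for b in given
    )


if __name__ == "__main__":
    line = "format version 4.8 required; 89.111 given - TypeError"
    required, given = parse_marshal_error(line)
    # Byte values 89 and 111 are the ASCII characters "Y" and "o".
    print(required, given, given_bytes_as_text(given))
```

Seeing printable ASCII in place of the expected 4.8 header is a hint that the agent received a payload in a different serialization, which would be consistent with a restart fixing it, though the report does not confirm the root cause.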

Reproducibility:
 Rarely

Workaround:
 Restart mcollective on unavailable slaves using "service mcollective restart"
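
With several unavailable slaves, the restart can be scripted from the master node. This is a minimal sketch, assuming passwordless SSH from the master to the slaves; the host names and helper functions are hypothetical, and only the "service mcollective restart" command comes from this report.

```python
import subprocess


def restart_command(host):
    """Build the ssh argv that restarts mcollective on one slave."""
    return ["ssh", host, "service", "mcollective", "restart"]


def restart_mcollective(hosts, run=subprocess.call):
    """Restart mcollective on each host.

    `run` takes an argv list and returns an exit code; it defaults to
    subprocess.call and can be replaced for a dry run.
    Returns the list of hosts where the command exited non-zero.
    """
    failed = []
    for host in hosts:
        if run(restart_command(host)) != 0:
            failed.append(host)
    return failed
```

For example, restart_mcollective(["node-1", "node-2"]) would return the hosts where the restart did not succeed, if any.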

Impact:
 The user is unable to change an existing cluster or generate a diagnostic snapshot

Env info:
 Fuel 7.0 + MU2, Fuel 8.0 Release

Dmitry Klenov (dklenov)
tags: added: area-python
Changed in fuel:
assignee: nobody → Fuel Octane (fuel-octane-team)
milestone: none → 9.0
importance: Undecided → High
status: New → Confirmed
Oleg S. Gelbukh (gelbuhos) wrote :

Dmitry, let's leave it Medium importance for now. It is not frequent and does not affect the deployment in general. It also has a workaround.

Dmitry Pyzhov (dpyzhov)
tags: added: team-upgrades
Changed in fuel:
assignee: Fuel Octane (fuel-octane-team) → Fuel Python Team (fuel-python)
tags: added: release-notes
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to mos/mos-docs (master)

Fix proposed to branch: master
Change author: Oleg Gelbukh <email address hidden>
Review: https://review.fuel-infra.org/18873

Changed in fuel:
status: Confirmed → In Progress
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to mos/mos-docs (master)

Reviewed: https://review.fuel-infra.org/18873
Submitter: Evgeny Konstantinov <email address hidden>
Branch: master

Commit: beccc231a1edaec6467c997ccbd58b3d175fa3ef
Author: Oleg Gelbukh <email address hidden>
Date: Tue Mar 29 13:56:05 2016

LP1561092 bug to Known Issues in Release Notes

Due to this bug, the node is false detected as unavailable. The
workaround is to restart mcollective service by hand.

This patch also adds a missing period to the line 93.

Change-Id: Iec50cb32120ec92ea7ad8f32bd94b180e0a13a33
Partial-bug: 1561092

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to mos/mos-docs (stable/8.0)

Fix proposed to branch: stable/8.0
Change author: Oleg Gelbukh <email address hidden>
Review: https://review.fuel-infra.org/18891

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to mos/mos-docs (stable/8.0)

Reviewed: https://review.fuel-infra.org/18891
Submitter: Evgeny Konstantinov <email address hidden>
Branch: stable/8.0

Commit: 97e6e3c8e03faa7eb6fcaa7ed84b0f96ddf88f35
Author: Oleg Gelbukh <email address hidden>
Date: Tue Mar 29 14:03:32 2016

LP1561092 bug to Known Issues in Release Notes

Due to this bug, the node is false detected as unavailable. The
workaround is to restart mcollective service by hand.

This patch also adds a missing period to the line 93.

Change-Id: Iec50cb32120ec92ea7ad8f32bd94b180e0a13a33
Partial-bug: 1561092
(cherry picked from commit beccc231a1edaec6467c997ccbd58b3d175fa3ef)

Dmitry Pyzhov (dpyzhov) wrote : Re: [upgrade][8.0] Mcollective on slaves can hangs after upgrade

The team doesn't seem to be working on this bug.

Changed in fuel:
status: In Progress → Confirmed
Changed in fuel:
milestone: 9.0 → 10.0
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel Octane (fuel-octane-team)
Changed in fuel:
importance: High → Low
tags: added: feature-testing
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/358858

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/375382

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (stable/8.0)

Related fix proposed to branch: stable/8.0
Review: https://review.openstack.org/375511

tags: added: release-notes-done
removed: release-notes
Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-octane (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/384987

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-octane (master)

Reviewed: https://review.openstack.org/358858
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=f91ff40264c257a8d3eb436ab5e7c4c8272b27ce
Submitter: Jenkins
Branch: master

commit f91ff40264c257a8d3eb436ab5e7c4c8272b27ce
Author: Ilya Kharin <email address hidden>
Date: Mon Aug 22 23:17:00 2016 +0300

    Restart mcollective on slave nodes after restore

    Also, on the backup step a status of `mco ping` is serialized in an
    upgrade tarball and on the restore step it is compared with the actual
    status. All nodes that are not respond are logged.

    In additional, the status of `mco ping` is archived on the backup step
    and is compared on the restore step with the actual ones.

    Change-Id: Ibba81102214998d83614a42cdb21c21bebd8284a
    Related-Bug: #1561092
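
The backup/restore comparison described in this commit message can be illustrated roughly as follows. This is a sketch, not the fuel-octane implementation: the function names are hypothetical, and the reply-line format is assumed from the `mco ping` transcript posted later in this report.

```python
import re

# A reply line is assumed to look like "3    time=36.49 ms".
_REPLY_RE = re.compile(r"^(\S+)\s+time=[\d.]+ ms$")


def parse_mco_ping(output):
    """Return the set of identities that replied to `mco ping`."""
    nodes = set()
    for line in output.splitlines():
        match = _REPLY_RE.match(line.strip())
        if match:
            nodes.add(match.group(1))
    return nodes


def not_responding(saved_output, current_output):
    """Identities that replied at backup time but not after restore."""
    before = parse_mco_ping(saved_output)
    after = parse_mco_ping(current_output)
    return sorted(before - after)


if __name__ == "__main__":
    saved = (
        "master time=34.13 ms\n"
        "3 time=36.49 ms\n"
        "2 time=40.66 ms\n"
        "1 time=42.17 ms\n"
    )
    current = "master time=30.00 ms\n2 time=41.00 ms\n"
    print(not_responding(saved, current))  # ['1', '3']
```

Header and statistics lines do not match the reply pattern, so they are ignored; nodes reported by not_responding are the candidates for the manual mcollective restart.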

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-octane (stable/mitaka)

Reviewed: https://review.openstack.org/375382
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=298428cc40efa5be6b6cb5c8c06d063f50028af7
Submitter: Jenkins
Branch: stable/mitaka

commit 298428cc40efa5be6b6cb5c8c06d063f50028af7
Author: Ilya Kharin <email address hidden>
Date: Mon Aug 22 23:17:00 2016 +0300

    Restart mcollective on slave nodes after restore

    Also, on the backup step a status of `mco ping` is serialized in an
    upgrade tarball and on the restore step it is compared with the actual
    status. All nodes that are not respond are logged.

    In additional, the status of `mco ping` is archived on the backup step
    and is compared on the restore step with the actual ones.

    Change-Id: Ibba81102214998d83614a42cdb21c21bebd8284a
    Related-Bug: #1561092

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-octane (stable/7.0)

Reviewed: https://review.openstack.org/384987
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=12daa0e54f48d712e93192ad2ad98cce0db71bd2
Submitter: Jenkins
Branch: stable/7.0

commit 12daa0e54f48d712e93192ad2ad98cce0db71bd2
Author: Ilya Kharin <email address hidden>
Date: Mon Aug 22 23:17:00 2016 +0300

    Restart mcollective on slave nodes after restore

    Also, on the backup step a status of `mco ping` is serialized in an
    upgrade tarball and on the restore step is compared with the actual
    status. All nodes that are not respond are logged.

    In additional, the status of `mco ping` is archived on the backup step
    and is compared on the restore step with the actual ones.

    Change-Id: Ibba81102214998d83614a42cdb21c21bebd8284a
    Closes-Bug: #1561092

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-octane (stable/8.0)

Reviewed: https://review.openstack.org/375511
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=0237c9db110d415c25808e6700c558fa62c55159
Submitter: Jenkins
Branch: stable/8.0

commit 0237c9db110d415c25808e6700c558fa62c55159
Author: Ilya Kharin <email address hidden>
Date: Mon Aug 22 23:17:00 2016 +0300

    Restart mcollective on slave nodes after restore

    Also, on the backup step a status of `mco ping` is serialized in an
    upgrade tarball and on the restore step it is compared with the actual
    status. All nodes that are not respond are logged.

    In additional, the status of `mco ping` is archived on the backup step
    and is compared on the restore step with the actual ones.

    Change-Id: Ibba81102214998d83614a42cdb21c21bebd8284a
    Related-Bug: #1561092

summary: - [upgrade][8.0] Mcollective on slaves can hangs after upgrade
+ [upgrade][8.0] Mcollective on slaves can hang after upgrade
Ekaterina Shutova (eshutova) wrote :

[root@nailgun ~]# mco ping
master time=34.13 ms
3 time=36.49 ms
2 time=40.66 ms
1 time=42.17 ms

---- ping statistics ----
4 replies max: 42.17 min: 34.13 avg: 38.36
Nodes are available via mcollective after the upgrade; there are no errors in /var/log/mcollective.log.
Verified on:
[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 495
cat /etc/fuel_build_number:
 495
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuelmenu-9.0.0-1.mos276.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8893.noarch
 fuel-notify-9.0.0-1.mos8630.noarch
 fuel-release-9.0.0-1.mos6358.noarch
 fuel-bootstrap-cli-9.0.0-1.mos291.noarch
 fuel-migrate-9.0.0-1.mos8630.noarch
 fuel-octane-9.0.0-1.mos1368.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-nailgun-9.0.0-1.mos8893.noarch
 fuel-library9.0-9.0.0-1.mos8630.noarch
 python-fuelclient-9.0.0-1.mos361.noarch
 network-checker-9.0.0-1.mos77.x86_64
 fuel-nailgun-extension-cluster-upgrade-9.1-1.mos82.noarch
 fuel-mirror-9.0.0-1.mos155.noarch
 fuel-9.0.0-1.mos6358.noarch
 nailgun-mcagents-9.0.0-1.mos776.noarch
 fuel-setup-9.0.0-1.mos6358.noarch
 fuel-utils-9.0.0-1.mos8630.noarch
 fuel-agent-9.0.0-1.mos291.noarch
 fuel-ostf-9.0.0-1.mos947.noarch
 fuel-openstack-metadata-9.0.0-1.mos8893.noarch
 rubygem-astute-9.0.0-1.mos776.noarch
 fuel-misc-9.0.0-1.mos8630.noarch
 fuel-ui-9.0.0-1.mos2831.noarch
 python-packetary-9.0.0-1.mos155.noarch

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix proposed to mos/mos-docs (master)

Related fix proposed to branch: master
Change author: Mariia Zlatkova <email address hidden>
Review: https://review.fuel-infra.org/30303

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix merged to mos/mos-docs (master)

Reviewed: https://review.fuel-infra.org/30303
Submitter: Olena Logvinova <email address hidden>
Branch: master

Commit: dc4cfe1141c0237b04d32ddea8c33b09ac0f854d
Author: Mariia Zlatkova <email address hidden>
Date: Thu Feb 2 13:50:37 2017

[RN-9.2] Fuel resolved and known issues

Change-Id: Idb919f92b981eee0f2cb48618dde243e4582ee5b
Related-Bug: #1590633
Related-Bug: #1625293
Related-Bug: #1561092
Related-Bug: #1619341
Related-Bug: #1563465
Related-Bug: #1628500
Related-Bug: #1593277
Related-Bug: #1628940
Related-Bug: #1658952

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix proposed to mos/mos-docs (stable/9.2)

Related fix proposed to branch: stable/9.2
Change author: Mariia Zlatkova <email address hidden>
Review: https://review.fuel-infra.org/30423

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix merged to mos/mos-docs (stable/9.2)

Reviewed: https://review.fuel-infra.org/30423
Submitter: Mariia Zlatkova <email address hidden>
Branch: stable/9.2

Commit: c040581a57fac1dfeaed44952359f21963216d62
Author: Mariia Zlatkova <email address hidden>
Date: Thu Feb 2 14:03:50 2017

[RN-9.2] Fuel resolved and known issues

Change-Id: Idb919f92b981eee0f2cb48618dde243e4582ee5b
Related-Bug: #1590633
Related-Bug: #1625293
Related-Bug: #1561092
Related-Bug: #1619341
Related-Bug: #1563465
Related-Bug: #1628500
Related-Bug: #1593277
Related-Bug: #1628940
Related-Bug: #1658952
(cherry picked from commit dc4cfe1141c0237b04d32ddea8c33b09ac0f854d)

TatyanaGladysheva (tgladysheva) wrote :

Verified on 8.0 + MU4 updates.
