[upgrade][8.0] Mcollective on slaves can hang after upgrade

Bug #1561092 reported by Vladimir Khlyunev on 2016-03-23
This bug affects 1 person
Affects             Importance  Assigned to
Fuel for OpenStack  High        Sergey Abramov
8.0.x               High        Sergey Abramov
Mitaka              High        Sergey Abramov
Newton              High        Sergey Abramov

Bug Description

This bug occurs rarely and has a simple workaround.

Steps to reproduce:
1. Deploy any 7.0 cluster.
2. Upgrade the Fuel master to 8.0 using fuel-octane.
3. Run any of the following: verify networks, change the cluster and deploy, or generate a diagnostic snapshot.

Expected result:
No errors

Actual Result:
Nodes can become unavailable via mcollective. The following lines can be found in /var/log/mcollective:
 Failed to handle message: incompatible marshal file format (can't be read)
        format version 4.8 required; 89.111 given - TypeError
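
The two numbers in this error are the first two bytes of the received payload, which Ruby's Marshal format reserves for its major and minor version (4 and 8). The following is a minimal Python 3 sketch of that check, not code from mcollective; the helper names `marshal_header` and `looks_like_marshal` are hypothetical.

```python
# Ruby's Marshal.load reads the first two bytes of a payload as the
# format version and expects major 4, minor 8.
RUBY_MARSHAL_VERSION = (4, 8)


def marshal_header(payload: bytes):
    """Return (major, minor) as Marshal.load would interpret them."""
    if len(payload) < 2:
        raise ValueError("payload too short to carry a Marshal header")
    return payload[0], payload[1]


def looks_like_marshal(payload: bytes) -> bool:
    """True when the payload starts with the 4.8 version header."""
    return len(payload) >= 2 and marshal_header(payload) == RUBY_MARSHAL_VERSION


if __name__ == "__main__":
    good = b"\x04\x08[\x00"          # Marshal dump of an empty Ruby array
    bad = bytes([89, 111]) + b"..."  # first two bytes taken from the log line
    print(looks_like_marshal(good))
    print(marshal_header(bad), looks_like_marshal(bad))
```

Bytes 89 and 111 are the ASCII characters `Y` and `o`, which suggests the agent received something other than a Marshal-serialized message. The report does not say what that payload was.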

Reproducibility:
 Rarely

Workaround:
 Restart mcollective on unavailable slaves using "service mcollective restart"
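
When several slaves are affected, the workaround could be scripted from the master node. The sketch below assumes password-less ssh access to the slaves and uses hypothetical hostnames; only the `service mcollective restart` command comes from this report.

```python
import subprocess


def restart_command(host):
    """Build the ssh command that restarts mcollective on one node."""
    return ["ssh", host, "service", "mcollective", "restart"]


def restart_mcollective(hosts, run=subprocess.call):
    """Restart mcollective on each host; return the hosts where it failed.

    `run` is injectable so the function can be exercised without ssh.
    """
    failed = []
    for host in hosts:
        if run(restart_command(host)) != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    # "node-1" and "node-3" are placeholder hostnames (assumption).
    print(restart_mcollective(["node-1", "node-3"]))
```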

Impact:
 The user cannot change the existing cluster or generate a diagnostic snapshot.

Env info:
 Fuel 7.0 + MU2, Fuel 8.0 Release

Dmitry Klenov (dklenov) on 2016-03-23
tags: added: area-python
Changed in fuel:
assignee: nobody → Fuel Octane (fuel-octane-team)
milestone: none → 9.0
importance: Undecided → High
status: New → Confirmed
Oleg S. Gelbukh (gelbuhos) wrote :

Dmitry, let's leave it Medium importance for now. It is not frequent and does not affect the deployment in general. It also has a workaround.

Dmitry Pyzhov (dpyzhov) on 2016-03-25
tags: added: team-upgrades
Changed in fuel:
assignee: Fuel Octane (fuel-octane-team) → Fuel Python Team (fuel-python)
tags: added: release-notes

Fix proposed to branch: master
Change author: Oleg Gelbukh <email address hidden>
Review: https://review.fuel-infra.org/18873

Changed in fuel:
status: Confirmed → In Progress

Reviewed: https://review.fuel-infra.org/18873
Submitter: Evgeny Konstantinov <email address hidden>
Branch: master

Commit: beccc231a1edaec6467c997ccbd58b3d175fa3ef
Author: Oleg Gelbukh <email address hidden>
Date: Tue Mar 29 13:56:05 2016

LP1561092 bug to Known Issues in Release Notes

Due to this bug, the node is falsely detected as unavailable. The
workaround is to restart the mcollective service by hand.

This patch also adds a missing period to line 93.

Change-Id: Iec50cb32120ec92ea7ad8f32bd94b180e0a13a33
Partial-bug: 1561092

Fix proposed to branch: stable/8.0
Change author: Oleg Gelbukh <email address hidden>
Review: https://review.fuel-infra.org/18891

Reviewed: https://review.fuel-infra.org/18891
Submitter: Evgeny Konstantinov <email address hidden>
Branch: stable/8.0

Commit: 97e6e3c8e03faa7eb6fcaa7ed84b0f96ddf88f35
Author: Oleg Gelbukh <email address hidden>
Date: Tue Mar 29 14:03:32 2016

LP1561092 bug to Known Issues in Release Notes

Due to this bug, the node is falsely detected as unavailable. The
workaround is to restart the mcollective service by hand.

This patch also adds a missing period to line 93.

Change-Id: Iec50cb32120ec92ea7ad8f32bd94b180e0a13a33
Partial-bug: 1561092
(cherry picked from commit beccc231a1edaec6467c997ccbd58b3d175fa3ef)

Bug doesn't seem to be in progress by the team.

Changed in fuel:
status: In Progress → Confirmed
Changed in fuel:
milestone: 9.0 → 10.0
Dmitry Pyzhov (dpyzhov) on 2016-04-27
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel Octane (fuel-octane-team)
Changed in fuel:
importance: High → Low
tags: added: feature-testing
tags: added: release-notes-done
removed: release-notes
Changed in fuel:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/358858
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=f91ff40264c257a8d3eb436ab5e7c4c8272b27ce
Submitter: Jenkins
Branch: master

commit f91ff40264c257a8d3eb436ab5e7c4c8272b27ce
Author: Ilya Kharin <email address hidden>
Date: Mon Aug 22 23:17:00 2016 +0300

    Restart mcollective on slave nodes after restore

    Also, on the backup step the status of `mco ping` is serialized into the
    upgrade tarball, and on the restore step it is compared with the actual
    status. All nodes that do not respond are logged.

    In addition, the status of `mco ping` is archived on the backup step
    and compared with the actual status on the restore step.

    Change-Id: Ibba81102214998d83614a42cdb21c21bebd8284a
    Related-Bug: #1561092
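
The backup/restore comparison described in this commit message can be sketched as follows. This is an illustration in Python 3, not the fuel-octane implementation: the function names, the JSON file format, and the regular expression (modelled on `mco ping` reply lines such as `master time=34.13 ms`) are all assumptions.

```python
import json
import re

# Matches reply lines such as "master time=34.13 ms" or "3 time=36.49 ms".
REPLY_RE = re.compile(r"^(\S+)\s+time=[\d.]+\s+ms$")


def parse_mco_ping(output):
    """Return the set of node identities that answered `mco ping`."""
    nodes = set()
    for line in output.splitlines():
        match = REPLY_RE.match(line.strip())
        if match:
            nodes.add(match.group(1))
    return nodes


def save_status(nodes, path):
    """Backup step: store the responding nodes as a JSON list."""
    with open(path, "w") as f:
        json.dump(sorted(nodes), f)


def load_status(path):
    """Restore step: read back the node set saved during backup."""
    with open(path) as f:
        return set(json.load(f))


def missing_nodes(before, after):
    """Nodes that responded at backup time but not after restore."""
    return sorted(before - after)
```

On restore, any identity returned by `missing_nodes` would be logged as not responding.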

tags: added: in-stable-mitaka

Reviewed: https://review.openstack.org/375382
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=298428cc40efa5be6b6cb5c8c06d063f50028af7
Submitter: Jenkins
Branch: stable/mitaka

commit 298428cc40efa5be6b6cb5c8c06d063f50028af7
Author: Ilya Kharin <email address hidden>
Date: Mon Aug 22 23:17:00 2016 +0300

    Restart mcollective on slave nodes after restore

    Also, on the backup step the status of `mco ping` is serialized into the
    upgrade tarball, and on the restore step it is compared with the actual
    status. All nodes that do not respond are logged.

    In addition, the status of `mco ping` is archived on the backup step
    and compared with the actual status on the restore step.

    Change-Id: Ibba81102214998d83614a42cdb21c21bebd8284a
    Related-Bug: #1561092

Reviewed: https://review.openstack.org/384987
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=12daa0e54f48d712e93192ad2ad98cce0db71bd2
Submitter: Jenkins
Branch: stable/7.0

commit 12daa0e54f48d712e93192ad2ad98cce0db71bd2
Author: Ilya Kharin <email address hidden>
Date: Mon Aug 22 23:17:00 2016 +0300

    Restart mcollective on slave nodes after restore

    Also, on the backup step the status of `mco ping` is serialized into the
    upgrade tarball, and on the restore step it is compared with the actual
    status. All nodes that do not respond are logged.

    In addition, the status of `mco ping` is archived on the backup step
    and compared with the actual status on the restore step.

    Change-Id: Ibba81102214998d83614a42cdb21c21bebd8284a
    Closes-Bug: #1561092

Reviewed: https://review.openstack.org/375511
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=0237c9db110d415c25808e6700c558fa62c55159
Submitter: Jenkins
Branch: stable/8.0

commit 0237c9db110d415c25808e6700c558fa62c55159
Author: Ilya Kharin <email address hidden>
Date: Mon Aug 22 23:17:00 2016 +0300

    Restart mcollective on slave nodes after restore

    Also, on the backup step the status of `mco ping` is serialized into the
    upgrade tarball, and on the restore step it is compared with the actual
    status. All nodes that do not respond are logged.

    In addition, the status of `mco ping` is archived on the backup step
    and compared with the actual status on the restore step.

    Change-Id: Ibba81102214998d83614a42cdb21c21bebd8284a
    Related-Bug: #1561092

summary: - [upgrade][8.0] Mcollective on slaves can hangs after upgrade
+ [upgrade][8.0] Mcollective on slaves can hang after upgrade
Ekaterina Shutova (eshutova) wrote :

[root@nailgun ~]# mco ping
master time=34.13 ms
3 time=36.49 ms
2 time=40.66 ms
1 time=42.17 ms

---- ping statistics ----
4 replies max: 42.17 min: 34.13 avg: 38.36
Nodes are available via mcollective after upgrade, no errors in /var/log/mcollective.log
Verified on:
[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 495
cat /etc/fuel_build_number:
 495
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuelmenu-9.0.0-1.mos276.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8893.noarch
 fuel-notify-9.0.0-1.mos8630.noarch
 fuel-release-9.0.0-1.mos6358.noarch
 fuel-bootstrap-cli-9.0.0-1.mos291.noarch
 fuel-migrate-9.0.0-1.mos8630.noarch
 fuel-octane-9.0.0-1.mos1368.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-nailgun-9.0.0-1.mos8893.noarch
 fuel-library9.0-9.0.0-1.mos8630.noarch
 python-fuelclient-9.0.0-1.mos361.noarch
 network-checker-9.0.0-1.mos77.x86_64
 fuel-nailgun-extension-cluster-upgrade-9.1-1.mos82.noarch
 fuel-mirror-9.0.0-1.mos155.noarch
 fuel-9.0.0-1.mos6358.noarch
 nailgun-mcagents-9.0.0-1.mos776.noarch
 fuel-setup-9.0.0-1.mos6358.noarch
 fuel-utils-9.0.0-1.mos8630.noarch
 fuel-agent-9.0.0-1.mos291.noarch
 fuel-ostf-9.0.0-1.mos947.noarch
 fuel-openstack-metadata-9.0.0-1.mos8893.noarch
 rubygem-astute-9.0.0-1.mos776.noarch
 fuel-misc-9.0.0-1.mos8630.noarch
 fuel-ui-9.0.0-1.mos2831.noarch
 python-packetary-9.0.0-1.mos155.noarch

Related fix proposed to branch: master
Change author: Mariia Zlatkova <email address hidden>
Review: https://review.fuel-infra.org/30303

Reviewed: https://review.fuel-infra.org/30303
Submitter: Olena Logvinova <email address hidden>
Branch: master

Commit: dc4cfe1141c0237b04d32ddea8c33b09ac0f854d
Author: Mariia Zlatkova <email address hidden>
Date: Thu Feb 2 13:50:37 2017

[RN-9.2] Fuel resolved and known issues

Change-Id: Idb919f92b981eee0f2cb48618dde243e4582ee5b
Related-Bug: #1590633
Related-Bug: #1625293
Related-Bug: #1561092
Related-Bug: #1619341
Related-Bug: #1563465
Related-Bug: #1628500
Related-Bug: #1593277
Related-Bug: #1628940
Related-Bug: #1658952

Related fix proposed to branch: stable/9.2
Change author: Mariia Zlatkova <email address hidden>
Review: https://review.fuel-infra.org/30423

Reviewed: https://review.fuel-infra.org/30423
Submitter: Mariia Zlatkova <email address hidden>
Branch: stable/9.2

Commit: c040581a57fac1dfeaed44952359f21963216d62
Author: Mariia Zlatkova <email address hidden>
Date: Thu Feb 2 14:03:50 2017

[RN-9.2] Fuel resolved and known issues

Change-Id: Idb919f92b981eee0f2cb48618dde243e4582ee5b
Related-Bug: #1590633
Related-Bug: #1625293
Related-Bug: #1561092
Related-Bug: #1619341
Related-Bug: #1563465
Related-Bug: #1628500
Related-Bug: #1593277
Related-Bug: #1628940
Related-Bug: #1658952
(cherry picked from commit dc4cfe1141c0237b04d32ddea8c33b09ac0f854d)

Verified on 8.0 + MU4 updates.
