[upgrade] Node can not be rebooted while upgrading

Bug #1616925 reported by Vladimir Khlyunev
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Released | High | Sergey Abramov |
Mitaka | Fix Released | High | Ilya Kharin |

Bug Description

7.0MU5
8.0MU2+latest proposed
probably 9.1

This issue is intermittent and cannot be reproduced every time.
Steps to reproduce:
1 - deploy a 7.0 env + HA cluster
2 - upgrade the master node to 8.0
3 - upgrade the cloud

Result:
At any upgrade-node step the deployment can fail: it hangs while trying to reboot the target node for reprovisioning. There are no useful logs and a diagnostic snapshot is not available, but I got astute.log, which may be useful.

Vladimir Khlyunev (vkhlyunev) wrote :
Dmitry Klenov (dklenov)
Changed in fuel:
milestone: none → 7.0-updates
importance: Undecided → Medium
tags: added: area-python
Changed in fuel:
status: New → Confirmed
Ilya Kharin (akscram)
Changed in fuel:
milestone: 7.0-updates → 8.0-updates
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-nailgun-extension-cluster-upgrade (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/365168

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/365169

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-nailgun-extension-cluster-upgrade (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/365173

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-nailgun-extension-cluster-upgrade (master)

Change abandoned by Ilya Kharin (<email address hidden>) on branch: master
Review: https://review.openstack.org/365173

Changed in fuel:
importance: Medium → High
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-nailgun-extension-cluster-upgrade (stable/mitaka)

Reviewed: https://review.openstack.org/365168
Committed: https://git.openstack.org/cgit/openstack/fuel-nailgun-extension-cluster-upgrade/commit/?id=a4e2a67e3e5024b0ae65f445355965a1263fef73
Submitter: Jenkins
Branch: stable/mitaka

commit a4e2a67e3e5024b0ae65f445355965a1263fef73
Author: Ilya Kharin <email address hidden>
Date: Sat Sep 3 02:42:07 2016 +0300

    Add support to re-assign a set of nodes

    This patch adds an ability to re-assign a set of the given nodes at
    once. This feature was technically available but not exposed to the
    client. A groupped re-assigning allows to effectively re-provision nodes
    by creating an atomic task in Astute.

    Change-Id: I4a7c7e35d844683ef73ad7f8459d1892e80e0a64
    Related-Bug: #1616925

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-nailgun-extension-cluster-upgrade (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/367019

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-nailgun-extension-cluster-upgrade (master)

Change abandoned by Ilya Kharin (<email address hidden>) on branch: master
Review: https://review.openstack.org/367019

Vladimir Khlyunev (vkhlyunev) wrote :

Moved to "Confirmed" by mistake; the status is actually "Fix Released".
root@node-1:~# ssh cirros@10.109.18.129
cirros@10.109.18.129's password:
$ uptime
 18:39:39 up 5:21, 1 users, load average: 0.00, 0.01, 0.03
$ Connection to 10.109.18.129 closed.

Vladimir Khlyunev (vkhlyunev) wrote :

Wrong bug, sorry, my bad.

Ilya Kharin (akscram)
Changed in fuel:
assignee: Fuel Octane (fuel-octane-team) → Ilya Kharin (akscram)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-nailgun-extension-cluster-upgrade (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/367419

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-nailgun-extension-cluster-upgrade (master)

Change abandoned by Ilya Kharin (<email address hidden>) on branch: master
Review: https://review.openstack.org/367419

tags: added: blocker-for-qa
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-nailgun-extension-cluster-upgrade (master)

Reviewed: https://review.openstack.org/367019
Committed: https://git.openstack.org/cgit/openstack/fuel-nailgun-extension-cluster-upgrade/commit/?id=e38d48cbc5d216ac0f853466ab48e5d6d9687521
Submitter: Jenkins
Branch: master

commit a4e2a67e3e5024b0ae65f445355965a1263fef73
Author: Ilya Kharin <email address hidden>
Date: Sat Sep 3 02:42:07 2016 +0300

    Add support to re-assign a set of nodes

    This patch adds an ability to re-assign a set of the given nodes at
    once. This feature was technically available but not exposed to the
    client. A groupped re-assigning allows to effectively re-provision nodes
    by creating an atomic task in Astute.

    Change-Id: I4a7c7e35d844683ef73ad7f8459d1892e80e0a64
    Related-Bug: #1616925

commit d87125662f2f55204244e43ec6522fe36c6bf21e
Author: Nikita Zubkov <email address hidden>
Date: Mon Aug 29 17:56:09 2016 +0300

    Add test for vip transformer

    Change-Id: I65124237604fe6718ad6c351825f192a98d470fb
    (cherry picked from commit 55422ddce7b0e9e1b9976504500de8a1688a99d7)

commit 3cac9b551cdecaf9148800cc6fd9b81fbb3149c5
Author: Nikita Zubkov <email address hidden>
Date: Tue Jul 19 18:24:42 2016 +0300

    Switch to upstream fuel-web repository

    Change-Id: I994304bdc8eaf7e4da175981cb721d41a286fed0
    Depends-On: Id0bc78478cf3f40767fed760cd54e487a934fa10

commit a78f4348f84dd470ba1f3b99d2c751fa2dd12d7a
Author: Anastasiya <email address hidden>
Date: Mon Aug 1 14:39:35 2016 +0300

    Move change_env_settings function from octane to cluster upgrade extension

    * change_env_settings function was moved to cluster upgrade extention
    * merge generated attributes code was written

    Change-Id: I6d1e27b8b0c01f3251067bc88931cd2354feb5ce
    Partial-Bug: #1602587
    (cherry picked from commit dc2e3f930957b2c8af2d6c6a60bfcc6c5e6bb061)

commit 280fc4f08258f1e85ba099f74c4956233652e9a2
Author: Ilya Kharin <email address hidden>
Date: Wed Aug 10 17:28:17 2016 +0300

    Add absent __init__.py to migrations/versions

    Without the versions/__init__.py file versions was not identified as
    a package and was not included in a distribution.

    Change-Id: I67f152ebb9234df880c61d79d154b1aabc8828c6
    Closes-Bug: #1611793

commit f7ebb08b46f5beb13701f7a6a71a1f4fea05f451
Author: Alexander Tsamutali <email address hidden>
Date: Mon Aug 1 15:47:58 2016 +0300

    Add package spec

    Change-Id: Id71764dff07a4b32851eb8ccf69c66dca4a7b6ab
    Related-Bug: #1604492

commit b93ebedc49f79f6ba4a710a9d1715c9f965b3081
Author: Anastasiya <email address hidden>
Date: Fri Jul 15 10:24:11 2016 +0300

    Correction of transformation for text_list

    * added removing of space in text_list
    * added test for merge_attributes

    Change-Id: I5582878fc7c524551593abf21dfd4ea45cd430c9
    Closes-bug: 1602607
    (cherry picked from commit fdd2a6226483c67ce8bc7adc8b2d354862125bac)

commit d4db5ba78ccefd08e465cd30116094678c5cb35f
Author: Nikita Zubkov <email address hidden>
Date: Wed Jul 13 13:43:58 2016 +0300

    Fix package namespace

commit 443fc43da6e963cd0825880a80678e8e385c0a3a
Author: Nikita Zubkov <email address hidden>
Date: Wed J...

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-octane (master)

Reviewed: https://review.openstack.org/365169
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=9a4817174ed8b0702564baadc1728eaf8b778e28
Submitter: Jenkins
Branch: master

commit 9a4817174ed8b0702564baadc1728eaf8b778e28
Author: Ilya Kharin <email address hidden>
Date: Sat Sep 3 01:04:10 2016 +0300

    Move a set of nodes at once

    Before this patch octane made separate calls to move nodes one by one.

    Change-Id: I999a98d57b3184d35972e4862fcb4f284a066e9e
    Related-Bug: #1616925
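
The difference described in this commit message can be pictured with a minimal sketch. The function names and the `submit` callback below are hypothetical and are not taken from the octane code base; the sketch only contrasts one request per node with a single request for the whole set, which is what lets the backend build one task for all nodes.

```python
from typing import Callable, Iterable, List


def move_nodes_one_by_one(node_ids: Iterable[int],
                          submit: Callable[[List[int]], None]) -> int:
    """Old behaviour: one request (hence one separate task) per node."""
    calls = 0
    for node_id in node_ids:
        submit([node_id])
        calls += 1
    return calls


def move_nodes_at_once(node_ids: Iterable[int],
                       submit: Callable[[List[int]], None]) -> int:
    """New behaviour: a single request carrying the whole set of nodes."""
    batch = list(node_ids)
    if not batch:
        return 0  # nothing to move, so no request is sent
    submit(batch)
    return 1
```

Here `submit` stands in for whatever call actually sends the re-assignment request; passing `some_list.append` is enough to see how many requests each variant would issue.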

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/367652

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-octane (stable/mitaka)

Reviewed: https://review.openstack.org/367652
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=293b79e6490605d9e74d2e2b6825146b49fdecfb
Submitter: Jenkins
Branch: stable/mitaka

commit 9a4817174ed8b0702564baadc1728eaf8b778e28
Author: Ilya Kharin <email address hidden>
Date: Sat Sep 3 01:04:10 2016 +0300

    Move a set of nodes at once

    Before this patch octane made separate calls to move nodes one by one.

    Change-Id: I999a98d57b3184d35972e4862fcb4f284a066e9e
    Related-Bug: #1616925

commit 763b69b97645752b4b08253751962687d50cf1be
Author: Alexander Tsamutali <email address hidden>
Date: Thu Sep 8 22:03:24 2016 +0300

    Cleanup %files section in spec

    Don't use --record during install, don't use -f in %files. Specify
    octane files/directories explicitly.

    Change-Id: I84f0d71a2a582b3a23fc048a331d6caae775e38b
    Closes-Bug: #1619319

commit e7fad96f4125386df4e5c9b7f0744de185de6c33
Author: Sergey Abramov <email address hidden>
Date: Mon Sep 5 18:24:37 2016 +0300

    Osd upgrade failed if version not changed

    ceph deploy not raise exception if it doesn't upgrade osd version.

    Change-Id: Ifcddd822228d78166d59b2ba49852be2e51c79fc
    Closes-bug: 1620277

commit 349073cbe399184daee672f75b4fe9941ae3c5da
Author: Sergey Abramov <email address hidden>
Date: Wed Sep 7 18:20:33 2016 +0300

    Remove patch pupet on upgrade osd

    this is not required after merge
    https://review.openstack.org/#/c/203639/

    Change-Id: I67fbcd77ab3437443219c34ee3ddaf7895b068ce
    Closes-bug: 1621436

commit 807c5166a8a0145623d41c3110007e33a1402c47
Author: Alexey Stepanov <email address hidden>
Date: Wed Sep 7 17:53:40 2016 +0300

    Stop waiting status change on node with stopped status

    Stopped status, if not expected, should be a reason for error on nodes
    Closes-bug: #1621069

    Change-Id: I0156d694ef20ece8603e3d840f085852a528e635

commit 5f689f3d6904e092998a0d8ba124280833452c30
Author: Anastasiya <email address hidden>
Date: Tue Sep 6 10:23:39 2016 +0300

    Backup/restore for admin networks

    * backup/restore for /etc/hiera/networks.yaml was added
    * configure dhcp after restore was added

    Change-Id: I5b1e3861589e1c56acbc37d0be569da5e55b8536
    Closes-Bug: #1616998

commit e4820087678d83d9477db4c1688137ee5ff66c3f
Author: Pavel Chechetin <email address hidden>
Date: Thu Aug 18 14:46:59 2016 +0300

    Graph-based upgrade-ceph. Python part.

    Change-Id: Icb4d543bd6801f21c6aca57415105f88a601c0c2

commit 0af1b6517a97589a401328f09f3bf56d84db9dbd
Author: Pavel Chechetin <email address hidden>
Date: Tue Sep 6 13:13:42 2016 +0300

    Graph-based upgrade-ceph. Puppet part.

     - Graphs and puppet manifests part
     - Delete lib from .gitignore
     - Augeas lens for Ceph is copied and pasted, should be switched to the
       version from the upstream when [0] is merged and published.

    [0] https://github.com/hercules-team/augeas/pull/401

    Change-Id: I639cbf786971fea8c56b4da6b2661477b3b12c41

commit 3315d741f295635303413ad839b3b66a0dac3282
Author: Alexey Stepanov <penguinolog@gmail....


Ilya Kharin (akscram)
Changed in fuel:
status: Confirmed → Fix Committed
Ilya Kharin (akscram)
Changed in fuel:
status: Fix Committed → In Progress
milestone: 8.0-updates → 8.0-mu-4
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (stable/8.0)

Related fix proposed to branch: stable/8.0
Review: https://review.openstack.org/376453

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-octane (stable/8.0)

Related fix proposed to branch: stable/8.0
Review: https://review.openstack.org/376463

Ilya Kharin (akscram)
Changed in fuel:
assignee: Ilya Kharin (akscram) → Sergey Abramov (sabramov)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (stable/8.0)

Reviewed: https://review.openstack.org/376453
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=390756d3658937b9cface7d30176ffd0e7ede9d3
Submitter: Jenkins
Branch: stable/8.0

commit 390756d3658937b9cface7d30176ffd0e7ede9d3
Author: Sergey Abramov <email address hidden>
Date: Mon Sep 26 15:57:09 2016 +0300

    Add support to re-assign a set of nodes

    This patch adds an ability to re-assign a set of the given nodes at
    once. This feature was technically available but not exposed to the
    client. A groupped re-assigning allows to effectively re-provision nodes
    by creating an atomic task in Astute.

    Change-Id: Ia239d37ec0497de73715640e62205f6d3252f61b
    Related-Bug: #1616925

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-octane (stable/8.0)

Reviewed: https://review.openstack.org/376463
Committed: https://git.openstack.org/cgit/openstack/fuel-octane/commit/?id=58a9e6b2d3b1e268a995d523c395bf2d45bf051e
Submitter: Jenkins
Branch: stable/8.0

commit 58a9e6b2d3b1e268a995d523c395bf2d45bf051e
Author: Sergey Abramov <email address hidden>
Date: Mon Sep 26 16:07:30 2016 +0300

    Move a set of nodes at once

    Before this patch octane made separate calls to move nodes one by one.
    Partial-Bug: #1616925

    Change-Id: I999a98d57b3184d35972e4862fcb4f284a066e9e

Alexander Gordeev (a-gordeev) wrote :
Alexander Gordeev (a-gordeev) wrote :
Alexander Gordeev (a-gordeev) wrote :

Guys, I'm done with the astute log analysis. Good thing you've got only 2 nodes, so it was relatively easy to track all the astute actions that happened.

I've attached two files: one with briefly described actions and one with the corresponding part of the more verbose log.

My initial thought was that you'd stepped into https://bugs.launchpad.net/fuel/+bug/1626962, but in fact the astute logs contain no clear evidence of booting into a broken cobbler profile.

This never happened.

There are only 2 things I can suspect:
1) a false positive result of rebooting into the bootstrap image: the nodes weren't rebooted and remained in the target operating system.
2) a successful reboot into the bootstrap image, but the image didn't have the fuel-agent package installed (so provisioning failed, complaining that the 'provision' binary was not found)

Too bad there are no logs from the bootstrap image builder.

For 1), astute could just ensure that after 'reboot_reprovisioned_nodes' the nodes were loaded into the bootstrap image and have a specific node type such as 'bootstrap': https://github.com/openstack/fuel-agent/blob/master/contrib/fuel_bootstrap/files/trusty/etc/nailgun_systemtype

Alexander Gordeev (a-gordeev) wrote :

Perhaps we should add an additional check for the node type 'bootstrap' somewhere after 'reboot_reprovisioned_nodes': https://github.com/openstack/fuel-astute/blob/stable/mitaka/lib/astute/provision.rb#L536-L543
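
A minimal sketch of what such a check could look like, written in Python for illustration only (astute itself is Ruby, and this is not its code). It assumes the system type reported by each node has already been collected into a dict; how that is gathered, and the function names, are hypothetical.

```python
from typing import Dict, List

# Node type expected after a successful reboot into the bootstrap image.
BOOTSTRAP_TYPE = "bootstrap"


def nodes_not_in_bootstrap(reported_types: Dict[str, str]) -> List[str]:
    """Return the ids of nodes whose reported type is not 'bootstrap'."""
    return sorted(
        node_id
        for node_id, node_type in reported_types.items()
        if node_type.strip() != BOOTSTRAP_TYPE
    )


def ensure_rebooted_into_bootstrap(reported_types: Dict[str, str]) -> None:
    """Fail early instead of provisioning a node that never rebooted."""
    stale = nodes_not_in_bootstrap(reported_types)
    if stale:
        raise RuntimeError(
            "nodes did not reboot into bootstrap: " + ", ".join(stale)
        )
```

Raising before provisioning starts would turn the silent hang into an explicit error that names the nodes still running the target operating system.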

Alexander Gordeev (a-gordeev) wrote :

Just found the same traces of

> :stderr=>"flock: provision: No such file or directory\n",

in our lab too. So I can vouch that there's nothing wrong with the bootstrap image. It did contain the fuel-agent package with the '/usr/bin/provision' entry-point script.

So we should definitely prepare a fix for astute.
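
To check whether another environment hit the same trace, a small helper along these lines could scan an astute log. This is a sketch: the function name is hypothetical, and only the quoted stderr text comes from the logs discussed above.

```python
from typing import Iterable, List, Tuple

# stderr text quoted in the comment above.
MISSING_PROVISION = "flock: provision: No such file or directory"


def find_missing_provision(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs that contain the stderr trace."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MISSING_PROVISION in line
    ]
```

An open file object can be passed directly, for example `find_missing_provision(open("astute.log"))`, where the log path is whatever applies to the environment being checked.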

Ilya Kharin (akscram)
Changed in fuel:
status: In Progress → Fix Committed
TatyanaGladysheva (tgladysheva) wrote :

Verified on 8.0 + MU4 updates.

Changed in fuel:
status: Fix Committed → Fix Released