Upgrade to pike failed with hanging config-changed hook

Bug #1742115 reported by Liam Young
This bug affects 1 person
Affects: OpenStack Nova Cloud Controller Charm
Status: Fix Released
Importance: Critical
Assigned to: Liam Young
Milestone: 18.02

Bug Description

The config-changed hook hung, seemingly while running map_instances. At that point, multiple nova services were down across the three nova-cc units.

* map_instances was run by hand without max-count set; it seemed to run for hours with no output.
* nova services were started by hand and came up cleanly
* map_instances was run by hand with max-count set initially to 10 and then increased gradually to 60000. After each run the return code was checked: 1 means there is more to do and 0 means mapping is complete.
* When mapping was complete (according to the return code), the config-changed hook was re-run with map_instances() disabled.
* The upgrade appears to be complete.
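The manual workaround above (growing max-count, checking the return code after every run) can be sketched as follows. This is a minimal illustration, not the procedure that was actually scripted: it assumes the Pike command form `nova-manage cell_v2 map_instances --cell_uuid <uuid> --max-count <n>`, and the doubling schedule, the `ramp_schedule`/`map_with_ramp` names and the injectable `run` callable are hypothetical. The report only says the value was increased gradually from 10 to 60000.

```python
import itertools
import subprocess
from typing import Callable, Iterable, Iterator, List, Sequence


def ramp_schedule(start: int = 10, cap: int = 60000) -> Iterator[int]:
    """Hypothetical schedule: start, 2*start, 4*start, ... below cap, then cap forever."""
    size = start
    while size < cap:
        yield size
        size *= 2
    yield from itertools.repeat(cap)


def map_with_ramp(cell_uuid: str, sizes: Iterable[int],
                  run: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run map_instances once per batch size until it returns 0.

    rc 1 means "more to do", rc 0 means "complete". Stopping at the first 0
    matters: re-running after rc 0 appears to start over from scratch.
    Returns the number of invocations.
    """
    runs = 0
    for size in sizes:
        cmd: List[str] = ["nova-manage", "cell_v2", "map_instances",
                          "--cell_uuid", cell_uuid, "--max-count", str(size)]
        rc = run(cmd)
        runs += 1
        if rc == 0:
            return runs
        if rc != 1:
            raise RuntimeError(f"map_instances failed with rc {rc}")
    raise RuntimeError("batch schedule exhausted before map_instances returned 0")
```

Passing `run` in keeps the loop testable without a live cloud; with the default `subprocess.call`, each batch is a real `nova-manage` invocation.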

Tags: upgrade
Liam Young (gnuoy) wrote :

It is worth noting that, as far as I can tell, running map_instances after it has reported an rc of 0 starts the whole process over again from scratch.

Liam Young (gnuoy)
tags: added: upgrade
Liam Young (gnuoy)
Changed in charm-nova-cloud-controller:
importance: Undecided → Critical
Liam Young (gnuoy)
Changed in charm-nova-cloud-controller:
status: New → Triaged
assignee: nobody → Liam Young (gnuoy)
Liam Young (gnuoy) wrote :

It looks like thedac's observations from stracing map_instances are entirely correct. The default batch size is 50, which is crazy small *1. I have raised upstream Bug #1742649 for this. In the meantime, I suggest we do batches of 50000.

*1 https://github.com/openstack/nova/blob/stable/pike/nova/cmd/manage.py#L1411

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-cloud-controller (master)

Fix proposed to branch: master
Review: https://review.openstack.org/532821

Changed in charm-nova-cloud-controller:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/532918

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (master)

Reviewed: https://review.openstack.org/532918
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=edd9b1face6aac248d134d9261574e451a8121e7
Submitter: Zuul
Branch: master

commit edd9b1face6aac248d134d9261574e451a8121e7
Author: Liam Young <email address hidden>
Date: Thu Jan 11 17:27:20 2018 +0000

    Add action for running archive-deleted-rows

    Add an action for moving stale data to shadow tables using
    nova-manage *1. This will speed up other operations such as
    map_instances which no longer need to work against stale
    data.

    *1 https://docs.openstack.org/nova/pike/cli/nova-manage.html

    Change-Id: I03f3d641b50cfc6f02262edb0f714ba6e9566775
    Partial-Bug: #1742115

Liam Young (gnuoy) wrote :

I think this bug is covered by the following patches:

https://review.openstack.org/#/c/533597/ (Add instructions for upgrading OpenStack)

https://review.openstack.org/#/c/533630/ (Only run map_instances for Ocata)

https://review.openstack.org/#/c/532821/ (Batch up map_instances call)

https://review.openstack.org/#/c/532918/ (Add action for running archive-deleted-rows)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-cloud-controller (stable/17.11)

Fix proposed to branch: stable/17.11
Review: https://review.openstack.org/541218

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (stable/17.11)

Reviewed: https://review.openstack.org/541218
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=d16168a5dbdffd6422161a9f93c7d8b62b2f2d4d
Submitter: Zuul
Branch: stable/17.11

commit d16168a5dbdffd6422161a9f93c7d8b62b2f2d4d
Author: Liam Young <email address hidden>
Date: Thu Jan 11 13:39:29 2018 +0000

    Batch up map_instances call

    nova-manage map_instances maps all instances into a cell in batches
    of 50 if max-count is not set. Setting max-count causes the script
    to run a single batch of size max-count. The return code of the
    script shows if there are still more to do. This change runs
    map_instances repeatedly with a batch size of 50000 while rc is 1
    and then exits cleanly when an rc of 0 is received.

    Change-Id: Id1184778a5ae94bb3b57348b10d12077b093d6dd
    Partial-Bug: #1742115
    (cherry picked from commit 69c2626d73c9b5aea6c3e31f6118dc3db715c551)
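The batching described in this commit can be sketched in Python (the charm's own language) as below. This is an illustration, not the charm's actual code: it assumes the Pike `nova-manage cell_v2 map_instances --cell_uuid <uuid> --max-count <n>` form, and the function name, the `cell_uuid` argument and the injectable `run` callable are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Batch size proposed in this bug, instead of nova-manage's internal default of 50.
BATCH_SIZE = 50000


def map_instances(cell_uuid: str, batch_size: int = BATCH_SIZE,
                  run: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run map_instances in fixed-size batches until it reports rc 0.

    With --max-count set, each invocation maps at most one batch:
    rc 1 means more instances remain, rc 0 means mapping is complete.
    Returns the number of batches executed.
    """
    cmd: List[str] = ["nova-manage", "cell_v2", "map_instances",
                      "--cell_uuid", cell_uuid,
                      "--max-count", str(batch_size)]
    batches = 0
    while True:
        rc = run(cmd)
        batches += 1
        if rc == 0:  # all instances mapped; stop rather than restart from scratch
            return batches
        if rc != 1:  # anything other than "more to do" is treated as a failure
            raise subprocess.CalledProcessError(rc, cmd)
```

Looping on the return code keeps each `nova-manage` invocation bounded. The hook therefore makes visible progress instead of running a single unbounded call with no output.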

Liam Young (gnuoy)
Changed in charm-nova-cloud-controller:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-nova-cloud-controller:
milestone: none → 18.02
James Page (james-page)
Changed in charm-nova-cloud-controller:
status: Fix Committed → Fix Released