Undercloud upgrade breaks due to IP range change

Bug #1645267 reported by Steven Hardy
This bug affects 1 person
Affects    Status         Importance   Assigned to   Milestone
tripleo    Fix Released   High         Ben Nemec

Bug Description

Since we changed the default IP range, upgrades for existing underclouds will break like this:

2016-11-28 09:48:24,849 INFO: + cidr=192.0.2.0/24
2016-11-28 09:48:24,850 INFO: + '[' 192.0.2.0/24 = 192.168.24.0/24 ']'
2016-11-28 09:48:24,850 INFO: + echo 'New cidr 192.168.24.0/24 does not equal old cidr 192.0.2.0/24'
2016-11-28 09:48:24,850 INFO: New cidr 192.168.24.0/24 does not equal old cidr 192.0.2.0/24
2016-11-28 09:48:24,850 INFO: + echo 'Will attempt to delete and recreate subnet 0fc641d3-dfe1-4515-89f3-7d4ab8896c00'
2016-11-28 09:48:24,850 INFO: Will attempt to delete and recreate subnet 0fc641d3-dfe1-4515-89f3-7d4ab8896c00
2016-11-28 09:48:24,850 INFO: + '[' 1 -eq 1 ']'
2016-11-28 09:48:24,850 INFO: + neutron subnet-list
2016-11-28 09:48:24,850 INFO: + grep start
2016-11-28 09:48:26,202 INFO: | 0fc641d3-dfe1-4515-89f3-7d4ab8896c00 | | 192.0.2.0/24 | {"start": "192.0.2.5", "end": "192.0.2.24"} |
2016-11-28 09:48:26,203 INFO: ++ neutron subnet-list
2016-11-28 09:48:26,204 INFO: ++ awk '{print $2}'
2016-11-28 09:48:26,206 INFO: ++ grep start
2016-11-28 09:48:27,464 INFO: + neutron subnet-delete 0fc641d3-dfe1-4515-89f3-7d4ab8896c00
2016-11-28 09:48:28,877 INFO: Unable to complete operation on subnet 0fc641d3-dfe1-4515-89f3-7d4ab8896c00: One or more ports have an IP allocation from this subnet.
2016-11-28 09:48:28,877 INFO: Neutron server returns request_ids: ['req-f493524d-e463-467a-8af9-d32b48085d37']
2016-11-28 09:48:28,901 INFO: [2016-11-28 09:48:28,900] (os-refresh-config) [ERROR] during post-configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/post-configure.d']' returned non-zero exit status 1]
2016-11-28 09:48:28,901 INFO:
2016-11-28 09:48:28,901 INFO: [2016-11-28 09:48:28,901] (os-refresh-config) [ERROR] Aborting...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1173, in install
    _run_orc(instack_env)
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 963, in _run_orc
    _run_live_command(args, instack_env, 'os-refresh-config')
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 501, in _run_live_command
    raise RuntimeError('%s failed. See log for details.' % name)
RuntimeError: os-refresh-config failed. See log for details.
Command 'instack-install-undercloud' returned non-zero exit status 1
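The trace shows the post-configure step finding the subnet with `neutron subnet-list | grep start | awk '{print $2}'` and then trying to delete it, which fails because ports still hold allocations from it. The following minimal Python sketch performs the same extraction on the row from the log. It assumes the column layout shown there (ID first, allocation pool as JSON last), and `parse_subnet_row` is a hypothetical helper, not part of instack-undercloud.

```python
import json


def parse_subnet_row(row):
    """Split one `neutron subnet-list` table row into its fields.

    Assumes the layout seen in the log: | id | name | cidr | allocation_pools |
    (hypothetical helper for illustration only).
    """
    # Drop the outer '|' characters, then strip padding from each cell.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    subnet_id, name, cidr, pools = cells
    return {
        "id": subnet_id,  # the same field awk '{print $2}' picks out
        "name": name,
        "cidr": cidr,
        "pool": json.loads(pools),
    }


row = ('| 0fc641d3-dfe1-4515-89f3-7d4ab8896c00 | | 192.0.2.0/24 | '
       '{"start": "192.0.2.5", "end": "192.0.2.24"} |')
info = parse_subnet_row(row)
print(info["id"], info["cidr"], info["pool"]["start"])
```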

Although I support the change we made, I think we need a better user experience here. For example, at the start of the update we could check whether the IP range will change and, if so, print a helpful message explaining the user's options.

E.g., it seems clear that we can't support switching out the subnet, so we probably need to print an undercloud.conf fragment that folks can cut and paste to keep using their existing deployment.
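The following rough sketch shows the up-front check proposed above. It is written for Python 3; the traceback shows the installer running on Python 2.7, where `ipaddress` is only available as a backport. It assumes the previously deployed CIDR is already known and that the relevant undercloud.conf option is `network_cidr` in `[DEFAULT]`. `check_cidr_change` is hypothetical, and a real fragment would also need the other addresses derived from the old range.

```python
import ipaddress


def check_cidr_change(old_cidr, new_cidr):
    """Return None if the ranges match, else a message for the operator."""
    old = str(ipaddress.ip_network(old_cidr))
    new = str(ipaddress.ip_network(new_cidr))
    if old == new:
        return None
    # Suggest pinning the deployed range instead of attempting a migration.
    # Option name and section are assumptions, not confirmed by this report.
    return (
        "The configured network_cidr {new} differs from the deployed {old}.\n"
        "To keep using the existing deployment, add this to undercloud.conf:\n"
        "\n"
        "[DEFAULT]\n"
        "network_cidr = {old}\n".format(new=new, old=old)
    )


msg = check_cidr_change("192.0.2.0/24", "192.168.24.0/24")
print(msg)
```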

Long term, I guess we'll need to figure out a migration plan, but I'm not sure how we'd do that, given that the ctlplane IPs of all deployed nodes would change.
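To see the scale of the problem, consider even a naive remapping that keeps each node's host offset: it still changes every ctlplane address, and each deployed node would have to be reconfigured to match. `remap_address` is a hypothetical helper (Python 3 `ipaddress`) that only does the arithmetic between the two default ranges; it is not a proposed migration tool.

```python
import ipaddress


def remap_address(addr, old_cidr, new_cidr):
    """Map addr to the same host offset inside new_cidr (hypothetical)."""
    old = ipaddress.ip_network(old_cidr)
    new = ipaddress.ip_network(new_cidr)
    ip = ipaddress.ip_address(addr)
    if ip not in old:
        raise ValueError("%s is not in %s" % (addr, old))
    offset = int(ip) - int(old.network_address)
    if offset >= new.num_addresses:
        raise ValueError("offset %d does not fit in %s" % (offset, new))
    return str(new.network_address + offset)


# The two ends of the old default allocation pool seen in the log.
for old_ip in ["192.0.2.5", "192.0.2.24"]:
    print(old_ip, "->",
          remap_address(old_ip, "192.0.2.0/24", "192.168.24.0/24"))
```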

Steven Hardy (shardy)
Changed in tripleo:
milestone: none → ocata-2
status: New → Triaged
importance: Undecided → High
tags: added: undercloud upgrade
Julie Pichon (jpichon) wrote :

Not stopping early enough when the old range can't be deleted causes at least one issue with Swift; see bug 1646450.

Steven Hardy (shardy) wrote :

Yeah, as discovered via bug 1646450, I think this has a worse side effect than I realized: we really need to prevent the undercloud update from happening at all if the network settings don't match, or things end up in a badly broken state.
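Ordering is the key point here. If validation runs before any service is reconfigured, a mismatch aborts the update with nothing changed, instead of failing partway through post-configure as in the original log. The following is a minimal sketch of that ordering. `run_update`, `validate_network`, `NetworkChangeError` and the step functions are all hypothetical names.

```python
class NetworkChangeError(Exception):
    """Raised when the requested network configuration must not be applied."""


def validate_network(deployed_cidr, configured_cidr):
    if deployed_cidr != configured_cidr:
        raise NetworkChangeError(
            "network_cidr changed from %s to %s; refusing to update"
            % (deployed_cidr, configured_cidr))


def run_update(deployed_cidr, configured_cidr, steps):
    """Validate first; only then run the reconfiguration steps in order."""
    validate_network(deployed_cidr, configured_cidr)
    done = []
    for step in steps:
        step()
        done.append(step.__name__)
    return done


def reconfigure_swift():
    pass  # placeholder for a real reconfiguration step


def recreate_subnet():
    pass  # placeholder for a real reconfiguration step


try:
    run_update("192.0.2.0/24", "192.168.24.0/24",
               [reconfigure_swift, recreate_subnet])
except NetworkChangeError as exc:
    print("Aborted before any changes:", exc)
```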

Ben Nemec (bnemec) wrote :

Note that on the upgrade front, this behavior was actually documented in Newton and users were advised to explicitly set the 192.0.2 cidr if they had already deployed with it and couldn't change it. See https://github.com/openstack/instack-undercloud/blob/stable/newton/undercloud.conf.sample#L79 and https://github.com/openstack/instack-undercloud/blob/stable/newton/instack_undercloud/undercloud.py#L98
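Because the default changed, an undercloud.conf that never set the CIDR picks up the new range on update without any warning. This sketch uses Python's `configparser` to tell the two cases apart. It assumes the option is `network_cidr` in the `[DEFAULT]` section and takes the new default, 192.168.24.0/24, from the log. `effective_cidr` is a hypothetical helper.

```python
import configparser

NEW_DEFAULT_CIDR = "192.168.24.0/24"  # default after the change, per the log


def effective_cidr(conf_text):
    """Return (cidr, explicitly_set) for the given undercloud.conf contents."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Assumed option name and section.
    if parser.has_option("DEFAULT", "network_cidr"):
        return parser.get("DEFAULT", "network_cidr"), True
    return NEW_DEFAULT_CIDR, False


pinned = "[DEFAULT]\nnetwork_cidr = 192.0.2.0/24\n"
unpinned = "[DEFAULT]\nlocal_interface = eth1\n"
print(effective_cidr(pinned))    # ('192.0.2.0/24', True)
print(effective_cidr(unpinned))  # ('192.168.24.0/24', False)
```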

That being said, since apparently changing the cidr doesn't work, I agree we need to either fix that or disallow it up front. Disallowing it will make people unhappy, though, since it basically means a wipe and reinstall if they decide to change their provisioning range after the initial undercloud install.

Steven Hardy (shardy) wrote :

Ben, I guess I'd rather disallow it by default and have a --force option or something that lets folks change the provisioning range if they want to (assuming we can make that actually work).
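Refusing by default with an explicit override could look something like this sketch. The `--force` flag is hypothetical; the fix that was eventually merged simply disallows the change. To keep the example self-contained, the CIDRs are passed in directly rather than read from a deployment.

```python
import argparse
import sys


def parse_args(argv):
    parser = argparse.ArgumentParser(description="undercloud update (sketch)")
    parser.add_argument("--force", action="store_true",
                        help="allow changing the provisioning network CIDR")
    return parser.parse_args(argv)


def check_cidr(deployed, configured, force):
    """Return True if the update may proceed."""
    if deployed == configured:
        return True
    if force:
        print("WARNING: changing ctlplane CIDR %s -> %s; existing overcloud "
              "nodes will be orphaned" % (deployed, configured),
              file=sys.stderr)
        return True
    print("ERROR: ctlplane CIDR change %s -> %s is not allowed; set "
          "network_cidr = %s or re-run with --force"
          % (deployed, configured, deployed), file=sys.stderr)
    return False


args = parse_args([])
ok = check_cidr("192.0.2.0/24", "192.168.24.0/24", args.force)
print("proceed" if ok else "abort")
```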

It's not only breaking Swift we have to worry about: if we reconfigure everything on the undercloud to a different ctlplane CIDR, won't we orphan all existing overcloud nodes, which will be polling the old Swift tempurls containing the old IP?

We don't have any workflow to regenerate those tempurls via heat without deleting and re-creating the overcloud nodes (the tempurl creation is part of the server heat resource, which we explicitly never allow updates of).
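The old address persists because a Swift temporary URL is a fixed string handed to the node, and the node keeps polling that exact host. This sketch builds such a URL following Swift's TempURL scheme, which signs the method, expiry and object path with HMAC-SHA1. The key, account, container, object name, undercloud address and port 8080 are made-up example values.

```python
import hmac
from hashlib import sha1


def make_temp_url(host, path, key, expires, method="GET"):
    """Build a Swift TempURL; the host is baked into the returned string."""
    hmac_body = "%s\n%d\n%s" % (method, expires, path)
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    return "%s%s?temp_url_sig=%s&temp_url_expires=%d" % (
        host, path, sig, expires)


path = "/v1/AUTH_example/overcloud/node-0-metadata"  # hypothetical object
url = make_temp_url("http://192.0.2.1:8080", path, "secret", 1480326504)
print(url)
# Moving the undercloud to 192.168.24.0/24 does not rewrite this string, so
# the node keeps polling 192.0.2.1, an address the reconfigured undercloud
# no longer answers on.
```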

So, in summary, I don't think changing the range will ever work unless you delete your overcloud, so we should just disallow it by default.

I think the best outcome in this situation is that the operator gets a choice: either they update undercloud.conf to match their current environment (so it keeps working), or they make the migration by tearing down their overcloud (I don't imagine many production environments will want that...).

Changed in tripleo:
milestone: ocata-2 → ocata-3
Changed in tripleo:
milestone: ocata-3 → ocata-rc1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (master)

Fix proposed to branch: master
Review: https://review.openstack.org/432388

Changed in tripleo:
assignee: nobody → Ben Nemec (bnemec)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (master)

Reviewed: https://review.openstack.org/432388
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=66d12ccfff0a4110c8928a2c832921804c89c61b
Submitter: Jenkins
Branch: master

commit 66d12ccfff0a4110c8928a2c832921804c89c61b
Author: Ben Nemec <email address hidden>
Date: Fri Feb 10 17:08:43 2017 +0000

    Disallow IP changes on undercloud update

    Changing the ctlplane IP after an undercloud has been installed
    causes various difficult to solve problems, and doesn't work
    properly in any case. Let's explicitly disallow it during the
    config validation before we start reconfiguring services and break
    something.

    Note that this also prevents accidental changes due to the new
    default CIDR in this release.
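This sketch shows a validation in the spirit of this commit. It is not the code from the commit. On a first install there is nothing to compare against, while on an update the previously applied address must match the configured one. Where that previous value lives is an assumption: here it is a hypothetical JSON file with a `ctlplane_ip` key.

```python
import json
import os
import tempfile


class IPChangeNotAllowed(Exception):
    """Raised when an update would change the ctlplane address."""


def validate_no_ip_change(state_file, configured_ip):
    """Reject an update whose local IP differs from the recorded one."""
    if not os.path.isfile(state_file):
        return  # fresh install: nothing deployed yet, any IP is fine
    with open(state_file) as f:
        existing_ip = json.load(f)["ctlplane_ip"]  # hypothetical key
    if existing_ip != configured_ip:
        raise IPChangeNotAllowed(
            "Changing the local IP is not allowed. Existing: %s, "
            "configured: %s" % (existing_ip, configured_ip))


with tempfile.TemporaryDirectory() as tmp:
    state = os.path.join(tmp, "applied-network.json")  # hypothetical path
    validate_no_ip_change(state, "192.168.24.1/24")  # first install passes
    with open(state, "w") as f:
        json.dump({"ctlplane_ip": "192.0.2.1/24"}, f)
    try:
        validate_no_ip_change(state, "192.168.24.1/24")
    except IPChangeNotAllowed as exc:
        print(exc)
```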

    Change-Id: I7f3a436b0a4e44fdc1241ebd52003ec9f659e8ea
    Closes-Bug: 1645267

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 6.0.0.0rc1

This issue was fixed in the openstack/instack-undercloud 6.0.0.0rc1 release candidate.
