nova upgrade failures from newton to ocata

Bug #1692464 reported by Mathieu Rohon
This bug affects 1 person
Affects        Status   Importance  Assigned to  Milestone
kolla-ansible  Expired  Undecided   Unassigned

Bug Description

I'm running an upgrade from kolla 3.0.3 to kolla 4.0.2, with a local registry and a build from source/centos.

The first failure occurred when running the nova-manage commands from the nova_api container:

https://github.com/openstack/kolla-ansible/blob/stable/ocata/ansible/roles/nova/tasks/bootstrap.yml

Those tasks complain that they don't have write permissions on /var/log/kolla/nova/nova-manage.log
I had to enter the nova_api container and change permissions on this file.
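
A minimal sketch of checking for this condition before rerunning the tasks, in Python. The helper name `is_log_writable` is hypothetical and not part of kolla-ansible; only the log path comes from the error above. The check is only meaningful when run as the same user that runs nova-manage inside the container.

```python
import os


def is_log_writable(path):
    """Return True if the current user could write to the log file at path.

    If the file does not exist yet, check whether its parent directory
    allows creating it instead.
    """
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(path) or "."
    return os.access(parent, os.W_OK | os.X_OK)


# Path taken from the error described above.
log_path = "/var/log/kolla/nova/nova-manage.log"
print(log_path, "writable:", is_log_writable(log_path))
```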

Then the "Running Nova bootstrap container" task fails. The logs of the nova_bootstrap container show the following error:

ValidationError: Cell mappings are not created, but required for Ocata. Please run nova-manage cell_v2 simple_cell_setup before continuing.

Looking at the db, the nova_api.host_mappings table is empty. I had to enter the nova_api container and explicitly run the command:

# nova-manage --verbose cell_v2 map_cell_and_hosts --transport-url rabbit://openstack:.....

It should have been run by the following task :

https://github.com/openstack/kolla-ansible/blob/stable/ocata/ansible/roles/nova/tasks/upgrade.yml#L20

But in my case, the "nova_api nova-manage cell_v2 simple_cell_setup..." command left the host_mappings table empty.
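
The empty table can be confirmed with a simple row count. The sketch below is only an illustration under stated assumptions: `count_host_mappings` is a hypothetical helper that accepts any Python DB-API connection already pointed at the nova_api database. The demonstration uses an in-memory SQLite table as a stand-in, with a simplified schema rather than the real one.

```python
import sqlite3


def count_host_mappings(conn):
    """Return the number of rows in host_mappings (hypothetical helper)."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM host_mappings")
    (count,) = cur.fetchone()
    return count


# Stand-in database: the real table lives in the nova_api database and has
# more columns; this simplified schema is an assumption for illustration.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE host_mappings (host TEXT, cell_id INTEGER)")

if count_host_mappings(demo) == 0:
    print("host_mappings is empty: compute hosts are not mapped to a cell")
```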

Eduardo Gonzalez (egonzalez90) wrote :

I tried a couple of times to verify this issue but was unable to reproduce it.

Here is one of the upgrade process logs http://paste.openstack.org/show/610147/
And sample cell creation process http://paste.openstack.org/show/610138/

map_cell_and_hosts should be executed as part of simple_cell_setup (Newton code): https://github.com/openstack/nova/blob/stable/newton/nova/cmd/manage.py#L1248

Basically what simple_cell_setup does is:

- Create the cell0 mapping (we do this manually before setting up cells so that it points to the correct database; Newton pointed to nova_api_cell0 instead of nova_cell0)
- Run DB sync for cell0 db schema
- Run map_cell_and_hosts
- Map instances into a cell (None if not using cells)

My guess on this issue:

- Something goes wrong before simple_cell_setup
- Kolla code was not on the ocata branch (note that newton and master do not have cell creation during upgrade)
- The upgrade was attempted several times on the same environment and things became unstable
- Wrong code or images used (There was a bug in nova before 3.0.3)

Eduardo Gonzalez (egonzalez90) wrote :

There was a bug in the 3.0.2 images: nova was not idempotent, and if cell0 had already been created, simple_cell_setup did nothing else. That means no mappings or None cell were created, and the db sync in the bootstrap task failed:

https://github.com/openstack/nova/commit/8418a2a97eaafc344b2976438852526b00079742
https://review.openstack.org/#/c/420051/
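
To make the failure mode concrete, here is a toy Python model of the behaviour described above. It is not nova's code: the `Deployment` class, both setup functions, and the cell name `cell1` are hypothetical, and each step is reduced to a set or dict update. It only illustrates why returning early when cell0 already exists leaves host_mappings empty, while an idempotent version still maps the hosts.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Toy stand-in for the API database state (hypothetical)."""
    compute_hosts: list
    cell_mappings: set = field(default_factory=set)
    host_mappings: dict = field(default_factory=dict)


def buggy_setup(dep):
    """Model of the non-idempotent behaviour: stop if cell0 already exists."""
    if "cell0" in dep.cell_mappings:
        return  # nothing else happens, so hosts are never mapped
    dep.cell_mappings.add("cell0")
    _map_cell_and_hosts(dep)


def idempotent_setup(dep):
    """Model of the fixed behaviour: every step is safe to repeat."""
    dep.cell_mappings.add("cell0")  # no-op if already present
    _map_cell_and_hosts(dep)


def _map_cell_and_hosts(dep):
    """Create the main cell if needed and map any unmapped hosts to it."""
    dep.cell_mappings.add("cell1")  # "cell1" is an assumed name
    for host in dep.compute_hosts:
        dep.host_mappings.setdefault(host, "cell1")


# cell0 was created by hand before the setup command ran.
before_fix = Deployment(["compute1", "compute2"], cell_mappings={"cell0"})
buggy_setup(before_fix)

after_fix = Deployment(["compute1", "compute2"], cell_mappings={"cell0"})
idempotent_setup(after_fix)

print(before_fix.host_mappings)  # {}
print(after_fix.host_mappings)   # {'compute1': 'cell1', 'compute2': 'cell1'}
```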

From kolla-ansible release notes:

Nova cells are required as of Ocata release, before upgrade database should be created. Due a bug in Nova, only latest code can be used to create default cells. Ensure nova is fully updated and have this patch applied before start upgrade to Ocata https://review.openstack.org/#/c/420051/ or upgrade to Kolla 3.0.3 first.

Mathieu Rohon (mathieu-rohon) wrote :

I suspect option 1 or option 3, from the guesses you listed in comment #1, is the root cause of my troubles.

Your two other guesses don't apply to my environment.

Eduardo Gonzalez (egonzalez90) wrote :

@Mathieu, do you have more information about the issue to investigate or fix?
Regards

Mathieu Rohon (mathieu-rohon) wrote :

@eduardo, I didn't test any new upgrade. My platform is now up to date. Since you didn't hit my issue during your upgrade testing, I think it comes from my environment, and from previous upgrades that failed and left my platform in a bad state.

Changed in kolla-ansible:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for kolla-ansible because there has been no activity for 60 days.]

Changed in kolla-ansible:
status: Incomplete → Expired