kolla-ansible

Nova services RPC version pinning present after upgrade of multinode configuration

Bug #1847990 reported by Zdenek Dvorak on 2019-10-14

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
kolla-ansible	Fix Released	High	Unassigned
Rocky	Fix Released	High	Mark Goddard
Stein	Fix Released	High	Mark Goddard
Train	Fix Released	High	Unassigned	kolla-ansible 9.0.0 "Train"

Bug Description

Nova services communicate with each other over RPC. The RPC version normally increases during a service upgrade, so the RPC version must be reselected after the upgrade. This problem was already discovered and described in bug report 1833069. The fix provided there solves the issue on all-in-one installations. A further enhancement is needed for multinode installations.

The services in the list
nova_services_require_nova_conf:
  - placement-api
  - nova-api
  - nova-compute
  - nova-compute-ironic
  - nova-conductor
  - nova-consoleauth
  - nova-novncproxy
  - nova-serialproxy
  - nova-scheduler
  - nova-spicehtml5proxy
are now restarted, but only on compute nodes. Services running on other nodes are not restarted. The restart should also be performed on control nodes.

Another suggestion is to replace the single "restart" step with two steps, "stop" and "start". This would eliminate a possible race condition.
Restarting nova-compute does not lead to correct RPC version reselection if nova-scheduler is still running the older RPC version during the restart. The two-step approach would guarantee the desired timing.

Regards Zdenek
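
The timing argument in the description can be illustrated with a toy model. This is only a sketch under stated assumptions, not nova's or kolla-ansible's actual logic: it assumes that a service registers its own version when it starts and then caps its RPC version at the lowest version reported by any service that is running at that moment. The names `Service`, `Cluster`, `rolling_restart` and `stop_then_start`, and the version numbers 1 and 2, are hypothetical.

```python
from dataclasses import dataclass, field

OLD, NEW = 1, 2  # hypothetical versions before and after the upgrade


@dataclass
class Service:
    name: str
    code_version: int      # version of the code installed for this service
    running: bool = True
    reported: int = OLD    # version the running process last registered
    rpc_cap: int = OLD     # RPC version selected at start-up


@dataclass
class Cluster:
    services: list = field(default_factory=list)

    def stop(self, svc):
        svc.running = False

    def start(self, svc):
        # Assumption: on start, a service registers its version and pins its
        # RPC version to the minimum reported by the services running now.
        svc.running = True
        svc.reported = svc.code_version
        svc.rpc_cap = min(s.reported for s in self.services if s.running)


def make_cluster():
    names = ["nova-compute", "nova-scheduler", "nova-conductor"]
    return Cluster([Service(n, code_version=NEW) for n in names])


def rolling_restart(cluster):
    # One-step "restart": each service comes back while others are still old.
    for svc in cluster.services:
        cluster.stop(svc)
        cluster.start(svc)


def stop_then_start(cluster):
    # Two-step approach: nothing old is running when the first service starts.
    for svc in cluster.services:
        cluster.stop(svc)
    for svc in cluster.services:
        cluster.start(svc)


def caps(cluster):
    return {s.name: s.rpc_cap for s in cluster.services}


if __name__ == "__main__":
    a = make_cluster()
    rolling_restart(a)
    print("rolling restart:", caps(a))

    b = make_cluster()
    stop_then_start(b)
    print("stop then start:", caps(b))
```

In this model the rolling restart leaves the services restarted first capped at the old version, while the stop-then-start sequence lets every service select the new version. Real deployments also have to account for the downtime that stopping all services at once implies.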

Mark Goddard (mgoddard) on 2019-10-15

Changed in kolla-ansible:
status:	New → Triaged
importance:	Undecided → Medium
importance:	Medium → High

Mark Goddard (mgoddard) wrote on 2019-10-23:

Will be fixed by cells change.

Zdenek Dvorak (zdenek-dvorak) wrote on 2019-10-25:

Hello Mark,
I have a question about the planned fix. Which versions will be fixed by the cells change?
We are working on Rocky, and I have prepared a simple fix for Rocky.
Regards Zdenek

Mark Goddard (mgoddard) wrote on 2019-10-25:

Hi Zdenek, good question. The cells change will only be applied to the Train release, so a different fix will be required for Stein & earlier releases.

If you have a fix, please propose it to the stable/stein branch, then we can backport it to stable/rocky.

Zdenek Dvorak (zdenek-dvorak) wrote on 2019-11-06:

main.yml (18.7 KiB, text/plain)

Hello Mark,
I tried to push the change to the repository at https://opendev.org/openstack/kolla-ansible, but I was not successful. I will keep trying to publish it there.
I have attached the file kolla-ansible/ansible/roles/nova/handlers/main.yml.
Could you please have a look at the attached file to see the proposed solution?
(handlers
- name: Restart nova services to remove RPC version cap
- name: Restart nova services to remove RPC version cap step 2)
We are working with the Rocky version. The fix in Rocky and Stein should be similar.

Regards Zdenek

Mark Goddard (mgoddard) wrote on 2019-11-06:

Hi Zdenek. Changes need to be submitted via Gerrit: https://docs.openstack.org/infra/manual/developers.html.

I had a quick look at your file, and I think it will take a long time to execute, because of the loop/delegate_to.

How about considering a partial revert of this change: https://review.opendev.org/#/c/665660/? We could keep the delay handler (Wait for nova-compute services to update service versions) but remove the restart (Restart nova services to remove RPC version cap), and do the restart in reload.yml, using restart_container rather than SIGHUP.
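
The idea behind a delay handler such as "Wait for nova-compute services to update service versions" can be sketched as a polling loop. This is a minimal illustration, not the kolla-ansible handler itself: `wait_for_versions`, its parameters, and the `fetch_versions` callable (assumed to return a mapping of host name to reported service version) are hypothetical.

```python
import time


def wait_for_versions(fetch_versions, target, retries=10, delay=1.0,
                      sleep=time.sleep):
    """Poll until every reported version is at least `target`.

    `fetch_versions` is a hypothetical callable returning a dict that maps
    each compute host to the service version it currently reports.
    Returns True once all hosts have caught up, False if retries run out.
    """
    for attempt in range(retries):
        versions = fetch_versions()
        if versions and all(v >= target for v in versions.values()):
            return True
        if attempt < retries - 1:
            sleep(delay)  # give lagging services time to register
    return False


if __name__ == "__main__":
    # Simulated responses: one host lags behind on the first poll.
    responses = iter([{"compute1": 1, "compute2": 2},
                      {"compute1": 2, "compute2": 2}])
    print(wait_for_versions(lambda: next(responses), target=2, delay=0.1))
```

Only after such a wait succeeds would the remaining services be restarted, so that they select their RPC version against up-to-date version data.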

OpenStack Infra (hudson-openstack) wrote on 2019-11-08: Fix proposed to kolla-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/693505

Zdenek Dvorak (zdenek-dvorak) wrote on 2019-11-08:

Hello Mark,
I checked the code, and I will prepare a "partial revert" variation of the patch.

Regards Zdenek

Zdenek Dvorak (zdenek-dvorak) wrote on 2019-11-19:

Hello Mark,
I modified the patch according to your comments. Is it OK with you? Should I make any additional changes?

Regards Zdenek

OpenStack Infra (hudson-openstack) wrote on 2019-11-19: Fix proposed to kolla-ansible (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/695049

OpenStack Infra (hudson-openstack) wrote on 2019-11-21: Fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.opendev.org/693505
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=7ecf68885777713a74612fc46cb3f60675fe5dfd
Submitter: Zuul
Branch: stable/rocky

commit 7ecf68885777713a74612fc46cb3f60675fe5dfd
Author: zdenek_dvorak <email address hidden>
Date: Fri Nov 8 17:26:00 2019 +0900

Patch to fix RPC selection problem after upgrade

Stable (Stein & earlier) only - nova cells change fixed this in Train.

Nova services communicate via RPC. RPC version selection is done after
service start. Restart of all nova services present in list
“nova_services_require_nova_conf” is needed for correct RPC version
selection.

Only nova services located on the same node as “nova-compute”
were restarted till now. This fix restarts nova services on all nodes.

Change-Id: Idaf857b4894deb024f839263503a97332e3dc9af
Closes-Bug: #1847990

OpenStack Infra (hudson-openstack) wrote on 2019-11-23: Fix merged to kolla-ansible (stable/stein)

Reviewed: https://review.opendev.org/695049
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=8cf2f14bb423d1f14412a6d962daec3c3606472e
Submitter: Zuul
Branch: stable/stein

commit 8cf2f14bb423d1f14412a6d962daec3c3606472e
Author: zdenek_dvorak <email address hidden>
Date: Fri Nov 8 17:26:00 2019 +0900

Patch to fix RPC selection problem after upgrade

Stable (Stein & earlier) only - nova cells change fixed this in Train.

Only nova services located on the same node as “nova-compute”
were restarted till now. This fix restarts nova services on all nodes.

Change-Id: Idaf857b4894deb024f839263503a97332e3dc9af
Closes-Bug: #1847990

Mark Goddard (mgoddard) on 2019-12-17

Changed in kolla-ansible:
milestone:	9.0.0 → none

OpenStack Infra (hudson-openstack) wrote on 2020-01-30: Fix included in openstack/kolla-ansible 7.2.0

This issue was fixed in the openstack/kolla-ansible 7.2.0 release.

OpenStack Infra (hudson-openstack) wrote on 2020-01-30: Fix included in openstack/kolla-ansible 8.1.0

This issue was fixed in the openstack/kolla-ansible 8.1.0 release.

Mark Goddard (mgoddard) on 2020-11-16

Changed in kolla-ansible:
status:	Fix Committed → Fix Released
