Nova services RPC version pinning present after upgrade of multinode configuration

Bug #1847990 reported by Zdenek Dvorak
This bug affects 1 person
Affects         Status         Importance   Assigned to    Milestone
kolla-ansible   Fix Released   High         Unassigned
Rocky           Fix Released   High         Mark Goddard
Stein           Fix Released   High         Mark Goddard
Train           Fix Released   High         Unassigned

Bug Description

Nova services communicate with each other over RPC. The RPC version is normally increased during a service upgrade, so the services need to reselect the RPC version after the upgrade. This problem was already discovered and described in bug report 1833069. The fix provided there solves the issue on all-in-one installations, but a further enhancement is needed for multinode installations.

Services in list
nova_services_require_nova_conf:
  - placement-api
  - nova-api
  - nova-compute
  - nova-compute-ironic
  - nova-conductor
  - nova-consoleauth
  - nova-novncproxy
  - nova-serialproxy
  - nova-scheduler
  - nova-spicehtml5proxy
are now restarted, but only on the compute node. Services running on other nodes are not restarted. The restart should also be performed on the control node.

Another suggestion is to replace the single "restart" step with two steps, "stop" and "start". This would eliminate a possible race condition.
Restarting nova-compute does not lead to correct RPC version reselection if nova-scheduler is still running the older RPC version during the restart. The two-step approach would guarantee the desired timing.
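
The two-step proposal could look roughly like the sketch below. It is not the attached handlers/main.yml. It assumes that the role exposes a `nova_services` dictionary whose entries carry `container_name`, `group` and `enabled` keys, and it uses plain `docker stop` / `docker start` commands purely for illustration. With Ansible's default linear strategy, and provided the play targets all nova hosts without `serial` batching, the first task finishes on every host before the second one begins, so no old-version service is still running when the services start again.

```yaml
# Sketch only: hypothetical tasks, not the patch attached to this bug.
# Step 1: stop every enabled nova service that reads nova.conf, on all hosts.
- name: Stop nova services before RPC version reselection
  become: true
  command: "docker stop {{ item.value.container_name }}"
  when:
    - inventory_hostname in groups[item.value.group]
    - item.value.enabled | bool
    - item.key in nova_services_require_nova_conf
  with_dict: "{{ nova_services }}"

# Step 2: start the same services again; this task runs only after
# step 1 has completed on every host in the play.
- name: Start nova services so they reselect the RPC version
  become: true
  command: "docker start {{ item.value.container_name }}"
  when:
    - inventory_hostname in groups[item.value.group]
    - item.value.enabled | bool
    - item.key in nova_services_require_nova_conf
  with_dict: "{{ nova_services }}"
```

The cost of this ordering is a short window in which all listed nova services are down at once, which a rolling restart would avoid.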

Regards Zdenek

Tags: rpc
Mark Goddard (mgoddard)
Changed in kolla-ansible:
status: New → Triaged
importance: Undecided → Medium
importance: Medium → High
Mark Goddard (mgoddard) wrote :

Will be fixed by cells change.

Zdenek Dvorak (zdenek-dvorak) wrote :

Hello Mark,
I have a question related to the planned fix. Which versions will be fixed by the change in cells?
We are working on Rocky, and I have prepared a simple fix for Rocky.
Regards Zdenek

Mark Goddard (mgoddard) wrote :

Hi Zdenek, good question. The cells change will only be applied to the Train release, so a different fix will be required for Stein & earlier releases.

If you have a fix, please propose it to the stable/stein branch, then we can backport it to stable/rocky.

Zdenek Dvorak (zdenek-dvorak) wrote :

Hello Mark,
I tried to push the change to the repository at https://opendev.org/openstack/kolla-ansible, but I was not successful. I will try to publish it there.
I have attached the file kolla-ansible/ansible/roles/nova/handlers/main.yml.
Can you please have a look at the attached file to see the proposed solution?
(handlers
- name: Restart nova services to remove RPC version cap
- name: Restart nova services to remove RPC version cap step 2)
We are working with rocky version. Fix in Rocky and Stein should be similar.

Regards Zdenek

Mark Goddard (mgoddard) wrote :

Hi Zdenek. Changes need to be submitted via Gerrit: https://docs.openstack.org/infra/manual/developers.html.

I had a quick look at your file, and I think it will take a long time to execute, because of the loop/delegate_to.

How about considering a partial revert of this change: https://review.opendev.org/#/c/665660/. We could keep the delay handler (Wait for nova-compute services to update service versions), but remove the restart handler (Restart nova services to remove RPC version cap), and do the restart in reload.yml instead. Use restart_container rather than SIGHUP, though.
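
As a rough illustration of this suggestion, a restart task in reload.yml might look like the following. This is a sketch, not the merged patch. The `restart_container` action is the one named above. The `nova_services` dictionary, its `container_name`, `group` and `enabled` keys, and the `docker_common_options` variable are assumed to be available to the role, as in other kolla-ansible tasks.

```yaml
# Sketch only: restart every enabled nova service that reads nova.conf,
# on whichever host it runs (control or compute), without delegate_to.
- name: Restart nova services to remove RPC version cap
  become: true
  kolla_docker:
    action: "restart_container"
    common_options: "{{ docker_common_options }}"
    name: "{{ item.value.container_name }}"
  when:
    - inventory_hostname in groups[item.value.group]
    - item.value.enabled | bool
    - item.key in nova_services_require_nova_conf
  with_dict: "{{ nova_services }}"
```

Because each host filters the loop by its own group membership, the task runs in parallel across hosts rather than looping over hosts from a single node.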

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/693505

Zdenek Dvorak (zdenek-dvorak) wrote :

Hello Mark,
I checked the code, and I will prepare a "partial revert" variation of the patch.

Regards Zdenek

Zdenek Dvorak (zdenek-dvorak) wrote :

Hello Mark,
I modified the patch according to your comments. Is it OK with you? Should I make any additional changes?

Regards Zdenek

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/695049

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.opendev.org/693505
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=7ecf68885777713a74612fc46cb3f60675fe5dfd
Submitter: Zuul
Branch: stable/rocky

commit 7ecf68885777713a74612fc46cb3f60675fe5dfd
Author: zdenek_dvorak <email address hidden>
Date: Fri Nov 8 17:26:00 2019 +0900

    Patch to fix RPC selection problem after upgrade

    Stable (Stein & earlier) only - nova cells change fixed this in Train.

    Nova services communicate via RPC. RPC version selection is done after
    service start. Restart of all nova services present in list
     “nova_services_require_nova_conf”
     is needed for correct RPC version selection.

    Only nova services located on the same node as “nova-compute”
    were restarted till now. This fix restarts nova services on all nodes.

    Change-Id: Idaf857b4894deb024f839263503a97332e3dc9af
    Closes-Bug: #1847990

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/stein)

Reviewed: https://review.opendev.org/695049
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=8cf2f14bb423d1f14412a6d962daec3c3606472e
Submitter: Zuul
Branch: stable/stein

commit 8cf2f14bb423d1f14412a6d962daec3c3606472e
Author: zdenek_dvorak <email address hidden>
Date: Fri Nov 8 17:26:00 2019 +0900

    Patch to fix RPC selection problem after upgrade

    Stable (Stein & earlier) only - nova cells change fixed this in Train.

    Nova services communicate via RPC. RPC version selection is done after
    service start. Restart of all nova services present in list
     “nova_services_require_nova_conf”
     is needed for correct RPC version selection.

    Only nova services located on the same node as “nova-compute”
    were restarted till now. This fix restarts nova services on all nodes.

    Change-Id: Idaf857b4894deb024f839263503a97332e3dc9af
    Closes-Bug: #1847990

Mark Goddard (mgoddard)
Changed in kolla-ansible:
milestone: 9.0.0 → none
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.2.0

This issue was fixed in the openstack/kolla-ansible 7.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 8.1.0

This issue was fixed in the openstack/kolla-ansible 8.1.0 release.

Mark Goddard (mgoddard)
Changed in kolla-ansible:
status: Fix Committed → Fix Released