after running ``os-nova-install.yml`` lost instances connectivity

Bug #2106793 reported by Nilesh
This bug affects 1 person
Affects: neutron
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

++++++++++++++++++++++++
Problematic Environment:
++++++++++++++++++++++++

~~~
OS: Ubuntu 22.04
Kernel: 5.15.0-134-generic
OpenStack Release: 2023.1 aka Antelope
~~~

++++++
Issue:
++++++

I wanted to apply some nova overrides, so I added the following to ``user-variable``:

~~~
nova_nova_conf_overrides:
      libvirt:
         cpu_mode: custom
         cpu_models: EPYC-Milan
         cpu_model_extra_flags: "-erms, -fsrm"
~~~

and ran ``os-nova-install.yml``, which applied to all nodes without any issue.

* This playbook also restarts the compute services across the cluster. After that, all of the hundreds of running instances were unreachable; ping did not work.

+++++++++++
Workaround:
+++++++++++
* To unblock the users, I immediately restarted openvswitch on all computes, which brought the instances back; reachability was restored and users were able to access the VMs.

Looking for a resolution so that instances keep their connectivity even if we restart nova-compute on any node.

Thanks,
cNilesh.

Tags: neutron nova
Sahid Orentino (sahid-ferdjaoui) wrote :

Hello Nilesh,

Thank you for this report.

So basically you applied a change to nova-compute, restarted the nova-compute service, and lost connectivity to the VMs, right? You did not make any change to neutron or restart any part of neutron?

Also, are you using ML2/OVS or ML2/OVN?

Changed in neutron:
status: New → Incomplete
Nilesh (cnilesh) wrote :

Hello Sahid,

Yes, that's true. This is ML2/OVS.

Thanks.

Brian Haley (brian-haley) wrote :

Hi,

Is this an OpenStack-Ansible deployment? I'm just wondering if the deployment tool here is the issue and not necessarily neutron. Can you loop in that other team, either here or on the mailing list, since it's an issue they might have seen before?

Thanks,

-Brian

Nilesh (cnilesh) wrote :

Dear Brian,

I talked to the OSA deployment community; they say it is not related to the deployment.

Even if I restart the nova-compute service on any one compute node, connectivity is lost on that node only, and I have to manually restart the neutron-ovs service to get the connection back.

Thanks,
cNilesh.

summary: - after running ``os-nova-install.yml`` lost instances connectivity
+ after running ``os-nova-install.yml`` lost instances connectivity
Brian Haley (brian-haley) wrote :

Hi,

I'm still not sure this is a neutron issue. I don't remember seeing this issue in other deployment types. In this case it almost looks like a restart of the nova-compute service also requires a restart of the neutron-ovs service. Do you have any information on what exactly the nova-compute restart is doing? Or any logs from the services to show what might be happening?

Thanks,
-Brian

Nilesh (cnilesh) wrote :

Hi Brian,

I can collect the logs; it looks like this may be a race condition.

Case: 1

If I restart the nova-compute service on any compute node, the instances on that compute node lose connectivity. To mitigate this, I have to restart the openvswitch service.

Case: 2

If I add any nova-specific config via a template, the nova service gets restarted. When that happens, instances again lose connectivity everywhere, and I have to restart openvswitch.

Let me know which logs you want.

NOTE: I am not seeing any traces when instances lose connectivity.
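
Since nothing shows up as a trace, one way to gather evidence for the race-condition theory would be to compare the OpenFlow rules on the integration bridge before and after a nova-compute restart. The following is only a minimal sketch, not something taken from this deployment: it assumes the text output of ``ovs-ofctl dump-flows br-int`` was saved once before and once after the restart, that the bridge is named ``br-int`` (the usual ML2/OVS name), and the helper names ``normalize_flows`` and ``diff_flows`` are hypothetical.

```python
import re

# Counters and timers change on every dump, so strip them before comparing.
_VOLATILE = re.compile(
    r"\b(duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)


def normalize_flows(dump_text):
    """Return the set of flow entries in an `ovs-ofctl dump-flows` text dump.

    Assumption: every flow line starts with "cookie=" once leading
    whitespace is removed; anything else (such as the reply header)
    is skipped.
    """
    flows = set()
    for line in dump_text.splitlines():
        line = line.strip()
        if not line.startswith("cookie="):
            continue
        flows.add(_VOLATILE.sub("", line).strip())
    return flows


def diff_flows(before_text, after_text):
    """Return (missing, added): flows that vanished and flows that appeared."""
    before = normalize_flows(before_text)
    after = normalize_flows(after_text)
    return sorted(before - after), sorted(after - before)


if __name__ == "__main__":
    # Hypothetical file names: dumps saved before and after the restart.
    with open("flows-before.txt") as f_before, open("flows-after.txt") as f_after:
        missing, added = diff_flows(f_before.read(), f_after.read())
    print("flows missing after restart:")
    for flow in missing:
        print("  " + flow)
    print("flows added after restart:")
    for flow in added:
        print("  " + flow)
```

If flows are reported as missing after the restart, that list, together with the neutron-openvswitch-agent and ovs-vswitchd logs from the same time window, would probably be the most useful thing to attach here.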

Thanks,
cNilesh.

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired