Activity log for bug #1904730

Date Who What changed Old value New value Message
2020-11-18 12:26:31 Michał Ajduk bug added bug
2020-11-18 12:26:53 Michał Ajduk bug added subscriber Canonical Field Critical
2020-11-18 12:28:47 Michał Ajduk summary neutron-agent-stiov fails to create port neutron-agent-sriov fails to create port
2020-11-19 15:39:56 James Troup description # Problem description Attempt to boot instance with SR-IOV interface fails. Instance stays in BUILD stage for ca 1 minute and then turns to ERROR state. Neutron agent log shows: 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Error in agent loop. Devices info: {}: TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last): 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent result = f(*args, **kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info(): 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent mac = self.get_pci_device(pci_slot) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent macs = self.pci_dev_wrapper.get_assigned_macs([vf_index]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent vfs = ip.link.get_vfs() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return privileged.get_link_vfs(self.name, self._parent.namespace) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return self.channel.remote_call(name, args, kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 
File "/usr/lib/python3/dist-packages/oslo_privsep/daemon.py", line 204, in remote_call 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent raise exc_type(*result[2]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 2020-11-18 10:55:00.885 53769 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Agent out of sync with plugin! 2020-11-18 10:55:02.244 53769 INFO neutron.agent.securitygroups_rpc [req-f4627e61-2abe-49bc-bc7b-9fa9f66e1f4b 980116faf095432e8ac7887db995aeb3 78ab17a067cf49cabba6c3c5d0faabcc - - -] Security group member updated ['808d2b62-75ba-45d6-969c-87ce90d56c37'] # Environment Openstack USSURI + OVN ovn-chassis version: cs:~openstack-charmers-next/ovn-chassis-40 neutron-sriov-agent version 2:16.2.0-0ubuntu1~cloud0 CIS hardened system. aa profile set to disable, AppArmor profiles teardown applied. neutron-sriov-agent reports UP in openstack network agent list. 
charm configuration: charm: ovn-chassis settings: bridge-interface-mappings: value: br-data:bond1 debug: value: false dpdk-bond-config: value: :balance-tcp:active:fast dpdk-bond-mappings: dpdk-driver: dpdk-socket-cores: value: 1 dpdk-socket-memory: value: 1024 enable-dpdk: value: false enable-hardware-offload: value: false enable-sriov: value: true new-units-paused: value: false openstack-metadata-workers: value: 2 ovn-bridge-mappings: value: dcfabric:br-data sriovfabric:br-data sriov-device-mappings: value: sriovfabric:ens3f0 sriovfabric:ens3f1 sriov-numvfs: value: ens3f0:64 ens3f0:64 Agent config: root@cmp4az1cz20300kvs:~# cat /etc/neutron/plugins/ml2/sriov_agent.ini ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. # Config managed by ovn-chassis charm ############################################################################### [securitygroup] firewall_driver = neutron.agent.firewall.NoopFirewallDriver [sriov_nic] physical_device_mappings = sriovfabric:ens3f0,sriovfabric:ens3f1 exclude_devices = root@cmp4az1cz20300kvs:~# cat /etc/neutron/neutron.conf ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. 
# Config managed by ovn-chassis charm ############################################################################### [DEFAULT] debug = False host = cmp4az1cz20300kvs.mgt.pst.stg.tlc.gamma.cloud core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack [oslo_messaging_notifications] driver = messagingv2 # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack topics = notifications [AGENT] root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.confroot # STEPS TO REPRODUCE - apply environment config as above - create networking and the instance Create example network: $ juju switch openstack $ source ~/deploy/novarc $ openstack network create \ --provider-physical-network sriovfabric \ --provider-segment 300 \ --provider-network-type vlan \ test-sriov $ openstack subnet create --network test-sriov \ --no-dhcp \ --gateway none \ --subnet-range 192.168.1.0/24 test-sriov Create ports over virtual function: $ juju switch openstack $ source ~/deploy/novarc $ openstack port create \ --network test-sriov \ --vnic-type direct \ sriov-vf1 $ openstack server create \ --image bionic-kvm \ --flavor m1.small \ --network ext-net-300 \ --port sriov-vf1 \ --key-name ubuntu-keypair \ --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.gamma.cloud \ sriov-vf1 - the instance stalls in build state (virsh 
list shows paused VM) and drops to ERROR # Problem description Attempt to boot instance with SR-IOV interface fails. Instance stays in BUILD stage for ca 1 minute and then turns to ERROR state. Neutron agent log shows: 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Error in agent loop. Devices info: {}: TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last): 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent result = f(*args, **kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info(): 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent mac = self.get_pci_device(pci_slot) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent macs = self.pci_dev_wrapper.get_assigned_macs([vf_index]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent vfs = ip.link.get_vfs() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return privileged.get_link_vfs(self.name, self._parent.namespace) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return self.channel.remote_call(name, args, kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 
File "/usr/lib/python3/dist-packages/oslo_privsep/daemon.py", line 204, in remote_call 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent raise exc_type(*result[2]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 2020-11-18 10:55:00.885 53769 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Agent out of sync with plugin! 2020-11-18 10:55:02.244 53769 INFO neutron.agent.securitygroups_rpc [req-f4627e61-2abe-49bc-bc7b-9fa9f66e1f4b 980116faf095432e8ac7887db995aeb3 78ab17a067cf49cabba6c3c5d0faabcc - - -] Security group member updated ['808d2b62-75ba-45d6-969c-87ce90d56c37'] # Environment Openstack USSURI + OVN ovn-chassis version: cs:~openstack-charmers-next/ovn-chassis-40 neutron-sriov-agent version 2:16.2.0-0ubuntu1~cloud0 CIS hardened system. aa profile set to disable, AppArmor profiles teardown applied. neutron-sriov-agent reports UP in openstack network agent list. 
charm configuration: charm: ovn-chassis settings:   bridge-interface-mappings:     value: br-data:bond1   debug:     value: false   dpdk-bond-config:     value: :balance-tcp:active:fast   dpdk-bond-mappings:   dpdk-driver:   dpdk-socket-cores:     value: 1   dpdk-socket-memory:     value: 1024   enable-dpdk:     value: false   enable-hardware-offload:     value: false   enable-sriov:     value: true   new-units-paused:     value: false   openstack-metadata-workers:     value: 2   ovn-bridge-mappings:     value: dcfabric:br-data sriovfabric:br-data   sriov-device-mappings:     value: sriovfabric:ens3f0 sriovfabric:ens3f1   sriov-numvfs:     value: ens3f0:64 ens3f0:64 Agent config: root@cmp4az1cz20300kvs:~# cat /etc/neutron/plugins/ml2/sriov_agent.ini ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. # Config managed by ovn-chassis charm ############################################################################### [securitygroup] firewall_driver = neutron.agent.firewall.NoopFirewallDriver [sriov_nic] physical_device_mappings = sriovfabric:ens3f0,sriovfabric:ens3f1 exclude_devices = root@cmp4az1cz20300kvs:~# cat /etc/neutron/neutron.conf ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. 
# Config managed by ovn-chassis charm ############################################################################### [DEFAULT] debug = False host = cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack [oslo_messaging_notifications] driver = messagingv2 # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack topics = notifications [AGENT] root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.confroot # STEPS TO REPRODUCE - apply environment config as above - create networking and the instance Create example network: $ juju switch openstack $ source ~/deploy/novarc $ openstack network create \ --provider-physical-network sriovfabric \ --provider-segment 300 \ --provider-network-type vlan \ test-sriov $ openstack subnet create --network test-sriov \   --no-dhcp \   --gateway none \   --subnet-range 192.168.1.0/24 test-sriov Create ports over virtual function: $ juju switch openstack $ source ~/deploy/novarc $ openstack port create \ --network test-sriov \ --vnic-type direct \ sriov-vf1 $ openstack server create \ --image bionic-kvm \ --flavor m1.small \ --network ext-net-300 \ --port sriov-vf1 \ --key-name ubuntu-keypair \ --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \ sriov-vf1 - the instance stalls in build state 
(virsh list shows paused VM) and drops to ERROR
2020-11-19 15:58:23 Michael Skalka information type Public Private
2020-11-19 16:01:12 Michael Skalka removed subscriber Canonical Field Critical
2020-11-19 16:11:08 Michael Skalka bug added subscriber Billy Olsen
2020-11-19 16:12:05 Michael Skalka bug added subscriber Frode Nordahl
2020-11-19 16:12:23 Billy Olsen bug added subscriber Andrew McLeod
2020-11-19 16:13:36 Billy Olsen bug added subscriber OpenStack Charmers
2020-11-19 16:17:46 Michael Skalka bug added subscriber Canonical Field Critical
2020-11-19 23:32:43 Billy Olsen charm-ovn-chassis: assignee Andrew McLeod (admcleod)
2020-11-23 11:23:05 Andrew McLeod charm-ovn-chassis: status New Incomplete
2020-11-24 10:39:41 Michał Ajduk attachment added neutron-sriov-agent.log.tgz https://bugs.launchpad.net/charm-ovn-chassis/+bug/1904730/+attachment/5437483/+files/neutron-sriov-agent.log.tgz
2020-11-24 10:44:24 Michał Ajduk attachment added syslog.tgz https://bugs.launchpad.net/charm-ovn-chassis/+bug/1904730/+attachment/5437484/+files/syslog.tgz
2020-11-24 11:59:00 Michał Ajduk bug added subscriber Nikolay Vinogradov
2020-11-24 12:27:25 James Page bug task added pyroute2 (Ubuntu)
2020-11-24 12:27:32 James Page charm-ovn-chassis: status Incomplete Invalid
2020-11-24 12:39:01 James Page bug watch added https://github.com/svinota/pyroute2/issues/751
2020-12-09 22:41:54 Billy Olsen pyroute2 (Ubuntu): assignee Billy Olsen (billy-olsen)
2020-12-09 22:42:11 Billy Olsen bug added subscriber Canonical Field High
2020-12-09 22:42:15 Billy Olsen removed subscriber Canonical Field Critical
2020-12-14 04:30:56 Barry Price removed subscriber Andrew McLeod
2021-01-14 23:26:51 Billy Olsen attachment added bionic patch https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+attachment/5453107/+files/lp1904730-bionic.debdiff
2021-01-14 23:27:21 Billy Olsen attachment added stein patch https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+attachment/5453108/+files/lp1904730-stein.debdiff
2021-01-14 23:27:51 Billy Olsen attachment added train patch https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+attachment/5453109/+files/lp1904730-train.debdiff
2021-01-14 23:28:23 Billy Olsen attachment added focal patch https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+attachment/5453110/+files/lp1904730-focal.debdiff
2021-01-14 23:28:52 Billy Olsen attachment added groovy patch https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+attachment/5453111/+files/lp1904730-groovy.debdiff
2021-01-14 23:29:15 Billy Olsen attachment added hirsute patch https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+attachment/5453112/+files/lp1904730-hirsute.debdiff
2021-01-14 23:29:38 Billy Olsen bug task deleted charm-ovn-chassis
2021-01-14 23:29:46 Billy Olsen pyroute2 (Ubuntu): status New In Progress
2021-01-14 23:45:03 Billy Olsen description # Problem description Attempt to boot instance with SR-IOV interface fails. Instance stays in BUILD stage for ca 1 minute and then turns to ERROR state. Neutron agent log shows: 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Error in agent loop. Devices info: {}: TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last): 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent result = f(*args, **kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info(): 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent mac = self.get_pci_device(pci_slot) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent macs = self.pci_dev_wrapper.get_assigned_macs([vf_index]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent vfs = ip.link.get_vfs() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return privileged.get_link_vfs(self.name, self._parent.namespace) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return self.channel.remote_call(name, args, kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 
File "/usr/lib/python3/dist-packages/oslo_privsep/daemon.py", line 204, in remote_call 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent raise exc_type(*result[2]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 2020-11-18 10:55:00.885 53769 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Agent out of sync with plugin! 2020-11-18 10:55:02.244 53769 INFO neutron.agent.securitygroups_rpc [req-f4627e61-2abe-49bc-bc7b-9fa9f66e1f4b 980116faf095432e8ac7887db995aeb3 78ab17a067cf49cabba6c3c5d0faabcc - - -] Security group member updated ['808d2b62-75ba-45d6-969c-87ce90d56c37'] # Environment Openstack USSURI + OVN ovn-chassis version: cs:~openstack-charmers-next/ovn-chassis-40 neutron-sriov-agent version 2:16.2.0-0ubuntu1~cloud0 CIS hardened system. aa profile set to disable, AppArmor profiles teardown applied. neutron-sriov-agent reports UP in openstack network agent list. 
charm configuration: charm: ovn-chassis settings:   bridge-interface-mappings:     value: br-data:bond1   debug:     value: false   dpdk-bond-config:     value: :balance-tcp:active:fast   dpdk-bond-mappings:   dpdk-driver:   dpdk-socket-cores:     value: 1   dpdk-socket-memory:     value: 1024   enable-dpdk:     value: false   enable-hardware-offload:     value: false   enable-sriov:     value: true   new-units-paused:     value: false   openstack-metadata-workers:     value: 2   ovn-bridge-mappings:     value: dcfabric:br-data sriovfabric:br-data   sriov-device-mappings:     value: sriovfabric:ens3f0 sriovfabric:ens3f1   sriov-numvfs:     value: ens3f0:64 ens3f0:64 Agent config: root@cmp4az1cz20300kvs:~# cat /etc/neutron/plugins/ml2/sriov_agent.ini ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. # Config managed by ovn-chassis charm ############################################################################### [securitygroup] firewall_driver = neutron.agent.firewall.NoopFirewallDriver [sriov_nic] physical_device_mappings = sriovfabric:ens3f0,sriovfabric:ens3f1 exclude_devices = root@cmp4az1cz20300kvs:~# cat /etc/neutron/neutron.conf ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. 
# Config managed by ovn-chassis charm ############################################################################### [DEFAULT] debug = False host = cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack [oslo_messaging_notifications] driver = messagingv2 # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack topics = notifications [AGENT] root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.confroot # STEPS TO REPRODUCE - apply environment config as above - create networking and the instance Create example network: $ juju switch openstack $ source ~/deploy/novarc $ openstack network create \ --provider-physical-network sriovfabric \ --provider-segment 300 \ --provider-network-type vlan \ test-sriov $ openstack subnet create --network test-sriov \   --no-dhcp \   --gateway none \   --subnet-range 192.168.1.0/24 test-sriov Create ports over virtual function: $ juju switch openstack $ source ~/deploy/novarc $ openstack port create \ --network test-sriov \ --vnic-type direct \ sriov-vf1 $ openstack server create \ --image bionic-kvm \ --flavor m1.small \ --network ext-net-300 \ --port sriov-vf1 \ --key-name ubuntu-keypair \ --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \ sriov-vf1 - the instance stalls in build state 
(virsh list shows paused VM) and drops to ERROR

[Impact]
Netlink calls to the kernel can return more than 16k bytes (they can return 32k on newer kernels). The pyroute2 library has a default buffer size of 16k and fails to read the data when kernel response data overflows this. One example of where users encounter this is booting OpenStack instances with SRIOV when there are more than 32 VFs, as seen in the original problem description (included below).

[Test Case]
Use an SRIOV capable card and enable more than 32 VFs on a modern kernel. Attempt to launch an instance using OpenStack as follows:

1. Create example network:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack network create \
  --provider-physical-network sriovfabric \
  --provider-segment 300 \
  --provider-network-type vlan \
  test-sriov
$ openstack subnet create --network test-sriov \
  --no-dhcp \
  --gateway none \
  --subnet-range 192.168.1.0/24 test-sriov

2. Create ports over virtual function:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack port create \
  --network test-sriov \
  --vnic-type direct \
  sriov-vf1
$ openstack server create \
  --image bionic-kvm \
  --flavor m1.small \
  --network ext-net-300 \
  --port sriov-vf1 \
  --key-name ubuntu-keypair \
  --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
  sriov-vf1

3. The instance stalls in build state (virsh list shows paused VM) and drops to ERROR

[Where problems could occur]
Problems may occur in existing customers already using openstack to schedule SRIOV instances and may show up as failure to build instances. Additional problems could include the increased memory usage of the nova processes which occurs by increasing the default buffer size. For tightly spec'd systems with small memory allocated to the host, this could further eat into any margin available and push memory usage over the edge.

[Previous Description]
# Problem Description
Attempt to boot instance with SR-IOV interface fails.
Instance stays in BUILD stage for ca 1 minute and then turns to ERROR state. Neutron agent log shows: 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Error in agent loop. Devices info: {}: TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last): 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent result = f(*args, **kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info(): 2020-11-18 
10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent mac = self.get_pci_device(pci_slot) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent macs = self.pci_dev_wrapper.get_assigned_macs([vf_index]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent vfs = ip.link.get_vfs() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return privileged.get_link_vfs(self.name, self._parent.namespace) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return self.channel.remote_call(name, args, kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/oslo_privsep/daemon.py", line 204, in remote_call 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent raise exc_type(*result[2]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 2020-11-18 10:55:00.885 53769 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Agent out of sync with plugin! 2020-11-18 10:55:02.244 53769 INFO neutron.agent.securitygroups_rpc [req-f4627e61-2abe-49bc-bc7b-9fa9f66e1f4b 980116faf095432e8ac7887db995aeb3 78ab17a067cf49cabba6c3c5d0faabcc - - -] Security group member updated ['808d2b62-75ba-45d6-969c-87ce90d56c37'] # Environment Openstack USSURI + OVN ovn-chassis version: cs:~openstack-charmers-next/ovn-chassis-40 neutron-sriov-agent version 2:16.2.0-0ubuntu1~cloud0 CIS hardened system. aa profile set to disable, AppArmor profiles teardown applied. neutron-sriov-agent reports UP in openstack network agent list. 
charm configuration: charm: ovn-chassis settings:   bridge-interface-mappings:     value: br-data:bond1   debug:     value: false   dpdk-bond-config:     value: :balance-tcp:active:fast   dpdk-bond-mappings:   dpdk-driver:   dpdk-socket-cores:     value: 1   dpdk-socket-memory:     value: 1024   enable-dpdk:     value: false   enable-hardware-offload:     value: false   enable-sriov:     value: true   new-units-paused:     value: false   openstack-metadata-workers:     value: 2   ovn-bridge-mappings:     value: dcfabric:br-data sriovfabric:br-data   sriov-device-mappings:     value: sriovfabric:ens3f0 sriovfabric:ens3f1   sriov-numvfs:     value: ens3f0:64 ens3f0:64 Agent config: root@cmp4az1cz20300kvs:~# cat /etc/neutron/plugins/ml2/sriov_agent.ini ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. # Config managed by ovn-chassis charm ############################################################################### [securitygroup] firewall_driver = neutron.agent.firewall.NoopFirewallDriver [sriov_nic] physical_device_mappings = sriovfabric:ens3f0,sriovfabric:ens3f1 exclude_devices = root@cmp4az1cz20300kvs:~# cat /etc/neutron/neutron.conf ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. 
# Config managed by ovn-chassis charm ############################################################################### [DEFAULT] debug = False host = cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack [oslo_messaging_notifications] driver = messagingv2 # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack topics = notifications [AGENT] root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.confroot # STEPS TO REPRODUCE - apply environment config as above - create networking and the instance Create example network: $ juju switch openstack $ source ~/deploy/novarc $ openstack network create \ --provider-physical-network sriovfabric \ --provider-segment 300 \ --provider-network-type vlan \ test-sriov $ openstack subnet create --network test-sriov \   --no-dhcp \   --gateway none \   --subnet-range 192.168.1.0/24 test-sriov Create ports over virtual function: $ juju switch openstack $ source ~/deploy/novarc $ openstack port create \ --network test-sriov \ --vnic-type direct \ sriov-vf1 $ openstack server create \ --image bionic-kvm \ --flavor m1.small \ --network ext-net-300 \ --port sriov-vf1 \ --key-name ubuntu-keypair \ --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \ sriov-vf1 - the instance stalls in build state 
(virsh list shows paused VM) and drops to ERROR
2021-01-14 23:45:25 Billy Olsen pyroute2 (Ubuntu): importance Undecided High
2021-07-22 07:36:42 Dmitrii Shcherbakov bug added subscriber Alexander Litvinov
2021-07-22 07:36:49 Dmitrii Shcherbakov bug added subscriber Vladimir Grevtsev
2021-07-22 07:39:40 Vladimir Grevtsev bug added subscriber Nobuto Murata
2021-07-22 07:55:02 Vladimir Grevtsev bug added subscriber Pedro Guimarães
2021-07-22 20:45:46 Corey Bryant nominated for series Ubuntu Hirsute
2021-07-22 20:45:46 Corey Bryant bug task added pyroute2 (Ubuntu Hirsute)
2021-07-22 20:45:46 Corey Bryant nominated for series Ubuntu Impish
2021-07-22 20:45:46 Corey Bryant bug task added pyroute2 (Ubuntu Impish)
2021-07-22 20:45:46 Corey Bryant nominated for series Ubuntu Focal
2021-07-22 20:45:46 Corey Bryant bug task added pyroute2 (Ubuntu Focal)
2021-07-22 20:45:53 Corey Bryant pyroute2 (Ubuntu Focal): status New Triaged
2021-07-22 20:45:56 Corey Bryant pyroute2 (Ubuntu Hirsute): status New Triaged
2021-07-22 20:46:00 Corey Bryant pyroute2 (Ubuntu Focal): importance Undecided High
2021-07-22 20:46:06 Corey Bryant pyroute2 (Ubuntu Hirsute): importance Undecided High
2021-07-22 21:21:52 Corey Bryant bug added subscriber Ubuntu Stable Release Updates Team
2021-07-23 01:20:39 Launchpad Janitor pyroute2 (Ubuntu Impish): status In Progress Fix Released
2021-07-29 05:14:04 Robie Basak bug added subscriber Robie Basak
2021-07-30 18:32:20 Corey Bryant attachment removed syslog.tgz https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+attachment/5437484/+files/syslog.tgz
2021-07-30 18:32:30 Corey Bryant attachment removed neutron-sriov-agent.log.tgz https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+attachment/5437483/+files/neutron-sriov-agent.log.tgz
2021-07-30 18:37:52 Corey Bryant description
[Impact]
Netlink calls to the kernel can return more than 16k bytes (they can return 32k on newer kernels). The pyroute2 library has a default buffer size of 16k and fails to read the data when kernel response data overflows this. One example of where users encounter this is booting OpenStack instances with SRIOV when there are more than 32 VFs, as seen in the original problem description (included below).

[Test Case]
Use an SRIOV capable card and enable more than 32 VFs on a modern kernel. Attempt to launch an instance using OpenStack as follows:

1. Create example network:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack network create \
  --provider-physical-network sriovfabric \
  --provider-segment 300 \
  --provider-network-type vlan \
  test-sriov
$ openstack subnet create --network test-sriov \
  --no-dhcp \
  --gateway none \
  --subnet-range 192.168.1.0/24 test-sriov

2. Create ports over virtual function:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack port create \
  --network test-sriov \
  --vnic-type direct \
  sriov-vf1
$ openstack server create \
  --image bionic-kvm \
  --flavor m1.small \
  --network ext-net-300 \
  --port sriov-vf1 \
  --key-name ubuntu-keypair \
  --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
  sriov-vf1

3. The instance stalls in build state (virsh list shows paused VM) and drops to ERROR

[Where problems could occur]
Problems may occur in existing customers already using openstack to schedule SRIOV instances and may show up as failure to build instances. Additional problems could include the increased memory usage of the nova processes which occurs by increasing the default buffer size. For tightly spec'd systems with small memory allocated to the host, this could further eat into any margin available and push memory usage over the edge.

[Previous Description]
# Problem Description
Attempt to boot instance with SR-IOV interface fails.
Instance stays in BUILD stage for ca 1 minute and then turns to ERROR state. Neutron agent log shows: 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Error in agent loop. Devices info: {}: TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last): 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent result = f(*args, **kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info(): 2020-11-18 
10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent mac = self.get_pci_device(pci_slot) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent macs = self.pci_dev_wrapper.get_assigned_macs([vf_index]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent vfs = ip.link.get_vfs() 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return privileged.get_link_vfs(self.name, self._parent.namespace) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent return self.channel.remote_call(name, args, kwargs) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/usr/lib/python3/dist-packages/oslo_privsep/daemon.py", line 204, in remote_call 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent raise exc_type(*result[2]) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',) 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 2020-11-18 10:55:00.885 53769 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Agent out of sync with plugin! 2020-11-18 10:55:02.244 53769 INFO neutron.agent.securitygroups_rpc [req-f4627e61-2abe-49bc-bc7b-9fa9f66e1f4b 980116faf095432e8ac7887db995aeb3 78ab17a067cf49cabba6c3c5d0faabcc - - -] Security group member updated ['808d2b62-75ba-45d6-969c-87ce90d56c37'] # Environment Openstack USSURI + OVN ovn-chassis version: cs:~openstack-charmers-next/ovn-chassis-40 neutron-sriov-agent version 2:16.2.0-0ubuntu1~cloud0 CIS hardened system. aa profile set to disable, AppArmor profiles teardown applied. neutron-sriov-agent reports UP in openstack network agent list. 
charm configuration: charm: ovn-chassis settings:   bridge-interface-mappings:     value: br-data:bond1   debug:     value: false   dpdk-bond-config:     value: :balance-tcp:active:fast   dpdk-bond-mappings:   dpdk-driver:   dpdk-socket-cores:     value: 1   dpdk-socket-memory:     value: 1024   enable-dpdk:     value: false   enable-hardware-offload:     value: false   enable-sriov:     value: true   new-units-paused:     value: false   openstack-metadata-workers:     value: 2   ovn-bridge-mappings:     value: dcfabric:br-data sriovfabric:br-data   sriov-device-mappings:     value: sriovfabric:ens3f0 sriovfabric:ens3f1   sriov-numvfs:     value: ens3f0:64 ens3f0:64 Agent config: root@cmp4az1cz20300kvs:~# cat /etc/neutron/plugins/ml2/sriov_agent.ini ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. # Config managed by ovn-chassis charm ############################################################################### [securitygroup] firewall_driver = neutron.agent.firewall.NoopFirewallDriver [sriov_nic] physical_device_mappings = sriovfabric:ens3f0,sriovfabric:ens3f1 exclude_devices = root@cmp4az1cz20300kvs:~# cat /etc/neutron/neutron.conf ############################################################################### # [ WARNING ] # Configuration file maintained by Juju. Local changes may be overwritten. 
# Config managed by ovn-chassis charm ############################################################################### [DEFAULT] debug = False host = cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack [oslo_messaging_notifications] driver = messagingv2 # This template must be included under the [DEFAULT] section transport_url = rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack topics = notifications [AGENT] root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.confroot # STEPS TO REPRODUCE - apply environment config as above - create networking and the instance Create example network: $ juju switch openstack $ source ~/deploy/novarc $ openstack network create \ --provider-physical-network sriovfabric \ --provider-segment 300 \ --provider-network-type vlan \ test-sriov $ openstack subnet create --network test-sriov \   --no-dhcp \   --gateway none \   --subnet-range 192.168.1.0/24 test-sriov Create ports over virtual function: $ juju switch openstack $ source ~/deploy/novarc $ openstack port create \ --network test-sriov \ --vnic-type direct \ sriov-vf1 $ openstack server create \ --image bionic-kvm \ --flavor m1.small \ --network ext-net-300 \ --port sriov-vf1 \ --key-name ubuntu-keypair \ --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \ sriov-vf1 - the instance stalls in build state 
(virsh list shows paused VM) and drops to ERROR
[Impact]
Netlink calls to the kernel can return more than 16k bytes (they can return 32k on newer kernels). The pyroute2 library has a default buffer size of 16k and fails to read the data when kernel response data overflows this. One example of where users encounter this is booting OpenStack instances with SRIOV when there are more than 32 VFs, as seen in the original problem description (included below).

[Test Case]
Use an SRIOV capable card and enable more than 32 VFs on a modern kernel. Attempt to launch an instance using OpenStack as follows:

1. Create example network:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack network create \
  --provider-physical-network sriovfabric \
  --provider-segment 300 \
  --provider-network-type vlan \
  test-sriov
$ openstack subnet create --network test-sriov \
  --no-dhcp \
  --gateway none \
  --subnet-range 192.168.1.0/24 test-sriov

2. Create ports over virtual function:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack port create \
  --network test-sriov \
  --vnic-type direct \
  sriov-vf1
$ openstack server create \
  --image bionic-kvm \
  --flavor m1.small \
  --network ext-net-300 \
  --port sriov-vf1 \
  --key-name ubuntu-keypair \
  --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
  sriov-vf1

3. The instance stalls in build state (virsh list shows paused VM) and drops to ERROR

[Where problems could occur]
Problems may occur in existing customers already using openstack to schedule SRIOV instances and may show up as failure to build instances. Additional problems could include the increased memory usage of the nova processes which occurs by increasing the default buffer size. For tightly spec'd systems with small memory allocated to the host, this could further eat into any margin available and push memory usage over the edge.
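[Illustration]
The failure mode described under [Impact] can be sketched without netlink or pyroute2. The snippet below is only an illustration under stated assumptions: the record layout (a 4-byte length header followed by padding), the record size of 127 bytes, and the helper names build_response, parse_records and receive are all made up for this example and are not pyroute2 or kernel APIs. It sends one datagram of roughly 32k over a local socket pair and reads it back with a fixed-size buffer, which is enough to show how a 16k read silently drops the tail and leaves the parser with a partial header.

```python
import socket
import struct

# Hypothetical record layout: 4-byte length header + padding, 127 bytes total.
# 127 is chosen so that a 16384-byte cut lands inside a header (16384 % 127 == 1).
RECORD_SIZE = 127
HEADER = struct.Struct("=I")


def build_response(n_records):
    """Concatenate n length-prefixed records into a single datagram payload."""
    record = HEADER.pack(RECORD_SIZE) + b"\x00" * (RECORD_SIZE - HEADER.size)
    return record * n_records


def parse_records(data):
    """Walk the buffer header by header and return the number of records seen."""
    offset = 0
    count = 0
    while offset < len(data):
        # Raises struct.error if fewer than 4 bytes remain at this offset.
        (length,) = HEADER.unpack_from(data, offset)
        offset += length
        count += 1
    return count


def receive(bufsize, n_records=258):
    """Send one datagram (258 * 127 = 32766 bytes by default) and read it once."""
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        tx.send(build_response(n_records))
        # For datagram sockets, bytes beyond bufsize are discarded.
        return rx.recv(bufsize)
    finally:
        tx.close()
        rx.close()
```

With these assumptions, parse_records(receive(16 * 1024)) raises struct.error because the read stops one byte into a header, while parse_records(receive(32 * 1024)) walks all 258 records. The trade-off noted under [Where problems could occur] still applies: a larger read buffer costs more memory per read.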
2021-07-30 18:39:04 Corey Bryant information type Private Public
2021-07-30 21:48:49 Steve Langasek pyroute2 (Ubuntu Hirsute): status Triaged Fix Committed
2021-07-30 21:48:52 Steve Langasek bug added subscriber SRU Verification
2021-07-30 21:49:00 Steve Langasek tags verification-needed verification-needed-hirsute
2021-07-30 21:53:19 Steve Langasek pyroute2 (Ubuntu Focal): status Triaged Fix Committed
2021-07-30 21:53:30 Steve Langasek tags verification-needed verification-needed-hirsute verification-needed verification-needed-focal verification-needed-hirsute
2021-09-13 10:09:29 James Page nominated for series Ubuntu Bionic
2021-09-13 10:09:29 James Page bug task added pyroute2 (Ubuntu Bionic)
2021-09-13 10:09:35 James Page pyroute2 (Ubuntu Bionic): importance Undecided High
2021-09-13 10:09:38 James Page pyroute2 (Ubuntu Bionic): status New Triaged
2021-09-13 10:13:06 James Page bug task added cloud-archive
2021-09-13 10:13:37 James Page nominated for series cloud-archive/xena
2021-09-13 10:13:37 James Page bug task added cloud-archive/xena
2021-09-13 10:13:37 James Page nominated for series cloud-archive/ussuri
2021-09-13 10:13:37 James Page bug task added cloud-archive/ussuri
2021-09-13 10:13:37 James Page nominated for series cloud-archive/wallaby
2021-09-13 10:13:37 James Page bug task added cloud-archive/wallaby
2021-09-13 10:13:37 James Page nominated for series cloud-archive/train
2021-09-13 10:13:37 James Page bug task added cloud-archive/train
2021-09-13 10:13:37 James Page nominated for series cloud-archive/queens
2021-09-13 10:13:37 James Page bug task added cloud-archive/queens
2021-09-13 10:13:37 James Page nominated for series cloud-archive/stein
2021-09-13 10:13:37 James Page bug task added cloud-archive/stein
2021-09-13 10:14:14 James Page cloud-archive/xena: status New Fix Released
2021-09-13 10:14:26 James Page cloud-archive/wallaby: status New Fix Committed
2021-09-13 10:14:43 James Page cloud-archive/ussuri: status New Fix Committed
2021-09-13 10:31:07 James Page tags verification-needed verification-needed-focal verification-needed-hirsute verification-done-focal verification-needed verification-needed-hirsute
2021-09-13 10:54:20 James Page tags verification-done-focal verification-needed verification-needed-hirsute verification-done verification-done-focal verification-done-hirsute
2021-09-13 11:01:38 James Page cloud-archive/train: status New Triaged
2021-09-13 11:01:54 James Page cloud-archive/stein: status New Triaged
2021-09-13 11:02:03 James Page cloud-archive/queens: status New Triaged
2021-09-14 16:29:03 Launchpad Janitor pyroute2 (Ubuntu Hirsute): status Fix Committed Fix Released
2021-09-14 16:29:10 Brian Murray removed subscriber Ubuntu Stable Release Updates Team
2021-09-14 16:29:42 Launchpad Janitor pyroute2 (Ubuntu Focal): status Fix Committed Fix Released
2021-09-22 11:23:23 Robie Basak pyroute2 (Ubuntu Bionic): status Triaged Fix Committed
2021-09-22 11:23:26 Robie Basak bug added subscriber Ubuntu Stable Release Updates Team
2021-09-22 11:23:34 Robie Basak tags verification-done verification-done-focal verification-done-hirsute verification-done-focal verification-done-hirsute verification-needed verification-needed-bionic
2021-09-23 23:51:24 Billy Olsen tags verification-done-focal verification-done-hirsute verification-needed verification-needed-bionic verification-done verification-done-bionic verification-done-focal verification-done-hirsute verification-needed-bionic
2021-09-23 23:51:40 Billy Olsen tags verification-done verification-done-bionic verification-done-focal verification-done-hirsute verification-needed-bionic verification-done verification-done-bionic verification-done-focal verification-done-hirsute
2021-10-05 16:25:17 Launchpad Janitor pyroute2 (Ubuntu Bionic): status Fix Committed Fix Released
2021-10-25 22:12:10 Billy Olsen tags verification-done verification-done-bionic verification-done-focal verification-done-hirsute verification-done verification-done-bionic verification-done-focal verification-done-focal-wallaby verification-done-hirsute
2021-10-25 22:28:12 Billy Olsen tags verification-done verification-done-bionic verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute
2021-10-26 11:52:25 Chris MacNaughton nominated for series cloud-archive/victoria
2021-10-26 11:52:25 Chris MacNaughton bug task added cloud-archive/victoria
2021-10-26 11:52:43 Chris MacNaughton cloud-archive/victoria: status New Fix Committed
2021-10-26 11:52:58 Chris MacNaughton tags verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-needed-victoria
2021-11-03 13:01:03 Corey Bryant cloud-archive/wallaby: status Fix Committed Fix Released
2021-11-15 18:57:01 Billy Olsen tags verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-needed-victoria verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute
2021-12-06 19:25:42 Corey Bryant cloud-archive/victoria: status Fix Committed Fix Released
2021-12-06 19:49:36 Corey Bryant cloud-archive/ussuri: status Fix Committed Fix Released
2021-12-06 20:04:25 Corey Bryant cloud-archive/queens: status Triaged Fix Committed
2021-12-06 20:04:28 Corey Bryant tags verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-queens-needed
2022-02-15 16:07:30 James Page cloud-archive/stein: status Triaged Fix Committed
2022-02-15 16:07:32 James Page tags verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-queens-needed verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-queens-needed verification-stein-needed
2022-02-15 16:08:03 James Page cloud-archive/train: status Triaged Fix Committed
2022-02-15 16:08:04 James Page tags verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-queens-needed verification-stein-needed verification-done verification-done-bionic verification-done-bionic-ussuri verification-done-focal verification-done-focal-wallaby verification-done-hirsute verification-queens-needed verification-stein-needed verification-train-needed
2022-06-06 12:53:43 Corey Bryant cloud-archive/queens: status Fix Committed Fix Released