OpenStack Compute (nova)

tap TX packet drops during high cpu load

Bug #1792763 reported by Satish Patel on 2018-09-16

This bug affects 1 person

Affects                    Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)   Invalid  Undecided   Unassigned

Bug Description

We are running OpenStack with the qemu-kvm hypervisor and have noticed 50% packet loss on an instance's tap interface during peak load.

Searching online suggested that increasing the tap interface's txqueuelen would solve this, but even after raising it to 10000 I am still seeing the same issue.

The compute node has 32 cores, I did not reserve any CPU cores for the hypervisor, and I am running 2 VM instances with 15 vCPUs each.

OS: CentOS 7.5
Kernel: 3.10.0-862.11.6.el7.x86_64

[root@ostack-compute-33 ~]# ifconfig tap5af7f525-5f | grep -i drop
RX errors 0 dropped 0 overruns 0 frame 0
TX errors 0 dropped 2528788837 overruns 0 carrier 0 collisions 0

What else should I try?
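A minimal sketch for measuring the drop rate during peak load, instead of reading the cumulative counter once. It assumes the standard Linux sysfs statistics files (`/sys/class/net/<dev>/statistics/tx_dropped` and `tx_packets`); `SYSFS_NET` is a hypothetical override added only so the functions can be tried against a fake directory. On a tap device, TX is the host-to-guest direction, so these drops mean the guest side is not draining the tap queue fast enough, which is consistent with the high-CPU-load symptom. The current tap queue length is shown as `qlen` in `ip link show dev <tap>`.

```shell
#!/usr/bin/env bash
# Sketch: sample a tap device's TX counters twice and report the loss rate.
# SYSFS_NET is a hypothetical override; by default the real sysfs is used.

# Integer percentage of packets dropped, given dropped and delivered deltas.
loss_pct() {
    local dropped=$1 sent=$2
    local total=$((dropped + sent))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $((dropped * 100 / total))
    fi
}

# Usage: tx_drop_sample <dev> [interval_seconds]
tx_drop_sample() {
    local dev=$1 interval=${2:-1}
    local stats=${SYSFS_NET:-/sys/class/net}/$dev/statistics
    if [ ! -r "$stats/tx_dropped" ] || [ ! -r "$stats/tx_packets" ]; then
        echo "no statistics for device: $dev" >&2
        return 1
    fi
    local d0 p0 d1 p1
    d0=$(cat "$stats/tx_dropped"); p0=$(cat "$stats/tx_packets")
    sleep "$interval"
    d1=$(cat "$stats/tx_dropped"); p1=$(cat "$stats/tx_packets")
    # Assumes tx_packets excludes dropped packets: loss = drops / (drops + delivered).
    local dd=$((d1 - d0)) dp=$((p1 - p0))
    echo "$dev: tx_packets=$dp tx_dropped=$dd loss=$(loss_pct "$dd" "$dp")%"
}
```

For example, `tx_drop_sample tap5af7f525-5f 10` run during peak hours reports how many packets were delivered and dropped over those ten seconds.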

sean mooney (sean-k-mooney) wrote on 2018-11-06:

The maximum TX queue length supported by QEMU is 1024, so setting it to 10000 will not help.
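Assuming the 1024 limit refers to the virtio-net ring size that QEMU accepts (a separate knob from the tap device's `txqueuelen`), newer nova releases expose it in `nova.conf` as `[libvirt] rx_queue_size` and `tx_queue_size`, accepting only 256, 512 or 1024; libvirt applies `tx_queue_size` only to vhost-user interfaces. The sketch below merely validates a requested size and prints the fragment. `nova_queue_conf` is a hypothetical helper name, and whether a given nova release has these options should be checked against its configuration reference.

```shell
#!/usr/bin/env bash
# Sketch: emit a nova.conf fragment for virtio ring sizes, refusing values
# outside the 256/512/1024 set (so a request for 10000 is rejected).
nova_queue_conf() {
    local size=$1
    case "$size" in
        256|512|1024) ;;
        *)
            echo "unsupported virtio queue size: $size (use 256, 512 or 1024)" >&2
            return 1
            ;;
    esac
    # tx_queue_size only takes effect for vhost-user ports; rx_queue_size
    # also applies to vhost-net (tap) ports.
    printf '[libvirt]\nrx_queue_size = %s\ntx_queue_size = %s\n' "$size" "$size"
}
```

For example, `nova_queue_conf 1024` prints the lines to merge into the existing `[libvirt]` section, after which nova-compute needs a restart.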

This bug does not specify which network backend is being used, but I will assume you are using kernel OVS. One thing you could do is enable multiqueue support.
See: https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/libvirt-virtiomq.html
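Per the linked spec, multiqueue is requested through the `hw_vif_multiqueue_enabled` image property, after which nova gives the virtio NIC one queue pair per vCPU; the guest then has to activate the extra queues itself with `ethtool -L`. A sketch of both steps, assuming the `openstack` and `ethtool` CLIs are installed. The `run` wrapper and `DRY_RUN` variable are hypothetical additions that print commands instead of executing them, and the image and interface names are placeholders.

```shell
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# API side: mark the image so new instances get a multiqueue virtio NIC.
enable_vif_multiqueue() {
    run openstack image set --property hw_vif_multiqueue_enabled=true "$1"
}

# Guest side: activate N combined queues on the virtio interface (N <= vCPUs).
guest_enable_queues() {
    run ethtool -L "$1" combined "$2"
}
```

With the 15-vCPU instances in this report, that would be `guest_enable_queues eth0 15` inside each VM booted from the flagged image.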

You could also try enabling hugepages and CPU pinning on the VMs for better performance.
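CPU pinning and hugepages are requested through flavor extra specs: `hw:cpu_policy=dedicated` and `hw:mem_page_size=large` are the standard nova keys, while the flavor name and the `run`/`DRY_RUN` wrapper below are placeholders. The compute host also needs hugepages preallocated (for example on the kernel command line), and since no cores were reserved for the host in this report, setting `vcpu_pin_set` in `nova.conf` to leave some cores free for the hypervisor and its vhost threads is worth considering.

```shell
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Give instances of this flavor dedicated host cores and large pages.
pin_flavor() {
    run openstack flavor set \
        --property hw:cpu_policy=dedicated \
        --property hw:mem_page_size=large \
        "$1"
}
```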

If you have deployed with security groups enabled, you can also switch to the openvswitch security group driver. This disables hybrid plug and increases performance.
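Hybrid plug places a per-port Linux bridge (`qbr*`) and a veth pair (`qvb*`/`qvo*`) between the tap device and `br-int`, so one way to check whether it is in use is to look for `qbr` devices. The sketch parses `ip -o link show` output from stdin and, as a dry-run-capable step, sets the OVS agent's `[securitygroup] firewall_driver` to `openvswitch` with `crudini`. The config path is an assumption (it varies by distribution), `run`/`DRY_RUN` are hypothetical helpers, and this applies to the OVS agent only, not to linuxbridge.

```shell
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Print the qbr* bridges found in `ip -o link show` output read from stdin.
hybrid_plug_bridges() {
    # Lines look like "12: qbr5af7f525-5f: <BROADCAST,...> mtu 1500 ..."
    # veth names carry an "@peer" suffix, which is stripped before matching.
    awk -F': ' '{ split($2, name, "@"); if (name[1] ~ /^qbr/) print name[1] }'
}

# Assumed path; adjust for your distribution.
OVS_AGENT_INI=${OVS_AGENT_INI:-/etc/neutron/plugins/ml2/openvswitch_agent.ini}

set_ovs_firewall() {
    run crudini --set "$OVS_AGENT_INI" securitygroup firewall_driver openvswitch
}
```

For example, `ip -o link show | hybrid_plug_bridges` lists any hybrid-plug bridges on the compute node; the OVS agent must be restarted after the config change.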

If you are still seeing packet drops at that point, it indicates that you have exceeded the capacity of kernel OVS and need to consider other network backends such as OVS-DPDK, SR-IOV or VPP.

In any case, this is a support request, not a bug, so I am closing it as Invalid.

Changed in nova:
status:	New → Invalid

Satish Patel (satish-txt) wrote on 2018-11-09: Re: [Bug 1792763] Re: tap TX packet drops during high cpu load

I have tried every possible solution, but none helped. I'm using linuxbridge networking.

My application's packet rate (pps) is high, and I believe the problem is that software switches are not capable of handling it.

In the end, I deployed SR-IOV networking to fix this issue.
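For readers taking the same route, a rough sketch of two SR-IOV steps most deployments share: creating virtual functions on the physical NIC through the kernel's `sriov_numvfs` sysfs file, and creating a VF-backed Neutron port with `--vnic-type direct` to boot the instance on. Interface, network and port names are placeholders, `SYSFS_NET`, `run` and `DRY_RUN` are hypothetical helpers, and the nova PCI whitelist and neutron `sriovnicswitch` mechanism-driver configuration are not shown.

```shell
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Create <count> VFs on physical function <pf>. The kernel rejects changing a
# non-zero VF count directly, so reset to 0 first (this removes existing VFs).
create_vfs() {
    local pf=$1 count=$2
    local path=${SYSFS_NET:-/sys/class/net}/$pf/device/sriov_numvfs
    if [ -n "${DRY_RUN:-}" ]; then
        echo "echo 0 > $path; echo $count > $path"
        return 0
    fi
    echo 0 > "$path" && echo "$count" > "$path"
}

# Create a VF-backed port; boot with: openstack server create --nic port-id=<id> ...
sriov_port() {
    run openstack port create --network "$1" --vnic-type direct "$2"
}
```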
