Openflow entries are not totally removed for stopped VM

Bug #1808541 reported by Anton Kurbatov
Affects: neutron
Status: New
Importance: High
Assigned to: Unassigned

Bug Description

I am using the Queens release, and the VMs' tap interfaces are plugged into the OVS br-int bridge.
I am seeing a case where OpenFlow entries are not completely removed when I stop my VM (name='my-vm').
It is only reproducible when there is some other activity on the node for different VMs: in my case, I attach a new network to another VM (name='vm-other').

The ovs-agent logs are attached.

I managed to simulate the issue with the following steps:

1) Grep the current OpenFlow entries for my-vm by MAC address:

# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=60, n_packets=146, n_bytes=21001, idle_age=4, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45 actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=2, n_bytes=84, idle_age=4, priority=95,arp,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,arp_spa=10.94.152.212 actions=NORMAL
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=28, n_bytes=2448, idle_age=9, priority=65,ip,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,nw_src=10.94.152.212 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=71, n_packets=0, n_bytes=0, idle_age=93, priority=65,ipv6,reg5=0xa3,in_port=163,dl_src=fa:16:3e:ec:d3:45,ipv6_src=fe80::f816:3eff:feec:d345 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
 cookie=0xf4d7d970f5382f3d, duration=93.162s, table=73, n_packets=0, n_bytes=0, idle_age=3401, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45 actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#

2) # ps ax | grep libvirt
 4887 pts/6 S+ 0:00 grep --color=auto libvirt
 3934 ? Sl 0:18 /usr/libexec/qemu-kvm -name guest=instance-00000012,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-131-instance-00000012/master-key.aes -machine pc-i440fx-vz7.8.0,accel=kvm,usb=off,dump-guest-core=off -cpu Westmere-IBRS,vme=on,ss=on,pcid=on,x2apic=on,tsc-deadline=on,hypervisor=on,arat=on,tsc_adjust=on,ssbd=on,stibp=on,pdpe1gb=on,rdtscp=on,aes=off,+kvmclock -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=512 -uuid 89fccc31-96a6-47ce-abd1-e40fba7274e6 -smbios type=1,manufacturer=Virtuozzo Infrastructure Platform,product=OpenStack Compute,version=17.0.6-1.vl7,serial=71f55add-ef93-4ec2-a4dd-ab8098b6312d,uuid=89fccc31-96a6-47ce-abd1-e40fba7274e6,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-131-instance-00000012/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot strict=on -device nec-usb-x,id=usb,bus=pci.0,addr=0x4 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/mnt/vstorage/vols/datastores/cinder/volume-e30d1874-d68e-4578-bf89-aa599e8383c7/volume-e30d1874-d68e-4578-bf89-aa599e8383c7,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,l2-cache-size=128M,discard=unmap,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,logical_block_size=512,physical_block_size=4096,serial=e30d1874-d68e-4578-bf89-aa599e8383c7 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,host_mtu=1500,netdev=hostnet0,id=net0,mac=fa:16:3e:ec:d3:45,bus=pci.0,addr=0x3 -chardev pty,id=charserial0,logfile=/mnt/vstorage/vols/datastores/nova/instances/89fccc31-96a6-47ce-abd1-e40fba7274e6/console.log,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-00000012.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/org.qemu.guest_agent.1.instance-00000012.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.10.1.237:0 -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device vmcoreinfo -device pvpanic,ioport=1285 -msg timestamp=on
24553 ? Ssl 11:44 /usr/sbin/libvirtd --listen

Note: "-netdev tap,fd=28", so, net device is passed to qemu as handle and AFAIU tap interface is auto removed (by kernel) when qemu process exits.

3) SIGSTOP libvirtd to emulate a delay in the port deletion that libvirtd performs when the guest is stopped
(I believe libvirtd removes the port when the guest is stopped with something like 'ovs-vsctl --timeout=5 -- --if-exists del-port taped0487c9-23')

# kill -SIGSTOP 24553
#

4) Kill the guest:
# kill -9 3934
#

OVS agent logs right after killing the guest:

2018-12-14 17:18:26.600 7 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["568b058e-382a-4c50-a185-45807107190b","old",null,163,null],["","new","taped0487c9-23",-1,["map",[["attached-mac","fa:16:3e:ec:d3:45"],["iface-id","ed0487c9-23b2-4bea-9a17-567b7f52d7fc"],["iface-status","active"],["vm-id","89fccc31-96a6-47ce-abd1-e40fba7274e6"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239

Indeed, the OVSDB row is still there but the device is gone:
# ovs-vsctl --columns name,ofport,error find Interface name='taped0487c9-23'
name : "taped0487c9-23"
ofport : -1
error : "could not open network device taped0487c9-23 (No such device)"
# ip a s taped0487c9-23
Device "taped0487c9-23" does not exist.
#
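
(Aside, not part of the original steps: any interface left in this state can be spotted by asking OVSDB for rows whose ofport is -1. A minimal check, assuming the same setup:)

# ovs-vsctl --columns name,ofport,error find Interface ofport=-1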

5) Emulate a parallel guest device addition (i.e. force 'rpc_loop' to process the ports):
# ip tuntap add dev my-tap mode tap
# ovs-vsctl add-port br-int my-tap
#

Right after the port is added, rpc_loop starts to process ports and there is a warning in the logs:
2018-12-14 17:21:12.520 7 WARNING neutron.agent.common.ovs_lib [req-338406ef-81c5-4268-9463-f2e9435892ee - - - - -] ofport: -1 for VIF: ed0487c9-23b2-4bea-9a17-567b7f52d7fc is not a positive integer

Some of the OpenFlow entries are deleted, but not all:

# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
 cookie=0xf4d7d970f5382f3d, duration=905.467s, table=60, n_packets=176, n_bytes=23641, idle_age=367, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45 actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
 cookie=0xf4d7d970f5382f3d, duration=905.467s, table=73, n_packets=0, n_bytes=0, idle_age=4213, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45 actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#

Resume libvirtd:

# kill -SIGCONT 24553
#

2018-12-14 17:24:58.557 7 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["568b058e-382a-4c50-a185-45807107190b","delete","taped0487c9-23",-1,["map",[["attached-mac","fa:16:3e:ec:d3:45"],["iface-id","ed0487c9-23b2-4bea-9a17-567b7f52d7fc"],["iface-status","active"],["vm-id","89fccc31-96a6-47ce-abd1-e40fba7274e6"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239

But the flows are still there:
# ovs-ofctl dump-flows br-int | grep fa:16:3e:ec:d3:45
 cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=60, n_packets=176, n_bytes=23641, idle_age=492, priority=90,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45 actions=load:0xa3->NXM_NX_REG5[],load:0x9->NXM_NX_REG6[],strip_vlan,resubmit(,81)
 cookie=0xf4d7d970f5382f3d, duration=1030.412s, table=73, n_packets=0, n_bytes=0, idle_age=4338, priority=100,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45 actions=load:0xa3->NXM_NX_REG5[],resubmit(,81)
#
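
(A possible manual workaround, offered here only as an untested sketch: delete the leftover flows by hand using the match fields from the dump above, for example

# ovs-ofctl del-flows br-int "table=60,dl_vlan=9,dl_dst=fa:16:3e:ec:d3:45"
# ovs-ofctl del-flows br-int "table=73,reg6=0x9,dl_dst=fa:16:3e:ec:d3:45"

This does not address the race itself, of course.)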

Trace result for the stopped VM:

# ovs-appctl ofproto/trace br-int in_port=12,tcp,dl_src=fa:16:3e:05:95:a1,dl_dst=fa:16:3e:ec:d3:45,nw_src=10.94.152.196,nw_dst=10.94.152.212
<cut>
bridge("br-int")
----------------
    thaw
        Resuming from table 72
72. ct_state=+new-est,ip,reg5=0xc, priority 74, cookie 0xf4d7d970f5382f3d
    resubmit(,73)
73. reg6=0x9,dl_dst=fa:16:3e:ec:d3:45, priority 100, cookie 0xf4d7d970f5382f3d
    set_field:0xa3->reg5
    resubmit(,81)
81. ct_state=+trk,reg5=0xa3, priority 80, cookie 0xf4d7d970f5382f3d
    resubmit(,82)
82. ct_state=+new-est,ip,reg5=0xa3, priority 74, cookie 0xf4d7d970f5382f3d
    ct(commit,zone=NXM_NX_REG6[0..15])
    drop
    output:163
     >> Nonexistent output port
    resubmit(,92)
92. priority 0, cookie 0xf4d7d970f5382f3d
    drop

Final flow: recirc_id=0x9b0,eth,tcp,reg5=0xa3,reg6=0x9,in_port=12,vlan_tci=0x0000,dl_src=fa:16:3e:05:95:a1,dl_dst=fa:16:3e:ec:d3:45,nw_src=10.94.152.196,nw_dst=10.94.152.212,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
Megaflow: recirc_id=0x9b0,ct_state=+new-est-rel-rpl+trk,eth,tcp,in_port=12,dl_dst=fa:16:3e:ec:d3:45,nw_frag=no
Datapath actions: ct(commit,zone=9)
#

If my-vm (still in the stopped state) is then migrated to another node, vm-other on the current node loses connectivity to my-vm, because egress traffic from vm-other is sent to the 'Nonexistent output port'.

The issue also appears to be present on master.

Tags: ovs
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
importance: Medium → High
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Anton:

Do you have the full ovs-agent log? I say that because the one attached is very limited.

Just for the record: your agent is running with the OVS firewall driver.

I have one more question: why are you stopping the VM manually instead of using the OpenStack commands (server stop, server start, etc.)?

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Anton:

Please check [1]. The first section is the list of flows with one VM running. Then I stopped (not deleted) the VM using the OpenStack CLI. The second section is the list of flows with the VM halted.

I tested several times, spawning up to 5 VMs, stopping, starting and deleting them, and I don't see stale flows on br-int. All flows related to a port are removed from the bridge once the port is deleted.

[1] http://paste.openstack.org/show/737780/

Revision history for this message
Anton Kurbatov (akurbatov) wrote :

Hello Rodolfo

> Do you have the full ovs-agent log? I say that because the one attached is very limited.
Attached new logs: neutron-openvswitch-agent-full.log (search for 'is not a positive integer' to find the approximate timestamp).

> I have one more question: why are you stopping the VM manually? Instead of using the OpenStack commands: server stop, server start, etc.

Actually, I stop the VM using the OpenStack CLI. In the issue description I stopped it manually to show the reproduction step by step (to eliminate the race between libvirt and the rpc_loop calls).

I wrote a small bash script with which I managed to reproduce the issue.

So, I have two VMs:

[root@sqvm2-0506 ~]# openstack server list
+--------------------------------------+------+--------+---------------------+-------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------+--------+---------------------+-------+--------+
| 968eb966-77d7-4668-a2bf-226284661ee8 | vm-2 | ACTIVE | net1=192.168.128.18 | | tiny |
| ef1afb42-ab8f-448b-876c-b1705e718ec5 | vm-1 | ACTIVE | net1=192.168.128.13 | | tiny |
+--------------------------------------+------+--------+---------------------+-------+--------+
[root@sqvm2-0506 ~]#

[root@sqvm2-0506 ~]# openstack port list | grep 192.168.128.13
| fce15578-ba1c-4e96-a8ba-1612b389aada | | fa:16:3e:d5:10:62 | ip_address='192.168.128.13', subnet_id='868b4265-f7b9-4bb0-845c-6a05a56a4eaa' | ACTIVE |
[root@sqvm2-0506 ~]#

MAC 'fa:16:3e:d5:10:62' is the address of vm-1.

[root@sqvm2-0506 ~]# cat test.sh
#!/usr/bin/env bash

set -e
step=0

while :; do
    echo "Step N=${step}" && ((step++)) || true
    openstack --insecure server add network vm-2 net2 1>/dev/null &
    pid=$!
    openstack --insecure server stop vm-1 1>/dev/null
    wait "${pid}"

    sleep 30 # give time to VM stopping and rpc_loop ports processing
    ovs-ofctl dump-flows br-int | grep fa:16:3e:d5:10:62 && { echo "Got it, exit"; exit 1; }

    openstack --insecure server remove network vm-2 net2 1>/dev/null
    openstack --insecure server start vm-1 1>/dev/null
    sleep 30 # give time to VM start
done
[root@sqvm2-0506 ~]#

[root@sqvm2-0506 ~]# /bin/bash test.sh
Step N=0
 cookie=0x9e2f137c9b53b412, duration=239.247s, table=60, n_packets=67, n_bytes=7398, idle_age=329, priority=90,dl_vlan=1,dl_dst=fa:16:3e:d5:10:62 actions=load:0x24->NXM_NX_REG5[],load:0x1->NXM_NX_REG6[],strip_vlan,resubmit(,81)
 cookie=0x9e2f137c9b53b412, duration=239.246s, table=73, n_packets=0, n_bytes=0, idle_age=357, priority=100,reg6=0x1,dl_dst=fa:16:3e:d5:10:62 actions=load:0x24->NXM_NX_REG5[],resubmit(,81)
Got it, exit
[root@sqvm2-0506 ~]#
[root@sqvm2-0506 ~]# openstack server list
+--------------------------------------+------+---------+----------------------------------------+-------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------+---------+----------------------------------------+-------+--------+
| 968eb966-77d7-4668-a2bf-226284661ee8 | vm-2 | ACTIVE | net2=192.168.5.15; net1=192.168.128.18 | | tiny ...


Revision history for this message
Anton Kurbatov (akurbatov) wrote :

Not sure, but it looks like the root cause is in how the tap device is removed from the OVS bridge:

2018-12-20 14:34:11.687 7 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["dd9fb3ea-df1a-49cd-8bb6-a705b66dbcd6","old",null,36,null],["","new","tapfce15578-ba",-1,["map",[["attached-mac","fa:16:3e:d5:10:62"],["iface-id","fce15578-ba1c-4e96-a8ba-1612b389aada"],["iface-status","active"],["vm-id","ef1afb42-ab8f-448b-876c-b1705e718ec5"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239

2018-12-20 14:34:11.879 7 DEBUG neutron.agent.linux.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["dd9fb3ea-df1a-49cd-8bb6-a705b66dbcd6","delete","tapfce15578-ba",-1,["map",[["attached-mac","fa:16:3e:d5:10:62"],["iface-id","fce15578-ba1c-4e96-a8ba-1612b389aada"],["iface-status","active"],["vm-id","ef1afb42-ab8f-448b-876c-b1705e718ec5"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py:239

If rpc_loop processes the ports while the OVS Interface row is still listed (but the device has in fact already been removed from the node), then we get ofport=-1 with the effects described above.
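
For reference, the window can be observed by hand by running the same OVSDB monitor the agent uses (taken verbatim from the log lines above) and watching for an update where the row is still listed but ofport has already flipped to -1:

# ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json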

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi Anton:

I saw the firewall is using deferred logic to speed up OpenFlow execution. The problem arises when flows are deleted with the strict option.

I'm going to propose a patch for this.

Regards.

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi:

The patch I proposed does not solve the problem because it does not fix the root cause. I think the problem is, as you mentioned in #6, the order of the events (add/remove ports) and how those events are handled. We should keep the same order when executing the add/remove actions.

Regards.

Changed in neutron:
assignee: Rodolfo Alonso (rodolfo-alonso-hernandez) → nobody