2. ovs-vswitchd log:
2023-05-19T10:48:42.400Z|13247|jsonrpc|DBG|unix:/run/openvswitch/db.sock: received notification, method="update3", params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Open_vSwitch":{"1a4db534-3133-43e0-922e-dd7d5d76a802":{"modify":{"next_cfg":135}}},"Interface":{"8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2":{"insert":{"name":"tap6d5eccfa-06"}}},"Port":{"a4be79ec-df95-495b-87e1-fc373582f647":{"insert":{"name":"tap6d5eccfa-06","interfaces":["uuid","8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2"]}}},"Bridge":{"ee9ab5d4-b72b-479b-87cd-29a19d5540a8":{"modify":{"ports":["uuid","a4be79ec-df95-495b-87e1-fc373582f647"]}}}}]
2023-05-19T10:48:42.410Z|13248|bridge|WARN|could not open network device tap6d5eccfa-06 (No such device)
2023-05-19T10:48:42.411Z|13249|jsonrpc|DBG|unix:/run/openvswitch/db.sock: send request, method="transact", params=["Open_vSwitch",{"row":{"cur_cfg":135},"where":[["_uuid","==",["uuid","1a4db534-3133-43e0-922e-dd7d5d76a802"]]],"table":"Open_vSwitch","op":"update"},{"row":{"error":"could not open network device tap6d5eccfa-06 (No such device)","ofport":-1},"where":[["_uuid","==",["uuid","8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2"]]],"table":"Interface","op":"update"},{"op":"assert","lock":"ovs_vswitchd"}], id=4787
2023-05-19T10:48:42.412Z|13250|poll_loop|DBG|wakeup due to [POLLIN] on fd 13 (<->/run/openvswitch/db.sock) at ../lib/stream-fd.c:157 (4% CPU usage)
2023-05-19T10:48:42.412Z|13251|jsonrpc|DBG|unix:/run/openvswitch/db.sock: received notification, method="update3", params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Interface":{"8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2":{"modify":{"error":"could not open network device tap6d5eccfa-06 (No such device)","ofport":-1}}},"Open_vSwitch":{"1a4db534-3133-43e0-922e-dd7d5d76a802":{"modify":{"cur_cfg":135}}}}]
2023-05-19T10:48:42.907Z|13260|vconn|DBG|unix#50: sent (Success): OFPT_PORT_STATUS (OF1.5) (xid=0x0): ADD: 57(tap6d5eccfa-06): addr:fe:16:3e:dc:bc:b8
config: 0
state: 0
current: 10MB-FD COPPER
speed: 10 Mbps now, 0 Mbps max
2023-05-19T10:48:42.907Z|13261|bridge|INFO|bridge br-int: added interface tap6d5eccfa-06 on port 57
2023-05-19T10:48:42.908Z|13262|hmap|DBG|Dropped 74 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
2023-05-19T10:48:42.908Z|13263|hmap|DBG|../ofproto/ofproto-dpif-xlate.c:884: 1 bucket with 6+ nodes, including 1 bucket with 7 nodes (16 nodes total across 16 buckets)
2023-05-19T10:48:42.908Z|13264|dpif_netlink|DBG|port_changed: dpif:system@ovs-system vport:tap6d5eccfa-06 cmd:1
2023-05-19T10:48:42.908Z|13265|vconn|DBG|unix#50: sent (Success): OFPT_PORT_STATUS (OF1.5) (xid=0x0): MOD: 57(tap6d5eccfa-06): addr:fe:16:3e:dc:bc:b8
config: 0
state: LIVE
current: 10MB-FD COPPER
speed: 10 Mbps now, 0 Mbps max
2023-05-19T10:48:42.971Z|13289|bridge|INFO|bridge br-int: deleted interface tap6d5eccfa-06 on port 57
2023-05-19T10:48:42.972Z|04671|poll_loop(revalidator62)|DBG|wakeup due to [POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.972Z|04651|poll_loop(revalidator63)|DBG|wakeup due to [POLLIN] on fd 45 (FIFO pipe:[154648]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.975Z|04672|poll_loop(revalidator62)|DBG|wakeup due to [POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.975Z|13290|dpif|DBG|system@ovs-system: failed to query port tap6d5eccfa-06: No such device
2023-05-19T10:48:42.982Z|07648|poll_loop(revalidator61)|DBG|wakeup due to [POLLIN] on fd 41 (FIFO pipe:[155716]) at ../lib/ovs-thread.c:378 (10% CPU usage)
2023-05-19T10:48:42.984Z|04719|poll_loop(revalidator65)|DBG|wakeup due to [POLLIN] on fd 49 (FIFO pipe:[149813]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.984Z|04589|poll_loop(revalidator64)|DBG|wakeup due to [POLLIN] on fd 47 (FIFO pipe:[154649]) at ../lib/ovs-thread.c:378 (1% CPU usage)
2023-05-19T10:48:42.985Z|13291|poll_loop|DBG|wakeup due to [POLLIN] on fd 56 (FIFO pipe:[136293]) at ../lib/ovs-rcu.c:236 (2% CPU usage)
2023-05-19T10:48:42.999Z|13292|poll_loop|DBG|wakeup due to [POLLIN] on fd 56 (FIFO pipe:[136293]) at ../lib/ovs-rcu.c:236 (2% CPU usage)
2023-05-19T10:48:42.999Z|13293|dpif_netlink|DBG|port_changed: dpif:system@ovs-system vport:tap6d5eccfa-06 cmd:2
2023-05-19T10:48:42.999Z|13294|dpif|DBG|system@ovs-system: failed to query port tap6d5eccfa-06: No such device
2023-05-19 18:58:45.827 7 WARNING nova.compute.manager [None req-6f1ad586-b66d-4fb6-8603-ba96af2af311 bb05576c972b49b4ac598cb33cf151f2 af89a2f86b204b54af86df7e5bb6a841 - - default default] [instance: 1cfbb245-c807-48e9-90eb-2c4f77e7a7b3] Timeout waiting for ['network-vif-plugged-6d5eccfa-069e-4058-a1c8-87bec9c1c280'] for instance with vm_state building and task_state spawning. Event states are: network-vif-plugged-6d5eccfa-069e-4058-a1c8-87bec9c1c280: timed out after 600.00 seconds: eventlet.timeout.Timeout: 600 seconds
It seems that ovs successfully add tap device(tap6d5eccfa-06) in br-int, then quickly delete the device.
In Nova, it wait for network-vif-plugged event, but tap device not present in br-int bridge, so create instance failed after 600 secs(we modify the default timeout 300s).
Hi, any update for this bug?
I can reproduce in OpenStack Zed version, (OVS version 2.17.7/OVN version 22.06.1)
this bug can easily reproduce.
Using the command like:
# openstack server create test_vm --flavor XXXXXX --network XXXXXX --image XXXXXX --min 10 --max 10 --availability-zone nova:XXXXXX
There maybe at least 2 or 3 instances created failed because of vif plugging timeout error.
VM port UUID: 6d5eccfa- 069e-4058- a1c8-87bec9c1c2 80
Here is some logs info:
1. ovsdb server log:
record 554: 2023-05-19 10:48:42.399
table Interface insert row "tap6d5eccfa-06" (8d7bdf97):
table Open_vSwitch row 1a4db534 (1a4db534) diff:
table Bridge row "br-int" (ee9ab5d4) diff:
table Port insert row "tap6d5eccfa-06" (a4be79ec):
record 555: 2023-05-19 10:48:42.411
table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:
table Open_vSwitch row 1a4db534 (1a4db534) diff:
record 556: 2023-05-19 10:48:42.909
table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:
record 557: 2023-05-19 10:48:42.941 "ovs-vsctl (invoked by init (pid 0)): ovs-vsctl del-port tap6d5eccfa-06"
table Open_vSwitch row 1a4db534 (1a4db534) diff:
table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:
delete row
table Bridge row "br-int" (ee9ab5d4) diff:
table Port row "tap6d5eccfa-06" (a4be79ec) diff:
delete row
2. ovs-vswitchd log: 19T10:48: 42.400Z| 13247|jsonrpc| DBG|unix: /run/openvswitc h/db.sock: received notification, method="update3", params= [["monid" ,"Open_ vSwitch" ],"00000000- 0000-0000- 0000-0000000000 00",{"Open_ vSwitch" :{"1a4db534- 3133-43e0- 922e-dd7d5d76a8 02":{"modify" :{"next_ cfg":135} }},"Interface" :{"8d7bdf97- 1f91-4ef3- 9b27-b2a9e7c362 e2":{"insert" :{"name" :"tap6d5eccfa- 06"}}}, "Port": {"a4be79ec- df95-495b- 87e1-fc373582f6 47":{"insert" :{"name" :"tap6d5eccfa- 06","interfaces ":["uuid" ,"8d7bdf97- 1f91-4ef3- 9b27-b2a9e7c362 e2"]}}} ,"Bridge" :{"ee9ab5d4- b72b-479b- 87cd-29a19d5540 a8":{"modify" :{"ports" :["uuid" ,"a4be79ec- df95-495b- 87e1-fc373582f6 47"]}}} }] 19T10:48: 42.410Z| 13248|bridge| WARN|could not open network device tap6d5eccfa-06 (No such device) 19T10:48: 42.411Z| 13249|jsonrpc| DBG|unix: /run/openvswitc h/db.sock: send request, method="transact", params= ["Open_ vSwitch" ,{"row" :{"cur_ cfg":135} ,"where" :[["_uuid" ,"==",[ "uuid", "1a4db534- 3133-43e0- 922e-dd7d5d76a8 02"]]], "table" :"Open_ vSwitch" ,"op":" update" },{"row" :{"error" :"could not open network device tap6d5eccfa-06 (No such device) ","ofport" :-1},"where" :[["_uuid" ,"==",[ "uuid", "8d7bdf97- 1f91-4ef3- 9b27-b2a9e7c362 e2"]]], "table" :"Interface" ,"op":" update" },{"op" :"assert" ,"lock" :"ovs_vswitchd" }], id=4787 19T10:48: 42.412Z| 13250|poll_ loop|DBG| wakeup due to [POLLIN] on fd 13 (<->/run/ openvswitch/ db.sock) at ../lib/ stream- fd.c:157 (4% CPU usage) 19T10:48: 42.412Z| 13251|jsonrpc| DBG|unix: /run/openvswitc h/db.sock: received notification, method="update3", params= [["monid" ,"Open_ vSwitch" ],"00000000- 0000-0000- 0000-0000000000 00",{"Interface ":{"8d7bdf97- 1f91-4ef3- 9b27-b2a9e7c362 e2":{"modify" :{"error" :"could not open network device tap6d5eccfa-06 (No such device) ","ofport" :-1}}}, "Open_vSwitch" :{"1a4db534- 3133-43e0- 922e-dd7d5d76a8 02":{"modify" :{"cur_ cfg":135} }}}]
2023-05-
2023-05-
2023-05-
2023-05-
2023-05-
2023-05- 19T10:48: 42.907Z| 13260|vconn| DBG|unix# 50: sent (Success): OFPT_PORT_STATUS (OF1.5) (xid=0x0): ADD: 57(tap6d5eccfa-06): addr:fe: 16:3e:dc: bc:b8 19T10:48: 42.907Z| 13261|bridge| INFO|bridge br-int: added interface tap6d5eccfa-06 on port 57 19T10:48: 42.908Z| 13262|hmap| DBG|Dropped 74 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate 19T10:48: 42.908Z| 13263|hmap| DBG|../ ofproto/ ofproto- dpif-xlate. c:884: 1 bucket with 6+ nodes, including 1 bucket with 7 nodes (16 nodes total across 16 buckets) 19T10:48: 42.908Z| 13264|dpif_ netlink| DBG|port_ changed: dpif:system@ ovs-system vport:tap6d5ecc fa-06 cmd:1 19T10:48: 42.908Z| 13265|vconn| DBG|unix# 50: sent (Success): OFPT_PORT_STATUS (OF1.5) (xid=0x0): MOD: 57(tap6d5eccfa-06): addr:fe: 16:3e:dc: bc:b8
config: 0
state: 0
current: 10MB-FD COPPER
speed: 10 Mbps now, 0 Mbps max
2023-05-
2023-05-
2023-05-
2023-05-
2023-05-
config: 0
state: LIVE
current: 10MB-FD COPPER
speed: 10 Mbps now, 0 Mbps max
2023-05- 19T10:48: 42.971Z| 13289|bridge| INFO|bridge br-int: deleted interface tap6d5eccfa-06 on port 57 19T10:48: 42.972Z| 04671|poll_ loop(revalidato r62)|DBG| wakeup due to [POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ ovs-thread. c:378 (0% CPU usage) 19T10:48: 42.972Z| 04651|poll_ loop(revalidato r63)|DBG| wakeup due to [POLLIN] on fd 45 (FIFO pipe:[154648]) at ../lib/ ovs-thread. c:378 (0% CPU usage) 19T10:48: 42.975Z| 04672|poll_ loop(revalidato r62)|DBG| wakeup due to [POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ ovs-thread. c:378 (0% CPU usage) 19T10:48: 42.975Z| 13290|dpif| DBG|system@ ovs-system: failed to query port tap6d5eccfa-06: No such device 19T10:48: 42.982Z| 07648|poll_ loop(revalidato r61)|DBG| wakeup due to [POLLIN] on fd 41 (FIFO pipe:[155716]) at ../lib/ ovs-thread. c:378 (10% CPU usage) 19T10:48: 42.984Z| 04719|poll_ loop(revalidato r65)|DBG| wakeup due to [POLLIN] on fd 49 (FIFO pipe:[149813]) at ../lib/ ovs-thread. c:378 (0% CPU usage) 19T10:48: 42.984Z| 04589|poll_ loop(revalidato r64)|DBG| wakeup due to [POLLIN] on fd 47 (FIFO pipe:[154649]) at ../lib/ ovs-thread. c:378 (1% CPU usage) 19T10:48: 42.985Z| 13291|poll_ loop|DBG| wakeup due to [POLLIN] on fd 56 (FIFO pipe:[136293]) at ../lib/ ovs-rcu. c:236 (2% CPU usage) 19T10:48: 42.999Z| 13292|poll_ loop|DBG| wakeup due to [POLLIN] on fd 56 (FIFO pipe:[136293]) at ../lib/ ovs-rcu. c:236 (2% CPU usage) 19T10:48: 42.999Z| 13293|dpif_ netlink| DBG|port_ changed: dpif:system@ ovs-system vport:tap6d5ecc fa-06 cmd:2 19T10:48: 42.999Z| 13294|dpif| DBG|system@ ovs-system: failed to query port tap6d5eccfa-06: No such device
2023-05-
2023-05-
2023-05-
2023-05-
2023-05-
2023-05-
2023-05-
2023-05-
2023-05-
2023-05-
2023-05-
3. nova-compute log: b66d-4fb6- 8603-ba96af2af3 11 bb05576c972b49b 4ac598cb33cf151 f2 af89a2f86b204b5 4af86df7e5bb6a8 41 - - default default] Successfully plugged vif VIFOpenVSwitch( active= False,address= fa:16:3e: dc:bc:b8, bridge_ name='br- int',has_ traffic_ filtering= True,id= 6d5eccfa- 069e-4058- a1c8-87bec9c1c2 80,network= Network( 19771f61- 1003-46a3- 8850-001b32947f 66),plugin= 'ovs',port_ profile= VIFPortProfileO penVSwitch, preserve_ on_delete= False,vif_ name='tap6d5ecc fa-06')
2023-05-19 18:48:42.416 7 INFO os_vif [None req-6f1ad586-
2023-05-19 18:58:45.827 7 WARNING nova.compute. manager [None req-6f1ad586- b66d-4fb6- 8603-ba96af2af3 11 bb05576c972b49b 4ac598cb33cf151 f2 af89a2f86b204b5 4af86df7e5bb6a8 41 - - default default] [instance: 1cfbb245- c807-48e9- 90eb-2c4f77e7a7 b3] Timeout waiting for ['network- vif-plugged- 6d5eccfa- 069e-4058- a1c8-87bec9c1c2 80'] for instance with vm_state building and task_state spawning. Event states are: network- vif-plugged- 6d5eccfa- 069e-4058- a1c8-87bec9c1c2 80: timed out after 600.00 seconds: eventlet. timeout. Timeout: 600 seconds
It seems that ovs successfully add tap device( tap6d5eccfa- 06) in br-int, then quickly delete the device.
In Nova, it wait for network-vif-plugged event, but tap device not present in br-int bridge, so create instance failed after 600 secs(we modify the default timeout 300s).