Comment 13 for bug 2005111

Revision history for this message
Dongcan Ye (hellochosen) wrote :

Hi, any update for this bug?

I can reproduce in OpenStack Zed version, (OVS version 2.17.7/OVN version 22.06.1)
this bug can easily reproduce.

Using the command like:
# openstack server create test_vm --flavor XXXXXX --network XXXXXX --image XXXXXX --min 10 --max 10 --availability-zone nova:XXXXXX

There maybe at least 2 or 3 instances created failed because of vif plugging timeout error.

VM port UUID: 6d5eccfa-069e-4058-a1c8-87bec9c1c280

Here is some logs info:

1. ovsdb server log:
record 554: 2023-05-19 10:48:42.399
  table Interface insert row "tap6d5eccfa-06" (8d7bdf97):
  table Open_vSwitch row 1a4db534 (1a4db534) diff:
  table Bridge row "br-int" (ee9ab5d4) diff:
  table Port insert row "tap6d5eccfa-06" (a4be79ec):

record 555: 2023-05-19 10:48:42.411
  table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:
  table Open_vSwitch row 1a4db534 (1a4db534) diff:

record 556: 2023-05-19 10:48:42.909
  table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:

record 557: 2023-05-19 10:48:42.941 "ovs-vsctl (invoked by init (pid 0)): ovs-vsctl del-port tap6d5eccfa-06"
  table Open_vSwitch row 1a4db534 (1a4db534) diff:
  table Interface row "tap6d5eccfa-06" (8d7bdf97) diff:
    delete row
  table Bridge row "br-int" (ee9ab5d4) diff:
  table Port row "tap6d5eccfa-06" (a4be79ec) diff:
    delete row

2. ovs-vswitchd log:
2023-05-19T10:48:42.400Z|13247|jsonrpc|DBG|unix:/run/openvswitch/db.sock: received notification, method="update3", params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Open_vSwitch":{"1a4db534-3133-43e0-922e-dd7d5d76a802":{"modify":{"next_cfg":135}}},"Interface":{"8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2":{"insert":{"name":"tap6d5eccfa-06"}}},"Port":{"a4be79ec-df95-495b-87e1-fc373582f647":{"insert":{"name":"tap6d5eccfa-06","interfaces":["uuid","8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2"]}}},"Bridge":{"ee9ab5d4-b72b-479b-87cd-29a19d5540a8":{"modify":{"ports":["uuid","a4be79ec-df95-495b-87e1-fc373582f647"]}}}}]
2023-05-19T10:48:42.410Z|13248|bridge|WARN|could not open network device tap6d5eccfa-06 (No such device)
2023-05-19T10:48:42.411Z|13249|jsonrpc|DBG|unix:/run/openvswitch/db.sock: send request, method="transact", params=["Open_vSwitch",{"row":{"cur_cfg":135},"where":[["_uuid","==",["uuid","1a4db534-3133-43e0-922e-dd7d5d76a802"]]],"table":"Open_vSwitch","op":"update"},{"row":{"error":"could not open network device tap6d5eccfa-06 (No such device)","ofport":-1},"where":[["_uuid","==",["uuid","8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2"]]],"table":"Interface","op":"update"},{"op":"assert","lock":"ovs_vswitchd"}], id=4787
2023-05-19T10:48:42.412Z|13250|poll_loop|DBG|wakeup due to [POLLIN] on fd 13 (<->/run/openvswitch/db.sock) at ../lib/stream-fd.c:157 (4% CPU usage)
2023-05-19T10:48:42.412Z|13251|jsonrpc|DBG|unix:/run/openvswitch/db.sock: received notification, method="update3", params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Interface":{"8d7bdf97-1f91-4ef3-9b27-b2a9e7c362e2":{"modify":{"error":"could not open network device tap6d5eccfa-06 (No such device)","ofport":-1}}},"Open_vSwitch":{"1a4db534-3133-43e0-922e-dd7d5d76a802":{"modify":{"cur_cfg":135}}}}]

2023-05-19T10:48:42.907Z|13260|vconn|DBG|unix#50: sent (Success): OFPT_PORT_STATUS (OF1.5) (xid=0x0): ADD: 57(tap6d5eccfa-06): addr:fe:16:3e:dc:bc:b8
     config: 0
     state: 0
     current: 10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
2023-05-19T10:48:42.907Z|13261|bridge|INFO|bridge br-int: added interface tap6d5eccfa-06 on port 57
2023-05-19T10:48:42.908Z|13262|hmap|DBG|Dropped 74 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
2023-05-19T10:48:42.908Z|13263|hmap|DBG|../ofproto/ofproto-dpif-xlate.c:884: 1 bucket with 6+ nodes, including 1 bucket with 7 nodes (16 nodes total across 16 buckets)
2023-05-19T10:48:42.908Z|13264|dpif_netlink|DBG|port_changed: dpif:system@ovs-system vport:tap6d5eccfa-06 cmd:1
2023-05-19T10:48:42.908Z|13265|vconn|DBG|unix#50: sent (Success): OFPT_PORT_STATUS (OF1.5) (xid=0x0): MOD: 57(tap6d5eccfa-06): addr:fe:16:3e:dc:bc:b8
     config: 0
     state: LIVE
     current: 10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max

2023-05-19T10:48:42.971Z|13289|bridge|INFO|bridge br-int: deleted interface tap6d5eccfa-06 on port 57
2023-05-19T10:48:42.972Z|04671|poll_loop(revalidator62)|DBG|wakeup due to [POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.972Z|04651|poll_loop(revalidator63)|DBG|wakeup due to [POLLIN] on fd 45 (FIFO pipe:[154648]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.975Z|04672|poll_loop(revalidator62)|DBG|wakeup due to [POLLIN] on fd 43 (FIFO pipe:[154647]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.975Z|13290|dpif|DBG|system@ovs-system: failed to query port tap6d5eccfa-06: No such device
2023-05-19T10:48:42.982Z|07648|poll_loop(revalidator61)|DBG|wakeup due to [POLLIN] on fd 41 (FIFO pipe:[155716]) at ../lib/ovs-thread.c:378 (10% CPU usage)
2023-05-19T10:48:42.984Z|04719|poll_loop(revalidator65)|DBG|wakeup due to [POLLIN] on fd 49 (FIFO pipe:[149813]) at ../lib/ovs-thread.c:378 (0% CPU usage)
2023-05-19T10:48:42.984Z|04589|poll_loop(revalidator64)|DBG|wakeup due to [POLLIN] on fd 47 (FIFO pipe:[154649]) at ../lib/ovs-thread.c:378 (1% CPU usage)
2023-05-19T10:48:42.985Z|13291|poll_loop|DBG|wakeup due to [POLLIN] on fd 56 (FIFO pipe:[136293]) at ../lib/ovs-rcu.c:236 (2% CPU usage)
2023-05-19T10:48:42.999Z|13292|poll_loop|DBG|wakeup due to [POLLIN] on fd 56 (FIFO pipe:[136293]) at ../lib/ovs-rcu.c:236 (2% CPU usage)
2023-05-19T10:48:42.999Z|13293|dpif_netlink|DBG|port_changed: dpif:system@ovs-system vport:tap6d5eccfa-06 cmd:2
2023-05-19T10:48:42.999Z|13294|dpif|DBG|system@ovs-system: failed to query port tap6d5eccfa-06: No such device

3. nova-compute log:
2023-05-19 18:48:42.416 7 INFO os_vif [None req-6f1ad586-b66d-4fb6-8603-ba96af2af311 bb05576c972b49b4ac598cb33cf151f2 af89a2f86b204b54af86df7e5bb6a841 - - default default] Successfully plugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:dc:bc:b8,bridge_name='br-int',has_traffic_filtering=True,id=6d5eccfa-069e-4058-a1c8-87bec9c1c280,network=Network(19771f61-1003-46a3-8850-001b32947f66),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap6d5eccfa-06')

2023-05-19 18:58:45.827 7 WARNING nova.compute.manager [None req-6f1ad586-b66d-4fb6-8603-ba96af2af311 bb05576c972b49b4ac598cb33cf151f2 af89a2f86b204b54af86df7e5bb6a841 - - default default] [instance: 1cfbb245-c807-48e9-90eb-2c4f77e7a7b3] Timeout waiting for ['network-vif-plugged-6d5eccfa-069e-4058-a1c8-87bec9c1c280'] for instance with vm_state building and task_state spawning. Event states are: network-vif-plugged-6d5eccfa-069e-4058-a1c8-87bec9c1c280: timed out after 600.00 seconds: eventlet.timeout.Timeout: 600 seconds

It seems that ovs successfully add tap device(tap6d5eccfa-06) in br-int, then quickly delete the device.

In Nova, it wait for network-vif-plugged event, but tap device not present in br-int bridge, so create instance failed after 600 secs(we modify the default timeout 300s).