DHCP failed for instances

Bug #1567773 reported by Li Ma
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
DragonFlow
Fix Released
High
Li Ma

Bug Description

This is discovered in the latest dragonflow codes (04-10-2016). There are no errors reported. But in controller log, there are no DHCP app logs reported. It is really weird. No DHCP offer and No DHCP ack.

After I created an instance connected to a private network:

stack@nick-dev:~/dragonflow$ sudo ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1433.614s, table=9, n_packets=0, n_bytes=0, idle_age=1433, priority=100,arp actions=resubmit(,10)
 cookie=0x0, duration=1433.613s, table=9, n_packets=0, n_bytes=0, idle_age=1433, priority=100,udp,dl_dst=ff:ff:ff:ff:ff:ff,tp_src=68,tp_dst=67 actions=resubmit(,11)
 cookie=0x0, duration=1433.615s, table=9, n_packets=0, n_bytes=0, idle_age=1433, priority=1 actions=resubmit(,17)
 cookie=0x0, duration=1433.614s, table=10, n_packets=0, n_bytes=0, idle_age=1433, priority=1 actions=resubmit(,17)
 cookie=0x0, duration=1433.612s, table=11, n_packets=0, n_bytes=0, idle_age=1433, priority=1 actions=resubmit(,17)
 cookie=0x0, duration=1433.613s, table=20, n_packets=0, n_bytes=0, idle_age=1433, priority=1 actions=resubmit(,64)
stack@nick-dev:~/dragonflow$

Revision history for this message
Li Ma (nick-ma-z) wrote :

Q_ENABLE_DRAGONFLOW_LOCAL_CONTROLLER=True
DF_SELECTIVE_TOPO_DIST=True
DF_PUB_SUB=True

DATABASE_PASSWORD=password
RABBIT_PASSWORD=password
SERVICE_PASSWORD=password
SERVICE_TOKEN=password
ADMIN_PASSWORD=password

enable_plugin dragonflow http://git.openstack.org/openstack/dragonflow
enable_service df-etcd
enable_service df-etcd-server
enable_service df-controller
enable_service df-ext-services
enable_service df-zmq-publisher-service

disable_service n-net
enable_service q-svc
enable_service df-l3-agent
#enable_service q-meta
disable_service q-dhcp
disable_service heat
disable_service tempest
enable_service cinder

# We have to disable the neutron L2 agent. DF does not use the L2 agent.
disable_service q-agt

[[post-config|$NEUTRON_CONF]]
[quotas]
quota_port = -1
quota_network = -1
quota_subnet = -1
[DEFAULT]
router_distributed=True

Changed in dragonflow:
importance: Undecided → High
Revision history for this message
Li Ma (nick-ma-z) wrote :
Download full text (3.7 KiB)

When enable selective distribution with PUB/SUB, the logged message is

2016-04-08 14:18:31.501 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:create, Table:ovsinterface, Key:61176198-5e89-4bcf-a671-17f6ce0d5397 Value:uuid:61176198-5e89-4bcf-a671-17f6ce0d5397, ofport:-1, name:tap86f8d260-37, admin_state:, type:vm, iface_id:86f8d260-375e-4549-8f81-ad4c9c1fd2a2, peer:, attached_mac:fa:16:3e:f0:c8:ac, remote_ip:, tunnel_type: Topic:None
2016-04-08 14:18:31.506 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:update, Table:ovsinterface, Key:61176198-5e89-4bcf-a671-17f6ce0d5397 Value:uuid:61176198-5e89-4bcf-a671-17f6ce0d5397, ofport:14, name:tap86f8d260-37, admin_state:down, type:vm, iface_id:86f8d260-375e-4549-8f81-ad4c9c1fd2a2, peer:, attached_mac:fa:16:3e:f0:c8:ac, remote_ip:, tunnel_type: Topic:None
2016-04-08 14:18:31.508 INFO dragonflow.controller.ryu_base_app [-] port added 14
2016-04-08 14:18:31.509 INFO dragonflow.controller.ryu_base_app [-] port modified 14
2016-04-08 14:18:31.510 INFO dragonflow.controller.topology [-] Ovs port updated: uuid:61176198-5e89-4bcf-a671-17f6ce0d5397, ofport:-1, name:tap86f8d260-37, admin_state:, type:vm, iface_id:86f8d260-375e-4549-8f81-ad4c9c1fd2a2, peer:, attached_mac:fa:16:3e:f0:c8:ac, remote_ip:, tunnel_type:
2016-04-08 14:18:31.515 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:update, Table:ovsinterface, Key:61176198-5e89-4bcf-a671-17f6ce0d5397 Value:uuid:61176198-5e89-4bcf-a671-17f6ce0d5397, ofport:14, name:tap86f8d260-37, admin_state:up, type:vm, iface_id:86f8d260-375e-4549-8f81-ad4c9c1fd2a2, peer:, attached_mac:fa:16:3e:f0:c8:ac, remote_ip:, tunnel_type: Topic:None

When disable it (by reverting the commit), the logged message is

2016-04-08 14:05:41.578 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:create, Table:ovsinterface, Key:535f4785-dffc-4417-a4d9-4561c6899b07 Value:uuid:535f4785-dffc-4417-a4d9-4561c6899b07, ofport:-1, name:tap02501dca-17, admin_state:, type:vm, iface_id:02501dca-171b-4332-9136-91ea45a9e6f4, peer:, attached_mac:fa:16:3e:4a:58:68, remote_ip:, tunnel_type: Topic:D
2016-04-08 14:05:41.586 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:update, Table:ovsinterface, Key:535f4785-dffc-4417-a4d9-4561c6899b07 Value:uuid:535f4785-dffc-4417-a4d9-4561c6899b07, ofport:13, name:tap02501dca-17, admin_state:down, type:vm, iface_id:02501dca-171b-4332-9136-91ea45a9e6f4, peer:, attached_mac:fa:16:3e:4a:58:68, remote_ip:, tunnel_type: Topic:D
2016-04-08 14:05:41.588 INFO dragonflow.controller.ryu_base_app [-] port added 13
2016-04-08 14:05:41.592 INFO dragonflow.controller.dhcp_app [-] Regiter VM as DHCP client::port <02501dca-171b-4332-9136-91ea45a9e6f4>
2016-04-08 14:05:41.593 INFO dragonflow.controller.ryu_base_app [-] port modified 13
2016-04-08 14:05:41.594 INFO dragonflow.controller.topology [-] Ovs port updated: uuid:535f4785-dffc-4417-a4d9-4561c6899b07, ofport:-1, name:tap02501dca-17, admin_state:, type:vm, iface_id:02501dca-171b-4332-9136-91ea45a9e6f4, peer:, attached_mac:fa:16:3e:4a:58:68, remote_ip:, tunnel_type:
2016-04-08 14:05:41.604 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:update, Table:o...

Read more...

Revision history for this message
Li Ma (nick-ma-z) wrote :

When I enable selective distribution for PUB/SUB again, it works. And I tried several times, I cannot reproduce it. I quit. :-(

Changed in dragonflow:
importance: High → Undecided
Revision history for this message
Li Ma (nick-ma-z) wrote :

I guess it has nothing to do with selective distribution but has something related with PUB/SUB. Maybe subscriber lose the connection to publisher at the time.

Li Ma (nick-ma-z)
Changed in dragonflow:
assignee: nobody → Li Ma (nick-ma-z)
Revision history for this message
Li Ma (nick-ma-z) wrote :

Seems ovs port update event loss.

Changed in dragonflow:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to dragonflow (master)

Fix proposed to branch: master
Review: https://review.openstack.org/303284

Changed in dragonflow:
status: New → In Progress
Li Ma (nick-ma-z)
description: updated
Li Ma (nick-ma-z)
description: updated
Revision history for this message
Li Ma (nick-ma-z) wrote :
Download full text (4.1 KiB)

The logs according to the described problem during the VM creation:

2016-04-10 16:27:45.233 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:create, Table:ovsinterface, Key:c594bfc3-2cfe-4ce6-a60d-834df7e2da38 Value:uuid:c594bfc3-2cfe-4ce6-a60d-834df7e2da38, ofport:-1, name:tap87e9d127-a3, admin_state:, type:vm, iface_id:87e9d127-a395-410d-87a8-cad4e3a413cd, peer:, attached_mac:fa:16:3e:1c:88:92, remote_ip:, tunnel_type: Topic:None
2016-04-10 16:27:45.237 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:update, Table:ovsinterface, Key:c594bfc3-2cfe-4ce6-a60d-834df7e2da38 Value:uuid:c594bfc3-2cfe-4ce6-a60d-834df7e2da38, ofport:7, name:tap87e9d127-a3, admin_state:down, type:vm, iface_id:87e9d127-a395-410d-87a8-cad4e3a413cd, peer:, attached_mac:fa:16:3e:1c:88:92, remote_ip:, tunnel_type: Topic:None
2016-04-10 16:27:45.243 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:update, Table:ovsinterface, Key:c594bfc3-2cfe-4ce6-a60d-834df7e2da38 Value:uuid:c594bfc3-2cfe-4ce6-a60d-834df7e2da38, ofport:7, name:tap87e9d127-a3, admin_state:up, type:vm, iface_id:87e9d127-a395-410d-87a8-cad4e3a413cd, peer:, attached_mac:fa:16:3e:1c:88:92, remote_ip:, tunnel_type: Topic:None
2016-04-10 16:27:45.244 INFO dragonflow.controller.ryu_base_app [-] port added 7
2016-04-10 16:27:45.244 INFO dragonflow.controller.ryu_base_app [-] port modified 7
2016-04-10 16:27:45.245 INFO dragonflow.controller.topology [-] Ovs port updated: uuid:c594bfc3-2cfe-4ce6-a60d-834df7e2da38, ofport:-1, name:tap87e9d127-a3, admin_state:, type:vm, iface_id:87e9d127-a395-410d-87a8-cad4e3a413cd, peer:, attached_mac:fa:16:3e:1c:88:92, remote_ip:, tunnel_type:

You can see that there is only one ovs port updated message during VM creation:

2016-04-10 16:27:45.245 INFO dragonflow.controller.topology [-] Ovs port updated: uuid:c594bfc3-2cfe-4ce6-a60d-834df7e2da38, ofport:-1, name:tap87e9d127-a3, admin_state:, type:vm, iface_id:87e9d127-a395-410d-87a8-cad4e3a413cd, peer:, attached_mac:fa:16:3e:1c:88:92, remote_ip:, tunnel_type:

This message happened when a VM port is being created, and its ofport = -1. At this time, the port is ignored. See the code below:
https://github.com/openstack/dragonflow/blob/master/dragonflow/controller/topology.py#L63

However, in the log, you can see that there's one important message that wasn't processed:

2016-04-10 16:27:45.243 INFO dragonflow.db.api_nb [-] Pushing Update to Queue: Action:update, Table:ovsinterface, Key:c594bfc3-2cfe-4ce6-a60d-834df7e2da38 Value:uuid:c594bfc3-2cfe-4ce6-a60d-834df7e2da38, ofport:7, name:tap87e9d127-a3, admin_state:up, type:vm, iface_id:87e9d127-a395-410d-87a8-cad4e3a413cd, peer:, attached_mac:fa:16:3e:1c:88:92, remote_ip:, tunnel_type: Topic:None

This message is an update message that indicates that the VM port is ready. The admin_state is up while the ofport number is assigned. The problem is here. It is not respected.

We can assume that if it is respected:

There should be a funcitonal call from distributed topology: ovs_port_updated with the right ofport number. As a result, it will call _vm_port_updated.
https://github.com/openstack/dragonflow/blob/master/dragonflow/controller/topology.py#L...

Read more...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to dragonflow (master)

Reviewed: https://review.openstack.org/303284
Committed: https://git.openstack.org/cgit/openstack/dragonflow/commit/?id=6e1583cc0c048f4487fe5ba71ed48367d7a0ebd9
Submitter: Jenkins
Branch: master

commit 6e1583cc0c048f4487fe5ba71ed48367d7a0ebd9
Author: Li Ma <email address hidden>
Date: Fri Apr 8 16:11:07 2016 +0800

    Fix port update event loss

    When the port update event is respected, the logical port is successfully
    configured by local controller.

    This patch also fixes some vague logs.

    A detailed analysis is in the bug thread.

    Closes-Bug: #1567773
    Change-Id: If7f2515e21d503b8683a906c4285a50e9fadd263

Changed in dragonflow:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.