[RFE] openvswitch-agent support rootwrap daemon when hypervisor is XenServer

Bug #1585510 reported by huan
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
neutron | Fix Released | Wishlist | Jianghua Wang |
oslo.rootwrap | Won't Fix | Undecided | Unassigned |

Bug Description

As titled, when XenServer is the hypervisor, we want to implement rootwrap daemon mode in the neutron-openvswitch-agent that runs on the compute node.

The neutron-openvswitch-agent that runs on the compute node (DomU) cannot currently support rootwrap daemon mode. XenServer separates Dom0 (the privileged domain) from DomU (the user domain), and the agent's br-int bridge resides in Dom0. As a result, all the ovs-vsctl/ovs-ofctl/iptables/ipset commands issued by the agent on the compute node must be executed in Dom0 rather than DomU, which differs from other hypervisors.

https://github.com/openstack/neutron/blob/master/bin/neutron-rootwrap-xen-dom0 is the current implementation, but it cannot support the rootwrap daemon.

We noticed that rootwrap adds significant performance overhead, and we want to implement rootwrap daemon mode for the XenServer hypervisor case to improve performance.
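
To illustrate the kind of overhead meant here, the following is a minimal sketch that compares starting a fresh Python interpreter for every command with calling a handler inside an already running process. It is not a measurement of neutron-rootwrap itself: the function names are hypothetical, and the spawned process does nothing except start and exit.

```python
import subprocess
import sys
import time


def time_per_call_spawn(n):
    """Average seconds per call when each call starts a new interpreter."""
    start = time.perf_counter()
    for _ in range(n):
        # Each iteration pays the full interpreter start-up cost, which is
        # what a per-command wrapper script has to pay as well.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / n


def time_per_call_inprocess(n):
    """Average seconds per call when a long-lived process handles the call."""
    def handler():
        return None

    start = time.perf_counter()
    for _ in range(n):
        handler()
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print("spawn per call:      %.6f s" % time_per_call_spawn(5))
    print("in-process per call: %.6f s" % time_per_call_inprocess(5))
```

The absolute numbers depend on the machine; the point is only that per-command start-up cost is paid on every call in non-daemon mode.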

Also, we discovered that calls to netwrap (and the creation of many sessions) generate a huge volume of logging in dom0. Logrotate can handle those logs, but the very frequent rotations make diagnosing issues very difficult.

It also seems that the excessive logging makes the host **very** slow when downloading an image from glance, due to disk contention (in iostat, %iowait is above 60% most of the time and sometimes reaches 90%).

So the current implementation is not stable or robust enough for a production OpenStack environment.

Proposal: subclass and override some classes/functions from oslo.rootwrap to achieve the goal. I have already done a POC, which works well.

Tags: rfe
Changed in neutron:
importance: Undecided → Wishlist
tags: added: rfe
Changed in neutron:
status: New → Confirmed
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Can you share the POC? Are you modifying oslo.rootwrap or the neutron ovs agent? AFAIK XenServer has not switched to 2.7 yet.

Changed in neutron:
status: Confirmed → Incomplete
Revision history for this message
huan (huan-xie) wrote :

Sure, I can share the POC code. I didn't modify oslo.rootwrap; I just added new code to the ovs agent.
POC: https://review.openstack.org/#/c/321415/
I have tested with the POC code, and subclassing from oslo.rootwrap works well at the moment.
For further official code and verification, we can use the XenServer neutron CI. We have deployed it and it is under internal testing; we will make it public soon.

Revision history for this message
huan (huan-xie) wrote :

Updated the description to add some issues we found when running without the netwrap daemon for the XenServer ovs-agent on the compute node.

Calls to netwrap (and the creation of many sessions) generate a huge volume of logging in dom0. Logrotate can handle those logs, but the very frequent rotations make diagnosing issues very difficult.
It also seems that the excessive logging makes the host **very** slow when downloading an image from glance, due to disk contention (in iostat, %iowait is above 60% most of the time and sometimes reaches 90%).

description: updated
huan (huan-xie)
Changed in neutron:
assignee: nobody → huan (huan-xie)
Revision history for this message
huan (huan-xie) wrote :

We also noticed that neutron + XenServer suffers a lot from changes in the openstack/requirements repo. For example, there was recently a ryu version change: https://github.com/openstack/requirements/commit/ebad995f850d7fd69e89ce220300443ac70f2eed
But neutron's requirements.txt has "ryu>=4.4 # Apache-2.0", which made our neutron CI suddenly fail with this error:

http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/55/341255/4/silent/dsvm-tempest-neutron-network/ce9f772/logs/screen-q-agt.txt.gz
2016-07-14 05:57:05.587 10810 DEBUG neutron.agent.linux.utils [-] Running command: ['/usr/local/bin/neutron-rootwrap-xen-dom0', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--columns=ofport', 'list', 'Interface', 'xapi4'] create_process /opt/stack/new/neutron/neutron/agent/linux/utils.py:83
2016-07-14 05:57:06.387 10810 ERROR neutron.agent.ovsdb.impl_vsctl [-] Unable to execute ['ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--columns=ofport', 'list', 'Interface', 'xapi4']. Exception: Exit code: 1; Stdin: ; Stdout: ; Stderr: Traceback (most recent call last):
  File "/usr/local/bin/neutron-rootwrap-xen-dom0", line 4, in <module>
    __import__('pkg_resources').require('neutron==9.0.0.0b3.dev8')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2927, in <module>
    @_call_aside
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2913, in _call_aside
    f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2940, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 637, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 650, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 829, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'ryu>=4.4' distribution was not found and is required by neutron

So if we can get daemon mode with XenServer as the hypervisor, we can avoid this problem, since we would no longer call /usr/local/bin/neutron-rootwrap-xen-dom0 directly.

Also, as mentioned in the comments above, we would get performance improvements.

Changed in neutron:
assignee: huan (huan-xie) → Jianghua Wang (wjh-fresh)
Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

As huan is working on another assignment, I will continue this work.

After discussion with huan and Bob, there is another idea. Since we run commands in dom0 through XenAPI calls, we effectively already have a daemon (the xapi daemon in dom0), so we don't need to use oslo.rootwrap. What we need is a client that holds a persistent session. So it would be:
1. The neutron agent imports the new client module;
2. Whenever it needs to run a command in dom0, it gets the client (a single instance per service, holding a persistent xapi session) and calls the plugin that runs commands in dom0.

It would be very similar to oslo.rootwrap, except that in oslo.rootwrap both the client and the daemon run in the same compute VM, and oslo.rootwrap uses a named socket to exchange "commands" between the client and the daemon.

Here is the prototype code, which seems to work as expected.
https://review.openstack.org/#/c/390931/
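
Independently of the linked prototype, the two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name Dom0CommandClient, the call_plugin signature, and the plugin function name "run_command" are hypothetical, and FakeSession stands in for a real xapi session so the example runs without a XenServer host.

```python
import threading


class FakeSession:
    """Stand-in for a real xapi session (assumption, for illustration)."""

    created = 0  # counts how many sessions were opened

    def __init__(self):
        FakeSession.created += 1

    def call_plugin(self, plugin, fn, args):
        # A real session would ask dom0 to run the command via a plugin.
        return "ran %s via %s.%s" % (" ".join(args["cmd"]), plugin, fn)


class Dom0CommandClient:
    """Hypothetical per-service client that reuses a single session."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self, session_factory):
        self._session_factory = session_factory
        self._session = None

    @classmethod
    def get(cls, session_factory):
        # One client per service process, created lazily and thread-safely.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(session_factory)
            return cls._instance

    def execute(self, cmd):
        # Open the session on first use, then keep reusing it.
        if self._session is None:
            self._session = self._session_factory()
        return self._session.call_plugin("netwrap", "run_command",
                                         {"cmd": cmd})


if __name__ == "__main__":
    client = Dom0CommandClient.get(FakeSession)
    print(client.execute(["ovs-vsctl", "list-br"]))
    print(client.execute(["ovs-ofctl", "dump-flows", "br-int"]))
    print("sessions opened:", FakeSession.created)
```

Because the session is created once and kept, repeated commands do not create new sessions, which is the property the description hopes will reduce the dom0 logging.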

Any comments are welcome.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

As I stated in the PoC patch, I don't believe neutron is the right place to maintain the code. Instead, oslo.rootwrap should get support for dom0 hypervisors.

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

We may get back to the bug in neutron scope if and when the solution implemented in oslo scope requires some small integration bits in neutron, like the ones we have for the neutron-rootwrap wrapper in setup.cfg.

Changed in neutron:
status: Incomplete → Opinion
Revision history for this message
huan (huan-xie) wrote :

Thanks Ihar!

Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

Thanks Ihar for the comments. I will follow up on the discussion in the existing ML thread.

Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

Per the discussion on the dev ML, there were 3 options for this issue, and we finally reached agreement to go with option 1).
1) Get Neutron to call XenAPI directly rather than trying to use a daemon - the session management would move from neutron-rootwrap-xen-dom0 into xen_rootwrap_client.py (perhaps this could be better named)
2) Get Neutron to call a local rootwrap daemon (as per the current implementation) which maintains a pool of connections and can efficiently call through to XenAPI
3) Extend oslo.rootwrap (and I presume also privsep) to know that some commands can run in different places, and put the logic for connecting to those different places in there.

See the mail thread:
http://lists.openstack.org/pipermail/openstack-dev/2016-November/106726.html

Revision history for this message
Jianghua Wang (wjh-fresh) wrote :

The code change is ready for review.
https://review.openstack.org/#/c/390931

It also passed the CI test with the change enabled.

Please help check whether it's OK to proceed. Thanks.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Huan Xie (<email address hidden>) on branch: master
Review: https://review.openstack.org/321415
Reason: Abandon this to use the other implementation https://review.openstack.org/#/c/390931/

Changed in neutron:
status: Opinion → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/390931
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8047da17db2d4d50f797d99880f3b76d0eba2084
Submitter: Jenkins
Branch: master

commit 8047da17db2d4d50f797d99880f3b76d0eba2084
Author: Jianghua Wang <email address hidden>
Date: Thu Oct 27 00:43:11 2016 +0800

    XenAPI: Support daemon mode for rootwrap

    For Neutron's compute agent in a XenServer's compute node, the commands
    actually need run in Dom0. Currently XenServer only supports rootwrap
    for that purpose by invoking a script which invokes XenAPI to execute
    commands in dom0. There are much performance overhead due to it requires
    parsing on the script and the configuration file every time running
    commands.

    This change is to support daemon mode with which each agent service will
    call XenAPI directly to execute commands in dom0. And it will keep the
    single XenAPI session.

    DocImpact: Need update the following configuration.

    file: /etc/neutron/plugins/ml2/openvswitch_agent.ini
    [agent]
    root_helper_daemon = xenapi_root_helper
    [xenapi]
    connection_url = http://169.254.0.1
    connection_username = root
    connection_password = xenroot

    Closes-Bug: #1585510
    Change-Id: I684034359fe0571bc92dbcf342a9821553b1da35
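
As a rough sketch of how the configuration above could be consumed, the snippet below reads the same ini fragment and decides which helper path to take. It uses Python's standard configparser purely for illustration (the agent itself is configured through its own option handling), and select_helper and the returned tuples are hypothetical names, not neutron's API.

```python
import configparser

# The fragment from the commit message, used here as sample input.
SAMPLE = """
[agent]
root_helper_daemon = xenapi_root_helper
[xenapi]
connection_url = http://169.254.0.1
connection_username = root
connection_password = xenroot
"""

# Value of root_helper_daemon that selects the XenAPI path.
XENAPI_MAGIC = "xenapi_root_helper"


def select_helper(ini_text):
    """Return a tuple describing which helper the settings select."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    daemon = parser.get("agent", "root_helper_daemon", fallback="")
    if daemon == XENAPI_MAGIC:
        # XenAPI mode: connection details come from the [xenapi] section.
        xen = parser["xenapi"]
        return ("xenapi", xen["connection_url"], xen["connection_username"])
    # Any other value is treated as a rootwrap daemon command line.
    return ("rootwrap-daemon", daemon, None)


if __name__ == "__main__":
    print(select_helper(SAMPLE))
```

With the sample fragment this selects the XenAPI path; any other root_helper_daemon value falls through to the regular rootwrap daemon branch.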

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.0.0b3

This issue was fixed in the openstack/neutron 10.0.0.0b3 development milestone.

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
Revision history for this message
Ben Nemec (bnemec) wrote :

Looks like this was fixed in neutron. Let me know if there's anything left to be done on the oslo side. Thanks.

Changed in oslo.rootwrap:
status: New → Won't Fix