Failure to launch neutron-openvswitch-agent after tripleo deployment

Bug #1649124 reported by David Hill
This bug affects 2 people
Affects: neutron
Status: Invalid
Importance: High
Assigned to: David Hill

Bug Description

Failure to launch neutron-openvswitch-agent after a TripleO deployment. I am able to start it manually and then start an overcloud deployment, which later fails, but that is another issue.

2016-12-11 14:15:41.228 8272 INFO neutron.common.config [-] Logging enabled!
2016-12-11 14:15:41.229 8272 INFO neutron.common.config [-] /usr/bin/neutron-openvswitch-agent version 10.0.0.0b2.dev216
2016-12-11 14:15:41.234 8272 WARNING oslo_config.cfg [-] Option "rabbit_hosts" from group "oslo_messaging_rabbit" is deprecated for removal. Its value may be silently ignored in the future.
2016-12-11 14:15:41.235 8272 WARNING oslo_config.cfg [-] Option "rabbit_password" from group "oslo_messaging_rabbit" is deprecated for removal. Its value may be silently ignored in the future.
2016-12-11 14:15:41.235 8272 WARNING oslo_config.cfg [-] Option "rabbit_userid" from group "oslo_messaging_rabbit" is deprecated for removal. Its value may be silently ignored in the future.
2016-12-11 14:15:41.239 8272 INFO ryu.base.app_manager [-] loading app neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp
2016-12-11 14:15:41.553 8272 INFO ryu.base.app_manager [-] loading app ryu.app.ofctl.service
2016-12-11 14:15:41.555 8272 INFO ryu.base.app_manager [-] loading app ryu.controller.ofp_handler
2016-12-11 14:15:41.556 8272 INFO ryu.base.app_manager [-] instantiating app neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp of OVSNeutronAgentRyuApp
2016-12-11 14:15:41.557 8272 INFO ryu.base.app_manager [-] instantiating app ryu.controller.ofp_handler of OFPHandler
2016-12-11 14:15:41.557 8272 INFO ryu.base.app_manager [-] instantiating app ryu.app.ofctl.service of OfctlService
2016-12-11 14:15:41.583 8272 WARNING neutron.agent.ovsdb.native.vlog [-] tcp:127.0.0.1:6640: send error: Connection refused
2016-12-11 14:15:41.747 8272 INFO oslo_rootwrap.client [-] Spawned new rootwrap daemon process with pid=8418
2016-12-11 14:15:51.779 8272 ERROR neutron.agent.ovsdb.impl_vsctl [-] Unable to execute ['ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--id=@manager', 'create', 'Manager', 'target="ptcp:6640:127.0.0.1"', '--', 'add', 'Open_vSwitch', '.', 'manager_options', '@manager']. Exception: Exit code: -14; Stdin: ; Stdout: ; Stderr: 2016-12-11T19:15:51Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)

2016-12-11 14:15:51.849 8272 INFO neutron.agent.ovsdb.native.vlog [-] tcp:127.0.0.1:6640: connecting...
2016-12-11 14:15:51.849 8272 INFO neutron.agent.ovsdb.native.vlog [-] tcp:127.0.0.1:6640: connected
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp [-] Agent main thread died of an exception
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp Traceback (most recent call last):
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 42, in agent_main_wrapper
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp ovs_agent.main(bridge_classes)
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2149, in main
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp agent = OVSNeutronAgent(bridge_classes, cfg.CONF)
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 169, in __init__
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp self.setup_integration_br()
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in wrapper
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp return f(*args, **kwargs)
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 977, in setup_integration_br
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp self.int_br.create()
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/agent/common/ovs_lib.py", line 220, in create
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp FAILMODE_SECURE))
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/api.py", line 76, in __exit__
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp self.result = self.commit()
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py", line 73, in commit
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp 'timeout': self.timeout})
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp TimeoutException: Commands [AddBridgeCommand(datapath_type=system, may_exist=True, name=br-int)] exceeded timeout 10 seconds
2016-12-11 14:16:01.881 8272 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp
2016-12-11 14:16:01.885 8272 ERROR neutron.agent.ovsdb.impl_idl [-] Post-commit checks failed
2016-12-11 14:16:01.885 8272 ERROR neutron.agent.ovsdb.impl_idl Traceback (most recent call last):
2016-12-11 14:16:01.885 8272 ERROR neutron.agent.ovsdb.impl_idl File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py", line 152, in post_commit
2016-12-11 14:16:01.885 8272 ERROR neutron.agent.ovsdb.impl_idl self.do_post_commit(txn)
2016-12-11 14:16:01.885 8272 ERROR neutron.agent.ovsdb.impl_idl File "/usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py", line 174, in do_post_commit
2016-12-11 14:16:01.885 8272 ERROR neutron.agent.ovsdb.impl_idl 'timeout': self.timeout})
2016-12-11 14:16:01.885 8272 ERROR neutron.agent.ovsdb.impl_idl TimeoutException: Commands [AddBridgeCommand(datapath_type=system, may_exist=True, name=br-int)] exceeded timeout 10 seconds post-commit
2016-12-11 14:16:01.885 8272 ERROR neutron.agent.ovsdb.impl_idl
2016-12-11 14:16:01.886 8272 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 545, in close
    self.uninstantiate(app_name)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 528, in uninstantiate
    app = self.applications.pop(name)
KeyError: 'ofctl_service'

2016-12-11 14:16:01.977 8272 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=8418

Revision history for this message
Emilien Macchi (emilienm) wrote :

This bug report is missing details, like versions and environment. So far our CI has been able to deploy the undercloud with the OVS agent successfully, so I'm not sure what the bug is here.

Changed in tripleo:
status: New → Invalid
Revision history for this message
David Hill (david-hill-ubisoft) wrote : Re: [Bug 1649124] Re: Failure to launch neutron-openvswitch-agent after tripleo deployment

The undercloud deploys fine, but as you can see, the neutron-openvswitch-agent dies on startup, and the overcloud won't deploy properly because of this. I'll attach the versions later today.

David Hill


Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I am now hitting this issue with Newton as well.

Changed in tripleo:
status: Invalid → New
Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Please note that the undercloud deploys fine, but this issue prevents Nova from finding a valid host on which to deploy bare metal, as shown in these Heat logs:

2016-12-13 18:15:08Z [overcloud.Compute.0.NovaCompute]: CREATE_IN_PROGRESS state changed
2016-12-13 18:15:14Z [overcloud.Controller.1.Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-13 18:15:14Z [overcloud.Controller.0.Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-13 18:15:14Z [overcloud.Controller.1]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-13 18:15:14Z [overcloud.Controller.0]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-13 18:15:14Z [overcloud.Controller.1]: CREATE_FAILED ResourceInError: resources[1].resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-13 18:15:15Z [overcloud.Controller.0]: CREATE_FAILED CREATE aborted
2016-12-13 18:15:15Z [overcloud.Controller.2]: CREATE_FAILED CREATE aborted
2016-12-13 18:15:15Z [overcloud.Controller]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources[1].resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-13 18:15:16Z [overcloud.Controller]: CREATE_FAILED ResourceInError: resources.Controller.resources[1].resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"
2016-12-13 18:15:17Z [overcloud.Compute]: CREATE_FAILED CREATE aborted
2016-12-13 18:15:17Z [overcloud]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Controller.resources[1].resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"

Please note that manually starting neutron-openvswitch-agent seems to fix this issue.

Revision history for this message
Ben Nemec (bnemec) wrote :

Interestingly, this error occurs in our CI runs as well: http://logs.openstack.org/65/409165/4/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/ebf8bf0/logs/undercloud/var/log/neutron/openvswitch-agent.txt.gz#_2016-12-13_16_29_24_665

However, it looks like ovs-agent gets restarted immediately after and then works fine.

I think to debug this we would need to see the neutron, nova, and ironic logs to see what interaction is causing the no valid host failures.
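
On a TripleO undercloud those logs usually live under /var/log/neutron, /var/log/nova and /var/log/ironic. A rough sketch of how one might pull the relevant pieces (the exact file names are assumptions and can vary by release):

# Neutron OVS agent startup errors
grep -i error /var/log/neutron/openvswitch-agent.log | tail -n 50

# Scheduling failures behind the "No valid host" messages
grep -i "no valid host" /var/log/nova/nova-conductor.log /var/log/nova/nova-scheduler.log

# Ironic conductor errors around the failed deploys
grep -i error /var/log/ironic/ironic-conductor.log | tail -n 50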

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

It seems like this is happening:

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26257 root 10 -10 416308 52272 9268 D 141.4 0.3 27:16.77 ovs-vswitchd

And in journalctl:
Dec 14 00:10:03 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04989|netlink_socket|ERR|Dropped 32820 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
Dec 14 00:10:03 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04990|netlink_socket|ERR|transaction error (Permission denied)
Dec 14 00:10:04 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04991|netlink_socket|ERR|Dropped 41365 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
Dec 14 00:10:04 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04992|netlink_socket|ERR|transaction error (Permission denied)
Dec 14 00:10:05 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04993|netlink_socket|ERR|Dropped 39747 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
Dec 14 00:10:05 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04994|netlink_socket|ERR|transaction error (Permission denied)
Dec 14 00:10:06 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04995|socket_util|ERR|Dropped 201303 log messages in last 12 seconds (most recently, 0 seconds ago) due to excessive rate
Dec 14 00:10:06 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04996|socket_util|ERR|getsockopt(SO_RCVBUF): Permission denied
Dec 14 00:10:06 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04997|netlink_socket|ERR|Dropped 30751 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
Dec 14 00:10:06 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04998|netlink_socket|ERR|transaction error (Permission denied)
Dec 14 00:10:07 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|04999|netlink_socket|ERR|Dropped 49549 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
Dec 14 00:10:07 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|05000|netlink_socket|ERR|transaction error (Permission denied)
Dec 14 00:10:08 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|05001|netlink_socket|ERR|Dropped 34087 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
Dec 14 00:10:08 undercloud-0-trunk.localdomain ovs-vswitchd[26257]: ovs|05002|netlink_socket|ERR|transaction error (Permission denied)

I'm not sure if it's related yet.
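
These repeated "Permission denied" netlink errors are consistent with SELinux denials (later comments confirm AVC records for ovs-vswitchd). A quick check, assuming auditd is running on the undercloud:

# Recent AVC denials involving ovs-vswitchd
ausearch -m avc -c ovs-vswitchd --start recent

# Fallback if ausearch is not installed
grep avc /var/log/audit/audit.log | grep ovs-vswitchd | tail

# Current SELinux mode
getenforce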

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I have the feeling that neutron-openvswitch-agent never starts properly because of this Open vSwitch issue.

I don't see the neutron-openvswitch-agent process running in systemctl:

[root@undercloud-0-trunk stack]# systemctl | grep neutron
  neutron-dhcp-agent.service loaded active running OpenStack Neutron DHCP Agent
  neutron-server.service loaded active running OpenStack Neutron Server
[root@undercloud-0-trunk stack]# systemctl status neutron-openvswitch-agent
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2016-12-14 00:14:46 EST; 13min ago
 Main PID: 9667 (code=exited, status=0/SUCCESS)

Dec 14 00:14:22 undercloud-0-trunk.localdomain neutron-enable-bridge-firewall.sh[9661]: net.bridge.bridge-nf-call-arptables = 1
Dec 14 00:14:22 undercloud-0-trunk.localdomain neutron-enable-bridge-firewall.sh[9661]: net.bridge.bridge-nf-call-iptables = 1
Dec 14 00:14:22 undercloud-0-trunk.localdomain systemd[1]: Started OpenStack Neutron Open vSwitch Agent.
Dec 14 00:14:22 undercloud-0-trunk.localdomain neutron-openvswitch-agent[9667]: Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future rele...rate reports.
Dec 14 00:14:23 undercloud-0-trunk.localdomain neutron-openvswitch-agent[9667]: Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
Dec 14 00:14:23 undercloud-0-trunk.localdomain neutron-openvswitch-agent[9667]: Option "rpc_backend" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
Dec 14 00:14:25 undercloud-0-trunk.localdomain neutron-openvswitch-agent[9667]: Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Dec 14 00:14:25 undercloud-0-trunk.localdomain neutron-openvswitch-agent[9667]: Could not load neutron.openstack.common.notifier.rpc_notifier
Dec 14 00:14:25 undercloud-0-trunk.localdomain sudo[9796]: neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
Dec 14 00:14:26 undercloud-0-trunk.localdomain ovs-vsctl[9804]: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --id=@manager create Manager "target=\"ptcp:6640:127.0.0.1\"" -- add O...ptions @manager
Hint: Some lines were ellipsized, use -l to show in full.

Restarting it seems to solve it:

[root@undercloud-0-trunk stack]# systemctl restart neutron-openvswitch-agent && systemctl status neutron-openvswitch-agent
● neutron-openvs...


Revision history for this message
David Hill (david-hill-ubisoft) wrote :

From the logs:

2017-01-03 19:37:50.184 23313 ERROR neutron.agent.ovsdb.impl_vsctl [-] Unable to execute ['ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--id=@manager', 'create', 'Manager', 'target="ptcp:6640:127.0.0.1"', '--', 'add', 'Open_vSwitch', '.', 'manager_options', '@manager']. Exception: Exit code: -14; Stdin: ; Stdout: ; Stderr: 2017-01-

If I try to run this manually it also fails:
[root@undercloud-0-newton neutron]# ovs-vsctl --timeout=10 --oneline --format=json -- --id=@manager create Manager target="ptcp:6640:127.0.0.1" -- add Open_vSwitch . manager_options @manager
ovs-vsctl: ptcp:6640:127.0.0.1: unexpected ":" parsing string

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I'm not sure if this is the cause of the problem, but if I escape the double quotes when running it like this:

ovs-vsctl --timeout=10 --oneline --format=json -- --id=@manager create Manager target=\"ptcp:6640:127.0.0.1\" -- add Open_vSwitch . manager_options @manager

I then get this error message:
ovs-vsctl: transaction error: {"details":"Transaction causes multiple rows in \"Manager\" table to have identical values (\"ptcp:6640:127.0.0.1\") for index on column \"target\". First row, with UUID 4d4e7ccb-48f5-48ab-a901-ec57584e71d3, existed in the database before this transaction and was not modified by the transaction. Second row, with UUID 774a7c64-442a-4f0a-9568-d0bc88dc1182, was inserted by this transaction.","error":"constraint violation"}
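
The constraint violation indicates that a Manager row with target "ptcp:6640:127.0.0.1" already exists in the local ovsdb. A way to confirm that before retrying (a diagnostic sketch, not part of the original commands):

# List the managers already configured in ovsdb
ovs-vsctl get-manager

# Inspect the full Manager table, including the pre-existing row
ovs-vsctl list Manager

# Remove all configured managers if the duplicate needs to be cleared (use with care)
ovs-vsctl del-manager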

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I've modified /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_vsctl.py so that the target is escaped, like this:

    def add_manager(self, connection_uri):
        # This will add a new manager without overriding existing ones.
        conn_uri = 'target=\"%s\"' % connection_uri
        args = ['create', 'Manager', conn_uri, '--', 'add', 'Open_vSwitch',
                '.', 'manager_options', '@manager']
        return BaseCommand(self.context, '--id=@manager', args=args)

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

That part was added between Mitaka and Newton:

diff --git a/neutron/agent/ovsdb/impl_vsctl.py b/neutron/agent/ovsdb/impl_vsctl.py
index 89df8df..51d19cc 100644
--- a/neutron/agent/ovsdb/impl_vsctl.py
+++ b/neutron/agent/ovsdb/impl_vsctl.py
@@ -163,6 +163,21 @@ class OvsdbVsctl(ovsdb.API):
     def transaction(self, check_error=False, log_errors=True, **kwargs):
         return Transaction(self.context, check_error, log_errors, **kwargs)

+    def add_manager(self, connection_uri):
+        # This will add a new manager without overriding existing ones.
+        conn_uri = 'target="%s"' % connection_uri
+        args = ['create', 'Manager', conn_uri, '--', 'add', 'Open_vSwitch',
+                '.', 'manager_options', '@manager']
+        return BaseCommand(self.context, '--id=@manager', args=args)
+
+    def get_manager(self):
+        return MultiLineCommand(self.context, 'get-manager')
+
+    def remove_manager(self, connection_uri):
+        args = ['get', 'Manager', connection_uri, '--', 'remove',
+                'Open_vSwitch', '.', 'manager_options', '@manager']
+        return BaseCommand(self.context, '--id=@manager', args=args)
+
     def add_br(self, name, may_exist=True, datapath_type=None):
         opts = ['--may-exist'] if may_exist else None
         params = [name]

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

It created the following Manager:

418411da-e402-4276-9b82-ada260184b4f
    Manager "ptcp:6640:127.0.0.1"
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
    Bridge br-ctlplane
        fail_mode: standalone
        Port "eth1"
            Interface "eth1"
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
    ovs_version: "2.5.0"

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

The remaining problem I have is caused by SELinux:

type=AVC msg=audit(1483659047.414:47052089): avc: denied { getopt } for pid=15904 comm="ovs-vswitchd" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=netlink_generic_socket permissive=0
type=AVC msg=audit(1483659047.414:47052090): avc: denied { getopt } for pid=15904 comm="ovs-vswitchd" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=netlink_generic_socket permissive=0
type=AVC msg=audit(1483659047.414:47052091): avc: denied { getopt } for pid=15904 comm="ovs-vswitchd" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=netlink_generic_socket permissive=0
type=AVC msg=audit(1483659047.414:47052092): avc: denied { getopt } for pid=15904 comm="ovs-vswitchd" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=netlink_generic_socket permissive=0
type=AVC msg=audit(1483659047.414:47052093): avc: denied { getopt } for pid=15904 comm="ovs-vswitchd" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=netlink_generic_socket permissive=0
type=AVC msg=audit(1483659047.416:47052094): avc: denied { getopt } for pid=15904 comm="ovs-vswitchd" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=netlink_generic_socket permissive=0
type=AVC msg=audit(1483659047.416:47052095): avc: denied { getopt } for pid=15904 comm="ovs-vswitchd" scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:system_r:openvswitch_t:s0 tclass=netlink_generic_socket permissive=0

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

I'm not sure whether we need to open a new bug for the SELinux permissions, so here is my temporary SELinux patch:

[root@undercloud-0-trunk devel]# cat fix.te

policy_module(fix,1.0.0)

require {
 type neutron_t;
 type nsfs_t;
 type openvswitch_t;
 class netlink_generic_socket { connect create getattr getopt setopt };
 class file { open read };
}

#============= neutron_t ==============
allow neutron_t nsfs_t:file { open read };
#
##============= openvswitch_t ==============
allow openvswitch_t self:netlink_generic_socket { connect create getattr getopt setopt };
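
For reference, a module like this is typically built and loaded with the SELinux policy development tooling; a sketch assuming the selinux-policy-devel package is installed (the Makefile path below is the usual RHEL/CentOS location):

# Build fix.pp from fix.te; the refpolicy Makefile is needed because the
# source uses the policy_module() macro
make -f /usr/share/selinux/devel/Makefile fix.pp

# Load the module into the running policy and verify it is installed
semodule -i fix.pp
semodule -l | grep fix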

Changed in tripleo:
importance: Undecided → High
status: New → Triaged
Revision history for this message
David Hill (david-hill-ubisoft) wrote :

So I have this remaining error to fix; I'm still investigating this one:

2017-01-05 19:04:58.020 3492 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [-] Bridge br-int has datapath-ID []
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp [-] Agent main thread died of an exception
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp Traceback (most recent call last):
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 42, in agent_main_wrapper
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp ovs_agent.main(bridge_classes)
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2149, in main
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp agent = OVSNeutronAgent(bridge_classes, cfg.CONF)
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 169, in __init__
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp self.setup_integration_br()
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in wrapper
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp return f(*args, **kwargs)
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 987, in setup_integration_br
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp self.int_br.setup_default_table()
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/br_int.py", line 43, in setup_default_table
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp self.install_normal()
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 160, in install_normal
2017-01-05 19:04:58.021 3492 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp (_dp, ofp, _ofpp) = self...


Revision history for this message
David Hill (david-hill-ubisoft) wrote :

2017-01-06T18:33:25.922Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-01-06T18:33:25.925Z|00002|ovs_numa|INFO|Discovered 4 CPU cores on NUMA node 0
2017-01-06T18:33:25.925Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes and 4 CPU cores
2017-01-06T18:33:25.925Z|00004|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-01-06T18:33:25.925Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-01-06T18:33:25.929Z|00006|netlink_socket|ERR|transaction error (Permission denied)
2017-01-06T18:33:25.929Z|00007|dpif_netlink|ERR|Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded.
2017-01-06T18:33:25.929Z|00008|dpif|WARN|failed to enumerate system datapaths: Permission denied
2017-01-06T18:33:25.929Z|00009|dpif|WARN|failed to create datapath ovs-system: Permission denied
2017-01-06T18:33:25.929Z|00010|ofproto_dpif|ERR|failed to open datapath of type system: Permission denied
2017-01-06T18:33:25.930Z|00011|ofproto|ERR|failed to open datapath br-int: Permission denied
2017-01-06T18:33:25.930Z|00012|bridge|ERR|failed to create bridge br-int: Permission denied
2017-01-06T18:33:25.930Z|00013|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.5.0
2017-01-06T18:33:29.934Z|00014|dpif|WARN|failed to enumerate system datapaths: Permission denied
2017-01-06T18:33:29.934Z|00015|dpif|WARN|failed to create datapath ovs-system: Permission denied
2017-01-06T18:33:29.934Z|00016|ofproto_dpif|ERR|failed to open datapath of type system: Permission denied
2017-01-06T18:33:29.934Z|00017|ofproto|ERR|failed to open datapath br-int: Permission denied
2017-01-06T18:33:29.934Z|00018|bridge|ERR|failed to create bridge br-int: Permission denied
2017-01-06T18:33:30.053Z|00019|dpif|WARN|failed to enumerate system datapaths: Permission denied
2017-01-06T18:33:30.053Z|00020|dpif|WARN|failed to create datapath ovs-system: Permission denied
2017-01-06T18:33:30.053Z|00021|ofproto_dpif|ERR|failed to open datapath of type system: Permission denied
2017-01-06T18:33:30.053Z|00022|ofproto|ERR|failed to open datapath br-int: Permission denied
2017-01-06T18:33:30.053Z|00023|bridge|ERR|failed to create bridge br-int: Permission denied

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

The patch on comment #12 fixes my issue with Newton ... I'll try reproducing this with Trunk again just in case I did something wrong.

Changed in tripleo:
assignee: nobody → David Hill (david-hill-ubisoft)
affects: tripleo → neutron
Changed in neutron:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/419164

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/newton)

Change abandoned by David Hill (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/419164
Reason: This commit will need to be merged in newton too if it ever gets attention.

Changed in neutron:
milestone: none → ocata-rc1
Revision history for this message
David Hill (david-hill-ubisoft) wrote :

We need that on Newton too.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by David Hill (<email address hidden>) on branch: master
Review: https://review.openstack.org/417641
Reason: The problem is due to selinux policies ...

Revision history for this message
David Hill (david-hill-ubisoft) wrote :

This solves the issue. It is simply an SELinux matter.

policy_module(fix,0.0.1)
require {
 type neutron_t;
 type nsfs_t;
 type openvswitch_t;
 class netlink_generic_socket { connect create getattr getopt setopt read write };
 class file { open read };
}
allow neutron_t nsfs_t:file { open read };
allow openvswitch_t self:netlink_generic_socket { connect create getattr getopt setopt read write };
neutron_systemctl(openvswitch_t)
hostname_exec(openvswitch_t)

Changed in neutron:
status: In Progress → Won't Fix
milestone: ocata-rc1 → none
Revision history for this message
David Hill (david-hill-ubisoft) wrote :

Why is it closed?

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

You said it yourself: it's an SELinux matter. What is there to track and solve here in neutron?

My bad, this should have been invalid.

Changed in neutron:
status: Won't Fix → Invalid
Revision history for this message
Christian Berendt (berendt) wrote :

I have a similar issue with neutron-openvswitch-agent communicating with an Open vSwitch on a XenServer, with SELinux disabled on both systems. The abandoned change does not solve the issue for me.

Revision history for this message
Christian Berendt (berendt) wrote :

But I think this is a different issue; it just looked similar to me:

2017-02-15 12:32:51.544 6 ERROR neutron.agent.ovsdb.impl_vsctl [-] Unable to execute ['ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--id=@manager', 'create', 'Manager', 'target="ptcp:6640:169.254.0.1"', '--', 'add', 'Open_vSwitch', '.', 'manager_options', '@manager']. Exception: Exit code: 1; Stdin: ; Stdout: ; Stderr: ovs-vsctl: transaction error: {"details":"Transaction causes multiple rows in \"Manager\" table to have identical values (\"ptcp:6640:169.254.0.1\") for index on column \"target\". First row, with UUID ddfaa272-f2e2-4b43-a057-f51bfbda5e8d, was inserted by this transaction. Second row, with UUID f02b0f45-1df5-4b26-939a-faf5fefb8e87, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}

I think the problem is that the Manager "ptcp:6640:169.254.0.1" already exists on the Open vSwitch on the XenServer, so it should not be created again.
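
If that is the case, one way to verify it (a sketch, not tested against XenServer) is to inspect the Manager table on the dom0 Open vSwitch before the agent starts, and reuse or replace the existing entry instead of letting a duplicate be created:

# On the host running the affected ovsdb-server (dom0 for XenServer)
ovs-vsctl get-manager
ovs-vsctl list Manager

# Replace the configured managers with a single target rather than adding
# a duplicate row (use with care)
ovs-vsctl set-manager ptcp:6640:169.254.0.1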
