database connection failed (Protocol error)

Bug #1405588 reported by Xiang Hui
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
neutron-openvswitch (Juju Charms Collection) | Invalid | Critical | Hua Zhang |
nova-compute (Juju Charms Collection) | Invalid | Critical | Unassigned |
quantum-gateway (Juju Charms Collection) | Invalid | Critical | Unassigned |

Bug Description

Versions:
root@juju-xh-machine-5:~# dpkg -l|grep openvswitch
ii openvswitch-common 2.0.2-0ubuntu0.14.04.1 amd64 Open vSwitch common components
ii openvswitch-switch 2.0.2-0ubuntu0.14.04.1 amd64 Open vSwitch switch implementations

root@juju-xh-machine-5:~# uname -a
Linux juju-precise-machine-5 3.13.0-43-generic #72-Ubuntu SMP Mon Dec 8 19:35:06 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

root@juju-xh-machine-5:~# cat /etc/issue
Ubuntu 14.04.1 LTS \n \l

root@juju-precise-machine-5:/var/log/neutron# dpkg -l|grep neutron
ii neutron-common 1:2014.1.3-0ubuntu1.1 all Neutron is a virtual network service for Openstack - common
ii neutron-plugin-ml2 1:2014.1.3-0ubuntu1.1 all Neutron is a virtual network service for Openstack - ML2 plugin
ii neutron-plugin-openvswitch-agent 1:2014.1.3-0ubuntu1.1 all Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
ii python-neutron 1:2014.1.3-0ubuntu1.1 all Neutron is a virutal network service for Openstack - Python

Errors:
root@juju-xh-machine-5:~# ovs-vsctl show
2014-12-25T09:26:56Z|00001|reconnect|WARN|unix:/var/run/openvswitch/db.sock: connection attempt failed (Protocol error)
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)

root@juju-precise-machine-5:~# tail -f /var/log/neutron/openvswitch-agent.log
2014-12-25 09:37:03.990 20991 ERROR neutron.agent.linux.ovsdb_monitor [-] Error received from ovsdb monitor: ovsdb-client: unix:/var/run/openvswitch/db.sock: receive failed (End of file)
2014-12-25 09:44:53.796 20991 ERROR neutron.agent.linux.ovsdb_monitor [-] Error received from ovsdb monitor: 2014-12-25T09:44:53Z|00001|fatal_signal|WARN|terminating with signal 15 (Terminated)

Workaround:
restart openvswitch-switch

However, ovs-vsctl hits this error again every few minutes, so neutron-plugin-openvswitch-agent keeps reporting errors because ovs-vsctl cannot reach the database.

The root cause is:
  The Neutron agent runs an ovs monitor to get interface information, using the command 'sudo neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport --format=json'. Because sudo prints the error output 'sudo: unable to resolve host juju-precise-machine-5', the ovs monitor process is respawned every 'respawn_interval'. Each respawn reconnects to the ovs db socket, but some of the old connections are not released properly. After enough respawns the number of sockets reaches the maximum limit, and connecting to the ovsdb server fails.

Currently, one possible fix in the openvswitch charm code is to add an /etc/hosts entry that maps the hostname to 127.0.0.1 (localhost), which avoids the error 'sudo: unable to resolve host juju-precise-machine-5'.
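
As a rough illustration of that fix, here is a minimal Python sketch of an idempotent /etc/hosts update. It is not the charm's actual code: the function names ensure_hostname_entry and update_hosts_file, and the choice of 127.0.0.1 as the default address, are assumptions made for this example.

```python
import socket


def ensure_hostname_entry(hosts_text, hostname, address="127.0.0.1"):
    """Return hosts_text with a line mapping `address` to `hostname`.

    The text is returned unchanged if any non-comment line already
    lists `hostname` as a name. (Hypothetical helper, not charm code.)
    """
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address + names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + "{} {}\n".format(address, hostname)


def update_hosts_file(path="/etc/hosts", hostname=None):
    """Apply ensure_hostname_entry to a hosts file (needs root for /etc/hosts)."""
    hostname = hostname or socket.gethostname()
    with open(path) as f:
        original = f.read()
    updated = ensure_hostname_entry(original, hostname)
    if updated != original:
        with open(path, "w") as f:
            f.write(updated)
    return updated != original
```

Calling update_hosts_file() as root on the affected unit would append a line such as "127.0.0.1 juju-precise-machine-5" only when no existing entry names that host, so running it repeatedly is harmless.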

Xiang Hui (xianghui)
description: updated
Hua Zhang (zhhuabj) wrote :

This issue kills the ovs process on both network and compute nodes. I found the following logs in /var/log/openvswitch/ovs-vswitchd.log on a compute node.

2014-12-26T03:59:53.899Z|00032|bridge|INFO|bridge br-int: added interface patch-tun on port 7
2014-12-26T03:59:53.982Z|00033|bridge|INFO|bridge br-tun: added interface patch-int on port 1
2014-12-26T03:59:55.320Z|00034|bridge|INFO|bridge br-tun: added interface gre-0a050041 on port 2
2014-12-26T03:59:55.411Z|00035|bridge|INFO|bridge br-tun: added interface gre-0a050040 on port 3
2014-12-26T04:00:03.253Z|00036|ofproto|INFO|br-int: 4 flow_mods 10 s ago (3 adds, 1 deletes)
2014-12-26T04:00:03.253Z|00037|ofproto|INFO|br-data: 3 flow_mods 10 s ago (2 adds, 1 deletes)
2014-12-26T04:00:04.253Z|00038|ofproto|INFO|br-tun: 18 flow_mods in the 1 s starting 10 s ago (15 adds, 1 deletes, 2 modifications)
2014-12-26T04:55:39.468Z|00001|tunnel(miss_handler)|WARN|receive tunnel port not found (10.5.0.69->10.5.0.66, key=0x2, dp port=7, pkt mark=0)
2014-12-26T04:55:39.468Z|00002|ofproto_dpif_upcall(miss_handler)|INFO|received packet on unassociated datapath port 7
2014-12-26T04:55:40.619Z|00003|tunnel(miss_handler)|WARN|receive tunnel port not found (10.5.0.69->10.5.0.66, key=0x2, dp port=7, pkt mark=0)
2014-12-26T04:55:40.619Z|00004|ofproto_dpif_upcall(miss_handler)|INFO|received packet on unassociated datapath port 7
2014-12-26T04:55:43.484Z|00005|tunnel(miss_handler)|WARN|receive tunnel port not found (10.5.0.69->10.5.0.66, key=0x2, dp port=7, pkt mark=0)
2014-12-26T04:55:43.484Z|00006|ofproto_dpif_upcall(miss_handler)|INFO|received packet on unassociated datapath port 7
2014-12-26T04:55:44.634Z|00007|tunnel(miss_handler)|WARN|receive tunnel port not found (10.5.0.69->10.5.0.66, key=0x2, dp port=7, pkt mark=0)
2014-12-26T04:55:44.634Z|00008|ofproto_dpif_upcall(miss_handler)|INFO|received packet on unassociated datapath port 7
2014-12-26T04:56:12.002Z|00009|tunnel(miss_handler)|WARN|receive tunnel port not found (10.5.0.69->10.5.0.66, key=0x2, dp port=7, pkt mark=0)
2014-12-26T04:56:12.002Z|00010|ofproto_dpif_upcall(miss_handler)|INFO|received packet on unassociated datapath port 7
2014-12-26T04:57:24.068Z|00001|fatal_signal(dispatcher)|WARN|terminating with signal 15 (Terminated)
2014-12-26T04:57:24.080Z|00002|daemon(monitor)|INFO|pid 10607 died, killed (Terminated), exiting
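
To spot terminations like the last two lines in a longer log, a small Python sketch along these lines may help. The five-field layout (timestamp|sequence|module|level|message) is inferred only from the excerpt above, and the names OvsLogRecord, parse_ovs_log_line and find_terminations are made up for this example.

```python
from collections import namedtuple

# Field layout inferred from the log excerpt, not from OVS documentation.
OvsLogRecord = namedtuple(
    "OvsLogRecord", "timestamp sequence module level message"
)


def parse_ovs_log_line(line):
    """Split one '|'-separated log line; return None if it does not fit."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return OvsLogRecord(*parts)


def find_terminations(lines):
    """Return the records logged by the fatal_signal module."""
    records = (parse_ovs_log_line(line) for line in lines)
    return [
        r for r in records
        if r is not None and r.module.startswith("fatal_signal")
    ]
```

For example, find_terminations(open("/var/log/openvswitch/ovs-vswitchd.log")) would return the "terminating with signal 15 (Terminated)" record from the excerpt.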

Hua Zhang (zhhuabj) wrote :

After watching it carefully, I found two problems:
1. The compute node must have its hostname set so that it resolves; otherwise the ovs process exits, as the error log below shows:
   Stderr: 'sudo: unable to resolve host juju-precise-machine-11\n2015-01-04T06:41:49Z|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)\n'

2. The ovs process should also be restarted after rescheduling the l3-agent; otherwise ovs will not create the tunnel again.

Xiang Hui (xianghui)
description: updated
Xiang Hui (xianghui)
Changed in quantum-gateway (Juju Charms Collection):
importance: Undecided → Critical
Changed in nova-compute (Juju Charms Collection):
importance: Undecided → Critical
Changed in neutron-openvswitch (Juju Charms Collection):
importance: Undecided → Critical
assignee: nobody → Hua Zhang (zhhuabj)
tags: added: cts openstack
James Page (james-page) wrote :

This problem:

sudo: unable to resolve host juju-precise-machine-11

is the root cause of a lot of issues in OpenStack deployments. It starts to confuse the calling daemons because they get extra output, so having reverse-resolvable hostnames is pretty critical. This is related to bug 1382190 (for the MAAS provider), which has been fixed. I know you are testing on top of an OpenStack deployment, so you need to set up DNS resolution to work correctly.
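
A quick way to check a unit for this condition is to test whether its own hostname resolves. The Python sketch below does only a forward lookup, which is an assumption about what is sufficient to silence the sudo warning; it does not verify reverse DNS, and the function name hostname_resolves is made up for this example.

```python
import socket


def hostname_resolves(hostname=None):
    """Return True if `hostname` (default: this machine's name) resolves.

    Forward lookup only; a reverse-DNS check would be a separate step.
    """
    name = hostname or socket.gethostname()
    try:
        socket.gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return False
    return True
```

Running hostname_resolves() on an affected machine should return False for as long as sudo prints 'unable to resolve host'.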

no longer affects: openvswitch (Ubuntu)
Changed in neutron-openvswitch (Juju Charms Collection):
status: New → Invalid
Changed in quantum-gateway (Juju Charms Collection):
status: New → Invalid
Changed in nova-compute (Juju Charms Collection):
status: New → Invalid
Ryan Beisner (1chb1n)
tags: added: reverse-dns uosci