ovs-vswitchd errors at br-ctlplane and br-int

Bug #1816750 reported by Quique Llorente on 2019-02-20
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Unassigned

Bug Description

We are seeing OVS disconnections that end up causing tempest failures:

http://logs.openstack.org/32/636232/5/check/tripleo-ci-centos-7-undercloud-containers/0f5c399/logs/undercloud/var/log/journal.txt.gz#_Feb_20_10_02_10

Feb 20 10:02:16 undercloud.localdomain systemd[1]: Starting barbican_worker healthcheck...
Feb 20 10:02:10 undercloud.localdomain ovs-vswitchd[6999]: ovs|00106|rconn|ERR|br-ctlplane<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting
Feb 20 10:02:17 undercloud.localdomain systemd[1]: Starting iscsid healthcheck...
Feb 20 10:02:12 undercloud.localdomain ovs-vswitchd[6999]: ovs|00108|rconn|ERR|br-int<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting
Feb 20 10:02:20 undercloud.localdomain systemd[1]: Starting ironic_inspector_dnsmasq healthcheck...
Feb 20 10:03:15 undercloud.localdomain sshd[78695]: Received disconnect from 189.109.247.148 port 33168:11: Bye Bye [preauth]
Feb 20 10:02:22 undercloud.localdomain systemd[1]: Starting memcached healthcheck...
Feb 20 10:03:15 undercloud.localdomain sshd[78695]: Disconnected from 189.109.247.148 port 33168 [preauth]
Feb 20 10:02:23 undercloud.localdomain systemd[1]: Starting rabbitmq healthcheck...
Feb 20 10:03:15 undercloud.localdomain sshd[78691]: input_userauth_request: invalid user biz [preauth]
Feb 20 10:02:30 undercloud.localdomain systemd[1]: Starting barbican_api healthcheck...
Feb 20 10:03:15 undercloud.localdomain sshd[78691]: pam_unix(sshd:auth): check pass; user unknown
Feb 20 10:02:31 undercloud.localdomain systemd[1]: Starting neutron_ovs_agent healthcheck...
Feb 20 10:03:15 undercloud.localdomain sshd[78691]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=117.102.68.188
Feb 20 10:02:31 undercloud.localdomain systemd[1]: Starting neutron_dhcp healthcheck...
Feb 20 10:03:16 undercloud.localdomain haproxy[21382]: 192.168.24.3:52822 [20/Feb/2019:10:02:06.598] ironic ironic/undercloud.ctlplane.localdomain 0/0/0/40199/40199 200 330 - - ---- 124/3/2/3/0 0/0 "GET /v1/ports/?fields=address HTTP/1.1"
Feb 20 10:02:32 undercloud.localdomain systemd[1]: Starting ironic_api healthcheck...
Feb 20 10:03:16 undercloud.localdomain swift[47762]: ERROR with Account server 192.168.24.1:6002/d1 re: Trying to HEAD /v1/.expiring_objects: Timeout (10.0s) (txn: tx85b0d15bd7224034a89de-005c6d259a)
Feb 20 10:03:18 undercloud.localdomain systemd[1]: Starting mistral_executor healthcheck...
Feb 20 10:03:16 undercloud.localdomain swift[47762]: Account HEAD returning 503 for [] (txn: tx85b0d15bd7224034a89de-005c6d259a)
Feb 20 10:03:18 undercloud.localdomain systemd[1]: Starting nova_compute healthcheck...
Feb 20 10:03:25 undercloud.localdomain kernel: iptables dropped: IN=eth0 OUT= MAC=01:00:5e:00:00:01:fa:16:3e:a3:e2:56:08:00 SRC=198.72.124.56 DST=224.0.0.1 LEN=154 TOS=0x00 PREC=0x00 TTL=1 ID=53588 PROTO=UDP SPT=55118 DPT=8472 LEN=134
Feb 20 10:03:25 undercloud.localdomain kernel: device tapbd70d080-86 left promiscuous mode
Feb 20 10:03:16 undercloud.localdomain haproxy[21382]: 192.168.24.3:56906 [20/Feb/2019:10:03:16.511] neutron neutron/undercloud.ctlplane.localdomain 0/0/0/328/328 404 339 - - ---- 124/4/4/4/0 0/0 "DELETE /v2.0/networks/47d857bc-e40b-4b9f-b125-255273205254 HTTP/1.1"
Feb 20 10:03:18 undercloud.localdomain systemd[1]: Starting keystone healthcheck...
Feb 20 10:03:25 undercloud.localdomain podman[51384]: /usr/lib/python2.7/site-packages/webob/acceptparse.py:1386: DeprecationWarning: The behavior of .best_match for the Accept classes is currently being maintained for backward compatibility, but the method will be deprecated in the future, as its behavior is not specified in (and currently does not conform to) RFC 7231.
Feb 20 10:03:25 undercloud.localdomain podman[51384]: DeprecationWarning,
Feb 20 10:03:16 undercloud.localdomain haproxy[21382]: Connect from 192.168.24.2:49254 to 192.168.24.2:13000 (keystone_public/HTTP)
Feb 20 10:03:18 undercloud.localdomain systemd[1]: Starting ironic_conductor healthcheck...
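The excerpt interleaves many services and its timestamps are out of order as captured, so the two ovs-vswitchd rconn errors are easy to miss. Below is a minimal Python sketch that pulls out the "no response to inactivity probe" disconnects and orders them by time. It assumes journald's short syslog-style timestamps and supplies the year 2019 (the year of this report), because those timestamps carry no year. The regex and the `probe_failures` name are hypothetical helpers, not part of OVS or TripleO.

```python
import re
from datetime import datetime

# Matches ovs-vswitchd rconn lines such as:
#   ... ovs-vswitchd[6999]: ovs|00106|rconn|ERR|br-ctlplane<->tcp:127.0.0.1:6633:
#   no response to inactivity probe after 5 seconds, disconnecting
PROBE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ ovs-vswitchd\[\d+\]: "
    r"ovs\|\d+\|rconn\|ERR\|(?P<bridge>[\w-]+)<->(?P<target>\S+): "
    r"no response to inactivity probe after (?P<secs>\d+) seconds, disconnecting"
)


def probe_failures(lines, year=2019):
    """Return (timestamp, bridge, controller_target, probe_seconds) tuples,
    sorted by timestamp.

    Assumption: journald short timestamps omit the year, so `year` is supplied.
    """
    hits = []
    for line in lines:
        m = PROBE_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            hits.append((ts, m["bridge"], m["target"], int(m["secs"])))
    return sorted(hits)
```

Fed the journal lines above, it returns the br-ctlplane disconnect at 10:02:10 followed by the br-int disconnect at 10:02:12, both against tcp:127.0.0.1:6633 with a 5 second probe.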

And in the tempest log:

http://logs.openstack.org/32/636232/5/check/tripleo-ci-centos-7-undercloud-containers/0f5c399/logs/undercloud/home/zuul/tempest.log.txt.gz#_2019-02-20_10_06_36

Quique Llorente (quiquell) wrote:

14:14 <slaweq> quiquell|rover: TBH I don't think that those ovsdb timeouts are directly related to neutron api slow down
14:15 <chkumar|ruck> one min there two failure from two different jobs
14:15 <slaweq> quiquell|rover: what I think is that maybe node was a bit slow, dhcp agent was doing "full sync" all the time because of those errors which I pointed and that caused big load on db
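The journal excerpt in the description already supports the "node was a bit slow" theory: haproxy logged a `GET /v1/ports/?fields=address` to ironic with timers `0/0/0/40199/40199`, i.e. roughly 40 seconds. Below is a minimal Python sketch that flags such requests from journal lines. It assumes the HAProxy HTTP log format, where the last of the five slash-separated timers is the total request time in milliseconds. The `slow_requests` name and the 10 second default threshold are illustrative choices, not anything taken from the report.

```python
import re

# HAProxy HTTP log line (as forwarded to the journal):
#   haproxy[pid]: client [accept_date] frontend backend/server T1/T2/T3/T4/T5 status ... "request"
HAPROXY_RE = re.compile(
    r"haproxy\[\d+\]: (?P<client>\S+) \[(?P<accept>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<timers>[-+]?\d+(?:/[-+]?\d+){4}) (?P<status>\d{3}) "
    r'.*"(?P<request>[^"]*)"$'
)


def slow_requests(lines, threshold_ms=10_000):
    """Return (backend, total_ms, request) for requests at or above threshold_ms.

    Lines that are not HAProxy HTTP request logs (e.g. "Connect from ...") are skipped.
    """
    slow = []
    for line in lines:
        m = HAPROXY_RE.search(line)
        if not m:
            continue
        # Last timer = total time in ms; int() accepts a leading "+" (logasap).
        total_ms = int(m["timers"].split("/")[-1])
        if total_ms >= threshold_ms:
            slow.append((m["backend"], total_ms, m["request"]))
    return slow
```

On the two haproxy request lines in the description it returns only the ironic `GET /v1/ports/?fields=address` call (40199 ms); the neutron `DELETE` at 328 ms stays under the threshold.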

Changed in tripleo:
status: Triaged → Invalid