Comment 14 for bug 1627106

Revision history for this message
IWAMOTO Toshihiro (iwamoto) wrote : Re: TimeoutException while executing test_post_commit_vswitchd_completed_no_failures

This comment is about "no response to inactivity probe after 5 seconds, disconnecting" messages from ovsdb-server and ovs-vswitchd. I believe this is from the same cause as this bug report, but it can turn out to be otherwise.

1. Query the logstash with message:"no response to inactivity probe". Use tcpdump or whatsoever to get the json.

import simplejson
j=simplejson.load(open('/tmp/logstash.json'))
import collections
for k,v in collections.Counter([x['_source']['log_url'] for x in j['hits']['hits']]).most_common():
    print('%d\t%s' % (v,k))

gives something like:

58 http://logs.openstack.org/24/425924/2/gate/gate-tempest-dsvm-neutron-full-ubuntu-xenial/e2834c2/logs/syslog.txt
49 http://logs.openstack.org/13/426813/2/check/gate-cinder-dsvm-apache-ubuntu-xenial-nv/04f9e80/logs/syslog.txt
19 http://logs.openstack.org/38/426938/2/check/gate-trove-scenario-dsvm-mariadb-single-ubuntu-xenial-nv/8bc7cbb/logs/syslog.txt
15 http://logs.openstack.org/96/426896/2/check/gate-manila-tempest-dsvm-scenario-ubuntu-xenial-nv/4a29497/logs/syslog.txt
14 http://logs.openstack.org/59/418559/12/check/gate-manila-tempest-dsvm-postgres-zfsonlinux-ubuntu-xenial-nv/c9137c2/logs/syslog.txt
9 http://logs.openstack.org/55/421155/4/check/gate-neutron-dsvm-fullstack-ubuntu-xenial/632669b/logs/syslog.txt

Note the issue is not limited to neutron or tempest. functional and fullstack are also on the list, but not useful for this analysis as they don't leave dstat logs.

2. Get syslog.txt and dstat-csv_log.txt.
3. Compare them. for example:

$ grep 'no response to inacti' 426842/syslog.txt.gz |colrm 80
$ python findjump.py < 426842/dstat-csv_log.txt.gz |colrm 80
$ cat findjump.py
import calendar
import time
import sys

yr = time.gmtime().tm_year
told = None
thres = 3 # there is some room for tuning :)
for l in sys.stdin:
    l = l[:-1]
    ll = l.split(',')
    try:
        tm = time.strptime(ll[0], "%d-%m %H:%M:%S")
        tnew = calendar.timegm(time.struct_time((yr,) + tm[1:]))
        if told and tnew - told > thres:
            print(tnew - told)
            print(l)
        told = tnew
    except ValueError:
        pass

Their timestamps are not identical, but highly correlated imo.