[System Tests] System tests stuck on syncing time on slaves

Bug #1414621 reported by Andrey Sledzinskiy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Invalid
Medium
Stanislaw Bogatkin

Bug Description

http://jenkins-product.srt.mirantis.net:8080/view/6.1_swarm/job/6.1.system_test.centos.ha_neutron/13/consoleFull

From time to time we have situation when test stuck on time syncing on slave
Deploy cluster in HA mode with Neutron VLAN and public network ... 2015-01-26 08:05:33,170 - INFO nailgun_client.py:26 -- Initiate Nailgun client with url http://10.108.247.2:8000
2015-01-26 08:05:33,170 - INFO http.py:27 -- Initiate HTTPClient with url http://10.108.247.2:8000
2015-01-26 08:05:33,188 - INFO environment.py:403 -- We have snapshot with such name ready_with_5_slaves
2015-01-26 08:06:08,936 - INFO environment.py:406 -- Starting snapshot reverting ....
2015-01-26 08:06:09,668 - INFO environment.py:409 -- Starting snapshot resuming ...
2015-01-26 08:06:09,839 - INFO environment.py:471 -- Admin node login name: 'root' , password: 'r00tme'
2015-01-26 08:06:09,842 - WARNING http.py:95 -- Authorization failure: Authentication required
2015-01-26 08:06:09,842 - INFO http.py:37 -- Initialize keystoneclient with url http://10.108.247.2:5000/v2.0
2015-01-26 08:06:10,128 - INFO environment.py:553 -- Sync time on revert for admin
2015-01-26 08:06:22,516 - INFO environment.py:573 -- Master node time: ['Mon Jan 26 08:06:22 UTC 2015\n']
2015-01-26 08:06:22,534 - INFO environment.py:441 -- Sync time on revert for node slave-01
Build timed out (after 480 minutes). Marking the build as aborted.

We need to try to investigate why it happens

Tags: system-tests
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

After replacing the 'ntp' package with a new package 'ntp-dev' it is crashed on the Fuel master node from time to time:

==== /var/log/messages
Jan 26 08:06:21 nailgun ntpd[18873]: Command line: ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
Jan 26 08:06:21 nailgun ntpd[18874]: proto: precision = 0.189 usec (-22)
Jan 26 08:06:21 nailgun ntpd[18874]: Listen and drop on 0 v6wildcard [::]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 2 lo 127.0.0.1:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 3 eth0 10.108.247.2:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 4 docker0 172.17.42.1:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 5 lo [::1]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 6 eth0 [fe80::6663:eeff:fe91:7c97%2]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 7 docker0 [fe80::646c:c3ff:fea2:2876%7]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 8 vethZgiZ84 [fe80::fc60:b6ff:fea4:99fe%9]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 9 vethrnRLqD [fe80::fcf6:c2ff:fe6f:5b46%12]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 10 vethl6Rp0x [fe80::fc85:edff:fe50:7747%15]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 11 vethZSD22j [fe80::fc8a:5dff:fe13:aa22%18]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 12 vethqsSw46 [fe80::fc0b:27ff:fed2:f5f9%21]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 13 vethR6ymfQ [fe80::fcfc:5fff:fede:eec4%24]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 14 vethE5uYog [fe80::fcad:beff:fe13:92f2%27]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 15 vethupAER6 [fe80::fcad:abff:fe40:cf0%30]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 16 vethCxyVQS [fe80::fc84:a4ff:fe1b:7fc%33]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 17 vethOhhgHV [fe80::fc9f:30ff:fe3c:eb24%36]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listen normally on 18 vethdrhQuN [fe80::fc0b:99ff:fe5f:f086%39]:123
Jan 26 08:06:21 nailgun ntpd[18874]: Listening on routing socket on fd #35 for interface updates
Jan 26 08:06:21 nailgun ntpd[18874]: restrict default: KOD does nothing without LIMITED.
Jan 26 08:06:21 nailgun ntpd[18874]: restrict ::: KOD does nothing without LIMITED.
Jan 26 08:06:21 nailgun t of memory [18874]

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

NTP in current 6.1 based on dev-branch (built from ntp v4.2.7p447). We have much newer version (stable 4.2.8) on review:
https://review.fuel-infra.org/#/c/1714/
https://review.fuel-infra.org/#/c/1699/

As for me - it is much better to try test 4.2.8 version and merge it as we have such possibility than search a floating bug in current version. Also, 4.2.8 closes bunch of CVEs, so it will be one more argument to test it.

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

Tested Fuel 6.1 build #75 with the 'ntp-dev' 4.2.8, success.

ISO version: {u'build_id': u'2015-01-22_22-54-45', u'ostf_sha': u'95054878eff472eb6a0d9d185ce65e3105dde632', u'build_number': u'75', u'auth_required': True, u'nailgun_sha': u'b677b29899f9e49532a959948d0ba68a83e41536', u'production': u'docker', u'api': u'1.0', u'fuelmain_sha': u'', u'astute_sha': u'f640780d1c7ebe89e4f343736500945525799ee0', u'feature_groups': [u'mirantis'], u'release': u'6.1', u'release_versions': {u'2014.2-6.1': {u'VERSION': {u'build_id': u'2015-01-22_22-54-45', u'ostf_sha': u'95054878eff472eb6a0d9d185ce65e3105dde632', u'build_number': u'75', u'api': u'1.0', u'nailgun_sha': u'b677b29899f9e49532a959948d0ba68a83e41536', u'production': u'docker', u'fuelmain_sha': u'', u'astute_sha': u'f640780d1c7ebe89e4f343736500945525799ee0', u'feature_groups': [u'mirantis'], u'release': u'6.1', u'fuellib_sha': u'673808c5bd6ff5560df75b74f7f798649d6a0040'}}}, u'fuellib_sha': u'673808c5bd6ff5560df75b74f7f798649d6a0040'}

Changed in fuel:
status: New → Confirmed
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

Reproduced on CI : http://jenkins-product.srt.mirantis.net:8080/view/6.1_swarm/job/6.1.system_test.ubuntu.thread_5/19/console

Process ntpd has crashed during time sync:
======
[root@nailgun ~]# tail /var/log/messages
Jan 31 00:26:46 nailgun ntpd[26975]: Listen normally on 14 veth1fwPVh [fe80::fc54:73ff:fe46:dc3a%27]:123
Jan 31 00:26:46 nailgun ntpd[26975]: Listen normally on 15 vethoi8GL5 [fe80::fcbf:54ff:fe68:1419%30]:123
Jan 31 00:26:46 nailgun ntpd[26975]: Listen normally on 16 vethEpofiG [fe80::fc8f:56ff:fed4:8fd2%33]:123
Jan 31 00:26:46 nailgun ntpd[26975]: Listen normally on 17 veth14sAEv [fe80::fc5b:eaff:feef:50e4%36]:123
Jan 31 00:26:46 nailgun ntpd[26975]: Listen normally on 18 vethrm7Mz1 [fe80::fca6:1bff:fe76:2570%39]:123
Jan 31 00:26:46 nailgun ntpd[26975]: Listening on routing socket on fd #35 for interface updates
Jan 31 00:26:46 nailgun ntpd[26975]: restrict default: KOD does nothing without LIMITED.
Jan 31 00:26:46 nailgun ntpd[26975]: restrict ::: KOD does nothing without LIMITED.
Jan 31 00:26:46 nailgun t of memory [26975]

[root@nailgun ~]# ps aux|grep ntp
root 8125 0.0 0.0 103244 860 pts/22 S+ 09:10 0:00 grep ntp
ntp 13765 0.0 0.1 97232 3376 ? S Jan30 0:05 nginx: worker process
ntp 13766 0.0 0.0 96968 3060 ? S Jan30 0:04 nginx: worker process
[root@nailgun ~]#
[root@nailgun ~]# yum list installed | grep ntp
ntp-dev.x86_64 4.2.8-1.mira1 @anaconda-CentOS-201410241409.x86_64/6.3

ISO version: {u'build_id': u'2015-01-30_22-55-01', u'ostf_sha': u'c9100263140008abfcc2704732e98fbdfd644068', u'build_number': u'88', u'auth_required': True, u'nailgun_sha': u'c0932eb5c2aa7fd1e13a999cb1b4bf5aff101c3b', u'production': u'docker', u'api': u'1.0', u'python-fuelclient_sha': u'2ea7b3e91c1d2ff85110bf5abb161a6f4e537358', u'astute_sha': u'ed5270bf9c6c1234797e00bd7d4dd3213253a413', u'fuelmain_sha': u'', u'feature_groups': [u'mirantis'], u'release': u'6.1', u'release_versions': {u'2014.2-6.1': {u'VERSION': {u'build_id': u'2015-01-30_22-55-01', u'ostf_sha': u'c9100263140008abfcc2704732e98fbdfd644068', u'build_number': u'88', u'api': u'1.0', u'nailgun_sha': u'c0932eb5c2aa7fd1e13a999cb1b4bf5aff101c3b', u'production': u'docker', u'python-fuelclient_sha': u'2ea7b3e91c1d2ff85110bf5abb161a6f4e537358', u'astute_sha': u'ed5270bf9c6c1234797e00bd7d4dd3213253a413', u'feature_groups': [u'mirantis'], u'release': u'6.1', u'fuelmain_sha': u'', u'fuellib_sha': u'c5e4a0410ba66f9e9911f62b3b71c0b9c29aed6e'}}}, u'fuellib_sha': u'c5e4a0410ba66f9e9911f62b3b71c0b9c29aed6e'}

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Stanislaw Bogatkin (sbogatkin)
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

Still failing, the job was waiting for the crashed ntpd on masternode until timeout:

http://jenkins-product.srt.mirantis.net:8080/view/6.1_swarm/job/6.1.system_test.ubuntu.plugins/24/

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

We revert ntp-dev to old ntp package, so it shoud not be problem anymore.

Changed in fuel:
status: Confirmed → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.