[10.0-community] Failed in method error_prepare_release

Bug #1606887 reported by Vyacheslav Vakhlyuev
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Released | High | Vladimir Khlyunev |
Mitaka | Fix Released | High | Vladimir Khlyunev |
Newton | Fix Released | High | Vladimir Khlyunev |

Bug Description

Detailed bug description:
"2016-07-26 13:13:36,999 - DEBUG __init__.py:56 -- Calling: wait_for_fuel_ready with args: ([<class 'fuelweb_test.helpers.fuel_actions.AdminActions'>(0x7f6018447dd0)],) {}
2016-07-26 13:13:37,000 - DEBUG __init__.py:62 -- Done: wait_for_fuel_ready with result: None
2016-07-26 13:13:37,000 - DEBUG __init__.py:56 -- Calling: get_releases with args: ([<class 'fuelweb_test.models.nailgun_client.NailgunClient'>(0x7f600f5bc750), url:None],) {}
2016-07-26 13:15:21,753 - ERROR __init__.py:67 -- get_releases raised: ConnectFailure(u'Unable to establish connection to http://10.109.46.2:8000/api/releases/',)
Traceback: Traceback (most recent call last):
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/__init__.py", line 60, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/models/nailgun_client.py", line 218, in get_releases
    return self._get(url="/releases/").json()
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/models/nailgun_client.py", line 70, in _get
    return self.session.get(url=url, **kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/keystoneauth1/session.py", line 656, in get
    return self.request(url, 'GET', **kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/positional/__init__.py", line 101, in inner
    return wrapped(*args, **kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/keystoneauth1/session.py", line 544, in request
    resp = send(**kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/keystoneauth1/session.py", line 588, in _send_request
    raise exceptions.ConnectFailure(msg)
ConnectFailure: Unable to establish connection to http://10.109.46.2:8000/api/releases/"

And the same happens in the next step:
"2016-07-26 13:15:26,760 - DEBUG __init__.py:56 -- Calling: get_releases with args: ([<class 'fuelweb_test.models.nailgun_client.NailgunClient'>(0x7f600f5bc750), url:None],) {}

2016-07-26 13:15:56,878 - DEBUG __init__.py:56 -- Calling: wait_nodes_get_online_state with args: (<fuelweb_test.models.fuel_web_client.FuelWebClient29 object at 0x7f601ed9fed0>, []) {'timeout': 360}
2016-07-26 13:15:56,878 - DEBUG __init__.py:62 -- Done: wait_nodes_get_online_state with result: None
2016-07-26 13:15:56,880 - DEBUG __init__.py:56 -- Calling: get_api_version with args: ([<class 'fuelweb_test.models.nailgun_client.NailgunClient'>(0x7f600f5bc750), url:None],) {}
2016-07-26 13:15:56,889 - ERROR __init__.py:67 -- get_api_version raised: ConnectFailure(u'Unable to establish connection to http://10.109.46.2:8000/api/version',)
Traceback: Traceback (most recent call last):
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/__init__.py", line 60, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/models/nailgun_client.py", line 442, in get_api_version
    return self._get(url="/version").json()
  File "/home/jenkins/workspace/10.0-community.main.ubuntu.bvt_2/fuelweb_test/models/nailgun_client.py", line 70, in _get
    return self.session.get(url=url, **kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/keystoneauth1/session.py", line 656, in get
    return self.request(url, 'GET', **kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/positional/__init__.py", line 101, in inner
    return wrapped(*args, **kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/keystoneauth1/session.py", line 544, in request
    resp = send(**kwargs)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/keystoneauth1/session.py", line 588, in _send_request
    raise exceptions.ConnectFailure(msg)
ConnectFailure: Unable to establish connection to http://10.109.46.2:8000/api/version"

Reproducibility:
The next build passed with no changes, so this may be a temporary environment issue.

Jenkins job:
https://ci.fuel-infra.org/job/10.0-community.main.ubuntu.bvt_2/409/

Revision history for this message
Vyacheslav Vakhlyuev (vvakhlyuev) wrote :
Changed in fuel:
milestone: none → 10.0
assignee: nobody → Fuel QA Team (fuel-qa)
Revision history for this message
Andrey Lavrentyev (alavrentyev) wrote :

A similar failure was found on a recent Swarm run for 9.1:

https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.support_hugepages/16/testReport/%28root%29/prepare_release/

sys_test.log is attached.

I was able to access keystone manually without any issue on those machines.

It blocks 10+ tests (in fact, all tests that depend on prepare_release), so the swarm-blocker tag is added.

tags: added: swarm-blocker
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/351593

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/351595

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-qa (stable/mitaka)

Reviewed: https://review.openstack.org/351593
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=249e6949513b980432737d32a70fd125f394cff7
Submitter: Jenkins
Branch: stable/mitaka

commit 249e6949513b980432737d32a70fd125f394cff7
Author: Vladimir Khlyunev <email address hidden>
Date: Fri Aug 5 12:09:13 2016 +0300

    Always sync time for master node after revert

    During the migration to a shared keystone session we got an issue when
    the time on the master node was changed (during NTP sync). The issue
    reproduces only for the first connection after the time sync.

    Change-Id: I2f203b8a1957b70389e76f9fa99459b1365b242f
    Related-bug:1606887

tags: added: in-stable-mitaka
tags: added: area-python
removed: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/351595
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=cb15bdfd2f199ae22c52ec1b78e44b69c8bbe134
Submitter: Jenkins
Branch: master

commit cb15bdfd2f199ae22c52ec1b78e44b69c8bbe134
Author: Vladimir Khlyunev <email address hidden>
Date: Fri Aug 5 12:09:13 2016 +0300

    Always sync time for master node after revert

    During the migration to a shared keystone session we got an issue when
    the time on the master node was changed (during NTP sync). The issue
    reproduces only for the first connection after the time sync.

    Change-Id: I2f203b8a1957b70389e76f9fa99459b1365b242f
    Related-bug:1606887

Revision history for this message
Georgy Kibardin (gkibardin) wrote :

According to the Nailgun logs, there was no attempt to GET /api/version after the revert. However, a subsequent attempt to generate a diagnostic snapshot succeeded.
I suspect that the revert had not completed and that this is why the GET failed. To check this conjecture, I recommend adding code that attempts to reconnect at least once and seeing what happens. Another, and probably better, option is to run tcpdump just before the attempt to understand what exactly happened at the network level.

Revision history for this message
Alexander Kurenyshev (akurenyshev) wrote :

Vladimir Khlyunev and I have investigated the problem and found the real cause.
The main problem is in the TCP session and its KeepAlive parameter.
When we revert a snapshot with skip_timesync=True, the time on the admin node is incorrect (that is, it differs from the real time).
When a client opens a session to nailgun on the master node, everything works fine until the ntpd daemon sets the correct time.
If the difference between the old time and the new time is more than 1 minute, the session breaks.
This is the cause of the `ConnectFailure: Unable to establish connection` exception. If we send another request, a new TCP session opens and everything works fine.

So the current decision, to always sync time on the master node even if the skip_timesync argument is True, is correct and preferable to the alternatives.

Moved to the Fix Committed state.
If this bug appears again, please feel free to reopen it with the appropriate debugging information.
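
The decision described above can be illustrated with a minimal sketch. It is not the fuel-qa code: `FakeEnvironment`, `sync_after_revert` and the node names are hypothetical, chosen only to show the master node being synced regardless of `skip_timesync`.

```python
class FakeEnvironment:
    """Hypothetical stand-in for the test environment; not the fuel-qa API."""

    def __init__(self):
        self.synced = []  # names of nodes whose clocks were synced, in order

    def sync_time(self, nodes):
        # A real implementation would trigger a time sync on each node;
        # here we only record which nodes were asked to sync.
        self.synced.extend(nodes)


def sync_after_revert(env, slave_nodes, skip_timesync=False):
    """Sync clocks after a snapshot revert (illustrative only).

    The master ("admin") node is synced unconditionally, so that ntpd does
    not step the clock later while a client session is already open.
    skip_timesync only controls whether the slave nodes are synced.
    """
    env.sync_time(["admin"])
    if not skip_timesync:
        env.sync_time(slave_nodes)
    return list(env.synced)
```

With `skip_timesync=True` only the master node is synced; with the default, the master is synced first and the slaves follow.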

tags: added: on-verification
Revision history for this message
Tatyana Kuterina (tkuterina) wrote :

Verified on 9.1 snapshot #136

tags: removed: on-verification
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/356577

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/357030

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/357030
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=1d536398674f946095ab3015103032cabbc6900a
Submitter: Jenkins
Branch: master

commit 1d536398674f946095ab3015103032cabbc6900a
Author: Vladimir Khlyunev <email address hidden>
Date: Wed Aug 17 19:08:54 2016 +0300

    Pass retry count=1 into keystone session

    KeystoneAuth can handle connection failures and timeouts not related to the
    target service, but by default these retries are disabled. We have faced
    several issues related to a broken session when the time on the server
    changed or even when nailgun restarted. The second request will re-create
    the session inside KeystoneAuth, so let's allow the session to handle it.

    Related-bug:1606887
    Change-Id: I3011e25a5c81f74015430b5ace5aa14bc4695bb8
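
The effect of a retry count of 1 can be sketched without the real library. The code below is an illustration under assumptions, not the keystoneauth1 implementation: `ConnectFailure` is a local stand-in for the library exception, `request_with_retries` and `StaleThenFresh` are hypothetical names, and the parameter name `connect_retries` is assumed to correspond to the `connect_retries` argument of keystoneauth1's `Session.request`. The real library also waits between attempts, which is omitted here.

```python
class ConnectFailure(Exception):
    """Local stand-in for keystoneauth1.exceptions.ConnectFailure."""


def request_with_retries(send, connect_retries=1):
    """Call send(), retrying up to connect_retries times on ConnectFailure.

    send is any zero-argument callable that performs the request.
    The last failure is re-raised once the retries are exhausted.
    """
    attempts = connect_retries + 1
    for attempt in range(attempts):
        try:
            return send()
        except ConnectFailure:
            if attempt == attempts - 1:
                raise


class StaleThenFresh:
    """Hypothetical request: the first call fails as if the connection
    were stale, later calls succeed as if a new connection was opened."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            raise ConnectFailure("Unable to establish connection")
        return {"status": "ok"}
```

With `connect_retries=1` the first failure is absorbed and the second attempt returns the response; with `connect_retries=0` the same failure propagates to the caller, which matches the behaviour seen in the logs of this bug.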

tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-qa (stable/mitaka)

Reviewed: https://review.openstack.org/356577
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=2b5cba0ecced2a43a7142fbb58855354b0fda355
Submitter: Jenkins
Branch: stable/mitaka

commit 2b5cba0ecced2a43a7142fbb58855354b0fda355
Author: Vladimir Khlyunev <email address hidden>
Date: Wed Aug 17 19:08:54 2016 +0300

    Pass retry count=1 into keystone session

    KeystoneAuth can handle connection failures and timeouts not related to the
    target service, but by default these retries are disabled. We have faced
    several issues related to a broken session when the time on the server
    changed or even when nailgun restarted. The second request will re-create
    the session inside KeystoneAuth, so let's allow the session to handle it.

    Related-bug:1606887
    Change-Id: I3011e25a5c81f74015430b5ace5aa14bc4695bb8

Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :

Verified on Newton. Not reproduced

Changed in fuel:
status: Fix Committed → Fix Released