Master node sometimes hung after making a snapshot

Bug #1418063 reported by Dmitry Tyzhnenko
Affects: Fuel for OpenStack
Status: Confirmed
Importance: High
Assigned to: Fuel DevOps
Milestone: (none listed)

Bug Description

ISO version:
{
    u'build_id': u'2015-02-03_22-55-01',
    u'ostf_sha': u'c9100263140008abfcc2704732e98fbdfd644068',
    u'build_number': u'96',
    u'auth_required': True,
    u'nailgun_sha': u'62dd62897850795fa35d2f359cf4f310d33f65c5',
    u'production': u'docker',
    u'api': u'1.0',
    u'python-fuelclient_sha': u'2ea7b3e91c1d2ff85110bf5abb161a6f4e537358',
    u'astute_sha': u'ed5270bf9c6c1234797e00bd7d4dd3213253a413',
    u'fuelmain_sha': u'',
    u'feature_groups': [u'mirantis'],
    u'release': u'6.1',
    u'release_versions': {u'2014.2-6.1': {u'VERSION': {
        u'build_id': u'2015-02-03_22-55-01',
        u'ostf_sha': u'c9100263140008abfcc2704732e98fbdfd644068',
        u'build_number': u'96',
        u'api': u'1.0',
        u'nailgun_sha': u'62dd62897850795fa35d2f359cf4f310d33f65c5',
        u'production': u'docker',
        u'python-fuelclient_sha': u'2ea7b3e91c1d2ff85110bf5abb161a6f4e537358',
        u'astute_sha': u'ed5270bf9c6c1234797e00bd7d4dd3213253a413',
        u'feature_groups': [u'mirantis'],
        u'release': u'6.1',
        u'fuelmain_sha': u'',
        u'fuellib_sha': u'2147da0c583a7944f440ceb51236e7cb2e6610c9',
        }}},
    u'fuellib_sha': u'2147da0c583a7944f440ceb51236e7cb2e6610c9',
    }

Caught the error on CI: http://jenkins-product.srt.mirantis.net:8080/job/6.1.system_test.ubuntu.known_issues/22/

When we tried to check the status of the master node after a successful snapshot, the node was not available.

--==--

2015-02-04 02:54:55,071 - INFO decorators.py:143 -- <<<<<****************************************************************************************************>>>>>
2015-02-04 02:54:55,071 - INFO decorators.py:144 -- Make snapshot: deploy_ha_one_controller_flat
2015-02-04 02:54:55,071 - INFO decorators.py:153 -- You could revert this snapshot using [dos.py revert 6.1.system_test.ubuntu.known_issues.22.2015-02-04_02-01-12 --snapshot-name deploy_ha_one_controller_flat && dos.py resume 6.1.system_test.ubuntu.known_issues.22.2015-02-04_02-01-12 && virsh net-dumpxml 6.1.system_test.ubuntu.known_issues.22.2015-02-04_02-01-12_admin | grep -P "(\d+\.){3}" -o | awk '{print "Admin node IP: "$0"2"}']
2015-02-04 02:54:55,072 - INFO decorators.py:158 -- <<<<<****************************************************************************************************>>>>>
2015-02-04 02:56:02,000 - ERROR environment.py:387 -- Admin node is unavailable via SSH after environment resume
2015-02-04 02:56:05,001 - ERROR decorators.py:80 -- Fetching of diagnostic snapshot failed: Traceback (most recent call last):
  File "/home/jenkins/workspace/6.1.system_test.ubuntu.known_issues/fuelweb_test/helpers/decorators.py", line 77, in wrapper
    "fail", name)
  File "/home/jenkins/workspace/6.1.system_test.ubuntu.known_issues/fuelweb_test/helpers/decorators.py", line 193, in create_diagnostic_snapshot
    task = env.fuel_web.task_wait(env.fuel_web.client.generate_logs(), 60 * 5)
  File "/home/jenkins/workspace/6.1.system_test.ubuntu.known_issues/fuelweb_test/__init__.py", line 48, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/6.1.system_test.ubuntu.known_issues/fuelweb_test/helpers/decorators.py", line 103, in wrapped
    response = func(*args, **kwargs)
  File "/home/jenkins/workspace/6.1.system_test.ubuntu.known_issues/fuelweb_test/models/nailgun_client.py", line 294, in generate_logs
    return self.client.put("/api/logs/package")
  File "/home/jenkins/workspace/6.1.system_test.ubuntu.known_issues/fuelweb_test/helpers/http.py", line 83, in put
    return self._open(req)
  File "/home/jenkins/workspace/6.1.system_test.ubuntu.known_issues/fuelweb_test/helpers/http.py", line 92, in _open
    return self._get_response(req)
  File "/home/jenkins/workspace/6.1.system_test.ubuntu.known_issues/fuelweb_test/helpers/http.py", line 109, in _get_response
    return self.opener.open(req)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 113] No route to host>

--==--

Dennis Dmitriev (ddmitriev) wrote :

How we do a snapshot:

https://github.com/stackforge/fuel-main/blob/master/fuelweb_test/models/environment.py#L379-L385

The environment is suspended, then snapshotted, then resumed.
After that, we wait for port 22 to become available on the Fuel master node.
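
For illustration, the "wait for port 22" step could look roughly like the sketch below. This is not the code from environment.py; the function name `wait_for_tcp_port` and its `timeout` and `interval` parameters are hypothetical, and it only checks that a TCP connection can be opened, not that SSH login actually works.

```python
import socket
import time


def wait_for_tcp_port(host, port=22, timeout=300.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for illustration; not part of fuelweb_test.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable (e.g. "No route to host"), or timed out:
            # keep retrying until the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would pass the admin node IP, for example `wait_for_tcp_port(admin_ip, 22, timeout=60)`, and treat a False result as "admin node is unavailable via SSH after environment resume". A hung master node would show up here as a timeout rather than an exception.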

The master node hangs after a snapshot quite often; see the 'grep' result in the attachment.

summary: - Master node unavailable after resuming successful snapshot
+ Master node sometimes hung after making a snapshot
Changed in fuel:
assignee: nobody → Fuel DevOps (fuel-devops)
importance: Medium → High
Dmitry Ilyin (idv1985)
Changed in fuel:
status: New → Confirmed