fuel snapshot hangs at "Generating dump..."

Bug #1604311 reported by Roman Babyuk
This bug affects 2 people
Affects             Status         Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Committed  High        Georgy Kibardin
8.0.x               In Progress    High        MOS Maintenance
Mitaka              Fix Released   High        Georgy Kibardin

Bug Description

Detailed bug description:
 When 'fuel snapshot' is run, only "Generating dump..." appears.
 The shotgun log shows:
    2016-07-19 08:48:22 DEBUG 9018 (driver) Getting local file: cp -r /var/log/shotgun.log /var/dump/fuel-snapshot-2016-07-19_08-47-23/nailgun.test.domain.local/var/log
    2016-07-19 08:48:22 DEBUG 9018 (utils) Trying to execute command: mkdir -p "/var/dump/fuel-snapshot-2016-07-19_08-47-23/nailgun.test.domain.local/var/log"
    2016-07-19 08:48:22 DEBUG 9018 (utils) Trying to execute command: cp -r "/var/log/shotgun.log" "/var/dump/fuel-snapshot-2016-07-19_08-47-23/nailgun.test.domain.local/var/log"
    2016-07-19 08:48:22 DEBUG 9018 (utils) Trying to execute command: tar cJvf /var/dump/fuel-snapshot-2016-07-19_08-47-23.tar.xz -C /var/dump fuel-snapshot-2016-07-19_08-47-23
    2016-07-19 08:50:00 DEBUG 9018 (utils) Trying to execute command: rm -r /var/dump/fuel-snapshot-2016-07-19_08-47-23
    2016-07-19 08:50:00 INFO 9018 (cli) Snapshot path: /var/dump/fuel-snapshot-2016-07-19_08-47-23.tar.xz

  but the directory does not even exist:
    [root@nailgun ~]# ll /var/dump
    ls: cannot access /var/dump: No such file or directory
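
The timestamps in the log excerpt above can be diffed to see how long each step took and whether shotgun reported a final snapshot path. The following is a minimal sketch, not part of Fuel or shotgun; the helper names (parse_line, step_durations) are hypothetical, and the log lines are copied from the excerpt above.

```python
from datetime import datetime

# The last three lines of the shotgun log excerpt, copied verbatim.
LOG = """\
2016-07-19 08:48:22 DEBUG 9018 (utils) Trying to execute command: tar cJvf /var/dump/fuel-snapshot-2016-07-19_08-47-23.tar.xz -C /var/dump fuel-snapshot-2016-07-19_08-47-23
2016-07-19 08:50:00 DEBUG 9018 (utils) Trying to execute command: rm -r /var/dump/fuel-snapshot-2016-07-19_08-47-23
2016-07-19 08:50:00 INFO 9018 (cli) Snapshot path: /var/dump/fuel-snapshot-2016-07-19_08-47-23.tar.xz
"""


def parse_line(line):
    """Split one log line into (timestamp, level, remainder)."""
    date, time, level, _pid, rest = line.split(" ", 4)
    stamp = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")
    return stamp, level, rest


def step_durations(text):
    """Return (seconds until the next log line, message) for every line but the last."""
    rows = [parse_line(line) for line in text.splitlines() if line.strip()]
    return [
        ((end - start).total_seconds(), msg)
        for (start, _, msg), (end, _, _) in zip(rows, rows[1:])
    ]


durations = step_durations(LOG)
# True when shotgun logged the final "Snapshot path:" line, i.e. it believes it finished.
finished = any("Snapshot path:" in line for line in LOG.splitlines())

for seconds, message in durations:
    print(f"{seconds:6.0f}s  {message[:60]}")
print("shotgun reported completion:", finished)
```

On this excerpt the tar step accounts for the whole gap between 08:48:22 and 08:50:00, and the final "Snapshot path:" line is present, so the log alone does not show a failure inside shotgun.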

Steps to reproduce:
  - install a cluster with dos.py
  - run 'fuel snapshot'
Expected results:
  - execution finished with no errors
  - snapshot created in /var/dump/
Actual result:
  - execution hangs
  - if the config is downloaded with 'fuel snapshot --conf > dump_conf.yaml',
    the timeout field is set to 120, and 'fuel snapshot' is run again, it fails with:
      [root@nailgun ~]# fuel snapshot < dump_conf.yaml
Generating dump...
Traceback (most recent call last):
  File "/usr/bin/fuel", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/fuelclient/cli/error.py", line 115, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/fuelclient/cli/parser.py", line 266, in main
    parser.parse()
  File "/usr/lib/python2.7/site-packages/fuelclient/cli/parser.py", line 143, in parse
    actions[parsed_params.action].action_func(parsed_params)
  File "/usr/lib/python2.7/site-packages/fuelclient/cli/actions/base.py", line 62, in action_func
    method(params)
  File "/usr/lib/python2.7/site-packages/fuelclient/cli/actions/snapshot.py", line 68, in get_snapshot
    directory=params.dir
  File "/usr/lib/python2.7/site-packages/fuelclient/cli/formatting.py", line 118, in download_snapshot_with_progress_bar
    download_handle = urllib.request.urlopen(request)
  File "/usr/lib64/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1216, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1155, in do_open
    h = http_class(host, timeout=req.timeout) # will parse host:port
  File "/usr/lib64/python2.7/httplib.py", line 704, in __init__
    self._set_hostport(host, port)
  File "/usr/lib64/python2.7/httplib.py", line 732, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: '8000Dump is timed out'
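
One plausible reading of the final error is that the task's message text ("Dump is timed out") ended up where a download path was expected, so it was glued directly onto the host:port part of the URL. The sketch below reproduces only the parsing error; it is not fuelclient code. It uses Python 3's http.client rather than the Python 2.7 httplib shown in the traceback, and the address 10.109.2.2:8000 is an illustrative assumption.

```python
import http.client

HOSTPORT = "10.109.2.2:8000"        # assumption: example master node address
TASK_MESSAGE = "Dump is timed out"  # message text seen in the traceback


def hostport_error(hostport):
    """Build (but never open) an HTTP connection; return the parse error text, or None."""
    try:
        http.client.HTTPConnection(hostport)
    except http.client.InvalidURL as exc:
        return str(exc)
    return None


# A clean host:port parses fine; the same value with the message appended does not.
ok = hostport_error(HOSTPORT)
broken = hostport_error(HOSTPORT + TASK_MESSAGE)

print(ok)
print(broken)
```

Constructing the connection object does not touch the network, so this is safe to run anywhere. If this reading is right, the traceback is a secondary symptom of the timed-out dump task rather than a separate URL bug.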

Reproducibility:
  several times
Workaround:
  n/a
Impact:
  can't create diagnostic snapshot
Description of the environment:
  VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "570"
  build_id: "570"
  fuel-nailgun_sha: "558ca91a854cf29e395940c232911ffb851899c1"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "c2a335b5b725f1b994f78d4c78723d29fa44685a"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "d605bcbabf315382d56d0ce8143458be67c53434"

Additional information:
 shotgun log attached
 Also, if the timeout is adjusted in the config and the dump fails to be created, the Fuel UI shows:
19-07-2016
08:59:30 Dump is timed out
08:49:23 Dump is timed out
08:47:08 Dump is timed out
08:12:36 Dump is timed out

Tags: area-python
Revision history for this message
Roman Babyuk (rbabyuk) wrote :
description: updated
tags: added: area-python
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/newton
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Georgy Kibardin (gkibardin)
Georgy Kibardin (gkibardin) wrote :

The Shotgun logs say that snapshot collection completed successfully, but the snapshot disappeared afterwards. It would be great to take a look at other logs from the master node, such as:
/var/log/nailgun
/var/log/astute
/var/log/syslog (just in case)

Changed in fuel:
status: Confirmed → Incomplete
Georgy Kibardin (gkibardin) wrote :

BTW, the output of "fuel --debug snapshot" would also be helpful.

Roman Babyuk (rbabyuk) wrote :
Changed in fuel:
status: Incomplete → In Progress
Georgy Kibardin (gkibardin) wrote :

The problem happens when shotgun output contains non-ASCII characters. Astute fails to encode them to UTF-8.
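
The failure mode can be sketched by analogy in Python. This is not the Astute code (Astute is written in Ruby, and the actual change is the linked fuel-astute review); it only illustrates the general pattern of raw command output that carries no text encoding failing at JSON serialization until it is decoded. The sample string and the function names are assumptions made for the illustration.

```python
import json

# Stand-in for shotgun stdout containing non-ASCII characters, captured as raw bytes.
raw_stdout = "Снимок готов".encode("utf-8")


def to_json_naive(payload):
    """Serialize the raw bytes directly; json cannot handle bytes objects."""
    return json.dumps({"stdout": payload})


def to_json_decoded(payload, encoding="utf-8"):
    """Decode first (assuming a UTF-8 locale), replacing any undecodable bytes."""
    text = payload.decode(encoding, errors="replace")
    return json.dumps({"stdout": text})


try:
    to_json_naive(raw_stdout)
    naive_failed = False
except TypeError:
    naive_failed = True

encoded = to_json_decoded(raw_stdout)
print("naive serialization failed:", naive_failed)
print(encoded)
```

Decoding with errors="replace" is a deliberate choice for diagnostic output: a few substituted characters are preferable to losing the whole task result.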

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/346741

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/346741
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=90d0aa8ecd1677c9ec00b7ea9ff5e3dee6f9bc51
Submitter: Jenkins
Branch: master

commit 90d0aa8ecd1677c9ec00b7ea9ff5e3dee6f9bc51
Author: Georgy Kibardin <email address hidden>
Date: Mon Jul 25 15:29:11 2016 +0300

    Fix encoding of shotgun stdout

    Mcollective uses systemu module to execute shell commands. This module
    doesn't care about encoding and as a result we get shotgun stdout as a
    string with ASCII-8BIT encoding. Later we get a failure at the attempt
    to encode it to JSON. However, shotgun output depends on locale settings
    and is usually UTF-8.

    Change-Id: I838f1aa55ea2885c1353a352c537dfe1bcd1e7e3
    Closes-Bug: #1604311

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/357009

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (stable/mitaka)

Reviewed: https://review.openstack.org/357009
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=4ee5991a8be7c647e04a8ef43942d2412aabfce7
Submitter: Jenkins
Branch: stable/mitaka

commit 4ee5991a8be7c647e04a8ef43942d2412aabfce7
Author: Georgy Kibardin <email address hidden>
Date: Mon Jul 25 15:29:11 2016 +0300

    Fix encoding of shotgun stdout

    Mcollective uses systemu module to execute shell commands. This module
    doesn't care about encoding and as a result we get shotgun stdout as a
    string with ASCII-8BIT encoding. Later we get a failure at the attempt
    to encode it to JSON. However, shotgun output depends on locale settings
    and is usually UTF-8.

    Change-Id: I838f1aa55ea2885c1353a352c537dfe1bcd1e7e3
    Closes-Bug: #1604311
    (cherry picked from commit 90d0aa8ecd1677c9ec00b7ea9ff5e3dee6f9bc51)

Dmitry Belyaninov (dbelyaninov) wrote :

Verified on snapshot #152
The snapshot is available after 'fuel snapshot' command:
http://10.109.5.2:8000/api/dump/fuel-snapshot-2016-08-18_22-28-57.tar.gz

Alexander Gubanov (ogubanov) wrote :

I have hit the same issue on MOS QA CI: generating a diagnostic snapshot hangs.

MOS 9.x, snapshot #157, fuel-qa stable/mitaka
details: http://pastebin.com/Mx3b1bNs

[root@nailgun ~]# fuel --debug task
GET http://10.109.2.2:8000/api/v1/transactions/
id | status  | name                    | cluster | progress | uuid
---+---------+-------------------------+---------+----------+-------------------------------------
8  | ready   | provision               | 1       | 100      | fa1ec149-4916-4ab7-bcab-cdf69a769d1f
5  | ready   | deploy                  | 1       | 100      | bd2e0e9d-1ab0-4f36-96c5-524c71dcb49f
9  | ready   | deployment              | 1       | 100      | f3eef5d0-382e-4ebf-914e-54d4a4fbd30c
11 | ready   | check_dhcp              | 1       | 100      | aec5e6c4-17d3-482f-bee3-5ae51367813c
12 | ready   | check_repo_availability | 1       | 100      | 93ddebf1-06e1-4956-9345-e3fbb3e08714
10 | ready   | verify_networks         | 1       | 100      | dc33a0ff-0055-447e-b694-b6dc1e652a6c
13 | running | dump                    |         | 0        | 60f6a70e-5fb0-48fb-a886-c28d6588130d
[root@nailgun ~]#

[root@nailgun ~]# fuel --debug snapshot
PUT http://10.109.2.2:8000/api/v1/logs/package data={}
400 Client Error: Bad Request for url: http://10.109.2.2:8000/api/v1/logs/package (A task is already running)
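
The 400 response follows from the task list: a dump task is still in the running state, so a new snapshot request is refused. A small sketch for spotting such a task in pasted 'fuel task' output is shown here; it parses the text table only and makes no API calls. The function names are hypothetical, and the sample rows are taken from the output above.

```python
# Two rows from the 'fuel task' output above, plus the header and separator.
TASK_TABLE = """\
id | status | name | cluster | progress | uuid
---+---------+-------------------------+---------+----------+-------------------------------------
12 | ready | check_repo_availability | 1 | 100 | 93ddebf1-06e1-4956-9345-e3fbb3e08714
13 | running | dump | | 0 | 60f6a70e-5fb0-48fb-a886-c28d6588130d
"""


def parse_table(text):
    """Turn a pipe-delimited table into a list of dicts keyed by the header row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= set("-+"):  # skip the ---+--- separator line
            continue
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def running_tasks(rows, name=None):
    """Return rows whose status is 'running', optionally filtered by task name."""
    return [
        row for row in rows
        if row["status"] == "running" and (name is None or row["name"] == name)
    ]


stuck = running_tasks(parse_table(TASK_TABLE), name="dump")
for task in stuck:
    print(f"task {task['id']} ({task['name']}) still running at {task['progress']}%")
```

Here the dump task with id 13 is reported at 0% progress, which matches the "A task is already running" rejection.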

/var/log/mcollective-audit.log
...
[2016-08-20 15:59:41 UTC] reqid=54202df6765658c2b4f084d5f90e9358: reqtime=1471708781 caller=uid=0@master agent=execute_shell_command action=execute data={:cmd=>"shotgun -c /tmp/dump_config > /dev/null 2>&1 && cat /var/www/nailgun/dump/last", :process_results=>true}

I see the same issue on product CI:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive/33/console

Alexander Gubanov (ogubanov) wrote :

I've verified that the diagnostic snapshot is generated successfully on snapshot #149 and hangs on later versions.
Added /var/log (snapshot #161).

Georgy Kibardin (gkibardin) wrote :

According to the Shotgun logs, the snapshot was generated successfully. The "hang" effect was caused by a different problem. Please create another bug.

2016-08-21 01:10:21.087 ERROR [7f48871f9740] (receiverd) Message consume failed: <kombu.transport.pyamqp.Message object at 0x37c60e8>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nailgun/rpc/receiverd.py", line 54, in consume_msg
    callback(**body["args"])
  File "/usr/lib/python2.7/site-packages/nailgun/rpc/receiver.py", line 1180, in dump_environment_resp
    kwargs={'snapshot_name': dumpfile})
  File "/usr/lib/python2.7/site-packages/nailgun/utils/__init__.py", line 41, in reverse
    """This convenience function improves readability of the code like this:
  File "/usr/lib/python2.7/site-packages/nailgun/api/v1/urls.py", line 90, in <module>
    from nailgun.api.v1.handlers.plugin import PluginCollectionHandler
  File "/usr/lib/python2.7/site-packages/nailgun/api/v1/handlers/plugin.py", line 25, in <module>
    from nailgun.api.v1.validators import plugin
  File "/usr/lib/python2.7/site-packages/nailgun/api/v1/validators/plugin.py", line 20, in <module>
    from nailgun.objects import ClusterPlugin
ImportError: cannot import name ClusterPlugin

Timur Nurlygayanov (tnurlygayanov) wrote :

Status changed to Confirmed because the problem was reproduced again.

Alexander Gubanov (ogubanov) wrote :
Timur Nurlygayanov (tnurlygayanov) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/386057

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-astute 10.0.0rc1

This issue was fixed in the openstack/fuel-astute 10.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-astute 10.0.0

This issue was fixed in the openstack/fuel-astute 10.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-astute (stable/8.0)

Change abandoned by Andreas Jaeger (<email address hidden>) on branch: stable/8.0
Review: https://review.opendev.org/386057
Reason: This repo is retired now, no further work will get merged.
