Fuel cannot generate diagnostic snapshot

Bug #1529182 reported by Sergey Arkhipov
This bug affects 5 people
Affects             Status        Importance  Assigned to   Milestone
Fuel for OpenStack  Fix Released  High        Maciej Kwiek
8.0.x               Fix Released  High        Maciej Kwiek
Mitaka              Fix Released  High        Maciej Kwiek

Bug Description

MOS 8.0, ISO #328

Steps to reproduce:
    1. Redeploy an environment with Ceilometer and Ceph on 50 nodes 1-2 times (3 controllers, 47 computes, 5 computes with Ceph)
    2. Try to generate a diagnostic snapshot with the Fuel Web UI

Expected result:
    1. Snapshot is created

Real result:
    1. Snapshot is not created
    2. The user gets an 'exit code: 1 stderr:' message next to the "Generate diagnostic snapshot" button
    3. /var disk space is completely exhausted

It is really hard to gather any logs, so I will describe the behavior as I see it. Right now, the Fuel node has several mount points:

[root@fuel ~]# df -h | grep /var
/dev/mapper/os-var 9,5G 6,6G 2,5G 73% /var
/dev/mapper/os-varlog 101G 3,9G 92G 5% /var/log

As you can see, /var is 9.5G (usually with ~2.5G free) and /var/log is about 100G. Fuel generates the snapshot by copying logs from /var/log to /var, and the logs can be larger than 3G, so at some point the disk space on /var is exhausted. At that point everything gets stuck: the user gets an error in the web UI, the disk is full, and nothing else happens.
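
The failure mode described above (copying a large tree onto a smaller filesystem) could be caught with a pre-flight check. The following is only a minimal Python sketch, not code from shotgun or Fuel; the helper names `tree_size` and `has_room_for_copy` and the 10% `margin` are assumptions made for illustration.

```python
import os
import shutil


def tree_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def has_room_for_copy(source_dir, target_dir, margin=0.1):
    """Check whether target_dir's filesystem can hold a copy of source_dir.

    margin is a hypothetical safety factor (10% by default).
    """
    needed = tree_size(source_dir) * (1 + margin)
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

With the mount points shown above, a call such as `has_room_for_copy("/var/log", "/var")` would return False as soon as the logs outgrow the free space on /var, so the snapshot could be refused before the disk fills up.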

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "328"
  build_id: "328"
  fuel-nailgun_sha: "ec25ae8df28e1f1d87da653c5aab2711dff729f6"
  python-fuelclient_sha: "7c7a756fb6a3d091851c060003a2965c67aa353d"
  fuel-agent_sha: "d354cbe61b055db848a15ce66fb7ae92178d2c0a"
  fuel-nailgun-agent_sha: "a33a58d378c117c0f509b0e7badc6f0910364154"
  astute_sha: "c56dfde2da034151a7e707b381c4cf9d213b4ba2"
  fuel-library_sha: "bcc3d042a47547f6ad826360a85ef21dcaf25882"
  fuel-ostf_sha: "b2ebf15a3530b5c7b57707acf8642c1c3ac71bd8"
  fuel-mirror_sha: "8100acb3a566358d0d4ecc66de32d39626263028"
  fuelmenu_sha: "2942a85796d37f09ba8c8c6d762d8813292cf0d4"
  shotgun_sha: "cacb93cbc28910ff0dc38f30a855efa9af50d8ce"
  network-checker_sha: "d443ef47abeda58d319bc8d33d5005dd09440a02"
  fuel-upgrade_sha: "718aa3d7021fee2970f0fa6791cf5188578cc516"
  fuelmain_sha: "3faa824728ce60734abe602ff3778976f8a16eed"

Sergey Arkhipov (sarkhipov) wrote :
description: updated
Sergey Arkhipov (sarkhipov) wrote :

Hi, I've checked on #361 and it seems that this is not my case: there is no such error in astute.log, and the problem is definitely about free space being exhausted.

Again, what I did:
   * Have 50 nodes
   * Enable debug logging
   * Deploy cluster
   * Click on 'Generate diagnostic snapshot'
   * Get the error above

Please check the attached logs (I've tarred the whole /var/log; snapshotting started at ~14:43 UTC):
https://drive.google.com/a/mirantis.com/file/d/0B9tzODpFABxkVHBCWmVWbTFfVVU/view?usp=sharing

Oleg S. Gelbukh (gelbuhos) wrote :

Looks like this one is caused by a different issue than #1528815; it has nothing to do with a failed remote command.

tags: added: area-python
Changed in fuel:
status: New → Confirmed
assignee: nobody → Fuel Python Team (fuel-python)
Dmitry Pyzhov (dpyzhov)
tags: added: team-bugfix
Maciej Kwiek (maciej-iai) wrote :

Marking as triaged, since the root cause is known (lack of disk space for generating snapshot).

Changed in fuel:
status: Confirmed → Triaged
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Maciej Kwiek (maciej-iai)
milestone: 8.0 → 9.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to shotgun (master)

Fix proposed to branch: master
Review: https://review.openstack.org/266964

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/268151

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-astute (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/270083

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to shotgun (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/270823

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/271179

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/273094

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/274034

OpenStack Infra (hudson-openstack) wrote : Change abandoned on shotgun (master)

Change abandoned by Maciej Kwiek (<email address hidden>) on branch: master
Review: https://review.openstack.org/270823

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Maciej Kwiek (<email address hidden>) on branch: master
Review: https://review.openstack.org/271179

Michal Skalski (mskalski) wrote :

In my opinion, the size of the /var partition is just too small. Moreover, it is fixed at 10000MB [1], so there is no benefit from having a bigger hard disk.

It is not only about snapshots. By default, this partition is also used for storing bootstrap and OS installation images, local mirrors created with the fuel-mirror command, and installed plugins.

There is a high chance that it will be overfilled during usual activities, for example:

Disk usage after Fuel Master installation and bootstrap image creation:

[root@fuel ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/os-root 9.5G 2.0G 7.1G 23% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 17M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mapper/os-var 9.5G 4.9G 4.2G 54% /var
/dev/mapper/os-varlog 46G 74M 44G 1% /var/log
/dev/vda3 197M 108M 90M 55% /boot
/dev/vda2 200M 0 200M 0% /boot/efi

During environment deployment (OS image creation):

[root@fuel nailgun]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/os-root 9.5G 2.0G 7.0G 23% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 17M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mapper/os-var 9.5G 6.3G 2.8G 70% /var
/dev/mapper/os-varlog 46G 89M 44G 1% /var/log
/dev/vda3 197M 108M 90M 55% /boot
/dev/vda2 200M 0 200M 0% /boot/efi

After environment deployment, and local mirrors creation with fuel-mirror command:

[root@fuel ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/os-root 9.5G 2.0G 7.0G 23% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 17M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mapper/os-var 9.5G 7.7G 1.4G 86% /var
/dev/mapper/os-varlog 46G 129M 44G 1% /var/log
/dev/vda3 197M 108M 90M 55% /boot
/dev/vda2 200M 0 200M 0% /boot/efi

During the second environment deployment:

[root@fuel ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/os-root 9.5G 2.0G 7.0G 23% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 17M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mapper/os-var 9.5G 9.1G 0 100% /var
/dev/mapper/os-varlog 46G 149M 44G 1% /var/log
/dev/vda3 197M 108M 90M 55% /boot
/dev/vda2 200M 0 200M 0% /boot/efi

The second deployment failed because the /var partition was full and a new image could not be created.

Fuel agent logs:

2016-01-29 22:26:48.006 29397 ERROR fuel_agent.cmd.agent [-] Unexpected error while running command.
Command: resize2fs -M /var/lib/fuel/ibp/tmpfMEEo_.fuel-agent-image
Exit code: 1
Stdout: 'Resizing the f...


Changed in fuel:
assignee: Maciej Kwiek (maciej-iai) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Maciej Kwiek (maciej-iai)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/273094
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=7ba2d3dc2aaf181d00cf6bf989e9bfb48b10e06c
Submitter: Jenkins
Branch: master

commit 7ba2d3dc2aaf181d00cf6bf989e9bfb48b10e06c
Author: Maciej Kwiek <email address hidden>
Date: Wed Jan 27 16:05:33 2016 +0100

    Change nginx directory for dumps

    We will redirect shotgun to generate snapshot in
    /var/log/dump to make use of more space available in /var/
    log

    Change-Id: I0c0832d0c10350898b16d68ed195677fcf975377
    Partial-bug: 1529182

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/275094

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/274034
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=a8f8f880fb03a48912260705d2c8241c27ee1fed
Submitter: Jenkins
Branch: master

commit a8f8f880fb03a48912260705d2c8241c27ee1fed
Author: Maciej Kwiek <email address hidden>
Date: Fri Jan 29 12:52:31 2016 +0100

    Change dump target in settings to /var/dump.

    This will make use of new dump placing introduced by library change this
    patch depends on. Dump will be on the same block device as /var/log,
    which will make use of additional disk space available in /var/log.

    Change-Id: Iec127e0d11194c7a89a2fb80203f173dc4ca3c98
    Closes-bug: 1529182
    Depends-on: I0c0832d0c10350898b16d68ed195677fcf975377

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/275159

Andrey Maximov (maximov)
tags: added: hit-hcf
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/275094
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=a6d8c80f206f7296f5d574b444a53a8be0eeb470
Submitter: Jenkins
Branch: stable/8.0

commit a6d8c80f206f7296f5d574b444a53a8be0eeb470
Author: Maciej Kwiek <email address hidden>
Date: Wed Jan 27 16:05:33 2016 +0100

    Change nginx directory for dumps

    We will redirect shotgun to generate snapshot in
    /var/log/dump to make use of more space available in /var/
    log

    This is a backport of I0c0832d0c10350898b16d68ed195677fcf975377

    Change-Id: Ibe9efe9aa8d9eea16f4413b432c208ad6fdb25a9
    Partial-bug: 1529182

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/8.0)

Reviewed: https://review.openstack.org/275159
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=c9c86d60303e669d79aef84972fe4ccecac94bd9
Submitter: Jenkins
Branch: stable/8.0

commit c9c86d60303e669d79aef84972fe4ccecac94bd9
Author: Maciej Kwiek <email address hidden>
Date: Fri Jan 29 12:52:31 2016 +0100

    Change dump target in settings to /var/dump.

    This will make use of new dump placing introduced by library change this
    patch depends on. Dump will be on the same block device as /var/log,
    which will make use of additional disk space available in /var/log.

    Change-Id: Iec127e0d11194c7a89a2fb80203f173dc4ca3c98
    Closes-bug: 1529182
    Depends-on: I0c0832d0c10350898b16d68ed195677fcf975377
    (cherry picked from commit a8f8f880fb03a48912260705d2c8241c27ee1fed)
    Depends-on: Ibe9efe9aa8d9eea16f4413b432c208ad6fdb25a9

Ivan Lozgachev (ilozgachev) wrote :

Verified for Fuel 8.0 on ENV-13 build 518.

P.S. The Fuel dashboard showed the error "Dump is timed out" because of the large volume of logs, but the dump was generated successfully after some time. Regarding the timeout: this should be fixed by a feature request.

OpenStack Infra (hudson-openstack) wrote : Fix merged to shotgun (master)

Reviewed: https://review.openstack.org/266964
Committed: https://git.openstack.org/cgit/openstack/shotgun/commit/?id=6930543c708cc3df6327c1a109dad396d8147ad8
Submitter: Jenkins
Branch: master

commit 6930543c708cc3df6327c1a109dad396d8147ad8
Author: Maciej Kwiek <email address hidden>
Date: Wed Jan 13 16:03:10 2016 +0100

    Getting local files creates symlinks instead of copying

    Dumping shotgun resources is now done through symlinks. All local
    resources are symlinked in dump directory, after that dump is compressed
    with tar using -h option (--dereference).

    Excluding files from tarball is now done by passing --exclude option to
    tar instead of removing files before taring to avoid deleting logs.

    Symlinks are created by 'ln -s' command because of wildcards used in
    shotgun settings.

    Change-Id: Ie9a0ab51d5874cd46a3919179def0aef407e7340
    Partial-bug: 1529182
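
For illustration only, the approach in the commit message above (symlink local resources into the dump directory, then archive while dereferencing the links and excluding unwanted files) could be sketched in Python as follows. This is not the shotgun code: the commit itself uses 'ln -s' and tar with -h and --exclude, while this sketch swaps in the standard `glob`, `os.symlink` and `tarfile` modules. The function names `link_resources` and `pack_dump` are hypothetical.

```python
import fnmatch
import glob
import os
import tarfile


def link_resources(patterns, dump_dir):
    """Symlink every file matching the glob patterns into dump_dir."""
    os.makedirs(dump_dir, exist_ok=True)
    for pattern in patterns:
        for src in glob.glob(pattern):
            dst = os.path.join(dump_dir, os.path.basename(src))
            if not os.path.lexists(dst):
                os.symlink(os.path.abspath(src), dst)


def pack_dump(dump_dir, archive_path, exclude=()):
    """Archive dump_dir, following symlinks and skipping excluded names."""
    def keep(info):
        # Returning None from a tarfile filter leaves the member out,
        # so excluded files are never deleted from disk.
        base = os.path.basename(info.name)
        if any(fnmatch.fnmatch(base, pat) for pat in exclude):
            return None
        return info

    arcname = os.path.basename(os.path.normpath(dump_dir))
    # dereference=True stores the link targets' contents (like tar -h).
    with tarfile.open(archive_path, "w:gz", dereference=True) as tar:
        tar.add(dump_dir, arcname=arcname, filter=keep)
```

Because only links are created before compression, the dump directory itself needs almost no extra space; the only large file written is the archive.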

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/268151
Committed: https://git.openstack.org/cgit/openstack/shotgun/commit/?id=4d0fa1ce2afe59091b13e81ad28c7f17e0c914dd
Submitter: Jenkins
Branch: master

commit 4d0fa1ce2afe59091b13e81ad28c7f17e0c914dd
Author: Maciej Kwiek <email address hidden>
Date: Fri Jan 15 15:09:59 2016 +0100

    Delete snasphot on out of space error.

    If generating snapshot throws an IOError with 28 error code
    (errno.ENOSPC) whole snapshot is deleted to not clutter drive.

    Change-Id: I442b6bfe7ea5d3b3661351ed36f973f870d4d95d
    Partial-bug: 1529182
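
The cleanup behavior described in the commit message above can be illustrated with a short Python sketch. It is not taken from shotgun; the wrapper name `run_with_cleanup` and its arguments are assumptions made for the example.

```python
import errno
import shutil


def run_with_cleanup(make_snapshot, snapshot_dir):
    """Run make_snapshot(); remove snapshot_dir if the disk fills up."""
    try:
        make_snapshot()
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        if exc.errno == errno.ENOSPC:
            # Out of space: delete the partial snapshot to free the disk.
            shutil.rmtree(snapshot_dir, ignore_errors=True)
        # Re-raise so the caller still sees the failure.
        raise
```

Other I/O errors leave the partial snapshot in place, which keeps it available for debugging.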

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/270083
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=78f082d41e2c3bb59a121aecb12235af8a6dfd5e
Submitter: Jenkins
Branch: master

commit 78f082d41e2c3bb59a121aecb12235af8a6dfd5e
Author: Maciej Kwiek <email address hidden>
Date: Wed Jan 20 10:31:14 2016 +0100

    Improve UX for disk space shortage for dump

    Displaying stderr could not work, as shotgun is called with redirecting
    stderr to stoud. Also, whole debug log is printed to stderr, which would
    cause enormous error message in UI. Special case of errno 28 is handled
    with custom error message.

    Change-Id: I92a1aa149e5edc76bb90cdd24d8e603233fdfe2f
    Related-bug: 1529182

Nastya Urlapova (aurlapova) wrote :
Changed in fuel:
status: Fix Committed → Fix Released