Huge snapshot on the big env

Bug #1382511 reported by Sergey Galkin
This bug affects 2 people
Affects             Status        Importance  Assigned to   Milestone
Fuel for OpenStack  Fix Released  High        Kamil Sambor

Bug Description

api: '1.0'
astute_sha: c3e7c7a18528cf9acca48021488a93dff74f5c97
auth_required: true
build_id: 2014-10-16_19-59-04
build_number: '72'
feature_groups:
- mirantis
fuellib_sha: 677c2809bd602ed6f793b03df49ef8b0f8dcb7e7
fuelmain_sha: 5cf06aac43ccb4a6031fbfa87ff9f9a729314daa
nailgun_sha: b83eaf18cbcc36393f8ac1e7732a6395546a7ca8
ostf_sha: de177931b53fbe9655502b73d03910b8118e25f1
production: docker
release: '6.0'

On a big environment, Fuel snapshot creation time and size are huge.

In my case (100 nodes after 13 hours of uptime):
Size:
fuel-snapshot-2014-10-17_11-08-01.tgz - 479M
Time:
[root@fuel nailgun]# time fuel snapshot
Generating dump...
Downloading: http://10.20.0.2:8000/dump/fuel-snapshot-2014-10-17_11-34-03.tgz Bytes: 517229965
[==============================================================================]

real 23m53.762s
user 0m13.284s
sys 0m4.506s

This file is hard to store and share.
I guess Fuel should have a feature like 'dockerctl shell nailgun /usr/bin/manage.py dropdb, syncdb, loaddefault' for all files included in the snapshot.

This could be a checkbox in the UI and an option of the 'fuel snapshot' command for cleaning up the snapshotted files.

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

I believe we have a log rotation feature, so we need to check whether it works properly.

Changed in mos:
milestone: none → 6.0
assignee: nobody → Tatyana (tatyana-leontovich)
assignee: Tatyana (tatyana-leontovich) → nobody
assignee: nobody → Fuel QA Team (fuel-qa)
Revision history for this message
Sergey Galkin (sgalkin) wrote :
Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

We have log rotation, but in this scenario we didn't hit our thresholds. The uptime was too short (only 13h) to hit our daily rotation, and the logs were too small to hit the log size thresholds. We already improved log handling in 6.0, but we probably need to work on the snapshot creation procedure. I'll analyze it further and get back with ideas.

tags: added: scale
Revision history for this message
Sergey Galkin (sgalkin) wrote :

Tatyana, logrotate is not configured correctly: https://bugs.launchpad.net/mos/+bug/1382515

Changed in mos:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

You could also shrink the logs by running logrotate -f /etc/logrotate.conf before starting snapshot generation.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Note that when you force a rotation, all log files with no new messages logged between the rotation and snapshot creation will have 0 size.
So it could be a good idea to force a rotation before reproducing an issue or starting a troubleshooting/debugging session and, once it's done, generate a logs snapshot.
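
A minimal sketch of the workflow described above: force a rotation, reproduce the issue, then generate the snapshot. The two commands are the ones quoted in this thread; the `debug_session` helper and its `run` parameter are hypothetical names used only for illustration, and the sketch assumes it runs on a host where both commands are available.

```python
import subprocess

# Commands quoted in this thread.
ROTATE_CMD = ["logrotate", "-f", "/etc/logrotate.conf"]
SNAPSHOT_CMD = ["fuel", "snapshot"]


def debug_session(reproduce, run=subprocess.check_call):
    """Rotate logs, run the reproduction step, then collect a snapshot.

    `reproduce` is any callable that triggers the issue under investigation.
    `run` executes a command list; it is injectable so the flow can be
    dry-run without touching the system.
    """
    run(ROTATE_CMD)  # start the session with freshly rotated logs
    try:
        reproduce()
    finally:
        # Collect the snapshot even if the reproduction step raised.
        run(SNAPSHOT_CMD)
```

Passing `run=print` gives a dry run that only prints the commands in the order they would be executed.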

Revision history for this message
Sergey Galkin (sgalkin) wrote :
Revision history for this message
Sergey Galkin (sgalkin) wrote :

Can we change Importance to High?
After 3-4 days of work I can't attach a snapshot to issues.
For example:
https://bugs.launchpad.net/mos/+bug/1383257
https://bugs.launchpad.net/mos/+bug/1383265

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

It is purely a Fuel bug.

affects: mos → fuel
Changed in fuel:
milestone: 6.0 → none
milestone: none → 6.0
Mike Scherbakov (mihgen)
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel Python Team (fuel-python)
milestone: 6.0 → 6.1
importance: Medium → High
Revision history for this message
Dima Shulyak (dshulyak) wrote :

From my pov it should be improved like this:

At the start of snapshot generation the user will be able to provide a time window for logs, like

fuel snapshot --hours 5 --week 4

During the snapshot task in astute we will generate a config for logrotate, execute logrotate, and then collect the snapshot.

Revision history for this message
Dima Shulyak (dshulyak) wrote :

Hm, it seems I misunderstood the capabilities of logrotate.
We need to be able to filter out lines logged before a certain date.
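
One way to filter lines by date, sketched in Python. It assumes each log record starts with a 19-character timestamp such as 2014-10-17T11:08:01 (an assumption; the actual log formats on Fuel nodes may differ), and it treats lines without a timestamp as continuations of the previous record. `lines_since` is a hypothetical helper, not part of Fuel or shotgun.

```python
from datetime import datetime


def lines_since(lines, cutoff, fmt="%Y-%m-%dT%H:%M:%S", width=19):
    """Yield log lines whose record timestamp is at or after `cutoff`.

    Lines that do not start with a parsable timestamp (for example
    traceback lines) inherit the keep/drop decision of the last
    timestamped line before them.
    """
    keep = False
    for line in lines:
        try:
            stamp = datetime.strptime(line[:width], fmt)
        except ValueError:
            pass  # continuation line: keep the previous decision
        else:
            keep = stamp >= cutoff
        if keep:
            yield line
```

Because it is a generator, it can be fed an open file object and does not need to hold the whole log in memory.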

Dima Shulyak (dshulyak)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Dima Shulyak (dshulyak)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/155349

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/157324

Revision history for this message
Dima Shulyak (dshulyak) wrote :

Assigning back to fuel-python; will get back to them after feature freeze (FF).

Changed in fuel:
assignee: Dima Shulyak (dshulyak) → Fuel Python Team (fuel-python)
status: In Progress → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/155349
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=691c547b1782ae4778820814740d358e44d70ebd
Submitter: Jenkins
Branch: master

commit 691c547b1782ae4778820814740d358e44d70ebd
Author: Dmitry Shulyak <email address hidden>
Date: Thu Feb 12 17:15:37 2015 +0200

    Allow to provide pattern that will exclude certain files matched by it

    Directory/File driver accepts exclude parameter that
    will delete files after they are fetched.
    Exclude parameter will accept unix file patterns.

    They are deleted only after, because fabric fetches files/directories
    by sftp and it is not reasonable to pre-check pattern.

    DocImpact
    Partial-Bug: 1382511

    Change-Id: Ia6ee065066682b9a01733334d16d4fe3bc6b1a68

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (stable/6.0)

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/162930

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/157324
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=b81a7ca70386b09f35e4aab943018e98d3bd29fc
Submitter: Jenkins
Branch: master

commit b81a7ca70386b09f35e4aab943018e98d3bd29fc
Author: Dmitry Shulyak <email address hidden>
Date: Thu Feb 19 12:46:07 2015 +0200

    Expose default snapshot config and allow to pass to snapshot request

    Config will be exposed and dumped into file by fuel client.
    After this is done, it will be directly consumed by shotgun
    or some storage right in nailgun will be implemented to use
    user defined config

    Change-Id: I5e01a7459cefe49a128192d82dd827f02866f909
    Related-Bug: 1382511
    Related-Bug: 1420054

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (stable/6.0)

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/162938

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (stable/6.0)

Reviewed: https://review.openstack.org/162930
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=bd69e7dda1e14a84f120e57bd6fd8678acc994ec
Submitter: Jenkins
Branch: stable/6.0

commit bd69e7dda1e14a84f120e57bd6fd8678acc994ec
Author: Dmitry Shulyak <email address hidden>
Date: Thu Feb 19 12:46:07 2015 +0200

    Expose default snapshot config and allow to pass to snapshot request

    Config will be exposed and dumped into file by fuel client.
    After this is done, it will be directly consumed by shotgun
    or some storage right in nailgun will be implemented to use
    user defined config

    Change-Id: I5e01a7459cefe49a128192d82dd827f02866f909
    Related-Bug: 1382511
    Related-Bug: 1420054

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (stable/6.0)

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/163435

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-fuelclient (master)

Reviewed: https://review.openstack.org/157738
Committed: https://git.openstack.org/cgit/stackforge/python-fuelclient/commit/?id=bb5434f0cfefd41ee53518f936332ed2d74b1abe
Submitter: Jenkins
Branch: master

commit bb5434f0cfefd41ee53518f936332ed2d74b1abe
Author: Dmitry Shulyak <email address hidden>
Date: Fri Feb 20 14:08:51 2015 +0200

    Add commands to operate snapshot config

    Added two commands to dump snapshot config
    and provide it for snapshot task

    Will download default config and dump it to stdout

    fuel snapshot --conf > conf.yaml

    Next command will accept config as input and bypass it to snapshot
    generation request

    fuel snapshot < conf.yaml

    This is required to overwrite certain parameters if needed,
    some of those are:
    - target directory of snapshot generation
    - timeout of snapshot generation procedure
    - exclusion/inclusion of certain directories

    Depends on I5e01a7459cefe49a128192d82dd827f02866f909
    Related-Bug: 1382511
    Related-Bug: 1420054
    DocImpact

    Change-Id: I1ddab26fba1346dad30955289e7d28f4d3aa1562
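
As a rough illustration of overriding only certain parameters of a dumped config, the sketch below merges a small set of changes into a nested dict. The key names (`target`, `timeout`, `logs`, `exclude`) are hypothetical placeholders, not the actual keys of the Fuel snapshot config, and loading and dumping the YAML file is left out.

```python
def override(config, changes):
    """Return a copy of `config` with the nested keys from `changes` applied.

    Nested dicts are merged recursively; any other value replaces the
    original. The input `config` is not modified.
    """
    merged = dict(config)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = override(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical default config, standing in for the dumped conf.yaml.
default = {
    "target": "/tmp/dump",
    "timeout": 60,
    "logs": {"path": "/var/log", "exclude": []},
}
custom = override(default, {"timeout": 120, "logs": {"exclude": ["*.gz"]}})
```

Only the keys named in the second argument change; everything else keeps its default value.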

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (stable/6.0)

Change abandoned by Dmitry Shulyak (<email address hidden>) on branch: stable/6.0
Review: https://review.openstack.org/162938

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (stable/6.0)

Reviewed: https://review.openstack.org/163435
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=f5f2e9080c51b9044983e63f7f32870e3b256a5b
Submitter: Jenkins
Branch: stable/6.0

commit f5f2e9080c51b9044983e63f7f32870e3b256a5b
Author: Dmitry Shulyak <email address hidden>
Date: Thu Feb 19 12:46:07 2015 +0200

    Expose default snapshot config and allow to pass to snapshot request

    Config will be exposed and dumped into file by fuel client.
    After this is done, it will be directly consumed by shotgun
    or some storage right in nailgun will be implemented to use
    user defined config

    Original-change: I5e01a7459cefe49a128192d82dd827f02866f909
    Cant assign same one, because it was reverted one time

    Change-Id: I44f6b426d67bda62d54cbf4a751a78b689e06642
    Related-Bug: 1382511
    Related-Bug: 1420054

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/162938
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=f48ffff2efd0436146942ec33049b23783a390e6
Submitter: Jenkins
Branch: stable/6.0

commit f48ffff2efd0436146942ec33049b23783a390e6
Author: Dmitry Shulyak <email address hidden>
Date: Fri Feb 20 14:08:51 2015 +0200

    Add commands to operate snapshot config

    Added two commands to dump snapshot config
    and provide it for snapshot task

    Will download default config and dump it to stdout

    fuel snapshot --conf > conf.yaml

    Next command will accept config as input and bypass it to snapshot
    generation request

    fuel snapshot < conf.yaml

    This is required to overwrite certain parameters if needed,
    some of those are:
    - target directory of snapshot generation
    - timeout of snapshot generation procedure
    - exclusion/inclusion of certain directories

    Related-Bug: 1382511
    Related-Bug: 1420054

    Change-Id: I1ddab26fba1346dad30955289e7d28f4d3aa1562

Dina Belova (dbelova)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Dima Shulyak (dshulyak)
Dima Shulyak (dshulyak)
Changed in fuel:
status: Triaged → In Progress
Dmitry Pyzhov (dpyzhov)
tags: added: module-shotgun
Revision history for this message
Alexander Bozhenko (alexbozhenko) wrote :

To fix this, and a lot of other problems with snapshots, we need to be able to modify the snapshot template (the value of DUMP in the nailgun settings). Since shotgun is already very universal, this is low-hanging fruit.
Take a look at
https://github.com/stackforge/fuel-web/blob/713e6684f9f54e29acfe6b8ebf641b9de2292628/nailgun/nailgun/settings.yaml#L651
It is very clear how to use that template.
We just need to add a feature to override the snapshot template (I mean the DUMP part of settings.yaml) by passing it to the CLI. Similar to this
https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=f48ffff2efd0436146942ec33049b23783a390e6
but add something like:
fuel snapshot --template file_with_value_of_DUMP_part_of_settings.yaml

Another thing to improve is to distinguish nodes by roles in the template
https://review.openstack.org/#/c/168262/
Please, review.

Revision history for this message
Dima Shulyak (dshulyak) wrote :

The problem is that the whole /var/log is copied, including rotated logs.
To reduce the size I've added a parameter to exclude logs by pattern, something like ['*.gz'].

It will not cover all cases, but mainly it makes it possible to force-rotate previous logs before starting new tests on the scale lab.
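
A minimal sketch of the exclude-by-pattern idea, under the behaviour described in the merged commit (files are deleted after they have been fetched). The `remove_excluded` helper is hypothetical, and matching patterns against the file's base name is an assumption; the real driver may match differently.

```python
import fnmatch
import os


def remove_excluded(root, patterns):
    """Delete files under `root` whose name matches any unix file pattern.

    Returns the list of removed paths. Directories are left in place.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed
```

With `patterns=['*.gz']` this drops the compressed, already rotated logs and keeps the current ones.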

Revision history for this message
Dima Shulyak (dshulyak) wrote :

On second thought, I am not sure that [1] is a proper fix for this issue, so I am assigning it back to fuel-python.

 [1] https://review.openstack.org/#/c/155349/

Changed in fuel:
assignee: Dima Shulyak (dshulyak) → Fuel Python Team (fuel-python)
status: In Progress → Confirmed
Kamil Sambor (ksambor)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Kamil Sambor (ksambor)
Revision history for this message
Alexander Bozhenko (alexbozhenko) wrote :

I think there are three sides to the problem of the huge amount of logs included in the snapshot:
1) It takes a long time to collect them, and quite often the snapshot task ends by timeout (default 60 mins), which confuses the user.
2) While the snapshot is being created, it makes a copy of the whole /var/log/ in /var/www/nailgun/dump.
On big environments there is not enough space, and the task fails.
3) The resulting snapshot is huge, and it is a pain to transfer to support.

This solution ([1] https://review.openstack.org/#/c/155349/)
>>Directory/File driver accepts exclude parameter that will delete files after they are fetched.
solves only number 3).
I think we need to modify it to solve parts 1 and 2 too, by not fetching the excluded files at all.
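
To illustrate the difference between deleting after the fetch and not fetching at all, here is a local stand-in using the standard library. The real driver fetches over sftp via fabric, so this is only an analogy: `copy_logs` is a hypothetical helper, and `shutil.copytree` with `shutil.ignore_patterns` skips matching names before anything is copied.

```python
import shutil


def copy_logs(src, dst, exclude=("*.gz",)):
    """Copy the tree at `src` to `dst`, never copying excluded names.

    `dst` must not exist yet. Names matching any pattern in `exclude`
    are skipped up front, so they cost neither time nor disk space.
    """
    return shutil.copytree(src, dst, ignore=shutil.ignore_patterns(*exclude))
```

Filtering at this stage addresses all three points: less data to transfer, less temporary disk usage, and a smaller archive.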

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/174769

Revision history for this message
Kamil Sambor (ksambor) wrote :

Now we can skip some subdirectories or files using the exclude param in settings, so problem 2 should be fixed. This functionality was added in https://review.openstack.org/#/c/172481/ . So now we don't need to copy the whole /var/log when we use fuelclient with the command fuel snapshot < conf.yaml . I also added a fix which removes old snapshots from the environment; this will give us more space on the disk, especially on environments which had a lot of snapshots.
Additionally, as a temporary fix for problem 1, I added a bigger timeout, but the snapshot timeout problem should be fixed by Nikolay Markov's plan to make shotgun asynchronous.

Revision history for this message
Alexander Bozhenko (alexbozhenko) wrote :

>> So now we don't need to copy whole /var/log
But the whole copy is still made, and then the subdir is deleted using utils.remove, right?
So it definitely gives us the ability to reduce the resulting snapshot size, but it doesn't change the free space requirements.
So the snapshot may fail if we don't have at least (2 * size of /var/log/) of free space.
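
The free-space requirement can be checked before starting, roughly as below. The factor of 2 comes from the comment above; the helper names are hypothetical, and the sketch assumes the log directory and the dump directory are readable from the same host.

```python
import os
import shutil


def dir_size(path):
    """Total size in bytes of regular files under `path` (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def has_room_for_snapshot(log_dir, dump_dir, factor=2):
    """Return (ok, needed, free) for a snapshot of `log_dir` into `dump_dir`.

    `needed` is `factor` times the size of `log_dir`; `free` is the free
    space on the filesystem that holds `dump_dir`.
    """
    needed = factor * dir_size(log_dir)
    free = shutil.disk_usage(dump_dir).free
    return free >= needed, needed, free
```

For the paths mentioned in this thread the call would be `has_room_for_snapshot("/var/log", "/var/www/nailgun/dump")`.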

>Also I add fix which will remove old snapshots from environment and this will give as more space on the disk especially on environments which had a lot of snapshots.
I'm not sure about that, because snapshots contain a dump of the nailgun postgres db and cobbler data.
When containers got corrupted, old snapshots saved us several times: we took the db dump from a snapshot and restored it.

Revision history for this message
Kamil Sambor (ksambor) wrote :

You remove only the previous snapshots, so you will always have one snapshot (provided, of course, that you made one in the past).
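
A sketch of the cleanup step as described here: before a new snapshot is generated, the previous ones are removed from the dump directory. The file name pattern is inferred from the snapshot names quoted in the bug description; the `remove_old_snapshots` helper is hypothetical and is not the code from the actual fix.

```python
import glob
import os


def remove_old_snapshots(dump_dir, pattern="fuel-snapshot-*.tgz"):
    """Delete existing snapshot archives in `dump_dir`.

    Returns the sorted list of removed paths. Files that do not match
    `pattern` are left untouched.
    """
    removed = []
    for path in glob.glob(os.path.join(dump_dir, pattern)):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return sorted(removed)
```

Called at the start of snapshot generation, this leaves only the newly created archive on disk once the task finishes.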

Kamil Sambor (ksambor)
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/174769
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=af18fdb16fb7ed222a3cda25aa5283f9b26c0041
Submitter: Jenkins
Branch: master

commit af18fdb16fb7ed222a3cda25aa5283f9b26c0041
Author: Kamil Sambor <email address hidden>
Date: Fri Apr 17 10:45:59 2015 +0200

    Add removing old snapshots

    * we keep old snapshots and this take a lot
      of space on the drive, so when generating
      snapshot starts we remove all old snapshots
    * we have problem with timeouts, as a temporaly
      solution snapshot timeout have been increased

    Change-Id: Ide64d5c745f9e97fe237a26d8de4f67f9cd10675
    Related-Bug: #1382511

Kamil Sambor (ksambor)
Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Leontii Istomin (listomin) wrote :

With builds 511 and 521, a snapshot takes much less disk space. With 200 nodes over 48 hours: ~30G

Changed in fuel:
status: Fix Committed → Fix Released