docker backup sometimes is corrupt and cannot be restored

Bug #1541539 reported by Alexander Kurenyshev
This bug affects 2 people
Affects              Status        Importance  Assigned to                 Milestone
Fuel for OpenStack   Invalid       High        Fuel Library (Deprecated)
6.1.x                Invalid       High        Alexey Stupnikov
7.0.x                Invalid       High        Alexey Stupnikov
8.0.x                Fix Released  High        Denis Puchkin
Mitaka               Invalid       High        Fuel Library (Deprecated)

Bug Description

Something is wrong with the restore operation: it returned exit code 1. Reproduced on CI:
https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.backup_restore_master/127/console

Exception: Unexpected exit_code returned: actual 1, expected 0. Command: 'dockerctl restore /var/backup/fuel/backup_2016-02-04_0438/fuel_backup_2016-02-04_0438.tar.lrz
' Details: {'host': '10.109.25.2',
 'command': 'dockerctl restore /var/backup/fuel/backup_2016-02-04_0438/fuel_backup_2016-02-04_0438.tar.lrz\n',
 'exit_code': 1,
 'stderr': ['\n', 'Restore failed!\n'],
 'stdout': ['Stopping containers...\n',
  'Stopping nginx...\n',
  'Stopping rabbitmq...\n',
  'Stopping astute...\n',
  'Stopping rsync...\n',
  'Stopping keystone...\n',
  'Stopping postgres...\n',
  'Stopping rsyslog...\n',
  'Stopping nailgun...\n',
  'Stopping cobbler...\n',
  'Stopping ostf...\n',
  'Stopping mcollective...\n',
  'Output filename is: /var/backup/fuel/restore-2016-02-04_0438//fuel_backup.tar\n',
  'Decompressing...\n',
  ' 10% 743.42 / 7433.99 MB\r 20% 1486.82 / 7433.99 MB\r 25% 1898.91 / 7433.99 MB\r 30% 2230.22 / 7433.99 MB\r 40% 2973.60 / 7433.99 MB\r 50% 3717.02 / 7433.99 MB\r 51% 3797.70 / 7433.99 MB\r 60% 4460.42 / 7433.99 MB\r 70% 5203.85 / 7433.99 MB\r 76% 5696.61 / 7433.99 MB\r 80% 5947.23 / 7433.99 MB\r 90% 6690.65 / 7433.99 MB\r100% 7433.99 / 7433.99 MB\r\n',
  'Average DeCompression Speed: 45.323MB/s\n',
  'MD5 CHECK FAILED.\n',
  'Stored:cc4b50cab33e62354bb312e514d7f585\n',
  'Output file:2f2895074ec5b268a300275008f537f9Fatal error - exiting\n']}

Revision history for this message
Alexander Kurenyshev (akurenyshev) wrote :
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :

As I see in the test log:
'host': '10.109.20.2',
 'command': 'fuel --env 2 settings --upload --dir /tmp --json',
'exit_code': 1,
'stderr': ['500 Server Error: Internal Server Error ((psycopg2.InternalError) unexpected data beyond EOF in block 4 of relation base/16389/17060\n', 'HINT: This has been seen to occur with buggy kernels; consider updating your system.\n'.....

It looks like the restore was simply not applied, so the third node is just an indicator.

Changed in fuel:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

This bug does not affect the deployed cloud, so I am lowering it to Medium.

tags: added: area-python
Revision history for this message
Nastya Urlapova (aurlapova) wrote :

@Bogdan, this issue affects the backup/restore feature as well as upgrades of the master node.

Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

This is a known issue. You cannot back up Fuel (and its nailgun DB), make changes, and then restore. Nailgun's DB cannot compensate for _ANY_ changes in deployment state when restoring from backup. All nodes changed since the last backup need to be redeployed in order to bring the environment to a functional and managed state.

If the Python team can come up with a trick to make this work, then we can fix this bug. I'm of the opinion that this is a bug we cannot fix, and it has been known since Fuel 6.1: https://docs.mirantis.com/openstack/fuel/fuel-6.1/release-notes.html#fuel-installation-and-deployment

Revision history for this message
Alexander Kurenyshev (akurenyshev) wrote :

Something is wrong with the restore operation: it returned exit code 1. Reproduced on CI:
https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.backup_restore_master/127/console

Exception: Unexpected exit_code returned: actual 1, expected 0. Command: 'dockerctl restore /var/backup/fuel/backup_2016-02-04_0438/fuel_backup_2016-02-04_0438.tar.lrz
' Details: {'host': '10.109.25.2',
 'command': 'dockerctl restore /var/backup/fuel/backup_2016-02-04_0438/fuel_backup_2016-02-04_0438.tar.lrz\n',
 'exit_code': 1,
 'stderr': ['\n', 'Restore failed!\n'],
 'stdout': ['Stopping containers...\n',
  'Stopping nginx...\n',
  'Stopping rabbitmq...\n',
  'Stopping astute...\n',
  'Stopping rsync...\n',
  'Stopping keystone...\n',
  'Stopping postgres...\n',
  'Stopping rsyslog...\n',
  'Stopping nailgun...\n',
  'Stopping cobbler...\n',
  'Stopping ostf...\n',
  'Stopping mcollective...\n',
  'Output filename is: /var/backup/fuel/restore-2016-02-04_0438//fuel_backup.tar\n',
  'Decompressing...\n',
  ' 10% 743.42 / 7433.99 MB\r 20% 1486.82 / 7433.99 MB\r 25% 1898.91 / 7433.99 MB\r 30% 2230.22 / 7433.99 MB\r 40% 2973.60 / 7433.99 MB\r 50% 3717.02 / 7433.99 MB\r 51% 3797.70 / 7433.99 MB\r 60% 4460.42 / 7433.99 MB\r 70% 5203.85 / 7433.99 MB\r 76% 5696.61 / 7433.99 MB\r 80% 5947.23 / 7433.99 MB\r 90% 6690.65 / 7433.99 MB\r100% 7433.99 / 7433.99 MB\r\n',
  'Average DeCompression Speed: 45.323MB/s\n',
  'MD5 CHECK FAILED.\n',
  'Stored:cc4b50cab33e62354bb312e514d7f585\n',
  'Output file:2f2895074ec5b268a300275008f537f9Fatal error - exiting\n']}

Revision history for this message
Alexander Kurenyshev (akurenyshev) wrote :
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

Something is wrong with lrzip or the storage if the file doesn't pass md5sum verification. That is quite an unusual issue. We can add a .md5 file on backup and verify it first when restoring, but that will only detect the corruption; it won't help resolve it.
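
The ".md5 file on backup, verify on restore" idea can be sketched roughly as follows. This is only an illustration, not the code from the proposed patch: the helper names and the "<archive>.md5" sidecar naming are assumptions. Note that a check like this catches damage to the archive file after it was written; it would not notice an archive that lrzip itself produced with a bad internal checksum.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_sidecar(archive_path):
    """Backup side: store the archive's MD5 next to it.

    The '<archive>.md5' file name is a hypothetical convention.
    """
    sidecar_path = archive_path + ".md5"
    with open(sidecar_path, "w") as handle:
        handle.write(md5_of_file(archive_path) + "\n")
    return sidecar_path


def verify_md5_sidecar(archive_path):
    """Restore side: True if the archive still matches the stored digest."""
    with open(archive_path + ".md5") as handle:
        stored = handle.read().split()[0]
    return stored == md5_of_file(archive_path)


if __name__ == "__main__":
    # Demo on a throwaway file standing in for a real backup archive.
    with tempfile.TemporaryDirectory() as workdir:
        archive = os.path.join(workdir, "fuel_backup.tar.lrz")
        with open(archive, "wb") as handle:
            handle.write(b"example payload")
        write_md5_sidecar(archive)
        print(verify_md5_sidecar(archive))
```

Running the verification before stopping any containers would let a restore abort early instead of failing halfway through.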

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/276261

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Matthew Mosesohn (raytrac3r)
status: Confirmed → In Progress
Revision history for this message
Andrey Maximov (maximov) wrote : Re: Node is still presented after restoring of backup

In the log:

MD5 CHECK FAILED.
Stored:cc4b50cab33e62354bb312e514d7f585
Output file:2f2895074ec5b268a300275008f537f9
Fatal error - exiting

So it looks like the backup operation failed?
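
For triaging CI runs, the two digests can be pulled out of the captured output mechanically. The sketch below assumes the output has exactly the shape seen in this report ("MD5 CHECK FAILED.", then "Stored:<md5>" and "Output file:<md5>"); the function name is hypothetical, and other lrzip versions may print something different.

```python
import re

# Lowercase 32-character hex digest, as printed in the log above.
_MD5_HEX = r"([0-9a-f]{32})"


def parse_md5_failure(lines):
    """Return (stored, computed) digests from lrzip output, or None.

    'lines' is a list of output strings such as the 'stdout' list in the
    test failure details. None means no MD5 failure was reported, or the
    digests could not be found.
    """
    text = "".join(lines)
    if "MD5 CHECK FAILED" not in text:
        return None
    stored = re.search(r"Stored:\s*" + _MD5_HEX, text)
    computed = re.search(r"Output file:\s*" + _MD5_HEX, text)
    if not (stored and computed):
        return None
    return stored.group(1), computed.group(1)


# Tail of the stdout list from the failing restore in this report.
sample = [
    "MD5 CHECK FAILED.\n",
    "Stored:cc4b50cab33e62354bb312e514d7f585\n",
    "Output file:2f2895074ec5b268a300275008f537f9Fatal error - exiting\n",
]

if __name__ == "__main__":
    print(parse_md5_failure(sample))
```

The "Stored" value is the checksum recorded in the archive at compression time and "Output file" is the checksum of the decompressed data, so a mismatch means the recorded checksum and the restored data disagree.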

Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

It seems that the original reported bug is not important and #6 is taking priority here. If anyone disagrees, feel free to write.

Corruption of backups is likely related to a hardware issue and not anything related to packages or dockerctl. My patch adds errors and extra verification to the archive, but can't fix the corruption itself.

With my comments in mind, we are moving this bug to Medium status and fixing only in master (9.0), with 8.0 as wontfix

summary: - Node is still presented after restoring of backup
+ docker backup sometimes is corrupt and cannot be restored
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/276261
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=b8eb70d85567973b98764a4ccc4fb7140c1aef44
Submitter: Jenkins
Branch: master

commit b8eb70d85567973b98764a4ccc4fb7140c1aef44
Author: Matthew Mosesohn <email address hidden>
Date: Thu Feb 4 16:49:07 2016 +0300

    Verify backup integity during backup and restore

    md5sum of backup archive is now checked during backup
    and verified before before restore.

    Also fixes 02393e503db32ee3c790e249df11c967dfcd1af4
    which accidentally forced full backups

    Change-Id: Ib1ea740ed248838708a09b098cf14021014df18b
    Closes-Bug: #1541539

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Dmitriy Kruglov (dkruglov) wrote :

Is the fix for the 'corrupted backups' defect being skipped for 8.0?
This bug reproduces consistently at the moment (I tried it manually on builds 530 and 533; CI has shown this error on the corresponding test job on every run since Feb 04).
So there are two points: the functionality doesn't work now, and we cannot verify the existing fixes for 8.0 (e.g. these two high-priority bugs: https://bugs.launchpad.net/fuel/+bug/1536314; https://bugs.launchpad.net/fuel/+bug/1538052).

Revision history for this message
Dmitriy Kruglov (dkruglov) wrote :

Sorry, https://bugs.launchpad.net/fuel/+bug/1538052, mentioned in the comment above, applies only to 9.0.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/277375

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/278986

Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

We still lack a real fix for 8.0. lrzip sometimes creates invalid archives, but only for archives larger than 2 GB; I have not been able to produce a corrupt archive smaller than that. Additionally, upgrading to lrzip 0.621 does not reduce the likelihood of creating bad archives. As far as I can tell, this only happens on virtual hardware running KVM.

We need help from the mos-linux team to find a solution.

tags: added: area-linux docker
removed: area-python
Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Developers, please don't change the Importance of this one.

Revision history for this message
Aleksander Mogylchenko (amogylchenko) wrote :

Can we reproduce this problem without dockerctl at all? That is, what shell command (or series of commands) should I run on plain CentOS 7.1 in order to reproduce it?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Matthew Mosesohn (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/277375

Revision history for this message
Ivan Suzdal (isuzdal) wrote :

It looks like lrzip calculates the wrong checksum while creating the archive.
When I set the compression window explicitly, the operation succeeded:
http://paste.openstack.org/show/488516/
My suggestion is to append '-w X' (X in hundreds of MB) to the lrzip options.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Matthew Mosesohn (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/278986

Revision history for this message
Dmitry Teselkin (teselkin-d) wrote :

We've found the root cause; please take a look at comment #21 (https://bugs.launchpad.net/fuel/+bug/1541539/comments/21).

Revision history for this message
Vadim Rovachev (vrovachev) wrote :

According to the last comment, this problem is in the Puppet codebase. Puppet team, please take a look.

tags: added: area-puppet
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/350398

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/350398
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=fbfa950efff2f9fd5343098fe6c07ae56df77c9f
Submitter: Jenkins
Branch: stable/8.0

commit fbfa950efff2f9fd5343098fe6c07ae56df77c9f
Author: Denis Puchkin <email address hidden>
Date: Wed Aug 3 09:22:43 2016 +0300

    set the max compression window in lrzip

    When lrzip running in virtual env, it can't calculate
    the correct MD5 checksum of source file.
    Workaround for this - set the maximum allowable compression window
    size to 500MB

    Closes-Bug: #1541539
    Change-Id: If1ab921425939340714e9c90979f216255b5b7f3
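
As a rough illustration of the workaround, the sketch below builds an lrzip command line with the compression window capped. It relies only on what the thread states: that '-w' takes the window size in hundreds of MB and that the chosen limit is 500 MB. The helper name is hypothetical, and the real backup script passes its own additional options, which are not reproduced here.

```python
def lrzip_compress_command(source_path, max_window_mb=500):
    """Build an lrzip command line with a capped compression window.

    lrzip's -w option is given in units of 100 MB, so 500 MB becomes
    '-w 5'. Only whole multiples of 100 MB are accepted here.
    """
    if max_window_mb < 100 or max_window_mb % 100 != 0:
        raise ValueError("window must be a positive multiple of 100 MB")
    window_units = max_window_mb // 100
    return ["lrzip", "-w", str(window_units), source_path]


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(lrzip_compress_command("/var/backup/fuel/fuel_backup.tar"))
```

Returning an argument list rather than a shell string keeps the path from being re-parsed by a shell if the list is later handed to subprocess.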

Revision history for this message
Vladimir Jigulin (vjigulin) wrote :
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

7.0.system_test.ubuntu.backup_restore_master tests are all green. This bug is Invalid for Fuel 7.0.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Steps to reproduce:
1. Install Fuel on a KVM-emulated node. The sustaining lab works well for this.

2. Add files to increase the container's size to 3 GB:
# cd /etc/fuel
# wget http://mirror.yandex.ru/centos/6/isos/x86_64/CentOS-6.8-x86_64-bin-DVD2.iso

3. Backup
# dockerctl backup

4. Try to extract archive's contents:
# lrzip -d /var/backup/fuel/backup_2016-10-13_1150/fuel_backup_2016-10-13_1150.tar.lrz
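
Step 4 can be wrapped in a small check for repeated runs. This is a sketch under assumptions: it uses lrzip's '-t' (test integrity) option in place of the '-d' shown above so that no output file is left behind, it treats a non-zero exit code as a failed check, and the function name is hypothetical. The 'run' parameter exists only so the check can be exercised without lrzip installed.

```python
import subprocess


def archive_is_restorable(archive_path, run=subprocess.run):
    """Return True if 'lrzip -t' exits with code 0 for the archive.

    Assumption: lrzip reports a failed MD5 check through a non-zero
    exit code, as the failing 'dockerctl restore' run suggests.
    """
    result = run(
        ["lrzip", "-t", archive_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return result.returncode == 0


# Example call on the master node (path taken from step 4):
# archive_is_restorable(
#     "/var/backup/fuel/backup_2016-10-13_1150/fuel_backup_2016-10-13_1150.tar.lrz"
# )
```

If lrzip is not installed, subprocess.run raises FileNotFoundError rather than returning a result, so the caller sees that case separately from a corrupt archive.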

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

I was unable to reproduce this issue on Fuel 6.1, so I am moving it to Invalid for 6.1-updates.
