possible data corruption using ceph rbd with caching enabled

Bug #1627775 reported by Evgeny Kozhemyakin
This bug affects 1 person

Affects             Status        Importance  Assigned to       Milestone
Mirantis OpenStack  Fix Released  High        MOS Ceph
  7.0.x             Won't Fix     High        Alexey Stupnikov
  8.0.x             Fix Released  High        Alexey Stupnikov
  9.x               Fix Released  High        MOS Ceph

Bug Description

Detailed bug description: spurious page corruptions in SQL Server running on Windows 2012R2 instances. The instances use Ceph RBD storage with the cache enabled.
The issue is not reproducible on LVM/file-based storage.

Steps to reproduce: run SQL Server on Windows 2012R2, or SQLIOSim (a stress-test utility that emulates SQL Server I/O)

Expected results: no errors

Actual result:
Expected FileId: 0x0
Received FileId: 0x0
Expected PageId: 0xCB19C
Received PageId: 0xCB19A (does not match expected)
Received CheckSum: 0x9F444071
Calculated CheckSum: 0x89603EC9 (does not match expected)
Received Buffer Length: 0x2000

Reproducibility: reliably reproducible with SQLIOSim.
Was reproduced in:
MOS 6.0
MOS 7.0
MOS 8.0

Workaround: completely disable the RBD cache.
This is not acceptable, however, due to significant performance degradation:
SQL Server cannot keep up with the required transaction rate.
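
For reference, the RBD cache is a client-side librbd setting, so the workaround boils down to a one-line change; a minimal sketch, assuming it goes into ceph.conf on the compute nodes and that guests are restarted to pick it up:

[client]
# Workaround: disable the librbd writeback cache entirely; in the Hammer
# (0.94) series it is enabled by default.
rbd cache = false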

tags: added: customer-found
Changed in mos:
importance: Undecided → Critical
assignee: nobody → MOS Ceph (mos-ceph)
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The problem looks like an application bug (a forgotten fsync(), or whatever its Windows equivalent is).
librbd/ceph is also responsible for writing the filesystem metadata, yet there are no (guest) filesystem metadata inconsistencies (no kernel panics/BSODs).

Please run a filesystem stress test, preferably a metadata-heavy one (for instance, writing a lot of small files in a single directory); see the sketch below.
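
A minimal sketch of such a test, assuming a Linux guest with Python and a hypothetical mount point /mnt/rbdtest on the RBD-backed disk (the same idea applies to a Windows guest):

import hashlib
import os

TARGET_DIR = "/mnt/rbdtest/smallfiles"  # hypothetical mount point
os.makedirs(TARGET_DIR, exist_ok=True)

# Write many small files into a single directory, fsync'ing each one so
# the flushes actually reach librbd.
digests = {}
for i in range(100000):
    path = os.path.join(TARGET_DIR, "f%06d" % i)
    data = os.urandom(512)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    digests[path] = hashlib.sha256(data).hexdigest()

# Re-read and verify. For a strict check, drop the guest page cache first
# (echo 3 > /proc/sys/vm/drop_caches) so the reads hit the virtual disk.
for path, expected in digests.items():
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    assert actual == expected, "corrupt data in %s" % path
print("no corruption detected")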

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

There's a workaround, so it's not critical at all.

Changed in mos:
importance: Critical → High
tags: added: support
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

For now it's not clear whether the app fails to flush the data properly or librbd corrupts the data (the former is more likely, since the OS does not complain about inconsistent filesystem metadata). Changed the bug title accordingly.

summary: - data corruption using ceph rbd with caching enabled
+ possible data corruption using ceph rbd with caching enabled
description: updated
Revision history for this message
Evgeny Kozhemyakin (ekozhemyakin) wrote :

I've changed the bug's description.

+The issue is not reproducible on LVM/file-based storage.

Please note that the workaround is not acceptable in the case of SQL transactions.
The restore rate from the SQL mirror is inadequate.

+This is not acceptable, however, due to significant performance degradation:
+SQL Server cannot keep up with the required transaction rate.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> Please note that the workaround is not acceptable in the case of SQL transactions.

Can you reproduce the bug with cache=directsync?

> The restore rate from the SQL mirror is inadequate.

What's the "SQL mirror"?

> SQL server cannot keep up with the required transaction rate.

Using Ceph as database storage is quite challenging and requires proper planning and tuning; see
https://www.youtube.com/watch?v=OqlC7S3cUKs
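
For context, cache=directsync is a QEMU disk cache mode (no host page cache, writes opened O_DIRECT+O_SYNC). On a Nova compute it can be switched via the disk_cachemodes option; a minimal sketch, assuming the network disk type covers rbd-backed disks in this Nova release and that instances are hard-rebooted afterwards:

[libvirt]
# Every acknowledged write goes to stable storage, at a latency cost.
disk_cachemodes = network=directsync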

Revision history for this message
Evgeny Kozhemyakin (ekozhemyakin) wrote :

Sorry for being unclear. Let me cite our customer.

"A few comments for the LP case to explain/comment:

The SQL server running in OpenStack is the passive slave node in an MSSQL cluster. What happens when we switch to directsync mode is that the OpenStack-hosted mirror node cannot keep up committing the stream of transactions received from the master. This cluster had, at the time, a very moderate transaction rate, and I guess it may have required 100-200 IOPS to get by.

Due to this we cannot say whether the corruptions would persist in SQL Server using cache=directsync, but as mentioned before, we cannot reproduce the bug when using the SQLIOSim tool in a much better-performing test environment.

We already know that we probably need to do work improving storage performance, but the storage being slow should not cause data to be corrupted (at least not silently), and the errors can be reproduced in SQLIOSim running solitary in pure SSD Ceph pools where performance is good."

Revision history for this message
Evgeny Kozhemyakin (ekozhemyakin) wrote :

Guys, could we please elevate the importance level? This is a really critical issue.

tags: added: area-ceph
Revision history for this message
Victor Denisov (vdenisov) wrote :

So far the theory is that we've encountered a race condition in the QEMU process. Due to the high latency of Ceph storage (compared to a local hard drive), MSSQL (or QEMU) supposedly forgets to flush some data.
It works fine with local hard drives, but inevitably leads to an issue with high-latency devices.
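
Worth noting against this theory: librbd has a guard for exactly such guests. A sketch of the relevant client-side ceph.conf knob, assuming the Hammer (0.94) series used here (verify the default in your build):

[client]
rbd cache = true
# Keep the cache in writethrough mode until the guest issues its first
# flush, so a guest that never flushes cannot lose acknowledged writes.
rbd cache writethrough until flush = true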

Revision history for this message
Evgeny Kozhemyakin (ekozhemyakin) wrote :
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :
Revision history for this message
Rodion Tikunov (rtikunov) wrote :

Patch https://review.fuel-infra.org/#/c/25721/ has been merged, so the fix is committed.

tags: added: on-verification
Revision history for this message
TatyanaGladysheva (tgladysheva) wrote :

Verified on 9.2 snapshot #537.

Ceph was updated to version 0.94.9:
root@node-4:~# dpkg -l | grep ceph | grep 0.94.9
ii ceph 0.94.9-1~u14.04+mos1 amd64 distributed storage and file system
ii ceph-common 0.94.9-1~u14.04+mos1 amd64 common utilities to mount and interact with a ceph storage cluster
ii libcephfs1 0.94.9-1~u14.04+mos1 amd64 Ceph distributed file system client library
ii python-ceph 0.94.9-1~u14.04+mos1 all Meta-package for python libraries for the Ceph libraries
ii python-cephfs 0.94.9-1~u14.04+mos1 amd64 Python libraries for the Ceph libcephfs library

tags: removed: on-verification
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

I have spoken with Alexei Sheplyakov and Denis Meltsaykin, and we concluded that we shouldn't merge a fix to the stable/7.0 branch. The proposed patch contains a lot of changes and could break existing installations if existing nodes aren't updated properly, or if new nodes are deployed without updating the old ones.

If a Ceph cluster deployed with Fuel 7 is to be updated, the new packages can be downloaded from [1].

[1] http://perestroika-repo-tst.infra.mirantis.net/review/LP-1532882/mos-repos/ubuntu/7.0/
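
A minimal sketch of consuming that repo on an Ubuntu 14.04 node; the suite/component names here are hypothetical, so check the actual layout under [1] first:

echo "deb http://perestroika-repo-tst.infra.mirantis.net/review/LP-1532882/mos-repos/ubuntu/7.0 mos7.0 main" \
  > /etc/apt/sources.list.d/lp1627775-ceph.list
apt-get update
# Upgrade only the Ceph packages shipped from that repo.
apt-get install --only-upgrade ceph ceph-common libcephfs1 python-ceph python-cephfs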

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

It looks like the patch to be merged to stable/8.0 shouldn't break anything. I will nominate it for the next MU.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Changed the 7.0-updates status to Won't Fix, as updated Ceph packages will not be shipped with the next MU. The workaround is to install them properly from the temporary build system.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

The MOS-linux team would like to review patches with a successful systest, so we need to troubleshoot what is wrong with https://packaging-ci.infra.mirantis.net/job/8.0-pkg-systest-ubuntu/3397/

Repo to check: fuel-infra/jenkins-jobs

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/trusty/ceph (8.0)

Reviewed: https://review.fuel-infra.org/27739
Submitter: Pkgs Jenkins <email address hidden>
Branch: 8.0

Commit: 35f0e943a0b7f4b57e3196031442f2808f304e22
Author: Alexei Sheplyakov <email address hidden>
Date: Thu Mar 23 08:29:29 2017

Fix possible rbd data corruption

Fixes http://tracker.ceph.com/issues/17545

Closes-bug: #1627775
Change-Id: Ia016914438da8ff649c86e0d1c46de728fa23707

tags: added: on-verification
Revision history for this message
TatyanaGladysheva (tgladysheva) wrote :

Verified on 8.0 + MU4 updates.

Ceph was updated to version 0.94.5-0u~u14.04+mos3+mos8.0+3:
root@node-17:~# dpkg -l | grep ceph | grep 0.94.5
ii ceph 0.94.5-0u~u14.04+mos3+mos8.0+3 amd64 distributed storage and file system
ii ceph-common 0.94.5-0u~u14.04+mos3+mos8.0+3 amd64 common utilities to mount and interact with a ceph storage cluster
ii libcephfs1 0.94.5-0u~u14.04+mos3+mos8.0+3 amd64 Ceph distributed file system client library
ii python-ceph 0.94.5-0u~u14.04+mos3+mos8.0+3 all Meta-package for python libraries for the Ceph libraries
ii python-cephfs 0.94.5-0u~u14.04+mos3+mos8.0+3 amd64 Python libraries for the Ceph libcephfs library

tags: removed: on-verification