Old lvm metadata on partitions breaks deployment of ceph-osd

Bug #1323707 reported by Aleksandr Didenko on 2014-05-27
This bug affects 3 people
Affects              Importance   Assigned to
Fuel for OpenStack   Medium       Aleksandr Didenko
4.1.x                Medium       Aleksandr Didenko
5.0.x                Medium       Aleksandr Didenko

Bug Description

{
    "api": "1.0",
    "astute_sha": "a7eac46348dc77fc2723c6fcc3dbc66cc1a83152",
    "build_id": "2014-05-26_18-06-28",
    "build_number": "24",
    "fuellib_sha": "2f79c0415159651fc1978d99bd791079d1ae4a06",
    "fuelmain_sha": "d7f86968880a484d51f99a9fc439ef21139ea0b0",
    "mirantis": "yes",
    "nailgun_sha": "bd09f89ef56176f64ad5decd4128933c96cb20f4",
    "ostf_sha": "89bbddb78132e2997d82adc5ae5db9dcb7a35bcd",
    "production": "docker",
    "release": "5.0"
}

Environment:
multinode, 1 controller+ceph-osd, 1 compute+ceph-osd, 1 mongodb.
        "volumes_lvm": False,
        "volumes_ceph": True,
        "images_ceph": True,
        "murano": True,
        "sahara": True,
        "ceilometer": True,
        "net_provider": 'neutron',
        "net_segment_type": 'gre',
        "libvirt_type": "kvm"

After several re-deployments of the same environment, I hit the following deployment error on the controller+ceph-osd node:

/usr/bin/ceph-deploy osd prepare node-2:/dev/sda5
raise Error('Device is in use by a device-mapper mapping (dm-crypt?)' % dev, ','.join(holders)

Checking the problematic controller+ceph-osd node showed that there is a "mongo" LVM volume group on /dev/sda5, the partition that is supposed to be used for ceph-osd (we're running mongodb on a different node):

# lvdisplay
  --- Logical volume ---
  LV Name /dev/mongo/mongodb
  VG Name mongo
  LV UUID EKaLrw-AYUR-e241-vqzo-zRMp-XxVZ-OMSrvB
  LV Write Access read/write
  LV Status available
  # open 0
  LV Size 173.66 GiB
  Current LE 5557
  Segments 1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device 252:0

# pvdisplay
  --- Physical volume ---
  PV Name /dev/sda5
  VG Name mongo
  PV Size 173.71 GiB / not usable 27.00 MiB
  Allocatable yes
  PE Size 32.00 MiB
  Total PE 5558
  Free PE 1
  Allocated PE 5557
  PV UUID aEx26m-XxSQ-8eUK-63Io-SqBn-vZAZ-UTnokS
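
The "in use" error is consistent with the kernel reporting a device-mapper holder for the partition. A minimal sketch for checking this on a node, assuming the standard Linux sysfs layout (/sys/class/block/<dev>/holders). The function name `holders` and its `sysfs_root` parameter are hypothetical, chosen for illustration, and this is not the check that ceph-deploy itself performs:

```python
import os


def holders(device, sysfs_root="/sys/class/block"):
    """Return the names of kernel devices that hold `device`.

    `device` is a block device path or name, e.g. "/dev/sda5" or "sda5".
    An active LVM logical volume on the partition shows up as "dm-N".
    `sysfs_root` is a hypothetical parameter so the lookup can be pointed
    at a test directory instead of the real sysfs tree.
    """
    name = os.path.basename(device)
    holders_dir = os.path.join(sysfs_root, name, "holders")
    if not os.path.isdir(holders_dir):
        # Unknown device, or a kernel without this sysfs entry.
        return []
    return sorted(os.listdir(holders_dir))
```

On the node from this report, `holders("/dev/sda5")` would be expected to list the dm device backing /dev/mongo/mongodb while the stale volume group is active.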

It looks like this happens because we create the /dev/sda5 partition during provisioning after the LVM metadata has already been cleaned, so the new partition may retain old LVM metadata:

2014-05-27T09:48:41.291249+00:00 notice: find /dev ( -type l -o -type b ) -exec ls -l {} ;
2014-05-27T09:48:41.294841+00:00 notice: brw------- 1 root root 8, 4 May 27 09:48 /dev/sda4
2014-05-27T09:48:41.296618+00:00 notice: brw------- 1 root root 8, 3 May 27 09:48 /dev/sda3
2014-05-27T09:48:41.298284+00:00 notice: brw------- 1 root root 8, 2 May 27 09:48 /dev/sda2
2014-05-27T09:48:41.300282+00:00 notice: brw------- 1 root root 8, 1 May 27 09:48 /dev/sda1
...
2014-05-27T09:48:42.368051+00:00 notice: === before additional cleaning ===
2014-05-27T09:48:42.370608+00:00 notice: vgs -a --noheadings
2014-05-27T09:48:42.380603+00:00 notice: No volume groups found
....
2014-05-27T09:48:42.598125+00:00 notice: parted -a none -s $(readlink -f $( (ls /dev/disk/by-id/wwn-0x50014ee1033d5fd6 ||
2014-05-27T09:48:42.599108+00:00 notice: ls /dev/disk/by-id/scsi-SATA_WDC_WD2502ABYS-_WD-WCAT1H270422 || ls /dev/disk/by
2014-05-27T09:48:42.600087+00:00 notice: -id/ata-WDC_WD2502ABYS-18B7A0_WD-WCAT1H270422 || ls /dev/disk/by-path/pci-0000:0
2014-05-27T09:48:42.601049+00:00 notice: 0:1f.2-scsi-0:0:0:0) 2>/dev/null) ) unit MiB mkpart primary 60195 238078
....
2014-05-27T09:48:53.260547+00:00 notice: find /dev ( -type l -o -type b ) -exec ls -l {} ;
2014-05-27T09:48:53.264081+00:00 notice: lrwxrwxrwx 1 root root 7 May 27 09:48 /dev/mongo/mongodb -> ../dm-0
2014-05-27T09:48:53.267014+00:00 notice: brw------- 1 root root 252, 0 May 27 09:48 /dev/dm-0
2014-05-27T09:48:53.268727+00:00 notice: brw------- 1 root root 8, 5 May 27 09:48 /dev/sda5
2014-05-27T09:48:53.271018+00:00 notice: brw------- 1 root root 8, 4 May 27 09:48 /dev/sda4
2014-05-27T09:48:53.272770+00:00 notice: brw------- 1 root root 8, 3 May 27 09:48 /dev/sda3
2014-05-27T09:48:53.274522+00:00 notice: brw------- 1 root root 8, 2 May 27 09:48 /dev/sda2
2014-05-27T09:48:53.276428+00:00 notice: brw------- 1 root root 8, 1 May 27 09:48 /dev/sda1

As you can see in the logs above, we had no LVM data on our disks before the additional cleaning, but it appeared right after we created the /dev/sda5 partition.
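
One way to confirm that a freshly created partition still carries stale LVM metadata is to look for the LVM2 label, which starts with the ASCII string "LABELONE" and is stored in one of the first four 512-byte sectors of a physical volume (sector 1 by default). A rough sketch under those assumptions; the function and constant names are hypothetical, and reading a real device such as /dev/sda5 requires root:

```python
LVM_LABEL_MAGIC = b"LABELONE"
SECTOR_SIZE = 512
LABEL_SCAN_SECTORS = 4  # the LVM2 label lives in one of the first four sectors


def find_lvm_label(path):
    """Return the index of the sector that starts with an LVM2 label.

    `path` may be a block device or a regular file (useful for testing).
    Returns None when none of the scanned sectors carries the label.
    """
    with open(path, "rb") as dev:
        for sector in range(LABEL_SCAN_SECTORS):
            data = dev.read(SECTOR_SIZE)
            if len(data) < len(LVM_LABEL_MAGIC):
                # Ran past the end of a short file; nothing more to scan.
                break
            if data.startswith(LVM_LABEL_MAGIC):
                return sector
    return None
```

If this returns a sector index for a partition that was created only seconds earlier, the label was presumably left on disk by an earlier deployment at the same offset.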

Aleksandr Didenko (adidenko) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/95795

Changed in fuel:
status: New → In Progress
description: updated
Aleksandr Didenko (adidenko) wrote :

Another problem is that a newly created partition may still contain an old "xfs" (or other) filesystem left over from a previous installation. This can make the "ceph-deploy osd prepare node-4:/dev/sdb2" command fail too. So we need to make sure we clear old filesystems from the newly created partition (/dev/sdb2), not from the entire disk (/dev/sdb). I've updated https://review.openstack.org/95795 to address this.
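
The partition-versus-disk distinction can be sketched as follows. This is an illustration only, not the code from the review above: it assumes the `wipefs` utility from util-linux is available, it only builds the commands rather than running them, and the function name `wipe_commands` is hypothetical. It also assumes the partitions are not in use; an active LVM mapping on a partition would have to be removed first.

```python
def wipe_commands(disk, new_partitions):
    """Build wipefs commands for newly created partitions only.

    `disk` is the parent device (e.g. "/dev/sdb") and `new_partitions`
    the partitions just created on it (e.g. ["/dev/sdb2"]). The parent
    disk is never wiped, since that would destroy its partition table.
    """
    if disk in new_partitions:
        raise ValueError("refusing to wipe the whole disk: %s" % disk)
    # "wipefs --all" erases every filesystem/RAID/LVM signature it detects
    # on the given device.
    return [["wipefs", "--all", part] for part in new_partitions]
```

Each returned command could then be run as root, for example with `subprocess.run(cmd, check=True)`, after all partitions have been created.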

Reviewed: https://review.openstack.org/95795
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=8f2bc4890b7b2022af0fc821f6b9a6c6c4f869fd
Submitter: Jenkins
Branch: master

commit 8f2bc4890b7b2022af0fc821f6b9a6c6c4f869fd
Author: Aleksandr Didenko <email address hidden>
Date: Tue May 27 18:05:44 2014 +0300

    Clean lvm metadata after creating partitions

    Run erase_lvm_metadata() after we create all the needed partitions
    to make sure we don't have any old LVM data.

    We also need to clean old filesystem from newly created partition,
    that may retain after previous deployments.

    Change-Id: I6ee241aca94a4db7bae2b359af7244dee6a53150
    Closes-bug: #1323707

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/96774
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=fc6b47a89fa5a1e983642a7dbec2fff2d12514b7
Submitter: Jenkins
Branch: stable/5.0

commit fc6b47a89fa5a1e983642a7dbec2fff2d12514b7
Author: Aleksandr Didenko <email address hidden>
Date: Tue May 27 18:05:44 2014 +0300

    Clean lvm metadata after creating partitions

    Run erase_lvm_metadata() after we create all the needed partitions
    to make sure we don't have any old LVM data.

    We also need to clean old filesystem from newly created partition,
    that may retain after previous deployments.

    Change-Id: I6ee241aca94a4db7bae2b359af7244dee6a53150
    Closes-bug: #1323707

Reviewed: https://review.openstack.org/97231
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=6f23c98e70abddd438d1ede2559ad71d61dea218
Submitter: Jenkins
Branch: stable/4.1

commit 6f23c98e70abddd438d1ede2559ad71d61dea218
Author: Aleksandr Didenko <email address hidden>
Date: Tue May 27 18:05:44 2014 +0300

    Clean lvm metadata after creating partitions

    Run erase_lvm_metadata() after we create all the needed partitions
    to make sure we don't have any old LVM data.

    We also need to clean old filesystem from newly created partition,
    that may retain after previous deployments.

    Change-Id: I6ee241aca94a4db7bae2b359af7244dee6a53150
    Closes-bug: #1323707
    (cherry picked from commit fc6b47a89fa5a1e983642a7dbec2fff2d12514b7)

Anastasia Palkina (apalkina) wrote :

Reproduced on ISO 344
"build_id": "2014-06-05_15-25-50",
"mirantis": "yes",
"build_number": "344",
"nailgun_sha": "a828d6b7610f872980d5a2113774f1cda6f6810b",
"ostf_sha": "2b7b39e4b6ea89751b65171f24a8e80b5cac56aa",
"fuelmain_sha": "9964da7dec34d3100419c1c77c8f5235d8e30f14",
"astute_sha": "55df06b2e84fa5d71a1cc0e78dbccab5db29d968",
"release": "4.1B",
"fuellib_sha": "3511461a2b529619a787a6306441d9039699e71d"

1. Create new environment (Ubuntu, HA mode)
2. Choose VLAN segmentation
3. Choose both ceph
4. Add 3 controllers+ceph, 2 compute+ceph
5. Choose rados in openstack settings
6. Start deployment. It was successful
7. But there is an error on a controller (node-12) in puppet.log:

/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) change from notrun to 0 failed: ceph-deploy osd prepare node-12:/dev/sdb2 node-12:/dev/sdc2 returned 1 instead of one of [0] at /etc/puppet/modules/ceph/manifests/osd.pp:27

Controllers node-12,13,14
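
When triaging a failure like the one above, it helps to pull the affected node and device pairs out of the puppet.log line so each partition can be inspected for stale metadata. A small sketch, assuming the `node:/dev/...` argument format shown in the log line; the names `OSD_TARGET` and `osd_targets` are hypothetical:

```python
import re

# Matches ceph-deploy style "host:/dev/device" arguments.
OSD_TARGET = re.compile(r"\b([\w.-]+):(/dev/\S+)")


def osd_targets(log_line):
    """Return (node, device) pairs found in a puppet.log error line."""
    return OSD_TARGET.findall(log_line)
```

Applied to the error line quoted in this comment, it yields the pairs for /dev/sdb2 and /dev/sdc2 on node-12.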

Dmitry Borodaenko (angdraug) wrote :

If the patch from I6ee241aca94a4db7bae2b359af7244dee6a53150 did not fix this problem, we're likely to still have that issue in 5.0 and 5.1 as well. Please confirm.

Dmitry Borodaenko (angdraug) wrote :

This bug is no longer as easy to reproduce as it used to be (it's either intermittent now or requires a more complex configuration), so it can't be a release blocker for 4.1.1.

Ryan Moe (rmoe) wrote :

I can't reproduce this on 4.1.1.

4.1.X -- doesn't support Telemetry
5.0.X -- verified (build 73)
5.1.X -- verified (build 272)

tags: added: customer-found release-notes
Meg McRoberts (dreidellhasa) wrote :

Listed in "Fixed Issues" in 5.0.1 Release Notes.

Dmitry Pyzhov (dpyzhov) on 2014-08-15
no longer affects: fuel/5.1.x
Changed in fuel:
milestone: 5.0.1 → 5.1