[library] ceph-deploy osd prepare node-5:/dev/vdb2 returned 1 instead of one of [0]

Bug #1323343 reported by Egor Kotko
This bug affects 2 people
Affects                   Status         Importance  Assigned to  Milestone
Fuel for OpenStack        Fix Released   High        Ryan Moe
Fuel for OpenStack 5.0.x  Fix Committed  High        Ryan Moe

Bug Description

{
    "api": "1.0",
    "astute_sha": "a7eac46348dc77fc2723c6fcc3dbc66cc1a83152",
    "build_id": "2014-05-25_23-01-31",
    "build_number": "22",
    "fuellib_sha": "b9985e42159187853edec82c406fdbc38dc5a6d0",
    "fuelmain_sha": "db2d153e62cb2b3034d33359d7e3db9d4742c811",
    "mirantis": "yes",
    "nailgun_sha": "bd09f89ef56176f64ad5decd4128933c96cb20f4",
    "ostf_sha": "1f020d69acbf50be00c12c29564f65440971bafe",
    "production": "docker",
    "release": "5.0"
}

Deployment finished with errors in log:
http://paste.openstack.org/show/81532/

Steps to reproduce:
1) Create env:
Ubuntu HA Neutron-GRE
Ceph for both Cinder and Glance

2 x Controller + Ceph OSD
1 x Compute

Expected result:
Deployed cluster

Actual result:
Deployed with errors

Tags: ceph
Revision history for this message
Egor Kotko (ykotko) wrote :
Revision history for this message
Aleksandr Didenko (adidenko) wrote :

Reproduced several times on bare-metal (DELL)

{
    "api": "1.0",
    "astute_sha": "a7eac46348dc77fc2723c6fcc3dbc66cc1a83152",
    "build_id": "2014-05-26_18-06-28",
    "build_number": "24",
    "fuellib_sha": "2f79c0415159651fc1978d99bd791079d1ae4a06",
    "fuelmain_sha": "d7f86968880a484d51f99a9fc439ef21139ea0b0",
    "mirantis": "yes",
    "nailgun_sha": "bd09f89ef56176f64ad5decd4128933c96cb20f4",
    "ostf_sha": "89bbddb78132e2997d82adc5ae5db9dcb7a35bcd",
    "production": "docker",
    "release": "5.0"
}

Environment:
multinode, 1 controller+ceph-osd, 1 compute+ceph-osd, 1 mongodb.
        "volumes_lvm": False,
        "volumes_ceph": True,
        "images_ceph": True,
        "murano": True,
        "sahara": True,
        "ceilometer": True,
        "net_provider": 'neutron',
        "net_segment_type": 'gre',
        "libvirt_type": "kvm"

Sporadic errors during deployments:
2014-05-27T10:14:01.414383+00:00 err: ceph-deploy osd prepare node-2:/dev/sda5 returned 1 instead of one of [0]

More detailed logs:
2014-05-27T10:14:01.398761+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sda5
2014-05-27T10:14:01.399020+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] Traceback (most recent call last):
2014-05-27T10:14:01.399728+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] File "/usr/lib/python2.7/dist-packages/ceph_deploy/osd.py", line 126, in prepare_disk
2014-05-27T10:14:01.400017+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] File "/usr/lib/python2.7/dist-packages/ceph_deploy/util/decorators.py", line 10, in inner
2014-05-27T10:14:01.400017+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] def inner(*args, **kwargs):
2014-05-27T10:14:01.400265+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] File "/usr/lib/python2.7/dist-packages/ceph_deploy/util/wrappers.py", line 6, in remote_call
2014-05-27T10:14:01.401206+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] This allows us to only remote-execute the actual calls, not whole functions.
2014-05-27T10:14:01.401420+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] File "/usr/lib/python2.7/subprocess.py", line 511, in check_call
2014-05-27T10:14:01.401420+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] raise CalledProcessError(retcode, cmd)
2014-05-27T10:14:01.401519+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] CalledProcessError: Command '['ceph-disk-prepare', '--fs-type', 'xfs', '--cluster', 'ceph', '--', '/dev/sda5']' returned non-zero exit status 1
2014-05-27T10:14:01.402453+00:00 notice: (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) [node-2][ERROR ] Traceback (most recent call last):
2014-05-27T10:14:01.402676+00:00...


Changed in fuel:
importance: Medium → Critical
milestone: 5.1 → 5.0
Revision history for this message
Aleksandr Didenko (adidenko) wrote :

The bare-metal bug is pretty specific and can be reproduced only with numerous re-deployments on the same hardware, so I'm lowering this back to Medium. I will create a separate bare-metal related bug.

Revision history for this message
Anastasia Palkina (apalkina) wrote :

This bug reproduced on master ISO

"build_id": "2014-05-30_00-35-28",
"mirantis": "yes",
"build_number": "230",
"ostf_sha": "3e709cba57df7e958d01484aeb80ba7a3c875133",
"nailgun_sha": "3e2c266bf285b86f6d222c60aca999ea6a745b50",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "e6a54762e6fa49f9d24a2f658f9d30d6b84b43a1",
"astute_sha": "b1f8d0eafed110fd748e473ea74674e7e1c495eb",
"release": "5.1",
"fuellib_sha": "bafa771b4ccd9df266cc2238810b67c6cc7aa995"

1. Create new environment (Ubuntu, simple mode)
2. Choose GRE segmentation
3. Choose both Ceph
4. Choose Sahara and Ceilometer installation
5. Add controller+ceph, compute+ceph, mongo
6. Start deployment. It failed

Error in puppet.log on controller node:

(/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) change from notrun to 0 failed: ceph-deploy osd prepare node-1:/dev/sdb2 node-1:/dev/sdc2 returned 1 instead of one of [0]

Error in puppet.log on compute node:

(/Stage[main]/Ceph::Conf/Exec[ceph-deploy gatherkeys remote]/returns) change from notrun to 0 failed: ceph-deploy gatherkeys node-1 returned 1 instead of one of [0]

Revision history for this message
Anastasia Palkina (apalkina) wrote :
Revision history for this message
Vladimir Grujic (hyperbaba) wrote :

The problem is in redeployment scenarios.
ceph-deploy creates a GPT partition on the OSD disk and tries to format it with mkfs.xfs.
mkfs.xfs itself checks for an already existing XFS filesystem on the partition and refuses to format it.
So it is not only LVM leftovers that break the Ceph recipe; this is a problem as well.
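The workaround implied above can be sketched with a file standing in for the real partition (all names here are illustrative, and the file substitute makes the commands safe to run): zeroing the head of the partition destroys any leftover superblock, so mkfs.xfs no longer refuses to format it.

```shell
# File-backed sketch: DEV is a stand-in for a real OSD partition such as /dev/sdb2.
DEV=./fake-partition.img

# Simulate leftover filesystem data from a previous deployment.
dd if=/dev/urandom of="$DEV" bs=1M count=4 2>/dev/null

# Wipe it: zero the region so no old superblock survives.
dd if=/dev/zero of="$DEV" bs=1M count=4 conv=notrunc 2>/dev/null

# On a real node the equivalent (destructive!) command would be something like:
#   dd if=/dev/zero of=/dev/sdb2 bs=1M count=10
# before re-running ceph-deploy osd prepare.
```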

Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #112
"build_id": "2014-07-10_00-39-56",
"mirantis": "yes",
"build_number": "112",
"ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f",
"nailgun_sha": "f5ff82558f99bb6ca7d5e1617eddddf7142fe857",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "293015843304222ead899270449495af91b06aed",
"astute_sha": "5df009e8eab611750309a4c5b5c9b0f7b9d85806",
"release": "5.0.1",
"fuellib_sha": "364dee37435cbdc85d6b814a61f57800b83bf22d"

1. Create new environment (CentOS, simple mode)
2. Choose VLAN segmentation
3. Choose Ceph for images
4. Add controller, compute, cinder, 2 ceph
5. Untag storage network and move it to other interface
6. Start deployment. It was successful
7. But there is an error in puppet.log on the ceph node (node-35):

2014-07-11 09:49:47 ERR

 (/Stage[main]/Ceph::Osd/Exec[ceph-deploy osd prepare]/returns) change from notrun to 0 failed: ceph-deploy osd prepare node-35:/dev/sdb4 node-35:/dev/sdc4 returned 1 instead of one of [0]

Revision history for this message
Anastasia Palkina (apalkina) wrote :
Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

/root/ceph.log on node-35:

2014-07-11 09:49:28,503 [node-35][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdb4
2014-07-11 09:49:31,873 [node-35][ERROR ] Traceback (most recent call last):
2014-07-11 09:49:31,874 [node-35][ERROR ] File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 126, in prepare_disk
2014-07-11 09:49:31,932 [node-35][ERROR ] File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py", line 10, in inner
2014-07-11 09:49:31,948 [node-35][ERROR ] def inner(*args, **kwargs):
2014-07-11 09:49:31,960 [node-35][ERROR ] File "/usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py", line 6, in remote_call
2014-07-11 09:49:31,982 [node-35][ERROR ] This allows us to only remote-execute the actual calls, not whole functions.
2014-07-11 09:49:31,990 [node-35][ERROR ] File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call
2014-07-11 09:49:32,003 [node-35][ERROR ] raise CalledProcessError(retcode, cmd)
2014-07-11 09:49:32,013 [node-35][ERROR ] CalledProcessError: Command '['ceph-disk-prepare', '--fs-type', 'xfs', '--cluster', 'ceph', '--', '/dev/sdb4']' returned non-zero exit status 1
2014-07-11 09:49:32,048 [node-35][ERROR ] mkfs.xfs: /dev/sdb4 contains a mounted filesystem
2014-07-11 09:49:32,048 [node-35][ERROR ] Usage: mkfs.xfs
...
2014-07-11 09:49:32,061 [node-35][ERROR ] ceph-disk: Error: Command '['/sbin/mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/sdb4']' returned non-zero exit status 1
2014-07-11 09:49:32,080 [ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdb4
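The "contains a mounted filesystem" line above is the actual failure: mkfs.xfs refuses a target that is currently mounted. A minimal pre-check sketch (the device path is hypothetical, and the /proc/mounts check is Linux-specific):

```shell
DEV=/dev/sdb4   # hypothetical device, matching the log above

# Check /proc/mounts for the device before attempting to format it.
STATE=$(grep -qs "^$DEV " /proc/mounts && echo mounted || echo "not mounted")
echo "$STATE"
# If mounted, it must be unmounted first:  umount "$DEV"
```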

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/5.1.x
Changed in fuel:
milestone: 5.0.1 → 5.1
Dmitry Ilyin (idv1985)
summary: - ceph-deploy osd prepare node-5:/dev/vdb2 returned 1 instead of one of
- [0]
+ [library] ceph-deploy osd prepare node-5:/dev/vdb2 returned 1 instead of
+ one of [0]
Revision history for this message
Ryan Moe (rmoe) wrote :

This seems to be an intermittent issue (I had to delete and redeploy the same node six times for this to show up). Sometimes when a node is deleted from the environment the individual partitions are not wiped.

This node had OSDs deployed on vdb and vdc and was then redeployed with nothing allocated to those disks. After provisioning you can see that the previous partition data still exists.

[root@node-16 ~]# parted /dev/vdc print
Model: Virtio Block Device (virtblk)
Disk /dev/vdc: 32.2GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  25.2MB  25.1MB               primary  bios_grub
 2      25.2MB  235MB   210MB                primary  boot
 3      235MB   445MB   210MB   ext2         primary  boot

[root@node-16 ~]# parted -a none -s /dev/vdc unit MiB mkpart primary 424 30580

[root@node-16 ~]# parted /dev/vdc print
Model: Virtio Block Device (virtblk)
Disk /dev/vdc: 32.2GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  25.2MB  25.1MB               primary  bios_grub
 2      25.2MB  235MB   210MB                primary  boot
 3      235MB   445MB   210MB   ext2         primary  boot
 4      445MB   32.1GB  31.6GB  xfs          primary

(the new partition has an XFS file system formatted on it)

[root@node-16 ~]# mount /dev/vdc4 /mnt
[root@node-16 ~]# ls /mnt/
activate.monmap active ceph_fsid current fsid journal keyring magic ready store_version superblock sysvinit whoami
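The parted session above can be condensed into a file-backed illustration of why this happens (file names are hypothetical; "XFSB" is the XFS superblock magic): deleting a partition only edits the partition table, while the bytes at its old offset survive, so a new partition created at the same offset inherits the old filesystem.

```shell
DISK=./fake-disk.img
truncate -s 8M "$DISK"

# "Format" an OSD partition that starts at the 1 MiB mark by writing
# the XFS superblock magic at that offset.
printf 'XFSB' | dd of="$DISK" bs=1 seek=1048576 conv=notrunc 2>/dev/null

# The partition is deleted and later recreated at the same 1 MiB offset...
# ...and the old superblock magic is still sitting there:
MAGIC=$(dd if="$DISK" bs=1 skip=1048576 count=4 2>/dev/null)
echo "$MAGIC"
```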

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/107793

Changed in fuel:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/107793
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=9f1e69aa3a2fe7a6093fa50596d6931826d93a09
Submitter: Jenkins
Branch: master

commit 9f1e69aa3a2fe7a6093fa50596d6931826d93a09
Author: Ryan Moe <email address hidden>
Date: Thu Jul 17 11:30:29 2014 -0700

    Wipe each partition for all disks when removing nodes

    If partitions are not wiped then old formatted filesystems
    will show up if a partition is created at the same point as
    the previous one. This will cause ceph-deploy to fail when
    bringing up an OSD. Ceph will see an XFS filesystem and the
    correct partition GUID and auto-mount the partition prior to
    activating the OSD.

    Change-Id: I2b71ec429ac1982f3df585423ca0818b294d8210
    Closes-bug: #1323343

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/5.0)

Fix proposed to branch: stable/5.0
Review: https://review.openstack.org/107826

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (stable/5.0)

Reviewed: https://review.openstack.org/107826
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=6db5f5031b74e67b92fcac1f7998eaa296d68025
Submitter: Jenkins
Branch: stable/5.0

commit 6db5f5031b74e67b92fcac1f7998eaa296d68025
Author: Ryan Moe <email address hidden>
Date: Thu Jul 17 11:30:29 2014 -0700

    Wipe each partition for all disks when removing nodes

    If partitions are not wiped then old formatted filesystems
    will show up if a partition is created at the same point as
    the previous one. This will cause ceph-deploy to fail when
    bringing up an OSD. Ceph will see an XFS filesystem and the
    correct partition GUID and auto-mount the partition prior to
    activating the OSD.

    Change-Id: I2b71ec429ac1982f3df585423ca0818b294d8210
    Closes-bug: #1323343
    (cherry picked from commit 9f1e69aa3a2fe7a6093fa50596d6931826d93a09)

Revision history for this message
Anastasia Palkina (apalkina) wrote :

Verified on ISO #347
"build_id": "2014-07-23_02-01-14",
"ostf_sha": "c1b60d4bcee7cd26823079a86e99f3f65414498e",
"build_number": "347",
"auth_required": false,
"api": "1.0",
"nailgun_sha": "f5775d6b7f5a3853b28096e8c502ace566e7041f",
"production": "docker",
"fuelmain_sha": "74b9200955201fe763526ceb51607592274929cd",
"astute_sha": "fd9b8e3b6f59b2727b1b037054f10e0dd7bd37f1",
"feature_groups": ["mirantis"],
"release": "5.1",
"fuellib_sha": "fb0e84c954a33c912584bf35054b60914d2a2360"

Changed in fuel:
status: Fix Committed → Fix Released