Clean-up of the first sector of a block device is not sufficient to fix #1783427 (pvcreate fails to create a PV because an HFS+ signature is detected)

Bug #1851049 reported by Nikolay Vinogradov
Affects: Ceph OSD Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

Reproduced again:

     2019-10-31 16:04:39 DEBUG secrets-storage-relation-changed partx: /dev/disk/by-dname/osddata5: failed to read partition table
     2019-10-31 16:04:39 DEBUG juju-log secrets-storage:253: get partitions: []
     2019-10-31 16:04:39 DEBUG secrets-storage-relation-changed Failed to find physical volume "/dev/bcache6".
     2019-10-31 16:04:39 WARNING juju-log secrets-storage:253: Support for use of upstream ``apt_pkg`` module in conjunction with charm-helpers is deprecated since 2019-06-25
     2019-10-31 16:04:40 DEBUG secrets-storage-relation-changed DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 172.20.10.53
     2019-10-31 16:04:40 DEBUG secrets-storage-relation-changed DEBUG:urllib3.connectionpool:http://172.20.10.53:8200 "POST /v1/auth/approle/login HTTP/1.1" 200 500
     2019-10-31 16:04:40 DEBUG secrets-storage-relation-changed INFO:vaultlocker.dmcrypt:LUKS formatting /dev/disk/by-dname/osddata5 using UUID:861482d7-f7a7-4d97-83ee-95e4b892ed86
     2019-10-31 16:04:43 DEBUG secrets-storage-relation-changed INFO:vaultlocker.dmcrypt:udevadm trigger block/add for /dev/disk/by-dname/osddata5
     2019-10-31 16:04:43 DEBUG secrets-storage-relation-changed INFO:vaultlocker.dmcrypt:udevadm settle /dev/disk/by-uuid/861482d7-f7a7-4d97-83ee-95e4b892ed86
     2019-10-31 16:04:43 DEBUG secrets-storage-relation-changed DEBUG:urllib3.connectionpool:http://172.20.10.53:8200 "PUT /v1/charm-vaultlocker/ucs-4b-block-2/861482d7-f7a7-4d97-83ee-95e4b892ed86 HTTP/1.1" 204 0
     2019-10-31 16:04:43 DEBUG secrets-storage-relation-changed DEBUG:urllib3.connectionpool:http://172.20.10.53:8200 "GET /v1/charm-vaultlocker/ucs-4b-block-2/861482d7-f7a7-4d97-83ee-95e4b892ed86 HTTP/1.1" 200 866
     2019-10-31 16:04:43 DEBUG secrets-storage-relation-changed INFO:vaultlocker.dmcrypt:LUKS opening 861482d7-f7a7-4d97-83ee-95e4b892ed86
     2019-10-31 16:04:45 DEBUG secrets-storage-relation-changed INFO:root:Enabling systemd unit for vaultlocker-decrypt@861482d7-f7a7-4d97-83ee-95e4b892ed86.service
     2019-10-31 16:04:45 DEBUG secrets-storage-relation-changed Created symlink /etc/systemd/system/multi-user.target.wants/vaultlocker-decrypt@861482d7-f7a7-4d97-83ee-95e4b892ed86.service → /lib/systemd/system/vaultlocker-decrypt@.service.
     2019-10-31 16:04:45 DEBUG secrets-storage-relation-changed 1+0 records in
     2019-10-31 16:04:45 DEBUG secrets-storage-relation-changed 1+0 records out
     2019-10-31 16:04:45 DEBUG secrets-storage-relation-changed 512 bytes copied, 0.000446392 s, 1.1 MB/s
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed Failed to find physical volume "/dev/mapper/crypt-861482d7-f7a7-4d97-83ee-95e4b892ed86".
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed WARNING: hfs signature detected on /dev/mapper/crypt-861482d7-f7a7-4d97-83ee-95e4b892ed86 at offset 1024. Wipe it? [y/n]: [n]
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed Aborted wiping of hfs.
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed 1 existing signature left on the device.
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed Traceback (most recent call last):
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-1/charm/hooks/secrets-storage-relation-changed", line 874, in <module>
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed hooks.execute(sys.argv)
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-1/charm/hooks/charmhelpers/core/hookenv.py", line 914, in execute
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed self._hooks[hook_name]()
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-1/charm/hooks/secrets-storage-relation-changed", line 776, in secrets_storage_changed
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed prepare_disks_and_activate()
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-1/charm/hooks/secrets-storage-relation-changed", line 546, in prepare_disks_and_activate
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed config('osd-encrypt-keymanager'))
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "lib/ceph/utils.py", line 1463, in osdize
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed bluestore, key_manager)
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "lib/ceph/utils.py", line 1538, in osdize_dev
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed key_manager)
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "lib/ceph/utils.py", line 1673, in _ceph_volume
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed key_manager=key_manager))
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "lib/ceph/utils.py", line 1923, in _allocate_logical_volume
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed lvm.create_lvm_physical_volume(pv_dev)
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-ceph-osd-1/charm/hooks/charmhelpers/contrib/storage/linux/lvm.py", line 92, in create_lvm_physical_volume
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed check_call(['pvcreate', block_device])
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed raise CalledProcessError(retcode, cmd)
     2019-10-31 16:04:46 DEBUG secrets-storage-relation-changed subprocess.CalledProcessError: Command '['pvcreate', '/dev/mapper/crypt-861482d7-f7a7-4d97-83ee-95e4b892ed86']' returned non-zero exit status 5.
     2019-10-31 16:04:46 ERROR juju.worker.uniter.operation runhook.go:132 hook "secrets-storage-relation-changed" failed: exit status 1

Revision history for this message
Nikolay Vinogradov (nikolay.vinogradov) wrote :

Charm version:
ubuntu@ucs-1a-infra-1:~$ juju export-bundle | grep ceph-osd
  ceph-osd:
    charm: cs:ceph-osd-294
...

Root cause: as reported by pvcreate, the signature was found at offset 1024 (WARNING: hfs signature detected on /dev/mapper/crypt-861482d7-f7a7-4d97-83ee-95e4b892ed86 at offset 1024. Wipe it? [y/n]: [n]), but the current solution only zeroes the first sector. wipefs would be a more generic solution (if it is available on the system).
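The suggested direction can be sketched roughly as follows: prefer `wipefs --all`, which knows about signatures at non-zero offsets, and fall back to zeroing a region larger than one sector so that offset 1024 is also covered. This is a hypothetical sketch, not the charm's actual code; the function name and fallback size are invented.

```python
import shutil
import subprocess


def wipe_signatures(block_device, fallback_bytes=4096):
    """Remove filesystem signatures from block_device before pvcreate.

    Prefers `wipefs --all`, which knows about signatures at non-zero
    offsets (e.g. the HFS+ volume header at byte offset 1024).  Falls
    back to zeroing the first `fallback_bytes` bytes, which still
    covers more than the single 512-byte sector the original clean-up
    wiped.
    """
    if shutil.which('wipefs'):
        subprocess.check_call(['wipefs', '--all', block_device])
    else:
        with open(block_device, 'r+b') as dev:
            dev.write(b'\0' * fallback_bytes)
```

Either branch clears the region pvcreate complained about, whereas zeroing only bytes 0-511 cannot.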

Nikolay Vinogradov (nikolay.vinogradov) wrote :

I have hidden my previous comment, as it was based on a misleading assumption about the LUKS header. Please disregard it.

Higher-level symptoms seen during a clean re-deployment of ceph-osd (note 7 OSDs on ceph-osd/1, though 8 were expected):
ubuntu@ucs-1a-infra-1:~$ juju status ceph-osd | grep ceph-osd
ceph-osd 12.2.12 active 3 ceph-osd jujucharms 294 ubuntu
ceph-osd/0 active idle 3 172.20.10.173 Unit is ready (8 OSD)
  telegraf/19 active idle 172.20.10.173 9103/tcp Monitoring ceph-osd/0
ceph-osd/1* active idle 4 172.20.10.109 Unit is ready (7 OSD)
  telegraf/16 active idle 172.20.10.109 9103/tcp Monitoring ceph-osd/1
ceph-osd/2 active idle 5 172.20.10.121 Unit is ready (8 OSD)
  telegraf/17 active idle 172.20.10.121 9103/tcp Monitoring ceph-osd/2

As reproductions are rare, and only 1 disk on 1 unit failed with this error, I suppose the problem might be connected with random garbage that looks like an HFS signature:

nickv@nickv-x1:~/test-luks$ fallocate -l 4G take1
nickv@nickv-x1:~/test-luks$ md5sum take1
c9a5a6878d97b48cc965c1e41859f034 take1
nickv@nickv-x1:~/test-luks$ fallocate -l 4G take2
nickv@nickv-x1:~/test-luks$ md5sum take2
c9a5a6878d97b48cc965c1e41859f034 take2
nickv@nickv-x1:~/test-luks$ sudo cryptsetup -y luksFormat take1
[sudo] password for nickv:

WARNING!
========
This will overwrite data on take1 irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase for take1:
Verify passphrase:
nickv@nickv-x1:~/test-luks$ sudo cryptsetup luksOpen take1 take1
Enter passphrase for take1:
nickv@nickv-x1:~/test-luks$ sudo cryptsetup -y luksFormat take2

WARNING!
========
This will overwrite data on take2 irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase for take2:
Verify passphrase:
nickv@nickv-x1:~/test-luks$ sudo cryptsetup luksOpen take2 take2
Enter passphrase for take2:
nickv@nickv-x1:~/test-luks$ sudo losetup | grep take
/dev/loop51 0 0 1 0 /home/nickv/test-luks/take2 0 512
/dev/loop46 0 0 1 0 /home/nickv/test-luks/take1 0 512
nickv@nickv-x1:~/test-luks$ sudo dd if=/dev/loop46 of=take-1-first-2-sectors bs=512 count=2
2+0 records in
2+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00012089 s, 8.5 MB/s
nickv@nickv-x1:~/test-luks$ sudo dd if=/dev/loop51 of=take-2-first-2-sectors bs=512 count=2
2+0 records in
2+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 9.6519e-05 s, 10.6 MB/s
nickv@nickv-x1:~/test-luks$ md5sum take-1-first-2-sectors
6fe162881271a8ef82fcc27f70777615 take-1-first-2-sectors
nickv@nickv-x1:~/test-luks$ md5sum take-2-first-2-sectors
ab97acd35f059e5b160a254f6b04893c take-2-first-2-sectors

Although both block devices were created from identical 4G files of zeroes, we got different contents in the first 2 sectors after decryption. So random garbage after decryption can potentially look like an FS signature, which means the decrypted partition must be wiped after it is opened for the first time. The code that does this: https://opendev.org/openstack/charms.ceph/commit/31ffa13124edce3490437b73fee5f0432c5eb7f8 (https://opendev.org/openstack/charms.ceph/src/commit/31ffa13124edce3490437b73fee5f0432c5eb7f8...
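The offset in the pvcreate warning is the key detail: the HFS-family volume header lives at byte offset 1024, i.e. in the third 512-byte sector, so zeroing only the first sector can never touch it. A minimal sketch of the check (signature values are those libblkid recognises; the function name is mine):

```python
# HFS-family signatures, all at byte offset 1024: classic HFS is
# "BD", HFS+ is "H+", HFSX is "HX".
HFS_SIGNATURES = (b'BD', b'H+', b'HX')
HFS_OFFSET = 1024


def looks_like_hfs(device_path):
    """Return True if the two bytes at offset 1024 match an HFS-family
    signature -- exactly what pvcreate warned about, and exactly the
    region a first-sector-only wipe never touches."""
    with open(device_path, 'rb') as dev:
        dev.seek(HFS_OFFSET)
        return dev.read(2) in HFS_SIGNATURES
```

With roughly 1 KiB of effectively random plaintext preceding that offset on a freshly opened LUKS device, a 2-byte collision at offset 1024 is unlikely but possible, which matches how rarely this reproduces.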


Changed in charm-ceph-osd:
status: New → Triaged
importance: Undecided → Medium