Bug #1828617 “Hosts randomly 'losing' disks, breaking ceph-osd s...” : Bugs : ceph package : Ubuntu

Revision history for this message

Launchpad Janitor (janitor) wrote on 2019-05-13:

#1

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status:	New → Confirmed

Revision history for this message

David A. Desrosiers (setuid) wrote on 2019-05-13:

#2

This manifests itself as the following, as reported by lsblk(1). Note the missing Ceph LVM volume on the 6th NVME disk:

$ cat sos_commands/block/lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
|-sda1 8:1 0 512M 0 part /boot/efi
`-sda2 8:2 0 1.8T 0 part
|-foobar--vg-root 253:0 0 1.8T 0 lvm /
`-foobar--vg-swap_1 253:1 0 976M 0 lvm [SWAP]
nvme0n1 259:0 0 1.8T 0 disk
`-ceph--c576f63e--dfd4--48f7--9d60--6a7708cbccf6-osd--block--9fdd78b2--0745--47ae--b8d4--04d9803ab448 253:6 0 1.8T 0 lvm
nvme1n1 259:1 0 1.8T 0 disk
`-ceph--6eb6565f--6392--44a8--9213--833b09f7c0bc-osd--block--a7d3629c--724f--4218--9d15--593ec64781da 253:5 0 1.8T 0 lvm
nvme2n1 259:2 0 1.8T 0 disk
`-ceph--c14f9ee5--90d0--4306--9b18--99576516f76a-osd--block--bbf5bc79--edea--4e43--8414--b5140b409397 253:4 0 1.8T 0 lvm
nvme3n1 259:3 0 1.8T 0 disk
`-ceph--a821146b--7674--4bcc--b5e9--0126c4bd5e3b-osd--block--b9371499--ff99--4d3e--ab3f--62ec3cf918c4 253:3 0 1.8T 0 lvm
nvme4n1 259:4 0 1.8T 0 disk
`-ceph--2e39f75a--5d2a--49ee--beb1--5d0a2991fd6c-osd--block--a1be083e--1fa7--4397--acfa--2ff3d3491572 253:2 0 1.8T 0 lvm
nvme5n1 259:5 0 1.8T 0 disk

This manifests itself as the following, as reported by lsblk(1). Note the missing Ceph LVM volume on the 6th NVME disk:

$ cat sos_commands/block/lsblk
NAME                                                                                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0  1.8T  0 disk 
|-sda1                                                                                                  8:1    0  512M  0 part /boot/efi
`-sda2                                                                                                  8:2    0  1.8T  0 part 
  |-foobar--vg-root                                                                              253:0    0  1.8T  0 lvm  /
  `-foobar--vg-swap_1                                                                            253:1    0  976M  0 lvm  [SWAP]
nvme0n1                                                                                               259:0    0  1.8T  0 disk 
`-ceph--c576f63e--dfd4--48f7--9d60--6a7708cbccf6-osd--block--9fdd78b2--0745--47ae--b8d4--04d9803ab448 253:6    0  1.8T  0 lvm  
nvme1n1                                                                                               259:1    0  1.8T  0 disk 
`-ceph--6eb6565f--6392--44a8--9213--833b09f7c0bc-osd--block--a7d3629c--724f--4218--9d15--593ec64781da 253:5    0  1.8T  0 lvm  
nvme2n1                                                                                               259:2    0  1.8T  0 disk 
`-ceph--c14f9ee5--90d0--4306--9b18--99576516f76a-osd--block--bbf5bc79--edea--4e43--8414--b5140b409397 253:4    0  1.8T  0 lvm  
nvme3n1                                                                                               259:3    0  1.8T  0 disk 
`-ceph--a821146b--7674--4bcc--b5e9--0126c4bd5e3b-osd--block--b9371499--ff99--4d3e--ab3f--62ec3cf918c4 253:3    0  1.8T  0 lvm  
nvme4n1                                                                                               259:4    0  1.8T  0 disk 
`-ceph--2e39f75a--5d2a--49ee--beb1--5d0a2991fd6c-osd--block--a1be083e--1fa7--4397--acfa--2ff3d3491572 253:2    0  1.8T  0 lvm  
nvme5n1                                                                                               259:5    0  1.8T  0 disk

Xav Paice (xavpaice) on 2019-05-22

tags:

added: canonical-bootstack

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-05-22:

#3

I'm seeing this in a slightly different manner, on Bionic/Queens.

We have LVMs encrypted (thanks Vault), and rebooting a host results in at least one OSD not returning fairly consistently. The LVs appear in the list, however the difference between a working and a non-working OSD is the lack of links to block.db and block.wal on a non-working OSD.

See https://pastebin.canonical.com/p/rW3VgMMkmY/ for some info.

If I made the links manually:

cd /var/lib/ceph/osd/ceph-4
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.db
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.wal

This resulted in a perms error accessing the device "bluestore(/var/lib/ceph/osd/ceph-4) _open_db /var/lib/ceph/osd/ceph-4/block.db symlink exists but target unusable: (13) Permission denied"

ls -l /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/
total 0
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-20
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-24
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-14
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-12
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-22
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-18
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-16
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-19
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-23
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-13
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-11
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-21
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-17
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-15

I tried to change the perms to ceph.ceph ownership, but no change.

I have also tried (using `systemctl edit lvm2-monitor.service`) adding the following to lvm2, but that's not changed the behavior either:

# cat /etc/systemd/system/lvm2-monitor.service.d/override.conf
[Service]
ExecStartPre=/bin/sleep 60

I'm seeing this in a slightly different manner, on Bionic/Queens.

We have LVMs encrypted (thanks Vault), and rebooting a host results in at least one OSD not returning fairly consistently.  The LVs appear in the list, however the difference between a working and a non-working OSD is the lack of links to block.db and block.wal on a non-working OSD.

See https://pastebin.canonical.com/p/rW3VgMMkmY/ for some info.

If I made the links manually:

cd /var/lib/ceph/osd/ceph-4
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.db
ln -s /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 block.wal

This resulted in a perms error accessing the device "bluestore(/var/lib/ceph/osd/ceph-4) _open_db /var/lib/ceph/osd/ceph-4/block.db symlink exists but target unusable: (13) Permission denied"

ls -l /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/
total 0
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-20
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-24
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-14
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-12
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-22
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-db-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-18
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-db-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-16
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d -> ../dm-19
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e -> ../dm-23
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-33de740d-bd8c-4b47-a601-3e6e634e489a -> ../dm-13
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 -> ../dm-11
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076 -> ../dm-21
lrwxrwxrwx 1 root root 8 May 22 23:04 osd-wal-d38a7e91-cf06-4607-abbe-53eac89ac5ea -> ../dm-17
lrwxrwxrwx 1 ceph ceph 8 May 22 23:04 osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9 -> ../dm-15

I tried to change the perms to ceph.ceph ownership, but no change.

I have also tried (using `systemctl edit lvm2-monitor.service`) adding the following to lvm2, but that's not changed the behavior either:

# cat /etc/systemd/system/lvm2-monitor.service.d/override.conf 
[Service]
ExecStartPre=/bin/sleep 60

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-05-22:

#4

Added field-critical, there's a cloud deploy ongoing where I currently can't reboot any hosts, nor get some of the OSDs back from a host I rebooted, until we have a workaround.

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-05-23:

#5

Just one update, if I change the perms of the symlink made (chown -h) the OSD will actually start.

After rebooting, however, I found that the links I had made had gone again and the whole process needed repeating in order to start the OSD.

Revision history for this message

Steve Langasek (vorlon) wrote on 2019-05-28:

#6

> LVM module is supposed to create PVs from devices using the links in /dev/disk/by-dname/
> folder that are created by udev.

Created by udev how? disk/by-dname is not part of the hierarchy that is populated by the standard udev rules, nor is this created by lvm2. Is there something in the ceph-osd packaging specifically which generates these links - and, in turn, depends on them for assembling LVs?

Can you provide udev logs (journalctl --no-pager -lu systemd-udevd.service; udevadm info -e) from the system following a boot when this race is hit?

Changed in systemd (Ubuntu):
status:	Confirmed → Incomplete

Revision history for this message

Andrey Grebennikov (agrebennikov) wrote on 2019-05-28:

#7

Steve,
This is MAAS who creates these udev rules. We requested this feature to be implemented in order to be able to use persistent names in further services configuration (using templating). We couldn't go with /dev/sdX names as they may change after the reboot, and can't use wwn names as they are unique per node and don't allow us to use templates with FCB.

Revision history for this message

James Page (james-page) wrote on 2019-05-28:

#8

by-dname udev rules are created by MAAS/curtin as part of the server install I think.

Revision history for this message

James Page (james-page) wrote on 2019-05-28:

#9

The ceph-osd package provide udev rules which should switch the owner for all ceph related LVM VG's to ceph:ceph.

# OSD LVM layout example
# VG prefix: ceph-
# LV prefix: osd-
ACTION=="add", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="disk", \
  ENV{DM_LV_NAME}=="osd-*", \
  ENV{DM_VG_NAME}=="ceph-*", \
  OWNER:="ceph", GROUP:="ceph", MODE:="660"
ACTION=="change", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="disk", \
  ENV{DM_LV_NAME}=="osd-*", \
  ENV{DM_VG_NAME}=="ceph-*", \
  OWNER="ceph", GROUP="ceph", MODE="660"

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-28:

#10

This feels similar to https://bugs.launchpad.net/charm-ceph-osd/+bug/1812925. First question, are you running with the latest stable charms which have the fix for that bug?

Revision history for this message

James Page (james-page) wrote on 2019-05-28:

#11

Please can you confirm which version of the ceph-osd package you have installed; older versions rely on a charm shipped udev ruleset, rather than it being provided by the packaging.

Revision history for this message

Andrey Grebennikov (agrebennikov) wrote on 2019-05-28:

#12

Yes, it is latest - the cluster is being re-deployed as part of Bootstack handover.

Corey,
The bug you point to is fixing the sequence of ceph/udev. Here however udev can't create any devices as they don't exist at the moment of udev run seems so - when the host boots and settles down - there is no PVs exist at all.

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-28:

#13

Andrey, I don't know if you saw James' comment as yours may have coincided but if you can get the ceph-osd package version that would be helpful. Thanks!

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-05-28:

#14

Charm is cs:ceph-osd-284
Ceph version is 12.2.11-0ubuntu0.18.04.2

The udev rules are created by curtin during the maas install.

Here's an example udev rule:

cat bcache4.rules

# Written by curtin
SUBSYSTEM=="block", ACTION=="add|change", ENV{CACHED_UUID}=="7b0e872b-ac78-4c4e-af18-8ccdce5962f6", SYMLINK+="disk/by-dname/bcache4"

The problem here is that when the host boots, for some OSDs (random, changes each boot), there's no symlinks for block.db and block.wal in /var/lib/ceph/osd/ceph-${thing}. If I manually create those two symlinks (and make sure the perms are right for the links themselves), then the OSD starts.

Some of the OSDs do get those links though, and that's interesting because on these hosts, the ceph wal and db for all the OSDs are LVs on the same nvme device, in fact the same partition even. The ceph OSD block dev is an LV on a different device.

Changed in systemd (Ubuntu):
status:	Incomplete → New

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-05-28:

#15

1828617-1.out Edit (27.9 KiB, text/plain)

Download full text (11.3 KiB)

journalctl --no-pager -lu systemd-udevd.service >/tmp/1828617-1.out

Hostname obfusticated

lsblk:

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 88.4M 1 loop /snap/core/6964
loop1 7:1 0 89.4M 1 loop /snap/core/6818
loop2 7:2 0 8.4M 1 loop /snap/canonical-livepatch/77
sda 8:0 0 1.8T 0 disk
├─sda1 8:1 0 476M 0 part /boot/efi
├─sda2 8:2 0 3.7G 0 part /boot
└─sda3 8:3 0 1.7T 0 part
  └─bcache7 252:896 0 1.7T 0 disk /
sdb 8:16 0 1.8T 0 disk
└─bcache0 252:0 0 1.8T 0 disk
sdc 8:32 0 1.8T 0 disk
└─bcache6 252:768 0 1.8T 0 disk
  └─crypt-7478edfc-f321-40a2-a105-8e8a2c8ca3f6 253:0 0 1.8T 0 crypt
    └─ceph--7478edfc--f321--40a2--a105--8e8a2c8ca3f6-osd--block--7478edfc--f321--40a2--a105--8e8a2c8ca3f6 253:2 0 1.8T 0 lvm
sdd 8:48 0 1.8T 0 disk
└─bcache4 252:512 0 1.8T 0 disk
  └─crypt-33de740d-bd8c-4b47-a601-3e6e634e489a 253:4 0 1.8T 0 crypt
    └─ceph--33de740d--bd8c--4b47--a601--3e6e634e489a-osd--block--33de740d--bd8c--4b47--a601--3e6e634e489a 253:5 0 1.8T 0 lvm
sde 8:64 0 1.8T 0 disk
└─bcache3 252:384 0 1.8T 0 disk
  └─crypt-eb5270dc-1110-420f-947e-aab7fae299c9 253:1 ...

journalctl --no-pager -lu systemd-udevd.service >/tmp/1828617-1.out

Hostname obfusticated

lsblk:

NAME                                                                                                         MAJ:MIN  RM   SIZE RO TYPE  MOUNTPOINT
loop0                                                                                                          7:0     0  88.4M  1 loop  /snap/core/6964
loop1                                                                                                          7:1     0  89.4M  1 loop  /snap/core/6818
loop2                                                                                                          7:2     0   8.4M  1 loop  /snap/canonical-livepatch/77
sda                                                                                                            8:0     0   1.8T  0 disk  
├─sda1                                                                                                         8:1     0   476M  0 part  /boot/efi
├─sda2                                                                                                         8:2     0   3.7G  0 part  /boot
└─sda3                                                                                                         8:3     0   1.7T  0 part  
  └─bcache7                                                                                                  252:896   0   1.7T  0 disk  /
sdb                                                                                                            8:16    0   1.8T  0 disk  
└─bcache0                                                                                                    252:0     0   1.8T  0 disk  
sdc                                                                                                            8:32    0   1.8T  0 disk  
└─bcache6                                                                                                    252:768   0   1.8T  0 disk  
  └─crypt-7478edfc-f321-40a2-a105-8e8a2c8ca3f6                                                               253:0     0   1.8T  0 crypt 
    └─ceph--7478edfc--f321--40a2--a105--8e8a2c8ca3f6-osd--block--7478edfc--f321--40a2--a105--8e8a2c8ca3f6    253:2     0   1.8T  0 lvm   
sdd                                                                                                            8:48    0   1.8T  0 disk  
└─bcache4                                                                                                    252:512   0   1.8T  0 disk  
  └─crypt-33de740d-bd8c-4b47-a601-3e6e634e489a                                                               253:4     0   1.8T  0 crypt 
    └─ceph--33de740d--bd8c--4b47--a601--3e6e634e489a-osd--block--33de740d--bd8c--4b47--a601--3e6e634e489a    253:5     0   1.8T  0 lvm   
sde                                                                                                            8:64    0   1.8T  0 disk  
└─bcache3                                                                                                    252:384   0   1.8T  0 disk  
  └─crypt-eb5270dc-1110-420f-947e-aab7fae299c9                                                               253:1     0   1.8T  0 crypt 
    └─ceph--eb5270dc--1110--420f--947e--aab7fae299c9-osd--block--eb5270dc--1110--420f--947e--aab7fae299c9    253:3     0   1.8T  0 lvm   
sdf                                                                                                            8:80    0   1.8T  0 disk  
└─bcache1                                                                                                    252:128   0   1.8T  0 disk  
  └─crypt-d38a7e91-cf06-4607-abbe-53eac89ac5ea                                                               253:6     0   1.8T  0 crypt 
    └─ceph--d38a7e91--cf06--4607--abbe--53eac89ac5ea-osd--block--d38a7e91--cf06--4607--abbe--53eac89ac5ea    253:7     0   1.8T  0 lvm   
sdg                                                                                                            8:96    0   1.8T  0 disk  
└─bcache5                                                                                                    252:640   0   1.8T  0 disk  
  └─crypt-053e000a-76ed-427e-98b3-e5373e263f2d                                                               253:8     0   1.8T  0 crypt 
    └─ceph--053e000a--76ed--427e--98b3--e5373e263f2d-osd--block--053e000a--76ed--427e--98b3--e5373e263f2d    253:9     0   1.8T  0 lvm   
sdh                                                                                                            8:112   0   1.8T  0 disk  
└─bcache8                                                                                                    252:1024  0   1.8T  0 disk  
  └─crypt-c2669da2-63aa-42e2-b049-cf00a478e076                                                               253:25    0   1.8T  0 crypt 
    └─ceph--c2669da2--63aa--42e2--b049--cf00a478e076-osd--block--c2669da2--63aa--42e2--b049--cf00a478e076    253:28    0   1.8T  0 lvm   
sdi                                                                                                            8:128   0   1.8T  0 disk  
└─bcache2                                                                                                    252:256   0   1.8T  0 disk  
  └─crypt-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e                                                               253:26    0   1.8T  0 crypt 
    └─ceph--12e68fcb--d2b6--459f--97f2--d3eb4e28c75e-osd--block--12e68fcb--d2b6--459f--97f2--d3eb4e28c75e    253:27    0   1.8T  0 lvm   
sdj                                                                                                            8:144   0 223.5G  0 disk  
nvme0n1                                                                                                      259:1     0   1.5T  0 disk  
├─nvme0n1p1                                                                                                  259:2     0  93.1G  0 part  
│ └─bcache7                                                                                                  252:896   0   1.7T  0 disk  /
├─nvme0n1p2                                                                                                  259:3     0 312.9G  0 part  
│ └─crypt-5b0d2371-6d71-4685-9d06-f28678f81633                                                               253:10    0 312.9G  0 crypt 
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--wal--7478edfc--f321--40a2--a105--8e8a2c8ca3f6 253:11    0   1.9G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--db--7478edfc--f321--40a2--a105--8e8a2c8ca3f6  253:12    0  36.3G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--wal--33de740d--bd8c--4b47--a601--3e6e634e489a 253:13    0   1.9G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--db--33de740d--bd8c--4b47--a601--3e6e634e489a  253:14    0  36.3G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--wal--eb5270dc--1110--420f--947e--aab7fae299c9 253:15    0   1.9G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--db--eb5270dc--1110--420f--947e--aab7fae299c9  253:16    0  36.3G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--wal--d38a7e91--cf06--4607--abbe--53eac89ac5ea 253:17    0   1.9G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--db--d38a7e91--cf06--4607--abbe--53eac89ac5ea  253:18    0  36.3G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--wal--053e000a--76ed--427e--98b3--e5373e263f2d 253:19    0   1.9G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--db--053e000a--76ed--427e--98b3--e5373e263f2d  253:20    0  36.3G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--wal--c2669da2--63aa--42e2--b049--cf00a478e076 253:21    0   1.9G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--db--c2669da2--63aa--42e2--b049--cf00a478e076  253:22    0  36.3G  0 lvm   
│   ├─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--wal--12e68fcb--d2b6--459f--97f2--d3eb4e28c75e 253:23    0   1.9G  0 lvm   
│   └─ceph--wal--4de27554--2d05--440e--874a--9921dfc6f47e-osd--db--12e68fcb--d2b6--459f--97f2--d3eb4e28c75e  253:24    0  36.3G  0 lvm   
└─nvme0n1p3                                                                                                  259:4     0   1.1T  0 part  
  ├─bcache0                                                                                                  252:0     0   1.8T  0 disk  
  ├─bcache1                                                                                                  252:128   0   1.8T  0 disk  
  │ └─crypt-d38a7e91-cf06-4607-abbe-53eac89ac5ea                                                             253:6     0   1.8T  0 crypt 
  │   └─ceph--d38a7e91--cf06--4607--abbe--53eac89ac5ea-osd--block--d38a7e91--cf06--4607--abbe--53eac89ac5ea  253:7     0   1.8T  0 lvm   
  ├─bcache2                                                                                                  252:256   0   1.8T  0 disk  
  │ └─crypt-12e68fcb-d2b6-459f-97f2-d3eb4e28c75e                                                             253:26    0   1.8T  0 crypt 
  │   └─ceph--12e68fcb--d2b6--459f--97f2--d3eb4e28c75e-osd--block--12e68fcb--d2b6--459f--97f2--d3eb4e28c75e  253:27    0   1.8T  0 lvm   
  ├─bcache3                                                                                                  252:384   0   1.8T  0 disk  
  │ └─crypt-eb5270dc-1110-420f-947e-aab7fae299c9                                                             253:1     0   1.8T  0 crypt 
  │   └─ceph--eb5270dc--1110--420f--947e--aab7fae299c9-osd--block--eb5270dc--1110--420f--947e--aab7fae299c9  253:3     0   1.8T  0 lvm   
  ├─bcache4                                                                                                  252:512   0   1.8T  0 disk  
  │ └─crypt-33de740d-bd8c-4b47-a601-3e6e634e489a                                                             253:4     0   1.8T  0 crypt 
  │   └─ceph--33de740d--bd8c--4b47--a601--3e6e634e489a-osd--block--33de740d--bd8c--4b47--a601--3e6e634e489a  253:5     0   1.8T  0 lvm   
  ├─bcache5                                                                                                  252:640   0   1.8T  0 disk  
  │ └─crypt-053e000a-76ed-427e-98b3-e5373e263f2d                                                             253:8     0   1.8T  0 crypt 
  │   └─ceph--053e000a--76ed--427e--98b3--e5373e263f2d-osd--block--053e000a--76ed--427e--98b3--e5373e263f2d  253:9     0   1.8T  0 lvm   
  ├─bcache6                                                                                                  252:768   0   1.8T  0 disk  
  │ └─crypt-7478edfc-f321-40a2-a105-8e8a2c8ca3f6                                                             253:0     0   1.8T  0 crypt 
  │   └─ceph--7478edfc--f321--40a2--a105--8e8a2c8ca3f6-osd--block--7478edfc--f321--40a2--a105--8e8a2c8ca3f6  253:2     0   1.8T  0 lvm   
  └─bcache8                                                                                                  252:1024  0   1.8T  0 disk  
    └─crypt-c2669da2-63aa-42e2-b049-cf00a478e076                                                             253:25    0   1.8T  0 crypt 
      └─ceph--c2669da2--63aa--42e2--b049--cf00a478e076-osd--block--c2669da2--63aa--42e2--b049--cf00a478e076  253:28    0   1.8T  0 lvm

Currently osd.11 and osd.24 are down.

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-05-28:

#16

1828617-2.out Edit (561.0 KiB, text/plain)

Download full text (4.7 KiB)

udevadm info -e >/tmp/1828617-2.out

~# ls -l /var/lib/ceph/osd/ceph*
-rw------- 1 ceph ceph 69 May 21 08:44 /var/lib/ceph/osd/ceph.client.osd-upgrade.keyring

/var/lib/ceph/osd/ceph-11:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-33de740d-bd8c-4b47-a601-3e6e634e489a/osd-block-33de740d-bd8c-4b47-a601-3e6e634e489a
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-18:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-eb5270dc-1110-420f-947e-aab7fae299c9/osd-block-eb5270dc-1110-420f-947e-aab7fae299c9
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-eb5270dc-1110-420f-947e-aab7fae299c9
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-eb5270dc-1110-420f-947e-aab7fae299c9
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-24:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-d38a7e91-cf06-4607-abbe-53eac89ac5ea/osd-block-d38a7e91-cf06-4607-abbe-53eac89ac5ea
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-31:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-053e000a-76ed-427e-98b3-e5373e263f2d/osd-block-053e000a-76ed-427e-98b3-e5373e263f2d
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-053e000a-76ed-427e-98b3-e5373e263f2d
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-053e000a-76ed-427e-98b3-e5373e263f2d
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph 6 May 28 22:12 ready
-rw------- 1 ceph ceph 10 May 28 22:12 type
-rw------- 1 ceph ceph 3 May 28 22:12 whoami

/var/lib/ceph/osd/ceph-38:
total 24
lrwxrwxrwx 1 ceph ceph 93 May 28 22:12 block -> /dev/ceph-c2669da2-63aa-42e2-b049-cf00a478e076/osd-block-c2669da2-63aa-42e2-b049-cf00a478e076
lrwxrwxrwx 1 ceph ceph 94 May 28 22:12 block.db -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-db-c2669da2-63aa-42e2-b049-cf00a478e076
lrwxrwxrwx 1 ceph ceph 95 May 28 22:12 block.wal -> /dev/ceph-wal-4de27554-2d05-440e-874a-9921dfc6f47e/osd-wal-c2669da2-63aa-42e2-b049-cf00a478e076
-rw------- 1 ceph ceph 37 May 28 22:12 ceph_fsid
-rw------- 1 ceph ceph 37 May 28 22:12 fsid
-rw------- 1 ceph ceph 56 May 28 22:12 keyring
-rw------- 1 ceph ceph ...

I didn't recreate this but I did get a deployment on serverstack with bluestore WAL and DB devices. That's done with:

1) juju deploy --series bionic --num-units 1 --constraints mem=2G --config expected-osd-count=1 --config monitor-count=1 cs:ceph-mon ceph-mon

2) juju deploy --series bionic --num-units 1 --constraints mem=2G --storage osd-devices=cinder,10G --storage bluestore-wal=cinder,1G --storage bluestore-db=cinder,1G cs:ceph-osd ceph-osd

3) juju add-relation ceph-osd ceph-mon

James Page mentioned taking a look at the systemd bits.

ceph-osd systemd unit
---------------------
/lib/systemd/system/ceph-osd@.service calls:
ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i

Where /usr/lib/ceph/ceph-osd-prestart.sh has some logic that exits with an error code when certain things aren't ready. I think we might be able to add something in there. For example it currently has:

data="/var/lib/ceph/osd/${cluster:-ceph}-$id"

if [ -L "$journal" -a ! -e "$journal" ]; then
    udevadm settle --timeout=5 || :
    if [ -L "$journal" -a ! -e "$journal" ]; then
        echo "ceph-osd(${cluster:-ceph}-$id): journal not present, not starting yet." 1>&2
        exit 0
    fi
fi

The 'udevadm settle' watches the udev event queue and exists if all current events are handled or if it's been 5 seconds. Perhaps we can do something similar for this issue.

Here's what I see in /var/log/ceph/ceph-osd.0.log during a system reboot:
-------------------------------------------------------------------------
2019-05-29 19:04:25.800237 7fa6940d1700  1 freelist shutdown
...
2019-05-29 19:04:25.800548 7fa6940d1700  1 bdev(0x557eca7a1680 /var/lib/ceph/osd/ceph-0/block.wal) close
2019-05-29 19:04:26.079227 7fa6940d1700  1 bdev(0x557eca7a1200 /var/lib/ceph/osd/ceph-0/block.db) close
2019-05-29 19:04:26.266085 7fa6940d1700  1 bdev(0x557eca7a1440 /var/lib/ceph/osd/ceph-0/block) close
2019-05-29 19:04:26.474086 7fa6940d1700  1 bdev(0x557eca7a0fc0 /var/lib/ceph/osd/ceph-0/block) close
...
2019-05-29 19:04:53.601570 7fdd2ec17e40  1 bdev create path /var/lib/ceph/osd/ceph-0/block.db type kernel
2019-05-29 19:04:53.601581 7fdd2ec17e40  1 bdev(0x561e50583200 /var/lib/ceph/osd/ceph-0/block.db) open path /var/lib/ceph/osd/ceph-0/block.db
2019-05-29 19:04:53.601855 7fdd2ec17e40  1 bdev(0x561e50583200 /var/lib/ceph/osd/ceph-0/block.db) open size 1073741824 (0x40000000, 1GiB) block_size 4096 (4KiB) rotational
2019-05-29 19:04:53.601867 7fdd2ec17e40  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 1GiB
2019-05-29 19:04:53.602131 7fdd2ec17e40  1 bdev create path /var/lib/ceph/osd/ceph-0/block type kernel
2019-05-29 19:04:53.602143 7fdd2ec17e40  1 bdev(0x561e50583440 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2019-05-29 19:04:53.602464 7fdd2ec17e40  1 bdev(0x561e50583440 /var/lib/ceph/osd/ceph-0/block) open size 10733223936 (0x27fc00000, 10.0GiB) block_size 4096 (4KiB) rotational
2019-05-29 19:04:53.602480 7fdd2ec17e40  1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 10.0GiB
2019-05-29 19:04:53.602499 7fdd2ec17e40  1 bdev create path /var/lib/ceph/osd/ceph-0/block.wal type kernel
2019-05-29 19:04:53.602502 7fdd2ec17e40  1 bdev(0x561e50583680 /var/lib/ceph/osd/ceph-0/block.wal) open path /var/lib/ceph/osd/ceph-0/block.wal
2019-05-29 19:04:53.602709 7fdd2ec17e40  1 bdev(0x561e50583680 /var/lib/ceph/osd/ceph-0/block.wal) open size 100663296 (0x6000000, 96MiB) block_size 4096 (4KiB) rotational
2019-05-29 19:04:53.602717 7fdd2ec17e40  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 96MiB
...

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-29:

#20

Couple typos in comment #19:
I think bluestore-wal and bluestore-db needed 2G.
Also s/exists/exits

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-29:

#21

I'm building a test package for ceph with additional logic added to /usr/lib/ceph/ceph-osd-prestart.sh to allow block.wal and block.db additional time to settle. This is just a version to test the fix. I'm not sure if the behavior is the same as journal file (symlink exists but file doesn't) but that's what I have in this change. Here's the PPA: https://launchpad.net/~corey.bryant/+archive/ubuntu/bionic-queens-1828617/+packages

Xav, Any chance you could try this out once it builds?

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-05-29:

#22

Thanks, will do. FWIW, the symlinks are in place before reboot.

Revision history for this message

Wouter van Bommel (woutervb) wrote on 2019-05-30:

#23

Hi,

Installed the packages from the above ppa, rebooted the host and 4 out of 7 osd's came up. The 3 that where missing from the `ceph osd tree` where not running the osd daemon as they lacked the symlinks to the db and the wal.

Rebooted the server, and after the reboot other osd's (again 3 out of 7) failed to start due to missing symlinks. This time it where other osd's. So the issue is not fixed with the deb's in the ppa.

Regards,
Wouter

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-30:

#24

@Wouter, Thanks for testing. I'm rebuilding the package without the checks as they're probably preventing the udevadm settle from running. In the new build the 'udevadm settle --timeout=5' will run regardless. Let's see if that helps and then we can fine tune the checks surrounding the call later. Would you mind trying again once that builds (same PPA)?

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-30:

#25

@Wouter, since ceph takes so long to build you could also manually add 'udevadm settle --timeout=5' to /usr/lib/ceph/ceph-osd-prestart.sh across the ceph-osd units to test that.

Revision history for this message

James Page (james-page) wrote on 2019-05-30:

#26

The ceph-volume tool assembles and primes the OSD directory using the LV tags written during the prepare action - it would be good to validate these are OK with 'sudo lvs -o lv_tags'

The tags will contain UUID information about all of the block devices associated with an OSD.

Revision history for this message

James Page (james-page) wrote on 2019-05-30:

#27

Any output in /var/log/ceph/ceph-volume-systemd.log would also be useful

Revision history for this message

James Page (james-page) wrote on 2019-05-30:

#28

Some further references:

Each part of the OSD is queried for its underlying block device using blkid:

https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L114

I guess that if the block device was not visible/present at the point that code runs during activate, then the symlink for the block.db or block.wal devices would not be created, causing the OSD to fail to start.

Revision history for this message

James Page (james-page) wrote on 2019-05-30:

#29

Referenced from:

https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L154

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-30:

#30

Note that there may only be a short window during system startup to catch missing tags with 'sudo lvs -o lv_tags'.

Revision history for this message

Wouter van Bommel (woutervb) wrote on 2019-05-31:

#31

Hi,

Added the udevadm settle --timeout=5 in both the 2 remaining if block's in the referenced script. That did not make a difference.

See https://pastebin.ubuntu.com/p/8f2ZXMRNgv/ for the ceph-volume-systemd.log

At this boot, the osd's with numbers 4, 11 & 18 did not start, with the missing symlinks

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-31:

#32

Thanks for testing. That should rule out udev as the cause of the race.

A couple of observations from the log:

* There is a loop for each osd that calls 'ceph-volume lvm trigger' 30 times until the OSD is activated, for example for 4:
[2019-05-31 01:27:29,235][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,435][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,530][systemd][WARNING] command returned non-zero exit status: 1
[2019-05-31 01:27:35,531][systemd][WARNING] failed activating OSD, retries left: 30
[2019-05-31 01:27:44,122][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:44,174][systemd][WARNING] command returned non-zero exit status: 1
[2019-05-31 01:27:44,175][systemd][WARNING] failed activating OSD, retries left: 29
...

I wonder if we can have similar 'ceph-volume lvm trigger' calls for WAL and DB devices per OSD. Does that even make sense? Or perhaps another call with a similar goal. We should be able to determine if an OSD has a DB or WAL device from the lvm tags.

* The first 3 osd's that are activated are 18, 4, and 11 and they are the 3 that are missing block.db/block.wal symlinks. That's just more confirmation this is a race:
[2019-05-31 01:28:03,370][systemd][INFO ] successfully trggered activation for: 18-eb5270dc-1110-420f-947e-aab7fae299c9
[2019-05-31 01:28:12,354][systemd][INFO ] successfully trggered activation for: 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:28:12,530][systemd][INFO ] successfully trggered activation for: 11-33de740d-bd8c-4b47-a601-3e6e634e489a

Thanks for testing. That should rule out udev as the cause of the race.

A couple of observations from the log:

* There is a loop for each osd that calls 'ceph-volume lvm trigger' 30 times until the OSD is activated, for example for 4:
[2019-05-31 01:27:29,235][ceph_volume.process][INFO  ] Running command: ceph-volume lvm trigger 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,435][ceph_volume.process][INFO  ] stderr -->  RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6                        
[2019-05-31 01:27:35,530][systemd][WARNING] command returned non-zero exit status: 1                                                                                        
[2019-05-31 01:27:35,531][systemd][WARNING] failed activating OSD, retries left: 30                                                                           
[2019-05-31 01:27:44,122][ceph_volume.process][INFO  ] stderr -->  RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6                        
[2019-05-31 01:27:44,174][systemd][WARNING] command returned non-zero exit status: 1                                                                 
[2019-05-31 01:27:44,175][systemd][WARNING] failed activating OSD, retries left: 29
...

I wonder if we can have similar 'ceph-volume lvm trigger' calls for WAL and DB devices per OSD. Does that even make sense? Or perhaps another call with a similar goal. We should be able to determine if an OSD has a DB or WAL device from the lvm tags.

* The first 3 osd's that are activated are 18, 4, and 11 and they are the 3 that are missing block.db/block.wal symlinks. That's just more confirmation this is a race:
[2019-05-31 01:28:03,370][systemd][INFO  ] successfully trggered activation for: 18-eb5270dc-1110-420f-947e-aab7fae299c9                     
[2019-05-31 01:28:12,354][systemd][INFO  ] successfully trggered activation for: 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6                                                     
[2019-05-31 01:28:12,530][systemd][INFO  ] successfully trggered activation for: 11-33de740d-bd8c-4b47-a601-3e6e634e489a

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-31:

#33

The 'ceph-volume lvm trigger' call appears to come from ceph source at src/ceph-volume/ceph_volume/systemd/main.py.

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-31:

#34

Upstream ceph bug opened: https://tracker.ceph.com/issues/40100

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-31:

#35

Upstream pull request: https://github.com/ceph/ceph/pull/28357

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-05-31:

#36

I've cherry-picked that patch to the package in the PPA if anyone can test. I'm fairly sure this will fix it as I've been testing and removing/adding the volume backed storage in my testing environment and it will wait for the wal/db devices for a while if they don't exist.

Changed in ceph (Ubuntu):
status:	New → Triaged
importance:	Undecided → Critical
assignee:	nobody → Corey Bryant (corey.bryant)

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-06-03:

#37

After installing that PPA update and rebooting, the PV for the wal didn't come online till I ran pvscan --cache. Seems a second reboot didn't do that though, might have been a red herring from prior attempts.

Unfortunately, the OSDs didn't seem to come online in exactly the same way after installing the update.

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-06-04:

#38

Let me word that last comment differently.

I went to the host and installed the PPA update, then rebooted.

When the box booted up, the PV which hosts the wal LVs wasn't listed in lsblk or 'pvs' or lvs. I then ran pvscan --cache, which brought the LVs back online, but not the OSDs, so I rebooted.

After that reboot, the behavior of the OSDs was exactly the same as prior to the update - I reboot, and some OSDs don't come online, and are missing symlinks.

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-06-04:

#39

Do you have access to the /var/log/ceph/ceph-volume-systemd.log after the latest reboot? That should give us some details such as:

"[2019-05-31 20:43:44,334][systemd][WARNING] failed to find db volume, retries left: 17"

or similar for wal volume.

If you see that the retries have been exceeded in your case you can tune them (the new loops are using the same env vars):

http://docs.ceph.com/docs/mimic/ceph-volume/systemd/#failure-and-retries

As for the pvscan issue, I'm not sure if that is a ceph issue (?).

Revision history for this message

Xav Paice (xavpaice) wrote on 2019-06-04:

#40

The pvscan issue is likely something different, just wanted to make sure folks are aware of it for completeness.

The logs /var/log/ceph/ceph-volume-systemd.log and ceph-volume.log are empty.

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-06-05:

#41

Any chance the log files got rotated and zipped? What does an ls of /var/log/ceph show?

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-06-07:

#42

snippet from src/ceph-volume/ceph_volume/systemd/main.py Edit (2.1 KiB, text/x-python)

I chatted with xav in IRC and he showed me a private link to the log files. The ceph-volume-systemd.log.1 had timestamps of 2019-06-03 which matches up with the last attempt (see comment #37).

I didn't find any logs from the new code in this log file. That likely means one of the following: there were no wal/db devices found in lvs tags (ie. 'sudo lvs -o lv_tags'), the new code isn't working, or the new code wasn't installed.

I added a few more logs to the patch help understand better what's going on, and that's rebuilding in the PPA.

I'm attaching all the relevant code to show the log messages to look for.

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-06-07:

#43

Note that the code looks for wal/db devices in the block device's LV tags after it is found. In other words:

sudo lvs -o lv_tags | grep type=block | grep ceph.wal_device
sudo lvs -o lv_tags | grep type=block | grep ceph.db_device

This is the window where the following might not yet exist, yet we know they *should* exist based on the above tags:

sudo lvs -o lv_tags | grep type=wal
sudo lvs -o lv_tags | grep type=db

James Page (james-page) on 2019-06-11

Changed in ceph (Ubuntu):
importance:	Critical → High
status:	Triaged → In Progress

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-06-12:

#44

Py2 bug found in code review upstream. Updated PPA again with fix.

Revision history for this message

David A. Desrosiers (setuid) wrote on 2019-06-13:

#45

Just adding that I've worked around this issue with the following added to the lvm2-monitor overrides (/etc/systemd/system/lvm2-monitor.service.d/custom.conf):

[Service]
ExecStartPre=/bin/sleep 60

This results in 100% success for every single boot, with no missed disks nor missed LVM volumes applied to those block devices.

We've also disabled nvme multipathing on every Ceph storage node with the following in /etc/d/g kernel boot args:

nvme_core.multipath=0

Note: This LP was cloned from an internal customer case where their Ceph storage nodes were directly impacted by this issue, and this is the current workaround deployed, until/unless we can find a consistent RC for this issue in an upstream package.

Revision history for this message

Corey Bryant (corey.bryant) wrote on 2019-06-17:

#46

@David, thanks for the update. We could really use some testing of the current proposed fix if you have a chance. That's in a PPA mentioned above. The new code will wait for wal/db devices to arrive and has env vars to adjust wait times - http://docs.ceph.com/docs/mimic/ceph-volume/systemd/#failure-and-retries.

As for the pvscan issue, I don't think that is related to ceph.

Revision history for this message

James Page (james-page) wrote on 2019-07-03:

#47

Alternative fix proposed upstream - picking this in preference to Corey's fix as its in the right part of the codebase for ceph-volume.

Revision history for this message

James Page (james-page) wrote on 2019-07-03:

#48

Building in ppa:ci-train-ppa-service/3535 (will take a few hours).

Revision history for this message

James Page (james-page) wrote on 2019-07-03:

#49

Alternative fix: https://github.com/ceph/ceph/pull/28791

Changed in ceph (Ubuntu):
assignee:	Corey Bryant (corey.bryant) → James Page (james-page)

James Page (james-page) on 2019-08-29

Changed in ceph (Ubuntu Bionic):
status:	New → In Progress
Changed in ceph (Ubuntu Disco):
status:	New → In Progress
assignee:	nobody → James Page (james-page)
Changed in ceph (Ubuntu Bionic):
assignee:	nobody → James Page (james-page)
importance:	Undecided → High
Changed in ceph (Ubuntu Disco):
importance:	Undecided → High
description:	updated

Revision history for this message

Launchpad Janitor (janitor) wrote on 2019-08-29:

#50

This bug was fixed in the package ceph - 14.2.2-0ubuntu2

---------------
ceph (14.2.2-0ubuntu2) eoan; urgency=medium

  [ Eric Desrochers ]
  * Ensure that daemons are not automatically restarted during package
    upgrades (LP: #1840347):
    - d/rules: Use "--no-restart-after-upgrade" and "--no-stop-on-upgrade"
      instead of "--no-restart-on-upgrade".
    - d/rules: Drop exclusion for ceph-[osd,mon,mds] for restarts.

  [ Jesse Williamson ]
  * d/p/civetweb-755-1.8-somaxconn-configurable*.patch: Backport changes
    to civetweb to allow tuning of SOMAXCONN in Ceph RADOS Gateway
    deployments (LP: #1838109).

  [ James Page ]
  * d/p/ceph-volume-wait-for-lvs.patch: Cherry pick inflight fix to
    ensure that required wal and db devices are present before
    activating OSD's (LP: #1828617).

  [ Steve Beattie ]
  * SECURITY UPDATE: RADOS gateway remote denial of service
    - d/p/CVE-2019-10222.patch: rgw: asio: check the remote endpoint
      before processing requests.
    - CVE-2019-10222

-- James Page <email address hidden> Thu, 29 Aug 2019 13:54:25 +0100

Changed in ceph (Ubuntu Eoan):
status:	In Progress → Fix Released

James Page (james-page) on 2019-08-30

no longer affects:

cloud-archive/pike

Revision history for this message

Łukasz Zemczak (sil2100) wrote on 2019-09-05: Please test proposed package

#51

Hello Andrey, or anyone else affected,

Accepted ceph into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/13.2.6-0ubuntu0.19.04.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Disco):
status:	In Progress → Fix Committed
tags:	added: verification-needed verification-needed-disco
Changed in ceph (Ubuntu Bionic):
status:	In Progress → Fix Committed
tags:	added: verification-needed-bionic

Revision history for this message

Łukasz Zemczak (sil2100) wrote on 2019-09-05:

#52

Hello Andrey, or anyone else affected,

Accepted ceph into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/12.2.12-0ubuntu0.18.04.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message

James Page (james-page) wrote on 2019-09-06:

#53

Hello Andrey, or anyone else affected,

Accepted ceph into rocky-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

sudo add-apt-repository cloud-archive:rocky-proposed
sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-rocky-needed to verification-rocky-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-rocky-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags:

added: verification-rocky-needed

Revision history for this message

James Page (james-page) wrote on 2019-09-06:

#54

Hello Andrey, or anyone else affected,

Accepted ceph into stein-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

sudo add-apt-repository cloud-archive:stein-proposed
sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-stein-needed to verification-stein-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-stein-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags:

added: verification-stein-needed

Revision history for this message

James Page (james-page) wrote on 2019-09-19:

#55

bionic-proposed tested with a deployment using separate db and wal devices; OSD's restarted reliably over 10 x reboot iterations across three machines.

tags:

added: verification-done-bionic
removed: verification-needed-bionic

Revision history for this message

James Page (james-page) wrote on 2019-09-20:

#56

Download full text (3.9 KiB)

$ apt-cache policy ceph-osd
ceph-osd:
  Installed: 13.2.6-0ubuntu0.19.04.4
  Candidate: 13.2.6-0ubuntu0.19.04.4
  Version table:
*** 13.2.6-0ubuntu0.19.04.4 500
        500 http://archive.ubuntu.com/ubuntu disco-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     13.2.6-0ubuntu0.19.04.3 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu disco-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu disco-security/main amd64 Packages
     13.2.4+dfsg1-0ubuntu2 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu disco/main amd64 Packages

disco-proposed tested with a deployment using separate db and wal devices; OSD's restarted reliably over 10 x reboot iterations across three machines.

$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 88.7M 1 loop /snap/core/7396
loop1 7:1 0 54.5M 1 loop
loop2 7:2 0 89M 1 loop /snap/core/7713
loop3 7:3 0 54.6M 1 loop /snap/lxd/11964
loop4 7:4 0 54.6M 1 loop /snap/lxd/11985
vda 252:0 0 20G 0 disk
├─vda1 252:1 0 19.9G 0 part /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 40G 0 disk /mnt
vdc 252:32 0 10G 0 disk
└─ceph--683a8389--9788--4fd5--b59e--bdd69936a768-osd--block--683a8389--9788--4fd5--b59e--bdd69936a768
                                                              253:0 0 10G 0 lvm
vdd 252:48 0 10G 0 disk
└─ceph--1fd8022f--e851--4cfa--82aa--64693510c705-osd--block--1fd8022f--e851--4cfa--82aa--64693510c705
                                                              253:6 0 10G 0 lvm
vde 252:64 0 10G 0 disk
└─ceph--302bafc8--9981--47a3--b66b--3d84ab550ba5-osd--block--302bafc8--9981--47a3--b66b--3d84ab550ba5
                                                              253:3 0 10G 0 lvm
vdf 252:80 0 5G 0 disk
├─ceph--db--28e3b53f--1468--4136--914d--6630343a2a67-osd--db--683a8389--9788--4fd5--b59e--bdd69936a768
│ 253:2 0 1G 0 lvm
├─ceph--db--28e3b53f--1468--4136--914d--6630343a2a67-osd--db--302bafc8--9981--47a3--b66b--3d84ab550ba5
│ 253:5 0 1G 0 lvm
└─ceph--db--28e3b53f--1468--4136--914d--6630343a2a67-osd--db--1fd8022f--e851--4cfa--82aa...

Ubuntu
ceph package

Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

	Status	Importance	Assigned to
Ubuntu Cloud Archive	Fix Released	High	James Page
Queens	Fix Released	High	James Page
Rocky	Fix Released	High	James Page
Stein	Fix Released	High	James Page
Train	Fix Released	High	James Page
ceph (Ubuntu)	Fix Released	High	James Page
Bionic	Fix Released	High	James Page
Disco	Fix Released	High	James Page
Eoan	Fix Released	High	James Page

Changed in ceph (Ubuntu Disco):
status:	Fix Committed → Fix Released

Changed in cloud-archive:
status:	Fix Committed → Fix Released

Ubuntuceph package

Hosts randomly 'losing' disks, breaking ceph-osd service enumeration

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
ceph package