VM rbd backed block devices inconsistent after unexpected host outage

Bug #1773449 reported by Craig Bender
This bug affects 7 people
Affects                   Status        Importance  Assigned to     Milestone
Ceph Monitor Charm        Fix Released  High        James Page
OpenStack Compute (nova)  Invalid       Undecided   Unassigned
Ubuntu Cloud Archive      Invalid       High        Unassigned
charms.ceph               Fix Released  High        James Page
tripleo                   Fix Released  High        Giulio Fidente
ceph (Ubuntu)             Invalid       High        Unassigned
nova (Ubuntu)             Invalid       High        Unassigned
qemu (Ubuntu)             Invalid       High        Unassigned

Bug Description

After rebooting a host that contains VMs with volumes, all of those VMs fail to boot. This happens with Queens on both Bionic and Xenial. Console log from an affected VM:

[ 0.000000] Initializing cgroup subsys cpuset

[ 0.000000] Initializing cgroup subsys cpu

[ 0.000000] Initializing cgroup subsys cpuacct

[ 0.000000] Linux version 4.4.0-124-generic (buildd@lcy01-amd64-028) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) ) #148-Ubuntu SMP Wed May 2 13:00:18 UTC 2018 (Ubuntu 4.4.0-124.148-generic 4.4.117)

[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-124-generic root=UUID=bca2de6e-f774-4203-ae05-e8deeb05f64a ro console=tty1 console=ttyS0

[ 0.000000] KERNEL supported cpus:

[ 0.000000] Intel GenuineIntel

[ 0.000000] AMD AuthenticAMD

[ 0.000000] Centaur CentaurHauls

[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256

[ 0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'

[ 0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'

[ 0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'

[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.

[ 0.000000] x86/fpu: Using 'eager' FPU context switches.

[ 0.000000] e820: BIOS-provided physical RAM map:

[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable

[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved

[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved

[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdbfff] usable

[ 0.000000] BIOS-e820: [mem 0x000000007ffdc000-0x000000007fffffff] reserved

[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved

[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved

[ 0.000000] NX (Execute Disable) protection: active

[ 0.000000] SMBIOS 2.8 present.

[ 0.000000] Hypervisor detected: KVM

[ 0.000000] e820: last_pfn = 0x7ffdc max_arch_pfn = 0x400000000

[ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT

[ 0.000000] found SMP MP-table at [mem 0x000f6a20-0x000f6a2f] mapped at [ffff8800000f6a20]

[ 0.000000] Scanning 1 areas for low memory corruption

[ 0.000000] Using GB pages for direct mapping

[ 0.000000] RAMDISK: [mem 0x361f4000-0x370f1fff]

[ 0.000000] ACPI: Early table checksum verification disabled

[ 0.000000] ACPI: RSDP 0x00000000000F6780 000014 (v00 BOCHS )

[ 0.000000] ACPI: RSDT 0x000000007FFE1649 00002C (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)

[ 0.000000] ACPI: FACP 0x000000007FFE14CD 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)

[ 0.000000] ACPI: DSDT 0x000000007FFE0040 00148D (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)

[ 0.000000] ACPI: FACS 0x000000007FFE0000 000040

[ 0.000000] ACPI: APIC 0x000000007FFE15C1 000088 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)

[ 0.000000] No NUMA configuration found

[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007ffdbfff]

[ 0.000000] NODE_DATA(0) allocated [mem 0x7ffd7000-0x7ffdbfff]

[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00

[ 0.000000] kvm-clock: cpu 0, msr 0:7ffcf001, primary cpu clock

[ 0.000000] kvm-clock: using sched offset of 17590935813 cycles

[ 0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns

[ 0.000000] Zone ranges:

[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]

[ 0.000000] DMA32 [mem 0x0000000001000000-0x000000007ffdbfff]

[ 0.000000] Normal empty

[ 0.000000] Device empty

[ 0.000000] Movable zone start for each node

[ 0.000000] Early memory node ranges

[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]

[ 0.000000] node 0: [mem 0x0000000000100000-0x000000007ffdbfff]

[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007ffdbfff]

[ 0.000000] ACPI: PM-Timer IO Port: 0x608

[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])

[ 0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23

[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)

[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)

[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)

[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)

[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)

[ 0.000000] Using ACPI (MADT) for SMP configuration information

[ 0.000000] smpboot: Allowing 3 CPUs, 0 hotplug CPUs

[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]

[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]

[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]

[ 0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]

[ 0.000000] e820: [mem 0x80000000-0xfeffbfff] available for PCI devices

[ 0.000000] Booting paravirtualized kernel on KVM

[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns

[ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:3 nr_node_ids:1

[ 0.000000] PERCPU: Embedded 34 pages/cpu @ffff88007fc00000 s99544 r8192 d31528 u524288

[ 0.000000] KVM setup async PF for cpu 0

[ 0.000000] kvm-stealtime: cpu 0, msr 7fc101c0

[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 515940

[ 0.000000] Policy zone: DMA32

[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-124-generic root=UUID=bca2de6e-f774-4203-ae05-e8deeb05f64a ro console=tty1 console=ttyS0

[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)

[ 0.000000] Memory: 2029152K/2096616K available (8532K kernel code, 1313K rwdata, 3996K rodata, 1508K init, 1316K bss, 67464K reserved, 0K cma-reserved)

[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=3, Nodes=1

[ 0.000000] Kernel/User page tables isolation: enabled

[ 0.000000] Hierarchical RCU implementation.

[ 0.000000] Build-time adjustment of leaf fanout to 64.

[ 0.000000] RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=3.

[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=3

[ 0.000000] NR_IRQS:33024 nr_irqs:448 16

[ 0.000000] Console: colour VGA+ 80x25

[ 0.000000] console [tty1] enabled

[ 0.000000] console [ttyS0] enabled

[ 0.000000] tsc: Detected 2599.996 MHz processor

[ 0.162974] Calibrating delay loop (skipped) preset value.. 5199.99 BogoMIPS (lpj=10399984)

[ 0.164958] pid_max: default: 32768 minimum: 301

[ 0.166117] ACPI: Core revision 20150930

[ 0.167823] ACPI: 1 ACPI AML tables successfully acquired and loaded

[ 0.169447] Security Framework initialized

[ 0.170423] Yama: becoming mindful.

[ 0.171304] AppArmor: AppArmor initialized

[ 0.172848] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)

[ 0.174774] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)

[ 0.176814] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)

[ 0.178303] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes)

[ 0.179888] Initializing cgroup subsys io

[ 0.180818] Initializing cgroup subsys memory

[ 0.181875] Initializing cgroup subsys devices

[ 0.182966] Initializing cgroup subsys freezer

[ 0.183957] Initializing cgroup subsys net_cls

[ 0.184986] Initializing cgroup subsys perf_event

[ 0.186177] Initializing cgroup subsys net_prio

[ 0.187212] Initializing cgroup subsys hugetlb

[ 0.188199] Initializing cgroup subsys pids

[ 0.189261] CPU: Physical Processor ID: 0

[ 0.190257] FEATURE SPEC_CTRL Not Present

[ 0.191931] mce: CPU supports 10 MCE banks

[ 0.192914] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0

[ 0.194081] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0

[ 0.195383] Spectre V2 mitigation: Mitigation: Full generic retpoline

[ 0.196756] Spectre V2 mitigation: Speculation control IBPB not-supported IBRS not-supported

[ 0.198573] Spectre V2 mitigation: Filling RSB on context switch

[ 0.201611] Freeing SMP alternatives memory: 32K

[ 0.208291] ftrace: allocating 32202 entries in 126 pages

[ 0.251768] smpboot: APIC(0) Converting physical 0 to logical package 0

[ 0.253513] smpboot: APIC(1) Converting physical 1 to logical package 1

[ 0.255153] smpboot: APIC(2) Converting physical 2 to logical package 2

[ 0.256714] smpboot: Max logical packages: 3

[ 0.258065] x2apic enabled

[ 0.259050] Switched APIC routing to physical x2apic.

[ 0.261345] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1

[ 0.262933] smpboot: CPU0: Intel Core Processor (Skylake) (family: 0x6, model: 0x5e, stepping: 0x3)

[ 0.265470] Performance Events: unsupported p6 CPU model 94 no PMU driver, software events only.

[ 0.268860] x86: Booting SMP configuration:

[ 0.269991] .... node #0, CPUs: #1

[ 0.271156] kvm-clock: cpu 1, msr 0:7ffcf041, secondary cpu clock

[ 0.293248] KVM setup async PF for cpu 1

[ 0.293422] #2

[ 0.293423] kvm-clock: cpu 2, msr 0:7ffcf081, secondary cpu clock

[ 0.296162] kvm-stealtime: cpu 1, msr 7fc901c0

[ 0.314289] x86: Booted up 1 node, 3 CPUs

[ 0.314292] KVM setup async PF for cpu 2

[ 0.314296] kvm-stealtime: cpu 2, msr 7fd101c0

[ 0.317678] smpboot: Total of 3 processors activated (15599.97 BogoMIPS)

[ 0.319877] devtmpfs: initialized

[ 0.322064] evm: security.selinux

[ 0.323002] evm: security.SMACK64

[ 0.323983] evm: security.SMACK64EXEC

[ 0.324991] evm: security.SMACK64TRANSMUTE

[ 0.326138] evm: security.SMACK64MMAP

[ 0.327159] evm: security.ima

[ 0.328060] evm: security.capability

[ 0.329200] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns

[ 0.331819] futex hash table entries: 1024 (order: 4, 65536 bytes)

[ 0.333630] pinctrl core: initialized pinctrl subsystem

[ 0.335197] RTC time: 19:21:27, date: 05/19/18

[ 0.336569] NET: Registered protocol family 16

[ 0.349807] cpuidle: using governor ladder

[ 0.361829] cpuidle: using governor menu

[ 0.362902] PCCT header not found.

[ 0.363971] ACPI: bus type PCI registered

[ 0.365095] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5

[ 0.366893] PCI: Using configuration type 1 for base access

[ 0.379232] ACPI: Added _OSI(Module Device)

[ 0.380366] ACPI: Added _OSI(Processor Device)

[ 0.381594] ACPI: Added _OSI(3.0 _SCP Extensions)

[ 0.382948] ACPI: Added _OSI(Processor Aggregator Device)

[ 0.385698] ACPI: Interpreter enabled

[ 0.386750] ACPI: (supports S0 S3 S4 S5)

[ 0.387864] ACPI: Using IOAPIC for interrupt routing

[ 0.389261] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug

[ 0.404203] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])

[ 0.405846] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]

[ 0.407652] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM

[ 0.409009] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.

[ 0.411606] acpiphp: Slot [3] registered

[ 0.412544] acpiphp: Slot [4] registered

[ 0.413516] acpiphp: Slot [5] registered

[ 0.414437] acpiphp: Slot [6] registered

[ 0.415338] acpiphp: Slot [7] registered

[ 0.416238] acpiphp: Slot [8] registered

[ 0.417238] acpiphp: Slot [9] registered

[ 0.418162] acpiphp: Slot [10] registered

[ 0.419092] acpiphp: Slot [11] registered

[ 0.420086] acpiphp: Slot [12] registered

[ 0.421122] acpiphp: Slot [13] registered

[ 0.422236] acpiphp: Slot [14] registered

[ 0.423159] acpiphp: Slot [15] registered

[ 0.424310] acpiphp: Slot [16] registered

[ 0.425278] acpiphp: Slot [17] registered

[ 0.427252] acpiphp: Slot [18] registered

[ 0.428177] acpiphp: Slot [19] registered

[ 0.429099] acpiphp: Slot [20] registered

[ 0.430023] acpiphp: Slot [21] registered

[ 0.430979] acpiphp: Slot [22] registered

[ 0.431898] acpiphp: Slot [23] registered

[ 0.432824] acpiphp: Slot [24] registered

[ 0.433750] acpiphp: Slot [25] registered

[ 0.434666] acpiphp: Slot [26] registered

[ 0.435581] acpiphp: Slot [27] registered

[ 0.436573] acpiphp: Slot [28] registered

[ 0.437503] acpiphp: Slot [29] registered

[ 0.438423] acpiphp: Slot [30] registered

[ 0.439335] acpiphp: Slot [31] registered

[ 0.440247] PCI host bridge to bus 0000:00

[ 0.441170] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]

[ 0.442577] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]

[ 0.444045] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]

[ 0.445793] pci_bus 0000:00: root bus resource [mem 0x80000000-0xfebfffff window]

[ 0.447578] pci_bus 0000:00: root bus resource [mem 0x100000000-0x17fffffff window]

[ 0.449309] pci_bus 0000:00: root bus resource [bus 00-ff]

[ 0.454218] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7]

[ 0.455694] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io 0x03f6]

[ 0.457043] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177]

[ 0.458682] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io 0x0376]

[ 0.463649] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by PIIX4 ACPI

[ 0.465298] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by PIIX4 SMB

[ 0.491334] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)

[ 0.493107] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)

[ 0.495127] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)

[ 0.496733] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)

[ 0.498345] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)

[ 0.499910] ACPI: Enabled 2 GPEs in block 00 to 0F

[ 0.501277] vgaarb: setting as boot device: PCI:0000:00:02.0

[ 0.502636] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none

[ 0.504417] vgaarb: loaded

[ 0.505216] vgaarb: bridge control possible 0000:00:02.0

[ 0.506617] SCSI subsystem initialized

[ 0.507605] ACPI: bus type USB registered

[ 0.508548] usbcore: registered new interface driver usbfs

[ 0.509767] usbcore: registered new interface driver hub

[ 0.510941] usbcore: registered new device driver usb

[ 0.512698] PCI: Using ACPI for IRQ routing

[ 0.513896] NetLabel: Initializing

[ 0.514738] NetLabel: domain hash size = 128

[ 0.515770] NetLabel: protocols = UNLABELED CIPSOv4

[ 0.516976] NetLabel: unlabeled traffic allowed by default

[ 0.518350] amd_nb: Cannot enumerate AMD northbridges

[ 0.519489] clocksource: Switched to clocksource kvm-clock

[ 0.526617] AppArmor: AppArmor Filesystem Enabled

[ 0.527736] pnp: PnP ACPI init

[ 0.528883] pnp: PnP ACPI: found 5 devices

[ 0.536051] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns

[ 0.538129] NET: Registered protocol family 2

[ 0.539237] TCP established hash table entries: 16384 (order: 5, 131072 bytes)

[ 0.540871] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)

[ 0.542278] TCP: Hash tables configured (established 16384 bind 16384)

[ 0.543657] UDP hash table entries: 1024 (order: 3, 32768 bytes)

[ 0.544935] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)

[ 0.546335] NET: Registered protocol family 1

[ 0.547309] pci 0000:00:00.0: Limiting direct PCI/PCI transfers

[ 0.548556] pci 0000:00:01.0: PIIX3: Enabling Passive Release

[ 0.549820] pci 0000:00:01.0: Activating ISA DMA hang workarounds

[ 0.569172] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11

[ 0.588539] Unpacking initramfs...

[ 0.779175] Freeing initrd memory: 15352K

[ 0.780438] Scanning for low memory corruption every 60 seconds

[ 0.782003] audit: initializing netlink subsys (disabled)

[ 0.783197] audit: type=2000 audit(1526757688.378:1): initialized

[ 0.784711] Initialise system trusted keyring

[ 0.785837] HugeTLB registered 1 GB page size, pre-allocated 0 pages

[ 0.787194] HugeTLB registered 2 MB page size, pre-allocated 0 pages

[ 0.789576] zbud: loaded

[ 0.790539] VFS: Disk quotas dquot_6.6.0

[ 0.791463] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)

[ 0.793081] squashfs: version 4.0 (2009/01/31) Phillip Lougher

[ 0.794559] fuse init (API version 7.23)

[ 0.795575] Key type big_key registered

[ 0.796488] Allocating IMA MOK and blacklist keyrings.

[ 0.797938] Key type asymmetric registered

[ 0.798915] Asymmetric key parser 'x509' registered

[ 0.800060] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249)

[ 0.801823] io scheduler noop registered

[ 0.802860] io scheduler deadline registered (default)

[ 0.804056] io scheduler cfq registered

[ 0.805120] pci_hotplug: PCI Hot Plug PCI Core version: 0.5

[ 0.806350] pciehp: PCI Express Hot Plug Controller Driver version: 0.4

[ 0.807840] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0

[ 0.809536] ACPI: Power Button [PWRF]

[ 0.810580] GHES: HEST is not enabled!

[ 0.829152] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10

[ 0.867373] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10

[ 0.870053] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled

[ 0.894774] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A

[ 0.897409] Linux agpgart interface v0.103

[ 0.899640] loop: module loaded

[ 0.903947] vda: vda1

[ 0.905362] scsi host0: ata_piix

[ 0.906255] scsi host1: ata_piix

[ 0.907100] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc0a0 irq 14

[ 0.908585] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc0a8 irq 15

[ 0.910205] libphy: Fixed MDIO Bus: probed

[ 0.911237] tun: Universal TUN/TAP device driver, 1.6

[ 0.912457] tun: (C) 1999-2004 Max Krasnyansky <email address hidden>

[ 0.914846] PPP generic driver version 2.4.2

[ 0.915910] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver

[ 0.917277] ehci-pci: EHCI PCI platform driver

[ 0.918300] ehci-platform: EHCI generic platform driver

[ 0.919424] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver

[ 0.920822] ohci-pci: OHCI PCI platform driver

[ 0.921874] ohci-platform: OHCI generic platform driver

[ 0.923959] uhci_hcd: USB Universal Host Controller Interface driver

[ 0.943332] uhci_hcd 0000:00:01.2: UHCI Host Controller

[ 0.944503] uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1

[ 0.946191] uhci_hcd 0000:00:01.2: detected 2 ports

[ 0.947335] uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c040

[ 0.948617] usb usb1: New USB device found, idVendor=1d6b, idProduct=0001

[ 0.950067] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1

[ 0.951691] usb usb1: Product: UHCI Host Controller

[ 0.952758] usb usb1: Manufacturer: Linux 4.4.0-124-generic uhci_hcd

[ 0.954105] usb usb1: SerialNumber: 0000:00:01.2

[ 0.955236] hub 1-0:1.0: USB hub found

[ 0.956239] hub 1-0:1.0: 2 ports detected

[ 0.957328] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12

[ 0.960477] serio: i8042 KBD port at 0x60,0x64 irq 1

[ 0.961576] serio: i8042 AUX port at 0x60,0x64 irq 12

[ 0.962781] mousedev: PS/2 mouse device common for all mice

[ 0.964147] rtc_cmos 00:00: RTC can wake from S4

[ 0.965518] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1

[ 0.967499] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0

[ 0.968986] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram

[ 0.970400] i2c /dev entries driver

[ 0.971310] device-mapper: uevent: version 1.0.3

[ 0.972400] device-mapper: ioctl: 4.34.0-ioctl (2015-10-28) initialised: <email address hidden>

[ 0.974316] ledtrig-cpu: registered to indicate activity on CPUs

[ 0.975944] NET: Registered protocol family 10

[ 0.977135] NET: Registered protocol family 17

[ 0.978261] Key type dns_resolver registered

[ 0.979543] registered taskstats version 1

[ 0.980576] Loading compiled-in X.509 certificates

[ 0.982251] Loaded X.509 cert 'Build time autogenerated kernel key: 97a7db5071cab428e7f67f96f51ed8d32b3fc527'

[ 0.984511] zswap: loaded using pool lzo/zbud

[ 0.987147] Key type trusted registered

[ 0.990720] Key type encrypted registered

[ 1.002474] AppArmor: AppArmor sha1 policy hashing enabled

[ 1.003727] ima: No TPM chip found, activating TPM-bypass!

[ 1.004957] evm: HMAC attrs: 0x1

[ 1.006120] Magic number: 2:261:395

[ 1.007174] rtc_cmos 00:00: setting system clock to 2018-05-19 19:21:28 UTC (1526757688)

[ 1.009252] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found

[ 1.010557] EDD information not available.

[ 1.064852] Freeing unused kernel memory: 1508K

[ 1.065934] Write protecting the kernel read-only data: 14336k

[ 1.067907] Freeing unused kernel memory: 1696K

[ 1.069172] Freeing unused kernel memory: 100K

Loading, please wait...

starting version 229

[ 1.081083] random: systemd-udevd: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.081727] random: udevadm: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.081747] random: udevadm: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.088174] random: systemd-udevd: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.088407] random: udevadm: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.088446] random: udevadm: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.088617] random: udevadm: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.088658] random: udevadm: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.088700] random: udevadm: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.088741] random: udevadm: uninitialized urandom read (16 bytes read, 1 bits of entropy available)

[ 1.119715] virtio_net virtio0 ens3: renamed from eth0

[ 1.124086] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input4

[ 1.126311] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input3

[ 1.132121] FDC 0 is a S82078B

[ 1.135401] AVX2 version of gcm_enc/dec engaged.

[ 1.136548] AES CTR mode by8 optimization enabled

[ 1.271543] usb 1-1: new full-speed USB device number 2 using uhci_hcd

[ 1.436560] usb 1-1: New USB device found, idVendor=0627, idProduct=0001

[ 1.438132] usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=5

[ 1.439691] usb 1-1: Product: QEMU USB Tablet

[ 1.440715] usb 1-1: Manufacturer: QEMU

[ 1.441614] usb 1-1: SerialNumber: 42

[ 1.449330] hidraw: raw HID events driver (C) Jiri Kosina

[ 1.455810] usbcore: registered new interface driver usbhid

[ 1.457084] usbhid: USB HID core driver

[ 1.459881] input: QEMU QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input/input5

[ 1.462712] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:00:01.2-1/input0

[ 1.779575] tsc: Refined TSC clocksource calibration: 2599.629 MHz

[ 1.780975] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2578ddcaa07, max_idle_ns: 440795279560 ns

Begin: Loading essential drivers ... [ 2.522755] md: linear personality registered for level -1

[ 2.525948] md: multipath personality registered for level -4

[ 2.529325] md: raid0 personality registered for level 0

[ 2.532767] md: raid1 personality registered for level 1

[ 2.603533] raid6: sse2x1 gen() 10141 MB/s

[ 2.671524] raid6: sse2x1 xor() 7713 MB/s

[ 2.739530] raid6: sse2x2 gen() 12546 MB/s

[ 2.807530] raid6: sse2x2 xor() 8340 MB/s

[ 2.875535] raid6: sse2x4 gen() 14625 MB/s

[ 2.943528] raid6: sse2x4 xor() 10307 MB/s

[ 3.011530] raid6: avx2x1 gen() 19536 MB/s

[ 3.079501] raid6: avx2x2 gen() 22464 MB/s

[ 3.147498] raid6: avx2x4 gen() 25560 MB/s

[ 3.148669] raid6: using algorithm avx2x4 gen() 25560 MB/s

[ 3.150132] raid6: using avx2x2 recovery algorithm

[ 3.152232] xor: automatically using best checksumming function:

[ 3.191529] avx : 28402.000 MB/sec

[ 3.193559] async_tx: api initialized (async)

[ 3.199567] md: raid6 personality registered for level 6

[ 3.200781] md: raid5 personality registered for level 5

[ 3.201962] md: raid4 personality registered for level 4

[ 3.207517] md: raid10 personality registered for level 10

done.

Begin: Running /scripts/init-premount ... done.

Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.

Begin: Running /scripts/local-premount ... [ 3.234789] Btrfs loaded

Scanning for Btrfs filesystems

done.

Warning: fsck not present, so skipping root file system

[ 3.310173] EXT4-fs (vda1): INFO: recovery required on readonly filesystem

[ 3.311654] EXT4-fs (vda1): write access will be enabled during recovery

[ 5.419286] blk_update_request: I/O error, dev vda, sector 2048

[ 5.420745] Buffer I/O error on dev vda1, logical block 0, lost async page write

[ 5.422560] Buffer I/O error on dev vda1, logical block 1, lost async page write

[ 5.436351] blk_update_request: I/O error, dev vda, sector 3080

[ 5.437718] Buffer I/O error on dev vda1, logical block 129, lost async page write

[ 5.439603] Buffer I/O error on dev vda1, logical block 130, lost async page write

[ 5.441540] Buffer I/O error on dev vda1, logical block 131, lost async page write

[ 5.443487] Buffer I/O error on dev vda1, logical block 132, lost async page write

[ 5.445412] Buffer I/O error on dev vda1, logical block 133, lost async page write

[ 5.447183] Buffer I/O error on dev vda1, logical block 134, lost async page write

[ 5.454432] blk_update_request: I/O error, dev vda, sector 3136

[ 5.456074] Buffer I/O error on dev vda1, logical block 136, lost async page write

[ 5.464320] blk_update_request: I/O error, dev vda, sector 3176

[ 5.465891] Buffer I/O error on dev vda1, logical block 141, lost async page write

[ 5.481109] blk_update_request: I/O error, dev vda, sector 3208

[ 5.500706] blk_update_request: I/O error, dev vda, sector 3232

[ 5.515074] blk_update_request: I/O error, dev vda, sector 3424

[ 5.532104] blk_update_request: I/O error, dev vda, sector 3504

[ 5.547614] blk_update_request: I/O error, dev vda, sector 3632

[ 5.557725] blk_update_request: I/O error, dev vda, sector 4072

[ 6.726649] JBD2: recovery failed

[ 6.727554] EXT4-fs (vda1): error loading journal

[ 6.732916] VFS: Dirty inode writeback failed for block device vda1 (err=-5).

mount: mounting /dev/vda1 on /root failed: Input/output error

done.

Begin: Running /scripts/local-bottom ... done.

Begin: Running /scripts/init-bottom ... mount: mounting /dev on /root/dev failed: No such file or directory

done.

mount: mounting /run on /root/run failed: No such file or directory

run-init: current directory on the same filesystem as the root: error 0

Target filesystem doesn't have requested /sbin/init.

run-init: current directory on the same filesystem as the root: error 0

run-init: current directory on the same filesystem as the root: error 0

run-init: current directory on the same filesystem as the root: error 0

run-init: current directory on the same filesystem as the root: error 0

run-init: current directory on the same filesystem as the root: error 0

No init found. Try passing init= bootarg.
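
The sector numbers and logical block numbers in the I/O errors above appear to be related by simple arithmetic. The following is a minimal sketch of that relationship, assuming vda1 starts at sector 2048 and the filesystem uses 4 KiB blocks. Neither value is stated in the report; both are inferred from "sector 2048" being followed by "logical block 0". The function name `failed_blocks` is hypothetical.

```python
import re
from typing import List

SECTOR_SIZE = 512        # bytes per sector as counted by the block layer
FS_BLOCK_SIZE = 4096     # assumption: 4 KiB filesystem blocks on vda1
PARTITION_START = 2048   # assumption: vda1 begins at sector 2048 of vda

# Matches lines such as:
#   blk_update_request: I/O error, dev vda, sector 3080
SECTOR_RE = re.compile(r"I/O error, dev \w+, sector (\d+)")


def failed_blocks(console_log: str) -> List[int]:
    """Map each failing disk sector to a logical block within the partition."""
    blocks = []
    for sector in SECTOR_RE.findall(console_log):
        offset_bytes = (int(sector) - PARTITION_START) * SECTOR_SIZE
        blocks.append(offset_bytes // FS_BLOCK_SIZE)
    return blocks


sample = """\
[ 5.419286] blk_update_request: I/O error, dev vda, sector 2048
[ 5.436351] blk_update_request: I/O error, dev vda, sector 3080
[ 5.454432] blk_update_request: I/O error, dev vda, sector 3136
[ 5.464320] blk_update_request: I/O error, dev vda, sector 3176
"""
print(failed_blocks(sample))
```

Under those assumptions, the computed blocks line up with the first "Buffer I/O error ... logical block N" line that follows each sector error in the log, which suggests the failed writes fall in the first few megabytes of the partition.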

Tags: cpe-onsite
Chris Gregan (cgregan)
tags: added: cpe-onsite
Ryan Beisner (1chb1n) wrote :

Thank you for your report. I added UCA task for SLA tracking. We're working on reproducing it now, and we will update the status/level as soon as we have confirmation.

Changed in cloud-archive:
assignee: nobody → Sean Feole (sfeole)
importance: Undecided → High
Ryan Beisner (1chb1n) wrote :

Can you please add a juju crashdump, and/or a sanitized bundle.yaml for reproduction purposes?

Vern Hart (vern) wrote :
Sean Feole (sfeole) wrote :

Hey Vern, would you be able to tell me which versions of MAAS and Juju you are using?

Sean Feole (sfeole) wrote :

I've requested a juju crashdump from the field and the customer. I need that information in order to move forward.

Changed in cloud-archive:
status: New → Incomplete
Ryan Beisner (1chb1n) wrote :

Three fronts to dig into:

1. Please also describe in more detail the procedure which is used to reboot the compute node(s). Is this a cold power-off? Is it `sudo reboot`? Or something else?

2. Typically when a nova compute node is rebooted, the instances on that compute node are not automatically started upon boot of the underlying host. This is as advised by my engineering team, and our support teams. This ensures that an operator is well-aware of a compute node which has rebooted. The compute node will come back up with all of its instances in a SHUTDOWN state. Once the compute node, and all of the corresponding services and storage components are confirmed as up, the operator should then start the nova instances. This is by design, default behavior.

What is not clear is whether this site has overridden that logic so that nova instances are started automatically upon server boot. Please confirm and clarify this point for this deployment.

3. The next observation is that this appears to be a classic Linux admin-type issue (a server was rebooted without cleanly unmounting a filesystem, and is therefore grumpy on the next boot), indicated by the classic symptom:

Warning: fsck not present, so skipping root file system

[ 3.310173] EXT4-fs (vda1): INFO: recovery required on readonly filesystem

[ 3.311654] EXT4-fs (vda1): write access will be enabled during recovery

[ 5.419286] blk_update_request: I/O error, dev vda, sector 2048

[ 5.420745] Buffer I/O error on dev vda1, logical block 0, lost async page write

[ 5.422560] Buffer I/O error on dev vda1, logical block 1, lost async page write

[ 5.436351] blk_update_request: I/O error, dev vda, sector 3080

[ 5.437718] Buffer I/O error on dev vda1, logical block 129, lost async page write

[ 5.439603] Buffer I/O error on dev vda1, logical block 130, lost async page write

[ 5.441540] Buffer I/O error on dev vda1, logical block 131, lost async page write

[ 5.443487] Buffer I/O error on dev vda1, logical block 132, lost async page write

[ 5.445412] Buffer I/O error on dev vda1, logical block 133, lost async page write

[ 5.447183] Buffer I/O error on dev vda1, logical block 134, lost async page write

[ 5.454432] blk_update_request: I/O error, dev vda, sector 3136

[ 5.456074] Buffer I/O error on dev vda1, logical block 136, lost async page write

[ 5.464320] blk_update_request: I/O error, dev vda, sector 3176

[ 5.465891] Buffer I/O error on dev vda1, logical block 141, lost async page write

[ 5.481109] blk_update_request: I/O error, dev vda, sector 3208

[ 5.500706] blk_update_request: I/O error, dev vda, sector 3232

[ 5.515074] blk_update_request: I/O error, dev vda, sector 3424

[ 5.532104] blk_update_request: I/O error, dev vda, sector 3504

[ 5.547614] blk_update_request: I/O error, dev vda, sector 3632

[ 5.557725] blk_update_request: I/O error, dev vda, sector 4072

[ 6.726649] JBD2: recovery failed

[ 6.727554] EXT4-fs (vda1): error loading journal

[ 6.732916] VFS: Dirty inode writeback failed for block device vda1 (err=-5).

mount: mounting /dev/vda1 on /root failed: Input/output error

done.
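
On point 2, one possible way to check whether the default has been overridden is sketched below. It assumes the override in question is nova's `resume_guests_state_on_host_boot` option in the [DEFAULT] section of nova.conf on the compute node; the thread does not say which mechanism, if any, this site uses. The function name and the sample configuration text are hypothetical.

```python
import configparser


def resumes_guests_on_boot(nova_conf_text: str) -> bool:
    """Return True if the given nova.conf text enables automatic guest resume.

    Assumption: the relevant setting is [DEFAULT] resume_guests_state_on_host_boot,
    which is treated as False when it is absent.
    """
    # interpolation=None so literal '%' characters in nova.conf are left alone;
    # strict=False so repeated sections or options do not raise.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    return parser.getboolean(
        "DEFAULT", "resume_guests_state_on_host_boot", fallback=False
    )


# Hypothetical nova.conf fragment, for illustration only.
sample_conf = """\
[DEFAULT]
debug = False
resume_guests_state_on_host_boot = True
"""
print(resumes_guests_on_boot(sample_conf))
```

On a real compute node the text would come from the nova.conf file in use there; a result of False would be consistent with the default behavior described in point 2.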

We will await further detail to this and the other items referenced. Thanks f...

Sean Feole (sfeole) wrote :

Marking back to Incomplete while awaiting a response to Ryan's questions; I never refreshed the bug :)

Changed in cloud-archive:
status: Incomplete → In Progress
status: In Progress → Incomplete
Revision history for this message
Sean Feole (sfeole) wrote :

To give a status update so far: I have deployed Xenial/Queens using juju 2.3.8-bionic-amd64.

Model    Controller  Cloud/Region  Version  SLA
default  icarus      icarus        2.3.8    unsupported

App                    Version       Status  Scale  Charm                  Store       Rev  OS      Notes
ceph-mon               12.2.4        active      3  ceph-mon               jujucharms   24  ubuntu
ceph-osd               12.2.4        active      3  ceph-osd               jujucharms  261  ubuntu
ceph-radosgw           12.2.4        active      1  ceph-radosgw           jujucharms  257  ubuntu
cinder                 12.0.1        active      1  cinder                 jujucharms  271  ubuntu
cinder-ceph            12.0.1        active      1  cinder-ceph            jujucharms  232  ubuntu
glance                 16.0.1        active      1  glance                 jujucharms  264  ubuntu
keystone               13.0.0        active      1  keystone               jujucharms  278  ubuntu
mysql                  5.6.37-26.21  active      1  percona-cluster        jujucharms  263  ubuntu
neutron-api            12.0.1        active      1  neutron-api            jujucharms  259  ubuntu
neutron-gateway        12.0.1        active      1  neutron-gateway        jujucharms  248  ubuntu
neutron-openvswitch    12.0.1        active      3  neutron-openvswitch    jujucharms  249  ubuntu
nova-cloud-controller  17.0.3        active      1  nova-cloud-controller  jujucharms  309  ubuntu
nova-compute           17.0.3        active      3  nova-compute           jujucharms  282  ubuntu
rabbitmq-server        3.5.7         active      1  rabbitmq-server        jujucharms   73  ubuntu

I created two instances: one with an attached volume and one without.

$ openstack server list
+--------------------------------------+--------------+--------+-----------------------------------+--------+----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+--------------+--------+-----------------------------------+--------+----------+
| ed754278-ab06-4b22-b88c-505bd5ff0316 | xenial-test2 | ACTIVE | internal=10.5.5.13, 10.246.114.83 | xenial | m1.small |
| ccee393e-768f-463b-ae94-8193c4c54bba | xenial-test | ACTIVE | internal=10.5.5.5, 10.246.114.88 | xenial | m1.small |
+--------------------------------------+--------------+--------+-----------------------------------+--------+----------+

$ openstack volume list
+--------------------------------------+----------+-----------+------+---------------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+----------+-----------+------+---------------------------------------+
| b5fa9e73-56d3-4c76-a7dc-7d10115860d1 | testvol2 | in-use | 10 | Attached to xenial-test2 on /dev/vdb |
| 9832d124-11f7-44a4-9293-fe87a535f637 | testvol1 | available | 10 | |
+--------------------------------------+----------+-----------+------+------...

Revision history for this message
Vern Hart (vern) wrote :

I've pinged the customer for the specific details of the failure.

From my understanding, they pulled the power on the compute node -- but I will get confirmation on that point.

There isn't anything out of the ordinary in the compute configuration so it's likely the instances came up as SHUTDOWN but would not start. Again, I will confirm this.

Regarding the replication test in comment #8, I believe those two instances were booted from images. A critical component of the failure scenario is booting the instance from a volume.

It should be noted that the behavior has been replicated by Craig Bender in his local environment on both Queens and Pike.

Revision history for this message
Sean Feole (sfeole) wrote :

Hi Vern, I took another look at this, booting from a Ceph-backed volume this time, and was able to reproduce it. Will review and report back.

Revision history for this message
Vern Hart (vern) wrote :

From the customer:

1. This is a cold power off. The initial test was done by pulling the plugs on the box. Subsequent tests are from iDRAC where I do a cold power down.

2. In the first test the VMs did not boot automatically. After I realized they were not going to start, I asked Craig to help me with the autostart settings, which we turned on. In both cases the VMs came up the same, with a bad file system as described in number

3. I am happy to disable autostart on the vms and test again if that may fix the issue

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu):
status: New → Confirmed
Revision history for this message
Aaron (game-on-deactivatedaccount) wrote :

I would like to add I have also seen this when using Ceph as a backend on a Pike deployment.

I have fixed several VMs by performing the following process (note this can potentially wreck the VM, so be careful):

1. Shut down the VM.
2. Create an RBD snapshot (for backup purposes).
3. Export the RBD disk as an image.
4. Set up the image as a loop device.
5. Run fsck on the loop device.

Weirdly, a lot of the time this wasn't enough, so I then had to:

6. Mount the loop device on a (Ceph) host.
7. Unmount the loop device.

Finally, re-import the image back into Ceph, overwriting the existing image (or moving the existing one aside).

This then allowed me to continue booting the VM as normal.

I have tried using a recovery image on the VM and running fsck against the RBD device, but to no avail. Hopefully this may aid in investigation or help someone out.
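
The recovery steps described in this comment can be sketched as a dry-run plan. This is only an illustration under assumptions: the pool and image names, the work directory, the snapshot name `pre-fsck`, the `.damaged` suffix, the loop device `/dev/loop0` and the partition suffix `p1` are all hypothetical placeholders, and the function executes nothing. It only renders the `rbd`, `losetup`, `fsck`, `mount` and `umount` command lines so they can be reviewed and adapted by hand, with the VM already shut down.

```python
import shlex


def recovery_plan(pool, image, workdir="/var/tmp", loop_dev="/dev/loop0",
                  part="p1", mount_cycle=False, mountpoint="/mnt"):
    """Render (not run) the export / fsck / re-import procedure as shell lines.

    loop_dev and part are placeholders: use the device that losetup reports
    and the partition that actually holds the guest filesystem.
    """
    spec = f"{pool}/{image}"
    raw = f"{workdir}/{image}.raw"
    fs_dev = f"{loop_dev}{part}"
    steps = [
        # Take a backup snapshot before touching anything.
        ["rbd", "snap", "create", f"{spec}@pre-fsck"],
        # Export the RBD image to a local raw file.
        ["rbd", "export", spec, raw],
        # Attach the file as a loop device and scan its partition table.
        ["losetup", "--find", "--show", "--partscan", raw],
        # Check the guest filesystem on the loop partition.
        ["fsck", "-f", fs_dev],
    ]
    if mount_cycle:
        # Optional mount/unmount cycle, reported above as sometimes necessary.
        steps += [["mount", fs_dev, mountpoint], ["umount", mountpoint]]
    steps += [
        ["losetup", "--detach", loop_dev],
        # Move the damaged image aside instead of deleting it, then re-import.
        ["rbd", "rename", spec, f"{spec}.damaged"],
        ["rbd", "import", raw, spec],
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    for line in recovery_plan("vms", "example_disk", mount_cycle=True):
        print(line)
```

Renaming the damaged image rather than overwriting it keeps a second fallback next to the snapshot, at the cost of extra space in the pool.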

Thanks

Revision history for this message
James Page (james-page) wrote :

Is 'disk-cachemodes' set in this environment on the nova-compute charm?

Revision history for this message
James Page (james-page) wrote :

Default rbd cache settings from ceph for a compute node:

# ceph-conf -D | grep rbd_cache
rbd_cache = true
rbd_cache_block_writes_upfront = false
rbd_cache_max_dirty = 25165824
rbd_cache_max_dirty_age = 1.000000
rbd_cache_max_dirty_object = 0
rbd_cache_size = 33554432
rbd_cache_target_dirty = 16777216
rbd_cache_writethrough_until_flush = true

Changed in cloud-archive:
status: Incomplete → New
Changed in nova (Ubuntu):
status: Confirmed → New
importance: Undecided → High
assignee: nobody → Sean Feole (sfeole)
Revision history for this message
James Page (james-page) wrote :

or 'rbd-client-cache' in the nova-compute charm.

Revision history for this message
James Page (james-page) wrote :

Never mind, I can see in the attached bundle that neither of those two options has been changed from the default.

Revision history for this message
Aaron (game-on-deactivatedaccount) wrote :

My environment has the following for Nova:

[libvirt]
images_type = rbd
...
disk_cachemodes="network=writeback"

# ceph-conf -D | grep rbd_cache
rbd_cache = true
rbd_cache_block_writes_upfront = false
rbd_cache_max_dirty = 25165824
rbd_cache_max_dirty_age = 1.000000
rbd_cache_max_dirty_object = 0
rbd_cache_size = 33554432
rbd_cache_target_dirty = 16777216
rbd_cache_writethrough_until_flush = true

Revision history for this message
James Page (james-page) wrote :

Hmm network=writeback is not guaranteed to be safe:

* writethrough: writethrough mode is the default caching mode. With
  caching set to writethrough mode, the host page cache is enabled, but the
  disk write cache is disabled for the guest. Consequently, this caching mode
  ensures data integrity even if the applications and storage stack in the
  guest do not transfer data to permanent storage properly (either through
  fsync operations or file system barriers). Because the host page cache is
  enabled in this mode, the read performance for applications running in the
  guest is generally better. However, the write performance might be reduced
  because the disk write cache is disabled.

* writeback: With caching set to writeback mode, both the host page cache
  and the disk write cache are enabled for the guest. Because of this, the
  I/O performance for applications running in the guest is good, but the data
  is not protected in a power failure. As a result, this caching mode is
  recommended only for temporary data where potential data loss is not a
  concern.

Revision history for this message
James Page (james-page) wrote :

A default charm deployment does just that: it uses the defaults.

So librbd should have safe writethrough caching enabled (balance of perf vs consistency)

And libvirt sets:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='nova-compute'>
        <secret type='ceph' uuid='514c9fca-8cbe-11e2-9c52-3bc8c7819472'/>
      </auth>
      <source protocol='rbd' name='nova/6c26f1de-8fa8-40f1-aa4d-d3e6206b26d4_disk'>
        <host name='10.5.0.4' port='6789'/>
        <host name='10.5.0.20' port='6789'/>
        <host name='10.5.0.28' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
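
To check which cache mode libvirt has actually applied to each rbd-backed disk, the domain XML can be inspected programmatically. The following is a minimal sketch, assuming the XML has been obtained separately (for example with `virsh dumpxml`); the function name `rbd_disk_cache_modes` and the trimmed sample document are illustrative, not part of nova or libvirt.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the disk stanza above; a real document would be
# the full domain XML of the instance.
DOMAIN_XML = """
<domain>
  <devices>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='rbd' name='nova/6c26f1de-8fa8-40f1-aa4d-d3e6206b26d4_disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def rbd_disk_cache_modes(domain_xml):
    """Map each rbd-backed target device to its driver cache attribute."""
    root = ET.fromstring(domain_xml)
    modes = {}
    for disk in root.iter("disk"):
        source = disk.find("source")
        if source is None or source.get("protocol") != "rbd":
            continue  # skip disks that are not backed by rbd
        driver = disk.find("driver")
        target = disk.find("target")
        dev = target.get("dev") if target is not None else "unknown"
        # "unset" means no explicit cache attribute was present in the XML.
        cache = driver.get("cache", "unset") if driver is not None else "unset"
        modes[dev] = cache
    return modes


if __name__ == "__main__":
    print(rbd_disk_cache_modes(DOMAIN_XML))
```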

Revision history for this message
James Page (james-page) wrote :

OK instance disks survive a clean reboot (i.e. not a power off)

Revision history for this message
James Page (james-page) wrote :

I was able to reproduce this by running kill -9 against a running instance, avoiding any potential for a clean shutdown.

Revision history for this message
James Page (james-page) wrote :

Pike and Queens have Ceph Luminous - I wonder whether something is not quite sane with the default underlying librbd cache settings.

Revision history for this message
James Page (james-page) wrote :

Tested with 'writethrough' mode set for network devices; still see the same issue post recovery.

Revision history for this message
James Page (james-page) wrote :

Double-checked with a qcow2-backed instance: kill -9 and restart is OK, so this is a Ceph-specific issue.

Revision history for this message
James Page (james-page) wrote :

(For the record, autostart of instances is not the issue here; it's the consistency of the backing rbd volume of the instance when the power is yanked.)

Revision history for this message
James Page (james-page) wrote :

Some more insight; with a Xenial/Ocata deployed cloud with the following software versions:

ceph/librbd 10.2.9
libvirt: 2.5.0-3ubuntu5.6
qemu: 1:2.8+dfsg-3ubuntu2.9

I can hard kill a qemu instance, and have it boot back from a ceph volume OK.

I suspect this is something in the interaction between qemu and librbd in later software versions.

Revision history for this message
Adrian Campos Garrido (hadrianweb) wrote :

I think the problem is on ceph.

Looking at the Ceph documentation, I see this advice:

Important:

The raw data format is really the only sensible format option to use with RBD. Technically, you could use other QEMU-supported formats (such as qcow2 or vmdk), but doing so would add additional overhead, and would also render the volume unsafe for virtual machine live migration when caching (see below) is enabled.

http://docs.ceph.com/docs/master/rbd/qemu-rbd/

So maybe the problem is the use of qcow2. I was using it in my case, so I will try testing with RAW.

Has anybody tried with RAW images?

Revision history for this message
Aaron (game-on-deactivatedaccount) wrote :

I am using RAW images for all my VMs and have seen this behavior.

Interestingly, I ran a flatten command against an affected volume and it was repaired quite quickly. YMMV.

Revision history for this message
James Page (james-page) wrote :

Raising bug tasks for ceph and qemu as I think this is where the issue lies; nova generates the same libvirt xml disk stanzas for versions that work and versions that have this issue.

Changed in ceph (Ubuntu):
importance: Undecided → High
Changed in qemu (Ubuntu):
importance: Undecided → High
summary: - VMs do not survive host reboot
+ VM rbd backed block devices inconsistent after unexpected host outage
Revision history for this message
James Page (james-page) wrote :

OK figured this one out - the cephx keys are missing a permission which allows them to see blacklisted clients - as a result they can't deal with a hard crash:

  mon 'allow command "osd blacklist"'

This is a charm issue after all.

As a workaround you can manually update the existing client keys for nova-compute using:

  sudo ceph auth caps client.nova-compute mon 'allow r, allow command "osd blacklist"' osd 'allow rwx'

from any mon unit.

Changed in nova:
status: New → Invalid
Changed in ceph (Ubuntu):
status: New → Invalid
Changed in nova (Ubuntu):
status: New → Invalid
Changed in cloud-archive:
status: New → Invalid
Changed in qemu (Ubuntu):
status: New → Invalid
James Page (james-page)
Changed in charms.ceph:
status: New → Triaged
Changed in charm-ceph-mon:
status: New → Triaged
Changed in charms.ceph:
importance: Undecided → High
Changed in charm-ceph-mon:
importance: Undecided → High
milestone: none → 18.08
Changed in cloud-archive:
assignee: Sean Feole (sfeole) → nobody
Changed in nova (Ubuntu):
assignee: Sean Feole (sfeole) → nobody
James Page (james-page)
Changed in charm-ceph-mon:
assignee: nobody → James Page (james-page)
Changed in charms.ceph:
assignee: nobody → James Page (james-page)
Changed in charm-ceph-mon:
status: Triaged → In Progress
Changed in charms.ceph:
status: Triaged → In Progress
Changed in charm-ceph-mon:
status: In Progress → Fix Committed
status: Fix Committed → In Progress
Revision history for this message
Adrian Campos Garrido (hadrianweb) wrote :

A temporary workaround is to not use cephx for auth and set it to none in the charm, but I think this is not recommended.

I tested that with QCOW2 and RAW images without any problem.

Maybe this can help someone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charms.ceph (master)

Reviewed: https://review.openstack.org/573659
Committed: https://git.openstack.org/cgit/openstack/charms.ceph/commit/?id=4d8f31d0ea0f47fd71939973cba86137d5daaef4
Submitter: Zuul
Branch: master

commit 4d8f31d0ea0f47fd71939973cba86137d5daaef4
Author: James Page <email address hidden>
Date: Fri Jun 8 11:34:28 2018 +0100

    Add 'osd blacklist' to default mon perms

    Ensure that the default permissions for clients include the
    'osd blacklist' command; This ensures that in the event of
    a client crashing (due to power outage or segfault), the
    client and re-connect and write to any devices on reboot.

    Change-Id: I0b43dece4e1c56fb838b0147bfb75fb9906e6657
    Closes-Bug: 1773449

Changed in charms.ceph:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.openstack.org/573664
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=e071486acf8e90f293ce5ce8880047384527c9f0
Submitter: Zuul
Branch: master

commit e071486acf8e90f293ce5ce8880047384527c9f0
Author: James Page <email address hidden>
Date: Fri Jun 8 12:02:06 2018 +0100

    Add 'osd blacklist' to default mon perms

    Ensure that the default permissions for clients include the
    'osd blacklist' command; This ensures that in the event of
    a client crashing (due to power outage or segfault), the
    client and re-connect and write to any devices on reboot.

    This is a safe permission for all supported Ceph releases.

    Depends-On: I0b43dece4e1c56fb838b0147bfb75fb9906e6657
    Change-Id: Ib1f1e8d7ed54528603b8b08051dafeec075a3232
    Closes-Bug: 1773449

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released
milestone: 18.08 → 18.05
Revision history for this message
Aaron (game-on-deactivatedaccount) wrote :

I can confirm this resolved my issue - I tested several times by 'kill -9' an instance before and after the config change to client.cinder Ceph permissions.

*** WARNING!!!
If you make the change manually, as I did, be careful because you need to explicitly define all permissions for the client.cinder user as it overwrites all current permissions. For reference, I issued:

ceph auth caps client.cinder mon 'allow r, allow command "osd blacklist"' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images' mds ''

This retained all existing permissions. You may need to change your pool names and Ceph user.
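
Because `ceph auth caps` replaces every capability of the entity, one way to avoid dropping something by accident is to start from the caps the client already has and only extend the mon entry. The sketch below assumes the current caps have been transcribed into a dict by hand (for instance from the output of `ceph auth get client.cinder`); the helper names `add_blacklist_cap` and `auth_caps_command` are hypothetical, and the code only builds the command line without running it.

```python
import shlex

BLACKLIST_CAP = 'allow command "osd blacklist"'


def add_blacklist_cap(caps):
    """Return a copy of caps with the blacklist command added to the mon caps."""
    merged = dict(caps)
    mon = merged.get("mon", "").strip()
    if BLACKLIST_CAP not in mon:
        # Append to the existing mon caps instead of replacing them.
        merged["mon"] = f"{mon}, {BLACKLIST_CAP}" if mon else BLACKLIST_CAP
    return merged


def auth_caps_command(entity, caps):
    """Render a full 'ceph auth caps' command restating every capability."""
    argv = ["ceph", "auth", "caps", entity]
    for daemon, cap in caps.items():
        argv += [daemon, cap]
    return shlex.join(argv)


if __name__ == "__main__":
    # Example caps only; substitute the pools and caps of the real client.
    current = {
        "mon": "allow r",
        "osd": "allow class-read object_prefix rbd_children, "
               "allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images",
    }
    print(auth_caps_command("client.cinder", add_blacklist_cap(current)))
```

The helper is idempotent, so applying it to caps that already contain the blacklist command leaves them unchanged.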

Thanks for the help James, very useful.

Revision history for this message
Giulio Fidente (gfidente) wrote :

Fixed in TripleO via change I9639d606bd538f6776c368a4f34aa6783ab91abb

Changed in tripleo:
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → Giulio Fidente (gfidente)
milestone: none → rocky-3
Changed in tripleo:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.4

This issue was fixed in the openstack/tripleo-heat-templates 8.0.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.0.0.0b4

This issue was fixed in the openstack/tripleo-heat-templates 9.0.0.0b4 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.0.0.0rc3

This issue was fixed in the openstack/kolla-ansible 7.0.0.0rc3 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 8.0.0.0b1

This issue was fixed in the openstack/kolla-ansible 8.0.0.0b1 development milestone.
