udev/charm mismatch: OSDs already processed

Bug #1845226 reported by David O Neill on 2019-09-24
This bug affects 2 people
Affects: OpenStack ceph-osd charm
Status: Triaged
Importance: Low
Assigned to: Unassigned

Bug Description

Installation of a new drive bumped the hard drive name assignment, leaving ceph-osd in an inconsistent state.

Charm ceph-osd 278
==================
App Version Status Scale Charm Store Rev OS Notes
ceph-osd-sf00164790 10.2.11 unknown 3 ceph-osd jujucharms 278 ubuntu

Charm osd-devices config
========================
juju config ceph-osd-sf00164790 osd-devices
/dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

Openstack version
=================
juju config keystone openstack-origin
cloud:trusty-mitaka

Distribution
============
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.6 LTS"

Long description
================
The customer installed a new SSD into the machine; subsequently udev bumped the drive ordering.
It looks as if the new drive became 'nvme2n1', while the previous 'nvme2n1' is now 'nvme3n1' (see the check after the device listing below).

  NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
  sda 8:0 0 558.4G 0 disk
  └─sda1 8:1 0 558.4G 0 part /
  nvme0n1 259:5 0 1.5T 0 disk
  ├─nvme0n1p1 259:8 0 1.5T 0 part /var/lib/ceph/osd/ceph-53
  └─nvme0n1p2 259:9 0 1G 0 part
  nvme1n1 259:1 0 1.5T 0 disk
  ├─nvme1n1p1 259:2 0 1.5T 0 part /var/lib/ceph/osd/ceph-55
  └─nvme1n1p2 259:3 0 1G 0 part
  nvme2n1 259:4 0 2.9T 0 disk
  nvme3n1 259:0 0 1.5T 0 disk
  ├─nvme3n1p1 259:6 0 1.5T 0 part /var/lib/ceph/osd/ceph-59
  └─nvme3n1p2 259:7 0 1G 0 part
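
To confirm which physical drive ended up under which kernel name, udev's persistent symlinks can be compared against the listing above. This is a hypothetical check rather than output from the affected machine (on 14.04 the by-id links available for NVMe devices depend on the udev version):

# List persistent names and the kernel devices they currently point at
ls -l /dev/disk/by-id/ | grep nvme
# Inspect a single device's identity as udev sees it
udevadm info --query=property --name=/dev/nvme2n1 | grep -E 'ID_MODEL|ID_SERIAL'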

Attempting to initialize the new OSD results in the charm complaining that the device has already been processed.

Action
======
juju run-action ceph-osd-sf00164790/2 add-disk osd-devices=/dev/nvme2n1 bucket=ssds --wait

Result in the log
=================
juju-log Device /dev/nvme2n1 already processed by charm, skipping

Skipping osd devices previously processed by this unit: ['/dev/nvme0n1', '/dev/nvme1n1', '/dev/nvme2n1']
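
The skip happens because the unit keeps a local record of the device paths it has already prepared; the workaround below edits exactly that record. A read-only way to see what the unit currently believes it has processed (assuming the usual agent directory layout, with the real unit name substituted):

juju ssh ceph-osd-sf00164790/2
sudo sqlite3 /var/lib/juju/agents/unit-ceph-osd-sf00164790-2/charm/.unit-state.db \
  "select data from kv where key='osd-devices';"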

What we need
============
What we need is a fix in the charm for this scenario.

Peter Sabaini (peter-sabaini) wrote:

This is probably avoidable by using stable device names, e.g. from /dev/disk/by-uuid/.
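
A sketch of what that could look like for this deployment, assuming the charm accepts persistent paths in its osd-devices config; the by-id names below are placeholders, not the real links from this machine:

# Point the charm at persistent paths instead of kernel-ordered names
juju config ceph-osd-sf00164790 osd-devices='/dev/disk/by-id/nvme-MODEL_SERIAL1 /dev/disk/by-id/nvme-MODEL_SERIAL2 /dev/disk/by-id/nvme-MODEL_SERIAL3'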

As a hacky workaround, edit the unit state to sync the agent's notion of device names with the current disk device names:

juju ssh ceph-osd/X
sudo apt install sqlite3
# Stop the unit agent so it does not race with the edit
sudo systemctl stop jujud-unit-ceph-osd-X
# Back up the unit state database first
sudo cp -a /var/lib/juju/agents/unit-ceph-osd-X/charm/.unit-state.db ~
# Rewrite the recorded device list to the current kernel names (the new disk nvme2n1 stays out)
sudo sqlite3 /var/lib/juju/agents/unit-ceph-osd-X/charm/.unit-state.db
sqlite> update kv set data='["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme3n1"]' where key='osd-devices';
sudo systemctl start jujud-unit-ceph-osd-X
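
With the agent back up, re-running the action from the bug description against the genuinely new disk is the expected follow-up; it should no longer be skipped, since /dev/nvme2n1 is not in the recorded list anymore:

juju run-action ceph-osd-sf00164790/2 add-disk osd-devices=/dev/nvme2n1 bucket=ssds --wait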

tags: added: canonical-bootstack
Changed in charm-ceph-osd:
status: New → Triaged
importance: Undecided → High
Changed in charm-ceph-osd:
importance: High → Low