udev/charm mismatch. osds already processed

Bug #1845226 reported by David O Neill
This bug affects 2 people
Affects         Status   Importance  Assigned to  Milestone
Ceph OSD Charm  Triaged  Low         Unassigned

Bug Description

Installing a new drive shifted the device name assignments, leaving ceph-osd in an inconsistent state: the device names the charm recorded as processed no longer match the physical disks.

Charm ceph-osd 278
==================
App                  Version  Status   Scale  Charm     Store       Rev  OS      Notes
ceph-osd-sf00164790  10.2.11  unknown  3      ceph-osd  jujucharms  278  ubuntu

Charm osd-devices config
========================
juju config ceph-osd-sf00164790 osd-devices
/dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

Openstack version
=================
juju config keystone openstack-origin
cloud:trusty-mitaka

Distribution
============
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.6 LTS"

Long description
================
The customer installed a new SSD in the machine, after which udev bumped the drive ordering.
It looks as if the new drive became 'nvme2n1', while the previous 'nvme2n1' is now 'nvme3n1':

  NAME         MAJ:MIN  RM  SIZE    RO  TYPE  MOUNTPOINT
  sda          8:0      0   558.4G  0   disk
  └─sda1       8:1      0   558.4G  0   part  /
  nvme0n1      259:5    0   1.5T    0   disk
  ├─nvme0n1p1  259:8    0   1.5T    0   part  /var/lib/ceph/osd/ceph-53
  └─nvme0n1p2  259:9    0   1G      0   part
  nvme1n1      259:1    0   1.5T    0   disk
  ├─nvme1n1p1  259:2    0   1.5T    0   part  /var/lib/ceph/osd/ceph-55
  └─nvme1n1p2  259:3    0   1G      0   part
  nvme2n1      259:4    0   2.9T    0   disk
  nvme3n1      259:0    0   1.5T    0   disk
  ├─nvme3n1p1  259:6    0   1.5T    0   part  /var/lib/ceph/osd/ceph-59
  └─nvme3n1p2  259:7    0   1G      0   part

Attempting to initialize the new drive as an OSD fails: the charm reports that the device has already been processed and skips it.

Action
======
juju run-action ceph-osd-sf00164790/2 add-disk osd-devices=/dev/nvme2n1 bucket=ssds --wait

Result in the log
=================
juju-log Device /dev/nvme2n1 already processed by charm, skipping

Skipping osd devices previously processed by this unit: ['/dev/nvme0n1', '/dev/nvme1n1', '/dev/nvme2n1']
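
The log output above is consistent with the charm tracking processed devices purely by path string. The following is a minimal sketch of that behaviour, not the charm's actual code; the helper name `split_devices` is hypothetical and the lists are taken from the log and config shown in this report.

```python
def split_devices(requested, processed):
    """Partition requested device paths by whether the same path was seen before.

    Only the path string is compared, so a different physical disk that
    inherits an old name is treated as already processed.
    """
    seen = set(processed)
    skipped = [dev for dev in requested if dev in seen]
    pending = [dev for dev in requested if dev not in seen]
    return pending, skipped


# Paths recorded by the unit before the new SSD was installed.
processed = ["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme2n1"]

# After the reorder, /dev/nvme2n1 refers to the new, empty 2.9T drive.
pending, skipped = split_devices(["/dev/nvme2n1"], processed)
print(pending)  # []
print(skipped)  # ['/dev/nvme2n1']
```

Under this assumption the new drive is never initialised, because its name collides with an entry that was recorded for a different physical disk.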

What we need
============
What we need is a fix in the charm for this scenario.

description: updated
Peter Sabaini (peter-sabaini) wrote :

This probably is avoidable by using stable device names, e.g. from /dev/disk/by-uuid/
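
To see which stable names point at which kernel device names on a given host, something like the sketch below may help. It assumes the udev-managed symlink directory /dev/disk/by-id is present (for whole, unformatted disks this directory is likely more useful than by-uuid, which lists filesystem UUIDs); the function name `stable_names` is hypothetical.

```python
import os


def stable_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent symlink in by_id_dir to the device node it resolves to."""
    mapping = {}
    for entry in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, entry)
        if os.path.islink(link):
            # realpath follows the symlink, e.g. to /dev/nvme3n1
            mapping[link] = os.path.realpath(link)
    return mapping


# Usage: print the mapping only if the directory exists on this host.
if os.path.isdir("/dev/disk/by-id"):
    for link, target in stable_names().items():
        print("{} <- {}".format(target, link))
```

Whether the charm accepts such symlink paths in osd-devices is not confirmed in this report and would need to be checked against the charm revision in use.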

As a hacky workaround, edit the unit state to sync the agent's notion of device names with the current disk device names:

juju ssh ceph-osd/X
sudo apt install sqlite3
sudo systemctl stop jujud-unit-ceph-osd-X
sudo cp -a /var/lib/juju/agents/unit-ceph-osd-X/charm/.unit-state.db ~ # backup
sudo sqlite3 /var/lib/juju/agents/unit-ceph-osd-X/charm/.unit-state.db
sqlite> update kv set data='["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme3n1"]' where key='osd-devices';
sqlite> .quit
sudo systemctl start jujud-unit-ceph-osd-X
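
The same update can be sketched in Python, which avoids hand-quoting the JSON list. This is an illustration under assumptions, not a supported tool: it assumes the kv table with key and data columns used in the workaround above, that the unit agent has been stopped first, and that the script runs with permission to write the file. The function name `set_osd_devices` is hypothetical.

```python
import json
import shutil
import sqlite3


def set_osd_devices(db_path, devices):
    """Back up the unit state DB, then overwrite the 'osd-devices' entry.

    Returns the number of rows updated (0 means the key was not present).
    """
    shutil.copy2(db_path, db_path + ".bak")  # keep a copy next to the original
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE kv SET data = ? WHERE key = ?",
                (json.dumps(devices), "osd-devices"),
            )
        return cur.rowcount
    finally:
        conn.close()
```

For the unit in this report the call would be `set_osd_devices(path, ["/dev/nvme0n1", "/dev/nvme1n1", "/dev/nvme3n1"])`, with `path` pointing at the unit's .unit-state.db; a return value of 0 suggests the key name differs on that charm revision.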

tags: added: canonical-bootstack
Changed in charm-ceph-osd:
status: New → Triaged
importance: Undecided → High
Changed in charm-ceph-osd:
importance: High → Low
Alan Baghumian (alanbach) wrote :

This just resolved my issue! Manually modified the sqlite db and then was able to use the add-disk action to re-add the disk.
