ceph with multiple OSD pools fails to upgrade osds
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Ceph OSD Charm | Fix Released | High | James Page | |
charms.ceph | Fix Released | High | Chris MacNaughton | |
Bug Description
After an agent upgrade (juju 1 to juju 2), the ceph-osd charm (17.11, and it appears the problem may still be present in 18.05) forces OSD cluster upgrades.
Error is:
Traceback (most recent call last):
File "hooks/
hooks.
File "/var/lib/
self.
File "/var/lib/
return f(*args, **kwargs)
File "hooks/
check_
File "hooks/
upgrade_
File "lib/ceph/
osd_
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
This is line: https:/
I'm finding that when get_osd_tree runs, the parser stops when it reaches a second root entry. Members of the second pool of OSDs then hit this error because their hostnames don't match any of the hostnames in the first pool.
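To illustrate the failure mode described above, here is a minimal sketch, not the charm's actual code. The helper names `position_of` and `previous_host` are hypothetical, and it assumes the host list was built from the first root only (taken to be "ssds", going by the order of the plain-text output). A lookup for a host under the other root returns None, and subtracting 1 from that is what would produce the TypeError in the traceback.

```python
from typing import List, Optional


def position_of(hostname: str, ordered_hosts: List[str]) -> Optional[int]:
    """Return the index of hostname in ordered_hosts, or None if absent."""
    try:
        return ordered_hosts.index(hostname)
    except ValueError:
        return None


def previous_host(hostname: str, ordered_hosts: List[str]) -> Optional[str]:
    """Return the host that precedes hostname in the list, if any."""
    position = position_of(hostname, ordered_hosts)
    if position is None:
        # Without this guard, `position - 1` raises:
        # TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
        raise LookupError(f"{hostname} not found in OSD host list")
    if position == 0:
        return None  # first host in the list has no predecessor
    return ordered_hosts[position - 1]


# Assumption: only the first root was parsed, so hosts under the
# second root (such as OS-CS-02) are missing from the list.
first_root_only = ["OS-CS-10", "OS-CS-09", "OS-CS-08"]
```

With the guard in place, a host from the unparsed root fails with an explicit LookupError instead of an opaque TypeError. That makes the missing-host condition easier to diagnose, but it does not fix the parser itself.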
Here's my OSD tree:
ubuntu@
sudo: unable to resolve host juju-machine-
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-9 9.00000 root ssds
-10 3.00000 host OS-CS-10
52 1.00000 osd.52 up 1.00000 1.00000
56 1.00000 osd.56 up 1.00000 1.00000
58 1.00000 osd.58 up 1.00000 1.00000
-12 3.00000 host OS-CS-09
53 1.00000 osd.53 up 1.00000 1.00000
55 1.00000 osd.55 up 1.00000 1.00000
59 1.00000 osd.59 up 1.00000 1.00000
-11 3.00000 host OS-CS-08
51 1.00000 osd.51 up 1.00000 1.00000
54 1.00000 osd.54 up 1.00000 1.00000
57 1.00000 osd.57 up 1.00000 1.00000
-1 39.20966 root default
-2 4.35394 host OS-CS-05
0 0.54399 osd.0 up 1.00000 1.00000
2 0.54399 osd.2 up 1.00000 1.00000
4 0.54399 osd.4 up 1.00000 1.00000
7 0.54399 osd.7 up 1.00000 1.00000
11 0.54399 osd.11 up 1.00000 1.00000
16 0.54399 osd.16 up 1.00000 1.00000
33 1.09000 osd.33 up 1.00000 1.00000
-3 4.35394 host OS-CS-02
1 0.54399 osd.1 up 1.00000 1.00000
3 0.54399 osd.3 up 1.00000 1.00000
6 0.54399 osd.6 up 1.00000 1.00000
10 0.54399 osd.10 up 1.00000 1.00000
15 0.54399 osd.15 up 1.00000 1.00000
20 0.54399 osd.20 up 1.00000 1.00000
35 1.09000 osd.35 up 1.00000 1.00000
-4 4.35394 host OS-CS-03
5 0.54399 osd.5 up 1.00000 1.00000
9 0.54399 osd.9 up 1.00000 1.00000
13 0.54399 osd.13 up 1.00000 1.00000
18 0.54399 osd.18 up 1.00000 1.00000
22 0.54399 osd.22 up 1.00000 1.00000
25 0.54399 osd.25 up 1.00000 1.00000
34 1.09000 osd.34 up 1.00000 1.00000
-5 4.35394 host OS-CS-01
8 0.54399 osd.8 up 1.00000 1.00000
12 0.54399 osd.12 up 1.00000 1.00000
17 0.54399 osd.17 up 1.00000 1.00000
21 0.54399 osd.21 up 1.00000 1.00000
24 0.54399 osd.24 up 1.00000 1.00000
27 0.54399 osd.27 up 1.00000 1.00000
31 1.09000 osd.31 up 1.00000 1.00000
This error is occurring on OS-CS-02.
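As a rough sketch of what collecting hosts across every root could look like, the snippet below walks the plain-text tree format shown above and groups host names under each root. The function name `hosts_by_root` and the abbreviated sample are assumptions made for illustration. The charm's real get_osd_tree is not reproduced here and may parse a different output format.

```python
from typing import Dict, List, Optional

# Abbreviated copy of the tree above (two hosts per root).
SAMPLE_TREE = """\
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-9 9.00000 root ssds
-10 3.00000 host OS-CS-10
52 1.00000 osd.52 up 1.00000 1.00000
-12 3.00000 host OS-CS-09
53 1.00000 osd.53 up 1.00000 1.00000
-1 39.20966 root default
-2 4.35394 host OS-CS-05
0 0.54399 osd.0 up 1.00000 1.00000
-3 4.35394 host OS-CS-02
1 0.54399 osd.1 up 1.00000 1.00000
"""


def hosts_by_root(tree_text: str) -> Dict[str, List[str]]:
    """Group host names under each root, continuing past the first root."""
    roots: Dict[str, List[str]] = {}
    current_root: Optional[str] = None
    for line in tree_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        node_type, name = fields[2], fields[3]
        if node_type == "root":
            # Start a new group instead of stopping at a second root.
            current_root = name
            roots[current_root] = []
        elif node_type == "host" and current_root is not None:
            roots[current_root].append(name)
        # Header and osd.N lines fall through and are ignored.
    return roots


roots = hosts_by_root(SAMPLE_TREE)
# Flatten to a single host list covering every root.
all_hosts = [host for hosts in roots.values() for host in hosts]
```

Here `all_hosts` includes OS-CS-02 because the loop keeps going when it meets the "default" root, so a position lookup for that host would no longer come back empty.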
When I put in debug statements inside get_upgrade_
Changed in charm-ceph-osd: | |
milestone: | none → 18.08 |
Changed in charm-ceph-osd: | |
status: | Fix Committed → Fix Released |
Changed in charm-ceph-osd: | |
status: | New → In Progress |
assignee: | Chris MacNaughton (chris.macnaughton) → James Page (james-page) |
Marking Field-Critical. This is currently a live issue in production. I'm working on a patch, but want to get more eyes on this for a bulletproof solution.