simplex: host did not become active controller after initial unlock
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Elena Taivan |
Bug Description
Brief Description
-----------------
On one simplex system, the host did not become the active controller after the initial unlock.
controller-0:~$ source /etc/platform/
Openstack Admin credentials can only be loaded from the active controller.
Severity
--------
Major
Steps to Reproduce
------------------
- Install simplex
- Bootstrap
- Configure and unlock the node
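For context, the configure-and-unlock step above is normally driven through the StarlingX system CLI. A minimal sketch of that step (the credentials path and host name are the standard simplex defaults, assumed here rather than taken from this report):

# Load platform admin credentials (standard StarlingX location, assumed).
source /etc/platform/openrc
# Unlock the sole controller; it reboots and should come back
# unlocked/enabled/available as the active controller.
system host-unlock controller-0
# Poll until the host settles.
watch -n 10 system host-list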
Expected Behavior
------------------
- Host unlocked successfully and became active controller
Actual Behavior
----------------
- Host did not become active controller
controller-0:~$ source /etc/platform/
Openstack Admin credentials can only be loaded from the active controller.
controller-0:~$ sudo sm-dump
Password:
/var/run/sm/sm.db not available.
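When sm-dump fails like this because sm.db is missing, two quick checks (plain shell, using only the path from the output above) help confirm whether service management ever started on the node:

# Is any service-management process running? (crude match on 'sm')
ps -ef | grep '[s]m'
# Does the SM runtime directory exist, and is sm.db present?
ls -l /var/run/sm/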
Reproducibility
---------------
Reproducible on this system (wcp122). Not seen on other simplex systems (wcp112, ml350-g10-1).
System Configuration
--------------------
One node system
Lab-name: wcp122
Branch/Pull Time/Commit
-----------------------
2020-06-15_20-00-00
Last Pass
---------
2020-06-10_20-00-00
Timestamp/Logs
--------------
https:/
There are some errors in the puppet log, but I'm not sure if they're related.
Test Activity
-------------
Sanity
description: updated
tags: added: stx.4.0 stx.storage
Changed in starlingx:
  importance: Undecided → High
  importance: High → Medium
  status: New → Triaged
  assignee: nobody → Ovidiu Poncea (ovidiu.poncea)
Changed in starlingx:
  assignee: Ovidiu Poncea (ovidiu.poncea) → Elena Taivan (etaivan)
tags: added: stx.retestneeded
tags: added: in-r-stx40
The error is this:
2020-06-18T05:40:05.471 Info: 2020-06-18 05:40:05 +0000 /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_size]: Scheduling refresh of Service[ceph-mon-controller-0]
2020-06-18T05:40:05.473 Debug: 2020-06-18 05:40:05 +0000 /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_size]: The container Class[Ceph] will propagate my refresh event
2020-06-18T05:40:05.475 Debug: 2020-06-18 05:40:05 +0000 /Stage[main]/Ceph/Ceph_config[global/osd_crush_update_on_start]: Nothing to manage: no ensure and the resource doesn't exist
2020-06-18T05:40:05.477 Debug: 2020-06-18 05:40:05 +0000 /Stage[main]/Ceph/Ceph_config[global/cluster_require_signatures]: Nothing to manage: no ensure and the resource doesn't exist
2020-06-18T05:40:05.479 Notice: 2020-06-18 05:40:05 +0000 /Stage[main]/Ceph/Ceph_config[global/ms_bind_ipv6]/ensure: created
2020-06-18T05:40:05.481 Info: 2020-06-18 05:40:05 +0000 /Stage[main]/Ceph/Ceph_config[global/ms_bind_ipv6]: Scheduling refresh of Service[ceph-mon-controller-0]
2020-06-18T05:40:05.483 Debug: 2020-06-18 05:40:05 +0000 /Stage[main]/Ceph/Ceph_config[global/ms_bind_ipv6]: The container Class[Ceph] will propagate my refresh event
2020-06-18T05:40:05.485 Debug: 2020-06-18 05:40:05 +0000 Executing: '/usr/sbin/lvs cgts-vg'
2020-06-18T05:40:05.487 Debug: 2020-06-18 05:40:05 +0000 Executing: '/usr/sbin/lvs --noheading --unit g /dev/cgts-vg/backup-lv'
2020-06-18T05:40:05.489 Debug: 2020-06-18 05:40:05 +0000 Executing: '/usr/sbin/blkid /dev/cgts-vg/backup-lv'
2020-06-18T05:40:05.491 Debug: 2020-06-18 05:40:05 +0000 Executing: 'mkfs.ext4 /dev/cgts-vg/backup-lv'
2020-06-18T05:40:05.493 Error: 2020-06-18 05:40:05 +0000 Execution of 'mkfs.ext4 /dev/cgts-vg/backup-lv' returned 1: mke2fs 1.42.9 (28-Dec-2013)
2020-06-18T05:40:05.495 /dev/cgts-vg/backup-lv is mounted; will not make a filesystem here!
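The mkfs failure itself is straightforward to reproduce outside of puppet: mkfs.ext4 refuses to format a block device that is currently mounted. A minimal illustration (hypothetical volume names, not from this lab):

# Format a fresh LV, mount it, then try to format it again in place.
mkfs.ext4 /dev/vg0/demo-lv       # succeeds on the bare, unmounted LV
mount /dev/vg0/demo-lv /mnt/demo
mkfs.ext4 /dev/vg0/demo-lv       # fails with:
#   /dev/vg0/demo-lv is mounted; will not make a filesystem here!

This is consistent with the step only failing on a re-apply, after backup-lv has already been created and mounted by an earlier run.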
The two main questions we need answers to are: why would Ceph trigger a refresh event on backup-lv? And even if it does, why is the operation not idempotent?
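On the second question: the usual way to make a mkfs step idempotent is to skip it when the device already carries a filesystem or is mounted. A shell sketch of the kind of guard the manifest's filesystem resource would need (device path taken from the log above; the guard logic is illustrative, not the actual provider code):

DEV=/dev/cgts-vg/backup-lv

# Skip mkfs when blkid already reports a filesystem on the device,
# or when the device is currently mounted; re-applies become no-ops.
if ! blkid "$DEV" >/dev/null 2>&1 && ! findmnt "$DEV" >/dev/null 2>&1; then
    mkfs.ext4 "$DEV"
fi

Notably, the log shows blkid being run right before mkfs.ext4, so a check of this kind appears to exist but is not preventing the format attempt on the mounted LV.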
Looking over the manifests executed:

controller-0:/home/sysadmin# ls -latr /var/log/puppet/latest
lrwxrwxrwx 1 root root 46 Jun 18 05:38 /var/log/puppet/latest -> /var/log/puppet/2020-06-18-05-38-46_controller
controller-0:/home/sysadmin# ls -latr /var/log/puppet
total 272
drwxr-xr-x 2 root root   4096 Jun 18 05:15 2020-06-18-05-15-08_controller
-rw-r--r-- 1 root root 231555 Jun 18 05:17 first_apply.tgz
drwxrwxrwx 2 root root   4096 Jun 18 05:18 2020-06-18-05-18-22_controller
drwxrwxrwx 2 root root   4096 Jun 18 05:20 2020-06-18-05-20-18_worker
drwxrwxrwx 2 root root   4096 Jun 18 05:33 2020-06-18-05-33-07_worker
drwxrwxrwx 2 root root   4096 Jun 18 05:33 2020-06-18-05-33-29_worker
drwxrwxrwx 2 root root   4096 Jun 18 05:33 2020-06-18-05-33-49_worker
drwxrwxrwx 2 root root   4096 Jun 18 05:34 2020-06-18-05-34-03_worker
drwxrwxrwx 2 root root   4096 Jun 18 05:34 2020-06-18-05-34-11_controller
lrwxrwxrwx 1 root root     46 Jun 18 05:38 latest -> /var/log/puppet/2020-06-18-05-38-46_controller
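To pinpoint which of these runs actually hit the error, grepping across the run directories is usually the fastest route (the puppet.log file name is assumed from the usual layout of these directories):

# Which run logs contain the mkfs failure?
grep -l 'will not make a filesystem here' /var/log/puppet/*/puppet.log
# List every Error line together with its run directory.
grep -H 'Error:' /var/log/puppet/*/puppet.log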