stx.3.0: Drdb fails for AIO Simplex after unlock controller-0

Bug #1859951 reported by hutianhao27
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      hutianhao27

Bug Description

Brief Description
-----------------
After controller-0 is unlocked, DRBD fails and some LVM logical volumes are not mounted.

Severity
--------
Critical

Steps to Reproduce
------------------
Follow the guide to install StarlingX AIO Simplex (https://docs.starlingx.io/deploy_install_guides/r3_release/virtual/aio_simplex_install_kubernetes.html), then unlock controller-0.

Expected Behavior
------------------
DRBD starts successfully and all LVM logical volumes are mounted after the unlock.

Actual Behavior
----------------
DRBD fails and not all LVM logical volumes are mounted after the unlock.

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
AIO Simplex, in both virtual and bare-metal environments

Branch/Pull Time/Commit
-----------------------
http://mirror.starlingx.cengn.ca/mirror/starlingx/release/3.0.0/centos/outputs/iso/bootimage.iso

Timestamp/Logs
--------------
controller-0:~$ drbd-overview
  1:??not-found?? WFConnection Secondary/Unknown UpToDate/DUnknown C r----s
  2:??not-found?? WFConnection Secondary/Unknown UpToDate/DUnknown C r----s
  5:??not-found?? WFConnection Secondary/Unknown UpToDate/DUnknown C r----s
  7:??not-found?? WFConnection Secondary/Unknown UpToDate/DUnknown C r----s
  8:??not-found?? WFConnection Secondary/Unknown UpToDate/DUnknown C r----s

controller-0:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 600G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 500M 0 part /boot
├─sda3 8:3 0 19.5G 0 part /
├─sda4 8:4 0 229G 0 part
│ ├─cgts--vg-scratch--lv 253:0 0 8G 0 lvm /scratch
│ ├─cgts--vg-log--lv 253:1 0 7.8G 0 lvm /var/log
│ ├─cgts--vg-extension--lv 253:2 0 1G 0 lvm
│ │ └─drbd5 147:5 0 1024M 1 disk
│ ├─cgts--vg-pgsql--lv 253:3 0 20G 0 lvm
│ ├─cgts--vg-docker--lv 253:4 0 30G 0 lvm /var/lib/docker
│ ├─cgts--vg-kubelet--lv 253:5 0 10G 0 lvm /var/lib/kubelet
│ ├─cgts--vg-etcd--lv 253:6 0 5G 0 lvm
│ │ └─drbd7 147:7 0 5G 1 disk
│ ├─cgts--vg-backup--lv 253:7 0 25G 0 lvm /opt/backups
│ ├─cgts--vg-dockerdistribution--lv 253:8 0 16G 0 lvm
│ │ └─drbd8 147:8 0 16G 1 disk
│ ├─cgts--vg-rabbit--lv 253:9 0 2G 0 lvm
│ │ └─drbd1 147:1 0 2G 1 disk
│ ├─cgts--vg-platform--lv 253:10 0 10G 0 lvm
│ │ └─drbd2 147:2 0 10G 1 disk
│ └─cgts--vg-ceph--mon--lv 253:11 0 20G 0 lvm /var/lib/ceph/mon
└─sda5 8:5 0 34G 0 part
sdb 8:16 0 200G 0 disk
├─sdb1 8:17 0 199G 0 part /var/lib/ceph/osd/ceph-0
└─sdb2 8:18 0 1G 0 part
sdc 8:32 0 200G 0 disk
sr0 11:0 1 1.9G 0 rom

Ghada Khalil (gkhalil)
summary: - Drdb fails for AIO Simplex after unlock controller-0
+ stx.3.0: Drdb fails for AIO Simplex after unlock controller-0
Ghada Khalil (gkhalil) wrote :

Can you confirm that you set the system_mode: simplex & selected: ‘All-in-one Controller Configuration’ from the install menu?

Changed in starlingx:
status: New → Incomplete
hutianhao27 (hutianhao) wrote :

Yes, I can confirm that I set the system mode to simplex and selected ‘All-in-one Controller Configuration’ from the install menu.

hutianhao27 (hutianhao) wrote :

The following is part of the log (/var/log/puppet/latest/puppet.log):

2020-01-16T09:35:07.689 Debug: 2020-01-16 09:35:07 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/Exec[ceph-osd-prepare-/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/unless: /usr/lib/python2.7/site-packages/ceph_disk/main.py:5707: UserWarning:
2020-01-16T09:35:07.732 Debug: 2020-01-16 09:35:07 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/Exec[ceph-osd-prepare-/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/unless: /usr/lib/python2.7/site-packages/ceph_disk/main.py:5739: UserWarning:
2020-01-16T09:36:15.967 Notice: 2020-01-16 09:36:15 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/Exec[ceph-osd-prepare-/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/returns: /usr/lib/python2.7/site-packages/ceph_disk/main.py:5707: UserWarning:
2020-01-16T09:36:16.507 Notice: 2020-01-16 09:36:16 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/Exec[ceph-osd-prepare-/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/returns: /usr/lib/python2.7/site-packages/ceph_disk/main.py:5739: UserWarning:
2020-01-16T09:36:20.485 Notice: 2020-01-16 09:36:20 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/Exec[ceph-osd-activate-/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/returns: /usr/lib/python2.7/site-packages/ceph_disk/main.py:5707: UserWarning:
2020-01-16T09:36:20.568 Notice: 2020-01-16 09:36:20 +0000 /Stage[main]/Platform::Ceph::Osds/Platform_ceph_osd[stor-1]/Ceph::Osd[/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/Exec[ceph-osd-activate-/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0]/returns: /usr/lib/python2.7/site-packages/ceph_disk/main.py:5739: UserWarning:
2020-01-16T09:36:34.615 Error: 2020-01-16 09:36:34 +0000 yes yes | drbdadm create-md drbd-pgsql -W--peer-max-bio-size=128k returned 40 instead of one of [0]
2020-01-16T09:36:34.843 Error: 2020-01-16 09:36:34 +0000 /Stage[main]/Platform::Drbd::Pgsql/Platform::Drbd::Filesystem[drbd-pgsql]/Drbd::Resource[drbd-pgsql]/Drbd::Resource::Enable[drbd-pgsql]/Drbd::Resource::Up[drbd-pgsql]/Exec[initialize DRBD metadata for drbd-pgsql]/returns: change from notrun to 0 failed: yes yes | drbdadm create-md drbd-pgsql -W--peer-max-bio-size=128k returned 40 instead of one of [0]
2020-01-16T09:36:34.852 Warning: 2020-01-16 09:36:34 +0000 /Stage[main]/Platform::Drbd::Pgsql/Platform::Drbd::Filesystem[drbd-pgsql]/Drbd::Resource[drbd-pgsql]/Drbd::Resource::Enable[drbd-pgsql]/Drbd::Resource::Up[drbd-pgsql]/Exec[enable DRBD resource drbd-pgsql]: Skipping because of failed dependencies
2020-01-16T09:36:35.052 Warning: 2020-01-16 09:36:34 +0000 /Stage[main]/Drbd::Service/Service[drbd]: Skipping because of failed dependencies
2020-01-16T09:36:35.094 Warning: 2020-01-16 09:36:34 +0000 /Stage[main]/Platform::Anchors/Anchor[platform::services]: Skipping because of failed dependencies
2020-01-16T09:36:35.122 Warning: 2020-01-16 09:36:34 +...


hutianhao27 (hutianhao)
Changed in starlingx:
assignee: nobody → hutianhao27 (hutianhao)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/708603
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=3a21623f6b7285ac9e822a8a9b23d5d0fd0f4e38
Submitter: Zuul
Branch: master

commit 3a21623f6b7285ac9e822a8a9b23d5d0fd0f4e38
Author: hutianhao <hu.tianhao@99cloud.net>
Date: Wed Feb 19 17:14:44 2020 +0800

    Select another way to get device information

    If terminal language is Chinese(maybe other language will also have
    this problem), it can't get exact root disk size because the word
    "Disk" is not in English. So drdb config will fail after unlock.
    Select another way to get device information to avoid this problem.

    Closes-Bug: 1859951
    Change-Id: Ie144f7f302b73854c8407681a5dd0981797d4c5f
    Signed-off-by: hutianhao <hu.tianhao@99cloud.net>
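The failure mode described in this commit message can be illustrated with a minimal Python sketch. It is not the change merged in review 708603, which is not reproduced here. The function names, the sample output strings, and the choice of sysfs as the alternative source are all assumptions made for illustration. The first function matches the English word "Disk" in fdisk-style text and therefore finds nothing when the tool output is localized; the second reads the sector count from /sys/block, which does not depend on the terminal language.

```python
import re
from pathlib import Path

# Illustrative samples only (assumed, not captured from the affected system).
ENGLISH_OUTPUT = "Disk /dev/sda: 644.2 GB, 644245094400 bytes, 1258291200 sectors\n"
LOCALIZED_OUTPUT = "磁盘 /dev/sda：644.2 GB, 644245094400 字节，1258291200 个扇区\n"


def size_from_fdisk_text(text, device):
    """Fragile approach: relies on the English words 'Disk' and 'bytes'.

    Returns the size in bytes, or None when the expected line is not found.
    """
    pattern = r"^Disk %s: .*?(\d+) bytes" % re.escape(device)
    match = re.search(pattern, text, re.MULTILINE)
    return int(match.group(1)) if match else None


def size_from_sysfs(name, sysfs_root="/sys/block"):
    """Locale-independent approach: read the size from sysfs.

    The kernel reports <sysfs_root>/<name>/size in 512-byte sectors,
    regardless of the device's logical block size.
    """
    sectors = int(Path(sysfs_root, name, "size").read_text().strip())
    return sectors * 512
```

With the sample strings above, size_from_fdisk_text finds the size in the English text but returns None for the localized text, which is the kind of silent miss that would leave the root disk size unknown. Another common option, also an assumption here rather than the merged fix, is to run the tool with LC_ALL=C so that its output is always in English.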

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716133

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/716133
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=ddcb11f4b773f4b3190663defe3ba0f3ec4201c8
Submitter: Zuul
Branch: f/centos8

commit bf103f3c54eb45c26d52a43c35339d1d863a42de
Author: Mihnea Saracin <email address hidden>
Date: Fri Mar 27 18:19:02 2020 +0200

    Fix B&R when the controller needs to be unlocked

    After running the restore playbook, all the applications
    should be in an uploaded state. But they are in an
    applied state instead, making the controller-0
    unable to unlock.

    Closes-Bug: 1869403
    Change-Id: I8bd9c51e250969cc334d52b78c616f9ad082afd8
    Signed-off-by: Mihnea Saracin <email address hidden>

commit 6e875971afeaf1378c2c8aeb845359459838ce30
Author: Stefan Dinescu <email address hidden>
Date: Sat Mar 21 16:57:57 2020 +0200

    Fix Netapp port conflict

    By default, the Trident Netapp service opens port 8443 for
    HTTPS REST api usage. This conflicts with the port the
    Horizon dashboard uses on an HTTPS enabled setup (the port
    is also 8443).

    In order to fix this, we change the default port from 8443
    to 8678, but also make it configurable through ansible
    overrides.

    The Trident service also opens port 8001 for metrics usage.
    While that doesn't currently conflict with any other service
    on the system, I also made that configurable through
    ansible overrides, in case such a conflict appears in the
    future.

    Change-Id: I08db939acac6082f82b9e12e932d8289c7cecdeb
    Closes-bug: 1868382
    Signed-off-by: Stefan Dinescu <email address hidden>

commit 5a9ba6786e393f2cd93bfae8c3a8f09f0cf9eb26
Author: Robert Church <email address hidden>
Date: Thu Mar 19 19:08:17 2020 -0400

    Upversion Multus to 3.4

    Updates the Multus configuration to align with version 3.4

    Change-Id: Ifc236ccbbe4e559987d7ef522902f638062348ca
    Depends-On: https://review.opendev.org/#/c/714024/
    Story: 2006999
    Task: 39110
    Signed-off-by: Robert Church <email address hidden>

commit 6a261463f9ac0f81d9c7f054dd3cb10a51934d4a
Author: Robert Church <email address hidden>
Date: Wed Mar 18 22:01:03 2020 -0400

    Upversion Calico from 3.6 to 3.12

    Updates the Calico configuration to align with version 3.12. This
    introduces support for a Flex Volume Driver which requires enabling the
    --volume-plugin-dir option for kubelet, the --flex-volume-plugin-dir
    option for kube-controller-manager, and pulling the pod2daemon-flexvol
    image used by calico-node pods.

    Change-Id: I74bc5c53ffcb16c8e3c06cebf20eac296b9ccc65
    Story: 2006999
    Task: 39109
    Depends-On: https://review.opendev.org/#/c/714023
    Signed-off-by: Robert Church <email address hidden>

commit b35387f8bc40714e9633e6191267284b8af8ccee
Author: Stefan Dinescu <email address hidden>
Date: Thu Mar 19 18:13:26 2020 +0200

    Netapp: Fix handling of IPv6 addresses

    Using bash process subtitution to pass the file parameter
    to the "create backend" command doesn't work as the bash
    variable expansion...

tags: added: in-f-centos8
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.4.0 stx.config