host data is not inventoried after nodes go online

Bug #1795400 reported by Yang Liu
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Don Penney
Milestone: (none)

Bug Description

Brief Description
-----------------
Host disks are not listed in 'system host-disk-list' after host goes online. This happens for all hosts except controller-0.

Severity
--------
Critical

Steps to Reproduce
------------------
- Install and configure controller-0
- Install other hosts
- After other hosts (e.g., controller-1, computes, storage nodes) go online, check host disks via 'system host-device-list <host>'

Expected Behavior
------------------
- devices are listed

Actual Behavior
----------------
No devices are listed, even though the disks can be found under /dev/disk/ on the host.

[wrsroot@controller-0 ~(keystone_admin)]$ system host-device-list controller-1

[wrsroot@controller-0 ~(keystone_admin)]$ ssh controller-1
controller-1:~$ ls -la /dev/disk/by-path/
total 0
drwxr-xr-x 2 root root 160 Oct 1 05:28 .
drwxr-xr-x 7 root root 140 Oct 1 05:28 ..
lrwxrwxrwx 1 root root 9 Oct 1 05:28 pci-0000:00:1f.2-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root root 10 Oct 1 05:28 pci-0000:00:1f.2-ata-1.0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Oct 1 05:28 pci-0000:00:1f.2-ata-1.0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Oct 1 05:28 pci-0000:00:1f.2-ata-1.0-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Oct 1 05:28 pci-0000:00:1f.2-ata-1.0-part4 -> ../../sda4
lrwxrwxrwx 1 root root 9 Oct 1 05:28 pci-0000:00:1f.2-ata-2.0 -> ../../sdb

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Two node system, Multi-node system

Branch/Pull Time/Commit
-----------------------
master as of 2018-09-28_12-39-00

Timestamp/Logs
--------------
Run lab setup after computes and controller-1 are online:
(2018-10-01 05:29:30) [INFO] [MainThread] Running cmd: /home/wrsroot//lab_setup.sh

Sourcing /home/wrsroot/lab_setup.conf from the config chain
Checking for required files
Skipping license configuration; already done
Setting system name
Skipping vswitch type configuration; already done
HTTPS security confguration change is not required.
Skipping Neutron service parameter configuration; already done
Skipping DNS configuration; already done
Skipping NTP configuration; already done
Setting partitions for host controller-0
Setting partitions for host controller-1
ERROR: could not find the disk (/dev/disk/by-path/pci-0000:00:1f.2-ata-2.0) for controller-1
ERROR: Couldn't create all the partitions for host controller-1
Failed to create partitions, ret=4

Ghada Khalil (gkhalil) wrote :

Required for stx.2018.10, as this is a serious issue that is blocking system configuration.

Changed in starlingx:
importance: Undecided → Critical
status: New → Triaged
assignee: nobody → Don Penney (dpenney)
tags: added: stx.2018.10 stx.config
Ghada Khalil (gkhalil)
summary: - STX: host disks are not inventoried after go online
+ STX: host disks are not inventoried after nodes go online
Don Penney (dpenney) wrote : Re: STX: host disks are not inventoried after nodes go online

sysinv is writing the platform software_version to hieradata as unicode:
platform::params::software_version: !!python/unicode '18.10'

Puppet is automatically converting this to a float, treating it as '18.1'. As a result, some components are using 18.1 while others are using 18.10, and the system fails to come up properly.
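
The effect of that conversion can be sketched in plain Python. This is an illustration only, not Puppet or sysinv code: the helper name `coerce_like_float` and the `/example/config/` directory are hypothetical, and the round trip through `float` merely stands in for the numeric coercion described above.

```python
# Illustration only: mimics the numeric coercion described above in plain Python.
# It does not run Puppet or sysinv; the helper and the paths are hypothetical.


def coerce_like_float(value):
    """Return the value as it looks after a round trip through float."""
    return str(float(value))


written = "18.10"                     # version string as written to hieradata
coerced = coerce_like_float(written)  # what a float-converting reader ends up with

# Hypothetical version-specific directory, to show how the two values diverge.
expected_dir = "/example/config/" + written
actual_dir = "/example/config/" + coerced

print("written:", written)
print("coerced:", coerced)
print("same directory:", expected_dir == actual_dir)
```

Any component that builds a path or compares versions from the coerced value ends up disagreeing with one that uses the original string, which matches the mixed 18.1 / 18.10 behaviour noted above.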

Ghada Khalil (gkhalil) wrote : Re: STX: host data is not inventoried after nodes go online

This appears to be an issue with all host data, not only disks.
See https://bugs.launchpad.net/starlingx/+bug/1795466 which reports a similar issue with the network interfaces.

summary: - STX: host disks are not inventoried after nodes go online
+ STX: host data is not inventoried after nodes go online
Ghada Khalil (gkhalil)
summary: - STX: host data is not inventoried after nodes go online
+ host data is not inventoried after nodes go online
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/607025

Changed in starlingx:
status: Triaged → In Progress
Erich Cordoba (ericho) wrote :

As this issue is also present in (and was reported from) the r/2018.10 branch, a cherry-pick will be needed.

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/607025
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=d2bad43356da8369f5bfc55389b9403f80f06c39
Submitter: Zuul
Branch: master

commit d2bad43356da8369f5bfc55389b9403f80f06c39
Author: Don Penney <email address hidden>
Date: Mon Oct 1 14:55:07 2018 -0400

    Write software_version in hieradata as a string

    When puppet reads a unicode value from hieradata that it believes
    is a floating number, it automatically converts it to a float. In
    the case of the software_version, that means a version like '18.10'
    is converted to 18.1 by puppet. This results in some components
    and directories using the wrong value for software_version, and
    certain services fail.

    In particular, this resulted in sysinv being unable to populate
    the host inventory data when new nodes were installed in a running
    system. Queries like host-data-list or host-if-list would return
    empty data.

    This update to sysinv writes the software_version to hieradata as
    a string rather than unicode, ensuring puppet treats it properly.

    Closes-Bug: 1795400
    Change-Id: Ic3ab3aea2f7fc6662f0b523070afb4b3ef7ee282
    Signed-off-by: Don Penney <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (r/2018.10)

Fix proposed to branch: r/2018.10
Review: https://review.openstack.org/607054

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (r/2018.10)

Reviewed: https://review.openstack.org/607054
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=41b0a21603ea33dd6e5215717219c227580cc067
Submitter: Zuul
Branch: r/2018.10

commit 41b0a21603ea33dd6e5215717219c227580cc067
Author: Don Penney <email address hidden>
Date: Mon Oct 1 14:55:07 2018 -0400

    Write software_version in hieradata as a string

    When puppet reads a unicode value from hieradata that it believes
    is a floating number, it automatically converts it to a float. In
    the case of the software_version, that means a version like '18.10'
    is converted to 18.1 by puppet. This results in some components
    and directories using the wrong value for software_version, and
    certain services fail.

    In particular, this resulted in sysinv being unable to populate
    the host inventory data when new nodes were installed in a running
    system. Queries like host-data-list or host-if-list would return
    empty data.

    This update to sysinv writes the software_version to hieradata as
    a string rather than unicode, ensuring puppet treats it properly.

    Closes-Bug: 1795400
    Change-Id: Ic3ab3aea2f7fc6662f0b523070afb4b3ef7ee282
    Signed-off-by: Don Penney <email address hidden>

Ken Young (kenyis)
tags: added: stx.1.0
removed: stx.2018.10