AIO Duplex install failure: Failed to determine the boot device

Bug #1843893 reported by Pratik M.
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Don Penney

Bug Description

Brief Description
-----------------
The ansible-playbook command fails with the output below. One precondition to note: during installation I manually changed the anaconda install cmdline to point the boot and rootfs devices to sdb (since sda was my USB drive). Please see
http://lists.starlingx.io/pipermail/starlingx-discuss/2019-September/005992.html

Severity
--------
<Minor: System/Feature is usable with minor issue>

Here is the output:

TASK [persist-config : Add the management floating address] **********************************************************************************
changed: [localhost]

TASK [persist-config : Saving config in sysinv database] *************************************************************************************************************************************
changed: [localhost]

TASK [persist-config : debug] ****************************************************************************************************************************************************************
ok: [localhost] => {
    "populate_result": {
        "changed": true,
        "failed": false,
        "failed_when_result": false,
        "msg": "non-zero return code",
        "rc": 1,
        "stderr": "No handlers could be found for logger \"cgtsclient.common.http\"\nTraceback (most recent call last):\n File \"/tmp/.ansible-sysadmin/tmp/ansible-tmp-1568374168.03-130569190515156/populate_initial_config.py\", line 958, in <module>\n controller = populate_controller_config(client)\n File \"/tmp/.ansible-sysadmin/tmp/ansible-tmp-1568374168.03-130569190515156/populate_initial_config.py\", line 817, in populate_controller_config\n boot_device = get_device_from_function(find_boot_device)\n File \"/tmp/.ansible-sysadmin/tmp/ansible-tmp-1568374168.03-130569190515156/populate_initial_config.py\", line 757, in get_device_from_function\n device_node = get_disk_function()\n File \"/tmp/.ansible-sysadmin/tmp/ansible-tmp-1568374168.03-130569190515156/populate_initial_config.py\", line 737, in find_boot_device\n raise ConfigFail(\"Failed to determine the boot device\")\ncontrollerconfig.common.exceptions.ConfigFail: Failed to determine the boot device\n",
        "stderr_lines": [
            "No handlers could be found for logger \"cgtsclient.common.http\"",
            "Traceback (most recent call last):",
            " File \"/tmp/.ansible-sysadmin/tmp/ansible-tmp-1568374168.03-130569190515156/populate_initial_config.py\", line 958, in <module>",
            " controller = populate_controller_config(client)",
            " File \"/tmp/.ansible-sysadmin/tmp/ansible-tmp-1568374168.03-130569190515156/populate_initial_config.py\", line 817, in populate_controller_config",
            " boot_device = get_device_from_function(find_boot_device)",
            " File \"/tmp/.ansible-sysadmin/tmp/ansible-tmp-1568374168.03-130569190515156/populate_initial_config.py\", line 757, in get_device_from_function",
            " device_node = get_disk_function()",
            " File \"/tmp/.ansible-sysadmin/tmp/ansible-tmp-1568374168.03-130569190515156/populate_initial_config.py\", line 737, in find_boot_device",
            " raise ConfigFail(\"Failed to determine the boot device\")",
            "controllerconfig.common.exceptions.ConfigFail: Failed to determine the boot device"
        ],
        "stdout": "Populating system config...\nSystem type is All-in-one\nSystem config completed.\nPopulating load config...\nLoad config completed.\nPopulating management network...\nPopulating pxeboot network...\nPopulating oam network...\nPopulating multicast network...\nPopulating cluster host network...\nPopulating cluster pod network...\nPopulating cluster service network...\nNetwork config completed.\nPopulating/Updating DNS config...\nDNS config completed.\nManagement mac = 00:00:00:00:00:00\nRoot fs device = /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0\nFailed to update the initial system config.\n",
        "stdout_lines": [
            "Populating system config...",
            "System type is All-in-one",
            "System config completed.",
            "Populating load config...",
            "Load config completed.",
            "Populating management network...",
            "Populating pxeboot network...",
            "Populating oam network...",
            "Populating multicast network...",
            "Populating cluster host network...",
            "Populating cluster pod network...",
            "Populating cluster service network...",
            "Network config completed.",
            "Populating/Updating DNS config...",
            "DNS config completed.",
            "Management mac = 00:00:00:00:00:00",
            "Root fs device = /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0",
            "Failed to update the initial system config."
        ]
    }
}

TASK [persist-config : Fail if populate config script throws an exception] *******************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Failed to provision initial system configuration."}

PLAY RECAP ***********************************************************************************************************************************************************************************
localhost : ok=97 changed=26 unreachable=0 failed=1

-----

And here is the relevant log:
controller-0:~$ sudo cat /var/log/anaconda/storage.log | grep -i biosboot
08:31:06,860 DEBUG blivet: getFormat('biosboot') returning BIOSBoot instance with object id 84
08:31:06,879 DEBUG blivet: PartitionDevice._setFormat: req0 ; current: None ; type: biosboot ;
08:31:06,881 DEBUG blivet: PartitionDevice._setFormat: req0 ; current: biosboot ; type: biosboot ;
08:31:06,881 INFO blivet: registered action: [88] create format biosboot on partition req0 (id 85)
08:31:07,072 DEBUG blivet: fixing size of non-existent 1024 KiB partition sdb1 (85) with non-existent biosboot
08:31:08,614 DEBUG blivet: action: [88] create format biosboot on partition sdb1 (id 85)
08:31:08,640 DEBUG blivet: action: [88] create format biosboot on partition sdb1 (id 85)
08:31:15,576 INFO blivet: executing action: [88] create format biosboot on partition sdb1 (id 85)
08:31:15,698 DEBUG blivet: BIOSBoot.create: device: /dev/sdb1 ; status: False ; type: biosboot ;
controller-0:~$

Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 gating. It appears that the system cannot be bootstrapped if the boot device is changed from /dev/sda to another device.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.3.0 stx.config
Changed in starlingx:
assignee: nobody → Don Penney (dpenney)
Don Penney (dpenney) wrote :

We can take a look at improving how the boot device info is determined. In the meantime, I'd suggest using the /dev/disk/by-path link for the boot args rather than sdb. That would be a persistent name, whereas the sda/sdb assignment appears to be changing.
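
To illustrate the workaround, here is a minimal sketch of how the persistent /dev/disk/by-path link for a device node such as /dev/sdb could be looked up. The function name `persistent_path` and its `by_path_dir` parameter are hypothetical, introduced only for this example; they are not part of StarlingX.

```python
import os


def persistent_path(device_node, by_path_dir="/dev/disk/by-path"):
    """Return the by-path symlink that resolves to device_node, or None.

    Hypothetical helper for illustration only.
    """
    target = os.path.realpath(device_node)
    # Each entry in by_path_dir is a symlink to a kernel device node.
    for name in sorted(os.listdir(by_path_dir)):
        link = os.path.join(by_path_dir, name)
        if os.path.realpath(link) == target:
            return link
    return None


if __name__ == "__main__":
    # Prints the persistent name for /dev/sdb if one exists on this host.
    if os.path.isdir("/dev/disk/by-path"):
        print(persistent_path("/dev/sdb"))
```

The link returned for the install disk is the kind of name that could be used in the boot args in place of sdb.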

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/682687

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/682687
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=8d49c16bc82b2387b851bcef17d5c59226334452
Submitter: Zuul
Branch: master

commit 8d49c16bc82b2387b851bcef17d5c59226334452
Author: Don Penney <email address hidden>
Date: Tue Sep 17 12:18:58 2019 -0400

    Update boot device query for initial install

    Currently, the initial boot device for the first controller is
    determined by checking the anaconda install log. This check is
    problematic as the log may use a device name like /dev/sda that could
    potentially change. For example, a system that has booted with the USB
    enumerated as sda would need to install to sdb, but once the USB is
    removed, this disk is now sda, causing a mismatch with the logs.

    This update uses the /boot mountpoint instead, determining the disk on
    which it is mounted and using its persistent name.

    Change-Id: Ifbefc005ae1f422c1794f807460d02a02ce058db
    Closes-Bug: 1843893
    Signed-off-by: Don Penney <email address hidden>
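
The approach described in the commit message (start from the /boot mountpoint, then find the disk it lives on) can be sketched roughly as follows. This is not the merged patch, only an illustration under stated assumptions: the helper names `mount_source` and `parent_disk` are hypothetical, the mount table is assumed to be in /proc/mounts format, and the final step of mapping the disk to its persistent name is left out.

```python
import re


def mount_source(mounts_text, mountpoint="/boot"):
    """Return the device mounted at mountpoint, given /proc/mounts-style text.

    Hypothetical helper for illustration only.
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mountpoint:
            source = fields[0]  # a later entry shadows an earlier one
    return source


def parent_disk(partition_node):
    """Strip the partition suffix from a device node (hypothetical helper).

    /dev/sdb2 becomes /dev/sdb; /dev/nvme0n1p1 becomes /dev/nvme0n1.
    """
    # NVMe and MMC devices separate the partition number with a "p".
    m = re.match(r"^(/dev/(?:nvme\d+n\d+|mmcblk\d+))p\d+$", partition_node)
    if m:
        return m.group(1)
    # Classic sdX/vdX naming: letters followed by the partition number.
    m = re.match(r"^(/dev/[a-z]+)\d+$", partition_node)
    if m:
        return m.group(1)
    return partition_node


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        boot_partition = mount_source(f.read())
    if boot_partition is not None:
        print(parent_disk(boot_partition))
```

Because the lookup starts from what is actually mounted at /boot on the running system, it does not depend on the sda/sdb name recorded in the anaconda log at install time.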

Changed in starlingx:
status: In Progress → Fix Released