[CentOS7.6] deploy controller-1 fail with error log "TFTP error using baseboard NIC"

Bug #1814360 reported by chen haochuan
Affects    | Status       | Importance | Assigned to   | Milestone
StarlingX  | Fix Released | High       | chen haochuan |

Bug Description

Brief Description
-----------------
Failed to deploy controller-1 with error log "TFTP error from server"

Severity
--------
Critical

Steps to Reproduce
------------------
1. Deploy controller-0 with an image built on CentOS 7.6.
2. On the controller-1 server, enter the BIOS setup and select boot from NIC.
3. On controller-0, run host-add to add controller-1.

Expected Behavior
------------------
Controller-1 deploys successfully

Actual Behavior
----------------
Error log on screen "TFTP error from server"

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Controller-1 OAM with NIC I210 Gigabit

Branch/Pull Time/Commit
-----------------------
f/centos76 as of 2019/01/30

Changed in starlingx:
assignee: nobody → chen haochuan (martin1982)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-integ (f/centos76)

Fix proposed to branch: f/centos76
Review: https://review.openstack.org/634559

Don Penney (dpenney) wrote :

I'm curious as to how this occurred. Looking at the fix, this was broken by this update:
https://review.openstack.org/#/c/627454/1

I requested in this review that additional testing be done to verify the installation of nodes from the controller, and was told it had been done. Does this failure only occur with legacy BIOS boot? Is it fine for EFI?

chen haochuan (martin1982) wrote :

Failure for EFI, and fine for legacy BIOS.
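
When triaging a report like this, it helps to record which firmware mode a node that is already up actually booted in. A minimal sketch, assuming a Linux host where the kernel exposes `/sys/firmware/efi` only when booted through UEFI; the function name `boot_mode` and the `sysfs_root` parameter are illustrative and not part of StarlingX:

```python
import os


def boot_mode(sysfs_root="/sys"):
    """Return "EFI" if the kernel exposes EFI firmware data, else "legacy BIOS".

    Assumption: on Linux, the directory <sysfs_root>/firmware/efi exists only
    when the system was booted through UEFI firmware.
    """
    efi_dir = os.path.join(sysfs_root, "firmware", "efi")
    return "EFI" if os.path.isdir(efi_dir) else "legacy BIOS"


if __name__ == "__main__":
    print("Booted in %s mode" % boot_mode())
```

The `sysfs_root` parameter exists only so the check can be exercised against a scratch directory; on a real node the default is what one would use.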

Don Penney (dpenney) wrote :

Ok, so the response I received on the review (https://review.openstack.org/#/c/627454/1), indicating that the update was tested by installing a multi-node system with EFI, was inaccurate. This testing was not performed and EFI booting was broken as a result.

Ghada Khalil (gkhalil) wrote :

Marking as release gating -- issue was introduced by the rebase of the grub2 package

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
tags: added: stx.2019.05 stx.distro.other
Cindy Xie (xxie1) wrote :

Don, sorry for the bug. We will definitely improve the testing. The fix for this bug was tested on:
- multi-node bare-metal with 2 controllers + 1 compute node;
- Duplex bare-metal

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-integ (f/centos76)

Reviewed: https://review.openstack.org/634559
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=54fa029a436afbbd0ed1d2d08fdf552702c2ff6b
Submitter: Zuul
Branch: f/centos76

commit 54fa029a436afbbd0ed1d2d08fdf552702c2ff6b
Author: Martin, Chen <email address hidden>
Date: Sun Feb 3 07:17:10 2019 +0800

    Fix pxe boot fail, for incorrect folder access /pxe/EFI/ on controller-0

    Closes-Bug: 1814360

    Test Case:
    Deploy 2 controller and 1 compute on bare metal

    Change-Id: I4ec59180a28ac743935601332cb8f210e87e4a85
    Signed-off-by: Martin, Chen <email address hidden>
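
The commit message above attributes the failure to incorrect access to the `/pxe/EFI/` folder on controller-0. A minimal sketch of a pre-check one could run on the controller before adding hosts, assuming the TFTP server serves a directory tree containing an `EFI` subfolder and needs it to be readable and traversable by unprivileged users. The helper name `check_efi_dir`, the root path passed in, and the permission requirement are assumptions for illustration and are not taken from the fix itself:

```python
import os
import stat


def check_efi_dir(tftp_root, subdir="EFI"):
    """Return a list of problems found with <tftp_root>/<subdir>.

    An empty list means the directory exists and is readable and
    traversable by "other" users (hypothetical requirement: the TFTP
    daemon runs as an unprivileged user).
    """
    path = os.path.join(tftp_root, subdir)
    if not os.path.isdir(path):
        return ["%s is missing or not a directory" % path]

    problems = []
    mode = os.stat(path).st_mode
    if not mode & stat.S_IROTH:
        problems.append("%s is not world-readable" % path)
    if not mode & stat.S_IXOTH:
        problems.append("%s is not world-traversable" % path)
    return problems
```

For example, `check_efi_dir("/pxe")` would inspect the folder named in the commit message, if that is where the TFTP root lives on the system in question.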

tags: added: in-f-centos76
Ghada Khalil (gkhalil) wrote :

Marking as Fix Committed given that the fix was merged in the centos76 feature branch. This should be marked as Fix Released once the feature branch is merged to master.

Changed in starlingx:
status: Triaged → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-integ (master)

Fix proposed to branch: master
Review: https://review.openstack.org/642481

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-integ (master)

Reviewed: https://review.openstack.org/642481
Committed: https://git.openstack.org/cgit/openstack/stx-integ/commit/?id=79462d149ab7e046b785f54263043f500d674ee5
Submitter: Zuul
Branch: master

commit 54fa029a436afbbd0ed1d2d08fdf552702c2ff6b
Author: Martin, Chen <email address hidden>
Date: Sun Feb 3 07:17:10 2019 +0800

    Fix pxe boot fail, for incorrect folder access /pxe/EFI/ on controller-0

    Closes-Bug: 1814360

    Test Case:
    Deploy 2 controller and 1 compute on bare metal

    Change-Id: I4ec59180a28ac743935601332cb8f210e87e4a85
    Signed-off-by: Martin, Chen <email address hidden>

commit 2e2da431f7bc8c53fdf43e7c11eb28417be44895
Author: Shuicheng Lin <email address hidden>
Date: Wed Jan 30 23:54:55 2019 +0800

    Fix compile error for integrity driver

    integrity tarball in my local mirror is wrong, cause the patch is
    not correct. Correct the patch with the right tarball.

    Story: 2004521
    Task: 29194

    Change-Id: Iee0e7afa12b8583d1bb3d620a5f7626a28f57fed
    Signed-off-by: Shuicheng Lin <email address hidden>

commit 84c46def935bb3703a7b2453c35975897108aa54
Author: Shuicheng Lin <email address hidden>
Date: Thu Dec 27 22:33:11 2018 +0800

    fix tpm driver build failure with 3.10.0-957.1.3 kernel

    Porting upstream patch to fix the build failure with CentOS 7.6 kernel
    If we choose to upgrade tpm driver to include this patch, there will
    be other build failure due to some structure missing in 957 kernel.
    So I decide to back port upstream patch instead of upgrade tpm driver.

    Depends-On: https://review.openstack.org/625785
    Depends-On: https://review.openstack.org/625786

    Story: 2004521
    Task: 28534

    Change-Id: I00d88f4d27ac47107825a17b3bf6d8c74194a7ff
    Signed-off-by: Shuicheng Lin <email address hidden>

commit 020fe434c1e2670fed0b464e4bff0a95d922facc
Author: Shuicheng Lin <email address hidden>
Date: Thu Jan 3 23:38:43 2019 +0800

    upgrade mellanox driver to 4.5-1.0.1.0 which supports CentOS 7.6

    Depends-On: https://review.openstack.org/628099
    Depends-On: https://review.openstack.org/625785
    Depends-On: https://review.openstack.org/625786
    Story: 2004521
    Task: 28537

    Change-Id: Icdcf90ee08b202bc4ba44edb58e2100c7d1b8cc5
    Signed-off-by: Shuicheng Lin <email address hidden>

commit 93d224c64cd9d8334b61fdd3b906c6614e0f2427
Author: Shuicheng Lin <email address hidden>
Date: Thu Dec 27 22:52:17 2018 +0800

    fix integrity driver build issue with CentOS 7.6 3.10.0-957.1.3 kernel

    Porting upstream patch to fix the build failure with the new kernel

    Depends-On: https://review.openstack.org/625785
    Depends-On: https://review.openstack.org/625786

    Story: 2004521
    Task: 28584

    Change-Id: I261d2d9534d90064d250ffabc11221caadcc2a04
    Signed-off-by: Shuicheng Lin <email address hidden>

commit 474aedee05842da7c9469cd9db4c6d30a335107a
Author: Shuicheng Lin <email address hidden>
Date: Fri Jan 4 00:06:58 2019 +0800

    upgrade mellanox libibverbs to 4.5-1.0.1.0 which supports CentOS 7.6

    Depends-On: https://review....

Changed in starlingx:
status: Fix Committed → Fix Released
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05