Servers Fail to PXE Boot after Moving to Fuel 8.0

Bug #1569028 reported by Rob Neff
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: MOS Linux

Bug Description

We have a deployment with eight 1U 2-socket OpenStack servers and six 2U 2-socket Ceph servers. With Fuel 7.0, we can successfully deploy a cluster with OpenStack + Ceph.

With Fuel 8.0, we can successfully deploy a cluster using only the 1U Compute nodes (1 SSD each).

When we deploy Fuel 8.0 with both the 1U and 2U servers, we see the following error message. At this point the machines seem to be permanently unable to PXE boot, even after CentOS is installed on them via USB.

We attempted various debugging methods on the client:
- Zero out the MBR on the SSD
- Swap the SSD with a new SSD
- Swap the SSD with a working SSD from another server
- Switch from the 10G NIC to the 1G NIC for PXE
- Switch from Fuel 8 to Fuel 7
- Pull the CMOS battery to reset
None of the above worked. The issue will be pushed to Advanced Integration for further debugging.

We installed CentOS on the server from a USB CD/DVD drive.
The installation completed successfully.
After installation we tried to PXE boot again, and the issue remains.

Rob Neff (rob-neff) wrote :
Bartosz Kupidura (zynzel) wrote :

Please provide the exact server specs (server model, NIC model, ...).

Changed in fuel:
status: New → Incomplete
importance: Undecided → High
assignee: nobody → MOS Linux (mos-linux)
milestone: none → 8.0-updates
Rob Neff (rob-neff) wrote :

Note that the Fuel Diagnostic Snapshot is attached to the duplicate bug.

Also, we cannot reproduce this issue when deploying to the 1U Compute nodes only; it happens only when we deploy to the 2U Storage nodes.
___________

SERVER SPECS

These are 2-socket Flex Ciii whitebox servers (not Dell/HP/SuperMicro/etc.), and both server types use the same motherboard.

1U Compute Server
-----------------------------
2x E5-2630v3
192GB DDR4 Micron DIMM
1 Intel S3500 SSD 512GB

2U Ceph Server
-----------------------------
2x E5-2630v3
256GB DDR4 Micron DIMM
LSI 9361-8i HBA card
Intel HBA Expander
2 SATA Transcend 512GB SSDs (OS)
24 drive SAS backplane
- 20 Seagate 10k.6 1.2TB drives (OSDs)
- 4 Intel S3710 400GB SSDs (Journals)

Alexander Gordeev (a-gordeev) wrote :

Hi Rob,

I think it's related to a newly discovered regression introduced in 8.0: https://bugs.launchpad.net/fuel/+bug/1567930

If Fuel 7.0 is able to deploy the same hardware and nothing has changed, then 8.0 should be able to deploy it too.

However, I'm not 100% sure about that.

The provided diagnostic snapshot didn't contain any fuel-agent logs.

It seems that all nodes were removed from the environment, so their remote log files were removed too.

Could you take the snapshot right after the deployment fails, without taking any additional actions?

Roman Podoliaka (rpodolyaka) wrote :

Marking as Invalid after a month without feedback. Please feel free to provide more info and reopen the bug.

Changed in fuel:
status: Incomplete → Invalid