node failed to reboot

Bug #1467671 reported by Big Switch Networks
This bug affects 1 person
Affects               Status          Importance   Assigned to         Milestone
Fuel for OpenStack    Invalid         High         MOS Linux
  6.1.x               Fix Released    Critical     Vitaly Sedelnik
  7.0.x               Fix Released    High         Denis Meltsaykin

Bug Description

We are trying to deploy Fuel 6.1 (build 521) on a single-node cluster.

1.) Bootstrap completed successfully.
2.) While Ubuntu was being installed, the node failed to reboot and hung at the following messages:
kvm: exiting hardware virtualization
sd 0:0:0:0: [sda] Synchronizing SCSI cache

A few online forums suggest updating the BIOS, but ours is already the latest from Dell: we are using Dell R220 hardware with BIOS revision 1.4.0. A few posts also mention setting ACPI to off; we are not sure whether that helps. I have copied /var/log/* from the node and attached it here. Please let us know if you need more information.

Workaround we tried:

1.) Manually power-cycling the node (off, then on) worked.

Big Switch Networks (fuel-bugs-internal) wrote :
Changed in fuel:
assignee: nobody → Vladimir Kozhukalov (kozhukalov)
milestone: none → 6.1
Changed in fuel:
assignee: Vladimir Kozhukalov (kozhukalov) → MOS Linux (mos-linux)
status: New → Confirmed
Aleksander Mogylchenko (amogylchenko) wrote :

Could you please clarify how often this happens?

Changed in fuel:
milestone: 6.1 → 6.1-updates
importance: Undecided → Medium
Big Switch Networks (fuel-bugs-internal) wrote :

It happens consistently on Dell R220 servers.

Albert Syriy (asyriy)
Changed in fuel:
assignee: MOS Linux (mos-linux) → asyriy (asyriy)
Big Switch Networks (fuel-bugs-internal) wrote :

Thanks. We have around 500 R220 servers to bring online with Fuel 6.1, so we would appreciate a quick turnaround or a patch.

Changed in fuel:
importance: Medium → Critical
Albert Syriy (asyriy) wrote :

Could we "borrow" the server to investigate the bug?

Albert Syriy (asyriy) wrote :

Need access to the server to investigate the issue.

Changed in fuel:
status: Confirmed → Incomplete
Pavel Boldin (pboldin) wrote :

The problem was that the bnx2x device driver did not remove its devices during shutdown, leaving some resources initialized (e.g. kernel timers and PCI resources).

Fixed that by backporting the following upstream changesets:

* Sun Jun 28 2015 Pavel Boldin <email address hidden> - 3.10.55-mira4
- Fix the Broadcom NetXtreme II reboot kernel hang by backporting the
  listed commits from the upstream:
    * b030ed2fdc8a396dba71e4d550236a0f1bb38b40
      bnx2x: Implement PCI shutdown
    * d9aee591b0f06bd44cd577b757d3f267bc35fe4d
      bnx2x: Don't release PCI bars on shutdown

Please see https://review.fuel-infra.org/8839/ for the change.
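To illustrate the failure mode described above, here is a minimal userspace C model. It is not kernel code and not the actual bnx2x patch. It assumes a hypothetical `struct model_device` whose `timer_armed` flag stands in for the kernel timers and PCI resources a driver leaves active, and a `shutdown` callback that plays roughly the role of a PCI driver's `.shutdown` hook, which the kernel calls for each device before rebooting. A device whose driver provides no handler stays active, which is the situation the backported "bnx2x: Implement PCI shutdown" commit addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a device. "timer_armed" stands in for the kernel
 * timers and PCI resources that a driver may leave active. */
struct model_device {
    const char *name;
    bool timer_armed;
    void (*shutdown)(struct model_device *dev); /* NULL = no shutdown handler */
};

/* Hypothetical shutdown handler: quiesce the device so nothing it owns
 * is still running when the machine restarts. */
static void model_nic_shutdown(struct model_device *dev)
{
    dev->timer_armed = false;
}

/* Simulated reboot: call each device's shutdown hook (if it has one),
 * then count the devices still left with live resources. */
static int model_reboot(struct model_device *devs, int n)
{
    int still_active = 0;
    for (int i = 0; i < n; i++) {
        if (devs[i].shutdown != NULL)
            devs[i].shutdown(&devs[i]);
        if (devs[i].timer_armed) {
            printf("%s: left active, reboot may hang\n", devs[i].name);
            still_active++;
        }
    }
    return still_active;
}

int main(void)
{
    struct model_device before[] = { {"nic-without-handler", true, NULL} };
    struct model_device after[] = { {"nic-with-handler", true, model_nic_shutdown} };

    int a = model_reboot(before, 1);
    int b = model_reboot(after, 1);
    printf("without handler: %d active, with handler: %d active\n", a, b);
    return 0;
}
```

Running it prints one warning for the device without a handler, followed by `without handler: 1 active, with handler: 0 active`.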

Albert Syriy (asyriy)
Changed in fuel:
assignee: asyriy (asyriy) → Pavel Boldin (pboldin)
Pavel Boldin (pboldin) wrote :

Current status is:

1. Custom ISO built.
2. Custom ISO passes both CentOS and Ubuntu BVT.

Changed in fuel:
status: Incomplete → In Progress
tags: added: mos-linux
tags: added: 6.1-mu1
tags: added: 6.1-mu-1
removed: 6.1-mu1
Changed in fuel:
milestone: 6.1-updates → 6.1-mu-1
Changed in fuel:
milestone: 6.1-mu-1 → 6.1-updates
Michael Semenov (msemenov) wrote :

Change request for CentOS - https://review.fuel-infra.org/#/c/9734/

tags: added: 6.1-mu-2
removed: 6.1-mu-1
Changed in fuel:
milestone: 6.1-updates → 6.1-mu-2
Michael Semenov (msemenov) wrote :

The fix is prepared, so I am assigning this to the maintenance team.

Changed in fuel:
assignee: Pavel Boldin (pboldin) → Vitaly Sedelnik (vsedelnik)
Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix merged to packages/centos6/openvswitch-kmod (6.1)

Reviewed: https://review.fuel-infra.org/9734
Submitter: Vitaly Sedelnik <email address hidden>
Branch: 6.1

Commit: bcff343db585abf90256dff41aecdeeed1aa4246
Author: Pavel Boldin <email address hidden>
Date: Wed Jul 22 13:46:32 2015

Bump version to match kernel-lt

Bump openvswitch-kmod version so it matches the kernel-lt version after
https://review.fuel-infra.org/#q,I8648f704a942883479c66ba62068870a70135ccd,n,z

Change-Id: I8fcfa1b56d845e73e5f0e0760f47dc72bcba6c73
Related-Bug: #1467671

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/centos6/kernel-lt (6.1)

Reviewed: https://review.fuel-infra.org/8839
Submitter: Vitaly Sedelnik <email address hidden>
Branch: 6.1

Commit: 22cde41df6e5a831d910b1654cbd4d03548e6acf
Author: Pavel Boldin <email address hidden>
Date: Mon Jun 29 11:29:56 2015

Fix Broadcast NetXtreme II reboot hung

Broadcast NetXtreme II driver was lacking the shutdown handler required
to disable the devices. This led to a hang during the reboot because the
enabled driver code was attempting to use the freed resources.

Change-Id: I8648f704a942883479c66ba62068870a70135ccd
Closes-Bug: #1467671

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix proposed to packages/centos6/openvswitch-kmod (6.1)

Related fix proposed to branch: 6.1
Change author: Vitaly Sedelnik <email address hidden>
Review: https://review.fuel-infra.org/10208

Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix merged to packages/centos6/openvswitch-kmod (6.1)

Reviewed: https://review.fuel-infra.org/10208
Submitter: Vitaly Sedelnik <email address hidden>
Branch: 6.1

Commit: e52257594df40fc567131e145464421168fc9ed3
Author: Vitaly Sedelnik <email address hidden>
Date: Fri Aug 7 10:32:58 2015

Empty commit to rebuild packages for change I8648f704a942883479c66ba62068870a70135ccd

Related-Bug: #1467671
Closes-Bug: #1482562

Change-Id: Ie2cfffecd6bc15c630f5bd0dca8a85fa11a9a7e0

Changed in fuel:
status: In Progress → Fix Committed
Vadim Rovachev (vrovachev) wrote :

Verified on 6.1.

Changed in fuel:
status: Fix Committed → Fix Released
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/centos6/kernel-lt (7.0)

Fix proposed to branch: 7.0
Change author: Pavel Boldin <email address hidden>
Review: https://review.fuel-infra.org/14372

Dmitry Pyzhov (dpyzhov)
tags: added: area-mos
no longer affects: fuel/8.0.x
tags: added: team-linux
tags: removed: team-linux
tags: added: area-linux
removed: area-mos
Ivan Suzdal (isuzdal) wrote :

Newer kernels (Ubuntu 3.13.0, CentOS 3.10.0) already include this patch, so this bug should probably be moved to the "Won't Fix" state.

Changed in fuel:
status: Confirmed → Invalid
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/centos6/kernel-lt (7.0)

Reviewed: https://review.fuel-infra.org/14372
Submitter: Denis V. Meltsaykin <email address hidden>
Branch: 7.0

Commit: 1fcb648d4f49c71d7bb134898f3019dcd259f6f0
Author: Pavel Boldin <email address hidden>
Date: Wed Dec 2 13:27:03 2015

Fix Broadcast NetXtreme II reboot hung

Broadcast NetXtreme II driver was lacking the shutdown handler required
to disable the devices. This led to a hang during the reboot because the
enabled driver code was attempting to use the freed resources.

Kernel version has been bumped to mos6.

Change-Id: I8648f704a942883479c66ba62068870a70135ccd
Closes-Bug: #1467671
(cherry picked from commit 22cde41df6e5a831d910b1654cbd4d03548e6acf)

Dmitry (dtsapikov)
tags: added: on-verification
Dmitry (dtsapikov) wrote :

Verified on 7.0 release

tags: removed: on-verification
tags: added: 7.0-mu-2