Failure installing OpenStack on the first controller node: Node ... has gone away, then deployment fails

Bug #1425284 reported by Erik Swanson
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: Low
Assigned to: Matthew Mosesohn
Milestone: 6.1

Bug Description

Getting the same failure in 6.0 and 6.1-92, with both CentOS and Ubuntu, 100% of the time.

1. Create environment (HA) (Same behavior with CentOS and Ubuntu)
2. Choose Neutron with VLAN segmentation, Ceph for everything; no extra services.
3. Settings: Ceph for Nova and Swift.
4. Configure networks: nodes' eth0 is native on the admin (pxe) network, eth1 is everything else with vlan tagging.
5. Assign roles: 3x Controller + Ceph-OSD, 2x Compute + Ceph-OSD.
6. Network test passes with no issues presented.
7. After the first controller node gets to "Installing OpenStack...", the node goes away and never returns.

Connecting to the node's console shows that it is at least still running.

Syslog before the failure (captured in the attached snapshot) indicates that it's trying to DHCP on eth1, which doesn't seem right at all.

I can provide a logs snapshot from 6.1-92 as well.

Erik Swanson (erik-swanson) wrote :
  • Logs (4.9 MiB, application/x-tar)
Erik Swanson (erik-swanson) wrote :

(I have since destroyed the install that the attached logs relate to; there is no concern regarding any config secrets in them.)

Erik Swanson (erik-swanson) wrote :

My instance of this problem has been resolved. The remaining bug is that the network verification reports that everything is okay even under the conditions that cause this failure.

The deployment was on VMware, and the vSwitch used for management/public/storage/etc. was not configured to allow promiscuous mode. Reference for anyone else with this problem: https://vimeo.com/104349381
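For reference, the same setting can also be flipped programmatically. Below is a minimal, untested sketch using pyVmomi (this is not something Fuel does for you); the ESXi host name, credentials, and the switch name "vSwitch1" are placeholder assumptions, and the change is equally doable in the vSphere client under the vSwitch security policy.

# Sketch: allow promiscuous mode on a standard vSwitch via pyVmomi (Python 3).
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ESXI_HOST = "esxi01.example.com"   # placeholder
USERNAME = "root"                  # placeholder
PASSWORD = "secret"                # placeholder
VSWITCH = "vSwitch1"               # switch carrying management/public/storage

ctx = ssl._create_unverified_context()   # lab only: skip certificate checks
si = SmartConnect(host=ESXI_HOST, user=USERNAME, pwd=PASSWORD, sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = view.view[0]                       # the single ESX host in this setup
    net_sys = host.configManager.networkSystem

    # Reuse the existing vSwitch spec and only flip the security policy.
    for vsw in net_sys.networkInfo.vswitch:
        if vsw.name == VSWITCH:
            spec = vsw.spec
            if spec.policy is None:
                spec.policy = vim.host.NetworkPolicy()
            if spec.policy.security is None:
                spec.policy.security = vim.host.NetworkPolicy.SecurityPolicy()
            spec.policy.security.allowPromiscuous = True
            net_sys.UpdateVirtualSwitch(vswitchName=VSWITCH, spec=spec)
            print("allowPromiscuous enabled on", VSWITCH)
            break
finally:
    Disconnect(si)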

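A rough way to see the symptom from inside a node (this is not part of Fuel's network check; the interface name, listening period, and Python 3 on a Linux node with root are all assumptions): put the tagged interface into promiscuous mode and count unicast frames addressed to other MACs. On a vSwitch that rejects promiscuous mode, such frames never arrive even though broadcast and own-MAC traffic still flows, which would be consistent with the verification passing while the deployment later fails.

# Diagnostic sketch: does eth1 ever see unicast frames for other MACs?
import select
import socket
import subprocess
import time

IFACE = "eth1"      # interface carrying the tagged management/public/storage VLANs
DURATION = 30       # seconds to listen

# Put the NIC into promiscuous mode inside the guest.
subprocess.check_call(["ip", "link", "set", "dev", IFACE, "promisc", "on"])

own_mac = open("/sys/class/net/%s/address" % IFACE).read().strip().lower()

ETH_P_ALL = 0x0003
sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
sock.bind((IFACE, 0))

foreign_unicast = 0
deadline = time.time() + DURATION
while time.time() < deadline:
    ready, _, _ = select.select([sock], [], [], 1.0)
    if not ready:
        continue
    frame = sock.recv(65535)
    dst = ":".join("%02x" % b for b in frame[:6])
    # Ignore broadcast/multicast and frames actually meant for this NIC.
    if int(dst[:2], 16) & 1 or dst == own_mac:
        continue
    foreign_unicast += 1

print("unicast frames for other MACs seen on %s: %d" % (IFACE, foreign_unicast))
print("Zero here, while other nodes are talking, suggests the vSwitch is filtering promiscuous traffic.")
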
Matthew Mosesohn (raytrac3r) wrote :

There are some performance concerns with running Ceph OSD on controllers, especially on virtualized environments. The disk I/O bandwidth available to virtual nodes is often too small to support both roles at the same time. Are you able to reproduce these issues by segregating Controller and Ceph OSD roles?

Changed in fuel:
assignee: nobody → Matthew Mosesohn (raytrac3r)
milestone: none → 6.1
importance: Undecided → Low
status: New → Incomplete
Erik Swanson (erik-swanson) wrote :

Performance is rather bad, but the impact is unnoticeable next to the overhead of QEMU. Additionally, I have only one ESX host hosting all the nodes, so splitting the roles across even more VMs probably won't have a positive impact.

(Finally, as I mentioned, the fix for my situation was entirely dependent on promiscuous mode.)

Background: I'm experimenting with Fuel itself, seeing how it sets up nodes and what the implications are for maintenance/upgradeability/etc., as a dress rehearsal for taking over a bunch of actual servers, so performance is not even on my radar for the foreseeable future.

Vladimir Kuklin (vkuklin) wrote :

We are already tracking an initiative for better network verification. I am marking this bug as Invalid and putting promiscuous mode verification on the roadmap.

Changed in fuel:
status: Incomplete → Invalid