Failure installing OpenStack on the first controller node: Node ... has gone away, then deployment fails

Bug #1425284 reported by Erik Swanson
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: Low
Assigned to: Matthew Mosesohn
Milestone: 6.1

Bug Description

Getting the same failure in 6.0 and 6.1-92, with both CentOS and Ubuntu, 100% of the time.

1. Create environment (HA) (Same behavior with CentOS and Ubuntu)
2. Choose Neutron with VLAN segmentation, Ceph for everything; no extra services.
3. Settings: Ceph for Nova and Swift.
4. Configure networks: nodes' eth0 is native on the admin (pxe) network, eth1 is everything else with vlan tagging.
5. Assign roles: 3x Controller + Ceph-OSD, 2x Compute + Ceph-OSD.
6. Network test passes with no issues presented.
7. After the first controller node gets to "Installing OpenStack...", the node goes away and never returns.

Connecting to the node's console shows that it is at least still running.

Syslog before the failure (captured in the attached snapshot) indicates that it's trying to DHCP on eth1, which doesn't seem right at all.

I can provide a logs snapshot from 6.1-92 as well.

Erik Swanson (erik-swanson) wrote :
  • Logs (4.9 MiB, application/x-tar)
Erik Swanson (erik-swanson) wrote :

(I have since destroyed the install that the attached logs relate to; there is no concern regarding any config secrets in them.)

Erik Swanson (erik-swanson) wrote :

My instance of this problem has been resolved. The remaining bug is that the network verification reports that everything is okay even under the conditions that cause this failure.

The deployment was on VMware, and the vSwitch used for management/public/storage/etc. was not configured to allow promiscuous mode. Reference for anyone else with this problem: https://vimeo.com/104349381
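For reference, the same setting can also be flipped programmatically. Below is a minimal, untested sketch using pyVmomi (this is not something Fuel does for you); the ESXi host name, credentials, and the switch name "vSwitch1" are placeholder assumptions, and the change is equally doable in the vSphere client under the vSwitch security policy.

# Sketch: allow promiscuous mode on a standard vSwitch via pyVmomi (Python 3).
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ESXI_HOST = "esxi01.example.com"   # placeholder
USERNAME = "root"                  # placeholder
PASSWORD = "secret"                # placeholder
VSWITCH = "vSwitch1"               # switch carrying management/public/storage

ctx = ssl._create_unverified_context()   # lab only: skip certificate checks
si = SmartConnect(host=ESXI_HOST, user=USERNAME, pwd=PASSWORD, sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = view.view[0]                       # the single ESX host in this setup
    net_sys = host.configManager.networkSystem

    # Reuse the existing vSwitch spec and only flip the security policy.
    for vsw in net_sys.networkInfo.vswitch:
        if vsw.name == VSWITCH:
            spec = vsw.spec
            if spec.policy is None:
                spec.policy = vim.host.NetworkPolicy()
            if spec.policy.security is None:
                spec.policy.security = vim.host.NetworkPolicy.SecurityPolicy()
            spec.policy.security.allowPromiscuous = True
            net_sys.UpdateVirtualSwitch(vswitchName=VSWITCH, spec=spec)
            print("allowPromiscuous enabled on", VSWITCH)
            break
finally:
    Disconnect(si)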

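A rough way to see the symptom from inside a node (this is not part of Fuel's network check; the interface name, listening period, and Python 3 on a Linux node with root are all assumptions): put the tagged interface into promiscuous mode and count unicast frames addressed to other MACs. On a vSwitch that rejects promiscuous mode, such frames never arrive even though broadcast and own-MAC traffic still flows, which would be consistent with the verification passing while the deployment later fails.

# Diagnostic sketch: does eth1 ever see unicast frames for other MACs?
import select
import socket
import subprocess
import time

IFACE = "eth1"      # interface carrying the tagged management/public/storage VLANs
DURATION = 30       # seconds to listen

# Put the NIC into promiscuous mode inside the guest.
subprocess.check_call(["ip", "link", "set", "dev", IFACE, "promisc", "on"])

own_mac = open("/sys/class/net/%s/address" % IFACE).read().strip().lower()

ETH_P_ALL = 0x0003
sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
sock.bind((IFACE, 0))

foreign_unicast = 0
deadline = time.time() + DURATION
while time.time() < deadline:
    ready, _, _ = select.select([sock], [], [], 1.0)
    if not ready:
        continue
    frame = sock.recv(65535)
    dst = ":".join("%02x" % b for b in frame[:6])
    # Ignore broadcast/multicast and frames actually meant for this NIC.
    if int(dst[:2], 16) & 1 or dst == own_mac:
        continue
    foreign_unicast += 1

print("unicast frames for other MACs seen on %s: %d" % (IFACE, foreign_unicast))
print("Zero here, while other nodes are talking, suggests the vSwitch is filtering promiscuous traffic.")
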
Matthew Mosesohn (raytrac3r) wrote :

There are some performance concerns with running Ceph OSD on controllers, especially on virtualized environments. The disk I/O bandwidth available to virtual nodes is often too small to support both roles at the same time. Are you able to reproduce these issues by segregating Controller and Ceph OSD roles?

Changed in fuel:
assignee: nobody → Matthew Mosesohn (raytrac3r)
milestone: none → 6.1
importance: Undecided → Low
status: New → Incomplete
Erik Swanson (erik-swanson) wrote :

Performance is rather bad, but the impact is unnoticeable next to the overhead of QEMU. Additionally, I have only one ESX host hosting all the nodes, so splitting the roles across even more VMs probably won't have a positive impact.

(Finally, as I mentioned, the fix for my situation was entirely dependent on promiscuous mode.)

Background: I'm experimenting with Fuel itself, seeing how it sets up nodes and what the implications are for maintenance/upgradeability/etc., as a dress rehearsal for taking over a bunch of actual servers, so performance is not even on my radar for the foreseeable future.

Vladimir Kuklin (vkuklin) wrote :

We are already tracking an initiative for better network verification. I am marking this bug as Invalid and putting promiscuous mode verification on the roadmap.

Changed in fuel:
status: Incomplete → Invalid