Use send_arp_for_ha=True in docs, explain why

Bug #1093000 reported by Anne Gentle
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
openstack-manuals  Fix Released  High        Tom Fifield

Bug Description

Reported to Anne via email from a RedHatter, Dan.

I just feel like sharing, today.

I've run across this problem several times over the past 5 months whilst
setting up our 5-compute-node Essex cluster with flatDHCP networking in
multi-host mode, and I ran into it again.

Yesterday, I had noticed that my floating_ips, fixed_ips, and instances
tables were out-of-sync wrt a VM that had been terminated months ago -
the fixed ip address was still allocated, which was tying up a floating
IP, which was causing some other, strange issues. I got that all
normalized, manually.

Then I noticed that the number of VMs on each compute node didn't match
the number of IPs assigned in /var/lib/nova/network/nova-br4.conf. Some
compute nodes had more IPs than VMs while others had fewer. Why? The
only explanation I can come up with is that dnsmasq on one compute
node answered DHCP requests from VMs on another compute node.

So, today, 2 VMs came up and the users couldn't ping or ssh to the public
IPs, which are automatically assigned. They pinged them and got the
"Destination Host Prohibited" error. I saw the same thing from my
laptop. Logging into the compute nodes hosting the VMs, I *could* ping
the public IPs. I tried pinging again from my laptop and, lo and behold,
I could ping and ssh into the formerly Prohibited IPs. Why? I think the
arp caches got munged and pinging the public IPs from the compute nodes
helped clear them out.
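
Editor's sketch (not part of the original report): the stale-cache theory
above is usually addressed with a gratuitous ARP, a broadcast in which a
host announces its own IP-to-MAC binding so that neighbours refresh their
caches. The Python below only builds such a frame in memory to show its
layout; it does not send anything. The function name and the example MAC
and IP are hypothetical, and this is an illustration of the general
mechanism, not nova-network's own code.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    In a gratuitous ARP the sender and target protocol addresses are the
    same IP, and the frame is sent to the Ethernet broadcast address.
    """
    sender_mac = bytes.fromhex(mac.replace(":", ""))
    sender_ip = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = broadcast + sender_mac + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware length 6, protocol length 4, operation 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC, sender IP, target MAC (zeroed), target IP (same as sender)
    arp += sender_mac + sender_ip + b"\x00" * 6 + sender_ip

    return ethernet + arp


# Hypothetical values, chosen only for illustration
frame = build_gratuitous_arp("fa:16:3e:00:00:01", "192.0.2.10")
```

Hosts that receive such a frame can overwrite an outdated cache entry for
that IP, which would explain why traffic from a different machine started
working once the caches were refreshed.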

I've experienced, in the past, ssh'ing into one IP and landing on the
wrong VM. So have the users.

o_O

Definitely an arp problem.

So, I set send_arp_for_ha=true and restarted nova-network. I think this
should be set for all HA networking options and the docs should reflect
this.
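
Editor's sketch (not part of the original report): one way to apply the
setting described above, assuming nova.conf is in INI format with a
[DEFAULT] section. The helper name enable_send_arp_for_ha and the sample
configuration text are illustrative assumptions; only the option name
send_arp_for_ha comes from the report. Note that Python's configparser
drops comments when it rewrites a file, so hand-editing may be preferable
on a commented config.

```python
import configparser
import io


def enable_send_arp_for_ha(conf_text: str) -> str:
    """Return the config text with send_arp_for_ha = true in [DEFAULT]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    parser["DEFAULT"]["send_arp_for_ha"] = "true"

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Illustrative input only; a real nova.conf contains many more options
sample = "[DEFAULT]\nmulti_host = true\n"
updated = enable_send_arp_for_ha(sample)
```

As in the report, nova-network has to be restarted after the file is
changed for the option to take effect.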

Tags: nova
Anne Gentle (annegentle)
Changed in openstack-manuals:
status: New → Triaged
importance: Undecided → High
Tom Fifield (fifieldt)
tags: added: nova
Tom Fifield (fifieldt)
Changed in openstack-manuals:
assignee: nobody → Tom Fifield (fifieldt)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-manuals (master)

Fix proposed to branch: master
Review: https://review.openstack.org/18585

Tom Fifield (fifieldt) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-manuals (master)

Reviewed: https://review.openstack.org/18585
Committed: http://github.com/openstack/openstack-manuals/commit/a60ca71bb4786f507375780ad2662baf8e4d1660
Submitter: Jenkins
Branch: master

commit a60ca71bb4786f507375780ad2662baf8e4d1660
Author: Tom Fifield <email address hidden>
Date: Sun Dec 23 11:10:49 2012 +1100

    Add send_arp_for_ha option to multihost-HA doc

    fixes bug 1093000

    The bug report explains well the issues that can be encountered
    when using multihost HA flatDHCP, where ARP caches are not up to
    date. This patch adds the config option as a recommendation and
    explains its purpose.

    Change-Id: I0ca1e218c219f15f86f196531bb8a5fc342383dc

Changed in openstack-manuals:
status: In Progress → Fix Released