name resolution error with DVR+HA routers

Bug #1733987 reported by Armando Migliaccio
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Armando Migliaccio

Bug Description

Steps to repro:

* Deploy with multiple DHCP agents per network (e.g. 3) and multiple L3 agents per router (e.g. 2)
* Create a network
* Create a subnet
* Create a DVR+HA router
* Uplink router to external network
* Deploy a VM on the network

The resolv.conf of the VM looks something like this:

cat /etc/resolv.conf
search openstack.local
nameserver 192.168.0.2
nameserver 192.168.0.4
nameserver 192.168.0.3

Here .2, .3, and .4 are the DHCP servers that relay DNS requests.

Name resolution may fail when the VM queries one of these servers, because the network node hosting that server's qdhcp namespace (and thus the DHCP service for the network) has no qrouter namespace.
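
One rough way to spot an affected network node is to inspect its namespaces. The sketch below is a coarse heuristic and not part of the original report. It assumes the usual qdhcp-<id> and qrouter-<id> naming, parses the output of `ip netns list`, and flags a node that hosts DHCP namespaces but no router namespace at all. The function names are made up for illustration. Namespace names alone do not tie a router to a network, so a node that passes this check may still be affected.

```python
import subprocess


def classify_namespaces(netns_output):
    """Split `ip netns list` output into qdhcp-* and qrouter-* names."""
    qdhcp, qrouter = [], []
    for line in netns_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        name = parts[0]  # drop a trailing "(id: N)" if present
        if name.startswith("qdhcp-"):
            qdhcp.append(name)
        elif name.startswith("qrouter-"):
            qrouter.append(name)
    return qdhcp, qrouter


def looks_affected(netns_output):
    """Coarse check: DHCP namespaces exist, but no router namespace does."""
    qdhcp, qrouter = classify_namespaces(netns_output)
    return bool(qdhcp) and not qrouter


def list_namespaces():
    """Return the raw output of `ip netns list` on the local node."""
    result = subprocess.run(
        ["ip", "netns", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a network node, `looks_affected(list_namespaces())` returning True would be a hint worth following up, not proof of the bug.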

Expected behavior:

All nameservers can resolve correctly.
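
To check the expected behavior from inside the VM, each nameserver can be queried on its own rather than relying on the resolver's fallback order. The following is a minimal sketch using only the Python standard library. The function names and the test hostname are assumptions, it handles IPv4 nameservers over UDP only, and it treats a timeout, an error code, or an empty answer as a failure.

```python
import socket
import struct


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(hostname, query_id=0x1234):
    """Build a DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)


def probe(server, hostname, timeout=2.0, query_id=0x1234):
    """Return True if server answers the A query with at least one record."""
    query = build_query(hostname, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return False
    if len(reply) < 12:
        return False
    reply_id, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    rcode = flags & 0x000F
    return reply_id == query_id and rcode == 0 and ancount > 0
```

Calling `probe(server, "example.com")` for every entry returned by `parse_nameservers` on the contents of /etc/resolv.conf shows which of the DHCP-hosted servers fail to resolve.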

This happens in master and prior versions.

Changed in neutron:
importance: Undecided → Medium
tags: added: l3-dvr-backlog
tags: added: l3-ha
tags: added: l3-ipam-dhcp
Changed in neutron:
status: New → Confirmed
description: updated
Armando Migliaccio (armando-migliaccio) wrote :

Having looked at this myself I have pinpointed this issue to the following problem:

Whenever a DHCP port is created, a qrouter namespace should be provisioned alongside the qdhcp namespace hosting the DHCP port. When DVR+HA routers are involved, it looks like a conditional in the l3_hamode_db mixin (which is not triggered in any other case) prevents the server from returning a non-empty list of routers to the node running the L3/DHCP agents. As a result, the provisioning of the qrouter namespace is never kicked off.
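
The effect of that conditional can be illustrated with a toy model. This is not neutron's code and does not reproduce the real conditional, which is not shown in this report. The `Router` class, the `drop_dvr_ha` flag, and both function names are hypothetical. The only point is that an empty router list from the server means the agent has nothing to provision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    router_id: str
    distributed: bool  # DVR
    ha: bool


def routers_for_host(routers, drop_dvr_ha):
    """Hypothetical stand-in for the server's reply to an agent sync.

    When drop_dvr_ha is True, DVR+HA routers are filtered out, which
    models the behavior described above.
    """
    return [r for r in routers if not (drop_dvr_ha and r.distributed and r.ha)]


def namespaces_to_provision(routers):
    """The agent creates one qrouter namespace per router it is told about."""
    return ["qrouter-" + r.router_id for r in routers]
```

With a single DVR+HA router, `routers_for_host(..., drop_dvr_ha=True)` yields an empty list and therefore no qrouter namespace, while `drop_dvr_ha=False` yields one namespace for that router.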

Changed in neutron:
assignee: nobody → Armando Migliaccio (armando-migliaccio)
status: Confirmed → In Progress
venkata anil (anil-venkata) wrote :

Why does the request from the VM go through the router to reach the DHCP server?

Armando Migliaccio (armando-migliaccio) wrote : Re: [Bug 1733987] Re: name resolution error with DVR+HA routers

On Thu, Nov 30, 2017 at 03:20 venkata anil <email address hidden>
wrote:

> Why does the request from the VM go through the router to reach the
> DHCP server?
>

It’s not the DHCP server that needs to be reached, but the DNS authority
behind it. Reaching that authority may require the router.


Armando Migliaccio (armando-migliaccio) wrote :

On 30 November 2017 at 06:47, Armando M. <email address hidden> wrote:

>
> On Thu, Nov 30, 2017 at 03:20 venkata anil <email address hidden>
> wrote:
>
>> Why does the request from the VM go through the router to reach the
>> DHCP server?
>>
>
> It’s not the DHCP server that needs to be reached, but the DNS authority
> behind it. Reaching that authority may require the router.
>

Think about a situation where the DNS server for the DHCP agent is, e.g.,
8.8.8.8, and the guest VM's DNS server is 192.168.1.2 (the DHCP interface).
The VM's DNS request hits the DHCP namespace, but it can't get out because
it can't reach the router to go and resolve the name of whatever node the
VM needs to reach.

HTH
Armando


venkata anil (anil-venkata) wrote :

Thanks Armando

But the steps to reproduce are missing the step that adds the router interface (i.e. neutron router-interface-add). I assume that was an oversight and you are adding the router interface, right?

Armando Migliaccio (armando-migliaccio) wrote :

On 30 November 2017 at 07:10, venkata anil <email address hidden>
wrote:

> Thanks Armando
>
> But the steps to reproduce are missing the step that adds the router
> interface (i.e. neutron router-interface-add). I assume that was an
> oversight and you are adding the router interface, right?
>

* Create a network
* Create a subnet
* Create a DVR+HA router
* Uplink router to external network

<forgot to add the subnet downlink here, I thought it was implicit> :)

* Deploy a VM on the network


OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/522362
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b24013f569024f71197370b10dd23a7647d22c73
Submitter: Zuul
Branch: master

commit b24013f569024f71197370b10dd23a7647d22c73
Author: Armando Migliaccio <email address hidden>
Date: Wed Nov 22 10:59:27 2017 -0800

    Fix DNS connectivity issues with DVR+HA routers and DHCP-HA

    Before this change, DVR_SNAT agents would get no routers when
    asking for updates due to provisioning of DHCP ports on the
    node they are running on. This means that there's no connectivity
    between the DHCP port and the network gateway (that may be
    hosted on a different node), and therefore things like DNS may
    break when a VM attempts resolution when talking to the affected
    DHCP port.

    This change relaxed a conditional that prevents the right list of
    routers to be compiled and returned from the server to the agent.
    The agent on the other hand needs to make sure to allocate the
    right type of router based on what is being returned from the server.

    Closes-bug: #1733987

    Change-Id: I6124738c3324e0cc3f7998e3a541ff7547f2a8a7

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/524777

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/524777
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cd4527733767721f1d5198536a79b3a24d5d1fea
Submitter: Zuul
Branch: stable/pike

commit cd4527733767721f1d5198536a79b3a24d5d1fea
Author: Armando Migliaccio <email address hidden>
Date: Wed Nov 22 10:59:27 2017 -0800

    Fix DNS connectivity issues with DVR+HA routers and DHCP-HA

    Closes-bug: #1733987

    Change-Id: I6124738c3324e0cc3f7998e3a541ff7547f2a8a7
    (cherry picked from commit b24013f569024f71197370b10dd23a7647d22c73)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.0.0b2

This issue was fixed in the openstack/neutron 12.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/528009

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/528009
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2a4b74a98ab07b9cb50db702961790b13940dd66
Submitter: Zuul
Branch: stable/ocata

commit 2a4b74a98ab07b9cb50db702961790b13940dd66
Author: Armando Migliaccio <email address hidden>
Date: Wed Nov 22 10:59:27 2017 -0800

    Fix DNS connectivity issues with DVR+HA routers and DHCP-HA

    Closes-bug: #1733987

    Change-Id: I6124738c3324e0cc3f7998e3a541ff7547f2a8a7
    (cherry picked from commit b24013f569024f71197370b10dd23a7647d22c73)
    (cherry picked from commit cd4527733767721f1d5198536a79b3a24d5d1fea)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.5

This issue was fixed in the openstack/neutron 10.0.5 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.3

This issue was fixed in the openstack/neutron 11.0.3 release.
