VM gets wrong ipv6 address from neutron-dhcp-agent after ipv6 address on port was changed

Bug #1959697 reported by Anton Kurbatov
Affects   Status   Importance   Assigned to   Milestone
neutron   New      Undecided    Unassigned

Bug Description

I ran into a problem where the neutron dhcp-agent still confirms the old IPv6 address after the address on the port has been changed.
Simple steps to reproduce:
- create a port with an IPv6 address in a dhcpv6-stateful subnet
- create a VM with cloud-init inside
- change the IPv6 port address
- reboot the VM

Here are my commands:

$ openstack subnet create --subnet-range 2001:db8:123::/64 --ip-version 6 --ipv6-address-mode dhcpv6-stateful --network public subv6
$ openstack subnet list --network public
+--------------------------------------+-------+--------------------------------------+-------------------+
| ID | Name | Network | Subnet |
+--------------------------------------+-------+--------------------------------------+-------------------+
| 6d9a7fb5-5c1b-4759-b32b-5720b5cedbf4 | subv4 | f1f3d967-26db-41b3-b6f6-1d5356e33a84 | 10.136.16.0/22 |
| 76db898c-6a7a-4301-9253-23241cafaa83 | subv6 | f1f3d967-26db-41b3-b6f6-1d5356e33a84 | 2001:db8:123::/64 |
+--------------------------------------+-------+--------------------------------------+-------------------+
$

$ openstack port create my-port --network public --fixed-ip ip-address=10.136.17.163 --fixed-ip ip-address=2001:db8:123::111
$ openstack server create test --flavor m1.small --port my-port --image CentOS-7-x86_64-GenericCloud-2009.qcow2 --key-name key --use-config-drive

Check IPv6 address inside VM (it's correct):

[centos@test ~]$ ip a s eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:2e:66:ac brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.163/22 brd 10.136.19.255 scope global dynamic eth0
       valid_lft 86371sec preferred_lft 86371sec
    inet6 2001:db8:123::111/128 scope global dynamic
       valid_lft 7473sec preferred_lft 7173sec
    inet6 fe80::f816:3eff:fe2e:66ac/64 scope link
       valid_lft forever preferred_lft forever
[centos@test ~]$

Change IPv6 address and reboot the VM:
$ openstack port set my-port --no-fixed-ip --fixed-ip ip-address=10.136.17.163 --fixed-ip ip-address=2001:db8:123::222
$ openstack server reboot test

[centos@test ~]$ ip a s eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:2e:66:ac brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.163/22 brd 10.136.19.255 scope global dynamic eth0
       valid_lft 86382sec preferred_lft 86382sec
    inet6 2001:db8:123::111/128 scope global dynamic
       valid_lft 7482sec preferred_lft 7182sec
    inet6 fe80::f816:3eff:fe2e:66ac/64 scope link
       valid_lft forever preferred_lft forever
[centos@test ~]$

^^ As you can see, the VM got the old IPv6 address, and all traffic is actually blocked by the port-security feature. If I remove the lease file and respawn dhclient, everything is fine:

[centos@test ~]$ ps axf | grep dhcl
  780 ? Ss 0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid -H test eth0
  868 ? Ss 0:00 /sbin/dhclient -6 -1 -lf /var/lib/dhclient/dhclient6--eth0.lease -pf /var/run/dhclient6-eth0.pid eth0 -H test
 1371 pts/0 S+ 0:00 \_ grep --color=auto dhcl
[centos@test ~]$ sudo kill -9 868
[centos@test ~]$ sudo ip addr del 2001:db8:123::111/128 dev eth0
[centos@test ~]$ sudo rm -rf /var/lib/dhclient/dhclient6--eth0.lease
[centos@test ~]$ sudo /sbin/dhclient -6 -1 -lf /var/lib/dhclient/dhclient6--eth0.lease -pf /var/run/dhclient6-eth0.pid eth0 -H test
[centos@test ~]$ ip a s eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:2e:66:ac brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.163/22 brd 10.136.19.255 scope global dynamic eth0
       valid_lft 86319sec preferred_lft 86319sec
    inet6 2001:db8:123::222/128 scope global dynamic
       valid_lft 7481sec preferred_lft 7181sec
    inet6 fe80::f816:3eff:fe2e:66ac/64 scope link
       valid_lft forever preferred_lft forever
[centos@test ~]$

I found some logic that removes DHCPv6 leases here:
https://opendev.org/openstack/neutron/src/commit/e7b70521d0e230143a80974e7e4795a2acafcc9b/neutron/agent/linux/dhcp.py#L600
but it does not seem to help when the client sends a DHCPCONFIRM request.
In the dnsmasq logs I see the following DHCPCONFIRM -> DHCPREPLY exchange after the VM comes back from the reboot (see also https://datatracker.ietf.org/doc/html/rfc3315#page-50):

Feb 1 16:49:12 dnsmasq-dhcp[1360521]: DHCPREQUEST(tapc233cb5c-8f) 10.136.17.163 fa:16:3e:2e:66:ac
Feb 1 16:49:12 dnsmasq-dhcp[1360521]: DHCPACK(tapc233cb5c-8f) 10.136.17.163 fa:16:3e:2e:66:ac host-10-136-17-163
Feb 1 16:49:15 dnsmasq-dhcp[1360521]: DHCPCONFIRM(tapc233cb5c-8f) 00:01:00:01:29:8c:20:5e:fa:16:3e:2e:66:ac
Feb 1 16:49:15 dnsmasq-dhcp[1360521]: DHCPREPLY(tapc233cb5c-8f) 2001:db8:123::111 00:01:00:01:29:8c:20:5e:fa:16:3e:2e:66:ac host-2001-db8-123--222
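
The mismatch in the last log line (the reply carries ::111 while the host name ends in --222) can be checked mechanically. Below is a minimal sketch, not part of neutron or dnsmasq. It assumes the host name is "host-" followed by the address with ':' replaced by '-', which is how it appears in the log above; the function name check_reply is hypothetical.

```shell
# Hypothetical helper: report whether a dnsmasq DHCPREPLY log line hands out
# an address that differs from the one encoded in the trailing host name.
# Assumption: host name = "host-" + address with ':' replaced by '-'.
check_reply() {
    line=$1
    # The field right after "DHCPREPLY(<iface>)" is the address handed out.
    addr=$(printf '%s\n' "$line" |
        awk '{for (i = 1; i < NF; i++) if (index($i, "DHCPREPLY(") == 1) print $(i + 1)}')
    # The last field is the host name dnsmasq associates with the client.
    host=$(printf '%s\n' "$line" | awk '{print $NF}')
    expected="host-$(printf '%s' "$addr" | tr ':' '-')"
    if [ "$host" = "$expected" ]; then
        echo "ok $addr"
    else
        echo "mismatch $addr $host"
    fi
}
```

Feeding it the DHCPREPLY line above reports a mismatch between 2001:db8:123::111 and host-2001-db8-123--222, i.e. the host entry already reflects the new port address while the reply still confirms the old one.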

description: updated
summary: - VM gets wrong ipv6 address from dhcp-agent after ipv6 address on port
- was changed
+ VM gets wrong ipv6 address from neutron-dhcp-agent after ipv6 address on
+ port was changed
tags: added: ipv6 l3-ipam-dhcp
Lajos Katona (lajos-katona) wrote :

Thanks for the bug report. I tried but couldn't reproduce your issue.
I used the latest master (though as far as I can see there has been no code change around that code path), with Ubuntu 18.04.5 as the test VM.
After rebooting the test VM, I have the correct address:
ubuntu@test:~$ ip a
....
2: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:c2:32:df brd ff:ff:ff:ff:ff:ff
    inet 192.171.0.5/27 brd 192.171.0.31 scope global dynamic ens2
       valid_lft 86201sec preferred_lft 86201sec
    inet6 2001:db8:123::222/128 scope global dynamic noprefixroute
       valid_lft 86202sec preferred_lft 86202sec
    inet6 fe80::f816:3eff:fec2:32df/64 scope link
       valid_lft forever preferred_lft forever

Just a small heads-up: according to https://docs.openstack.org/neutron/latest/admin/config-ipv6.html#ipv6-ra-mode-and-ipv6-address-mode-combinations, this is the only supported combination for creating a subnet with "dhcpv6-stateful":
openstack subnet create --ip-version 6 --ipv6-address-mode dhcpv6-stateful --ipv6-ra-mode dhcpv6-stateful....

As I checked in the api-ref (and in my env), if --ipv6-ra-mode dhcpv6-stateful is not set explicitly when creating the subnet, the value will be null, and that is not supported by the reference implementation (the one in the openstack/neutron repo).
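
For illustration, the supported combination can be spelled out with the subnet parameters from the bug description. This is a sketch only: the wrapper function create_stateful_subnet and the RUN variable are hypothetical, and by default it just prints the command instead of calling the CLI.

```shell
# Hypothetical wrapper: builds the subnet-create command with both IPv6 modes
# set to dhcpv6-stateful. Prints the command unless RUN=1 is set.
create_stateful_subnet() {
    net=$1
    name=$2
    cidr=$3
    # Reuse the positional parameters to hold the full command line.
    set -- openstack subnet create --network "$net" --subnet-range "$cidr" \
        --ip-version 6 --ipv6-address-mode dhcpv6-stateful \
        --ipv6-ra-mode dhcpv6-stateful "$name"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Network, name and range taken from the bug description.
create_stateful_subnet public subv6 2001:db8:123::/64
```

To see what an existing subnet has, `openstack subnet show subv6 -c ipv6_address_mode -c ipv6_ra_mode` should print both fields.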

Anton Kurbatov (akurbatov) wrote :

Hi,
I was also unable to reproduce the issue when systemd-networkd is used inside the guest.
But then I switched netplan to the NetworkManager backend.
Here are my steps to reproduce the issue on Ubuntu 18.04.6:

[centos@devstack devstack]$ openstack port create my-port --network public --fixed-ip ip-address=10.136.17.164 --fixed-ip ip-address=2001:db8:123::111
[centos@devstack devstack]$ openstack server create test --flavor m1.small --port my-port --image bionic-server-cloudimg-i386.img --key-name key --use-config-drive

ubuntu@test:~$ cat /etc/os-release | grep -i -w version
VERSION="18.04.6 LTS (Bionic Beaver)"
ubuntu@test:~$ # switch netplan to NetworkManager backend:
ubuntu@test:~$ sudo apt-get install network-manager -y
ubuntu@test:~$ sudo cp /etc/netplan/50-cloud-init.yaml /etc/netplan/50-cloud-init.yaml.copy
ubuntu@test:~$ sudo vi /etc/netplan/50-cloud-init.yaml
ubuntu@test:~$ diff -u /etc/netplan/50-cloud-init.yaml.copy /etc/netplan/50-cloud-init.yaml
--- /etc/netplan/50-cloud-init.yaml.copy 2022-02-08 13:15:16.034537763 +0000
+++ /etc/netplan/50-cloud-init.yaml 2022-02-08 13:15:28.646031734 +0000
@@ -5,6 +5,7 @@
 # network: {config: disabled}
 network:
     version: 2
+    renderer: NetworkManager
     ethernets:
         ens2:
             accept-ra: true
ubuntu@test:~$

[centos@devstack devstack]$ IPv4=10.136.17.164
[centos@devstack devstack]$ set_ip() {
ip=2001:db8:123::$1;
echo setting ip address: $ip
openstack port set my-port --no-fixed-ip --fixed-ip ip-address=$IPv4 --fixed-ip ip-address=$ip;
echo rebooting the VM ...;
openstack server reboot test;
while :; do openstack server show test -c status | grep -q ACTIVE && break || sleep 5; done
while :; do ssh ubuntu@$IPv4 ip a | grep 2001:db8:123 && break || sleep 5; done;
ssh ubuntu@$IPv4 ip a | grep -q $ip || { echo "IP address is not correct"; return 1; }
ssh ubuntu@$IPv4 sync;
}
[centos@devstack devstack]$ for ip_tail in {222,333,444,555}; do set_ip $ip_tail || break; done
setting ip address: 2001:db8:123::222
rebooting the VM ...
    inet6 2001:db8:123::222/128 scope global dynamic noprefixroute
setting ip address: 2001:db8:123::333
rebooting the VM ...
    inet6 2001:db8:123::222/128 scope global tentative dynamic noprefixroute
IP address is not correct
[centos@devstack devstack]$

[centos@devstack devstack]$ openstack port show my-port -c fixed_ips
+-----------+----------------------------------------------------------------------------------+
| Field | Value |
+-----------+----------------------------------------------------------------------------------+
| fixed_ips | ip_address='10.136.17.164', subnet_id='6d9a7fb5-5c1b-4759-b32b-5720b5cedbf4' |
| | ip_address='2001:db8:123::333', subnet_id='263cee07-f53d-4d4e-85d1-8b9f507c602a' |
+-----------+----------------------------------------------------------------------------------+
[centos@devstack devstack]$

ubuntu@test:~$ sudo ip addr del 2001:db8:123::222/128 dev ens2
ubuntu@test:~$ sudo netplan apply
ubuntu@test:~$ ip a
...
2: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group defaul...

