After kernel upgrade, nf_conntrack_ipv4 module unloaded, no IP traffic to instances

Bug #1834213 reported by Drew Freiberger
This bug affects 3 people
Affects                               Status        Importance  Assigned to                 Milestone
OpenStack Neutron Open vSwitch Charm  Fix Released  Low         Tiago Pasqualini da Silva
neutron                               Fix Released  Undecided   Brian Haley
linux (Ubuntu)                        Confirmed     Undecided   Unassigned

Bug Description

With an environment running Xenial-Queens, after upgrading the linux-image-generic kernel for MDS patching, a few of the hypervisor hosts that were rebooted (3 out of 100) started dropping ingress IP (TCP/UDP) traffic to instances.

It turns out that the nf_conntrack module was loaded but nf_conntrack_ipv4 was not, and the traffic was being dropped by this rule:

 table=72, n_packets=214989, priority=50,ct_state=+inv+trk actions=resubmit(,93)
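A quick way to confirm this symptom on a host is to look for the +inv+trk flow in the flow dump (for example from `ovs-ofctl dump-flows br-int`) and check whether its n_packets counter keeps climbing. The sketch below is a minimal helper under that assumption; `parse_flow`, `ct_flags` and `is_invalid_ct_flow` are hypothetical names, and the parser only assumes the text format shown above: comma-separated match/stat fields followed by ` actions=`.

```python
import re
from typing import Dict, Tuple, Union

FieldValue = Union[str, bool]


def parse_flow(line: str) -> Tuple[Dict[str, FieldValue], str]:
    """Split one dump-flows line into its match/stat fields and its actions string."""
    match_part, sep, actions = line.strip().partition(" actions=")
    if not sep:
        raise ValueError("no ' actions=' separator in flow line")
    fields: Dict[str, FieldValue] = {}
    for item in match_part.split(","):
        item = item.strip()
        if not item:
            continue
        key, eq, value = item.partition("=")
        # Bare protocol keywords such as "ip" or "tcp" carry no value.
        fields[key] = value if eq else True
    return fields, actions


def ct_flags(ct_state: str) -> Dict[str, bool]:
    """Turn '+inv+trk' into {'inv': True, 'trk': True}; a '-' prefix maps to False."""
    return {m.group(2): m.group(1) == "+" for m in re.finditer(r"([+-])([a-z]+)", ct_state)}


def is_invalid_ct_flow(line: str) -> bool:
    """True if the flow matches tracked packets that conntrack marked invalid."""
    fields, _ = parse_flow(line)
    flags = ct_flags(str(fields.get("ct_state", "")))
    return flags.get("inv", False) and flags.get("trk", False)


flow = "table=72, n_packets=214989, priority=50,ct_state=+inv+trk actions=resubmit(,93)"
fields, actions = parse_flow(flow)
print(fields["table"], fields["n_packets"], actions, is_invalid_ct_flow(flow))
```

Running the same check periodically and comparing n_packets between runs would show whether this rule is actively catching traffic.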

The ct_state "inv" means invalid conntrack state, which the manpage describes as:

    The state is invalid, meaning that the connection tracker
    couldn’t identify the connection. This flag is a catch-all
    for problems in the connection or the connection
    tracker, such as:

    • L3/L4 protocol handler is not loaded/unavailable.
      With the Linux kernel datapath, this may mean that
      the nf_conntrack_ipv4 or nf_conntrack_ipv6 modules
      are not loaded.

    • L3/L4 protocol handler determines that the packet
      is malformed.

    • Packets are unexpected length for protocol.

It appears that patching the OS of a hypervisor that is not running instances may fail to update the initrd to load nf_conntrack_ipv4 (and/or nf_conntrack_ipv6).

I couldn't find anywhere in the charm code where this module would be loaded, unless the "harden" option is used on the nova-compute charm (see the charmhelpers contrib/host templates). It is unset in our environment, so we are not using any special module probing.

Did nf_conntrack_ipv4 get split out from nf_conntrack in recent kernel upgrades, or should the charm define a modprobe file when the OVS firewall driver is configured?
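On the last question: one conventional way to make a module load at every boot is a `.conf` file under `/etc/modules-load.d/` listing one module name per line, which systemd-modules-load reads during boot (Ubuntu's `/etc/modules` serves the same purpose). A minimal sketch, assuming a systemd-based host; the file name `neutron-ovs-conntrack.conf` and both helper names are hypothetical, and the directory is a parameter so the code can be tried without root.

```python
from pathlib import Path
from typing import Iterable


def render_modules_load_conf(modules: Iterable[str], comment: str = "") -> str:
    """Build modules-load.d content: optional '#' comment, then one module per line."""
    lines = [f"# {comment}"] if comment else []
    for name in modules:
        name = name.strip()
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid module name: {name!r}")
        lines.append(name)
    return "\n".join(lines) + "\n"


def write_modules_load_conf(directory: str, filename: str,
                            modules: Iterable[str], comment: str = "") -> bool:
    """Write the file only if its content changed; return True when it was (re)written."""
    path = Path(directory) / filename
    content = render_modules_load_conf(modules, comment)
    if path.exists() and path.read_text() == content:
        return False  # idempotent: nothing to do on repeated runs
    path.write_text(content)
    return True


# Usage on a real host (as root), with a hypothetical file name:
# write_modules_load_conf("/etc/modules-load.d", "neutron-ovs-conntrack.conf",
#                         ["nf_conntrack_ipv4", "nf_conntrack_ipv6"],
#                         "required by the neutron OVS firewall driver")
```

Writing only on change keeps the step safe to run from a hook that fires repeatedly.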

Revision history for this message
James Page (james-page) wrote :

You're correct in that the charm does not do any module loading; that's handled by neutron.

James Page (james-page) wrote :

I can't actually see this module:

$ ls -l /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_*
-rw-r--r-- 1 root root 203014 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack.ko
-rw-r--r-- 1 root root 10518 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_amanda.ko
-rw-r--r-- 1 root root 5070 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_broadcast.ko
-rw-r--r-- 1 root root 26230 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_ftp.ko
-rw-r--r-- 1 root root 100054 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_h323.ko
-rw-r--r-- 1 root root 15934 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_irc.ko
-rw-r--r-- 1 root root 6694 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_netbios_ns.ko
-rw-r--r-- 1 root root 56158 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_netlink.ko
-rw-r--r-- 1 root root 26334 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_pptp.ko
-rw-r--r-- 1 root root 17062 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_proto_gre.ko
-rw-r--r-- 1 root root 13446 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_sane.ko
-rw-r--r-- 1 root root 41318 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_sip.ko
-rw-r--r-- 1 root root 7238 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_snmp.ko
-rw-r--r-- 1 root root 13726 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_conntrack_tftp.ko
-rw-r--r-- 1 root root 5294 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_dup_netdev.ko
-rw-r--r-- 1 root root 10214 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_log_common.ko
-rw-r--r-- 1 root root 6670 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_log_netdev.ko
-rw-r--r-- 1 root root 34838 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_nat.ko
-rw-r--r-- 1 root root 7054 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_nat_amanda.ko
-rw-r--r-- 1 root root 9390 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_nat_ftp.ko
-rw-r--r-- 1 root root 8734 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_nat_irc.ko
-rw-r--r-- 1 root root 6102 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_nat_redirect.ko
-rw-r--r-- 1 root root 17438 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_nat_sip.ko
-rw-r--r-- 1 root root 6022 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_nat_tftp.ko
-rw-r--r-- 1 root root 13662 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_synproxy_core.ko
-rw-r--r-- 1 root root 125414 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_tables.ko
-rw-r--r-- 1 root root 7854 May 6 16:59 /lib/modules/4.15.0-50-generic/kernel/net/netfilter/nf_tables_inet.ko
-rw-r--r-- 1 root root ...

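The listing above only covers `kernel/net/netfilter/`. On 4.x kernels the IPv4 handler is, to my knowledge, built as `kernel/net/ipv4/netfilter/nf_conntrack_ipv4.ko`, which would explain why it does not show up there; `modinfo nf_conntrack_ipv4` answers the same question directly. As a hedged sketch, a search over the whole module tree for a given kernel release might look like this (`find_module_files` and `module_name_of` are hypothetical names; compressed suffixes are included because some distributions ship `.ko.xz` or `.ko.zst` files).

```python
import os
from pathlib import Path
from typing import List

# Suffixes a kernel module file may carry; compression depends on the distribution.
MODULE_SUFFIXES = (".ko", ".ko.xz", ".ko.gz", ".ko.zst")


def module_name_of(filename: str) -> str:
    """Map 'nf_conntrack_ipv4.ko.xz' to 'nf_conntrack_ipv4' ('' if not a module file)."""
    for suffix in MODULE_SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)].replace("-", "_")
    return ""


def find_module_files(name: str, release: str = "", root: str = "/lib/modules") -> List[Path]:
    """Search every subdirectory of <root>/<release> for files providing module `name`."""
    release = release or os.uname().release
    wanted = name.replace("-", "_")
    return sorted(p for p in Path(root, release).rglob("*.ko*")
                  if p.is_file() and module_name_of(p.name) == wanted)


# Example (on the host itself):
# print(find_module_files("nf_conntrack_ipv4", "4.15.0-50-generic"))
```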

James Page (james-page) wrote :

Ignore prior comment:

$ lsmod | grep conntrack
nf_conntrack_ipv6 20480 1
nf_conntrack_ipv4 16384 1
nf_defrag_ipv4 16384 1 nf_conntrack_ipv4
nf_defrag_ipv6 36864 2 nf_conntrack_ipv6,openvswitch
nf_conntrack 131072 6 nf_conntrack_ipv6,nf_conntrack_ipv4,nf_nat,nf_nat_ipv6,nf_nat_ipv4,openvswitch
libcrc32c 16384 5 nf_conntrack,nf_nat,openvswitch,xfs,raid456

As soon as I loaded the openvswitch kernel module, the nf_conntrack_* modules were loaded as well.
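`lsmod` is a formatted view of `/proc/modules`: the columns are module name, size, reference count and the modules using it, which is how the output above shows openvswitch holding nf_conntrack. A small parser makes checks like this scriptable; `parse_lsmod` and `LoadedModule` are hypothetical names, and the sketch skips the header (or a pasted shell prompt) by requiring numeric size and refcount columns.

```python
from typing import Dict, List, NamedTuple


class LoadedModule(NamedTuple):
    name: str
    size: int
    refcount: int
    used_by: List[str]


def parse_lsmod(text: str) -> Dict[str, LoadedModule]:
    """Parse `lsmod` output (Module, Size, Used by) into a dict keyed by module name."""
    modules: Dict[str, LoadedModule] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, shell prompts and anything that is not a module row.
        if len(parts) < 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        used_by = [m for m in parts[3].split(",") if m] if len(parts) > 3 else []
        modules[parts[0]] = LoadedModule(parts[0], int(parts[1]), int(parts[2]), used_by)
    return modules


# Typical use on a host:
# mods = parse_lsmod(subprocess.run(["lsmod"], capture_output=True, text=True).stdout)
# "nf_conntrack_ipv4" in mods
```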

Changed in charm-neutron-openvswitch:
status: New → Incomplete
James Page (james-page) wrote :

Raising a kernel bug task.

Note my testing was on Bionic not Xenial.

Drew, can you confirm which kernel version and packages you are using?

Changed in charm-neutron-openvswitch:
importance: Undecided → Low
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1834213

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Steven Parker (sbparke) wrote :

Kernel version

uname -r
4.4.0-150-generic

apt list --installed | fgrep image

cloud-image-utils/xenial-updates,now 0.27-0ubuntu25.1 all [installed,automatic]
genisoimage/xenial,now 9:1.1.11-3ubuntu1 amd64 [installed]
linux-image-4.4.0-137-generic/xenial-updates,xenial-security,now 4.4.0-137.163 amd64 [installed,automatic]
linux-image-4.4.0-148-generic/xenial-updates,xenial-security,now 4.4.0-148.174 amd64 [installed,automatic]
linux-image-4.4.0-150-generic/xenial-updates,xenial-security,now 4.4.0-150.176 amd64 [installed,automatic]
linux-image-extra-4.4.0-137-generic/xenial-updates,xenial-security,now 4.4.0-137.163 amd64 [installed,automatic]
linux-image-generic/now 4.4.0.150.158 amd64 [installed,upgradable to: 4.4.0.154.162]
linux-signed-image-4.4.0-137-generic/xenial-updates,xenial-security,now 4.4.0-137.163 amd64 [installed,automatic]
ubuntu-cloudimage-keyring/xenial,now 2013.11.11 all [installed]

openvswitch version

apt list --installed | fgrep vswitch

neutron-openvswitch-agent/now 2:12.0.5-0ubuntu1~cloud0 all [installed,upgradable to: 2:12.0.6-0ubuntu2~cloud0]
openvswitch-common/xenial-updates,now 2.9.2-0ubuntu0.18.04.3~cloud0 amd64 [installed]
openvswitch-switch/xenial-updates,now 2.9.2-0ubuntu0.18.04.3~cloud0 amd64 [installed]
python-openvswitch/xenial-updates,now 2.9.2-0ubuntu0.18.04.3~cloud0 all [installed]

Let me know if you need anything else.

Thanks,

Steven

Drew Freiberger (afreiberger) wrote :

Oddly, this did not happen on all hosts running this kernel version; it was pseudo-random, affecting roughly 30-40% of them. There must be another variable at play.

Changed in charm-neutron-openvswitch:
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: sts
Tiago Pasqualini da Silva (tiago.pasqualini) wrote :

I spent some time debugging this and found some interesting details. I was able to reproduce it by deploying a xenial-queens environment with VXLAN and the OVS firewall. Here is what I found:

1) This module is first loaded on the compute nodes when libvirt-bin is installed. This package's postinst script creates the default libvirt network, so when the libvirt service is enabled it creates iptables rules for this network, which loads the nf_conntrack_ipv4 module.

2) At some point during the configuration of the environment, this default network is destroyed (I'm still investigating what does this, but it makes sense, since nova/neutron don't use the default libvirt network). From then on, those iptables rules are no longer added when the libvirt service starts, so whatever was previously loading the module no longer does so.

3) Neutron relies on conntrack for the OVS firewall to work. This is stated in the documentation: https://docs.openstack.org/newton/networking-guide/config-ovsfwdriver.html

4) As pointed out in the bug description, OVS complains whenever the module is not loaded, so we can assume that loading it is not OVS's responsibility.

In my opinion this is something that neutron-ovs-agent should be loading, since the OVS firewall requires conntrack to work and OVS complains that it is not loaded.
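Whichever component ends up owning this, the check itself is simple. A minimal sketch of a preflight step, assuming it runs on the compute node before the OVS firewall is relied upon: it reads module names from `/proc/modules` (the first column of each line) and logs the required ones that are missing. The logger and function names are hypothetical, and modules built into the kernel never appear in `/proc/modules`, so this sketch would report them as missing.

```python
import logging
from pathlib import Path
from typing import Iterable, List, Set

LOG = logging.getLogger("ovs-firewall-preflight")  # hypothetical logger name


def loaded_module_names(proc_modules_text: str) -> Set[str]:
    """Each /proc/modules line starts with the module name."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(required: Iterable[str], proc_modules_text: str) -> List[str]:
    """Return required modules absent from /proc/modules, warning about each one."""
    loaded = loaded_module_names(proc_modules_text)
    missing = [name for name in required if name not in loaded]
    for name in missing:
        LOG.warning("%s is not loaded; OVS may mark firewall traffic ct_state=+inv", name)
    return missing


def check_host(required: Iterable[str], proc_modules: str = "/proc/modules") -> List[str]:
    """Run the check against the live /proc/modules file (Linux only)."""
    return missing_modules(required, Path(proc_modules).read_text())
```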

It would be interesting to see how (if) this works on different openstack deployments.

Tiago Pasqualini da Silva (tiago.pasqualini) wrote :

I just tested devstack deployed on Xenial. The module gets loaded at some point during neutron configuration in the deployment script.

It seems like a neutron bug to me. It relies on conntrack for the firewall to work, but never actually loads the module. In most cases something else will end up loading it, but in the event that no one else loads it, it will fail.

Tiago Pasqualini da Silva (tiago.pasqualini) wrote :
Changed in charm-neutron-openvswitch:
assignee: nobody → Tiago Pasqualini da Silva (tiago.pasqualini)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (master)

Reviewed: https://review.opendev.org/678956
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/commit/?id=b76a59299794700fae1878af513c90ca5182a9f6
Submitter: Zuul
Branch: master

commit b76a59299794700fae1878af513c90ca5182a9f6
Author: tpsilva <email address hidden>
Date: Tue Aug 27 17:41:24 2019 -0300

    Explicitly load nf_conntrack_ipv4 module

    When neutron-openvswitch-agent is using the openvswitch firewall,
    it needs the nf_conntrack_ipv4 module to be loaded. Usually, this
    module gets loaded by some other external tool, but in case this
    does not happen, neither the charm nor neutron will load it, so
    all traffic to the instances in this host will fail. This patch
    fixes that by explicitly loading the module.

    Change-Id: Ia788e870c124de7da17961c02259cfe80938e5d2
    Closes-bug: #1834213
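The commit message describes the approach rather than the code. Purely as an illustration, and not the charm's actual implementation, an idempotent "load if missing" step could check `/sys/module/<name>` (present for every loaded module) and call `modprobe` only when needed. The function names are hypothetical, and the command runner is injectable so the logic can be exercised without root.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def module_loaded(name: str, sysfs_root: str = "/sys/module") -> bool:
    """Every loaded module has a directory under /sys/module (names use underscores)."""
    return Path(sysfs_root, name.replace("-", "_")).is_dir()


def ensure_modules_loaded(names: Sequence[str],
                          run: Callable[..., object] = subprocess.run,
                          sysfs_root: str = "/sys/module") -> List[str]:
    """Call modprobe for each module that is not loaded yet; return the ones it loaded."""
    loaded_now: List[str] = []
    for name in names:
        if module_loaded(name, sysfs_root):
            continue  # already present, nothing to do
        # check=True makes subprocess.run raise CalledProcessError if modprobe fails,
        # e.g. when the module does not exist for the running kernel.
        run(["modprobe", name], check=True)
        loaded_now.append(name)
    return loaded_now


# On a compute node (as root):
# ensure_modules_loaded(["nf_conntrack_ipv4"])
```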

Changed in charm-neutron-openvswitch:
status: In Progress → Fix Committed
James Page (james-page) wrote :

Adding a neutron bug-task to get an upstream opinion on whether neutron should be loading these modules as the n-ovs-agent starts up.

Slawek Kaplonski (slaweq) wrote :

Hi James. I don't think that Neutron should load this module. We are not managing any modules in Neutron AFAICT. It is up to deployment tools/distros to ensure that the proper modules are loaded.
Maybe we should add a note about this module in https://github.com/openstack/neutron/blob/688bbdd5cd10a13b010902525617fd43d8a415b6/doc/source/admin/config-ovsfwdriver.rst - what do you think?

Ionuț Bîru (ionut-3) wrote :

On the new HWE kernel for Ubuntu 18.04, which is Linux 5.0, the nf_conntrack_ipv4 and nf_conntrack_ipv6 modules are no longer present.

I think they were merged into nf_conntrack, but I'm not sure.
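For what it's worth, my understanding is that upstream Linux 4.19 folded the IPv4/IPv6 conntrack handlers into nf_conntrack itself, which would match what is seen on the 5.0 HWE kernel. A deployment tool would then have to pick module names per kernel. The sketch below assumes 4.19 as the cut-over (treat `MERGE_VERSION` as an assumption to verify) and parses a `uname -r` string such as `4.4.0-150-generic`; `conntrack_modules_for` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumption: first upstream release without separate nf_conntrack_ipv4/ipv6 modules.
MERGE_VERSION = (4, 19)


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like '4.15.0-50-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def conntrack_modules_for(release: str, ipv6: bool = True) -> List[str]:
    """Module names the OVS firewall would need loaded on the given kernel."""
    if kernel_version(release) >= MERGE_VERSION:
        return ["nf_conntrack"]  # IPv4/IPv6 handlers live inside nf_conntrack
    modules = ["nf_conntrack", "nf_conntrack_ipv4"]
    if ipv6:
        modules.append("nf_conntrack_ipv6")
    return modules
```

A tool could feed this list to modprobe so that it never asks for module names that no longer exist on newer kernels.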

Steven Parker (sbparke) wrote :

Workaround

Load:
sudo modprobe nf_conntrack_ipv4

Confirm:
lsmod | grep nf_conntrack_ipv4

David Ames (thedac)
Changed in charm-neutron-openvswitch:
milestone: none → 19.10
David Ames (thedac)
Changed in charm-neutron-openvswitch:
status: Fix Committed → Fix Released
Nobuto Murata (nobuto) wrote :

I've filed a follow-up bug of neutron-openvswitch on kernel upgrade: https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1851764

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/871659

Changed in neutron:
status: New → In Progress
Changed in neutron:
assignee: nobody → Brian Haley (brian-haley)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/871659
Committed: https://opendev.org/openstack/neutron/commit/c609084b59c68f003153a58c2063f99b52f169e0
Submitter: "Zuul (22348)"
Branch: master

commit c609084b59c68f003153a58c2063f99b52f169e0
Author: Brian Haley <email address hidden>
Date: Tue Jan 24 15:16:47 2023 -0500

    Add doc note on nf_conntrack module requirement

    The OVS firewall driver requires nf_conntrack module(s)
    to be loaded to function properly. While they are typically
    loaded automatically, add a note to the admin guide about
    the requirement to make it explicit.

    Closes-bug: #1834213

    Change-Id: I55871eff1e37d4155b8d2b5ae8c182d160c4af9f

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.0.0.0rc1

This issue was fixed in the openstack/neutron 22.0.0.0rc1 release candidate.
