netfilter regression introducing a performance slowdown in binary arp/ip/ip6tables

Bug #1640786 reported by Eric Desrochers
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
High
Eric Desrochers
Xenial
Fix Released
High
Eric Desrochers
Yakkety
Fix Released
High
Eric Desrochers

Bug Description

[SRU JUSTIFICATION]

[Impact]

It has been brought to my attention that Ubuntu kernel 4.4 has a severe netfilter regression affecting the performance of "/sbin/iptables" command, especially when adding large number of policies. My source have documented everything here[2].

Note that the situation can also be reproduce with latest and greatest upstream kernel v4.9-rc4.

I was able to reproduce the situation on my side, and a kernel bisect identified the same offending commit[1] as my source found for this bug.

Running the commit right before the offending one have proven to have expected performance :

# commit [71ae0dff] <== Offending commit
real 0m33.314s
user 0m1.520s
sys 0m26.192s

# commit [d7b59742] <== Right before offending commit
real 0m5.952s
user 0m0.124s
sys 0m0.220s

[Test Case]

* Reproducer #1
$ iptables -F
$ time (./list-addrs 3000 | xargs -n1 iptables -A FORWARD -j ACCEPT -s)

* Reproducer #2
$ iptables -F
$ time for f in `seq 1 3000` ; do iptables -A FORWARD ; done

"list-addrs" script can be found here[3]

[Regression Potential]

 * none expected, the patches have been proven to work on mainline kernel, and was reviewed by a few netfilters maintainer + tested by myself.

Reference:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/pablo/nf-next/

Patches:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/pablo/nf-next/+/2394ae21e8b652aff0db1c02e946243c1e2f5edb
https://kernel.googlesource.com/pub/scm/linux/kernel/git/pablo/nf-next/+/722d6785e3b29a3b9f95c4d77542a1416094786a
https://kernel.googlesource.com/pub/scm/linux/kernel/git/pablo/nf-next/+/18b61e8161cc308cbfd06d2e2c6c0758dfd925ef

[Other Info]

* "iptables-restore" doesn't suffer of that netfilter regression, and I'm also aware that "iptables-restore" is the favourite approach since it is way more efficient than iptables that is executed over and over, once for each policy one want to set, but since "binary arp/ip/ip6tables" takes vastly longer to perform with that commit, I think this need to be address anyway.

[Related Documents]

[1] - https://github.com/torvalds/linux/commit/71ae0dff02d756e4d2ca710b79f2ff5390029a5f
[2] - https://gist.github.com/williammartin/b75e3faf5964648299e4d985413e6c0c
[3] - https://gist.github.com/williammartin/b75e3faf5964648299e4d985413e6c0c#file-list-addrs

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1640786

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Eric Desrochers (slashd) wrote : Re: netfilter regression introducing a performance slowdown in iptables

I tried the above protocol using upstream mainline kernel v4.9-rc4, and the regression is still present. Unfortunately, it seems like there is nothing as we speak that we can cherry-pick/backport from upstream into Ubuntu kernel.

tags: added: sts
description: updated
Revision history for this message
Eric Desrochers (slashd) wrote :

Workaround :

Use "iptables-restore"

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → Medium
Revision history for this message
Eric Desrochers (slashd) wrote :

Additional note :

The following situation, is ONLY happening on system with more than 1 CPUs.

Eric Desrochers (slashd)
summary: - netfilter regression introducing a performance slowdown in iptables
+ netfilter regression introducing a performance slowdown in binary
+ ip/ip6tables
tags: added: kernel-da-key
Eric Desrochers (slashd)
Changed in linux (Ubuntu):
assignee: nobody → Eric Desrochers (slashd)
Revision history for this message
Eric Desrochers (slashd) wrote :

There is 3 upstream patches[1] "Under Review", that has been created to mitigate this specific situation. The test (with these patches) so far has been proven to provide better performance.

The patches consist on allocating the percpu counters in page-sized batch chunks.

[1] - Patches :
http://patchwork.ozlabs.org/patch/697259/
http://patchwork.ozlabs.org/patch/697260/
http://patchwork.ozlabs.org/patch/697261/

- Eric

summary: netfilter regression introducing a performance slowdown in binary
- ip/ip6tables
+ arp/iptables/ip6tables
summary: netfilter regression introducing a performance slowdown in binary
- arp/iptables/ip6tables
+ arp/ip/ip6tables
Revision history for this message
Eric Desrochers (slashd) wrote :

I also tested the protocol found here[1], on top of mainline kernel v4.9

* Without the nf-next patches :

# time (./list-addrs 3000 | xargs -n1 iptables -A FORWARD -j ACCEPT -s)

real 0m32.994s
user 0m1.288s
sys 0m26.076s

* With the nf-next patches :

# time (./list-addrs 3000 | xargs -n1 iptables -A FORWARD -j ACCEPT -s)

real 0m5.428s
user 0m0.068s
sys 0m0.288s

I do notice a significant performance increase.

[1] - https://gist.github.com/williammartin/b75e3faf5964648299e4d985413e6c0c

- Eric

Changed in linux (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
Eric Desrochers (slashd) wrote :

I have backport the "Under Review" patches into Xenial 4.4.0-49 to test, here are the numbers :

# uname -a
Linux <HOSTNAME> 4.4.0-49-generic #70hf121102v20161124b2-Ubuntu SMP Fri Nov 25 02:34:36 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# iptables -F
# time (./list-addrs 3000 | xargs -n1 iptables -A FORWARD -j ACCEPT -s)

real 0m5.661s
user 0m0.096s
sys 0m0.452s

So the patches have positive impact on mainline kernel and Xenial Ubuntu kernel.

Once the patches are approved, I will submit it to the kernel team.

Eric

Changed in linux (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → Medium
Eric Desrochers (slashd)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Eric Desrochers (slashd)
no longer affects: linux (Ubuntu Trusty)
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
status: New → In Progress
Eric Desrochers (slashd)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Eric Desrochers (slashd)
Eric Desrochers (slashd)
Changed in linux (Ubuntu):
importance: Medium → High
Changed in linux (Ubuntu Xenial):
importance: Medium → High
Revision history for this message
Eric Desrochers (slashd) wrote :

A quick update on comment #6.

The patches are no longer set "Under Review" and has been now merge in a repository called "nf-next" which stands for "Netfilter's -next tree"

The "-next" tree is the holding area for patches aimed at the next kernel merge window.

Reference:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/pablo/nf-next/

Patches:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/pablo/nf-next/+/2394ae21e8b652aff0db1c02e946243c1e2f5edb
https://kernel.googlesource.com/pub/scm/linux/kernel/git/pablo/nf-next/+/722d6785e3b29a3b9f95c4d77542a1416094786a
https://kernel.googlesource.com/pub/scm/linux/kernel/git/pablo/nf-next/+/18b61e8161cc308cbfd06d2e2c6c0758dfd925ef

Once the patches are merged in upstream kernel, I will then start the working on backporting the patches in 4.4 kernel and late.

Eric

Eric Desrochers (slashd)
description: updated
description: updated
description: updated
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
Changed in linux (Ubuntu Yakkety):
importance: Undecided → High
assignee: nobody → Eric Desrochers (slashd)
status: New → In Progress
Revision history for this message
Eric Desrochers (slashd) wrote :

The patchset is now in v4.10-rc1.

# git clone https://github.com/torvalds/linux.git

ae0ac0e netfilter: x_tables: pack percpu counter allocations
f28e15b netfilter: x_tables: pass xt_counters struct to counter allocator
4d31eef netfilter: x_tables: pass xt_counters struct instead of packet counter

# git describe --contains 4d31eef
# git describe --contains f28e15b
# git describe --contains ae0ac0e
v4.10-rc1

- Eric

Revision history for this message
Eric Desrochers (slashd) wrote :

I have submitted the patchset to the Ubuntu kernel team for the following releases(kernel versions) :

- Zesty (v4.9)
- Yakkety (v4.8)
- Xenial (v4.4)

- Eric

Luis Henriques (henrix)
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (8.0 KiB)

This bug was fixed in the package linux - 4.9.0-15.16

---------------
linux (4.9.0-15.16) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1658101

  * Zesty update to v4.9.5 stable release (LP: #1658099)
    - Input: xpad - use correct product id for x360w controllers
    - Input: i8042 - add Pegatron touchpad to noloop table
    - pinctrl: imx: fix imx_pinctrl_desc initialization
    - pinctrl: sh-pfc: r8a7795: Use lookup function for bias data
    - pinctrl: sh-pfc: Add helper to handle bias lookup table
    - regulator: tps65086: Fix 25mV ranges for BUCK regulators
    - regulator: axp20x: Fix axp809 ldo_io registration error on cold boot
    - drm/tegra: dpaux: Fix error handling
    - drm/vc4: Fix a couple error codes in vc4_cl_lookup_bos()
    - drm/savage: dereferencing an error pointer
    - selftests: do not require bash to run netsocktests testcase
    - selftests: do not require bash for the generated test
    - zram: revalidate disk under init_lock
    - zram: support BDI_CAP_STABLE_WRITES
    - dax: fix deadlock with DAX 4k holes
    - mm: pmd dirty emulation in page fault handler
    - mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
    - ocfs2: fix crash caused by stale lvb with fsdlm plugin
    - mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
    - mm: support anonymous stable page
    - mm/slab.c: fix SLAB freelist randomization duplicate entries
    - mm/hugetlb.c: fix reservation race when freeing surplus pages
    - KVM: x86: fix emulation of "MOV SS, null selector"
    - KVM: eventfd: fix NULL deref irqbypass consumer
    - jump_labels: API for flushing deferred jump label updates
    - KVM: x86: flush pending lapic jump label updates on module unload
    - KVM: x86: fix NULL deref in vcpu_scan_ioapic
    - KVM: x86: add Align16 instruction flag
    - KVM: x86: add asm_safe wrapper
    - KVM: x86: emulate FXSAVE and FXRSTOR
    - KVM: x86: Introduce segmented_write_std
    - efi/libstub/arm*: Pass latest memory map to the kernel
    - efi/x86: Prune invalid memory map entries and fix boot regression
    - x86/efi: Don't allocate memmap through memblock after mm_init()
    - nl80211: fix sched scan netlink socket owner destruction
    - gpio: Move freeing of GPIO hogs before numbing of the device
    - xfs: Timely free truncated dirty pages
    - bridge: netfilter: Fix dropping packets that moving through bridge interface
    - x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature
    - x86/bugs: Separate AMD E400 erratum and C1E bug
    - x86/CPU/AMD: Fix Bulldozer topology
    - wusbcore: Fix one more crypto-on-the-stack bug
    - usb: musb: fix runtime PM in debugfs
    - USB: serial: kl5kusb105: fix line-state error handling
    - USB: serial: ch341: fix initial modem-control state
    - USB: serial: ch341: fix resume after reset
    - USB: serial: ch341: fix open error handling
    - USB: serial: ch341: fix control-message error handling
    - USB: serial: ch341: fix open and resume after B0
    - Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data
    - i2c: print correct device invalid address
    - i2c: fix kern...

Read more...

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-yakkety
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Eric Desrochers (slashd) wrote :

I confirm that iptables offers way better performance now on Xenial kernel.

BEFORE:
$ uname -r
4.4.0-62-generic

$ time (./list-addrs 3000 | xargs -n1 iptables -A FORWARD -j ACCEPT -s)

real 0m34.502s
user 0m1.372s
sys 0m27.428s

AFTER:
$ uname -r
4.4.0-63-generic

$ time (./list-addrs 3000 | xargs -n1 iptables -A FORWARD -j ACCEPT -s)

real 0m5.680s
user 0m0.100s
sys 0m0.264s

tags: added: verification-done-xenial
removed: verification-needed-xenial
Eric Desrochers (slashd)
tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Revision history for this message
Eric Desrochers (slashd) wrote :

It has been brought to my attention :

"... we did confirm that everything looks good from our end as well. Perf results are at https://gist.github.com/teddyking/af3c404bc313e22048c90bb381b58300 if you'd like to review.

Thanks so much for the great support and followup!"

- Eric

Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (23.0 KiB)

This bug was fixed in the package linux - 4.4.0-63.84

---------------
linux (4.4.0-63.84) xenial; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1660704

  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * Kdump through NMI SMP and single core not working on Ubuntu16.10
    (LP: #1630924)
    - x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    - SAUCE: hv: don't reset hv_context.tsc_page on crash

  * [regression 4.8.0-14 -> 4.8.0-17] keyboard and touchscreen lost on Acer
    Chromebook R11 (LP: #1630238)
    - [Config] CONFIG_PINCTRL_CHERRYVIEW=y

  * Call trace when testing fstat stressor on ppc64el with virtual keyboard and
    mouse present (LP: #1652132)
    - SAUCE: HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL

  * VLAN SR-IOV regression for IXGBE driver (LP: #1658491)
    - ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths

  * "Out of memory" errors after upgrade to 4.4.0-59 (LP: #1655842)
    - mm, page_alloc: convert alloc_flags to unsigned
    - mm, compaction: change COMPACT_ constants into enum
    - mm, compaction: distinguish COMPACT_DEFERRED from COMPACT_SKIPPED
    - mm, compaction: simplify __alloc_pages_direct_compact feedback interface
    - mm, compaction: distinguish between full and partial COMPACT_COMPLETE
    - mm, compaction: abstract compaction feedback to helpers
    - mm, oom: protect !costly allocations some more
    - mm: consider compaction feedback also for costly allocation
    - mm, oom, compaction: prevent from should_compact_retry looping for ever for
      costly orders
    - mm, oom: protect !costly allocations some more for !CONFIG_COMPACTION
    - mm, oom: prevent premature OOM killer invocation for high order request

  * Backport 3 patches to fix bugs with AIX clients using IBMVSCSI Target Driver
    (LP: #1657194)
    - SAUCE: ibmvscsis: Fix max transfer length
    - SAUCE: ibmvscsis: fix sleeping in interrupt context
    - SAUCE: ibmvscsis: Fix srp_transfer_data fail return code

  * NVMe: adapter is missing after abnormal shutdown followed by quick reboot,
    quirk needed (LP: #1656913)
    - nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too

  * Ubuntu 16.10 KVM SRIOV: if enable sriov while ping flood is running ping
    will stop working (LP: #1625318)
    - PCI: Do any VF BAR updates before enabling the BARs
    - PCI: Ignore BAR updates on virtual functions
    - PCI: Update BARs using property bits appropriate for type
    - PCI: Separate VF BAR updates from standard BAR updates
    - PCI: Don't update VF BARs while VF memory space is enabled
    - PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
    - PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
    - PCI: Add comments about ROM BAR updating

  * Linux rtc self test fails in a VM under xenial (LP: #1649718)
    - kvm: x86: Convert ioapic->rtc_status.dest_map to a struct
    - kvm: x86: Track irq vectors in ioapic->rtc_status.dest_map
    - kvm: x86: Check dest_map->vector to match eoi signals for rtc

  * Xenial update to v4.4.44 stable releas...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (20.4 KiB)

This bug was fixed in the package linux - 4.8.0-38.41

---------------
linux (4.8.0-38.41) yakkety; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1661232

  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * Kdump through NMI SMP and single core not working on Ubuntu16.10
    (LP: #1630924)
    - x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    - SAUCE: hv: don't reset hv_context.tsc_page on crash

  * Call trace when testing fstat stressor on ppc64el with virtual keyboard and
    mouse present (LP: #1652132)
    - HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL

  * regression in linux-libc-dev in yakkety: C++ style comments are not allowed
    in ISO C90 (LP: #1659654)
    - generic syscalls: kill cruft from removed pkey syscalls

  * [16.04.2] POWER9 patches on top of 4.8 (LP: #1650263)
    - powerpc/book3s: Add a cpu table entry for different POWER9 revs
    - powerpc/mm/radix: Use different RTS encoding for different POWER9 revs
    - powerpc/mm/radix: Use different pte update sequence for different POWER9
      revs
    - powerpc/mm: Update the HID bit when switching from radix to hash
    - powerpc/64/kexec: NULL check "clear_all" in kexec_sequence
    - powerpc/64/kexec: Fix MMU cleanup on radix
    - powerpc/mm: Add radix flush all with IS=3
    - powerpc/64/kexec: Copy image with MMU off when possible
    - powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format
    - powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1
    - powerpc/mm: Fix missing update of HID register on secondary CPUs
    - powerpc/64: Add some more SPRs and SPR bits for POWER9
    - powerpc/64: Provide functions for accessing POWER9 partition table
    - powerpc/powernv: Define real-mode versions of OPAL XICS accessors
    - powerpc/64: Define new ISA v3.00 logical PVR value and PCR register value
    - mm: update mmu_gather range correctly
    - mm/hugetlb: add tlb_remove_hugetlb_entry for handling hugetlb pages
    - mm: add tlb_remove_check_page_size_change to track page size change
    - powerpc: Revert Load Monitor Register Support
    - powerpc/mm: Correct process and partition table max size
    - powernv: Clear SPRN_PSSCR when a POWER9 CPU comes online
    - powerpc/mm/radix: Setup AMOR in HV mode to allow key 0
    - powerpc/mm: Detect instruction fetch denied and report
    - powerpc/mm/radix: Prevent kernel execution of user space
    - powerpc/mm: Rename hugetlb-radix.h to hugetlb.h
    - powerpc/mm/hugetlb: Handle hugepage size supported by hash config
    - powerpc/mm: Introduce _PAGE_LARGE software pte bits
    - powerpc/mm: Add radix__tlb_flush_pte_p9_dd1()
    - powerpc/mm: update radix__ptep_set_access_flag to not do full mm tlb flush
    - powerpc/mm: update radix__pte_update to not do full mm tlb flush
    - powerpc/mm: Batch tlb flush when invalidating pte entries
    - powerpc/sparse: Make a bunch of things static
    - powerpc/perf: factor out the event format field
    - powerpc/perf: update attribute_group data structure
    - powerpc/perf: power9 raw event format en...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.