Neighbour confirmation broken, breaks ARP cache aging

Bug #1715812 reported by Daniel Axtens on 2017-09-08
26
This bug affects 5 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Undecided
Daniel Axtens
Xenial
Undecided
Unassigned
Zesty
Undecided
Unassigned

Bug Description

[SRU Justification]

[Impact]
A host can lose access to another host whose MAC address changes if they have active connections to other hosts that share a route. The ARP cache does not time out as expected - instead the old MAC address is continuously reconfirmed.

[Fix]
Apply series [1], which changes the algorithm for neighbour confirmation.
That is, from upstream:
51ce8bd4d17a net: pending_confirm is not used anymore
0dec879f636f net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
63fca65d0863 net: add confirm_neigh method to dst_ops
c3a2e8370534 tcp: replace dst_confirm with sk_dst_confirm
c86a773c7802 sctp: add dst_pending_confirm flag
4ff0620354f2 net: add dst_pending_confirm flag to skbuff
9b8805a32559 sock: add sk_dst_pending_confirm flag

[Test case]
Create 3 real or virtual systems, all hooked up to a switch.
One system needs an active-backup bond with fail_over_mac=1 num_grat_arp=0.

Put all the systems in the same subnet, e.g. 192.168.200.0/24

Call the system with the bond A, and the other two systems B and C.

On B, run in 3 shells:
 - netperf -t TCP_RR to C
 - ping -f A
 - watch 'ip -s neigh show 192.168.200.0/24'

On A, cause the bond to fail over.

Observe that:

 - without the patches, B intermittently fails to notice the change in A's MAC address. This presents as the ping failing and not recovering, and the arp table showing the old mac address never timing out and never being replace with a new mac address.

 - with the patches, the arp cache times out and B sends another mac probe and detects A's new address.

It helps to use taskset to put ping and netperf on the same CPU, or use single-CPU vms.

See [2] for more details.

[References]
[2] Original report: https://<email address hidden>/msg138762.html
[1]: https://www.spinics.net/lists/linux-rdma/msg45907.html

CVE References

Stefan Bader (smb) on 2017-09-15
Changed in linux (Ubuntu Zesty):
status: New → Fix Committed
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
tags: added: verification-needed-zesty

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Fabio B (illobo) on 2017-09-27
tags: added: verification-done-xenial
removed: verification-needed-xenial
Daniel Axtens (daxtens) wrote :

Verified on Xenial and Zesty.

tags: added: verification-done-zesty
removed: verification-needed-zesty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-37.41

---------------
linux (4.10.0-37.41) zesty; urgency=low

  * CVE-2017-1000255
    - SAUCE: powerpc/64s: Use emergency stack for kernel TM Bad Thing program
      checks
    - SAUCE: powerpc/tm: Fix illegal TM state in signal handler

linux (4.10.0-36.40) zesty; urgency=low

  * linux: 4.10.0-36.40 -proposed tracker (LP: #1718143)

  * Neighbour confirmation broken, breaks ARP cache aging (LP: #1715812)
    - sock: add sk_dst_pending_confirm flag
    - net: add dst_pending_confirm flag to skbuff
    - sctp: add dst_pending_confirm flag
    - tcp: replace dst_confirm with sk_dst_confirm
    - net: add confirm_neigh method to dst_ops
    - net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
    - net: pending_confirm is not used anymore

  * SRIOV: warning if unload VFs (LP: #1715073)
    - PCI: Lock each enable/disable num_vfs operation in sysfs
    - PCI: Disable VF decoding before pcibios_sriov_disable() updates resources

  * Kernel has troule recognizing Corsair Strafe RGB keyboard (LP: #1678477)
    - usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard

  * CVE-2017-14106
    - tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0

  * [CIFS] Fix maximum SMB2 header size (LP: #1713884)
    - CIFS: Fix maximum SMB2 header size

  * Middle button of trackpoint doesn't work (LP: #1715271)
    - Input: trackpoint - assume 3 buttons when buttons detection fails

  * Drop GPL from of_node_to_nid() export to match other arches (LP: #1709179)
    - powerpc: Drop GPL from of_node_to_nid() export to match other arches

  * vhost guest network randomly drops under stress (kvm) (LP: #1711251)
    - Revert "vhost: cache used event for better performance"

  * arm64 arch_timer fixes (LP: #1713821)
    - Revert "UBUNTU: SAUCE: arm64: arch_timer: Enable CNTVCT_EL0 trap if
      workaround is enabled"
    - arm64: arch_timer: Enable CNTVCT_EL0 trap if workaround is enabled
    - clocksource/arm_arch_timer: Fix arch_timer_mem_find_best_frame()
    - clocksource/drivers/arm_arch_timer: Fix read and iounmap of incorrect
      variable
    - clocksource/drivers/arm_arch_timer: Fix mem frame loop initialization
    - clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is
      enabled

  * Touchpad not detected (LP: #1708852)
    - Input: elan_i2c - add ELAN0608 to the ACPI table

 -- Thadeu Lima de Souza Cascardo <email address hidden> Fri, 06 Oct 2017 16:45:48 -0300

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (7.8 KiB)

This bug was fixed in the package linux - 4.4.0-97.120

---------------
linux (4.4.0-97.120) xenial; urgency=low

  * linux: 4.4.0-97.120 -proposed tracker (LP: #1718149)

  * blk-mq: possible deadlock on CPU hot(un)plug (LP: #1670634)
    - [Config] s390x -- disable CONFIG_{DM, SCSI}_MQ_DEFAULT

  * Xenial update to 4.4.87 stable release (LP: #1715678)
    - irqchip: mips-gic: SYNC after enabling GIC region
    - i2c: ismt: Don't duplicate the receive length for block reads
    - i2c: ismt: Return EMSGSIZE for block reads with bogus length
    - ceph: fix readpage from fscache
    - cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
    - cpuset: Fix incorrect memory_pressure control file mapping
    - alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
    - CIFS: remove endian related sparse warning
    - wl1251: add a missing spin_lock_init()
    - xfrm: policy: check policy direction value
    - drm/ttm: Fix accounting error when fail to get pages for pool
    - kvm: arm/arm64: Fix race in resetting stage2 PGD
    - kvm: arm/arm64: Force reading uncached stage2 PGD
    - epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
    - crypto: algif_skcipher - only call put_page on referenced and used pages
    - Linux 4.4.87

  * Xenial update to 4.4.86 stable release (LP: #1715430)
    - scsi: isci: avoid array subscript warning
    - ALSA: au88x0: Fix zero clear of stream->resources
    - btrfs: remove duplicate const specifier
    - i2c: jz4780: drop superfluous init
    - gcov: add support for gcc version >= 6
    - gcov: support GCC 7.1
    - lightnvm: initialize ppa_addr in dev_to_generic_addr()
    - p54: memset(0) whole array
    - lpfc: Fix Device discovery failures during switch reboot test.
    - arm64: mm: abort uaccess retries upon fatal signal
    - x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
    - arm64: fpsimd: Prevent registers leaking across exec
    - scsi: sg: protect accesses to 'reserved' page array
    - scsi: sg: reset 'res_in_use' after unlinking reserved array
    - drm/i915: fix compiler warning in drivers/gpu/drm/i915/intel_uncore.c
    - Linux 4.4.86

  * Xenial update to 4.4.85 stable release (LP: #1714298)
    - af_key: do not use GFP_KERNEL in atomic contexts
    - dccp: purge write queue in dccp_destroy_sock()
    - dccp: defer ccid_hc_tx_delete() at dismantle time
    - ipv4: fix NULL dereference in free_fib_info_rcu()
    - net_sched/sfq: update hierarchical backlog when drop packet
    - ipv4: better IP_MAX_MTU enforcement
    - sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
    - tipc: fix use-after-free
    - ipv6: reset fn->rr_ptr when replacing route
    - ipv6: repair fib6 tree in failure case
    - tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
    - irda: do not leak initialized list.dev to userspace
    - net: sched: fix NULL pointer dereference when action calls some targets
    - net_sched: fix order of queue length updates in qdisc_replace()
    - mei: me: add broxton pci device ids
    - mei: me: add lewisburg device ids
    - Input: trackpoint - add new trackpoint firmware ID
    - Input: elan_i2c...

Read more...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Daniel Axtens (daxtens) on 2017-11-02
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers