Cannot create ipvlans with > 1500 MTU on recent Bionic kernels

Bug #1879658 reported by Nivedita Singhvi on 2020-05-20
This bug affects 2 people
Affects          Status         Importance   Assigned to
linux (Ubuntu)   Invalid        Critical     Unassigned
Bionic           Fix Released   Critical     Nivedita Singhvi

Bug Description

[IMPACT]

Setting an MTU larger than the default 1500 results in an
error on recent (4.15.0-92+) Bionic/Xenial-hwe kernels
when attempting to create ipvlan interfaces:

# ip link add test0 mtu 9000 link eno1 type ipvlan mode l2
RTNETLINK answers: Invalid argument

This breaks Docker and other applications which use a Jumbo
MTU (9000) when using ipvlans.

The bug is caused by the following recent commit to Bionic
and Xenial-hwe, pulled in via the stable patchset below. The
commit enforces strict min/max MTU bounds when an MTU is set
via rtnetlink, which includes ipvlan creation:

Breaking commit:
-------------------
Ubuntu-hwe-4.15.0-92.93~16.04.1
* Bionic update: upstream stable patchset 2020-02-21 (LP: #1864261)
  * net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link()

The above patch checks the requested MTU against dev->min_mtu and
dev->max_mtu to prevent a malicious user from crashing the kernel
with a bad value. It amended the original patchset that centralized
min/max MTU checking, previously spread across different networking
subsystems. However, in that patchset the ipvlan max_mtu was not
raised to the largest physical (64K) or jumbo (9000 bytes) size,
so it defaults to 1500. The recent commit above, which enforces
strict bounds checking on MTU size, exposes the latent bug of
max_mtu not being set correctly in the ipvlan driver (the same
problem was previously fixed in the bonding and teaming drivers).
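
To illustrate why the request fails, here is a simplified Python model
of the bounds check described above. It is a sketch, not the kernel
source: validate_mtu is a hypothetical name, and the model assumes the
ipvlan device keeps the Ethernet default max_mtu of 1500 until the fix
raises it to ETH_MAX_MTU.

```python
# Simplified model of the min/max MTU check; not the kernel source.
ETH_MIN_MTU = 68       # smallest MTU an Ethernet-style device accepts
ETH_DATA_LEN = 1500    # default max_mtu when the driver sets nothing larger
ETH_MAX_MTU = 0xFFFF   # 65535, the value named in the upstream ipvlan fix
EINVAL = 22            # surfaces as "RTNETLINK answers: Invalid argument"


def validate_mtu(mtu, min_mtu, max_mtu):
    """Return 0 if mtu lies within [min_mtu, max_mtu], else -EINVAL.

    A max_mtu of 0 is treated as "no upper limit" (assumption of this model).
    """
    if mtu < min_mtu:
        return -EINVAL
    if max_mtu > 0 and mtu > max_mtu:
        return -EINVAL
    return 0


# Without the fix: max_mtu is still 1500, so a jumbo MTU is rejected.
before_fix = validate_mtu(9000, ETH_MIN_MTU, ETH_DATA_LEN)

# With the fix: max_mtu is ETH_MAX_MTU, so the same request is accepted.
after_fix = validate_mtu(9000, ETH_MIN_MTU, ETH_MAX_MTU)
```

In this model the check itself is correct. The rejection comes from the
upper bound the driver leaves in place, which is why the fix changes
only the ipvlan max_mtu and not the validation.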

Fix:
-------
This was fixed for ipvlans in the upstream kernel as of v4.18-rc2,
but the fix was not backported to Bionic along with the other
patches. The missing commit in the Bionic backport:

ipvlan: use ETH_MAX_MTU as max mtu
commit 548feb33c598dfaf9f8e066b842441ac49b84a8a

[Test Case]

1. Install any kernel earlier than 4.15.0-92 (Bionic/Xenial-hwe)

2. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2
   (where test1 is the new ipvlan and eno1 is the physical
    interface you are adding the ipvlan on)

3. # ip link
...
14: test1@eno1: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000
...
  // check that your test1 ipvlan is created with mtu 9000

4. Install 4.15.0-92 kernel or later

5. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2
RTNETLINK answers: Invalid argument

6. With the above fix commit backported to Xenial-hwe/Bionic,
jumbo MTU ipvlan creation works again, as it did before 4.15.0-92.
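
The MTU check in step 3 can be scripted. The following is a minimal
sketch that reads the MTU of one interface from `ip link` text output.
The helper name mtu_from_ip_link is hypothetical, and the parsing
assumes the one-line-per-interface format shown in step 3.

```python
import re


def mtu_from_ip_link(output, ifname):
    """Return the MTU of ifname from `ip link` text output, or None."""
    # Matches e.g. "14: test1@eno1: <...> mtu 9000 ..." or "2: eno1: <...> mtu 1500 ..."
    pattern = re.compile(
        r"^\d+:\s+" + re.escape(ifname) + r"(?:@\S+)?:\s.*?\bmtu\s+(\d+)",
        re.MULTILINE,
    )
    match = pattern.search(output)
    return int(match.group(1)) if match else None


# Sample line copied from step 3 of the test case.
sample = (
    "14: test1@eno1: <BROADCAST,MULTICAST> mtu 9000 qdisc noop "
    "state DOWN mode DEFAULT group default qlen 1000"
)
test1_mtu = mtu_from_ip_link(sample, "test1")
```

On a live system the output string could come from
subprocess.run(["ip", "link", "show"], capture_output=True, text=True).stdout.
A result of None means the interface was not created, which is the
failure seen in step 5.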

[Regression Potential]

This commit is in upstream mainline as of v4.18-rc2, and hence
is already in Cosmic and later, i.e. all post Bionic releases
currently. Hence there's low regression potential here.

It only impacts ipvlan functionality, not other networking
subsystems, so core systems should not be affected. It takes
effect at interface setup, so creation either works or it doesn't.
The patch is trivial.

It only impacts Bionic/Xenial-hwe 4.15.0-92 onwards versions
(where the latent bug got exposed).

Changed in linux (Ubuntu):
importance: Undecided → Critical
Changed in linux (Ubuntu Bionic):
importance: Undecided → Critical
description: updated
description: updated

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1879658

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
tags: added: bionic
Changed in linux (Ubuntu Bionic):
status: Incomplete → In Progress
assignee: nobody → Nivedita Singhvi (niveditasinghvi)

SRU request has been submitted.

If anyone would like to test, there are test images up on:
https://people.canonical.com/~nivedita/ipvlan-test-fix-278887/

You can 'wget' the files and then 'dpkg -i' the modules,
linux-image, modules-extra debs in that order, and reboot.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
status: Confirmed → In Progress
status: In Progress → Invalid

The test kernel has been tested successfully so far by the
original reporter, and it has fixed the Docker breakage.

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

Tested.

tags: added: verification-done-bionic
removed: verification-needed-bionic

Packages tested

linux-gcp (4.15.0-1078.88~16.04.1) xenial;
linux-hwe (4.15.0-107.108~16.04.1) xenial;
linux-gcp-4.15 (4.15.0-1078.88) bionic;
linux (4.15.0-107.108) bionic;

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-109.110

---------------
linux (4.15.0-109.110) bionic; urgency=medium

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
    - update dkms package versions

  * Build and ship a signed wireguard.ko (LP: #1861284)
    - [Packaging] wireguard -- add support for building signed .ko

  * CVE-2019-16089
    - SAUCE: nbd_genl_status: null check for nla_nest_start

  * CVE-2019-19642
    - kernel/relay.c: handle alloc_percpu returning NULL in relay_open

  * CVE-2019-12380
    - efi/x86/Add missing error handling to old_memmap 1:1 mapping code

  * CVE-2019-19039 // CVE-2019-19377
    - btrfs: sink flush_fn to extent_write_cache_pages
    - btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
    - btrfs: Don't submit any btree write bio if the fs has errors

  * CVE-2019-19036
    - btrfs: volumes: Use more straightforward way to calculate map length
    - btrfs: tree-checker: Try to detect missing INODE_ITEM
    - Btrfs: tree-checker: detect file extent items with overlapping ranges
    - Btrfs: make tree checker detect checksum items with overlapping ranges
    - btrfs: harden agaist duplicate fsid on scanned devices
    - Btrfs: fix missing data checksums after replaying a log tree
    - btrfs: reloc: fix reloc root leak and NULL pointer dereference
    - btrfs: Validate child tree block's level and first key
    - btrfs: Detect unbalanced tree with empty leaf before crashing btree
      operations

  * CVE-2019-19318
    - btrfs: tree-checker: Replace root parameter with fs_info
    - btrfs: tree-checker: Check level for leaves and nodes
    - btrfs: tree-checker: get fs_info from eb in generic_err
    - btrfs: tree-checker: get fs_info from eb in file_extent_err
    - btrfs: tree-checker: get fs_info from eb in check_csum_item
    - btrfs: tree-checker: get fs_info from eb in dir_item_err
    - btrfs: tree-checker: get fs_info from eb in check_dir_item
    - btrfs: tree-checker: get fs_info from eb in block_group_err
    - btrfs: tree-checker: get fs_info from eb in check_block_group_item
    - btrfs: tree-checker: get fs_info from eb in check_extent_data_item
    - btrfs: tree-checker: get fs_info from eb in check_leaf_item
    - btrfs: tree-checker: get fs_info from eb in check_leaf
    - btrfs: tree-checker: get fs_info from eb in chunk_err
    - btrfs: tree-checker: get fs_info from eb in dev_item_err
    - btrfs: tree-checker: get fs_info from eb in check_dev_item
    - btrfs: tree-checker: get fs_info from eb in check_inode_item
    - btrfs: tree-checker: Add ROOT_ITEM check
    - btrfs: tree-checker: Add EXTENT_ITEM and METADATA_ITEM check
    - btrfs: tree-checker: Add simple keyed refs check
    - btrfs: tree-checker: Add EXTENT_DATA_REF check
    - btrfs: tree-checker: Fix wrong check on max devid
    - Btrfs: fix selftests failure due to uninitialized i_mode in test inodes

  * CVE-2019-19813 // CVE-2019-19816
    - btrfs: Refactor parameter of BTRFS_MAX_DEVS() from root to fs_info
    - btrfs: Move btrfs_check_chunk_valid() to tree-check.[ch] and export it
    - btrfs: tree-checker: Make chunk item checker messages more readable
    - btrfs...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released

Some of the 4.15 kernels fixed:

Bionic linux kernel: 4.15.0-109.110
Bionic linux-aws kernel: 4.15.0-1077.81
Xenial linux-hwe kernel: 4.15.0-107.108~16.04.1
Xenial linux-gcp kernel: 4.15.0-1078.88~16.04.1
