dentry_reset_mounted walks entire mount list holding vfsmount write lock

Bug #1226726 reported by Chris J Arges
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Undecided
Unassigned
Precise
Won't Fix
High
Unassigned
Quantal
Fix Released
High
Chris J Arges
Raring
Fix Released
High
Chris J Arges
Saucy
Fix Released
Undecided
Unassigned

Bug Description

SRU Justification:

Impact: When creating thousands of network namespaces the delay in executing commands increases exponentially in kernels before 84d17192.

Fix: In 84d17192 in the upstream kernel, locking code in fs/namespace.c is greatly improved resulting in much better performance when the number namespaces increase.

Testcase: Below, test_ns.sh can be run and a graph can be compared between the existing version and the patched version.

Additional Information: Because this is a change in the vfs layer, I ran the xfstests and compared before and after results of this patch. The patch did not create any additional failures in the generic xfstests.

The quantal and raring solutions differ but are both based on the 84d17192
patch. The quantal solution does a backport of this patch instead of clean
cherry-picks because of the amount of deps required to just use cherry-picks.
The raring solution was able to be done with two clean cherry-picks and that's
why that solution was chosen.

--

Whenever one enters a network namespace via "ip netns exec foobar somecommand" there is a mount done of the appropriate device on /sys since "somecommand" needs to see namespace specific versions of /sys directories. When the ip process exits these mounts need to be torn down, and that requires a global write lock for vfsmount_lock (this is a single writer multiple reader lock). This has serious performance implications when the number of name spaces increase.

The commit 84d17192 addresses this issue, and it is clear by running the attached testcase that it fixes performance issues when dealing with large numbers of namespaces. I've included a graph with the differences in performance between this fix and its parent commit to show the the improve in performance. The x-axis represents the number of namespaces and the y-axis is execution time in ms. After applying the patch the performance delays are not exponentially increasing.

This affects 3.2/3.5/3.8 series kernels, as it was fixed in 3.10.

Revision history for this message
Chris J Arges (arges) wrote :
Revision history for this message
Chris J Arges (arges) wrote :

This script will test performance of 'ip netns exec <ns> /bin/true" when increasing namespaces by 1000.

Changed in linux (Ubuntu Raring):
assignee: nobody → Chris J Arges (arges)
Changed in linux (Ubuntu Quantal):
assignee: nobody → Chris J Arges (arges)
Changed in linux (Ubuntu Precise):
assignee: nobody → Chris J Arges (arges)
importance: Undecided → Medium
Changed in linux (Ubuntu Quantal):
importance: Undecided → Medium
Changed in linux (Ubuntu Raring):
importance: Undecided → Medium
Changed in linux (Ubuntu Saucy):
importance: Medium → Undecided
assignee: Chris J Arges (arges) → nobody
status: New → Fix Released
Changed in linux (Ubuntu Precise):
status: New → In Progress
Changed in linux (Ubuntu Quantal):
status: New → In Progress
Changed in linux (Ubuntu Raring):
status: New → In Progress
Chris J Arges (arges)
Changed in linux (Ubuntu Precise):
importance: Medium → High
Changed in linux (Ubuntu Raring):
importance: Medium → High
Changed in linux (Ubuntu Quantal):
importance: Medium → High
Revision history for this message
Chris J Arges (arges) wrote :

On 3.8 I have been able to solve the problem with two patches:
84d17192d2afd52aeba88c71ae4959a015f56a38
57eccb830f1cc93d4b506ba306d8dfa685e0c88f

Doing more extensive testing now.

Chris J Arges (arges)
description: updated
Chris J Arges (arges)
description: updated
Andy Whitcroft (apw)
Changed in linux (Ubuntu Quantal):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Raring):
status: In Progress → Fix Committed
Chris J Arges (arges)
Changed in linux (Ubuntu Precise):
status: In Progress → Won't Fix
assignee: Chris J Arges (arges) → nobody
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-quantal' to 'verification-done-quantal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-quantal
tags: added: verification-needed-raring
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-raring' to 'verification-done-raring'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Jonathan Davies (jpds) wrote :

Precise 3.2.0-52-virtual large VM:

#ns ms
1 103
1001 138
2001 219
3001 374
4001 606
5001 847
6001 921
7001 1438
8001 1734
9001 2099
10001 2441

Precise 3.5.0-43-generic lts-quantal large VM:

#ns ms
1 23
1001 45
2001 86
3001 93
4001 171
5001 147
6001 258
7001 403
8001 359
9001 538
10001 386

tags: added: verification-done-quantal
removed: verification-needed-quantal
Revision history for this message
Jonathan Davies (jpds) wrote :

Precise 3.8.0-33-generic lts-raring large VM:

#ns ms
1 9
1001 29
2001 34
3001 103
4001 97
5001 212
6001 198
7001 362
8001 333
9001 545
10001 501

tags: added: verification-done-raring
removed: verification-needed-raring
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (19.1 KiB)

This bug was fixed in the package linux - 3.5.0-43.66

---------------
linux (3.5.0-43.66) quantal; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1242895

  [ Timo Aaltonen ]

  * SAUCE: ubuntu/i915: silence unclaimed register poking debug messages
    - LP: #1138787

  [ Upstream Kernel Changes ]

  * Revert "xfs: fix _xfs_buf_find oops on blocks beyond the filesystem
    end"
    - LP: #1236041
    - CVE-2013-1819 fix backport:
  * Revert "sctp: fix call to SCTP_CMD_PROCESS_SACK in
    sctp_cmd_interpreter()"
    - LP: #1241093
  * get rid of full-hash scan on detaching vfsmounts
    - LP: #1226726
  * Smack: Fix the bug smackcipso can't set CIPSO correctly
    - LP: #1236743
  * SAUCE: (no-up) Only let characters through when there are active
    readers.
    - LP: #1208740
  * usb: xhci: define port register names and use them instead of magic
    numbers
    - LP: #1229576
  * usb: xhci: add USB2 Link power management BESL support
    - LP: #1229576
  * iwl4965: fix rfkill set state regression
    - LP: #1241093
  * ath9k_htc: Restore skb headroom when returning skb to mac80211
    - LP: #1241093
  * ALSA: opti9xx: Fix conflicting driver object name
    - LP: #1241093
  * SUNRPC: Fix memory corruption issue on 32-bit highmem systems
    - LP: #1241093
  * drm/i915: ivb: fix edp voltage swing reg val
    - LP: #1241093
  * drm/vmwgfx: Split GMR2_REMAP commands if they are to large
    - LP: #1241093
  * ALSA: ak4xx-adda: info leak in ak4xxx_capture_source_info()
    - LP: #1241093
  * Bluetooth: Add support for Foxconn/Hon Hai [0489:e04d]
    - LP: #1241093
  * [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a
    signal
    - LP: #1241093
  * xen-gnt: prevent adding duplicate gnt callbacks
    - LP: #1241093
  * usb: config->desc.bLength may not exceed amount of data returned by the
    device
    - LP: #1241093
  * USB: cdc-wdm: fix race between interrupt handler and tasklet
    - LP: #1241093
  * xhci-plat: Don't enable legacy PCI interrupts.
    - LP: #1241093
  * ASoC: wm8960: Fix PLL register writes
    - LP: #1241093
  * rculist: list_first_or_null_rcu() should use list_entry_rcu()
    - LP: #1241093
  * USB: mos7720: use GFP_ATOMIC under spinlock
    - LP: #1241093
  * USB: mos7720: fix big-endian control requests
    - LP: #1241093
  * staging: comedi: dt282x: dt282x_ai_insn_read() always fails
    - LP: #1241093
  * usb: ehci-mxc: check for pdata before dereferencing
    - LP: #1241093
  * usb: xhci: Disable runtime PM suspend for quirky controllers
    - LP: #1241093
  * USB: OHCI: Allow runtime PM without system sleep
    - LP: #1241093
  * ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
    - LP: #1241093
  * ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT
    - LP: #1241093
  * USB: fix build error when CONFIG_PM_SLEEP isn't enabled
    - LP: #1241093
  * ALSA: hda - hdmi: Fallback to ALSA allocation when selecting CA
    - LP: #1241093
  * regmap: silence GCC warning
    - LP: #1241093
  * target: Fix trailing ASCII space usage in INQUIRY vendor+model
    - LP: #1241093
  * iwlwifi: dvm: don't send BT_CONFIG on devices w/o Bluetooth
    - LP: #1...

Changed in linux (Ubuntu Quantal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (8.8 KiB)

This bug was fixed in the package linux - 3.8.0-33.48

---------------
linux (3.8.0-33.48) raring; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1242849

  [ Maximiliano Curia ]

  * SAUCE: (no-up) Only let characters through when there are active
    readers.
    - LP: #1208740

  [ Upstream Kernel Changes ]

  * cciss: fix info leak in cciss_ioctl32_passthru()
    - LP: #1188355
    - CVE-2013-2147
  * cpqarray: fix info leak in ida_locked_ioctl()
    - LP: #1188355
    - CVE-2013-2147
  * mount: consolidate permission checks
    - LP: #1226726
  * get rid of full-hash scan on detaching vfsmounts
    - LP: #1226726
  * Smack: Fix the bug smackcipso can't set CIPSO correctly
    - LP: #1236743
  * ipvs: add backup_only flag to avoid loops
    - LP: #1238494
  * tuntap: correctly handle error in tun_set_iff()
    - LP: #1229975
    - CVE-2013-4343
  * htb: fix sign extension bug
    - LP: #1240580
  * net: avoid to hang up on sending due to sysctl configuration overflow.
    - LP: #1240580
  * net: check net.core.somaxconn sysctl values
    - LP: #1240580
  * macvlan: validate flags
    - LP: #1240580
  * neighbour: populate neigh_parms on alloc before calling ndo_neigh_setup
    - LP: #1240580
  * bonding: modify only neigh_parms owned by us
    - LP: #1240580
  * fib_trie: remove potential out of bound access
    - LP: #1240580
  * bridge: don't try to update timers in case of broken MLD queries
    - LP: #1240580
  * tcp: cubic: fix overflow error in bictcp_update()
    - LP: #1240580
  * tcp: cubic: fix bug in bictcp_acked()
    - LP: #1240580
  * ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not
    match
    - LP: #1240580
  * 8139cp: Fix skb leak in rx_status_loop failure path.
    - LP: #1240580
  * tun: signedness bug in tun_get_user()
    - LP: #1240580
  * ipv6: remove max_addresses check from ipv6_create_tempaddr
    - LP: #1240580
  * ipv6: Store Router Alert option in IP6CB directly.
    - LP: #1240580
  * ipv6: drop packets with multiple fragmentation headers
    - LP: #1240580
  * tcp: set timestamps for restored skb-s
    - LP: #1240580
  * net: usb: Add HP hs2434 device to ZLP exception table
    - LP: #1240580
  * tcp: initialize rcv_tstamp for restored sockets
    - LP: #1240580
  * ipv4: sendto/hdrincl: don't use destination address found in header
    - LP: #1240580
  * tcp: tcp_make_synack() should use sock_wmalloc
    - LP: #1240580
  * tipc: set sk_err correctly when connection fails
    - LP: #1240580
  * net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for
    max_delay
    - LP: #1240580
  * ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO
    - LP: #1240580
  * tg3: Don't turn off led on 5719 serdes port 0
    - LP: #1240580
  * vhost_net: poll vhost queue after marking DMA is done
    - LP: #1240580
  * net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcv
    - LP: #1240580
  * drm/radeon/si: Add support for CP DMA to CS checker for compute v2
    - LP: #1240580
  * sfc: Fix efx_rx_buf_offset() for recycled pages
    - LP: #1240580
  * cfq: explicitly use 64bit divide operation for 64bit arguments
    - LP: #1240580
  * drm/radeon/atom: wor...

Read more...

Changed in linux (Ubuntu Raring):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.