cgroup: removing a cgroup directory causes a kernel crash in kill_css

Bug #1748342 reported by haibinzhang
Affects: linux (Ubuntu)

Bug Description

We received feedback from a customer that a CVM (cloud virtual machine) crashed while kubelet was updating a container service on Ubuntu Xenial. The panic log is shown below.
We found a patch in Linux mainline (commit 33c35aa4817864e056fd772230b0c6b552e36ea2) that fixes this bug, but it has not yet been merged into ubuntu-xenial.git.

Do you have plans to merge it?

----------------------panic log-----------------------------
[2018-02-02 10:21:48][4397731.721563] BUG: unable to handle kernel paging request at 000000010000005c
[2018-02-02 10:40:50][4397731.722666] IP: css_clear_dir+0x5/0x70
[2018-02-02 10:40:50][4397731.723261] PGD a12b067
[2018-02-02 10:40:50][4397731.723261] PUD 0
[2018-02-02 10:40:50][4397731.723628]
[2018-02-02 10:40:50][4397731.724004] Oops: 0000 [#1] SMP
[2018-02-02 10:40:50][4397731.724004] Modules linked in: xt_statistic nf_conntrack_netlink ebt_ip ebtable_filter ebtables veth xt_set ip_set_hash_net ip_set nfnetlink xt_nat xt_recent xt_mark ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs ppdev sb_edac edac_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev input_leds serio_raw parport_pc parport i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath
[2018-02-02 10:40:50][4397731.724004] linear cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt aesni_intel fb_sys_fops aes_x86_64 crypto_simd cryptd glue_helper psmouse virtio_blk virtio_net drm pata_acpi floppy
[2018-02-02 10:40:50][4397731.724004] CPU: 0 PID: 23347 Comm: kubelet Not tainted 4.10.0-32-generic #36~16.04.1-Ubuntu
[2018-02-02 10:40:50][4397731.724004] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[2018-02-02 10:40:50][4397731.724004] task: ffff92abde590000 task.stack: ffffbaa94165c000
[2018-02-02 10:40:50][4397731.724004] RIP: 0010:css_clear_dir+0x5/0x70
[2018-02-02 10:40:50][4397731.724004] RSP: 0018:ffffbaa94165fe10 EFLAGS: 00010206
[2018-02-02 10:40:50][4397731.724004] RAX: 000047fd40005d7b RBX: 00000000ffffffe8 RCX: ffff92abffc0fcec
[2018-02-02 10:40:50][4397731.724004] RDX: ffffffff9b070800 RSI: 0000000000000206 RDI: 00000000ffffffe8
[2018-02-02 10:40:50][4397731.724004] RBP: ffffbaa94165fe20 R08: 00000000c8b18701 R09: 0000000180220017
[2018-02-02 10:40:50][4397731.724004] R10: ffff92abc8b187f8 R11: ffff92abf7751d00 R12: ffff92abd5601000
[2018-02-02 10:40:50][4397731.724004] R13: 0000000000000000 R14: ffff92abd5601150 R15: 0000000000000000
[2018-02-02 10:40:50][4397731.724004] FS: 00007f6f92ffd700(0000) GS:ffff92abffc00000(0000) knlGS:0000000000000000
[2018-02-02 10:40:50][4397731.724004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2018-02-02 10:40:50][4397731.724004] CR2: 000000010000005c CR3: 00000000280cb000 CR4: 00000000000406f0
[2018-02-02 10:40:50][4397731.724004] Call Trace:
[2018-02-02 10:40:50][4397731.724004] ? kill_css+0x12/0x60
[2018-02-02 10:40:50][4397731.724004] cgroup_destroy_locked+0xa5/0xf0
[2018-02-02 10:40:50][4397731.724004] cgroup_rmdir+0x2c/0x90
[2018-02-02 10:40:50][4397731.724004] kernfs_iop_rmdir+0x4d/0x80
[2018-02-02 10:40:50][4397731.724004] vfs_rmdir+0xb4/0x130
[2018-02-02 10:40:50][4397731.724004] do_rmdir+0x1c7/0x1e0
[2018-02-02 10:40:50][4397731.724004] SyS_unlinkat+0x22/0x30
[2018-02-02 10:40:50][4397731.724004] entry_SYSCALL_64_fastpath+0x1e/0xad
[2018-02-02 10:40:50][4397731.724004] RIP: 0033:0x481bd4
[2018-02-02 10:40:50][4397731.724004] RSP: 002b:000000c422893af0 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[2018-02-02 10:40:50][4397731.724004] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000481bd4
[2018-02-02 10:40:50][4397731.724004] RDX: 0000000000000200 RSI: 000000c421c7ef00 RDI: ffffffffffffff9c
[2018-02-02 10:40:50][4397731.724004] RBP: 000000c422893bc0 R08: 0000000000000000 R09: 0000000000000000
[2018-02-02 10:40:50][4397731.724004] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000000ce
[2018-02-02 10:40:50][4397731.724004] R13: 00000000ffffffee R14: 0000000000001740 R15: 0000000000000055
[2018-02-02 10:40:50][4397731.724004] Code: fd ff ff 85 c0 41 89 c6 0f 84 5b fd ff ff eb 83 4d 89 fc e9 0f ff ff ff e8 d9 37 f6 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 74 a8 08 74 5d 55 83 e0 f7 48 89 e5 41 55 41 54 53 89 47
[2018-02-02 10:40:50][4397731.724004] RIP: css_clear_dir+0x5/0x70 RSP: ffffbaa94165fe10
[2018-02-02 10:40:50][4397731.724004] CR2: 000000010000005c

----------------------patch in linux.git----------------------------
commit 33c35aa4817864e056fd772230b0c6b552e36ea2
Author: Waiman Long <email address hidden>
Date: Mon May 15 09:34:06 2017 -0400

    cgroup: Prevent kill_css() from being called more than once

    The kill_css() function may be called more than once under the condition
    that the css was killed but not physically removed yet followed by the
    removal of the cgroup that is hosting the css. This patch prevents any
    harm from being done when that happens.

    Signed-off-by: Waiman Long <email address hidden>
    Signed-off-by: Tejun Heo <email address hidden>
    Cc: <email address hidden> # v4.5+

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c3c9a0e1b3c9..8d4e85eae42c 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4265,6 +4265,11 @@ static void kill_css(struct cgroup_subsys_state *css)
 {
 	lockdep_assert_held(&cgroup_mutex);
 
+	if (css->flags & CSS_DYING)
+		return;
+
+	css->flags |= CSS_DYING;
+
 	/*
 	 * This must happen before css is disassociated with its cgroup.
 	 * See seq_css() for details.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1748342

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: zesty
Daniel Axtens (daxtens) wrote:


I'm happy to submit this patch to the kernel team, but I wanted to talk about the kernel process and ask a question first.

The way this process usually works is:
 - patch submitted to kernel team
 - kernel team checks patch and if they are happy with it, applies it to the kernel
 - this is built into a "proposed" kernel.
 - the bug is updated with the proposed kernel.
 - someone (usually the bug reporter) must verify that the proposed kernel fixes the bug. There is usually a window of five working days to do this.
 - if verification is done, the next released kernel contains the fix. If it is not done, the patch is not included in the released kernel.

I am not able to do the verification. If the kernel team provides a proposed kernel, are you or your customer able to verify it?


Changed in linux (Ubuntu):
importance: Undecided → High
status: Incomplete → Triaged
Changed in linux (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → High