Starting an LXC container can lead to a kernel oops

Bug #1115786 reported by Chuck Short
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Chuck Short

Bug Description

If you are using a qcow image with an LXC container, starting the container will lead to a kernel oops (attached below). This happens because the nbd device is disconnected while the LXC container is still running.

Feb 4 15:10:58 homer kernel: [ 2483.980180] Kernel BUG at ffffffff811c50b4 [verbose debug info unavailable]
Feb 4 15:10:58 homer kernel: [ 2483.982652] Modules linked in: ebt_arp(F) ebt_ip(F) veth(F) xt_conntrack(F) xt_nat(F) openvswitch ip6table_filter(F) ip6_tables(F) ebtable_nat(F) ebtables(F) xt_state(F) ipt_REJECT(F) xt_CHECKSUM(F) iptable_mangle(F) xt_tcpudp(F) iptable_filter(F) ipt_MASQUERADE(F) iptable_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack(F) ip_tables(F) x_tables(F) bridge(F) stp(F) llc(F) ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp(F) libiscsi_tcp(F) libiscsi(F) scsi_transport_iscsi(F) nfsd(F) auth_rpcgss(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) dm_crypt(F) arc4(F) rt2800pci rt2800lib snd_hda_codec_hdmi rt2x00pci snd_hda_codec_realtek rt2x00lib snd_hda_intel mac80211 snd_hda_codec kvm_amd snd_hwdep(F) cfg80211 kvm snd_pcm(F) snd_page_alloc(F) eeprom_93cx6<4>[ 2484.001668] CPU 0
Feb 4 15:10:58 homer kernel: [ 2484.007849] RIP: 0010:[<ffffffff811c50b4>] [<ffffffff811c50b4>] submit_bh+0x154/0x1e0
Feb 4 15:10:58 homer kernel: [ 2484.014575] RAX: 0000000000040005 RBX: ffff88006cdbdb60 RCX: 0000000000000019
Feb 4 15:10:58 homer kernel: [ 2484.021599] RBP: ffff8800cf925aa0 R08: 000000000a000020 R09: 0000000000000000
Feb 4 15:10:58 homer kernel: [ 2484.028831] R13: ffff880067b5a400 R14: 0000000000005bc8 R15: ffff8801dfb9ec00
Feb 4 15:10:58 homer kernel: [ 2484.036334] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 4 15:10:58 homer kernel: [ 2484.044179] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 4 15:10:58 homer kernel: [ 2484.052237] Process init (pid: 10006, threadinfo ffff8800cf924000, task ffff8801df86dd00)
Feb 4 15:10:58 homer kernel: [ 2484.060569] ffff88006cdbdb60 0000000000000411 ffff880067b5a400 0000000000005bc8
Feb 4 15:10:58 homer kernel: [ 2484.069418] ffff8800cf925ad0 ffffffff811c6263 ffff8800cf925b18 ffffffff812526e8
Feb 4 15:10:58 homer kernel: [ 2484.078454] [<ffffffff811c61d7>] __sync_dirty_buffer+0x57/0xd0
Feb 4 15:10:58 homer kernel: [ 2484.087813] [<ffffffff812526e8>] ext4_commit_super+0x198/0x230
Feb 4 15:10:58 homer kernel: [ 2484.097368] [<ffffffff81253914>] ext4_error_inode+0x64/0x120
Feb 4 15:10:58 homer kernel: [ 2484.107094] [<ffffffff811ca0f7>] ? bio_put+0x97/0xc0
Feb 4 15:10:58 homer kernel: [ 2484.116986] [<ffffffff8123bf53>] ext4_find_entry+0x323/0x4e0
Feb 4 15:10:58 homer kernel: [ 2484.127088] [<ffffffff8123c162>] ext4_lookup+0x52/0x170
Feb 4 15:10:58 homer kernel: [ 2484.137356] [<ffffffff8119d278>] __lookup_hash+0x38/0x50
Feb 4 15:10:58 homer kernel: [ 2484.147795] [<ffffffff811a1c09>] path_lookupat+0x6b9/0x740
Feb 4 15:10:58 homer kernel: [ 2484.158434] [<ffffffff8117c531>] ? kmem_cache_alloc+0x31/0x130
Feb 4 15:10:58 homer kernel: [ 2484.169290] [<ffffffff811a1cc4>] filename_lookup+0x34/0xc0
Feb 4 15:10:58 homer kernel: [ 2484.180358] [<ffffffff813071ba>] ? apparmor_cred_prepare+0x3a/0x60
Feb 4 15:10:58 homer kernel: [ 2484.191661] [<ffffffff811922ec>] sys_faccessat+0x9c/0x1e0
Feb 4 15:10:58 homer kernel: [ 2484.203168] [<ffffffff81192448>] sys_access+0x18/0x20
Feb 4 15:10:58 homer kernel: [ 2484.214875] Code: e4 a1 44 89 e0 41 5c 41 5d 41 5e 5d c3 66 2e 0f 1f 84 00 00 00 00 00 40 f6 c7 01 0f 84 14 ff ff ff f0 80 66 01 f7 e9 0a ff ff ff <0f> 0b 48 8b 53 70 c1 e0 09 41 f6 c5 01 89 43 30 89 42 08 75 9a
Feb 4 15:10:58 homer kernel: [ 2484.234017] RSP <ffff8800cf925a80>
Feb 4 15:10:58 homer kernel: [ 2484.480951] block nbd3: Attempted send on closed socket
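
To check whether another host is hitting the same failure, the two telltale lines in the log above (the "Kernel BUG at" line and the nbd "Attempted send on closed socket" line) can be searched for mechanically. The following is only a minimal sketch: the function name scan_kernel_log and the regular expressions are illustrative assumptions derived from this one log excerpt, not part of nova or any kernel tooling.

```python
import re

# Patterns derived from the log excerpt in this report (assumed, not exhaustive).
KERNEL_BUG = re.compile(r"Kernel BUG at (?P<addr>[0-9a-f]+)")
NBD_CLOSED = re.compile(r"block (?P<dev>nbd\d+): Attempted send on closed socket")


def scan_kernel_log(lines):
    """Return (bug_addresses, nbd_devices) found in an iterable of log lines."""
    bug_addresses = []
    nbd_devices = []
    for line in lines:
        bug = KERNEL_BUG.search(line)
        if bug:
            bug_addresses.append(bug.group("addr"))
        nbd = NBD_CLOSED.search(line)
        if nbd:
            nbd_devices.append(nbd.group("dev"))
    return bug_addresses, nbd_devices


# Two lines copied from the log in this report.
sample = [
    "Feb 4 15:10:58 homer kernel: [ 2483.980180] Kernel BUG at ffffffff811c50b4 [verbose debug info unavailable]",
    "Feb 4 15:10:58 homer kernel: [ 2484.480951] block nbd3: Attempted send on closed socket",
]
bugs, devices = scan_kernel_log(sample)
```

Seeing both an oops address and an nbd device in the same time window is consistent with the disconnect described above, though it does not prove the cause on its own.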

Tags: lxc
Chuck Short (zulcss)
tags: added: lxc
Thierry Carrez (ttx) wrote :

I guess we can work around it in userland, but I'd think that the kernel should not oops anyway, so there might be a kernel issue to fix as well.

Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/21217

Changed in nova:
assignee: nobody → Chuck Short (zulcss)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/21217
Committed: http://github.com/openstack/nova/commit/00f66513690153125713fa4c48907387d8321b4a
Submitter: Jenkins
Branch: master

commit 00f66513690153125713fa4c48907387d8321b4a
Author: Chuck Short <email address hidden>
Date: Tue Feb 5 08:54:01 2013 -0600

    lxc: Clean up namespace mounts

    5f697f64e5c445ba1b62c82d9167fd6b9c7256d2 introduced a regression
    when using lxc containers with qcow2 and qemu-nbd.

    When removing the rootfs from the host namespace, it would
    terminate the qemu-nbd process as well. This would cause
    the kernel to oops under Ubuntu 13.04, since the underlying
    device that the libvirt_lxc process uses disappears.

    To get around this we just clean up the host namespace if the
    instance is powered on. When the instance terminates it will
    tear down the whole container.

    This fixes LP: #1115786

    Change-Id: I98bec2338cb455dbd277295ab36767149e05634c
    Signed-off-by: Chuck Short <email address hidden>
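
The ordering described in the commit message can be pictured with a small toy model. This is only a sketch under stated assumptions: FakeHost, its methods, and after_spawn are hypothetical names invented for illustration and are not the nova code touched by this commit. The model tracks two pieces of state, the host-side mount of the rootfs and the nbd connection, and shows that cleaning up the host namespace leaves the nbd connection in place while a full teardown removes both.

```python
class FakeHost:
    """Toy stand-in for a compute host (hypothetical, for illustration only)."""

    def __init__(self):
        self.host_mounts = set()
        self.nbd_connected = False

    def setup_container(self, rootfs):
        # Connect the qcow2 image through nbd, then mount it on the host.
        self.nbd_connected = True
        self.host_mounts.add(rootfs)

    def clean_host_namespace(self, rootfs):
        # Drop only the host-side mount; the nbd connection stays up
        # because the running container still reads from the device.
        self.host_mounts.discard(rootfs)

    def teardown_container(self, rootfs):
        # Full teardown: unmount and disconnect nbd.
        self.host_mounts.discard(rootfs)
        self.nbd_connected = False


def after_spawn(host, rootfs, powered_on):
    """Pick the cleanup path based on whether the instance is running."""
    if powered_on:
        host.clean_host_namespace(rootfs)
    else:
        host.teardown_container(rootfs)


host = FakeHost()
host.setup_container("/rootfs")
after_spawn(host, "/rootfs", powered_on=True)
still_connected_while_running = host.nbd_connected

# Later, when the instance terminates, everything is torn down.
host.teardown_container("/rootfs")
connected_after_terminate = host.nbd_connected
```

In this model the regression corresponds to calling teardown_container while the instance is still powered on, which is the situation the bug description identifies as the trigger for the oops.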

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-3 → 2013.1