Kernel traces with skb_warn_bad_offload showing up during an AIO deployment on Ubuntu 14.04

Bug #1488815 reported by Cristian Calin
This bug affects 1 person
Affects             Status         Importance   Assigned to      Milestone
OpenStack-Ansible   Fix Released   Low          Evan Callicoat
Juno                Won't Fix      Low          Unassigned
Kilo                Fix Released   Low          Evan Callicoat
Trunk               Fix Released   Low          Evan Callicoat
linux (Ubuntu)      Incomplete     Undecided    Unassigned
qemu-kvm (Ubuntu)   New            Undecided    Unassigned

Bug Description

Setting up an os-ansible-deployment AIO in a VM running in KVM results in kernel traces with skb_warn_bad_offload during the setup phase.

This is happening regardless of the kernel (tested with 3.13, 3.16 and 3.19) on a fresh VM with no specific customization.

Order of steps to reproduce:

1) launch image from ubuntu cloud catalog
2) apt-get dist-upgrade
3) apt-get install git linux-image-virtual-lts-vivid
4) git clone https://github.com/stackforge/os-ansible-deployment.git
5) cd os-ansible-deployment
6) git checkout kilo
7) ./scripts/bootstrap-ansible.sh
8) ./scripts/run-aio-build.sh

This appears to be harmless as the deployment eventually succeeds.

Here is a brief extract of the traces:

Aug 26 07:57:27 test-osad kernel: [ 4695.687317] ------------[ cut here ]------------
Aug 26 07:57:27 test-osad kernel: [ 4695.687328] WARNING: CPU: 0 PID: 20433 at /build/linux-lts-vivid-BZwsXG/linux-lts-vivid-3.19.0/net/core/dev.c:2302 skb_warn_bad_offload+0xd5/0xe2()
Aug 26 07:57:27 test-osad kernel: [ 4695.687332] : caps=(0x000000801fdb78e9, 0x000000801fdb78e9) len=4396 data_len=2920 gso_size=1448 gso_type=1 ip_summed=3
Aug 26 07:57:27 test-osad kernel: [ 4695.687334] Modules linked in: vhost_net vhost macvtap macvlan nbd iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 ip6table_filter ip6_tables dm_snapshot dm_bufio dm_multipath scsi_dh 8021q garp mrp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_CHECKSUM xt_tcpudp bridge stp llc btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c vxlan ip6_udp_tunnel udp_tunnel iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables dm_crypt kvm_amd ppdev kvm parport_pc parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy
Aug 26 07:57:27 test-osad kernel: [ 4695.687390] CPU: 0 PID: 20433 Comm: ssh Tainted: G W 3.19.0-26-generic #28~14.04.1-Ubuntu
Aug 26 07:57:27 test-osad kernel: [ 4695.687392] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
Aug 26 07:57:27 test-osad kernel: [ 4695.687395] ffffffff81b3db38 ffff880266e576f8 ffffffff817aeed7 0000000000000000
Aug 26 07:57:27 test-osad kernel: [ 4695.687399] ffff880266e57748 ffff880266e57738 ffffffff81074d8a ffff880266e57788
Aug 26 07:57:27 test-osad kernel: [ 4695.687402] ffff8803736ef4e8 ffff88042caba000 0000000000000001 0000000000000003
Aug 26 07:57:27 test-osad kernel: [ 4695.687406] Call Trace:
Aug 26 07:57:27 test-osad kernel: [ 4695.687414] [<ffffffff817aeed7>] dump_stack+0x45/0x57
Aug 26 07:57:27 test-osad kernel: [ 4695.687420] [<ffffffff81074d8a>] warn_slowpath_common+0x8a/0xc0
Aug 26 07:57:27 test-osad kernel: [ 4695.687424] [<ffffffff81074e06>] warn_slowpath_fmt+0x46/0x50
Aug 26 07:57:27 test-osad kernel: [ 4695.687427] [<ffffffff817b185a>] skb_warn_bad_offload+0xd5/0xe2
Aug 26 07:57:27 test-osad kernel: [ 4695.687434] [<ffffffff816a669c>] skb_checksum_help+0x1ac/0x1c0
Aug 26 07:57:27 test-osad kernel: [ 4695.687441] [<ffffffffc05ea079>] checksum_tg+0x29/0x30 [xt_CHECKSUM]
Aug 26 07:57:27 test-osad kernel: [ 4695.687446] [<ffffffffc0476009>] ipt_do_table+0x2d9/0x6bd [ip_tables]
Aug 26 07:57:27 test-osad kernel: [ 4695.687451] [<ffffffffc0476062>] ? ipt_do_table+0x332/0x6bd [ip_tables]
Aug 26 07:57:27 test-osad kernel: [ 4695.687455] [<ffffffffc0476062>] ? ipt_do_table+0x332/0x6bd [ip_tables]
Aug 26 07:57:27 test-osad kernel: [ 4695.687461] [<ffffffff816e7cf0>] ? ip_fragment+0x8a0/0x8a0
Aug 26 07:57:27 test-osad kernel: [ 4695.687464] [<ffffffffc0484066>] iptable_mangle_hook+0x66/0x140 [iptable_mangle]
Aug 26 07:57:27 test-osad kernel: [ 4695.687467] [<ffffffff816e7cf0>] ? ip_fragment+0x8a0/0x8a0
Aug 26 07:57:27 test-osad kernel: [ 4695.687470] [<ffffffff816dcb1a>] nf_iterate+0x9a/0xb0
Aug 26 07:57:27 test-osad kernel: [ 4695.687472] [<ffffffff816e7cf0>] ? ip_fragment+0x8a0/0x8a0
Aug 26 07:57:27 test-osad kernel: [ 4695.687475] [<ffffffff816dcba4>] nf_hook_slow+0x74/0x130
Aug 26 07:57:27 test-osad kernel: [ 4695.687477] [<ffffffff816e7cf0>] ? ip_fragment+0x8a0/0x8a0
Aug 26 07:57:27 test-osad kernel: [ 4695.687479] [<ffffffff816e6bc0>] ? ip_forward_options+0x1c0/0x1c0
Aug 26 07:57:27 test-osad kernel: [ 4695.687481] [<ffffffff816e9952>] ip_output+0x92/0xa0
Aug 26 07:57:27 test-osad kernel: [ 4695.687484] [<ffffffff816e909a>] ? __ip_local_out+0xaa/0xb0
Aug 26 07:57:27 test-osad kernel: [ 4695.687486] [<ffffffff816e90d0>] ip_local_out_sk+0x30/0x40
Aug 26 07:57:27 test-osad kernel: [ 4695.687488] [<ffffffff816e9449>] ip_queue_xmit+0x149/0x3d0
Aug 26 07:57:27 test-osad kernel: [ 4695.687491] [<ffffffff81700d9a>] tcp_transmit_skb+0x4aa/0x950
Aug 26 07:57:27 test-osad kernel: [ 4695.687494] [<ffffffff817013c2>] tcp_write_xmit+0x182/0xd10
Aug 26 07:57:27 test-osad kernel: [ 4695.687497] [<ffffffff817021c2>] __tcp_push_pending_frames+0x32/0xd0
Aug 26 07:57:27 test-osad kernel: [ 4695.687500] [<ffffffff816f0e6f>] tcp_push+0xef/0x120
Aug 26 07:57:27 test-osad kernel: [ 4695.687502] [<ffffffff816f4659>] tcp_sendmsg+0xb9/0xc60
Aug 26 07:57:27 test-osad kernel: [ 4695.687507] [<ffffffff8171ec93>] inet_sendmsg+0x63/0xb0
Aug 26 07:57:27 test-osad kernel: [ 4695.687512] [<ffffffff81347471>] ? apparmor_socket_sendmsg+0x21/0x30
Aug 26 07:57:27 test-osad kernel: [ 4695.687515] [<ffffffff8168dbe7>] sock_aio_write+0x117/0x140
Aug 26 07:57:27 test-osad kernel: [ 4695.687520] [<ffffffff811ebaaa>] do_sync_write+0x5a/0x90
Aug 26 07:57:27 test-osad kernel: [ 4695.687524] [<ffffffff811ec515>] vfs_write+0x195/0x1f0
Aug 26 07:57:27 test-osad kernel: [ 4695.687527] [<ffffffff811ecf2e>] ? vfs_read+0x11e/0x140
Aug 26 07:57:27 test-osad kernel: [ 4695.687530] [<ffffffff811ed046>] SyS_write+0x46/0xb0
Aug 26 07:57:27 test-osad kernel: [ 4695.687534] [<ffffffff810df111>] ? posix_ktime_get_ts+0x11/0x20
Aug 26 07:57:27 test-osad kernel: [ 4695.687537] [<ffffffff817b688d>] system_call_fastpath+0x16/0x1b
Aug 26 07:57:27 test-osad kernel: [ 4695.687540] ---[ end trace 17da8a026eae9b77 ]---
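
The summary line of the trace (the one starting with "caps=") carries the packet state at the time of the warning. As a rough aid for reading it, here is a minimal Python sketch that pulls the numeric fields out of such a line and names two of them. The function name decode_bad_offload and the output keys are made up for this illustration. The ip_summed names and the SKB_GSO_TCPV4 bit are taken from the kernel's include/linux/skbuff.h as I understand it; other gso_type bits differ between kernel versions, so they are deliberately left undecoded.

```python
import re

# Checksum states as defined in include/linux/skbuff.h.
IP_SUMMED_NAMES = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

# Bit 0 of gso_type marks TCPv4 segmentation; other bits are not decoded
# here because their meaning varies between kernel versions.
SKB_GSO_TCPV4 = 1 << 0

# Matches the decimal key=value pairs printed by skb_warn_bad_offload.
FIELD_RE = re.compile(r"\b(len|data_len|gso_size|gso_type|ip_summed)=(\d+)")


def decode_bad_offload(line):
    """Return the numeric fields of a skb_warn_bad_offload summary line.

    Hypothetical helper for illustration; it only reads the log text.
    """
    fields = {key: int(value) for key, value in FIELD_RE.findall(line)}
    if "ip_summed" in fields:
        fields["ip_summed_name"] = IP_SUMMED_NAMES.get(
            fields["ip_summed"], "unknown"
        )
    if "gso_type" in fields:
        fields["gso_tcpv4"] = bool(fields["gso_type"] & SKB_GSO_TCPV4)
    return fields


if __name__ == "__main__":
    sample = (
        ": caps=(0x000000801fdb78e9, 0x000000801fdb78e9) len=4396 "
        "data_len=2920 gso_size=1448 gso_type=1 ip_summed=3"
    )
    for key, value in decode_bad_offload(sample).items():
        print(key, value)
```

For the line above, this reports a non-zero gso_size together with ip_summed=3 (CHECKSUM_PARTIAL), i.e. a TCPv4 segmentation-offload packet whose checksum had not been filled in yet when checksum_tg called skb_checksum_help.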

Tags: in-kilo
Cristian Calin (cristi-calin) wrote :

Full kernel log attached for reference.

Jesse Pretorius (jesse-pretorius) wrote :

@Apsu noted yesterday that this appears to relate to veths staying behind after container restarts and that it's present in both AIOs and multi-node deployments. While this doesn't appear to affect the environment in a critical way, it may cause packet loss; this is yet to be confirmed. The workaround for now is to clean up leftover veths in the environment using this script: https://gist.github.com/Apsu/7947a3347fcc86bb45a7

Matt Kassawara (ionosphere80) wrote :

Upon further investigation, it appears that scatter-gather offload on the host bridges causes these traces rather than dangling veth pairs.
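
For anyone who wants to try this by hand, here is a minimal Python sketch that builds (and optionally runs) the ethtool invocation for turning scatter-gather off on a set of bridges. It is only an illustration, not the change that was merged: the bridge names in the usage example are placeholders, and the helper names sg_off_command and disable_sg are made up. The only external call assumed is the standard "ethtool -K <device> sg off".

```python
import subprocess


def sg_off_command(bridge):
    """Build the ethtool command that disables scatter-gather on one device."""
    return ["ethtool", "-K", bridge, "sg", "off"]


def disable_sg(bridges, dry_run=True):
    """Disable scatter-gather offload on each bridge in `bridges`.

    With dry_run=True (the default) the commands are only printed, so the
    sketch can be tried without root or ethtool installed.
    Returns the list of commands that were (or would be) executed.
    """
    commands = [sg_off_command(bridge) for bridge in bridges]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Needs root; raises CalledProcessError if ethtool fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Placeholder bridge names; substitute the bridges present on the host.
    disable_sg(["br-example0", "br-example1"])
```

A setting applied this way does not survive a reboot or an interface restart, so it is mainly useful for checking whether the traces stop. The current state can be inspected with "ethtool -k <device>", which lists scatter-gather among the offload features.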

Matt Kassawara (ionosphere80) wrote :

Fix for AIOs and gate checks:

https://review.openstack.org/#/c/219292/

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/219292
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=6f6a37fce832be1b3acf0d6a1702ffcee4a33e85
Submitter: Jenkins
Branch: master

commit 6f6a37fce832be1b3acf0d6a1702ffcee4a33e85
Author: Matthew Kassawara <email address hidden>
Date: Tue Sep 1 09:12:50 2015 -0500

    Disable scatter-gather offload on host bridges

    Disable scatter-gather offload on host bridges to eliminate
    kernel traces that may impact container connectivity. Only
    addressing AIO interfaces for now as host configuration for
    actual deployments resides in documentation.

    Change-Id: Ia66b2bb64b9ace66f5fa3ca8edcc9909af54a4f2
    Partial-Bug: #1488815
    Co-Authored-By: Evan Callicoat <email address hidden>

Christian Felsing (ip6li) wrote :

The problem depends on the guest. Steps to reproduce on an Ubuntu 14.04 64-bit server host with kernel 3.19.0-28-generic and qemu-kvm:

* set up a guest with virt-manager and set up a virtio ethernet interface
* install FreeBSD 10.1 64bit as guest
* set up a usable IP address on vtnet0 (FreeBSD name of virtio ethernet)

Generate some traffic and check dmesg. After some time, "skb_warn_bad_offload" appears in dmesg on the host.
The problem does not occur if e1000 is used instead of virtio, as recommended for FreeBSD on KVM.

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (kilo)

Reviewed: https://review.openstack.org/223292
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=cdcfa92b8e9604b82c7c3f8aca6fb3776df56552
Submitter: Jenkins
Branch: kilo

commit cdcfa92b8e9604b82c7c3f8aca6fb3776df56552
Author: Matthew Kassawara <email address hidden>
Date: Tue Sep 1 09:12:50 2015 -0500

    Disable scatter-gather offload on host bridges

    Disable scatter-gather offload on host bridges to eliminate
    kernel traces that may impact container connectivity. Only
    addressing AIO interfaces for now as host configuration for
    actual deployments resides in documentation.

    Change-Id: Ia66b2bb64b9ace66f5fa3ca8edcc9909af54a4f2
    Partial-Bug: #1488815
    Co-Authored-By: Evan Callicoat <email address hidden>
    (cherry picked from commit 6f6a37fce832be1b3acf0d6a1702ffcee4a33e85)

tags: added: in-kilo
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1488815

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete