TCP splicing crashes haproxy

Bug #1501640 reported by Rui Bernardino
This bug affects 1 person
Affects             Status     Importance  Assigned to      Milestone
Mirantis OpenStack  Invalid    High        MOS Linux
6.1.x               Won't Fix  High        MOS Maintenance
7.0.x               Won't Fix  High        MOS Maintenance
8.0.x               Invalid    High        MOS Linux

Bug Description

We're evaluating MOS 6.1 using 3 controllers + 2 computes, all nodes on similar HW with NetXtreme II BCM57810 10 Gigabit Ethernet.

We experienced repeated TOTAL outages of the OpenStack environment under operational load (e.g. creating an instance, uploading an image, etc.).

We managed to trace the problem back to haproxy, and specifically to TCP splicing:

Sep 29 14:02:18 node-45 kernel: ------------[ cut here ]------------
Sep 29 14:02:18 node-45 kernel: WARNING: at net/ipv4/tcp.c:1208 tcp_cleanup_rbuf+0x5a/0x110() (Tainted: G W --------------- )
Sep 29 14:02:18 node-45 kernel: Hardware name: ProLiant BL460c Gen8
Sep 29 14:02:18 node-45 kernel: cleanup rbuf bug: copied 2142C3BD seq 21429929 rcvnxt 216D958D
Sep 29 14:02:18 node-45 kernel: Modules linked in: dccp_diag dccp tcp_diag inet_diag ipt_REJECT veth bridge bonding 8021q garp stp llc openvswitch(U) gre xt_multiport iptable_filter xt_NOTRACK iptable_raw ipt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_mangle ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs exportfs ext2 power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support hpilo hpwdt ses enclosure sg bnx2x libcrc32c mdio serio_raw lpc_ich mfd_core ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt hpsa(U) video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Sep 29 14:02:18 node-45 kernel: Pid: 2341, comm: haproxy Tainted: G W --------------- 2.6.32-504.1.3.el6.mos61.x86_64 #1
Sep 29 14:02:18 node-45 kernel: Call Trace:
Sep 29 14:02:18 node-45 kernel: [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
Sep 29 14:02:18 node-45 kernel: [<ffffffff81074ee6>] ? warn_slowpath_fmt+0x46/0x50
Sep 29 14:02:18 node-45 kernel: [<ffffffff814a44fa>] ? tcp_cleanup_rbuf+0x5a/0x110
Sep 29 14:02:18 node-45 kernel: [<ffffffff814a6ae4>] ? tcp_read_sock+0xc4/0x250
Sep 29 14:02:18 node-45 kernel: [<ffffffff814a5ae0>] ? tcp_splice_data_recv+0x0/0x50
Sep 29 14:02:18 node-45 kernel: [<ffffffff814a6d17>] ? tcp_splice_read+0xa7/0x210
Sep 29 14:02:18 node-45 kernel: [<ffffffff81446872>] ? sock_splice_read+0x62/0x80
Sep 29 14:02:18 node-45 kernel: [<ffffffff811bdbfb>] ? do_splice_to+0x6b/0xa0
Sep 29 14:02:18 node-45 kernel: [<ffffffff8118f411>] ? fget_light+0x21/0x90
Sep 29 14:02:18 node-45 kernel: [<ffffffff811bef00>] ? sys_splice+0x300/0x5d0
Sep 29 14:02:18 node-45 kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
Sep 29 14:02:18 node-45 kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Sep 29 14:02:18 node-45 kernel: ---[ end trace ee0255f2568ce83f ]---
Sep 29 14:02:18 node-45 kernel: ------------[ cut here ]------------

After adding "no" in front of "option splice-auto" in haproxy.cfg, the problem went away and the environment is now rock solid.
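
For reference, a minimal sketch of what that workaround might look like in haproxy.cfg. Placing the directive in the "defaults" section is an assumption, since the report does not say which section was edited; "option splice-auto" and its "no" form are also accepted in frontend, listen and backend sections. The commented-out "nosplice" global keyword is an alternative that was not tested in this report.

```
defaults
    # Workaround from this report: turn off automatic kernel TCP splicing
    # for every proxy that inherits these defaults.
    no option splice-auto

# Alternative (assumption, not tested here): disable splicing for the
# whole haproxy process instead of per proxy.
# global
#     nosplice
```

If a configuration also sets "option splice-request" or "option splice-response", those would presumably need the same treatment, since they enable splicing independently of splice-auto.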

[root@node-43 ~]# rpm -qa | grep haproxy
haproxy-1.5.3-3.mira3.x86_64
[root@node-43 ~]# uname -a
Linux node-43.ptin.corppt.com 2.6.32-504.1.3.el6.mos61.x86_64 #1 SMP Fri May 22 10:40:43 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Tags: area-linux
description: updated
Changed in mos:
assignee: nobody → MOS Linux (mos-linux)
importance: Undecided → High
status: New → Confirmed
milestone: none → 8.0
Alexei Sheplyakov (asheplyakov) wrote :

A possible fix: http://permalink.gmane.org/gmane.linux.network/231739

(We should let the 2.6 kernel rest in peace.)

Jay Pipes (jaypipes) wrote :

Pavel, please investigate this particular issue and advise on whether this is something we need to address in MOS 8.0. Are we even supporting 2.6 kernels?

Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.1-updates and 7.0-updates, as we are not going to include new kernels in maintenance updates.

tags: added: area-linux
Aleksander Mogylchenko (amogylchenko) wrote :

We do not have this kernel in 8.0 any more. Please recheck on any 8.0 ISO.

Dmitry Teselkin (teselkin-d) wrote :

Last activity was a month ago; moving to Invalid.
