TCP splicing crashes haproxy
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Mirantis OpenStack |
Invalid
|
High
|
MOS Linux | ||
6.1.x |
Won't Fix
|
High
|
MOS Maintenance | ||
7.0.x |
Won't Fix
|
High
|
MOS Maintenance | ||
8.0.x |
Invalid
|
High
|
MOS Linux |
Bug Description
We're evaluating MOS 6.1 using 3 controllers + 2 computes, all nodes on similar HW with NetXtreme II BCM57810 10 Gigabit Ethernet.
We experienced many TOTAL downtime of the OpenStack environment during operational loads (eg. create instance, upload image,etc).
Managed to trace the problem back to haproxy and specifically to TCP splicing:
Sep 29 14:02:18 node-45 kernel: ------------[ cut here ]------------
Sep 29 14:02:18 node-45 kernel: WARNING: at net/ipv4/tcp.c:1208 tcp_cleanup_
Sep 29 14:02:18 node-45 kernel: Hardware name: ProLiant BL460c Gen8
Sep 29 14:02:18 node-45 kernel: cleanup rbuf bug: copied 2142C3BD seq 21429929 rcvnxt 216D958D
Sep 29 14:02:18 node-45 kernel: Modules linked in: dccp_diag dccp tcp_diag inet_diag ipt_REJECT veth bridge bonding 8021q garp stp llc openvswitch(U) gre xt_multiport iptable_filter xt_NOTRACK iptable_raw ipt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_mangle ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs exportfs ext2 power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support hpilo hpwdt ses enclosure sg bnx2x libcrc32c mdio serio_raw lpc_ich mfd_core ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt hpsa(U) video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Sep 29 14:02:18 node-45 kernel: Pid: 2341, comm: haproxy Tainted: G W --------------- 2.6.32-
Sep 29 14:02:18 node-45 kernel: Call Trace:
Sep 29 14:02:18 node-45 kernel: [<ffffffff81074
Sep 29 14:02:18 node-45 kernel: [<ffffffff81074
Sep 29 14:02:18 node-45 kernel: [<ffffffff814a4
Sep 29 14:02:18 node-45 kernel: [<ffffffff814a6
Sep 29 14:02:18 node-45 kernel: [<ffffffff814a5
Sep 29 14:02:18 node-45 kernel: [<ffffffff814a6
Sep 29 14:02:18 node-45 kernel: [<ffffffff81446
Sep 29 14:02:18 node-45 kernel: [<ffffffff811bd
Sep 29 14:02:18 node-45 kernel: [<ffffffff8118f
Sep 29 14:02:18 node-45 kernel: [<ffffffff811be
Sep 29 14:02:18 node-45 kernel: [<ffffffff810e5
Sep 29 14:02:18 node-45 kernel: [<ffffffff8100b
Sep 29 14:02:18 node-45 kernel: ---[ end trace ee0255f2568ce83f ]---
Sep 29 14:02:18 node-45 kernel: ------------[ cut here ]------------
After adding "no" to "option splice-auto" in haproxy.cfg the problem went way and it's now rock solid.
[root@node-43 ~]# rpm -qa | grep haproxy
haproxy-
[root@node-43 ~]# uname -a
Linux node-43.
description: | updated |
Changed in mos: | |
assignee: | nobody → MOS Linux (mos-linux) |
importance: | Undecided → High |
status: | New → Confirmed |
milestone: | none → 8.0 |
tags: | added: area-linux |
A possible fix: http:// permalink. gmane.org/ gmane.linux. network/ 231739
(We should let 2.6 kernel rest in peace)