TCP connections are not sending their entire buffers

Bug #1552864 reported by Rahman Syed on 2016-03-03
This bug affects 1 person
Affects: Designate
Importance: Critical
Assigned to: Graham Hayes

Bug Description

When sending large payloads over networks, it appears that the connection is dropped before the entire TCP payload is sent from Designate.

For example, when performing an AXFR for a zone with a large number of records (several hundred or more) in a deployment with mDNS and PowerDNS on a network that fragments TCP payloads, error messages can be observed on the receiving side. PDNS example: "Remote nameserver closed TCP connection"
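
For context, DNS messages sent over TCP are preceded by a two-byte length field (RFC 1035, section 4.2.2), so the receiver knows how many bytes to expect. The sketch below only illustrates why a short write on the sending side can surface as a closed connection on the receiving side. It is not PowerDNS or Designate code, and recv_exact and read_dns_tcp_message are hypothetical helper names.

```python
import socket
import struct


def recv_exact(sock, count):
    """Read exactly `count` bytes, or raise EOFError if the peer closes early."""
    buf = bytearray()
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            # The peer closed before the announced number of bytes arrived.
            raise EOFError(
                "peer closed after %d of %d bytes" % (len(buf), count))
        buf.extend(chunk)
    return bytes(buf)


def read_dns_tcp_message(sock):
    """Read one length-prefixed DNS message from a TCP socket."""
    # Two-byte, big-endian length prefix, then the message itself.
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    return recv_exact(sock, length)
```

If the sender writes only part of the message and then closes the socket, the second recv_exact() call hits end-of-stream before `length` bytes have arrived, which is consistent with the error quoted above.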

Reporter: Erik Andersson

Graham Hayes (grahamhayes) wrote :

OK, after changing https://github.com/openstack/designate/blob/d5d0706705c64dba847cea5a30b4a6be39ecd63f/designate/service.py#L342 to sendall() we get:

2016-03-03 19:02:42 ERROR designate.service [req-76111e6a-17e9-4a9e-8e03-39ac34d829cb - - - - -] Unhandled exception while processing request from 89.101.195.206:60079
2016-03-03 19:02:42.008 TRACE designate.service Traceback (most recent call last):
2016-03-03 19:02:42.008 TRACE designate.service File "/home/graham/designate/designate/service.py", line 343, in _dns_handle
2016-03-03 19:02:42.008 TRACE designate.service client.sendall(tcp_response)
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 388, in sendall
2016-03-03 19:02:42.008 TRACE designate.service tail += self.send(data[tail:], flags)
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 379, in send
2016-03-03 19:02:42.008 TRACE designate.service return self._send_loop(self.fd.send, data, flags)
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 374, in _send_loop
2016-03-03 19:02:42.008 TRACE designate.service timeout_exc=socket.timeout("timed out"))
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 203, in _trampoline
2016-03-03 19:02:42.008 TRACE designate.service mark_as_closed=self._mark_as_closed)
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/__init__.py", line 162, in trampoline
2016-03-03 19:02:42.008 TRACE designate.service return hub.switch()
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 294, in switch
2016-03-03 19:02:42.008 TRACE designate.service return self.greenlet.switch()
2016-03-03 19:02:42.008 TRACE designate.service timeout: timed out
2016-03-03 19:02:42.008 TRACE designate.service

Adding an "eventlet.sleep(0)" call before the sendall() allows the send to complete properly.
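
To illustrate the difference between send() and sendall() under standard socket semantics: send() may accept only part of the buffer and returns the number of bytes it actually wrote, so the caller has to loop over the remainder. This is a minimal sketch using the plain Python socket module, not Designate's or eventlet's actual code; send_all is a hypothetical helper name.

```python
import socket


def send_all(sock, payload):
    """Send every byte of payload, looping because send() may be partial."""
    view = memoryview(payload)
    sent_total = 0
    while sent_total < len(view):
        # send() returns how many bytes were accepted, possibly fewer
        # than were offered.
        sent = sock.send(view[sent_total:])
        if sent == 0:
            raise ConnectionError("socket connection broken")
        sent_total += sent
    return sent_total
```

The standard library's socket.sendall() performs an equivalent loop internally. Code that calls send() once and ignores the return value can silently truncate a large response.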

Changed in designate:
importance: Undecided → Critical
milestone: none → mitaka-rc1
status: New → Triaged

Fix proposed to branch: master
Review: https://review.openstack.org/288510

Changed in designate:
assignee: nobody → Graham Hayes (grahamhayes)
status: Triaged → In Progress

Reviewed: https://review.openstack.org/288510
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=f2c06477afbfb0e44d08f4228a1a87a9208c5c5f
Submitter: Jenkins
Branch: master

commit f2c06477afbfb0e44d08f4228a1a87a9208c5c5f
Author: Graham Hayes <email address hidden>
Date: Fri Mar 4 15:00:57 2016 +0000

    Fix for TCP connections not sending full content

    Eventlet previously broke the standard API for sockets
    and made socket.send() work in the same manner as socket.sendall()

    https://github.com/eventlet/eventlet/commit/c315ee86dac996ac533b738f7c8777f4d01a0472
    reverted to the standard behaviour.

    This was released as part of 0.18.0.

    The bug manifests itself when large (multi TCP message) AXFRs are
    performed over long distances.

    (I replicated it when the messages grew to 3,
    over USWest -> EU transfer)

    see http://graham.hayes.ie/posts/minidns-tcp-and-the-internet/
    for details on testing.

    This change can cause packets to be dropped intermittently -
    but retry will allow this to be overcome.

    Change-Id: Ia0c15d843fb2092cc693b37dc701492396c647d0
    Closes-Bug: #1552864

Changed in designate:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/293408
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=6c49166b065078e6ed50cbdd9a0afb44b3054474
Submitter: Jenkins
Branch: stable/liberty

commit 6c49166b065078e6ed50cbdd9a0afb44b3054474
Author: Graham Hayes <email address hidden>
Date: Fri Mar 4 15:00:57 2016 +0000

    Fix for TCP connections not sending full content

    Eventlet previously broke the standard API for sockets
    and made socket.send() work in the same manner as socket.sendall()

    https://github.com/eventlet/eventlet/commit/c315ee86dac996ac533b738f7c8777f4d01a0472
    reverted to the standard behaviour.

    This was released as part of 0.18.0.

    The bug manifests itself when large (multi TCP message) AXFRs are
    performed over long distances.

    (I replicated it when the messages grew to 3,
    over USWest -> EU transfer)

    see http://graham.hayes.ie/posts/minidns-tcp-and-the-internet/
    for details on testing.

    This change can cause packets to be dropped intermittently -
    but retry will allow this to be overcome.

    Change-Id: Ia0c15d843fb2092cc693b37dc701492396c647d0
    Closes-Bug: #1552864
    (cherry picked from commit f2c06477afbfb0e44d08f4228a1a87a9208c5c5f)

tags: added: in-stable-liberty

This issue was fixed in the openstack/designate 2.0.0.0rc1 release candidate.

This issue was fixed in the openstack/designate 1.0.2 release.

Reviewed: https://review.openstack.org/293507
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=a4b4c6dfb87459a604be03885da22191660e9ad6
Submitter: Jenkins
Branch: stable/kilo

commit a4b4c6dfb87459a604be03885da22191660e9ad6
Author: Graham Hayes <email address hidden>
Date: Fri Mar 4 15:00:57 2016 +0000

    Fix for TCP connections not sending full content

    Eventlet previously broke the standard API for sockets
    and made socket.send() work in the same manner as socket.sendall()

    https://github.com/eventlet/eventlet/commit/c315ee86dac996ac533b738f7c8777f4d01a0472
    reverted to the standard behaviour.

    This was released as part of 0.18.0.

    The bug manifests itself when large (multi TCP message) AXFRs are
    performed over long distances.

    (I replicated it when the messages grew to 3,
    over USWest -> EU transfer)

    see http://graham.hayes.ie/posts/minidns-tcp-and-the-internet/
    for details on testing.

    This change can cause packets to be dropped intermittently -
    but retry will allow this to be overcome.

    Change-Id: Ia0c15d843fb2092cc693b37dc701492396c647d0
    Closes-Bug: #1552864
    (cherry picked from commit f2c06477afbfb0e44d08f4228a1a87a9208c5c5f)
    (cherry picked from commit 6c49166b065078e6ed50cbdd9a0afb44b3054474)

tags: added: in-stable-kilo

This issue was fixed in the openstack/designate 1.0.2 release.
