TCP connections are not sending their entire buffers

Bug #1552864 reported by Rahman Syed
This bug affects 1 person
Affects: Designate
Status: Fix Released
Importance: Critical
Assigned to: Graham Hayes

Bug Description

When Designate sends large payloads over the network, the connection appears to be closed before the entire TCP payload has been sent.

For example, error messages can be observed when performing an AXFR of a zone with a large number of records (several hundred or more) in a deployment with mDNS and PowerDNS, on a network that fragments TCP payloads. PowerDNS example: "Remote nameserver closed TCP connection"

Reporter: Erik Andersson
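For context on why a large AXFR is sensitive to this: over TCP, each DNS message is preceded by a two-byte, big-endian length field (RFC 1035, section 4.2.2), and a zone transfer of several hundred records may be split across several such messages. The sketch below is illustrative only: `frame_dns_tcp` and `split_dns_tcp_stream` are hypothetical helpers, not Designate or PowerDNS code, and the payloads in the usage note are placeholder bytes rather than real DNS messages. It shows that the receiver can only rebuild the transfer if every framed byte arrives; a stream that the sender stops writing early ends in the middle of a message.

```python
import struct
from typing import List


def frame_dns_tcp(message: bytes) -> bytes:
    """Prefix one DNS message with its 2-byte big-endian length (RFC 1035, 4.2.2)."""
    if len(message) > 0xFFFF:
        raise ValueError("a single DNS message cannot exceed 65535 bytes")
    return struct.pack("!H", len(message)) + message


def split_dns_tcp_stream(stream: bytes) -> List[bytes]:
    """Split a received TCP byte stream back into individual DNS messages.

    Raises ValueError if the stream stops partway through a length prefix
    or a message body, i.e. the sender did not write its whole buffer.
    """
    messages = []
    offset = 0
    while offset < len(stream):
        if offset + 2 > len(stream):
            raise ValueError("stream truncated inside a length prefix")
        (length,) = struct.unpack_from("!H", stream, offset)
        offset += 2
        if offset + length > len(stream):
            raise ValueError("stream truncated inside a message")
        messages.append(stream[offset:offset + length])
        offset += length
    return messages
```

For example, framing three placeholder messages of 300, 40000 and 12 bytes produces a 40318-byte stream that splits back into the same three messages, while dropping the last few bytes of that stream makes `split_dns_tcp_stream` raise `ValueError`.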

Graham Hayes (grahamhayes) wrote :

OK, after changing the call at https://github.com/openstack/designate/blob/d5d0706705c64dba847cea5a30b4a6be39ecd63f/designate/service.py#L342 to sendall(), we get:

2016-03-03 19:02:42 ERROR designate.service [req-76111e6a-17e9-4a9e-8e03-39ac34d829cb - - - - -] Unhandled exception while processing request from 89.101.195.206:60079
2016-03-03 19:02:42.008 TRACE designate.service Traceback (most recent call last):
2016-03-03 19:02:42.008 TRACE designate.service File "/home/graham/designate/designate/service.py", line 343, in _dns_handle
2016-03-03 19:02:42.008 TRACE designate.service client.sendall(tcp_response)
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 388, in sendall
2016-03-03 19:02:42.008 TRACE designate.service tail += self.send(data[tail:], flags)
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 379, in send
2016-03-03 19:02:42.008 TRACE designate.service return self._send_loop(self.fd.send, data, flags)
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 374, in _send_loop
2016-03-03 19:02:42.008 TRACE designate.service timeout_exc=socket.timeout("timed out"))
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 203, in _trampoline
2016-03-03 19:02:42.008 TRACE designate.service mark_as_closed=self._mark_as_closed)
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/__init__.py", line 162, in trampoline
2016-03-03 19:02:42.008 TRACE designate.service return hub.switch()
2016-03-03 19:02:42.008 TRACE designate.service File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 294, in switch
2016-03-03 19:02:42.008 TRACE designate.service return self.greenlet.switch()
2016-03-03 19:02:42.008 TRACE designate.service timeout: timed out
2016-03-03 19:02:42.008 TRACE designate.service

Adding an eventlet.sleep(0) call before the sendall() allows the transfer to complete properly.
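One plausible way to reproduce the same "timeout: timed out" error outside Designate, using plain CPython sockets rather than eventlet, is to call sendall() with a timeout against a peer that never reads. In this sketch the unread end of a `socket.socketpair()` stands in, as an assumption, for a distant client whose connection drains the send buffer slowly; `sendall_to_stalled_peer` is a hypothetical helper. It does not explain why eventlet.sleep(0) helped; it only shows how a blocked sendall() turns into a timeout.

```python
import socket


def sendall_to_stalled_peer(payload_size: int, timeout_s: float) -> bool:
    """Return True if sendall() times out because the peer never reads.

    Assumption: the never-read end of a local socket pair stands in for a
    distant client that drains the connection too slowly.
    """
    sender, receiver = socket.socketpair()
    try:
        # Since Python 3.5 this timeout bounds the total duration of sendall().
        sender.settimeout(timeout_s)
        try:
            sender.sendall(b"\x00" * payload_size)
        except socket.timeout:
            # Same "timed out" error as in the traceback above.
            return True
        # The whole payload fit into the kernel socket buffers.
        return False
    finally:
        sender.close()
        receiver.close()
```

A 10-byte payload fits in the kernel buffers and returns False, while a payload far larger than those buffers, such as `sendall_to_stalled_peer(32 * 1024 * 1024, 0.2)`, would normally return True after about 0.2 seconds.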

Changed in designate:
importance: Undecided → Critical
milestone: none → mitaka-rc1
status: New → Triaged
Graham Hayes (grahamhayes) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (master)

Fix proposed to branch: master
Review: https://review.openstack.org/288510

Changed in designate:
assignee: nobody → Graham Hayes (grahamhayes)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (master)

Reviewed: https://review.openstack.org/288510
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=f2c06477afbfb0e44d08f4228a1a87a9208c5c5f
Submitter: Jenkins
Branch: master

commit f2c06477afbfb0e44d08f4228a1a87a9208c5c5f
Author: Graham Hayes <email address hidden>
Date: Fri Mar 4 15:00:57 2016 +0000

    Fix for TCP connections not sending full content

    Eventlet previously broke the standard API for sockets
    and made socket.send() work in the same manner as socket.sendall()

    https://github.com/eventlet/eventlet/commit/c315ee86dac996ac533b738f7c8777f4d01a0472
    reverted to the standard behaviour.

    This was released as part of 0.18.0.

    The bug manifests itself when large (multi TCP message) AXFRs are
    performed over long distances.

    (I replicated it when the messages grew to 3,
    over USWest -> EU transfer)

    see http://graham.hayes.ie/posts/minidns-tcp-and-the-internet/
    for details on testing.

    This change can cause packets to be dropped intermittently -
    but retry will allow this to be overcome.

    Change-Id: Ia0c15d843fb2092cc693b37dc701492396c647d0
    Closes-Bug: #1552864

Changed in designate:
status: In Progress → Fix Released
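The commit message hinges on the standard socket contract: send() may write only part of the buffer and returns the number of bytes it actually wrote, while sendall() keeps sending until everything is written or an error occurs. A minimal sketch of that difference, under stated assumptions: `ChunkedSocket` is a hypothetical stand-in that accepts at most `max_chunk` bytes per send() call, mimicking a TCP socket whose send buffer is nearly full, and `send_all` is a hand-written loop equivalent in spirit to socket.sendall(), not eventlet's implementation.

```python
class ChunkedSocket:
    """Hypothetical socket whose send() accepts at most max_chunk bytes per call."""

    def __init__(self, max_chunk: int) -> None:
        self.max_chunk = max_chunk
        self.received = bytearray()  # everything "delivered" so far

    def send(self, data) -> int:
        # Like a real send(): take what fits and report how much was taken.
        chunk = bytes(data[: self.max_chunk])
        self.received += chunk
        return len(chunk)


def send_all(sock, data: bytes) -> int:
    """Call sock.send() repeatedly until every byte is written (sendall() semantics)."""
    view = memoryview(data)  # slicing a memoryview avoids copying the payload
    total = 0
    while total < len(view):
        sent = sock.send(view[total:])
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    return total
```

With a 10240-byte payload and `max_chunk=4096`, a single `send()` delivers only the first 4096 bytes, which is the truncated response a peer would see, whereas `send_all()` delivers all 10240.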
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/293408

OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (stable/liberty)

Reviewed: https://review.openstack.org/293408
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=6c49166b065078e6ed50cbdd9a0afb44b3054474
Submitter: Jenkins
Branch: stable/liberty

commit 6c49166b065078e6ed50cbdd9a0afb44b3054474
Author: Graham Hayes <email address hidden>
Date: Fri Mar 4 15:00:57 2016 +0000

    Fix for TCP connections not sending full content

    Eventlet previously broke the standard API for sockets
    and made socket.send() work in the same manner as socket.sendall()

    https://github.com/eventlet/eventlet/commit/c315ee86dac996ac533b738f7c8777f4d01a0472
    reverted to the standard behaviour.

    This was released as part of 0.18.0.

    The bug manifests itself when large (multi TCP message) AXFRs are
    performed over long distances.

    (I replicated it when the messages grew to 3,
    over USWest -> EU transfer)

    see http://graham.hayes.ie/posts/minidns-tcp-and-the-internet/
    for details on testing.

    This change can cause packets to be dropped intermittently -
    but retry will allow this to be overcome.

    Change-Id: Ia0c15d843fb2092cc693b37dc701492396c647d0
    Closes-Bug: #1552864
    (cherry picked from commit f2c06477afbfb0e44d08f4228a1a87a9208c5c5f)

tags: added: in-stable-liberty
OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/293507

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/designate 2.0.0.0rc1

This issue was fixed in the openstack/designate 2.0.0.0rc1 release candidate.

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/designate 1.0.2

This issue was fixed in the openstack/designate 1.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (stable/kilo)

Reviewed: https://review.openstack.org/293507
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=a4b4c6dfb87459a604be03885da22191660e9ad6
Submitter: Jenkins
Branch: stable/kilo

commit a4b4c6dfb87459a604be03885da22191660e9ad6
Author: Graham Hayes <email address hidden>
Date: Fri Mar 4 15:00:57 2016 +0000

    Fix for TCP connections not sending full content

    Eventlet previously broke the standard API for sockets
    and made socket.send() work in the same manner as socket.sendall()

    https://github.com/eventlet/eventlet/commit/c315ee86dac996ac533b738f7c8777f4d01a0472
    reverted to the standard behaviour.

    This was released as part of 0.18.0.

    The bug manifests itself when large (multi TCP message) AXFRs are
    performed over long distances.

    (I replicated it when the messages grew to 3,
    over USWest -> EU transfer)

    see http://graham.hayes.ie/posts/minidns-tcp-and-the-internet/
    for details on testing.

    This change can cause packets to be dropped intermittently -
    but retry will allow this to be overcome.

    Change-Id: Ia0c15d843fb2092cc693b37dc701492396c647d0
    Closes-Bug: #1552864
    (cherry picked from commit f2c06477afbfb0e44d08f4228a1a87a9208c5c5f)
    (cherry picked from commit 6c49166b065078e6ed50cbdd9a0afb44b3054474)

tags: added: in-stable-kilo
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/designate 1.0.2

This issue was fixed in the openstack/designate 1.0.2 release.
