apt-avahi-discover can take a long time when the destination service port is blocked

Bug #1733694 reported by Tiago Stürmer Daitx
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
squid-deb-proxy (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

When trying to use apt I noticed that actions that required the proxy were taking a long time to complete. I tracked that down to the squid-deb-proxy-client package, in particular the apt-avahi-discover script, which was taking more than 2 minutes to complete.

Current output:
$ time /usr/share/squid-deb-proxy-client/apt-avahi-discover --debug
avahi-browse output:
<snipped>
no timeout set, using default '2'
error: uncaptured python exception, closing channel <AptAvahiClient> ('192.168.86.60', 127): 9223372036854775807 (<class 'socket.error'>:[Errno 110] Connection timed out [/usr/lib/python2.7/asyncore.py|readwrite|108] [/usr/lib/python2.7/asyncore.py|handle_read_event|446] [/usr/lib/python2.7/asyncore.py|handle_connect_event|454])
sorted hosts: '[<AptAvahiClient> ('192.168.86.30', 8000): 0.00731706619263, <AptAvahiClient> ('10.113.100.1', 8000): 0.00739908218384, <AptAvahiClient> ('fd42:6e97:d27c:25bc::1', 8000, 0, 0): 0.00753402709961, <AptAvahiClient> ('192.168.86.60', 127): 9223372036854775807]'
http://192.168.86.30:8000/
/usr/share/squid-deb-proxy-client/apt-avahi-discover --debug 0.03s user 0.01s system 0% cpu 2:12.20 total

Here 192.168.86.30 is the "localhost", which also runs lxd (thus the 10.113.1001 address) and the squid-deb-proxy service. My laptop is 192.168.86.60 and also runs its own squid-deb-proxy service.

My laptop (192.168.86.60) is firewalled, so although its own avahi-daemon contacted the other server (192.168.86.30) to announce the service, the other server can't really access it (thus the huge int for the "response" time.

Interrupting the calls led me to:
<snipped>
Traceback (most recent call last):
  File "/usr/share/squid-deb-proxy-client/apt-avahi-discover", line 130, in <module>
    address = get_proxy_host_port_from_avahi()
  File "/usr/share/squid-deb-proxy-client/apt-avahi-discover", line 116, in get_proxy_host_port_from_avahi
    asyncore.loop(timeout=timeout)
  File "/usr/lib/python2.7/asyncore.py", line 216, in loop
    poll_fun(timeout, map)
  File "/usr/lib/python2.7/asyncore.py", line 145, in poll
    r, w, e = select.select(r, w, e, timeout)

So what happens is that the asyncore.loop() call will hang until the sockets timeout.

I have tried a few things such as:
- setting default timeout to 1
- setting the socket.settimeout(DEFAULT_CONNECT_TIMEOUT_SEC)
  inside AptAvahiClient before the connect call
but neither of the above worked.

What finally worked was to set asyncore.loop(timeout=timeout, count=1). It seems that the timeout alone is not enough because the select() call will simply loop over the timeout [1,2].

Output with asyncore.loop(timeout=timeout, count=1):
$ time /usr/share/squid-deb-proxy-client/apt-avahi-discover --debug
avahi-browse output:
<snipped>
no timeout set, using default '2'
sorted hosts: '[<AptAvahiClient> ('192.168.86.30', 8000): 0.00732398033142, <AptAvahiClient> ('10.113.100.1', 8000): 0.00736904144287, <AptAvahiClient> ('fd42:6e97:d27c:25bc::1', 8000, 0, 0): 0.00745701789856, <AptAvahiClient> ('192.168.86.60', 127): 9223372036854775807]'
http://192.168.86.30:8000/
/usr/share/squid-deb-proxy-client/apt-avahi-discover --debug 0.02s user 0.01s system 2% cpu 1.027 total

So, around 1 seconds, way better.

I also tried using other values for count, here is a small table with timeout = 2 (the default):
count = 1 -> 1 second
count = 2 -> 3 seconds
count = 3 -> 5 seconds
count = 4 -> 7 seconds
count = 5 -> 9 seconds
count = 8 -> 15 seconds
count = 15 -> 29 seconds

For timeout = 5:
count = 1 -> 1 second
count = 2 -> 6 seconds
count = 3 -> 11 seconds
count = 4 -> 16 seconds
count = 5 -> 21 seconds

For the time measurements I ran the commands at least 5 times to check for any outliers, the few changes I saw there mostly corresponded to the timing variation from just the avahi-browse call [3].

It seems that the very first try "timeouts" pretty quick (under a second), but the following retries (if count > 1) do take the set timeout into account. Thus, for my tests, the actual timeout is around: ((timeout * (count -1)) + 1. Remember that the +1 comes from the avahi-browser in my environment, if it takes longer or shorter than that won't really affect (or be affected by) my proposed fix.

I'm attaching a patch that uses count=2 which should give a timeout pretty close to the one set by the user (or the default), but that might need additional review as I'm not sure what are the possible drawbacks from setting count to such a low value.

Additional info:
$cat /etc/os-release
NAME="Ubuntu"
VERSION="17.10 (Artful Aardvark)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 17.10"
VERSION_ID="17.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=artful
UBUNTU_CODENAME=artful

$ apt-cache policy squid-deb-proxy
squid-deb-proxy:
  Installed: 0.8.14
  Candidate: 0.8.14
  Version table:
 *** 0.8.14 500
        500 http://br.archive.ubuntu.com/ubuntu artful/universe amd64 Packages
        500 http://br.archive.ubuntu.com/ubuntu artful/universe i386 Packages
        100 /var/lib/dpkg/status

[1] https://stackoverflow.com/a/15842027
[2] https://stackoverflow.com/a/14485737
[3] The "avahi-browse -kprtf _apt_proxy._tcp" command (called by apt-avahi-discover) runs in around 1 second most of the time, but it did take 3+ seconds a couple times.

Revision history for this message
Tiago Stürmer Daitx (tdaitx) wrote :
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "apt-avahi-discover_timeout-fix.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in squid-deb-proxy (Ubuntu):
status: New → Confirmed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers