More specific error messages when connect() fails

Bug #1408598 reported by Tim Starling on 2015-01-08
This bug affects 1 person
Affects: libmemcached
Importance: Undecided
Assigned to: Unassigned

Bug Description

We've been logging intermittent connection failures for a while, and it took some time to work out that the probable cause is local (ephemeral) port exhaustion, which makes connect() fail with EADDRINUSE. The failures are easy to reproduce in libmemcached under realistic connection rates. On Linux, the error occurs when a single client host opens more than about 28232 connections to a single server within a 60-second period (the TIME_WAIT expiry).
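To put the figures above in per-second terms, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and the model is deliberately simple: it assumes every closed connection holds its local port for the full TIME_WAIT period and that all connections go to one server.

```c
#include <assert.h>

/* Hypothetical helper for illustration only.
 * Returns the highest sustained rate of new connections (per second) to a
 * single server before local ports run out, given the number of ephemeral
 * ports available and how long a closed connection holds its port in
 * TIME_WAIT. */
double max_sustained_connect_rate(int port_count, int time_wait_secs)
{
    assert(port_count > 0 && time_wait_secs > 0);
    return (double)port_count / (double)time_wait_secs;
}
```

With the numbers from this report (28232 ports, 60 seconds), the result is roughly 470 new connections per second, which a busy web server using non-persistent connections can plausibly exceed.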

In libmemcached's network_connect(), EADDRINUSE is handled by the "default" case, so it just returns MEMCACHED_CONNECTION_FAILURE with no further details. It would be nice if more information were available for logging purposes. Either MEMCACHED_CONNECTION_FAILURE could be split into more specific codes, or memcached_last_error_message() could be documented as a public API and populated with an errno-specific message when connect() fails.

Ideally every errno would be handled, since EACCES, ENETUNREACH and ENOMEM are probably also possible.
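As a rough sketch of what an errno-specific message could look like, the helper below formats any errno from a failed connect() and adds a hint for the port exhaustion case. describe_connect_errno() is a hypothetical name, not an existing libmemcached function, and the message wording is only an assumption about what would be useful in logs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libmemcached.
 * Writes a log-friendly description of a connect() failure into buf.
 * Any errno value is accepted; strerror() supplies the generic text. */
void describe_connect_errno(int err, char *buf, size_t len)
{
    const char *hint = "";

    assert(buf != NULL && len > 0);

    switch (err) {
    case EADDRINUSE:
        /* The case from this report: no free local port for the socket. */
        hint = " (local ephemeral ports may be exhausted)";
        break;
    default:
        /* EACCES, ENETUNREACH, ENOMEM and the rest get the plain text. */
        break;
    }

    /* Note: strerror() is not guaranteed to be thread-safe; a real
     * implementation would likely use strerror_r() instead. */
    snprintf(buf, len, "connect() failed: errno %d: %s%s",
             err, strerror(err), hint);
}
```

A caller would capture errno immediately after connect() returns -1 and pass it in, so that later library calls cannot overwrite it before the message is built.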

Also, according to the Linux connect(2) man page, EAGAIN indicates "no more free local ports or insufficient entries in the routing cache", so you probably shouldn't call poll() on the FD if that happens.
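The distinction suggested here can be sketched as a small classifier for the errno left by a non-blocking connect(). The enum and function names are hypothetical and this is not how libmemcached's network_connect() is actually structured; it only illustrates treating EAGAIN as an immediate failure rather than as a reason to wait.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for illustration; not libmemcached API. */
enum connect_action {
    CONNECT_WAIT_POLL,  /* connection in progress: poll() for writability */
    CONNECT_FAIL        /* give up now and report the errno */
};

/* Decide what to do after a non-blocking connect() returned -1. */
enum connect_action classify_connect_errno(int err)
{
    switch (err) {
    case EINPROGRESS:   /* normal case for a non-blocking socket */
    case EALREADY:      /* an earlier attempt is still in progress */
    case EINTR:         /* POSIX: the attempt continues asynchronously */
        return CONNECT_WAIT_POLL;
    case EAGAIN:        /* per the report: no free local ports or routing
                           cache entries, so nothing is pending to poll */
    default:
        return CONNECT_FAIL;
    }
}
```

Keeping EAGAIN out of the polling branch means the caller fails fast with a meaningful errno instead of waiting for a timeout on a socket that never started connecting.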
