Fixes for MDNS device communication errors

Bug #1616861 reported by Martin Wilck
This bug affects 1 person
Affects: HPLIP
Status: Opinion
Importance: Undecided
Assigned to: Unassigned

Bug Description

In my home network, the HP printer/scanner (ENVY 5530) often fails with messages like this:

hp-toolbox[7119]: [7119]: error: Unable to communicate with device (code=12): hp:/net/ENVY_5530_series?zc=HP3464A9E628B4

I've tracked this down to two problems with the MDNS implementation in HPLIP:

 1. On a multi-homed host, HPLIP uses only one interface for MDNS multicast send and receive. This leads to failure if the printer is not on the "default" network. The code passes just INADDR_ANY in its IP_MULTICAST_IF and IP_ADD_MEMBERSHIP setsockopt calls, meaning that the kernel has to figure out which interface to use. This fails if the system has no default route, or if the HP device is on a different network than the default route. The solution is to send and receive multicast on all interfaces.

 2. MDNS authorities (including HP printers) don't answer every query, especially if the same query is repeated quickly. The latter happens when the HP tools start, causing the MDNS lookup procedure to fail and the "communication error" to be reported. This can be solved by retrying the query after a short wait (1-2 s).
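
A minimal Python sketch of the receive side of point 1, assuming the caller already knows the IPv4 addresses of the local interfaces: instead of one IP_ADD_MEMBERSHIP call with INADDR_ANY, the socket joins the mDNS group 224.0.0.251 once per interface. This is an illustration, not the code from the attached patches; all function names are hypothetical.

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"
MDNS_PORT = 5353


def membership_request(group: str, iface_addr: str) -> bytes:
    """Pack a struct ip_mreq: multicast group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_addr))


def join_on_all_interfaces(sock, iface_addrs: list[str]) -> list[str]:
    """Join the mDNS group once per interface instead of once with INADDR_ANY.

    Returns the interface addresses for which the join succeeded.
    """
    joined = []
    for addr in iface_addrs:
        try:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                            membership_request(MDNS_GROUP, addr))
            joined.append(addr)
        except OSError:
            # One failing interface should not prevent joining on the others.
            pass
    return joined


def open_mdns_socket(iface_addrs: list[str]) -> socket.socket:
    """Hypothetical helper: UDP socket bound to port 5353, joined on every interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow sharing the port with other mDNS users such as avahi-daemon.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MDNS_PORT))  # INADDR_ANY: accept packets arriving on any interface
    join_on_all_interfaces(sock, iface_addrs)
    return sock
```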

I have created patches that solve both problems. Tested in my home network.

Many HPLIP bug reports about (code=12) errors can be found on the web; I have reason to believe that these patches would fix some of them, and would make obsolete the questionable advice found in various places ("disable SELinux", "disable the firewall").

I am attaching the "hp-check" output as required by your bug reporting guidelines, but I assure you that it has no relevance to the problem. My patched hplip runs smoothly even though hp-check reports many missing dependencies.

I am working under openSUSE Tumbleweed, hplip package 3.16.5. I double-checked against 3.16.7 and have no reason to believe that any of the problems I encountered are fixed there, as the respective code has remained unchanged.

Tags: mdns multicast
Martin Wilck (mwilck) wrote:
Martin Wilck (mwilck) wrote:
Martin Wilck (mwilck) wrote:

In addition to the previous patch, which enables receiving of
MDNS packets on all interfaces, this patch causes hplip
to also send queries on all interfaces.

Martin Wilck (mwilck) wrote:

In addition to the previous patch, which enables receiving of
MDNS packets on all interfaces, this patch causes hplip
to also send queries on all interfaces.

(sorry, the patch in the previous comment was wrong).
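
For the send side, a Python sketch (a hypothetical helper, not the patch itself) that selects each outgoing interface with IP_MULTICAST_IF before sending the same query, again assuming a known list of interface IPv4 addresses:

```python
import socket

MDNS_ADDR = ("224.0.0.251", 5353)


def send_query_on_all_interfaces(sock, packet: bytes, iface_addrs) -> int:
    """Send the same mDNS query out of every given interface.

    IP_MULTICAST_IF selects the outgoing interface by its IPv4 address.
    With INADDR_ANY the kernel picks the interface from the routing table,
    which is the single-interface behaviour described in the bug.
    Returns the number of interfaces the query was sent on.
    """
    sent = 0
    for addr in iface_addrs:
        try:
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                            socket.inet_aton(addr))
            sock.sendto(packet, MDNS_ADDR)
            sent += 1
        except OSError:
            continue  # skip interfaces where selecting or sending fails
    return sent
```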

Martin Wilck (mwilck) wrote:

This patch makes the same changes to the python MDNS implementation
as the previous patches to the C implementation.

Martin Wilck (mwilck) wrote:
Martin Wilck (mwilck) wrote:

The preceding patches make HPLIP listen on several interfaces. That makes it more likely to receive MDNS packets that aren't actually what we are looking for. This patch and the next one make the MDNS code detect such packets and keep them from causing communication failures.
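
The comment does not say how the patches recognize unrelated packets. One plausible check, sketched in Python under that assumption: accept a packet only if it is a DNS response (QR bit set) and one of its answer records carries the queried name. Queries from other hosts and answers about other devices are then ignored rather than treated as failures. All names are hypothetical.

```python
import struct

QR_RESPONSE = 0x8000  # QR bit in the DNS header flags word


def encode_name(name: str) -> bytes:
    """DNS wire format: length-prefixed labels, terminated by a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def read_name(buf: bytes, off: int):
    """Decode a possibly compressed DNS name; return (name, offset after it)."""
    labels, end, hops = [], None, 0
    while True:
        length = buf[off]
        if length & 0xC0 == 0xC0:  # compression pointer to an earlier name
            if end is None:
                end = off + 2
            off = ((length & 0x3F) << 8) | buf[off + 1]
            hops += 1
            if hops > 16:
                raise ValueError("compression pointer loop")
            continue
        if length == 0:
            return ".".join(labels), (off + 1 if end is None else end)
        labels.append(buf[off + 1:off + 1 + length].decode("utf-8", "replace"))
        off += 1 + length


def answer_names(packet: bytes) -> list[str]:
    """Lower-cased names of all answer records, or [] if not a response."""
    if len(packet) < 12:
        return []
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!6H", packet[:12])
    if not flags & QR_RESPONSE:
        return []  # a query from some other host, not an answer
    off = 12
    for _ in range(qdcount):  # skip questions: name + type + class
        _, off = read_name(packet, off)
        off += 4
    names = []
    for _ in range(ancount):
        name, off = read_name(packet, off)
        _type, _cls, _ttl, rdlength = struct.unpack("!HHIH", packet[off:off + 10])
        off += 10 + rdlength
        names.append(name.lower())
    return names


def is_expected_answer(packet: bytes, wanted: str) -> bool:
    """True only for a response that answers the name we asked for."""
    try:
        return wanted.rstrip(".").lower() in answer_names(packet)
    except (IndexError, ValueError, struct.error):
        return False  # truncated or malformed packet: ignore it
```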

Martin Wilck (mwilck) wrote:

... forgot to attach patch in previous comment ...

Martin Wilck (mwilck) wrote:
Martin Wilck (mwilck) wrote:

The previous patches will use EVERY interface on the system, so it makes sense to add some sanity checking. This patch skips lo and interfaces with no IPv4 address, e.g. virtual interfaces created by libvirt.
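
A Python sketch of the filter described above. The `Interface` record is hypothetical (in C such data would typically be gathered with getifaddrs(3)); IFF_LOOPBACK uses its Linux value. Only the two rules from the comment are applied.

```python
from typing import NamedTuple, Optional

IFF_LOOPBACK = 0x8  # Linux <net/if.h> value


class Interface(NamedTuple):
    """Hypothetical per-interface record."""
    name: str
    flags: int
    ipv4: Optional[str]  # None if the interface has no IPv4 address


def mdns_candidates(interfaces):
    """Keep only interfaces worth using for IPv4 mDNS."""
    result = []
    for iface in interfaces:
        if iface.flags & IFF_LOOPBACK:
            continue  # skip lo
        if iface.ipv4 is None:
            continue  # no IPv4 address, e.g. a libvirt vnet device
        result.append(iface)
    return result
```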

Martin Wilck (mwilck) wrote:

MDNS authorities don't reply to every request. They may delay
responses and may not reply if the same query is carried out after
a very short time. This happens in hplip because the same queries
are issued during startup repeatedly.

This can be solved easily by retrying the query after a short
wait. Typically in my environment queries will reliably succeed after
500ms for single lookups and 1000ms for scanner lookups.

The logic is the same for both normal fqdn lookup and scanner query,
so put it in a separate function.
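
The shared retry logic could look roughly like this Python sketch. `send_query` and `wait_reply` are hypothetical callbacks standing in for the socket code, and the default of three attempts is an assumption rather than a value from the patch; the 0.5 s and 1.0 s waits are the figures given in the comment.

```python
from typing import Callable, Optional


def query_with_retry(send_query: Callable[[], None],
                     wait_reply: Callable[[float], Optional[bytes]],
                     retry_wait: float,
                     max_tries: int = 3) -> Optional[bytes]:
    """Send a query; if no reply arrives within retry_wait seconds, send it again.

    Used unchanged for both lookup kinds; only retry_wait differs,
    e.g. 0.5 for a single host lookup and 1.0 for a scanner query.
    Returns the first reply, or None after max_tries unanswered attempts.
    """
    for _ in range(max_tries):
        send_query()
        reply = wait_reply(retry_wait)
        if reply is not None:
            return reply
    return None
```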

Martin Wilck (mwilck) wrote:

With the previous patch, we will retry queries. This turns out
to be more reliable than simply waiting for an answer, so this patch
reduces the wait time for receive; 30 ms is already a lot for an MDNS
reply on modern networks. If nothing arrives in this time, it's more
likely that the server simply hasn't replied to this query.
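
A Python sketch of a short-timeout receive using select(). The 30 ms default follows the figure in the comment; the function name is hypothetical. Returning None lets the caller resend the query instead of waiting longer.

```python
import select
import socket
from typing import Optional, Tuple

RECV_TIMEOUT = 0.03  # 30 ms, assumed here as the per-receive timeout


def receive_reply(sock: socket.socket,
                  timeout: float = RECV_TIMEOUT) -> Optional[Tuple[bytes, object]]:
    """Wait at most `timeout` seconds for one datagram; None if nothing arrived."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # most likely the responder ignored this query; caller may retry
    # mDNS packets may not exceed 9000 bytes (RFC 6762), so this buffer suffices.
    return sock.recvfrom(9000)
```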

Martin Wilck (mwilck) wrote:
Martin Wilck (mwilck) wrote:

This and the following patch are NOT required to solve the problem. I just found them helpful during debugging.

Martin Wilck (mwilck) wrote:

Test program can be compiled e.g. like this:
gcc $CFLAGS -D__TEST__=1 -DMDNS_DEBUG=1 -DMDNS_STDERR=1 mdns.c

The test program is useful for checking mdns responses to
various queries in the network. Run without arguments to scan
for scanners, and with host name argument to run a simple
name lookup.
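
The C test program itself is in the attachment. As an illustration only, the two kinds of query it is described as sending can be built in Python like this: a PTR query for "_scanner._tcp.local" when run without arguments, and an A query for a given host name otherwise. The record types chosen for each mode and all function names are assumptions.

```python
import struct

TYPE_A = 1     # IPv4 address record
TYPE_PTR = 12  # pointer record, used for service browsing
CLASS_IN = 1


def encode_name(name: str) -> bytes:
    """DNS wire format: length-prefixed labels terminated by a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label in {name!r}")
        out += bytes([len(raw)]) + raw
    return bytes(out + b"\x00")


def build_query(name: str, qtype: int) -> bytes:
    # mDNS queries use ID 0 and no flags: one question, no other records.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, CLASS_IN)


def query_for_mode(hostname=None) -> bytes:
    """No argument: browse for scanners. With a host name: resolve that name."""
    if hostname is None:
        return build_query("_scanner._tcp.local", TYPE_PTR)
    return build_query(hostname, TYPE_A)
```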

Martin Wilck (mwilck) wrote:

Forgot to mention: The patch series as posted here is against hplip 3.16.7. Patches apply on 3.16.5, too.

Martin Wilck (mwilck) wrote:
Changed in hplip:
status: New → Opinion
Martin Wilck (mwilck) wrote:

I'm not sure what "Opinion" means to you, but this fixes a real problem in a real environment.

Martin Wilck (mwilck) wrote:

Changes in HPLIP 3.17.11 have caused my patches to no longer apply.

I'd be grateful for an explanation of what "_uscan._tcp.local" (in addition to "_scanner._tcp.local") is needed for.
