Comment 2 for bug 343069

Revision history for this message
jcornwall (jay-jcornwall) wrote :

OK, I had some time to debug this.

The problem is caused by a thread-unsafe patch introduced by the upstream maintainer (in debian/patches/13_avahi_fix_and_handle_restart.dpatch). This patch was intended to restore Firefly's broken Avahi handler so that daemon reconnections could be handled gracefully. By reenabling this broken part of Firefly, a slew of race conditions between Firefly and Avahi, which were already present in the code, were exposed.

The core of the problem is this: the avahi_simple_poll_* API makes it impossible to interact with from outside the event loop thread. This is by design and led to development of the thread-aware avahi_threaded_poll_* API.

I have put together a preliminary patch to resolve this (attached). It passes Valgrind's Helgrind race condition checker in all live paths I have tested; previously, crash-causing races were detected in several places. It has one weakness:

* _rend_avahi_lock/unlock are messily extended with a from_callback parameter. This was necessary because some locking code is shared by external threads and by the event thread (in callbacks) and Avahi provides no way to detect the thread context. Locking Avahi in the callback thread would otherwise lead to deadlock.

This is a criticism of elegance and not functionality. I have tested the patch and am fairly happy with it.