Comment 5 for bug 825593

Edoardo Tirtarahardja (etirta) wrote :

Hi Stefano,

I just want to give you an update on my troubleshooting effort today.

I found out that the execution of the scripts in '/etc/network/if-up.d/' is not consistent. From one boot to the next I get different results.

I have 3 interfaces: lo, eth00, and eth10.

On some boots, '/etc/network/if-up.d/' is run 3x, once for each of those interfaces. On other boots, it runs only 1x. On most boots, it runs 2x, for eth00 and eth10 (sometimes in the opposite order). Apparently, at boot Ubuntu may combine the executions of '/etc/network/if-up.d/' when multiple network interfaces come up very close together.
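To count the runs per boot more reliably, a small trace hook could be dropped into the same directory. This is only a rough sketch: the file name (say '/etc/network/if-up.d/000trace'), the TRACE_LOG path and the trace_ifup function are made up for illustration, while IFACE and MODE are the variables ifupdown exports to its hook scripts.

```shell
#!/bin/sh
# Hypothetical trace hook: append one line per invocation so the number
# and order of if-up.d runs can be compared between boots.
TRACE_LOG="${TRACE_LOG:-/tmp/ifup-trace.log}"

trace_ifup() {
    # Record a timestamp, the PID of this hook run, and the interface
    # and mode handed over by ifupdown ("unset" if run by hand).
    printf '%s pid=%s iface=%s mode=%s\n' \
        "$(date +%s)" "$$" "${IFACE:-unset}" "${MODE:-unset}" >>"$TRACE_LOG"
}

trace_ifup
```

After a reboot, the number of lines in the trace file should match the number of hook runs for that boot.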

Now, when '/etc/network/if-up.d/' is executed multiple times, I sometimes see a race condition: if the '/etc/network/if-up.d/cntlm' runs are too close to each other, start-stop-daemon fails to detect that the previous instance is starting, which gives me 2 instances of cntlm trying to start at the same time.

Because all of this happens pretty much in parallel, I have even seen the following sequence:
- eth00 up event executes '/etc/network/if-up.d/cntlm'
- eth10 up event executes '/etc/network/if-up.d/cntlm'
- '/etc/init.d/cntlm' is executed via '/etc/rc2.d/S20cntlm'
- '/etc/init.d/cntlm' is executed by '/etc/network/if-up.d/cntlm' because eth00 is up
  This sometimes causes a 2nd instance of cntlm to be started if it is executed too close
  to the step above.
- '/etc/init.d/cntlm' is executed by '/etc/network/if-up.d/cntlm' because eth10 is up

So I ended up putting the following in my '/etc/network/if-up.d/cntlm':
# Check whether cntlm is enabled in this runlevel, if so restart it.
level="unknown"
# runlevel prints "unknown" until the runlevel is set, so poll until it is.
while [ "$level" = "unknown" ]; do
    level=$(runlevel | cut -d" " -f2)
done
if [ -e /etc/rc${level}.d/S??cntlm ] && [ "$IFACE" != "lo" ]; then
    # PID-derived delay (0-9 s) so parallel runs are less likely to collide.
    delay=$(($$ % 10))
    logger -t "$IFACE" "Restarting cntlm in $delay s."
    sleep "$delay"
# invoke-rc.d --quiet cntlm restart >/dev/null 2>&1 || true
fi

By adding this pseudo-random sleep, I managed to avoid the race condition most of the time.
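A lock file might be more robust than a sleep for keeping the parallel runs apart. The following is a minimal sketch, not tested on this setup, and it assumes flock(1) from util-linux is installed. LOCKFILE and run_locked are made-up names, not part of the cntlm package.

```shell
#!/bin/sh
# Sketch: serialise concurrent if-up.d runs with flock(1).
# LOCKFILE is a placeholder path chosen for illustration.
LOCKFILE="${LOCKFILE:-/tmp/cntlm-ifup.lock}"

run_locked() {
    # Open the lock file on fd 9 for the subshell and take an exclusive
    # lock on it. A second caller blocks at flock until the first
    # subshell exits and the lock is released.
    (
        flock -x 9 || exit 1
        "$@"
    ) 9>"$LOCKFILE"
}

# In the real hook this would presumably wrap the restart, e.g.
#   run_locked invoke-rc.d --quiet cntlm restart
run_locked echo "restart would run here for ${IFACE:-unknown}"
```

With this in place, the eth00 and eth10 runs would execute the wrapped command one after the other instead of at the same time. It does not cover the separate start from '/etc/rc2.d/S20cntlm' unless that path takes the same lock.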

But cntlm STILL fails to forward requests from the client. Even if I remove '/etc/network/if-up.d/cntlm', I still get exactly the same result with both Config 1 & 2.

After adding '-v' to the cntlm start command in '/etc/init.d/cntlm', I get extra output showing the following failure:
direct_request() -> host_connect() -> so_resolv()
with an error code indicating that the destination is temporarily not available.

Now the strange thing is that I can always ping the destination server by name while cntlm is in this error state. Since so_resolv() is only a wrapper around gethostbyname(), it should work.

Adding a delay of up to 10 sec. inside '/etc/init.d/cntlm', just before the start-stop-daemon call in the start case, didn't help. I literally *always* have to restart cntlm manually to make it work.
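Instead of a fixed delay, the init script could check whether name resolution actually works at the moment cntlm is started. This is only a sketch of such a check: wait_for_dns is a made-up helper, and it assumes getent(1) uses the same resolver path as gethostbyname().

```shell
#!/bin/sh
# wait_for_dns HOST [TRIES]
# Poll once per second until HOST resolves through the system resolver,
# or give up after TRIES attempts (default 10).
wait_for_dns() {
    host="$1"
    tries="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        # getent hosts exits 0 only if the name could be resolved.
        if getent hosts "$host" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

In the start case it could be called with the proxy name from the cntlm configuration just before start-stop-daemon, logging the result. If the name already resolves at that point and cntlm still fails, the problem is probably not the timing of the start.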

Any thoughts?

Cheers //Edo