I just want to give you an update on my troubleshooting effort today.
I found out that the execution of the scripts in '/etc/network/if-up.d/' is not consistent. From one boot to the next I get different results.
I have 3 interfaces: lo, eth00, and eth10.
On some boots, I get '/etc/network/if-up.d/' executed 3x, on each of those interfaces. On other boots, I only get 1x. On most boots, I get 2x, on eth00 & eth10 (sometimes in the opposite order). Apparently, at boot Ubuntu may combine the executions of '/etc/network/if-up.d/' if multiple network interfaces come up very close together.
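To count how often the hooks really run on a given boot, a small debug hook along these lines could be dropped into '/etc/network/if-up.d/'. This is only a sketch: the function name and the 'ifup-debug' tag are made up for illustration, and IFACE and MODE are assumed to be exported by ifupdown to the hook scripts.

```sh
#!/bin/sh
# Hypothetical debug helper: build one log line per hook invocation.
# IFACE and MODE are assumed to be set by ifupdown; "?" is used if they are not.
ifup_event_line() {
    echo "if-up.d run: iface=${IFACE:-?} mode=${MODE:-?} pid=$$"
}

# In the real hook the line would be sent to syslog, for example:
#   ifup_event_line | logger -t ifup-debug
```

Grepping syslog for the tag after each boot would then show how many invocations happened and in which order.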
Now, when '/etc/network/if-up.d/' is executed multiple times, I can see that I sometimes get a race condition (if the runs of '/etc/network/if-up.d/cntlm' are too close to each other) in which start-stop-daemon fails to detect the start of the previous instance, giving me 2 instances of cntlm trying to start at the same time.
Because all of this happens pretty much in parallel, I even get the following sequence:
- eth00 up event executes '/etc/network/if-up.d/cntlm'
- eth10 up event executes '/etc/network/if-up.d/cntlm'
- '/etc/init.d/cntlm' is executed via '/etc/rc2.d/S20cntlm'
- '/etc/init.d/cntlm' is executed by '/etc/network/if-up.d/cntlm' because eth00 is up
  This sometimes causes a 2nd instance of cntlm to be started if it's executed too close to the step above.
- '/etc/init.d/cntlm' is executed by '/etc/network/if-up.d/cntlm' because eth10 is up
So I ended up putting the following in my '/etc/network/if-up.d/cntlm':
# Check whether cntlm is enabled in this runlevel, if so restart it.
level="unknown"
# Wait until the runlevel is known (runlevel prints "unknown" until then).
while [ "$level" = "unknown" ]; do
  level=$(runlevel | cut -d" " -f2)
done
if [ -e /etc/rc${level}.d/S??cntlm -a "$IFACE" != "lo" ]; then
  # Delay 0-9 s, derived from the PID, so parallel runs are less likely to collide.
  logger -t "$IFACE" "Restarting cntlm in $(($$ % 10)) s."
  sleep $(($$ % 10))
  # invoke-rc.d --quiet cntlm restart >/dev/null 2>&1 || true
fi
And by adding this pseudo-random sleep, I managed to avoid the race condition most of the time.
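Since a sleep only makes the collision less likely, another option might be to serialize the hook runs with a lock. The following is a minimal sketch, not something I have tested on this setup: the with_lock helper and the lock path are hypothetical, and it relies only on mkdir being atomic.

```sh
#!/bin/sh
# Hypothetical lock location; any directory path writable at boot would do.
LOCKDIR="${LOCKDIR:-/var/run/cntlm-ifup.lock}"

# Run the given command only if no other hook instance holds the lock.
with_lock() {
    # mkdir either creates the directory or fails, atomically, so only one
    # of several concurrent callers gets past this test.
    if mkdir "$LOCKDIR" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$LOCKDIR"
        return "$status"
    fi
    # Another instance is already restarting cntlm: skip quietly.
    return 0
}

# Intended use inside the hook:
#   with_lock invoke-rc.d --quiet cntlm restart
```

One caveat of this design: if the hook is killed while holding the lock, the stale directory blocks later runs until it is removed.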
But cntlm STILL fails to forward requests from the client. Even if I remove '/etc/network/if-up.d/cntlm', I still get exactly the same result on both Config 1 & 2.
After adding '-v' where cntlm is started in '/etc/init.d/cntlm', I get extra output showing the following failure:
direct_request() -> host_connect() -> so_resolv()
with an error code saying that the destination is temporarily not available.
Now the strange thing is that I can always ping the destination server by name while cntlm is in this error state. Since so_resolv() is only a wrapper around gethostbyname(), it should work.
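One way to check whether name resolution is really ready at the moment the init script runs (as opposed to later, when I test with ping) would be something like the sketch below. The wait_for_name function and the RESOLVER variable are hypothetical names; by default it uses 'getent hosts', which goes through the system resolver configuration.

```sh
#!/bin/sh
# Command used to resolve a name; overridable for testing (hypothetical knob).
RESOLVER="${RESOLVER:-getent hosts}"

# wait_for_name HOST [TRIES] [DELAY]
# Returns 0 as soon as HOST resolves, 1 if it still fails after TRIES attempts.
wait_for_name() {
    host="$1"
    tries="${2:-10}"
    delay="${3:-1}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # $RESOLVER is left unquoted on purpose so "getent hosts" splits into words.
        if $RESOLVER "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Intended use in the start case, with the real proxy name filled in:
#   wait_for_name proxy.example 10 1 || logger -t cntlm "proxy name not resolvable"
```

If this reports success at boot while cntlm still fails in so_resolv(), the problem is more likely inside the running cntlm process than in the system resolver setup.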
Adding a delay of up to 10 sec. in '/etc/init.d/cntlm' just before the call to start-stop-daemon in the start case didn't help. I literally *always* have to restart cntlm manually to make it work.
Hi Stefano,
Any thoughts?
Cheers //Edo