I've narrowed the root cause down to the /usr/lib/NetworkManager/nm-dhcp-helper tool. Occasionally this binary runs but fails to deliver the lease update to NetworkManager. No errors are reported when this happens; NetworkManager in debug mode just logs "accepted connection on private socket" followed by "closed connection on private socket", and no update takes place.
I've managed to work around the issue by wrapping /usr/lib/NetworkManager/nm-dhcp-helper in a script that simply repeats the same lease update until the logs indicate that NetworkManager received it. This doesn't fix the communication problem, but it adds a safety net that prevents the resulting issues. It's been tested on an office network of some 12 PCs.
If anyone else runs into this issue, run the following script (as root) to work around it:
-----8<-----
#!/bin/bash
HELPERSCRIPT="/usr/lib/NetworkManager/nm-dhcp-helper"
HELPERBIN="/usr/lib/NetworkManager/nm-dhcp-helper.bin"

# Print "1" if the given file is an ELF binary, "0" otherwise.
function is_elf() {
    readelf -h "$1" >/dev/null 2>&1
    if [ "$?" = "1" ]; then
        echo "0"
    else
        echo "1"
    fi
}

# Move the original binary aside, unless the wrapper is already in place.
if [ "$(is_elf "$HELPERSCRIPT")" = "1" ]; then
    mv "$HELPERSCRIPT" "$HELPERBIN"
fi

cat <<EOF >"$HELPERSCRIPT"
#!/usr/bin/perl
use strict;
use warnings;
if(\$< != 0) {
    die "Must run as root\n";
}
my \$reason = \$ENV{reason} || "";
if(\$reason eq "PREINIT") {
    # Not lease information, so waiting for the journal would make
    # nm-dhcp-helper wait too long. Just send it once and exit so
    # dhclient will start to get a lease.
    system("${HELPERBIN}");
    exit(0);
}
my \$attempts = 0;
my \$success = 0;
while(\$attempts < 10) {
    \$attempts++;
    my \$time = time();
    sleep(1);
    system("${HELPERBIN}");
    sleep(1);
    my \$leasetime = \`/bin/journalctl --since='\\@\$time' | grep NetworkManager | grep ' lease time ' | wc -l\`;
    if(\$leasetime == 1) {
        \$success = 1;
        last;
    }
    # Try again in 5 seconds
    sleep(5);
}
if(\$attempts > 1) {
    open my \$fh, ">>", "/tmp/nm-helper-retries.log" or die \$!;
    my \$date = \`/bin/date\`;
    1 while chomp \$date;
    if(\$success) {
        print \$fh "\$date: needed \$attempts attempts to update NetworkManager (\$reason).\n";
    } else {
        print \$fh "\$date: gave up after \$attempts attempts (\$reason).\n";
    }
    close \$fh;
}
exit(0);
EOF
chmod +x "$HELPERSCRIPT"
# Put the dhclient AppArmor profile into complain mode.
/usr/sbin/aa-complain /etc/apparmor.d/sbin.dhclient
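
To see how often the wrapper actually had to step in, the retry log it writes can be summarized with something like the sketch below. The default path and the two message formats come from the wrapper; the function name summarize_retries is made up for this example. Note that the wrapper only writes to the log when more than one attempt was needed, so a missing log does not mean the wrapper is not installed.

```shell
#!/bin/sh
# Sketch only: count the two kinds of entries in the wrapper's retry log.
# summarize_retries is a hypothetical helper name, not part of NetworkManager.
summarize_retries() {
    # Default to the path the wrapper writes to; allow overriding it.
    log="${1:-/tmp/nm-helper-retries.log}"
    if [ ! -r "$log" ]; then
        echo "no retry log at $log"
        return 0
    fi
    # Entries where a later attempt got through to NetworkManager.
    recovered=$(grep -c ' needed [0-9]* attempts to update NetworkManager' "$log")
    # Entries where the wrapper gave up after its last attempt.
    gave_up=$(grep -c ' gave up after [0-9]* attempts' "$log")
    echo "recovered after retry: $recovered"
    echo "gave up: $gave_up"
}
```

Calling summarize_retries with no argument reads /tmp/nm-helper-retries.log; a different file can be passed as the first argument.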