Bug #1805857 “network-manager dep8 failure blocks dnsmasq propos...” : Bugs : network-manager package : Ubuntu

Revision history for this message

Robie Basak (racb) wrote on 2018-11-29:

#1

Download full text (4.2 KiB)

I've done some investigation and my best educated guess right now is that there race somewhere, most like in the n-m dep8 tests but possibly in n-m, which is being triggered by an unrelated timing change in the newer dnsmasq and is causing some unwanted result that results in the dep8 test teardown to fail.

I reproduced by:

Creating a VM running Disco and dist-upgrading it and rebooting (without -proposed enabled).

Pulling the network-manager source and installing the test dependencies listed against "nm" in debian/tests/control

Running "sudo ./nm" from debian/tests/ by hand.

This passed, but a subsequent upgrade of dnsmasq-base (only) from disco-proposed reproduced the failure.

I isolated by:

Modifying the dep8 test to allow isolation (https://code.launchpad.net/~racb/network-manager/+git/network-manager/+merge/359837)

Running "sudo python3 -m unittest nm.ColdplugWifi.test_open_b_ip6_raonly_no_pe" from debian/tests/ by hand.

I found that after a test failure a reboot of the VM was necessary to cause the tests to work again. Swapping between the release pocket and proposed pocket version of dnsmasq-base mostly caused the failure to come and go as expected. However I believe that one time I saw the test pass against dnsmasq-base from proposed, which is why I think it's a race condition.

To get to the actual failure reason, I applied the follow patch:

--- a/debian/tests/nm.py
+++ b/debian/tests/nm.py
@@ -532,7 +538,6 @@ wpa_passphrase=12345678
     # connections and such); as it is very brittle and hard to track down
     # all remaining references to any NM* object after a test, we rather
     # run each test in a separate subprocess
- @network_test_base.run_in_subprocess
     def do_test(self, hostapd_conf, ipv6_mode, expected_max_bitrate,
                 secret=None, ip6_privacy=None):
         '''Actual test code, parameterized for the particular test case'''

This gave me the following output:

======================================================================
FAIL: test_open_b_ip6_raonly_no_pe (nm.ColdplugWifi)
Open network, 802.11b, IPv6 with only RA, PE disabled
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/network-manager/debian/tests/nm.py", line 165, in shutdown_connections
    self.assert_iface_down(self.dev_e_client)
  File "/home/ubuntu/network-manager/debian/tests/nm.py", line 195, in assert_iface_down
    self.assertIn('state DOWN', out)
AssertionError: 'state DOWN' not found in '4: eth42@veth42: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000\n link/ether 1e:64:0d:9f:da:2e brd ff:ff:ff:ff:ff:ff\n'

----------------------------------------------------------------------
Ran 1 test in 35.639s

FAILED (failures=1)

The following didn't fix the problem:

--- a/debian/tests/nm.py
+++ b/debian/tests/nm.py
@@ -187,12 +187,18 @@ class NetworkManagerTest(network_test_base.NetworkTestBase):
else:
self.fail(message or 'timed out waiting for ' + str(condition))

+ @staticmethod
+ def iface_is_down(iface):
+ out = subprocess.check_output(['ip'...

I've done some investigation and my best educated guess right now is that there race somewhere, most like in the n-m dep8 tests but possibly in n-m, which is being triggered by an unrelated timing change in the newer dnsmasq and is causing some unwanted result that results in the dep8 test teardown to fail.

I reproduced by:

Creating a VM running Disco and dist-upgrading it and rebooting (without -proposed enabled).

Pulling the network-manager source and installing the test dependencies listed against "nm" in debian/tests/control

Running "sudo ./nm" from debian/tests/ by hand.

This passed, but a subsequent upgrade of dnsmasq-base (only) from disco-proposed reproduced the failure.

I isolated by:

Modifying the dep8 test to allow isolation (https://code.launchpad.net/~racb/network-manager/+git/network-manager/+merge/359837)

Running "sudo python3 -m unittest nm.ColdplugWifi.test_open_b_ip6_raonly_no_pe" from debian/tests/ by hand.

I found that after a test failure a reboot of the VM was necessary to cause the tests to work again. Swapping between the release pocket and proposed pocket version of dnsmasq-base mostly caused the failure to come and go as expected. However I believe that one time I saw the test pass against dnsmasq-base from proposed, which is why I think it's a race condition.

To get to the actual failure reason, I applied the follow patch:

--- a/debian/tests/nm.py
+++ b/debian/tests/nm.py
@@ -532,7 +538,6 @@ wpa_passphrase=12345678
     # connections and such); as it is very brittle and hard to track down
     # all remaining references to any NM* object after a test, we rather
     # run each test in a separate subprocess
-    @network_test_base.run_in_subprocess
     def do_test(self, hostapd_conf, ipv6_mode, expected_max_bitrate,
                 secret=None, ip6_privacy=None):
         '''Actual test code, parameterized for the particular test case'''

This gave me the following output:

======================================================================
FAIL: test_open_b_ip6_raonly_no_pe (nm.ColdplugWifi)
Open network, 802.11b, IPv6 with only RA, PE disabled
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/network-manager/debian/tests/nm.py", line 165, in shutdown_connections
    self.assert_iface_down(self.dev_e_client)
  File "/home/ubuntu/network-manager/debian/tests/nm.py", line 195, in assert_iface_down
    self.assertIn('state DOWN', out)
AssertionError: 'state DOWN' not found in '4: eth42@veth42: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000\n    link/ether 1e:64:0d:9f:da:2e brd ff:ff:ff:ff:ff:ff\n'

----------------------------------------------------------------------
Ran 1 test in 35.639s

FAILED (failures=1)

The following didn't fix the problem:

--- a/debian/tests/nm.py
+++ b/debian/tests/nm.py
@@ -187,12 +187,18 @@ class NetworkManagerTest(network_test_base.NetworkTestBase):
         else:
             self.fail(message or 'timed out waiting for ' + str(condition))
 
+    @staticmethod
+    def iface_is_down(iface):
+        out = subprocess.check_output(['ip', 'a', 'show', 'dev', iface],
+                                      universal_newlines=True)
+        return 'state DOWN' in out
+
     def assert_iface_down(self, iface):
         '''Assert that client interface is down'''
 
+        self.assertEventually(lambda: self.iface_is_down(iface))
         out = subprocess.check_output(['ip', 'a', 'show', 'dev', iface],
                                       universal_newlines=True)
-        self.assertIn('state DOWN', out)
         self.assertNotIn('inet 192', out)
         self.assertNotIn('inet6 2600', out)

So I think the race is causing some change prior to this point.

I don't think it's clear that this is definitely not a dnsmasq regression, but it seems more likely to me to be an issue in n-m or its tests. I don't think it'd be useful for me to learn n-m internals wearing my server team hat to fix this since the server team doesn't get much involved in n-m, so I'm leaving this to the desktop team for now. Please let me know though if you disagree with my assessment or if it starts looking like a dnsmasq regression, and I'll be happy to look again.

Revision history for this message

Sebastien Bacher (seb128) wrote on 2018-11-29:

#2

Thanks Robie!

Changed in network-manager (Ubuntu):
importance:	Undecided → High
tags:	added: rls-dd-incoming

Will Cooke (willcooke) on 2018-12-04

Changed in network-manager (Ubuntu Disco):
assignee:	nobody → Iain Lane (laney)

Revision history for this message

Sebastien Bacher (seb128) wrote on 2018-12-19:

#3

IRC comment from upstream on #nm in case it's useful

'not that `nmcli device disconnect $DEV` and `nmcli connection down $PROFILE` do not set the interface down (like `ip link set $IF down`). Instead, the ~logically~ disconnect the interface, which essentially means that `nmcli device` reports that the device is disconnected and that there is no IP configuration on the device.

asserting that the interface is down, may just check the wrong thing.'

Revision history for this message

Iain Lane (laney) wrote on 2018-12-20:

#4

After looking for a while, I think that the error we were managing to reproduce (LOWERLAYERDOWN) was probably a red herring, and we were reproducing some other (valid) bug. The original cause of the testsuite failure still exists.

Not entirely sure why but this only reproduces for me 'properly' when I use the real autopkgtest cloud. It's quite reliable there, so I ran a bisect and it resulted in this mail:

http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2018q4/012709.html

I freely admit my relative ignorance here, so if anyone can help please feel free to chip in.

Revision history for this message

Iain Lane (laney) wrote on 2019-01-07:

#5

I did some more poking this morning and followed up to the thread. A change in 2.80 has made this start happening. I'm hoping that someone upstream will understand what's going on enough to propose a fix.

Jeremy Bícha (jbicha) on 2019-01-08

tags:

added: disco
removed: rls-dd-incoming

Revision history for this message

Launchpad Janitor (janitor) wrote on 2019-01-08:

#6

This bug was fixed in the package network-manager - 1.12.6-0ubuntu3

---------------
network-manager (1.12.6-0ubuntu3) disco; urgency=medium

  * debian/tests/nm.py: Make assert_iface_down() not check the interface's
    state. We call nmclient.deactivate_connection() to terminate connections
    that the testsuite sets up, and according to upstream this is not
    guaranteed to do anything in particular to the link state. It seems that
    dnsmasq 2.80 somehow alters the previous assumption that it would be
    'state DOWN', so the implementation detail we were checking previously no
    longer holds. The testsuite does still check that the IPs are removed from
    the interface, which is logically what we want anyway. (LP: #1805857)
  * debian/control: Fix Vcs-Git

-- Iain Lane <email address hidden> Wed, 19 Dec 2018 13:12:58 +0000

Changed in network-manager (Ubuntu Disco):
status:	Triaged → Fix Released

Revision history for this message

Iain Lane (laney) wrote on 2019-01-14:

#7

Fix Released for dnsmasq (because of the revert), but we should still keep an eye on the upstream thread.

Changed in dnsmasq (Ubuntu Disco):
status:	New → Fix Released

Ubuntu
network-manager package

network-manager dep8 failure blocks dnsmasq proposed migration

Bug Description

Other bug subscribers

Remote bug watches

	Status	Importance	Assigned to
dnsmasq (Ubuntu)	Fix Released	Undecided	Unassigned
Disco	Fix Released	Undecided	Unassigned
network-manager (Ubuntu)	Fix Released	High	Iain Lane
Disco	Fix Released	High	Iain Lane

Ubuntunetwork-manager package

network-manager dep8 failure blocks dnsmasq proposed migration

Bug Description

Other bug subscribers

Remote bug watches

Ubuntu
network-manager package