Network indicator lists the non-exist AP (timeout for the AP to be removed is too big, ~6min)

Bug #1425172 reported by Ricardo Salveti on 2015-02-24
This bug affects 12 people
Affects | Importance | Assigned to
Canonical System Image | High | John McAleely
indicator-network (Ubuntu) | Undecided | Pete Woods
network-manager (Ubuntu) | High | Tony Espy
network-manager (Ubuntu RTM) | High | Tony Espy
ubuntu-system-settings (Ubuntu) | Undecided | Unassigned

Bug Description

Summary: Network indicator lists the non-exist AP
Steps to reproduce:
1. Boot to system
2. Scroll down the Network indicator
3. It lists about 10 APs (in the Taipei office)
4. Go to another place and check network indicator again

Expected Result:
It should not list nonexistent APs; it should show only the available APs.

Actual Result:
It shows about 12 APs on the screen, but only two are real APs available for connecting; the other 10 are left over from the previous list.

This is reproducible on mako/krillin on both RTM and vivid.

The main issue is that the timeout for removing an AP from the known AP list is too long compared with other phones.


Ricardo Salveti (rsalveti) wrote :

From previous conversations:

Pat McGowan (pat-mcgowan) wrote on 2015-02-12:
Yes, I currently have some 50 APs in the list both on command line and the indicator
It's as if the phone was picking them up while I was driving

Mathieu Trudel-Lapierre (mathieu-tl) wrote on 2015-02-13:
Scan list updates aren't instantaneous; they depend on whether another scan has happened and whether the network is still in range. And when moving fast through APs, as I recall, they will also have to time out from the list.
How long in between tests from the command-line? What is the distance between the two locations?

Marcus Tomlinson (marcustomlinson) wrote on 2015-02-16:
I tested this by turning on and off a WiFi hotspot a few times and observing both the output of "nmcli d wifi list" (executed every ±30s), and the list displayed in the Network Indicator.
Over 4 test cycles I observed the following:
When turning on the hotspot it takes about 1min (maybe 2min) for both nm and the indicator to display the new AP.
When turning off the hotspot, nm and the indicator took anywhere between 5-10min to recognise that the AP had disappeared.
AFAICT signal strength updates for APs on the list (other than the currently connected AP) also lag by this 5-10min delay.
Seeing that the indicator is always in sync with Network Manager, these delays are clearly not caused by the Network Indicator, but by Network Manager itself (as Mathieu mentioned).
So yes, Network Manager does recognise when an AP is no longer in range or drops in signal strength, but personally I find the delay far too long (which I think is the concern expressed here).

John McAleely (john.mcaleely) wrote 50 seconds ago:
Anecdotally, going on a bus journey in London (or a train journey) will produce a long list of garbage APs that are not interesting when you finally sit down near a hotspot you want to join. It's not an issue on 'other' OSes.
Seen on RTM-based handsets too.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in network-manager (Ubuntu):
status: New → Confirmed
Changed in network-manager (Ubuntu RTM):
status: New → Confirmed
Changed in network-manager (Ubuntu):
importance: Undecided → High
Changed in network-manager (Ubuntu RTM):
importance: Undecided → High
Changed in network-manager (Ubuntu):
assignee: nobody → Canonical Phone Foundations (canonical-phonedations-team)
Changed in network-manager (Ubuntu RTM):
assignee: nobody → Canonical Phone Foundations (canonical-phonedations-team)
Changed in canonical-devices-system-image:
status: New → Confirmed
importance: Undecided → High
milestone: none → ww11-2015
Pat McGowan (pat-mcgowan) wrote :

This will be a blocker, still seeing in build 127
The list is never pruned. A reboot resets the list of APs.
This is definitely Vivid only behavior. The size of the AP list also makes me wonder if we are actually suspending the phone.

Changed in canonical-devices-system-image:
milestone: ww11-2015 → ww13-2015
assignee: nobody → Michael Frey (mfrey)
assignee: Michael Frey (mfrey) → nobody
importance: High → Critical
Changed in canonical-devices-system-image:
assignee: nobody → Michael Frey (mfrey)
Changed in canonical-devices-system-image:
assignee: Michael Frey (mfrey) → Canonical Phone Foundations (canonical-phonedations-team)
tags: added: connectivity
tags: added: lt-blocker lt-category-visible
summary: - Network indicator lists the non-exist AP
+ Network indicator lists the non-exist AP (timeout for the AP to be
+ removed is too big)
description: updated
summary: Network indicator lists the non-exist AP (timeout for the AP to be
- removed is too big)
+ removed is too big, ~6min)

Wireless APs are removed after a while, based on whether we receive an AP removal signal from the supplicant, or if not, after a reasonable amount of time. This time is currently defined as three times the maximum scan interval (2 min), so after 6 minutes of the AP not changing strength or other properties to cause an update of its last-seen value, and not otherwise being explicitly removed by the supplicant. This is because we can miss signals from the supplicant for a variety of reasons, and also because APs might not be updating in a noticeable way for a long period of time between explicit scans, in which case it would also be wrong to remove them from the list.

Here it's important to also report findings from 'nmcli dev wifi list' as opposed to the indicator if it's an issue for network-manager, as the indicator may have further logic that changes the behavior of the AP list.

Now, I don't recall seeing such issues on desktop in any noticeable way, so it would definitely be important to check whether it's also an issue there (and it should be if the problem is in NM, since we use the same NM, the same supplicant, and the same code for handling APs on all systems).

One thing this could be however is a driver issue. Drivers are entitled to report access-points the way they want, and if there is an internal timeout to removing APs from scan results (or the driver not actually scanning when we ask it to), then this could appear. In this case it does remain relevant to know how desktop compares to the phone.

Tony Espy (awe) wrote :

I tested krillin #20 ( ubuntu-touch/ubuntu-rtm/14.09 ) and it took ~6:30 for my Android hotspot to disappear from the scan list ( using 'nmcli d list | grep Android' ).

I tested my desktop ( Macair w/Intel WiFi ) running an up-to-date utopic image and it took 6m for an AP to disappear the first try, and ~7:30m the second time! network-manager version is: network-manager 0.9.8.8-0ubuntu28.

I tested arale #159 ( ubuntu-touch/vivid-proposed ), which includes the latest network-manager in vivid, and it took ~6:30 for my Android hotspot to disappear from the scan list ( this time using 'nmcli d wifi list' as nmcli changed with 0.9.10 ). On a side note, it appears that arale doesn't do any background scanning while associated to an access point. I had to forget my current network before I could pick up a new iPhone or Android hotspot in the scan list. I'll enter a new bug for this.

Ricardo Salveti (rsalveti) wrote :

Just to compare, on both iPhone and Android the AP disappears in no more than 30 seconds.

Tony Espy (awe) wrote :

Strike my previous comment about arale not picking up a newly activated hotspot. This seems to work now.

Tony Espy (awe) wrote :

Note, I tested OS X, and it also wipes APs from the list in <= 30s. This was tested using a hotspot on another phone.

Tony Espy (awe) wrote :

I installed network-manager 0.9.10.0-4ubuntu14~mtrudel1 from ppa:mathieu-tl/nv-build on my krillin running vivid-proposed, and I can no longer see a hotspot from an iPhone when it's enabled.

I can see it from my desktop, and from krillin running RTM.

Tony Espy (awe) on 2015-04-09
Changed in indicator-network (Ubuntu):
status: New → Confirmed
Changed in ubuntu-system-settings (Ubuntu):
status: New → Confirmed
Tony Espy (awe) wrote :

I installed network-manager 0.9.10.0-4ubuntu14~mtrudel2 from ppa:mathieu-tl/nv-build on my mako ( vivid-proposed / #166 ) and on arale ( vivid-proposed / #165 ).

Both phones have problems recognizing an iPhone 6 hotspot ( security==WPA2). On boot, it gets recognized maybe 50% of the time. So I fired up a Nexus5 hotspot ( security==none ), and on boot, both devices always recognize it.

That said, neither properly ages it out of the network-indicator menu. I ran the same test on both devices, and after 15m, I still see 'AndroidAP' in the access point results returned by 'nmcli d wifi list'. Note, arale was associated to another AP at the time, whereas mako was not.

Also, neither mako nor arale seems to dynamically detect the Android AP after boot ( i.e. boot the device with WiFi enabled, then activate the hotspot ). Again, I waited >10m on both devices and never saw the hotspot.

Sounds to me like the best course of action will be for the client apps (i.e. indicator-network) to use the "last-seen" property on access-point objects to make their own decisions about which access points to show. Provided the client also regularly requests scans from NM (which it can do up to about once every 10 seconds), it should be possible to aggressively remove older access points from the scan list that indicator-network shows. For instance, if scans are requested every 30 seconds, it's probably fine to not show access points older than slightly more than 30 seconds (just in case an AP misses *one and only one* scan).
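
As an illustration of the client-side filtering suggested above, here is a minimal Python sketch. It is not indicator-network code: the `AccessPoint` record, the `visible_access_points` helper and the 5-second grace margin are hypothetical, and it assumes the "last-seen" value is a timestamp in seconds on the same clock as `now`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPoint:
    ssid: str
    last_seen: float  # seconds, same clock as `now` (assumption)


def visible_access_points(aps, now, scan_interval=30.0, grace=5.0):
    """Keep only the APs seen within one scan interval plus a small grace margin."""
    max_age = scan_interval + grace
    return [ap for ap in aps if now - ap.last_seen <= max_age]


aps = [
    AccessPoint("office", last_seen=995.0),       # seen 5 s ago, kept
    AccessPoint("bus-hotspot", last_seen=600.0),  # seen 400 s ago, hidden
]
print([ap.ssid for ap in visible_access_points(aps, now=1000.0)])
```

With scans requested every 30 seconds, this hides any AP whose last-seen value is more than 35 seconds old, regardless of whether NM has pruned it yet.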

Tony Espy (awe) wrote :

I'm changing the system-settings task to Incomplete.

It was explained to me that it just uses the indicator's list of APs, so in theory there shouldn't be any extra work required. That said, Mathieu's comment about the indicator scan requests makes me think that maybe the settings page would also need to trigger these scan requests too?

Changed in ubuntu-system-settings (Ubuntu):
status: Confirmed → Incomplete
Changed in indicator-network (Ubuntu):
assignee: nobody → Pete Woods (pete-woods)
importance: Undecided → High
Tony Espy (awe) wrote :

@Mathieu

I think we're in agreement that an indicator fix is required.

That said, there's still the underlying problem of the APs never going away on arale and mako ( vivid-proposed ) that needs to be addressed. I will try and test krillin ( vivid-proposed ) later today.

Changed in canonical-devices-system-image:
milestone: ww13-2015 → ww17-2015
Pete Woods (pete-woods) wrote :

The hardest part of this for the indicator is knowing exactly when to start doing extra scans.

The indicator-network service has no idea when the indicator menu itself is actually opened. For this to work both unity8 and qmenumodel would need to be modified to support the "submenu-action" convention. (https://bugs.launchpad.net/ubuntu/+source/unity8/+bug/1398888)

Adding a filtered model between the raw APs and the current APs based on the last-seen property would be reasonably straight-forward.

My main objection would really be that there should really be a consistent strategy for AP aging across all clients. So, e.g. unity7 and unity8 will have the same behaviour.

Thomas Voß (thomas-voss) wrote :

I looked into the Android approach, and it turns out that they rely solely on the last scan results to populate the settings panel. Please see https://docs.google.com/document/d/1DTvgpnJ37PvdHt_L65xtj4wdCnBfo8kAaUcBIp5c1U4/edit for a summary together with links to the relevant Java code.

Pete Woods (pete-woods) wrote :

I would propose a strategy of having the indicator starting a scan each time it is opened (it doesn't do that now).

Then having network manager be responsible for the actual aging, possibly through a new property on the main object called "recent access points" or similar, or by reducing the main list of access points.

Changed in canonical-devices-system-image:
milestone: ww17-2015 → ww21-2015
Changed in canonical-devices-system-image:
milestone: ww21-2015 → ww22-2015
Changed in canonical-devices-system-image:
importance: Critical → High
Matthew Paul Thomas (mpt) wrote :

Pete's suggestion is also what OS X does: whenever you open the menu, it triggers a scan, with the first menu item saying "Wi-Fi: Looking for networks…" until the scan completes. When I suggested the same for Ubuntu a couple of years ago, I was told (by Mathieu, I think?) that it wouldn't apply to Network Manager because it is constantly scanning in the background. Is that still the case, or did I misremember, or has the behavior changed since?

Changed in canonical-devices-system-image:
milestone: ww22-2015 → ww24-2015
Changed in indicator-network (Ubuntu):
status: Confirmed → Incomplete
assignee: Pete Woods (pete-woods) → nobody
importance: High → Undecided
Jonas G. Drange (jonas-drange) wrote :

Will changing the behaviour in the indicator also mean that the behaviour in System Settings will change? Is opening System Settings equivalent to opening the network indicator?

tags: removed: lt-blocker lt-category-visible
Changed in indicator-network (Ubuntu):
status: Incomplete → Confirmed
assignee: nobody → Pete Woods (pete-woods)
Changed in canonical-devices-system-image:
milestone: ww24-2015 → ww34-2015
Changed in canonical-devices-system-image:
assignee: Canonical Phone Foundations (canonical-phonedations-team) → John McAleely (john.mcaleely)
Changed in network-manager (Ubuntu RTM):
status: Confirmed → Invalid
assignee: Canonical Phone Foundations (canonical-phonedations-team) → nobody
Matthew Paul Thomas (mpt) wrote :

Jonas, it's possible that the answer to your question is "no, because bug 1467438 isn't fixed yet".

Tony Espy (awe) wrote :

Re-tested OTA5 images on arale, mako, and krillin.

Two scenarios:

1. Device connected to an AP
   - Android hotspot enabled, wait till it shows in the scan list
   - hotspot disabled, timer started
   - note time when hotspot disappears from scan list

2. Device ! connected to an AP; same steps as above.
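
The timing step in these scenarios could be automated along the following lines. This is only a sketch under assumptions, not the procedure actually used for the results below: the helper names are hypothetical, it assumes an nmcli of 0.9.10 or later (where 'nmcli d wifi list' is available) with the terse `-t` and field-selection `-f` options, and it ignores the backslash escaping that terse output applies to special characters in SSIDs.

```python
import subprocess
import time


def nmcli_ssids():
    """Return the set of SSIDs NetworkManager currently reports (assumes nmcli >= 0.9.10)."""
    out = subprocess.run(
        ["nmcli", "-t", "-f", "SSID", "dev", "wifi", "list"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line for line in out.splitlines() if line}


def seconds_until_gone(ssid, list_ssids=nmcli_ssids, poll=5.0,
                       timeout=900.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until `ssid` leaves the scan list; return elapsed seconds, or None on timeout."""
    start = clock()
    while clock() - start < timeout:
        if ssid not in list_ssids():
            return clock() - start
        sleep(poll)
    return None

# Example (run right after disabling the hotspot):
#   seconds_until_gone("AndroidAP")
```

The default 15-minute timeout mirrors the longest wait mentioned in this report; the `list_ssids`, `clock` and `sleep` parameters exist only so the loop can be exercised without a real device.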

I ran the scenarios three times each on all of the devices. The results look pretty similar across all devices. For the most part, it takes ~6:30m for the hotspot to disappear from the list of APs exported by NM. In one case ( mako / connected ), the AP was never removed ( I waited more than 15m ). If I can reproduce this again, it'll require some more investigation.

That said, I also discovered the reason all the devices take approximately the same time to remove the AP. The NMDeviceWiFi class is responsible for aging out access points. It uses a routine called cull_scan_list, which is invoked when a scan completes, or when NM gets notified of a BSS update ( signal strength ) or a BSS removal. This routine will only remove an AP from the list when the time since it was last seen is >= 3 * MAX_SCAN_INTERVAL ( 2m ). This is contrary to Mathieu's assertion in comment #5. When a bss_removed signal is received from wpa_supplicant, a WPAS_REMOVED property is set on the GObject, but the AP is *not* removed from the AP list till the prune_interval ( 6m ) is exceeded.
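
To make the rule concrete, the behaviour described above can be modelled in a few lines of Python. This is a simplified sketch, not NetworkManager's actual C code: the `AccessPoint` record and its field names are hypothetical stand-ins, and only the two constants come from the description.

```python
from dataclasses import dataclass

MAX_SCAN_INTERVAL = 120                 # seconds (2 min)
PRUNE_INTERVAL = 3 * MAX_SCAN_INTERVAL  # 360 s = 6 min


@dataclass
class AccessPoint:
    ssid: str
    last_seen: float            # seconds
    wpas_removed: bool = False  # set when the supplicant reports the BSS as removed


def cull_scan_list(aps, now):
    """Model of the described rule: only age decides; the removed flag is ignored."""
    return [ap for ap in aps if now - ap.last_seen < PRUNE_INTERVAL]


hotspot = AccessPoint("AndroidAP", last_seen=0.0, wpas_removed=True)
print(len(cull_scan_list([hotspot], now=130.0)))  # still listed after the supplicant dropped it
print(len(cull_scan_list([hotspot], now=360.0)))  # pruned once 6 minutes have passed
```

Because the routine only runs when a scan completes or a BSS event arrives, the actual removal would land at the first such event after the 6-minute mark, which may account for the ~6:30 readings.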

I was able to verify this by running wpa_supplicant's command line tool ( 'wpa_cli scan_results | grep Android' ) in addition to nmcli. After 2m, the AP disappears from wpa_supplicant's scan results, but not NM's.

I didn't actually verify the receipt of the 'bss_removed' signal in NM, and will do so this afternoon.

Tony Espy (awe) wrote :

@Matthew

Regarding comment #18. Yes, NM does regular scans automatically. Over time, if it's not able to connect to an AP, the scan interval is gradually increased up to a maximum of 2m between scans. A long time ago, there used to be a method that nm-applet would use to tell NM that the user was interacting with the UI, which would in turn cause the scan_interval to revert back to its minimum ( note, this may have just happened when the GetAccessPoints method was invoked ), but at some point a RequestScan method was added. I'm not sure if there's a trigger to cause the scan_interval to revert to minimum anymore.

We could add logic to the indicator to request a scan, however there's no way I can see to easily coordinate this with the current NM auto-scan logic. NMDeviceWiFi doesn't appear to export its internal 'Scanning' property, nor does it export the current scan interval.

It may be that we can change the cull_scan_list function to remove the AP when wpa_supplicant indicates that a BSS has been removed ( although this might cause issues with roaming configurations, as there might be other BSSes in the same SSID group ), or we could lower the prune_scan_interval... Fixing this logic is probably the most promising short-term fix. Long term, getting the removal to approach that of OS X or Android ( <= 30s ) will definitely involve the indicator in some way...

Tony Espy (awe) wrote :

The internal NMDeviceWiFi scan logic works like this:

Its minimum interval is 20s. It's set to this when the device first becomes ready or the device_state -> DISCONNECTED.

If the supplicant dies, it's set to 23s.

If an activation succeeds, the interval is set to 26s.

Every time schedule_scan is called and the prior request to scan succeeded, the interval is incremented by 10s or 20s ( depends on the current device_state ), up to a maximum of 120s.
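
The back-off described above can be sketched as follows. This is an illustrative Python model rather than the NMDeviceWiFi code: the function names are hypothetical, and since the comment does not say which device_state uses which increment, the step is simply passed in as a parameter.

```python
SCAN_INTERVAL_MIN = 20   # seconds
SCAN_INTERVAL_MAX = 120  # seconds


def next_scan_interval(current, step):
    """Back off after a successful scan request; `step` is 10 or 20 depending on device state."""
    return min(current + step, SCAN_INTERVAL_MAX)


def backoff_schedule(start=SCAN_INTERVAL_MIN, step=10):
    """Return the sequence of intervals from `start` until the maximum is reached."""
    intervals = [start]
    while intervals[-1] < SCAN_INTERVAL_MAX:
        intervals.append(next_scan_interval(intervals[-1], step))
    return intervals


print(backoff_schedule(step=10))            # 20, 30, ... up to 120
print(backoff_schedule(step=20))            # 20, 40, ... up to 120
print(backoff_schedule(start=26, step=20))  # starting from the post-activation value
```

With a 20s step the interval reaches the 120s ceiling after five successful scan requests; with a 10s step it takes ten.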

Tony Espy (awe) wrote :

I suspect there may be deeper issues with scanning. Bug #1445134 reports that NM 0.9.10.0 is not periodically scanning.

I'm seeing the same thing on krillin and mako running the latest stable ( OTA5 ) images:

If I start wpa_cli ( as root ) on a system running utopic ( NM 0.9.8.x ), I see periodic scan events reported:

<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS

I also can see the updated scan results using the wpa_cli 'scan_results' command.

If I perform the same test on krillin or mako ( NM 0.9.10.0 ), I don't see any periodic scan happening.

Tony Espy (awe) on 2015-08-07
Changed in indicator-network (Ubuntu):
status: Confirmed → Incomplete
Changed in network-manager (Ubuntu RTM):
status: Invalid → In Progress
Changed in network-manager (Ubuntu):
status: Confirmed → In Progress
assignee: Canonical Phone Foundations (canonical-phonedations-team) → Tony Espy (awe)
Changed in network-manager (Ubuntu RTM):
assignee: nobody → Tony Espy (awe)
Tony Espy (awe) on 2015-08-11
Changed in network-manager (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package network-manager - 0.9.10.0-4ubuntu23

---------------
network-manager (0.9.10.0-4ubuntu23) wily; urgency=medium

  [ Tony Espy ]
  * d/p/lp1444162-add-ip6-config-to-nm-ofono-connections.patch: disable
    IPv6 for ofono connections (LP: #1444162).

 -- Mathieu Trudel-Lapierre <email address hidden> Wed, 12 Aug 2015 15:45:38 -0400

Changed in network-manager (Ubuntu):
status: Fix Committed → Fix Released
Tony Espy (awe) wrote :

FixReleased as version 0.9.10.0-4ubuntu15.1.7 to ppa:ci-train-ppa-service/stable-phone-overlay.

Changed in network-manager (Ubuntu RTM):
status: In Progress → Fix Released
Tony Espy (awe) wrote :

Changing Canonical System Image task to FixCommitted as 0.9.10.0-4ubuntu15.1.7 has landed in the latest rc-proposed images ( confirmed on krillin / rc-proposed / #102 which contains ubuntu version '20150818.1' ).

Changed in canonical-devices-system-image:
status: Confirmed → Fix Committed
Changed in canonical-devices-system-image:
status: Fix Committed → Fix Released
Pete Woods (pete-woods) on 2017-11-14
Changed in indicator-network (Ubuntu):
status: Incomplete → Invalid