Comment 5 for bug 2004659

Dan Bungert (dbungert) wrote : Re: failed to install efi packages, not in pool, bad apt config

Summary of problem on Subiquity side:

TLDR: we want to patch subiquity to read the list of default routes when we receive a route_change event

* when we believe that we are offline, we intentionally limit the scope of where we look for packages to just the cdrom; this part is working fine
* when we are actually online but act as if we are offline (has_network == False), that is incorrect behavior
* has_network is derived from the existence of default routes, so if the default_route info is not correct, has_network can end up in the wrong state
* subiquity receives route_change events from probert
* probert builds route_change events from libnl-route-3
* per https://github.com/thom311/libnl/issues/226#issuecomment-527888667, libnl-route-3 route events are hashed using certain keys, and metric is not among those keys
* on live-server, which is not using network manager, adjusting the link state to active / inactive will cause a single route event to be generated, and subiquity can rely on that to know if there is a default route or not
* on desktop, which is using network manager, a different sequence of network events is generated. `ip monitor route|grep default` is helpful here. Using `ip monitor`, we can see the following sequence:

link goes active - ip monitor route reports 3 events
1) default route added at metric 20100 for interface X for family Y
2) default route added at metric 100 for interface X for family Y
3) default route deleted at metric 20100 for interface X for family Y

libnl-route-3 reports only 2 events
- default route added for interface X for family Y
- default route deleted for interface X for family Y

* subiquity concludes that interface X has no default route (more precisely, it sees a default route for only a few milliseconds). The unintended coalescing of the two "add" events, which differ only in metric, means the events alone don't carry enough information for subiquity to know conclusively that there is a default route.
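To make the coalescing concrete, here is a small Python model of the sequence above. It is a simplified sketch, not libnl's actual cache code: the hypothetical `emitted_events` assumes that an "add" for an object already in the cache produces no event, which reproduces the 3-in / 2-out behavior described above. The `RouteEvent` type and all function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteEvent:
    action: str  # "add" or "del"
    ifname: str
    family: str
    dst: str     # "default" for a default route
    metric: int


# Kernel-level sequence seen with `ip monitor route` when NetworkManager brings link X up.
KERNEL_EVENTS = [
    RouteEvent("add", "X", "inet", "default", 20100),
    RouteEvent("add", "X", "inet", "default", 100),
    RouteEvent("del", "X", "inet", "default", 20100),
]


def cache_key(ev: RouteEvent, include_metric: bool) -> tuple:
    """Identity used to decide whether two route events refer to the same object."""
    key = (ev.ifname, ev.family, ev.dst)
    return key + (ev.metric,) if include_metric else key


def emitted_events(events, include_metric: bool):
    """Simplified cache model: report only adds of new objects and dels of known ones."""
    cache = set()
    out = []
    for ev in events:
        key = cache_key(ev, include_metric)
        if ev.action == "add" and key not in cache:
            cache.add(key)
            out.append(ev)
        elif ev.action == "del" and key in cache:
            cache.discard(key)
            out.append(ev)
    return out


def track_default_routes(events, include_metric: bool) -> set:
    """Naively maintain the set of default routes from the events alone."""
    routes = set()
    for ev in events:
        if ev.dst != "default":
            continue
        key = cache_key(ev, include_metric)
        if ev.action == "add":
            routes.add(key)
        else:
            routes.discard(key)
    return routes


if __name__ == "__main__":
    for include_metric in (False, True):
        seen = emitted_events(KERNEL_EVENTS, include_metric)
        routes = track_default_routes(seen, include_metric)
        print(f"metric in key={include_metric}: {len(seen)} events, has_network={bool(routes)}")
    # prints:
    # metric in key=False: 2 events, has_network=False
    # metric in key=True: 3 events, has_network=True
```

With metric left out of the key, the route with metric 100 is swallowed by the still-cached 20100 route, and the final delete then removes the only default route the subscriber knows about.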

In summary we need to use these events as a trigger to read the default routes instead of using the events directly to maintain the list.
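A minimal sketch of the "event as trigger" approach, assuming the current default routes are re-read from the kernel's IPv4 routing table in `/proc/net/route` (IPv6 would additionally need `/proc/net/ipv6_route` or a netlink dump, not shown). `NetworkState`, `on_route_change` and `parse_ipv4_default_routes` are hypothetical names for illustration, not subiquity's actual code or the fix that was merged.

```python
from typing import Callable, Set

RTF_UP = 0x0001  # route flag: route is usable

# Example /proc/net/route contents: one default route and one on-link subnet route.
SAMPLE = (
    "Iface\tDestination\tGateway \tFlags\tRefCnt\tUse\tMetric\tMask\t\tMTU\tWindow\tIRTT\n"
    "enp1s0\t00000000\t0101A8C0\t0003\t0\t0\t100\t00000000\t0\t0\t0\n"
    "enp1s0\t0001A8C0\t00000000\t0001\t0\t0\t100\t00FFFFFF\t0\t0\t0\n"
)


def parse_ipv4_default_routes(text: str) -> Set[str]:
    """Return the interfaces that currently have an IPv4 default route (any metric)."""
    ifaces = set()
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, flags, mask = fields[0], fields[1], int(fields[3], 16), fields[7]
        if dest == "00000000" and mask == "00000000" and flags & RTF_UP:
            ifaces.add(iface)
    return ifaces


def read_ipv4_default_routes(path: str = "/proc/net/route") -> Set[str]:
    with open(path) as f:
        return parse_ipv4_default_routes(f.read())


class NetworkState:
    """Hypothetical holder for has_network; not subiquity's real class."""

    def __init__(self, read_default_routes: Callable[[], Set[str]] = read_ipv4_default_routes):
        self._read_default_routes = read_default_routes
        self.default_routes: Set[str] = set()

    @property
    def has_network(self) -> bool:
        return bool(self.default_routes)

    def on_route_change(self, action: str, data: dict) -> None:
        # Ignore what the event claims changed; re-read the authoritative table instead,
        # so coalesced or reordered events cannot leave the state stale.
        self.default_routes = self._read_default_routes()


if __name__ == "__main__":
    state = NetworkState(lambda: parse_ipv4_default_routes(SAMPLE))
    state.on_route_change("del", {})  # even a "delete" event yields the correct state
    print(state.has_network)  # prints True
```

Because the handler never trusts the event payload, the outcome no longer depends on which events libnl-route-3 chose to merge or in what order they arrived.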

Some notes from testing
* why do VMs work better - I believe these VMs usually or always have IPv6 in them, which causes a different set of route events that happens to produce a correct has_network state more by luck than anything
* why does server work better - a simpler set of route events, with no coalescing hiding one of the ones we need
* why do physical installs of desktop work only sometimes, and only on some hardware - different ordering of events from libnl-route-3