udev needs to be triggered if rules are written to a new file

Bug #1817368 reported by Robert Schweikert on 2019-02-22
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
cloud-init
Undecided
Robert Schweikert

Bug Description

On a VM with many interfaces it is possible that udev is slow. This may lead to the situation that 70-persistent-net.rules is being written to by cloud-init while also being written to by the net rules generator (75-persistent-net-generator.rules). This may lead to interleaving, duplicate definitions or incomplete data.

The user can already specify an optional rules file with the "netrules_path". However this will not have any effect when the file gets written as udev is not re-triggered. When the user configures a custom file (which logically should be something like 71-persistent-net.rules) cloud-init should trigger udev to add any new rules the code might have written.

Changed in cloud-init:
assignee: nobody → Robert Schweikert (rjschwei)
Changed in cloud-init:
status: New → In Progress
Ryan Harper (raharper) wrote :

I think we're in conflict with 75-persistent-net-generator.rules; as both cloud-init and that rule are attempting to map a MAC and interface name. This is going to be trouble, as you've seen.

Even if we sorted the race, if users may be confused as to the two files, or the interleaved values etc.

Does the image have systemd and persistent net naming enabled?

In cases like this, cloud-init has introduced logic to check if an in-image file is present and renames, or deletes it if it's taking an expected action (like generating persistent net rules for network config it has discovered).

Changed in cloud-init:
status: In Progress → Incomplete
Robert Schweikert (rjschwei) wrote :

Well, the way I see, based on investigation and input from others at SUSE we have 2 options. We can set the default file name for the netrules_path to something safe (80-pesistent-netrules-cloud-init.rules) for example. That would avoid the issue, the cloud-init generated rules would be last and thus the user supplied network setup would be applied. If we choose this path then IMHO netrules_path should no longer be configurable.

If we stay with the current intend, i.e. netrules_path is configurable then we want the updates in MP.

I have no strong opinion either way.

The image used that created this problem does have persistent names and systemd, yes. It also has a "large" number of network interfaces.

The way the systemd dependencies are setup today cloud-init-local.service and systemd-udevd.service can run at the same time and both write to 70-persistent-net.rules so there will always be a race. The third option is the cloud-init-local.service gets an

"After=systemd-udevd.service"

but I am not certain that will not create a cycle as cloud-init-local.service also has a

"Before=network-pre.target"

If we can avoid cycles, then the best solution, IMHO, is the change in order in cloud-init-local.service.

Ryan Harper (raharper) wrote :

I'm still a little confused around the presence of 75-persistent-net-generator.rules AND having systemd creating persistent nic names. That's going to cause problems unless it's always booted with net.ifnames=0.

Putting that aside, I do think using a higher number (as you suggest 80-persistent-net-cloud-init.rules) is a good fix which avoids the race by writing to different files altogether.

I don't think we need to prevent netrules_path from being configured; in Ubuntu Xenial, we still utilize our on known value. You can be the path in your Distro class as part of the renderer config.

% git diff cloudinit/distros/opensuse.py
diff --git a/cloudinit/distros/opensuse.py b/cloudinit/distros/opensuse.py
index 1bfe047..b48d40e 100644
--- a/cloudinit/distros/opensuse.py
+++ b/cloudinit/distros/opensuse.py
@@ -36,6 +36,8 @@ class Distro(distros.Distro):
     tz_local_fn = '/etc/localtime'
     renderer_configs = {
         'sysconfig': {
+ 'netrules_path':
+ 'etc/udev/rules.d/80-persistent-net-cloud-init.rules',
             'control': 'etc/sysconfig/network/config',
             'iface_templates': '%(base)s/network/ifcfg-%(name)s',
             'route_templates': {

Robert Schweikert (rjschwei) wrote :

I have updated the code. The current implementation works in the customer case. Also should not call settle in general, only when we get into the situation where things are "racy".

But I am having difficulties sorting out the tests and could use some help. Ryan, Dan, can you help me with the tests?

Thanks

Robert Schweikert (rjschwei) wrote :

Still need help with the tests, Ryan, Dan?

Thanks

On Fri, May 31, 2019 at 10:56 AM Robert Schweikert <
<email address hidden>> wrote:

> Still need help with the tests, Ryan, Dan?
>

Thanks for the ping. We've not completely forgotten =)

Ryan

>
> Thanks
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1817368
>
> Title:
> udev needs to be triggered if rules are written to a new file
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/cloud-init/+bug/1817368/+subscriptions
>

This bug is fixed with commit b3a87fc0 to cloud-init on branch master.
To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=b3a87fc0

Changed in cloud-init:
status: Incomplete → Fix Committed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers