Unable to find fallback nic when dhcp

Bug #1907883 reported by QianBiao Ng
This bug affects 1 person
Affects: cloud-init
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello,

I am using cloud-init with OpenStack metadata in a ramdisk OS. On some servers, cloud-init is sometimes unable to find a fallback NIC when it attempts DHCP.

Thanks in advance for any help.

QianBiao Ng (woo.cubic) wrote :

cloud-init collect-logs outputs

description: updated
Ryan Harper (raharper) wrote : Re: [Bug 1907883] Re: Unable to find fallback nic when dhcp

* QianBiao Ng <email address hidden> [2020-12-12 04:10]:
> cloud-init collect-logs outputs
>
> ** Attachment added: "cloud-init.tar.gz"
> https://bugs.launchpad.net/cloud-init/+bug/1907883/+attachment/5442893/+files/cloud-init.tar.gz
>
> ** Description changed:
>
> Hello,
>
> I am using cloud-init with OpenStack metadata in a ramdisk os. And
> sometimes on some server, cloud-init can not find fallback nic when
> dhcp.
> +
> + Thanks in advance for any help.

Thanks for submitting a bug and including the logs. I believe what's
happening is that in your ramdisk OS cloud-init is running before any
of the nics are present on the system (driver is loaded but nothing has
triggered/settled any of the devices).

The logs show this:

2020-12-12 09:26:18,381 - dhcp.py[DEBUG]: Skip dhcp_discovery: Unable to find fallback nic.
2020-12-12 09:26:18,381 - util.py[DEBUG]:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 131, in _get_data
    with EphemeralDHCPv4(self.fallback_interface):
  File "/usr/lib/python2.7/site-packages/cloudinit/net/dhcp.py", line 57, in __enter__
    return self.obtain_lease()
  File "/usr/lib/python2.7/site-packages/cloudinit/net/dhcp.py", line 87, in obtain_lease
    raise NoDHCPLeaseError()
NoDHCPLeaseError

The "Skip dhcp_discovery" message is shown when net.find_fallback_nic()
returns None, which happens when it looks at /sys/class/net/* and cannot
find any valid devices. Also note that later in the logs, when
cloud-init.service runs, we see the notice about unstable nics; these are
nics that have not yet had their predictable names applied by udev, which
suggests that the nics appeared after cloud-init-local had already run.

Looking at the journal.txt file, we can see cloud-init local start here:

Dec 12 09:26:11.040127 localhost.localdomain systemd[1]: Starting Initial cloud-init job (pre-networking)...

but the kernel doesn't "see" eth0 until 11 seconds later:

Dec 12 09:26:22.554428 localhost.localdomain kernel: i40e 0000:1a:00.0 eth0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None

In your ramdisk OS, I would suggest that you run udevadm settle before
cloud-init-local.service, either as a standalone unit (udevadm-settle.service)
or just as a drop-in ExecStartPre=.

Typically udev is run in an initramfs and cloud-init does not run until
the localfs is mounted; however in your ramdisk OS, the initramfs file system
is the localfs so cloud-init runs before device initialization has been
completed.
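
The drop-in option might look roughly like the sketch below. The file name 10-udev-settle.conf is hypothetical, the 60-second timeout is an arbitrary choice, and /usr/bin/udevadm is an assumption about where the binary lives in the image (check with `command -v udevadm`).

```ini
# /etc/systemd/system/cloud-init-local.service.d/10-udev-settle.conf
# (hypothetical file name; any *.conf in this directory is read)

[Service]
# Wait for the udev event queue to drain before cloud-init-local starts.
# The binary path and the timeout value are assumptions; adjust for your image.
ExecStartPre=/usr/bin/udevadm settle --timeout=60
```

In a ramdisk OS the file would need to be built into the image, since there is no earlier boot stage in which to create it. Note also that udevadm settle only waits for events already in the queue, so it may not be enough if the NIC driver is probed after the command returns.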


QianBiao Ng (woo.cubic) wrote :

Thank you @raharper for such a detailed explanation.

If I understand you correctly, you are suggesting that I update the cloud-init-local.service file by adding an After= condition like:

vi /usr/lib/systemd/system/cloud-init-local.service
[Unit]
Description=Initial cloud-init job (pre-networking)
.....
After=udevadm-settle.service
.....

Dan Watkins (oddbloke) wrote :

> After=udevadm-settle.service

Ryan is suggesting that you write the udevadm-settle.service unit file (and also add this line to the appropriate cloud-init unit file). I'd suggest testing `ExecStartPre=udevadm settle` as a drop-in.
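
A standalone unit along those lines could be sketched as follows. This is an untested illustration, not something shipped by cloud-init: the unit name comes from the discussion above, while the udevadm path and the ordering after systemd-udev-trigger.service are assumptions about the image.

```ini
# /etc/systemd/system/udevadm-settle.service (sketch, not shipped by cloud-init)

[Unit]
Description=Wait for udev to finish processing queued device events
# cloud-init-local.service runs very early, so avoid the default dependencies.
DefaultDependencies=no
# Assumption: coldplug events are queued by systemd-udev-trigger.service.
After=systemd-udev-trigger.service
Before=cloud-init-local.service

[Service]
Type=oneshot
RemainAfterExit=yes
# The binary path is an assumption; adjust for your image.
ExecStart=/usr/bin/udevadm settle

[Install]
# Pull this unit in whenever cloud-init-local.service is started.
WantedBy=cloud-init-local.service
```

An After= line only orders two units; it does not start the other one. The settle unit therefore also has to be pulled in, for example by enabling it so that the WantedBy= line takes effect. Many distributions also ship a systemd-udev-settle.service that does much the same thing, so it may be worth checking whether the image already has one.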

> vi /usr/lib/systemd/system/cloud-init-local.service

systemd allows you to override units in /etc; this is preferable to editing the units shipped by cloud-init itself: https://unix.stackexchange.com/questions/398540/how-to-override-systemd-unit-file-settings has some more details on that.
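
For the After= line quoted above, an override in /etc might look like this sketch. The drop-in file name is hypothetical, and it assumes a udevadm-settle.service unit has been written as discussed in this thread.

```ini
# /etc/systemd/system/cloud-init-local.service.d/20-after-settle.conf
# (hypothetical file name)

[Unit]
# Wants= pulls the settle unit in; After= makes cloud-init-local wait for it.
Wants=udevadm-settle.service
After=udevadm-settle.service
```

The unit shipped in /usr/lib/systemd/system stays untouched, so a package upgrade will not overwrite the change.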

(I'm moving this to Incomplete; if you're still seeing issues, please do move it back to New!)

Changed in cloud-init:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for cloud-init because there has been no activity for 60 days.]

Changed in cloud-init:
status: Incomplete → Expired
James Falcon (falcojr) wrote :