systemd-networkd thinks it loses its lease every renewal

Bug #1896229 reported by Shaun Crampton
This bug affects 1 person
Affects: systemd (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

With a server running 20.04 on AWS, I noticed connectivity glitches once per half hour. I eventually managed to correlate them with DHCP renewals. Each time systemd-networkd does a renewal, it logs that the lease was lost and goes through a cycle of removing and re-adding the IP and routes (even though it's the same IP and routes). This causes disruption, especially to SNATted flows; if a packet arrives for an SNATted flow during the window where the IP is removed then (I think) the host sends a RST and the flow gets torn down. (In any case, such flows get lost during the glitch.)

I'd expect a DHCP renewal to be completely transparent; the IP shouldn't flap, it should just be updated to have a longer lifetime.

I managed to capture a PCAP of the DHCP renewals along with a debug log from systemd-networkd.

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: systemd 242-7ubuntu3.11
ProcVersionSignature: User Name 5.3.0-1030.32-aws 5.3.18
Uname: Linux 5.3.0-1030-aws x86_64
ApportVersion: 2.20.11-0ubuntu8.9
Architecture: amd64
Date: Fri Sep 18 13:05:22 2020
Ec2AMI: ami-0d3d788094d3f0aa9
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2a
Ec2InstanceType: t3.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Amazon EC2 t3.large
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-1030-aws root=PARTUUID=e96a8035-01 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
SourcePackage: systemd
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/16/2017
dmi.bios.vendor: Amazon EC2
dmi.bios.version: 1.0
dmi.board.asset.tag: i-0fc0b0107c428fe19
dmi.board.vendor: Amazon EC2
dmi.chassis.asset.tag: Amazon EC2
dmi.chassis.type: 1
dmi.chassis.vendor: Amazon EC2
dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:svnAmazonEC2:pnt3.large:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:
dmi.product.name: t3.large
dmi.sys.vendor: Amazon EC2

Shaun Crampton (fasaxc) wrote :

Hmm, looks like this server is on 19.10, not 20.04 as I'd thought.

Shaun Crampton (fasaxc) wrote :

Looks like the restart to enable debug logging triggered a DHCPDISCOVER; that's the easiest point to use to synchronise the PCAP and the log.

Note the log entries showing successful DHCP ACK messages followed by "Lease lost!".
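
For anyone wanting to check their own logs for the same pattern, here is a rough Python sketch that measures the time between "Lease lost!" entries, so the spacing can be compared with the DHCP renewal interval. It assumes the log was exported with ISO timestamps at the start of each line (for example via `journalctl -u systemd-networkd -o short-iso`); the function name `lease_loss_intervals` and the sample line layout are illustrative assumptions, not taken from the attached log.

```python
from datetime import datetime

# Assumed timestamp layout: first whitespace-separated token of each line,
# e.g. "2020-09-18T13:05:22+0000" (as produced by `journalctl -o short-iso`).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def lease_loss_intervals(lines, marker="Lease lost!"):
    """Return the gaps, in seconds, between consecutive lines containing marker."""
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        first_token = line.split(maxsplit=1)[0]
        try:
            stamps.append(datetime.strptime(first_token, TIMESTAMP_FORMAT))
        except ValueError:
            # Line does not start with a timestamp in the assumed format; skip it.
            continue
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python lease_gaps.py < networkd.log
    for gap in lease_loss_intervals(sys.stdin):
        print(f"{gap:.0f} s between lease-loss events")
```

If the gaps come out close to the half-hourly glitches described above, that supports the correlation with renewals; it does not by itself show why networkd treats the renewal as a lost lease.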

Shaun Crampton (fasaxc) wrote :

Doesn't seem to reproduce on 20.04 with systemd 245.4-4ubuntu3.2.

Changed in systemd (Ubuntu):
status: New → Invalid