[iotg][ehl] After reboot, both Ethernet interfaces don't come back

Bug #1941934 reported by Doug Jacobs
Affects                 Status          Importance   Assigned to   Milestone
intel                   Fix Committed   Undecided    Unassigned
Lookout-canyon-series   Fix Released    Undecided    Unassigned

Bug Description

[Summary] To test the EHL board, I connected an Ethernet cable to each of its two ports. After booting, the interfaces had the following IP addresses:
enp0s29f1 - 10.10.0.64
enp0s30f4 - 10.10.0.67
The Ubuntu Core screen stated that either IP address could be used to connect to the device, and I could ssh to it using both.

I started CDTS remotely using 10.10.0.67, and the tests ran fine until the reboot stress test, which reboots the device 100 times. After a few cycles, I noticed CDTS was unable to reconnect to the device. The Ubuntu Core screen listed only the 10.10.0.64 address as available, so I had CDTS disconnect and reconnect using that address. After a few more reboots, the screen listed only the 10.10.0.67 address. After each reboot, a different combination of interfaces seems to be brought back up.

ceqa@ubuntu:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s29f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 52:4d:68:cd:30:89 brd ff:ff:ff:ff:ff:ff
    altname �?�ky
    altname �;�ky
    inet 10.10.0.64/24 brd 10.10.0.255 scope global dynamic enp0s29f1
       valid_lft 3510sec preferred_lft 3510sec
    inet6 fe80::504d:68ff:fecd:3089/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b2:27:af:a2:33:1f brd ff:ff:ff:ff:ff:ff
    altname �F���U

ceqa@ubuntu:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:4d:68:cd:30:89 brd ff:ff:ff:ff:ff:ff
    altname P��V0V
3: enp0s30f4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b2:27:af:a2:33:1f brd ff:ff:ff:ff:ff:ff
    altname p4�V0V
    altname ��K��
    inet 10.10.0.67/24 brd 10.10.0.255 scope global dynamic enp0s30f4
       valid_lft 2559sec preferred_lft 2559sec
    inet6 fe80::b027:afff:fea2:331f/64 scope link
       valid_lft forever preferred_lft forever
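
In both captures above, the interface that failed to come up still carries a kernel name (eth0 or eth1) and is in state DOWN with no IPv4 address. A minimal Python sketch for flagging that condition from saved `ip a` text follows. The function names `parse_ip_addr` and `unconfigured` are hypothetical helpers, not part of CDTS or any Ubuntu Core tooling, and the sample is the second capture with the garbled altname lines and the loopback entry omitted.

```python
import re

# Header lines look like:
#   "2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN ..."
HEADER = re.compile(r"^(\d+): ([^:@\s]+)[^:]*: <[^>]*>.*\bstate (\S+)")
# IPv4 address lines look like: "    inet 10.10.0.67/24 brd ..."
INET = re.compile(r"^\s+inet (\d+\.\d+\.\d+\.\d+)/\d+")

# Second capture from the report, trimmed (assumption: altname lines are irrelevant here).
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:4d:68:cd:30:89 brd ff:ff:ff:ff:ff:ff
3: enp0s30f4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b2:27:af:a2:33:1f brd ff:ff:ff:ff:ff:ff
    inet 10.10.0.67/24 brd 10.10.0.255 scope global dynamic enp0s30f4
       valid_lft 2559sec preferred_lft 2559sec
"""


def parse_ip_addr(text):
    """Return {interface: {"state": str, "ipv4": [str, ...]}} from `ip a` text."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(2)
            interfaces[current] = {"state": header.group(3), "ipv4": []}
            continue
        inet = INET.match(line)
        if inet and current is not None:
            interfaces[current]["ipv4"].append(inet.group(1))
    return interfaces


def unconfigured(interfaces):
    """Names of non-loopback interfaces that are not UP or have no IPv4 address."""
    return sorted(
        name
        for name, info in interfaces.items()
        if name != "lo" and (info["state"] != "UP" or not info["ipv4"])
    )


print(unconfigured(parse_ip_addr(SAMPLE)))  # ['eth0']
```

Running this against a capture taken after every reboot cycle would show which port failed in each cycle, which may help correlate the failures with a specific controller.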

[Steps to reproduce]
1. Connect both Ethernet ports to the same switch (same network).
2. Boot the device.
3. Verify both interfaces have come up, have IP addresses, and you can connect to both addresses.
4. Start CDTS remotely
5. Choose Ubuntu Core Automated Tests, then choose the Reboot Stress test.

[Expected result]
Each cycle, the device should reboot and both interfaces should come back up with the same IP addresses.

[Actual result]
Sometimes only one interface comes back up. If its address is not the one CDTS remote was using, the test session hangs until you disconnect and then reconnect using the other interface's address.
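
The manual workaround described here (reconnect using whichever address is still reachable) can be sketched in Python as below. This is only an illustration for a wrapper script. How CDTS itself reconnects is not stated in this report, and `tcp_reachable` and `first_reachable` are hypothetical helper names. Port 22 is assumed because the report mentions ssh access.

```python
import socket


def tcp_reachable(address, port=22, timeout=3.0):
    """Return True if a TCP connection to address:port succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def first_reachable(addresses, probe=tcp_reachable):
    """Return the first address for which probe(address) is true, else None."""
    for address in addresses:
        if probe(address):
            return address
    return None
```

For example, `first_reachable(["10.10.0.67", "10.10.0.64"])` would return the address that answered after a reboot, or `None` if neither interface came back.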

[Failure rate] 50% (average)
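
A failure rate like this could be derived from per-cycle observations roughly as follows. This is a sketch under assumptions: the report does not include the per-cycle data, so the `cycles` list in the usage note is made up for illustration, and `failure_rate` is a hypothetical helper.

```python
# The two addresses reported in the summary.
EXPECTED = frozenset({"10.10.0.64", "10.10.0.67"})


def failure_rate(cycles, expected=EXPECTED):
    """Fraction of reboot cycles in which at least one expected address was missing.

    `cycles` is an iterable of collections, each holding the addresses
    that were reachable after one reboot.
    """
    cycles = list(cycles)
    if not cycles:
        raise ValueError("no cycles recorded")
    failures = sum(1 for seen in cycles if not expected <= set(seen))
    return failures / len(cycles)
```

With illustrative (not measured) data such as `[{"10.10.0.64", "10.10.0.67"}, {"10.10.0.64"}, {"10.10.0.67"}, {"10.10.0.64", "10.10.0.67"}]`, two of the four cycles are missing an address, so the function returns 0.5.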

[Additional information]
CID: 202105-29105
SKU:
Image: 20210813.5
system-manufacturer: Intel Corporation
system-product-name: Elkhart Lake Embedded Platform
bios-version: EHLSFWI1.R00.3162.A01.2104131432
CPU: Intel Atom(R) x6425RE Processor @ 1.90GHz (4x)
GPU: 00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:4571] (rev 01)
kernel-version: 5.11.0-1012-intel

[Stage]
Issue reported and logs collected right after it happened

Revision history for this message
Doug Jacobs (djacobs98) wrote :

Automatically attached

description: updated
Doug Jacobs (djacobs98) wrote :

I tried disconnecting one cable, thinking that this would force the system to always use the remaining connection.

That was not the case. If the system comes up and tries to use the disconnected interface, it just reports that you need to configure DHCP on your network.

Doug Jacobs (djacobs98)
tags: added: lookout-canyon
Alex Hung (alexhung) wrote :

I had a similar issue with the Wi-Fi card on a Kaby Lake CRB, and disabling "fast boot" improved it. A possible reason is that some hardware requires a longer initialization time.

I found the option at "Boot maintenance Manager Menu -> Boot Configuration Menu -> Fast Boot", but the location can differ from one system to another.

Alex Hung (alexhung) wrote :

The Intel CRB may also have an option called "PCI delay optimization", as attached. This may or may not help.

Kent Lin (kent-jclin) wrote :

This issue could not be reproduced with BIOS: Intel Corporation: EHLSFWI1.R00.3162.A01.2104131432 (UEFI) + Daily Build.

Changed in intel:
status: New → Fix Committed