Sometime the Eoan s390x LPAR can't get an IP address after reboot

Bug #1860421 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status   Importance  Assigned to  Milestone
ubuntu-kernel-tests  Invalid  Undecided   Unassigned
linux (Ubuntu)       Invalid  Undecided   Unassigned

Bug Description

Node: s2lp4 Ubuntu Eoan s390x LPAR

Sometimes after a reboot (we always reboot it before testing starts), this instance loses its network connection. This is not the first time I have seen this issue with Eoan in this SRU cycle. With the most recent respin (5.3.0-29.31), it has happened only once.

The system is still alive; only the network connection is not working. You have to access it via the HMC console and reboot it from there.

The dmesg output captured while networking was down is in the attachment.

Here is the ip addr output when this happens:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: encc000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 2a:55:61:1d:aa:38 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2855:61ff:fe1d:aa38/64 scope link
       valid_lft forever preferred_lft forever
3: enP1p0s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0a:2d:0c:b8:70 brd ff:ff:ff:ff:ff:ff
4: enP1p0s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0a:2d:0c:b8:71 brd ff:ff:ff:ff:ff:ff
5: enP2p0s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0a:2d:0c:b7:00 brd ff:ff:ff:ff:ff:ff
6: enP2p0s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0a:2d:0c:b7:01 brd ff:ff:ff:ff:ff:ff
7: lxcbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.1/24 scope global lxcbr0
       valid_lft forever preferred_lft forever
8: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:4c:f5:88 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:4c:f5:88 brd ff:ff:ff:ff:ff:ff

I don't recall us having this issue before; it might need more investigation with a reboot stress test.

The last test suite executed before this issue happened was the "sru-misc-stable" test. It should be run again to make sure the issue is not caused by the tests.

BTW, there is a similar bug 1859530 for Bionic.

Po-Hsu Lin (cypressyew)
tags: added: eoan s390x
tags: added: 5.3
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1860421

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Po-Hsu Lin (cypressyew) wrote : Re: Sometime the Eoan s390x LPAR can't get an IP address

dmesg output when there is no network connection.

summary: - Sometime the Eoan s390x LPAR can't get an IP address
+ Sometime the Eoan s390x LPAR can't get an IP address after reboot
Po-Hsu Lin (cypressyew) wrote :

ip addr output when the network is good:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: encc000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 2a:55:61:1d:aa:38 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2855:61ff:fe1d:aa38/64 scope link
       valid_lft forever preferred_lft forever
3: enP1p0s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0a:2d:0c:b8:70 brd ff:ff:ff:ff:ff:ff
4: enP1p0s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0a:2d:0c:b8:71 brd ff:ff:ff:ff:ff:ff
5: enP2p0s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0a:2d:0c:b7:00 brd ff:ff:ff:ff:ff:ff
6: enP2p0s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0a:2d:0c:b7:01 brd ff:ff:ff:ff:ff:ff
7: encc000.2586@encc000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2a:55:61:1d:aa:38 brd ff:ff:ff:ff:ff:ff
    inet 10.245.80.42/22 brd 10.245.83.255 scope global encc000.2586
       valid_lft forever preferred_lft forever
    inet6 fe80::2855:61ff:fe1d:aa38/64 scope link
       valid_lft forever preferred_lft forever
8: lxcbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.1/24 scope global lxcbr0
       valid_lft forever preferred_lft forever
9: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:4c:f5:88 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
10: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:4c:f5:88 brd ff:ff:ff:ff:ff:ff

The difference is here:
7: encc000.2586@encc000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2a:55:61:1d:aa:38 brd ff:ff:ff:ff:ff:ff
    inet 10.245.80.42/22 brd 10.245.83.255 scope global encc000.2586
       valid_lft forever preferred_lft forever
    inet6 fe80::2855:61ff:fe1d:aa38/64 scope link
       valid_lft forever preferred_lft forever
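
The comparison above can also be done mechanically. The following is a minimal Python sketch, not part of the original report: it parses two saved `ip addr` outputs and lists the interfaces that appear in the good snapshot but not in the bad one. The helper names `interface_names` and `missing_interfaces` are hypothetical, and the parsing assumes the default `ip addr` text format shown in this report.

```python
import re

# Matches the header line of each interface in `ip addr` output,
# e.g. "7: encc000.2586@encc000: <BROADCAST,...> mtu 1500 ...".
# The optional "@parent" suffix of VLAN interfaces is not captured.
_IFACE_HEADER = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:")


def interface_names(ip_addr_output):
    """Return the interface names found in `ip addr` output, in order."""
    names = []
    for line in ip_addr_output.splitlines():
        match = _IFACE_HEADER.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_interfaces(good_output, bad_output):
    """Return interfaces present in the good snapshot but absent in the bad one."""
    bad = set(interface_names(bad_output))
    return [name for name in interface_names(good_output) if name not in bad]
```

Fed the two outputs from this report, `missing_interfaces` would be expected to report only the VLAN interface, since every other interface appears in both snapshots.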

Po-Hsu Lin (cypressyew)
description: updated
tags: added: sru-20200106
description: updated
Sean Feole (sfeole)
Changed in ubuntu-kernel-tests:
status: New → Triaged
assignee: nobody → Sean Feole (sfeole)
Sean Feole (sfeole) wrote :

The correct network device for all testing is encc000.2586 (verified).
The netplan YAML is correct; I don't see any problems with it. It explicitly tells the host to use the static IPv4 address.

I need logs from the time of the failure. Unfortunately, I believe that resetting the LPAR fixed the network but also erased any useful information about why the network did not come up.
We will need additional information captured at the time of the failure. I've restarted the testing for Focal s2lp4 to see if we can reproduce it.
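
One way to preserve that information before the LPAR is reset is to save diagnostic output to local disk from the HMC console session. The Python sketch below is an illustration under stated assumptions, not the procedure used by the kernel team: the command list, the `collect_diagnostics` and `run_command` helpers, and the output path in the usage note are all hypothetical, and it assumes systemd-networkd is the netplan renderer on this host.

```python
import subprocess
from pathlib import Path

# Commands chosen for illustration only; adjust to the host's network stack.
DIAGNOSTIC_COMMANDS = [
    ["ip", "-details", "addr", "show"],
    ["ip", "route", "show"],
    ["networkctl", "status"],
    ["journalctl", "-b", "-u", "systemd-networkd", "--no-pager"],
    ["dmesg"],
]


def run_command(argv):
    """Run one command and return its combined stdout/stderr as text."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=60,
            check=False,
        )
        return result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing binary or a hung command should not abort the collection.
        return "failed to run: {}\n".format(exc)


def collect_diagnostics(out_path, commands=DIAGNOSTIC_COMMANDS, run=run_command):
    """Write the output of each command to out_path, one section per command."""
    sections = []
    for argv in commands:
        sections.append("$ " + " ".join(argv) + "\n" + run(argv))
    Path(out_path).write_text("\n".join(sections))
    return out_path
```

For example, `collect_diagnostics("/var/tmp/netfail-diag.txt")` (a hypothetical path) could be run as root from the console before rebooting, so the file is still available once the network is back.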

Sean Feole (sfeole) wrote :

Errr, I meant Eoan, s390x s2lp4, not Focal. Sorry.

Sean Feole (sfeole) wrote :

I was able to reboot the LPAR 10 times without seeing any network interruption.
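
A reboot stress run like this could be scripted from another machine. The Python sketch below is an assumption, not the procedure actually used here: it triggers a reboot through a caller-supplied `reboot` function (hypothetical, for example a wrapper around an SSH command), waits a settle period, then polls a TCP port (SSH on port 22 is assumed) until the host answers or a deadline passes, and records the cycle in which the network never came back.

```python
import socket
import time


def port_is_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_network(host, is_up=port_is_open, deadline_s=600, interval_s=10,
                     sleep=time.sleep):
    """Poll until is_up(host) is True or deadline_s seconds of polling elapse."""
    waited = 0
    while waited <= deadline_s:
        if is_up(host):
            return True
        sleep(interval_s)
        waited += interval_s
    return False


def reboot_stress(host, reboot, cycles=10, is_up=port_is_open, settle_s=60,
                  deadline_s=600, interval_s=10, sleep=time.sleep):
    """Reboot up to `cycles` times; return the cycle numbers where the network stayed down."""
    failures = []
    for cycle in range(1, cycles + 1):
        reboot(host)  # hypothetical callable supplied by the caller
        sleep(settle_s)  # give the host time to actually go down first
        if not wait_for_network(host, is_up, deadline_s, interval_s, sleep):
            failures.append(cycle)
            break  # host is unreachable; it needs a manual reset from the HMC
    return failures
```

The loop stops at the first failure on purpose: once the network is down the host cannot be rebooted remotely, and that is the moment to collect logs from the console.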

Po-Hsu Lin (cypressyew) wrote :

This issue hit me again in this SRU cycle (5.3.0-41.33) on the same Eoan s2lp4 node.
I had to reboot it manually for the SRU tests.

tags: added: sru-20200217
Po-Hsu Lin (cypressyew) wrote :

The Eoan LPAR has been upgraded to Focal, so I am closing this bug.
Thanks

Changed in ubuntu-kernel-tests:
status: Triaged → Invalid
assignee: Sean Feole (sfeole) → nobody
Changed in linux (Ubuntu):
status: Incomplete → Invalid