problems with network speed reporting in sysfs on s390

Bug #1572347 reported by Jeff Lane on 2016-04-20
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

On an amd64 system with a 1Gb NIC, I can examine /sys/class/net/<DEVICE> for various bits of data. In this case, we need link speed.

For example:

bladernr@galactica:~/Datacenters/Home$ cat /sys/class/net/enp2s0/speed
1000

However, on my z/VM instance of Xenial on s390, the reported speed is WAY below what it actually is:
hwe@hwe-zvm1:/sys/class/net/enc600$ cat speed
10

iperf confirms that this virtual device is actually functioning at 1Gb speeds. This is causing failures in the cert networking tests that probe for reported link speed and then compare that to reported maximum speed.

On a zKVM instance of Xenial, this is even worse as I am completely unable to examine /sys/class/net/<DEVICE>/speed at all:

hwe@s1lp9g003:~$ cat /sys/class/net/eth0/speed
cat: /sys/class/net/eth0/speed: Invalid argument
hwe@s1lp9g003:/sys/class/net/eth0$ ll
total 0
drwxr-xr-x 5 root root 0 Apr 19 20:02 ./
drwxr-xr-x 3 root root 0 Apr 19 20:02 ../
-r--r--r-- 1 root root 4096 Apr 19 20:02 addr_assign_type
-r--r--r-- 1 root root 4096 Apr 19 20:02 addr_len
-r--r--r-- 1 root root 4096 Apr 19 20:02 address
-r--r--r-- 1 root root 4096 Apr 19 20:02 broadcast
-rw-r--r-- 1 root root 4096 Apr 19 20:02 carrier
-r--r--r-- 1 root root 4096 Apr 19 20:02 carrier_changes
-r--r--r-- 1 root root 4096 Apr 19 20:02 dev_id
-r--r--r-- 1 root root 4096 Apr 19 20:02 dev_port
lrwxrwxrwx 1 root root 0 Apr 19 20:02 device -> ../../../virtio1/
-r--r--r-- 1 root root 4096 Apr 19 20:02 dormant
-r--r--r-- 1 root root 4096 Apr 19 20:02 duplex
-rw-r--r-- 1 root root 4096 Apr 19 20:02 flags
-rw-r--r-- 1 root root 4096 Apr 19 20:02 gro_flush_timeout
-rw-r--r-- 1 root root 4096 Apr 19 20:02 ifalias
-r--r--r-- 1 root root 4096 Apr 19 20:02 ifindex
-r--r--r-- 1 root root 4096 Apr 19 20:02 iflink
-r--r--r-- 1 root root 4096 Apr 19 20:02 link_mode
-rw-r--r-- 1 root root 4096 Apr 19 20:02 mtu
-r--r--r-- 1 root root 4096 Apr 19 20:02 name_assign_type
-rw-r--r-- 1 root root 4096 Apr 19 20:02 netdev_group
-r--r--r-- 1 root root 4096 Apr 19 20:02 operstate
-r--r--r-- 1 root root 4096 Apr 19 20:02 phys_port_id
-r--r--r-- 1 root root 4096 Apr 19 20:02 phys_port_name
-r--r--r-- 1 root root 4096 Apr 19 20:02 phys_switch_id
drwxr-xr-x 2 root root 0 Apr 19 20:02 power/
-rw-r--r-- 1 root root 4096 Apr 19 20:02 proto_down
drwxr-xr-x 4 root root 0 Apr 19 20:02 queues/
-r--r--r-- 1 root root 4096 Apr 19 20:02 speed
drwxr-xr-x 2 root root 0 Apr 19 20:02 statistics/
lrwxrwxrwx 1 root root 0 Apr 19 20:02 subsystem -> ../../../../../../../class/net/
-rw-r--r-- 1 root root 4096 Apr 19 20:02 tx_queue_len
-r--r--r-- 1 root root 4096 Apr 19 20:02 type
-rw-r--r-- 1 root root 4096 Apr 19 20:02 uevent
hwe@s1lp9g003:/sys/class/net/eth0$ sudo cat speed
cat: speed: Invalid argument
hwe@s1lp9g003:/sys/class/net/eth0$ sudo su
root@s1lp9g003:/sys/devices/css0/0.0.0001/0.0.0001/virtio1/net/eth0# cat speed
cat: speed: Invalid argument

So on zKVM even root is not able to view the current link speed.
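
A test that probes this file needs to cope with both behaviours shown above: a numeric value and a read that fails with "Invalid argument". The following is a minimal sketch of such a probe, not the code the cert tests actually use. The function name `read_link_speed` and the `sysfs_root` parameter are hypothetical, and treating non-positive values as "unknown" is an assumption.

```python
import errno
import os
from typing import Optional


def read_link_speed(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the reported link speed in Mb/s, or None if no usable value is available."""
    path = os.path.join(sysfs_root, iface, "speed")
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except OSError as exc:
        # EINVAL is the "Invalid argument" error seen on the zKVM guest;
        # a missing interface or attribute (ENOENT) is handled the same way.
        if exc.errno in (errno.EINVAL, errno.ENOENT):
            return None
        raise
    except ValueError:
        # The file was readable but did not contain an integer.
        return None
    # Assumption: zero or negative values mean the speed is unknown.
    return value if value > 0 else None


if __name__ == "__main__":
    for name in ("enp2s0", "enc600", "eth0"):
        print(name, read_link_speed(name))
```

Returning None instead of raising lets the caller decide whether an unreadable speed should skip the test or fail it.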

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-18-generic 4.4.0-18.34
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Uname: Linux 4.4.0-18-generic s390x
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2
Architecture: s390x
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Tue Apr 19 22:58:52 2016
HibernationDevice: RESUME=UUID=7ddabd20-1d15-492c-bcd5-ec2dbb16b777
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: Error: [Errno 2] No such file or directory: '/proc/fb'
ProcKernelCmdLine: root=UUID=b732515f-dacf-4007-85af-a27b5bf48fdf crashkernel=196M BOOT_IMAGE=0
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-18-generic N/A
 linux-backports-modules-4.4.0-18-generic N/A
 linux-firmware 1.157
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
---
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.1-0ubuntu2
Architecture: s390x
ArecordDevices: Error: [Errno 2] No such file or directory
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
DistroRelease: Ubuntu 16.04
HibernationDevice: RESUME=UUID=7ddabd20-1d15-492c-bcd5-ec2dbb16b777
IwConfig: Error: [Errno 2] No such file or directory
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: Error: [Errno 2] No such file or directory: '/proc/fb'
ProcKernelCmdLine: root=UUID=b732515f-dacf-4007-85af-a27b5bf48fdf crashkernel=196M BOOT_IMAGE=0
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-18-generic N/A
 linux-backports-modules-4.4.0-18-generic N/A
 linux-firmware 1.157
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial
Uname: Linux 4.4.0-18-generic s390x
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom cpacfstats dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True

Jeff Lane (bladernr) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1572347

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected
description: updated


Jeff Lane (bladernr) wrote :

I don't understand... why do I have to run apport-collect? Why didn't ubuntu-bug submit all the logs?

Changed in linux (Ubuntu):
status: Incomplete → New
Jeff Lane (bladernr) wrote :

Looks to me like apport-collect just submitted the exact same logs as ubuntu-bug did. Your bot may be broken...

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1572347

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Jeff Lane (bladernr) wrote :

Whoops, set it to New, not Confirmed, that triggered your bot again.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Dimitri John Ledkov (xnox) wrote :

1) Note that the directories in /sys/class/net are actually symlinks into /sys/devices, and that you are looking at qeth kernel devices
2) On an LPAR, it's a real connection to a real card, and I can see a 10000 connection
3) On z/KVM, qeth is on the virtio bus, thus it's all virtual. Only the host will know what the actual link speed is
4) Similarly on z/VM, only the host LPAR will know the real link speed and how it's shaped / limited by the z/VM hypervisor for each guest

My expectation is that physical links are only validated on physical devices as connected to an LPAR, and not on virtualised devices as visible on KVM guest and z/VM guest.
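
As a rough illustration of the symlink point, the sketch below resolves an interface's `device` link and checks whether it lands on a virtio device, as in the `device -> ../../../virtio1/` entry of the listing in the bug description. The helper names `backing_device_name` and `is_virtio_backed` are hypothetical, and matching on a `virtio` name prefix is an assumption based only on that listing.

```python
import os
from typing import Optional


def backing_device_name(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[str]:
    """Return the final path component of the device an interface is backed by."""
    link = os.path.join(sysfs_root, iface, "device")
    if not os.path.exists(link):
        # No such interface, or the interface has no backing device link.
        return None
    # realpath follows the symlink chain into /sys/devices.
    return os.path.basename(os.path.realpath(link))


def is_virtio_backed(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Assumption: a backing device named virtio<N> means a virtio guest NIC."""
    name = backing_device_name(iface, sysfs_root)
    return name is not None and name.startswith("virtio")


if __name__ == "__main__":
    print(backing_device_name("eth0"), is_virtio_backed("eth0"))
```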

Dimitri John Ledkov (xnox) wrote :

One can attempt to do:
$ cat /sys/class/net/encc000/device/card_type
In some modes it will tell you whether the card is real or not, e.g.:
"OSD_10GIG"
is a real card on an LPAR.

On KVM that file doesn't exist, as device is a symlink to "virtio0".

On z/VM it has:
"Virt.NIC QDIO"

Which is, well, a fake virtualised card that can be anything.

I would recommend verifying link speeds for an interface on s390x if:
1) /sys/class/net/<interface>/device/card_type exists
2) its content does not start with Virt
3) it does not contain "unknown"
4) it does not contain HiperSockets
5) it starts with OS -> which should catch all known physical cards, e.g. OSD_100/1000/10GIG/FE_LATE/GbE_NAME/ATM_LANE/Express; OSN; OSM_1000; OSX_10GIG
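
The five conditions above can be sketched as a small predicate. This is only an illustration of the recommendation, not code from any existing test suite: the names `read_card_type` and `should_verify_link_speed` are hypothetical, and matching "unknown" and "HiperSockets" case-insensitively is an assumption.

```python
import os
from typing import Optional


def read_card_type(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[str]:
    """Return the contents of device/card_type, or None if the file cannot be read."""
    path = os.path.join(sysfs_root, iface, "device", "card_type")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Condition 1: the file does not exist (e.g. on a virtio device).
        return None


def should_verify_link_speed(card_type: Optional[str]) -> bool:
    """Apply conditions 1-5 to a card_type string."""
    if card_type is None:
        return False  # 1) card_type must exist
    if card_type.startswith("Virt"):
        return False  # 2) e.g. "Virt.NIC QDIO" on z/VM
    lowered = card_type.lower()
    if "unknown" in lowered:
        return False  # 3)
    if "hipersockets" in lowered:
        return False  # 4)
    return card_type.startswith("OS")  # 5) e.g. "OSD_10GIG"


if __name__ == "__main__":
    for sample in ("OSD_10GIG", "Virt.NIC QDIO", None):
        print(repr(sample), should_verify_link_speed(sample))
```

Keeping the file read separate from the decision makes the rule easy to check against sample strings without access to an s390x system.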

Jeff Lane (bladernr) wrote :

Validating the speed is a part of the network testing. Network tests
don't run if the link speed is less than advertised max. E.g. if
someone plugs a 10G card into a 1G network segment, the test will not
run because it's pointless to test a 10G device at 1/10th the
supported max speed.
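
The gate described here amounts to a single comparison. The sketch below is an assumption about its shape, not the actual cert test code: the function name `should_run_network_test` is hypothetical, both speeds are taken to be in Mb/s, and how the advertised maximum is obtained is left to the caller.

```python
from typing import Optional


def should_run_network_test(reported_mbps: Optional[int], max_mbps: int) -> bool:
    """Run the test only when the link is up to the device's advertised maximum."""
    if reported_mbps is None:
        # Assumption: an unreadable speed is treated like a link that is too slow.
        return False
    return reported_mbps >= max_mbps


if __name__ == "__main__":
    # A 10G card on a 1G segment: the test is skipped.
    print(should_run_network_test(1000, 10000))
    # A 1G card linked at 1G: the test runs.
    print(should_run_network_test(1000, 1000))
```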

This is not a problem on x86 inside KVM, nor is it a problem on
PowerKVM or PowerVM, which are similar (admittedly different but
similar) scenarios. We have run this successfully in every one of
those scenarios; s390x should be no different here. This is not at
parity with other platforms.


Changed in linux (Ubuntu):
importance: Undecided → Medium
Dimitri John Ledkov (xnox) wrote :

Right, but the bug report is incomprehensible at the moment. And the
device drivers involved here are different. What is happening is that
systems are running at higher than advertised max.

Where is the code for these checks? Can I clone and run them myself to
debug what is happening?

Regards,

Dimitri.


Dimitri John Ledkov (xnox) wrote :

For LPAR -> it's a valid test to verify max speed and actual speeds
For z/VM -> the cards are virtualised on the host, thus any test of max & actual speeds is bogus
For z/KVM -> the cards are virtualised on the host, thus any test of max & actual speeds is bogus
Comparing z/VM and z/KVM with other architectures is also bogus, as completely different kernel drivers are used to implement the cards (qeth driver on a CCW bus and CCW-virtio bus, CCW buses are entirely s390x specific)

At best this is a wishlist bug report against upstream z/VM & upstream qemu to expose something more realistic in the guest, which the kernel can then pick up and display. What the kernel is showing you is as much as it knows, and there is no bug there as far as I can tell.

tags: added: kernel-da-key