Oracle cosmic image does not find broadcom network device in Shape VMStandard2.1

Bug #1790652 reported by Scott Moser on 2018-09-04
This bug affects 1 person
Affects          Importance   Assigned to
linux (Ubuntu)   High         Seth Forshee
Cosmic           High         Seth Forshee

Bug Description

I tried to register and boot a cosmic image to verify new changes in it and in cloud-init.
The image failed to bring up networking in the initramfs, and thus failed to find iscsi root.
This could be user error.

Here is what I did to publish the image.

 - Use the oci tool [1], following
   https://docs.cloud.oracle.com/iaas/Content/Compute/Tasks/imageimportexport.htm#ImportinganImage
 - Download a livefs build from cloudware
   https://launchpad.net/~cloudware/+livefs/ubuntu/cosmic/cpc/
   Example: livecd.ubuntu-cpc.oracle_bare_metal.img
   My image had version 20180821.1

 - oci os bucket create --name=smoser-devel
 - oci os object put \
      --parallel-upload-count=4 \
      --part-size=10 \
      --bucket-name=smoser-devel \
      --file=/tmp/livecd.ubuntu-cpc.oracle_bare_metal.img \
      --name=cosmic-20180821.1.img

 - import the object
    $ oci compute image import from-object \
        --display-name=smoser-cosmic-20180821.1.img \
        --launch-mode=NATIVE \
        --namespace=intcanonical \
        --bucket-name=smoser-devel \
        --name=cosmic-20180821.1.img \
        --source-image-type=QCOW2

Then I launched a VM.Standard2.1 instance from the web UI.
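
For anyone reproducing this, the publish steps above can be collected into one helper. This is only a sketch: `print_publish_cmds` is a hypothetical name, it prints the commands rather than running them, and the bucket, namespace, and display-name prefix are the values from this report, so they will need to be replaced for another tenancy. Only the flags shown above are used.

```shell
# Sketch: print the oci commands from the steps above for a given image
# file and version. Nothing is executed; review the output, then run it.
# Assumptions: the bucket defaults to the one used in this report, and the
# namespace and "smoser-" display-name prefix are specific to this report.
print_publish_cmds() {
    img_file=$1
    version=$2
    bucket=${3:-smoser-devel}
    object="cosmic-${version}.img"

    echo "oci os bucket create --name=${bucket}"
    echo "oci os object put --parallel-upload-count=4 --part-size=10 --bucket-name=${bucket} --file=${img_file} --name=${object}"
    echo "oci compute image import from-object --display-name=smoser-${object} --launch-mode=NATIVE --namespace=intcanonical --bucket-name=${bucket} --name=${object} --source-image-type=QCOW2"
}
```

Example: `print_publish_cmds /tmp/livecd.ubuntu-cpc.oracle_bare_metal.img 20180821.1` prints the three commands in the order they were run here.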

--
[1] https://docs.cloud.oracle.com/iaas/Content/API/Concepts/cliconcepts.htm

Scott Moser (smoser) wrote :

I built and uploaded an image with cosmic-proposed enabled to get the 4.18.0-7-generic kernel.
Note that I'm fairly sure the image also got linux-firmware 1.175 (the cosmic-proposed version). However, I'm assuming that some linux-firmware files are copied into the initramfs; if the new version didn't make it in there, the firmware in the initramfs would be 1.174.

This is some information collected from the initramfs there.

affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Scott Moser (smoser) wrote :

I updated the description to clarify. This issue only occurs on VMStandard2.1 size. The image I uploaded works for VMStandard1.1 (which gets a different NIC).

summary: - Oracle cosmic image does not find broadcom network device in initramfs
+ Oracle cosmic image does not find broadcom network device in Shape
+ VMStandard2.1
tags: added: cosmic kernel-da-key
tags: added: kernel-key
removed: kernel-da-key
Changed in linux (Ubuntu Cosmic):
status: Confirmed → Triaged
Seth Forshee (sforshee) wrote :

Just an update: I sent a report of the bug upstream over a week ago but haven't received any response.

Robert C Jennings (rcj) wrote :

Copied from Seth's upstream mailing list posting:
https://www.spinics.net/lists/netdev/msg521428.html

This is with a kernel based on 4.18.5, and it has also been seen with a
4.17-based kernel. I'm not currently aware of any working kernel
version. The driver seems to be getting an error response from the
firmware when trying to set the MAC address.

[ 2.437420] Broadcom NetXtreme-C/E driver bnxt_en v1.9.1
[ 2.449820] bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0xffff
[ 2.455610] bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): VF MAC address 00:00:17:02:05:d0 not approved by the PF
[ 2.461443] bnxt_en 0000:00:03.0: Unable to initialize mac address.
[ 2.483531] bnxt_en: probe of 0000:00:03.0 failed with error -99

Posting here in case @pgraydon-oracle has seen this.
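
To check other kernels or shapes for the same symptom, the two failure lines in the log above can be matched mechanically. A minimal sketch, assuming the messages keep the wording quoted above; `bnxt_probe_status` is a hypothetical helper name, and it reads kernel log text on stdin.

```shell
# Sketch: classify kernel log text read from stdin.
# Prints "probe-failed" if either bnxt_en failure message quoted in this
# report is present, otherwise "no-probe-failure".
# Assumption: the message wording matches the log excerpt in this report.
bnxt_probe_status() {
    if grep -Eq \
        -e 'bnxt_en: probe of [0-9a-f:.]+ failed with error' \
        -e 'VF MAC address [0-9a-f:]+ not approved by the PF'; then
        echo "probe-failed"
    else
        echo "no-probe-failure"
    fi
}
```

Typical use would be `dmesg | bnxt_probe_status` from the initramfs shell or from a booted instance. The `hwrm req_type 0xf ... error 0xffff` line is deliberately not matched on its own, since matching it alone would not separate a failed probe from other firmware errors.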

Paul Graydon (pgraydon-oracle) wrote :

I haven't specifically seen that one, but I'll check in with both the Oracle Linux team and our Hypervisor team.

Paul Graydon (pgraydon-oracle) wrote :

I've been able to replicate the situation with a few different distributions. It seems to only occur with VMs. When I tried 4.18.7 on a bare metal instance, there was no problem.

We believe we've isolated the kernel commit that is introducing the problem to 707e7e96602675beb5e09bb994195663da6eb56d

Scott Moser (smoser) wrote :

@Seth,
I'm confused by:
"I'm not currently aware of any working kernel version."

Ubuntu 18.04 kernels (4.15) work correctly.

Seth Forshee (sforshee) wrote :

@Scott: I've put a test build with the fix at http://people.canonical.com/~sforshee/lp1790652/. Can you test it and confirm the fix? Thanks.

Seth Forshee (sforshee) wrote :

Still waiting on testing, but the patch has been added to the upstream stable queue so I went ahead and applied it for cosmic as well. Would still appreciate some testing of the kernel I posted so we can be sure of the fix.

Changed in linux (Ubuntu Cosmic):
assignee: nobody → Seth Forshee (sforshee)
status: Triaged → Fix Committed
Robert C Jennings (rcj) wrote :

Seth,

I've installed your kernel in an image that wasn't booting on the instance type and validated that the test kernel from comment #12 does fix boot.

$ sudo ethtool -i ens3
driver: bnxt_en
version: 1.9.1
firmware-version: 20.8.172.0/pkg 20.8.29.0
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

$ uname -a
Linux rcj-lp1790652-21 4.18.0-8-generic #9+lp1790652v201809170859 SMP Mon Sep 17 07:01:34 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ dmesg|grep bnx
[ 2.366526] Broadcom NetXtreme-C/E driver bnxt_en v1.9.1
[ 2.384175] bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0xffff
[ 2.394757] bnxt_en 0000:00:03.0 eth0: Broadcom NetXtreme-E Ethernet Virtual Function found at mem 2000100000, node addr 00:00:17:02:42:58
[ 2.401353] bnxt_en 0000:00:03.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown speed x0 link at 0000:00:03.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
[ 2.474211] bnxt_en 0000:00:03.0 ens3: renamed from eth0
[ 4.379763] bnxt_en 0000:00:03.0 ens3: NIC Link is Up, 25000 Mbps full duplex, Flow control: none
[ 4.382846] bnxt_en 0000:00:03.0 ens3: FEC autoneg off encodings: None
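
The same verification can be scripted when testing several images. This is a rough sketch: `ethtool_field` is a hypothetical helper that pulls one field out of `ethtool -i` output read from stdin, and the interface name `ens3` is simply the one from this instance.

```shell
# Sketch: print the value of one "key: value" field from `ethtool -i`
# output supplied on stdin. Fields with an empty value (for example
# "expansion-rom-version:") print nothing.
ethtool_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}
```

For example, `sudo ethtool -i ens3 | ethtool_field driver` should print `bnxt_en` on an instance where the NIC came up as it did above.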

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.18.0-9.10

---------------
linux (4.18.0-9.10) cosmic; urgency=medium

  * linux: 4.18.0-9.10 -proposed tracker (LP: #1796346)

  * Cosmic update: v4.18.12 upstream stable release (LP: #1796139)
    - crypto: skcipher - Fix -Wstringop-truncation warnings
    - iio: adc: ina2xx: avoid kthread_stop() with stale task_struct
    - tsl2550: fix lux1_input error in low light
    - misc: ibmvmc: Use GFP_ATOMIC under spin lock
    - vmci: type promotion bug in qp_host_get_user_memory()
    - siox: don't create a thread without starting it
    - x86/numa_emulation: Fix emulated-to-physical node mapping
    - staging: rts5208: fix missing error check on call to rtsx_write_register
    - power: supply: axp288_charger: Fix initial constant_charge_current value
    - misc: sram: enable clock before registering regions
    - serial: sh-sci: Stop RX FIFO timer during port shutdown
    - uwb: hwa-rc: fix memory leak at probe
    - power: vexpress: fix corruption in notifier registration
    - iommu/amd: make sure TLB to be flushed before IOVA freed
    - Bluetooth: Add a new Realtek 8723DE ID 0bda:b009
    - USB: serial: kobil_sct: fix modem-status error handling
    - 6lowpan: iphc: reset mac_header after decompress to fix panic
    - iommu/msm: Don't call iommu_device_{,un}link from atomic context
    - s390/mm: correct allocate_pgste proc_handler callback
    - power: remove possible deadlock when unregistering power_supply
    - drm/amd/display/dc/dce: Fix multiple potential integer overflows
    - drm/amd/display: fix use of uninitialized memory
    - md-cluster: clear another node's suspend_area after the copy is finished
    - cxgb4: Fix the condition to check if the card is T5
    - RDMA/bnxt_re: Fix a couple off by one bugs
    - RDMA/i40w: Hold read semaphore while looking after VMA
    - RDMA/bnxt_re: Fix a bunch of off by one bugs in qplib_fp.c
    - IB/core: type promotion bug in rdma_rw_init_one_mr()
    - media: exynos4-is: Prevent NULL pointer dereference in __isp_video_try_fmt()
    - IB/mlx4: Test port number before querying type.
    - powerpc/kdump: Handle crashkernel memory reservation failure
    - media: fsl-viu: fix error handling in viu_of_probe()
    - vhost_net: Avoid tx vring kicks during busyloop
    - media: staging/imx: fill vb2_v4l2_buffer field entry
    - IB/mlx5: Fix GRE flow specification
    - include/rdma/opa_addr.h: Fix an endianness issue
    - x86/tsc: Add missing header to tsc_msr.c
    - ARM: hwmod: RTC: Don't assume lock/unlock will be called with irq enabled
    - x86/entry/64: Add two more instruction suffixes
    - ARM: dts: ls1021a: Add missing cooling device properties for CPUs
    - scsi: target/iscsi: Make iscsit_ta_authentication() respect the output
      buffer size
    - thermal: i.MX: Allow thermal probe to fail gracefully in case of bad
      calibration.
    - scsi: klist: Make it safe to use klists in atomic context
    - scsi: ibmvscsi: Improve strings handling
    - scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion
    - usb: wusbcore: security: cast sizeof to int for comparison
    - ath10k: sdio: use same endpoint id for all packets...

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released