Oracle cosmic image does not find broadcom network device in Shape VMStandard2.1

Bug #1790652 reported by Scott Moser
This bug affects 1 person
Affects          Status         Importance   Assigned to    Milestone
linux (Ubuntu)   Fix Released   High         Seth Forshee
Cosmic           Fix Released   High         Seth Forshee

Bug Description

I tried to register and boot a cosmic image to verify new changes in it and in cloud-init.
The image failed to bring up networking in the initramfs, and thus failed to find the iSCSI root.
This could be user error.

Here is what I did to publish the image.

 - use oci build tool [1].
   following
   https://docs.cloud.oracle.com/iaas/Content/Compute/Tasks/imageimportexport.htm#ImportinganImage
 - Download a livefs build from cloudware
   https://launchpad.net/~cloudware/+livefs/ubuntu/cosmic/cpc/
   example: livecd.ubuntu-cpc.oracle_bare_metal.img
   My image had version 20180821.1

 - oci os bucket create --name=smoser-devel
 - oci os object put \
      --parallel-upload-count=4 \
      --part-size=10 \
      --bucket-name=smoser-devel \
      --file=/tmp/livecd.ubuntu-cpc.oracle_bare_metal.img \
      --name=cosmic-20180821.1.img

 - import the object
    $ oci compute image import from-object \
        --display-name=smoser-cosmic-20180821.1.img \
        --launch-mode=NATIVE \
        --namespace=intcanonical \
        --bucket-name=smoser-devel \
        --name=cosmic-20180821.1.img \
        --source-image-type=QCOW2

Then I launched a VM.Standard2.1 instance from the web UI.
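
The bucket, upload and import steps above can be collected into one script. The following is only a sketch: it uses the same oci flags as the commands in this description and nothing else, while `run`, `publish_image` and the `DRY_RUN` switch are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch of the publish steps from the description. Only flags that appear
# in the description are used. run() and publish_image() are hypothetical
# helpers, and DRY_RUN is an assumed switch for previewing the commands.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"      # preview only, nothing is executed
    else
        "$@"
    fi
}

# publish_image BUCKET FILE OBJECT_NAME DISPLAY_NAME NAMESPACE
publish_image() {
    bucket=$1 file=$2 object=$3 display=$4 namespace=$5

    # Create the bucket, upload the image, then import it from the bucket.
    run oci os bucket create --name="$bucket" || return 1
    run oci os object put \
        --parallel-upload-count=4 \
        --part-size=10 \
        --bucket-name="$bucket" \
        --file="$file" \
        --name="$object" || return 1
    run oci compute image import from-object \
        --display-name="$display" \
        --launch-mode=NATIVE \
        --namespace="$namespace" \
        --bucket-name="$bucket" \
        --name="$object" \
        --source-image-type=QCOW2
}
```

Setting `DRY_RUN=1` before calling `publish_image smoser-devel /tmp/livecd.ubuntu-cpc.oracle_bare_metal.img cosmic-20180821.1.img smoser-cosmic-20180821.1.img intcanonical` prints the three commands instead of running them. The sketch stops at the first failing step, so it would also stop if the bucket already exists.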

--
[1] https://docs.cloud.oracle.com/iaas/Content/API/Concepts/cliconcepts.htm

Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Scott Moser (smoser) wrote :

I built and uploaded an image with cosmic-proposed enabled to get the 4.18.0-7-generic kernel.
I'm fairly sure I also got linux-firmware 1.175 (the cosmic-proposed version). However, if that version didn't make it into the initramfs (I assume some linux-firmware files are included there), then the initramfs would have 1.174.

This is some information collected from the initramfs there.

affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Scott Moser (smoser) wrote :

I updated the description to clarify. This issue only occurs on VMStandard2.1 size. The image I uploaded works for VMStandard1.1 (which gets a different NIC).

summary: - Oracle cosmic image does not find broadcom network device in initramfs
+ Oracle cosmic image does not find broadcom network device in Shape
+ VMStandard2.1
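
Since the two shapes get different NICs, it can help to record which driver is bound to each interface on a shape that does boot. A minimal sketch, assuming the usual sysfs layout where `/sys/class/net/<iface>/device/driver` is a symlink to the bound driver; `list_nic_drivers` and the `SYSFS_ROOT` override are hypothetical names used only here.

```shell
#!/bin/sh
# Sketch: list each network interface with the kernel driver bound to it.
# SYSFS_ROOT is a hypothetical override for testing; the default is /sys.

list_nic_drivers() {
    root=${SYSFS_ROOT:-/sys}
    for dev in "$root"/class/net/*; do
        [ -e "$dev" ] || continue          # glob matched nothing
        iface=${dev##*/}
        if [ -L "$dev/device/driver" ]; then
            # The symlink target's last path component is the driver name.
            driver=$(basename "$(readlink "$dev/device/driver")")
        else
            driver="(none)"                # e.g. lo has no backing device
        fi
        printf '%s %s\n' "$iface" "$driver"
    done
}
```

On a shape where the driver probe fails, no interface is created for that NIC, so it will simply be missing from the list.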
Revision history for this message
Scott Moser (smoser) wrote :
tags: added: cosmic kernel-da-key
tags: added: kernel-key
removed: kernel-da-key
Changed in linux (Ubuntu Cosmic):
status: Confirmed → Triaged
Revision history for this message
Seth Forshee (sforshee) wrote :

Just an update: I sent a report of the bug upstream over a week ago but haven't received any response.

Revision history for this message
Robert C Jennings (rcj) wrote :

Copied from Seth's upstream mailing list posting:
https://www.spinics.net/lists/netdev/msg521428.html

This is with a kernel based on 4.18.5, and it has also been seen with a
4.17-based kernel. I'm not currently aware of any working kernel
version. The driver seems to be getting an error response from the
firmware when trying to set the MAC address.

[ 2.437420] Broadcom NetXtreme-C/E driver bnxt_en v1.9.1
[ 2.449820] bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0xffff
[ 2.455610] bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): VF MAC address 00:00:17:02:05:d0 not approved by the PF
[ 2.461443] bnxt_en 0000:00:03.0: Unable to initialize mac address.
[ 2.483531] bnxt_en: probe of 0000:00:03.0 failed with error -99

Posting here in case @pgraydon-oracle has seen this.
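
For anyone checking other images for the same failure, the log above has a recognizable signature: the VF MAC address is rejected by the PF, then the probe fails with error -99 (EADDRNOTAVAIL on Linux). A rough sketch that scans kernel log text for that pair of messages follows; `check_bnxt_mac_failure` is a hypothetical helper name, and the patterns are taken only from the messages quoted here.

```shell
#!/bin/sh
# Sketch: scan kernel log text on stdin for the bnxt_en failure in this bug.
# Prints the PCI address of each device whose probe failed after a VF MAC
# address was rejected. Exit status is 0 if the failure was seen, 1 otherwise.

check_bnxt_mac_failure() {
    awk '
        /bnxt_en/ && /VF MAC address .* not approved by the PF/ { rejected = 1 }
        /bnxt_en: probe of .* failed with error/ {
            if (rejected) {
                addr = $0
                sub(/.*probe of /, "", addr)
                sub(/ failed with error.*/, "", addr)
                print addr
                found = 1
            }
        }
        END { exit !found }
    '
}

# Example use on a running system:
#   dmesg | check_bnxt_mac_failure
```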

Revision history for this message
Paul Graydon (pgraydon-oracle) wrote :

I haven't specifically seen that one, but I'll check in with both the Oracle Linux team and our Hypervisor team.

Revision history for this message
Paul Graydon (pgraydon-oracle) wrote :

I've been able to replicate the situation with a few different distributions. It seems to only occur with VMs. When I tried 4.18.7 on a bare metal instance, there was no problem.

We believe we've isolated the kernel commit that is introducing the problem to 707e7e96602675beb5e09bb994195663da6eb56d

Revision history for this message
Scott Moser (smoser) wrote :

@Seth,
I'm confused by:
"I'm not currently aware of any working kernel version."

Ubuntu 18.04 kernels (4.15) work correctly.

Revision history for this message
Scott Moser (smoser) wrote :
Revision history for this message
Paul Graydon (pgraydon-oracle) wrote :
Revision history for this message
Seth Forshee (sforshee) wrote :

@Scott: I've put a test build with the fix at http://people.canonical.com/~sforshee/lp1790652/. Can you test it and confirm the fix? Thanks.

Revision history for this message
Seth Forshee (sforshee) wrote :

Still waiting on testing, but the patch has been added to the upstream stable queue so I went ahead and applied it for cosmic as well. Would still appreciate some testing of the kernel I posted so we can be sure of the fix.

Changed in linux (Ubuntu Cosmic):
assignee: nobody → Seth Forshee (sforshee)
status: Triaged → Fix Committed
Revision history for this message
Robert C Jennings (rcj) wrote :

Seth,

I've installed your kernel in an image that wasn't booting on the instance type and validated that the test kernel from comment #12 does fix boot.

$ sudo ethtool -i ens3
driver: bnxt_en
version: 1.9.1
firmware-version: 20.8.172.0/pkg 20.8.29.0
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

$ uname -a
Linux rcj-lp1790652-21 4.18.0-8-generic #9+lp1790652v201809170859 SMP Mon Sep 17 07:01:34 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ dmesg|grep bnx
[ 2.366526] Broadcom NetXtreme-C/E driver bnxt_en v1.9.1
[ 2.384175] bnxt_en 0000:00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0xffff
[ 2.394757] bnxt_en 0000:00:03.0 eth0: Broadcom NetXtreme-E Ethernet Virtual Function found at mem 2000100000, node addr 00:00:17:02:42:58
[ 2.401353] bnxt_en 0000:00:03.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown speed x0 link at 0000:00:03.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
[ 2.474211] bnxt_en 0000:00:03.0 ens3: renamed from eth0
[ 4.379763] bnxt_en 0000:00:03.0 ens3: NIC Link is Up, 25000 Mbps full duplex, Flow control: none
[ 4.382846] bnxt_en 0000:00:03.0 ens3: FEC autoneg off encodings: None
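
To repeat this check across several instances, the relevant fields can be pulled out of the `ethtool -i` output. A small sketch, assuming the "key: value" layout shown above; `ethtool_field` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the value of one "key: value" field from `ethtool -i`
# output supplied on stdin. Fields with an empty value print nothing.

ethtool_field() {
    awk -v key="$1" -F': ' '$1 == key { print $2; exit }'
}

# Example use:
#   ethtool -i ens3 | ethtool_field driver
#   ethtool -i ens3 | ethtool_field firmware-version
```

With the output quoted above, the `driver` field yields `bnxt_en`.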

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.18.0-9.10

---------------
linux (4.18.0-9.10) cosmic; urgency=medium

  * linux: 4.18.0-9.10 -proposed tracker (LP: #1796346)

  * Cosmic update: v4.18.12 upstream stable release (LP: #1796139)
    - crypto: skcipher - Fix -Wstringop-truncation warnings
    - iio: adc: ina2xx: avoid kthread_stop() with stale task_struct
    - tsl2550: fix lux1_input error in low light
    - misc: ibmvmc: Use GFP_ATOMIC under spin lock
    - vmci: type promotion bug in qp_host_get_user_memory()
    - siox: don't create a thread without starting it
    - x86/numa_emulation: Fix emulated-to-physical node mapping
    - staging: rts5208: fix missing error check on call to rtsx_write_register
    - power: supply: axp288_charger: Fix initial constant_charge_current value
    - misc: sram: enable clock before registering regions
    - serial: sh-sci: Stop RX FIFO timer during port shutdown
    - uwb: hwa-rc: fix memory leak at probe
    - power: vexpress: fix corruption in notifier registration
    - iommu/amd: make sure TLB to be flushed before IOVA freed
    - Bluetooth: Add a new Realtek 8723DE ID 0bda:b009
    - USB: serial: kobil_sct: fix modem-status error handling
    - 6lowpan: iphc: reset mac_header after decompress to fix panic
    - iommu/msm: Don't call iommu_device_{,un}link from atomic context
    - s390/mm: correct allocate_pgste proc_handler callback
    - power: remove possible deadlock when unregistering power_supply
    - drm/amd/display/dc/dce: Fix multiple potential integer overflows
    - drm/amd/display: fix use of uninitialized memory
    - md-cluster: clear another node's suspend_area after the copy is finished
    - cxgb4: Fix the condition to check if the card is T5
    - RDMA/bnxt_re: Fix a couple off by one bugs
    - RDMA/i40w: Hold read semaphore while looking after VMA
    - RDMA/bnxt_re: Fix a bunch of off by one bugs in qplib_fp.c
    - IB/core: type promotion bug in rdma_rw_init_one_mr()
    - media: exynos4-is: Prevent NULL pointer dereference in __isp_video_try_fmt()
    - IB/mlx4: Test port number before querying type.
    - powerpc/kdump: Handle crashkernel memory reservation failure
    - media: fsl-viu: fix error handling in viu_of_probe()
    - vhost_net: Avoid tx vring kicks during busyloop
    - media: staging/imx: fill vb2_v4l2_buffer field entry
    - IB/mlx5: Fix GRE flow specification
    - include/rdma/opa_addr.h: Fix an endianness issue
    - x86/tsc: Add missing header to tsc_msr.c
    - ARM: hwmod: RTC: Don't assume lock/unlock will be called with irq enabled
    - x86/entry/64: Add two more instruction suffixes
    - ARM: dts: ls1021a: Add missing cooling device properties for CPUs
    - scsi: target/iscsi: Make iscsit_ta_authentication() respect the output
      buffer size
    - thermal: i.MX: Allow thermal probe to fail gracefully in case of bad
      calibration.
    - scsi: klist: Make it safe to use klists in atomic context
    - scsi: ibmvscsi: Improve strings handling
    - scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion
    - usb: wusbcore: security: cast sizeof to int for comparison
    - ath10k: sdio: use same endpoint id for all packets...

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released
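
To tell whether a cosmic kernel package is at or past the fixed version 4.18.0-9.10, a version comparison along these lines may help. This is a sketch only: `version_ge` is a hypothetical helper, and it relies on GNU `sort -V`, which approximates Debian version ordering for simple versions like these but does not implement it in full (`dpkg --compare-versions` is the authoritative comparison on Ubuntu).

```shell
#!/bin/sh
# Sketch: succeed if version $1 is at least version $2.
# Uses GNU sort -V, an approximation of Debian version ordering.

version_ge() {
    # After a version sort the smaller value comes first, so $1 >= $2
    # exactly when the first line of the sorted pair is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use, with the package version passed in by the caller:
#   version_ge "$pkg_version" 4.18.0-9.10 && echo "contains the fix"
```

The comparison only makes sense for the cosmic 4.18 series. Test builds, such as the one verified earlier in this report, can carry the fix under a lower version number.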
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Andy Whitcroft (apw)
tags: added: kernel-fixup-verification-needed-bionic
removed: verification-needed-bionic
Brad Figg (brad-figg)
tags: added: verification-needed-bionic
Revision history for this message
Andy Whitcroft (apw) wrote :

This bug was erroneously marked for verification in bionic; verification is not required and verification-needed-bionic is being removed.

tags: removed: verification-needed-bionic
tags: added: verification-done-bionic
Revision history for this message
Paul Graydon (pgraydon-oracle) wrote :

I'm confused. Do you need verification or not? Cosmic is not specifically supported on our platform, and there are no plans at the moment to support non-LTS releases that I know of. I can certainly test this if needs be, though.

Brad Figg (brad-figg)
tags: added: cscc