BUG() inside megaraid (megaraid_sas_fusion) driver

Bug #1755160 reported by Rafael David Tinoco
This bug affects 2 people
Affects         Status   Importance  Assigned to          Milestone
linux (Ubuntu)  Opinion  Medium      Rafael David Tinoco

Bug Description

The following kernel trace (together with a dump) was brought to me:

...
[ 8650.749804] SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)
[ 8650.749809] cache: kmalloc-64(2958:318bd9ccdb762575c670d0f17241a655a4d31b0358a6017686497cf166ea647e), object size: 64, buffer size: 64, default order: 0, min order: 0
[ 8650.749812] node 0: slabs: 83, objs: 5312, free: 0
[ 8650.749814] node 1: slabs: 59, objs: 3776, free: 0
[ 8650.749817] SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)
[ 8650.749819] cache: kmalloc-64(2958:318bd9ccdb762575c670d0f17241a655a4d31b0358a6017686497cf166ea647e), object size: 64, buffer size: 64, default order: 0, min order: 0
[ 8650.749821] node 0: slabs: 83, objs: 5312, free: 0
[ 8650.749823] node 1: slabs: 59, objs: 3776, free: 0
[ 8650.749825] DMAR: Allocating 2-page iova for 0000:02:00.0 failed
[ 8650.756414] ------------[ cut here ]------------
[ 8650.761768] kernel BUG at /build/linux-HSAA8v/linux-4.4.0/drivers/scsi/megaraid/megaraid_sas_fusion.c:1452!
[ 8650.772638] invalid opcode: 0000 [#1] SMP
[ 8650.777226] Modules linked in: vport_gre ip_gre ip6_tables xt_set xt_multiport iptable_mangle iptable_raw ip_set_hash_ip ip_set_hash_net ip_set ipip tunnel4 ip_tunnel veth xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_tcpudp gre openvswitch nf_defrag_ipv6 xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge overlay 8021q garp mrp stp llc bonding ipmi_ssif ipmi_devintf kvm_intel kvm irqbypass joydev input_leds ipmi_si ipmi_msghandler ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov
[ 8650.857496] async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear uas hid_generic ahci crct10dif_pclmul usb_storage crc32_pclmul ghash_clmulni_intel ixgbe igb aesni_intel usbhid aes_x86_64 dca mdio lrw vxlan gf128mul ip6_udp_tunnel glue_helper udp_tunnel ablk_helper ptp cryptd i2c_algo_bit libahci megaraid_sas pps_core hid scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 8650.897195] CPU: 5 PID: 9720 Comm: in_tail.rb:276 Not tainted 4.4.0-112-generic #135-Ubuntu
[ 8650.906514] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.1.7 06/16/2016
[ 8650.915059] task: ffff881d91f88000 ti: ffff881d91f90000 task.ti: ffff881d91f90000
[ 8650.923409] RIP: 0010:[<ffffffffc00c80f4>] [<ffffffffc00c80f4>] megasas_build_io_fusion+0x354/0x540 [megaraid_sas]
[ 8650.935073] RSP: 0000:ffff881d91f93a50 EFLAGS: 00010282
[ 8650.941000] RAX: 00000000fffffff4 RBX: ffff881fec2c0580 RCX: 0000000000000000
[ 8650.948960] RDX: 00000000fffffff4 RSI: 0000000000000246 RDI: 0000000000000246
[ 8650.956922] RBP: ffff881d91f93aa8 R08: 0000000000000005 R09: 0000000000000891
[ 8650.964883] R10: 0000000000000000 R11: 0000000000000891 R12: ffff883d72a79080
[ 8650.972845] R13: ffff881fec2c0500 R14: 00000000fffffff4 R15: ffff881fea100000
[ 8650.980806] FS: 00007f0d2d8f9ab0(0000) GS:ffff883ffb680000(0000) knlGS:0000000000000000
[ 8650.989835] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8650.996246] CR2: 00007f0d335a6972 CR3: 0000003fec1fa000 CR4: 0000000000360670
[ 8651.004207] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8651.012168] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8651.020129] Stack:
[ 8651.022372] ffff881d91f93a88 ffffffff815bf054 ffff881fea91f8c0 ffff883d72a79080
[ 8651.030663] ffff881feb8607d8 ffff881ff0029000 ffff881feb8607d8 ffff881fea91f8c0
[ 8651.038954] ffff883d72a79080 0000000000001055 0000000000000005 ffff881d91f93ae0
[ 8651.047247] Call Trace:
[ 8651.049982] [<ffffffff815bf054>] ? scsi_host_alloc_command+0x44/0xc0
[ 8651.057173] [<ffffffffc00c8373>] megasas_build_and_issue_cmd_fusion+0x93/0x1b0 [megaraid_sas]
[ 8651.066787] [<ffffffffc00b8f88>] megasas_queue_command+0xf8/0x100 [megaraid_sas]
[ 8651.075139] [<ffffffff815c69e1>] scsi_dispatch_cmd+0xe1/0x240
[ 8651.081647] [<ffffffff815c9822>] scsi_request_fn+0x472/0x610
[ 8651.088064] [<ffffffff813c7833>] __blk_run_queue+0x33/0x40
[ 8651.094283] [<ffffffff813c7b0a>] queue_unplugged+0x2a/0xb0
[ 8651.100502] [<ffffffff813cdc96>] blk_flush_plug_list+0x1d6/0x240
[ 8651.107300] [<ffffffff813ce10c>] blk_finish_plug+0x2c/0x40
[ 8651.113522] [<ffffffff8119fb6a>] __do_page_cache_readahead+0x1aa/0x230
[ 8651.120903] [<ffffffff811917ed>] ? pagecache_get_page+0x2d/0x1c0
[ 8651.127704] [<ffffffff811933a5>] filemap_fault+0x375/0x3f0
[ 8651.133923] [<ffffffff811ceee1>] ? page_add_file_rmap+0x51/0x60
[ 8651.140629] [<ffffffff812a5d56>] ext4_filemap_fault+0x36/0x50
[ 8651.147128] [<ffffffff811bfe70>] __do_fault+0x50/0xe0
[ 8651.152860] [<ffffffff811c39c2>] handle_mm_fault+0xfa2/0x1820
[ 8651.159370] [<ffffffff81210ebb>] ? new_sync_write+0x9b/0xe0
[ 8651.165688] [<ffffffff8106b687>] __do_page_fault+0x197/0x400
[ 8651.172089] [<ffffffff8106b912>] do_page_fault+0x22/0x30
[ 8651.178116] [<ffffffff81849ac8>] page_fault+0x28/0x30
[ 8651.183848] Code: 34 c1 e9 31 fe ff ff 41 0f b7 87 68 09 00 00 4c 89 e7 48 c1 e0 04 c6 44 03 ff 00 e8 a7 29 50 c1 85 c0 41 89 c6 0f 89 93 fd ff ff <0f> 0b 48 8b 45 b8 48 8b 7d c8 4c 8b 30 49 8b 04 24 48 8b 8f f8
[ 8651.205479] RIP [<ffffffffc00c80f4>] megasas_build_io_fusion+0x354/0x540 [megaraid_sas]
...

Rafael David Tinoco (rafaeldtinoco) wrote :

crash-study.txt

/**
 * megasas_make_sgl_fusion - Prepares 32-bit SGL
 * @instance: Adapter soft state
 * @scp: SCSI command from the mid-layer
 * @sgl_ptr: SGL to be filled in
 * @cmd: cmd we are working on
 *
 * If successful, this function returns the number of SG elements.
 */
static int
megasas_make_sgl_fusion(struct megasas_instance *instance,
   struct scsi_cmnd *scp,
   struct MPI25_IEEE_SGE_CHAIN64 *sgl_ptr,
   struct megasas_cmd_fusion *cmd)
{
...
 sge_count = scsi_dma_map(scp);

 BUG_ON(sge_count < 0); /* <---- FAILS HERE */

 if (sge_count > instance->max_num_sge || !sge_count)
  return sge_count;
----

/**
 * scsi_dma_map - perform DMA mapping against command's sg lists
 * @cmd: scsi command
 *
 * Returns the number of sg lists actually used, zero if the sg lists
 * is NULL, or -ENOMEM if the mapping failed.
 */
int scsi_dma_map(struct scsi_cmnd *cmd)
{
 int nseg = 0;

 if (scsi_sg_count(cmd)) {
  struct device *dev = cmd->device->host->dma_dev;

  nseg = dma_map_sg(dev, scsi_sglist(cmd), scsi_sg_count(cmd),
      cmd->sc_data_direction);
  if (unlikely(!nseg))
   return -ENOMEM;
 }
 return nseg;
}

----

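The only possible way for the BUG_ON in megasas_make_sgl_fusion to be triggered
is for scsi_dma_map to return a negative value, and the only negative value it
returns is -ENOMEM (-12), when nseg is 0. This means that dma_map_sg could NOT
map the scatter-gather buffers from the scsi_cmnd for the controller? The DMAR
message logged just before the BUG ("Allocating 2-page iova for 0000:02:00.0
failed") is consistent with that.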
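
The register dump in the trace appears to agree with this: RAX, RDX and R14 all
hold 00000000fffffff4, and on x86-64 an int return value comes back in the low
32 bits of RAX. A minimal user-space sketch of that decoding follows; the helper
name low32_as_signed is hypothetical and not part of any kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: interpret the low 32 bits of a 64-bit register
 * value as a signed two's-complement int, the way an `int` return
 * value is read back from EAX on x86-64.
 */
static int32_t low32_as_signed(uint64_t reg)
{
	uint32_t low = (uint32_t)reg;

	/* Sign bit set: compute the negative value without relying on
	 * implementation-defined unsigned-to-signed conversion. */
	if (low & 0x80000000u)
		return -(int32_t)(~low) - 1;

	return (int32_t)low;
}
```

Calling low32_as_signed(0x00000000fffffff4ULL) gives -12, which is -ENOMEM on
Linux, so sge_count was negative when the BUG_ON was evaluated.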

----

#define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, NULL)

----

/*
 * dma_maps_sg_attrs returns 0 on error and > 0 on success.
 * It should never return a value < 0.
 */
static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
       int nents, enum dma_data_direction dir,
       struct dma_attrs *attrs)
{
...
 ents = ops->map_sg(dev, sg, nents, dir, attrs);
 BUG_ON(ents < 0);
...
 return ents;
}

----

A negative return from dma_map_sg can be ruled out, because dma_map_sg_attrs
would have hit its own BUG_ON if ents were negative. So the only possible thing
that could have happened is for ents to be 0: scsi_dma_map turned that into
-ENOMEM, and the BUG_ON fired in megasas_make_sgl_fusion() instead.
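
The chain of checks quoted so far can be sketched as a small user-space model.
This is only an illustration under stated assumptions, not driver code: the
names model_outcome and model_map_path are hypothetical, and map_sg_result
stands in for whatever ops->map_sg would return.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes for the model below. */
enum model_outcome {
	MODEL_OK,             /* mapping succeeded or nothing to map */
	MODEL_BUG_IN_DMA_MAP, /* BUG_ON(ents < 0) in dma_map_sg_attrs() */
	MODEL_BUG_IN_MEGASAS  /* BUG_ON(sge_count < 0) in megasas_make_sgl_fusion() */
};

/*
 * Hypothetical model of the path ops->map_sg -> dma_map_sg_attrs ->
 * scsi_dma_map -> megasas_make_sgl_fusion, following the excerpts above.
 */
static enum model_outcome model_map_path(int sg_count, int map_sg_result)
{
	int nseg = 0;

	if (sg_count) {
		int ents = map_sg_result;

		/* dma_map_sg_attrs(): a negative count BUGs right here. */
		if (ents < 0)
			return MODEL_BUG_IN_DMA_MAP;

		/* scsi_dma_map(): zero mapped segments becomes -ENOMEM. */
		nseg = ents;
		if (!nseg)
			nseg = -ENOMEM;
	}

	/* megasas_make_sgl_fusion(): BUG_ON(sge_count < 0). */
	return nseg < 0 ? MODEL_BUG_IN_MEGASAS : MODEL_OK;
}
```

In this model only a map_sg result of 0 with a non-empty scatterlist reaches the
megaraid BUG_ON, which matches the site reported in the trace.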

----

ops->map_sg:

 {init __mic_dma_ops}() : dma_map_ops
 {init amd_iommu_dma_ops}() : dma_map_ops
 {init calgary_dma_ops}() : dma_map_ops
 {init gart_dma_ops}() : dma_map_ops
 {init intel_dma_ops}() : dma_map_ops
 {init nommu_dma_ops}() : dma_map_ops
 {init sta2x11_dma_ops}() : dma_map_ops
 {init swiotlb_dma_ops}() : dma_map_ops
 {init xen_swiotlb_dma_ops}() : dma_map_ops

crash> dev -d
MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC DRV
    8 ffff881ff0142800 sdc ffff881fe92a9f50 0 0 0 1
   11 ffff881ff06dd000 sr0 ffff881fe8ecb968 0 0 0 0
    8 ffff881ff06de000 sdd ffff881fe8ecb430 0 0 0 0
    8 ffff881ff0141800 sdb ffff881fe9c78a70 0 0 0 0
    8 ffff881ff0140800 sda ffff881fe9c78538 0 0 0 0
    8 ffff881ff06b2000 sde ffff881fe8dfbea0 0 0 0 0

crash> struct device.archdata ffff881fe8ecb968
  archdata = {
    dma_ops = 0xffff881ff06dc168,
    iommu = 0x0
  }
crash> struct device.archdata ffff881fe8ecb430
  archdata = {
    dma_ops = 0xffff881ff06dc968,
    iommu = 0x0
  }...

Changed in linux (Ubuntu):
assignee: nobody → Rafael David Tinoco (inaddy)
importance: Undecided → Medium
status: New → Opinion