lubuntu 18.10 x86 (32bit) image fails to load; "ehci-pci 0000:00:1a.7: dma_direct_map_sg: overflow 0x000000016e3f3000+2048 of device mask ffffffff" repeats

Bug #1794922 reported by Chris Guiver on 2018-09-28
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
linux (Debian) | Unknown | Unknown | |
linux (Ubuntu) | | High | Unassigned |

Bug Description

-- Background --
Testing the x86 (32bit) ISO image on various machines.

After downloading with zsync, I wrote the image to a thumb drive and tested it ("check disk for defects"); it passed, and it worked fine on two systems today (Dell D610 and HP dx6120). However,

-- Issue --

on the Dell 755 (desktop) and the HP dc7700 (small form factor), it just sits at the Lubuntu Plymouth screen.

Switching to the messages view, the following just repeats:

// on dell 755
"ehci-pci 0000:00:1a.7: dma_direct_map_sg: overflow 0x000000016e3f3000+2048 of device mask ffffffff"

// on hp dc7700
"ehci-pci 0000:00:1a.7: dma_direct_map_sg: overflow 0x0000000162317000+2048 of device mask ffffffff"

NOTE: I can't copy/paste from these systems, as they never finish booting; I left it for >30 mins (on the 755) and it just kept repeating the message. I've typed it manually, so typos could have been made.

A discussion on #lubuntu-devel provided a possible cause (Thank you Walter!) -

---
<wxl> guiverc: lyorian: http://debian.2.n7.nabble.com/Bug-908924-dma-direct-map-sg-overflow-on-USB-access-after-upgrade-to-kernel-4-18-td4387757.html
<wxl> seems like it's a usb issue of some kind with 4.18
---

This fits because, whilst I do most of my testing on the D610 and T43 (no issues today), I had tested Lubuntu 18.10 on the Dell 755 before with no issues. My last test was also long enough ago to have been on the 4.17 kernel.

I'll add `lshw` files for the two failed systems.

Chris Guiver (guiverc) on 2018-09-28
description: updated
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1794922

tags: added: iso-testing
Chris Guiver (guiverc) on 2018-09-28
description: updated
Walter Lapchynski (wxl) on 2018-09-28
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1794922

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: cosmic
Walter Lapchynski (wxl) wrote :

I note the linked report is 32-bit as well. Can you confirm this does NOT occur on a 64-bit machine? Be careful, though: a 64-bit machine is more likely not to have an EHCI controller at all. Roughly speaking, EHCI is USB 2.0-level while xHCI is USB 3.0 (UHCI being USB 1.x, which, FWIW, requires some additional memory hardware to support 64-bit). Since the errors relate to EHCI, this is going to be crucial; `lspci | grep HCI` should tell you which controllers a machine has. It might also be good to check a 32-bit kernel on a 64-bit machine, if you can get that test out of the way.

It is possible this affects other controller architectures, but I doubt it. On the other hand, EHCI can handle USB 1.0 and 2.0 devices and will likely downgrade a 3.0 device to 2.0 speeds. If possible, testing with each of those kinds of device (1.0, 2.0 and 3.0) will likely help in understanding the scope of the problem.
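Walter's `lspci | grep HCI` check can be made a little more mechanical. The following is a minimal Python sketch, not an existing tool: the function `classify_usb_controllers` and the `HCI_GENERATION` table are hypothetical names, and the code assumes `lspci` lines in the format posted later in this report. It keeps only "USB controller" lines (a plain `grep HCI` also matches the AHCI SATA controller) and labels each as UHCI, EHCI or xHCI using the rough USB generations given above; OHCI, the non-Intel USB 1.x interface, is included as an assumption.

```python
import re

# Rough USB generation per host-controller interface (OHCI added as an assumption).
HCI_GENERATION = {
    "UHCI": "USB 1.x",
    "OHCI": "USB 1.x",
    "EHCI": "USB 2.0",
    "XHCI": "USB 3.x",
}

# Matches "UHCI", "EHCI", "xHCI", ... as a whole word, case-insensitively.
HCI_RE = re.compile(r"\b([UOEX]HCI)\b", re.IGNORECASE)


def classify_usb_controllers(lspci_lines):
    """Return {pci_address: controller_type} for every USB controller line."""
    controllers = {}
    for line in lspci_lines:
        if "USB controller" not in line:
            continue  # skip e.g. "SATA controller ... [AHCI mode]"
        match = HCI_RE.search(line)
        if match:
            address = line.split()[0]  # e.g. "00:1a.7"
            controllers[address] = match.group(1).upper()
    return controllers


# Example input: two of the dc7700 lines from this report.
sample = [
    "00:1a.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)",
    "00:1a.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)",
]
found = classify_usb_controllers(sample)
for address, hci in found.items():
    print(address, hci, HCI_GENERATION[hci])
print("has EHCI:", "EHCI" in found.values())
```

On a live system the input would be the output of `lspci` split into lines; the sample lines here are copied from the dc7700 listing further down.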

tags: added: lubuntu
removed: cosmic
Walter Lapchynski (wxl) wrote :

Let's also check Xubuntu and netboot; they both have i386 images.

Chris Guiver (guiverc) on 2018-09-28
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Chris Guiver (guiverc) wrote :

Reply to "Ubuntu Kernel Bot"

> This bug is missing log files that will aid in diagnosing the problem.
> While running an Ubuntu kernel

The ISO doesn't boot on those machines, so I can't get to a terminal.

> If, due to the nature of the issue you have encountered, you are unable
> to run this command, please add a comment stating that fact and change
> the bug status to 'Confirmed'.

Done

Walter, thanks again!

I'll have to learn a little about EHCI so I can recognize it.

As for the Xubuntu x86 daily ISO: I wrote one yesterday to test, but never got around to it.
I booted it on the Dell 755 (next to me as I type this), and got

"ehci-pci 000:00:1a.7: dma_direct_map_sg: overflow 0x0000000016ca56000+2048 of device mask ffffffff"

(Typed, so it may not be exact.) It didn't continue endlessly; it eventually got to BusyBox.

I only left one system running the Lubuntu 18.10 image for 30 mins (I can't recall which; both use the same keyboard/mouse/screen), so maybe the Lubuntu image will also drop to BusyBox eventually, if you want me to try...

I'll add more when I can..

Chris Guiver (guiverc) wrote :

Update on the Xubuntu 18.10 (x86) ISO:
both systems get to BusyBox; the dc7700 (the other box) didn't show hundreds of those ehci-pci messages...

BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu4) bui...
Enter 'help'...
(initramfs) Unable to find a medium containing a live file system

It was yesterday's daily; I'll re-zsync it, then re-write & re-test both (after confirming image runs fine on d610 laptop) - but later... (UWN summaries..)

Chris Guiver (guiverc) wrote :

another update on xubuntu 18.10 (x86) iso
the same thumb-drive image runs fine on d610 (dell latitude laptop)
(no zsync yet)

Chris Guiver (guiverc) wrote :

I've re-run zsync for the latest image; no 'ehci-pci' messages on either the dc7700 or the 755 for Xubuntu x86 (but both drop to BusyBox/initramfs).

Given the different messages, I reported it as a separate bug (for the QA tracker):
http://launchpad.net/bugs/1795092
with a link back to here, of course, including your possible upstream link.
(I am yet to try lubuntu image today)

-- hp dc7700 (c2d, 5gb ram, nvidia quadro nvs290/g86)
guiverc@dc7700ub:~$ lspci | grep HCI
00:1a.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)

-- dell 755 (c2d, 5gb ram, rv516/x1300/x1550 radeon)
755-suse:/home/guiverc # lspci |grep HCI
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
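The error messages name the device by its full PCI address (domain `0000`, then bus:device.function `00:1a.7`), while plain `lspci` omits the domain. Matching the two shows that on both machines above the complaining device, `00:1a.7`, is the "USB2 EHCI Controller #2". Below is a minimal Python sketch of that lookup. It assumes the message and listing formats shown in this report, and `controller_for_error` is a hypothetical helper, not an existing tool.

```python
import re

# "0000:00:1a.7" -> captures "00:1a.7" (bus:device.function without the domain).
PCI_ADDR_RE = re.compile(
    r"\b[0-9a-f]{4}:([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\b", re.IGNORECASE
)


def controller_for_error(error_line, lspci_lines):
    """Return the lspci line for the device named in a kernel message, or None."""
    match = PCI_ADDR_RE.search(error_line)
    if match is None:
        return None
    wanted = match.group(1).lower()
    for line in lspci_lines:
        fields = line.split()
        if fields and fields[0].lower() == wanted:
            return line
    return None


# Two lines from the Dell 755 listing above, plus the message seen on that machine.
dell_755 = [
    "00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)",
    "00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)",
]
error = ("ehci-pci 0000:00:1a.7: dma_direct_map_sg: overflow "
         "0x000000016e3f3000+2048 of device mask ffffffff")
print(controller_for_error(error, dell_755))
```

The regular expression also tolerates the `[ 99.999999]` timestamp prefix and a missing space after the address, as in some of the transcribed lines.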

Sorry Walter, I'm lost with the following

> It also might be good to check a 32 bit kernel on a 64 bit machine if you can get
> that test out of the way
> If it's possible, checking all those different ones will likely prove helping in
> understanding the scope of the problem.

Chris Guiver (guiverc) wrote :

Lubuntu 18.10 (today's daily image) on dc7700

"ehci-pci 0000:00:1a.7: dma_direct_map_sg: overflow 0x000000016233c000+2048 of device mask ffffffff"

(Alt-F1 shows the Lubuntu Plymouth dots; Alt-F2 just shows scrolling "[ 999.999999] ehci-pci..." messages.)

I have no idea what the overflow number means, but it's different: I copied
"ehci-pci 0000:00:1a.7: dma_direct_map_sg: overflow 0x0000000162317000+2048 of device mask ffffffff" from earlier in this bug report and had to alter it.

I'll shut down the machine at 2400 (~40 mins).
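Here is a likely answer to what the overflow number means, based on my reading of the 4.18-era kernel dma-direct code (treat this as an assumption). `0x000000016e3f3000` is the DMA address of the buffer being mapped, `+2048` is its length in bytes, and `ffffffff` is the controller's DMA mask, meaning it can only address the first 4 GiB. The message is printed when the last byte of the buffer falls above the mask, which would explain why the number differs from run to run as the buffer lands somewhere else in memory. Every address transcribed in this report is above 4 GiB (0x100000000), and the affected machines have 5 GB or 8 GB of RAM. The Python below is a simplified user-space model of that check, loosely following the kernel's `dma_capable()` helper. It is not kernel code, and `parse_overflow` and `fits_dma_mask` are hypothetical names.

```python
import re

# Pulls "<address>+<length> of device mask <mask>" out of the kernel message.
OVERFLOW_RE = re.compile(
    r"overflow 0x([0-9a-f]+)\+(\d+) of device mask ([0-9a-f]+)", re.IGNORECASE
)


def parse_overflow(message):
    """Return (dma_address, length_in_bytes, device_mask) as integers."""
    match = OVERFLOW_RE.search(message)
    if match is None:
        raise ValueError("not a dma_direct_map_sg overflow message")
    return int(match.group(1), 16), int(match.group(2)), int(match.group(3), 16)


def fits_dma_mask(address, length, mask):
    """Simplified dma_capable(): is the buffer's last byte addressable under the mask?"""
    return address + length - 1 <= mask


addr, length, mask = parse_overflow(
    "ehci-pci 0000:00:1a.7: dma_direct_map_sg: overflow "
    "0x000000016e3f3000+2048 of device mask ffffffff"
)
GIB = 2**30
print(f"buffer at {addr / GIB:.2f} GiB, mask reaches {(mask + 1) / GIB:.0f} GiB")
# buffer at 5.72 GiB, mask reaches 4 GiB
print("fits:", fits_dma_mask(addr, length, mask))
# fits: False
```

For comparison, a 2048-byte buffer starting at 0xFFFFF800 would just fit under `ffffffff`, while one starting a byte later would not.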

Chris Guiver (guiverc) wrote :

Downloaded http://archive.ubuntu.com/ubuntu/dists/cosmic/main/installer-i386/current/images/netboot/mini.iso

I wrote it to USB and booted it on the dc7700. It mainly has 'install' options; I'm not willing to install just to test, so I selected 'rescue mode'...

It asked a number of questions (keyboard, mirror...), then started downloads. At about 40% done it went to a purple screen ("Rescue mode" top left in white, a white bar on the bottom line, black space bottom left) and stopped dead. Ctrl-Alt-F2 asked me to press Enter to open a console, which put me at BusyBox v1.27.2... Ctrl-Alt-F1 is a static screen... I waited.

Eventually I hit Enter and the white bar increased in size (two lines); 'rescue mode' is now gone. After pressing Enter a few more times, the white area keeps increasing. I typed `ls` and Enter, and a line now just has 'ls' on it.

Because I've not used this installer, I don't know if this behavior is normal or wrong; the test cases on iso.QA are for 'install'.

Walter Lapchynski (wxl) wrote :

Maybe installing in a virtual machine would be a good idea?

Chris Guiver (guiverc) wrote :

Sorry Walter, I should have updated with info from
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1795092

The Xubuntu 18.10 x86 image shows the same (or very similar) messages, but they stop far sooner, and Xubuntu drops to BusyBox/initramfs with an error.

jsalisbury asked me to do a test, which I think proves the kernel is okay.

From bug #1795092, comment #13:
----------------

Installed Lubuntu 18.04.1 on the Dell 755 without updates, formatting / but reusing my old /home (from Fedora). Rebooted after the install, pointed sources at my ISP's Ubuntu mirror (bandwidth-free updates) and ran dist-upgrade. I added xubuntu-desktop and some of the apps missing from my modified Xfce desktop.

Added the Canonical kernel PPA, changed the [PPA] source to point to cosmic, then updated and installed the kernel. Rebooted and logged in (normally!).

guiverc@755-lubu:~$ screenfetch -n
 guiverc@755-lubu
 OS: Ubuntu 18.04 bionic
 Kernel: x86_64 Linux 4.18.0-8-generic
 Uptime: 10m
 Packages: 1678
 Shell: bash 4.4.19
 Resolution: 1680x1050
 DE: XFCE
 WM: Xfwm4
 WM Theme: Agualemon
 GTK Theme: Mist [GTK2]
 Icon Theme: gnome
 Font: Sans 10
 CPU: Intel Core2 Duo E6850 @ 2x 2.71GHz
 GPU: ATI RV515
 RAM: 477MiB / 4824MiB

// In comment #14 of the same bug report, I booted Xubuntu 18.10 (x86) live on the same Dell 755; it still shows

[ 99.999999] ehci-pci 0000:00:1a.7:dma_direct_map_sg: overflow 0x000000016cbe60000+2048 of device mask ffffffff

Chris Guiver (guiverc) wrote :

Did a test on an HP 8200 Elite SFF (i5-2400, 8 GB, NVIDIA Quadro 600).

The GUI screen shows a blue background, the Lubuntu logo and dots (normal Plymouth);
the text screen shows

[ 99.999999] ehci-pci 0000:00:1d.0:dma_direct_map_sg: overflow 0x0000000223ad7000+2048 of device mask ffffffff

scrolling by endlessly (until I rebooted at the ~1650.999999 timestamp message).

Alas, the Lubuntu 18.10 x86 image was maybe 2 days old,
so I updated the image to the latest: same result (but I didn't wait as long).
