Unbind not working as expected

Bug #1190120 reported by Matt Bruzek
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

When I tried to unbind a PCI device, I received a kernel Oops. I was following the instructions described in this KVM document:
http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM

When I get to the unbind step:
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind

In Raring I got a kernel Oops. Please note that at this point I am not running the KVM guest, just trying to unbind the PCI device. I do have VT-d enabled in the BIOS and a processor that supports IOMMU. This problem is repeatable and appears to be isolated to the unbind command.

However, in Saucy I do not get a kernel Oops, but executing unbind as root drops me back into my non-root user account. After switching back to root, I am not able to perform the next step to bind the device.

root@ubuntu:/home/ubuntu# echo 0000:01:00.0 > /sys/bus/pci/drivers/pci-stub/bind
bash: echo: write error: No such device

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.8.0-23-generic 3.8.0-23.34
ProcVersionSignature: Ubuntu 3.8.0-23.34-generic 3.8.11
Uname: Linux 3.8.0-23-generic x86_64
ApportVersion: 2.9.2-0ubuntu8.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC2: bruzer 2169 F.... pulseaudio
 /dev/snd/controlC1: bruzer 2169 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Tue Jun 11 21:43:14 2013
HibernationDevice: RESUME=UUID=4b6b5242-2b8c-46e6-9d91-3168bcee1249
InstallationDate: Installed on 2013-06-07 (4 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Release amd64 (20130424)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.

 virbr0 no wireless extensions.
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:
 0 inteldrmfb
 1 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-23-generic root=UUID=3ee858ac-5064-4c88-b187-2629fedf6f9c ro intel_iommu=on quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-23-generic N/A
 linux-backports-modules-3.8.0-23-generic N/A
 linux-firmware 1.106
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/03/2013
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P1.20
dmi.board.name: Z87M Extreme4
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP1.20:bd05/03/2013:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnZ87MExtreme4:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.
---
ApportVersion: 2.12.1-0ubuntu3
Architecture: amd64
CasperVersion: 1.336
DistroRelease: Ubuntu 13.10
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
LiveMediaBuild: Ubuntu 13.10 "Saucy Salamander" - Alpha amd64 (20130904)
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
MarkForUpload: True
Package: linux (not installed)
ProcFB: 1 inteldrmfb
ProcKernelCmdLine: file=/cdrom/preseed/hostname.seed boot=casper initrd=/casper/initrd.lz quiet splash -- maybe-ubiquity
ProcVersionSignature: Ubuntu 3.11.0-4.9-generic 3.11.0-rc7
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: Daemon not responding.
RelatedPackageVersions:
 linux-restricted-modules-3.11.0-4-generic N/A
 linux-backports-modules-3.11.0-4-generic N/A
 linux-firmware 1.113
RfKill:

Tags: saucy
Uname: Linux 3.11.0-4-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 07/05/2013
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P1.60
dmi.board.name: Z87M Extreme4
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP1.60:bd07/05/2013:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnZ87MExtreme4:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

Matt Bruzek (mbruzek) wrote :
Matt Bruzek (mbruzek) wrote :

BUG: unable to handle kernel NULL pointer dereference at 00000000000038
IP: [<ffffffffa02f4514>] nouveau_fence_done+0xc4/0x100 [nouveau]
PGD 3c8eea067 PUD 3c8f2b067 PMD 0
Oops: 0000 [#1] SMP
...

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote : Re: Kernel Oops - unable to handle kernel NULL pointer dereference; after unbind of a PCI device

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.10 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example because it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10-rc5-saucy/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Matt Bruzek (mbruzek) wrote :

I applied the latest v3.10 kernel that you linked to in comment #4. That appears to fix the kernel oops problem.

I was able to run the command:

echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind

without a kernel Oops.

So I believe the problem is fixed upstream (tag: kernel-fixed-upstream).

What are my options? Do you recommend using the upstream kernel from now on or will this fix be ported back?

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-fixed-upstream
penalvch (penalvch) wrote :

Matt Bruzek, could you please confirm this issue exists with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ . If the issue remains, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

tags: added: bios-outdated-1.60
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: needs-saucy-test regression-potential
tags: added: kernel-fixed-upstream-v3.10-rc5
removed: kernel-fixed-upstream
Matt Bruzek (mbruzek) wrote :

I obtained the Live CD for 13.10 amd64.
uname -a = Linux ubuntu 3.11.0-4-generic #9-Ubuntu SMP Mon Aug 26 15:21:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I was following instructions described in this KVM document:
http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM

root@ubuntu:/home/ubuntu# dmesg | grep -e DMAR -e IOMMU
[ 0.000000] ACPI: DMAR 00000000bc843878 000B8 (v01 INTEL HSW 00000001 INTL 00000001)
[ 0.020006] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.020009] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008020660462 ecap f010da
[ 0.020080] IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1

root@ubuntu:/home/ubuntu# modprobe pci_stub
root@ubuntu:/home/ubuntu# lspci -n
...
01:00.0 0300: 10de:11c0 (rev a1)
...
root@ubuntu:/home/ubuntu# echo "10de 11c0" >/sys/bus/pci/drivers/pci-stub/new_id
root@ubuntu:/home/ubuntu# echo 0000:01:00.0 > /sys/bus/pci/devices/0000\:01\:00.0/driver/unbind
ubuntu@ubuntu:~$

I no longer get the kernel Oops problem, but the unbind operation kicks me out of superuser and back to the regular user. After going back into superuser, I am not able to perform the next step to bind the device.

root@ubuntu:/home/ubuntu# echo 0000:01:00.0 > /sys/bus/pci/drivers/pci-stub/bind
bash: echo: write error: No such device

So again, no more kernel Oops, but I don't think the unbind is working correctly. The apport-collect command is not included on the Live CD/DVD, so I had to install it to get it to run.
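The ID lookup used in these transcripts (reading "10de:11c0" for slot 01:00.0 from lspci -n, then writing it to new_id as "10de 11c0") can be scripted instead of retyped. Below is a minimal POSIX sh sketch, assuming the lspci -n line format shown above (slot, class, vendor:device); the function name pci_new_id_for is hypothetical and not part of any tool.

```sh
#!/bin/sh
# Hypothetical helper: read `lspci -n` output on stdin and print the
# "vendor device" pair for PCI slot $1 (e.g. 01:00.0) in the format that
# /sys/bus/pci/drivers/pci-stub/new_id accepts (e.g. "10de 11c0").
# Assumes lines shaped like: 01:00.0 0300: 10de:11c0 (rev a1)
pci_new_id_for() {
  awk -v slot="$1" '
    $1 == slot {
      split($3, id, ":")        # "10de:11c0" -> id[1]="10de", id[2]="11c0"
      print id[1], id[2]
      found = 1
      exit
    }
    END { exit !found }         # non-zero status if the slot was not listed
  '
}
```

For example, as root: ids=$(lspci -n | pci_new_id_for 01:00.0) && echo "$ids" > /sys/bus/pci/drivers/pci-stub/new_id. Capturing the result first means nothing is written to new_id if the slot is not found.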

tags: added: apport-collected saucy
description: updated
Matt Bruzek (mbruzek) wrote : AlsaInfo.txt

apport information

Matt Bruzek (mbruzek) wrote : AudioDevicesInUse.txt

apport information

Matt Bruzek (mbruzek) wrote : BootDmesg.txt

apport information

Matt Bruzek (mbruzek) wrote : CRDA.txt

apport information

Matt Bruzek (mbruzek) wrote : CurrentDmesg.txt

apport information

Matt Bruzek (mbruzek) wrote : Lspci.txt

apport information

Matt Bruzek (mbruzek) wrote : Lsusb.txt

apport information

Matt Bruzek (mbruzek) wrote : ProcCpuinfo.txt

apport information

Matt Bruzek (mbruzek) wrote : ProcEnviron.txt

apport information

Matt Bruzek (mbruzek) wrote : ProcInterrupts.txt

apport information

Matt Bruzek (mbruzek) wrote : ProcModules.txt

apport information

Matt Bruzek (mbruzek) wrote : UdevDb.txt

apport information

Matt Bruzek (mbruzek) wrote : UdevLog.txt

apport information

Matt Bruzek (mbruzek) wrote : WifiSyslog.txt

apport information

penalvch (penalvch)
tags: added: latest-bios-1.60
removed: bios-outdated-1.60
tags: removed: needs-saucy-test
penalvch (penalvch)
description: updated
tags: removed: saucy
penalvch (penalvch) wrote : Re: Kernel Oops - unable to handle kernel NULL pointer dereference; after unbind of a PCI device

Matt Bruzek, regarding unbind not working in Saucy, it looks like you may have exchanged one bug for another. :) If you test v3.10-rc5-saucy in Saucy, does unbind still not work and the kernel still not crash?

Matt Bruzek (mbruzek) wrote :

Christopher Penalver,

I finally got a chance to install 13.10 Saucy and retry this problem. The new unbind problem happens exactly as described above. As a reminder, I am following these instructions: http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM

I happened to be root while running the unbind command, and the command kicked me out of root back to my normal user. Had I been running as a normal user with sudo, I am not sure I would have found this bug, because sudo returns you to the normal user once the command completes.

root@apocalypse:/home/bruzer# echo 0000:01:00.0 > /sys/bus/pci/devices/0000\:01\:00.0/driver/unbind
bruzer@apocalypse:~$

The kernel does not crash as before, but I don't think the unbind command is working properly in this situation. I think it is clear the kernel Oops problem is gone, but the unbind step in the instructions does not work.

penalvch (penalvch) wrote :

Matt Bruzek, could you please test the latest mainline kernel via http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.12-trusty/ and advise on the results?

summary: - Kernel Oops - unable to handle kernel NULL pointer dereference; after
- unbind of a PCI device
+ Unbind not working as expected
description: updated
tags: added: saucy
removed: kernel-fixed-upstream-v3.10-rc5
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
status: Confirmed → Incomplete
Matt Bruzek (mbruzek) wrote :

Christopher,

I downloaded and installed the mainline kernel v3.12 and tried the sequence of commands leading up to the unbind command.

This time the system hung: no kernel Oops, but the system was unresponsive/locked up. No keystroke gave me back control of my system; Ctrl+C failed, as did switching to a text console. This forced me to remove power from the system. I did this twice so I could record my steps, with the same result both times.

bruzer@apocalypse:~$ uname -a
Linux apocalypse 3.12.0-031200-generic #201311071835 SMP Thu Nov 7 23:36:07 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
bruzer@apocalypse:~$ sudo su
[sudo] password for bruzer:
root@apocalypse:/home/bruzer# dmesg | grep -e DMAR -e IOMMU
[ 0.000000] ACPI: DMAR 00000000bc843878 000B8 (v01 INTEL HSW 00000001 INTL 00000001)
[ 0.020392] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.020396] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008020660462 ecap f010da
[ 0.020466] IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
root@apocalypse:/home/bruzer# modprobe pci_stub
root@apocalypse:/home/bruzer# lspci -n
...
01:00.0 0300: 10de:11c0 (rev a1)
01:00.1 0403: 10de:0e0b (rev a1)
root@apocalypse:/home/bruzer# echo "10de 11c0" > /sys/bus/pci/drivers/pci-stub/new_id
root@apocalypse:/home/bruzer# echo 0000:01:00.0 > /sys/bus/pci/devices/0000\:01\:00.0/driver/unbind

I believe the v3.12 behavior of this problem is much more troublesome than a kernel Oops or being kicked out of superuser.

Was there anything specific in the v3.12 kernel that you thought would fix this problem? Or do I have to keep trying all versions of the kernel until the problem is fixed?

penalvch (penalvch) wrote :

Matt Bruzek, did this problem not occur in a release prior to Raring?

Matt Bruzek (mbruzek) wrote :

Christopher,

I have not tried this scenario in any release earlier than Raring 13.04.

A Google search tells me that people have been using PCI passthrough on Ubuntu since 2009. This problem does not seem to be related to PCI passthrough itself, only to the unbind step.

penalvch (penalvch)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Matt Bruzek (mbruzek) wrote :

I had some more time to work on this problem. After more research, I discovered that IOMMU was not enabled on my system. To enable IOMMU I performed the following steps on my Ubuntu Linux server:

sudo vi /etc/default/grub
Add "intel_iommu=on" to the GRUB_CMDLINE_LINUX variable
:wq
sudo update-grub
Reboot the system.

The result of these commands is that intel_iommu=on is added as a boot parameter to the GRUB Linux boot command line.
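The manual vi edit above can also be scripted. A minimal POSIX sh sketch follows; add_intel_iommu is a hypothetical name, and it assumes the usual single-line, double-quoted GRUB_CMDLINE_LINUX="..." form in /etc/default/grub. It leaves GRUB_CMDLINE_LINUX_DEFAULT and every other line untouched, and changes nothing if intel_iommu=on is already present.

```sh
#!/bin/sh
# Hypothetical helper: copy a GRUB defaults file from stdin to stdout,
# appending intel_iommu=on to GRUB_CMDLINE_LINUX unless already present.
# Assumes the setting is one line of the form GRUB_CMDLINE_LINUX="...".
add_intel_iommu() {
  awk '
    /^GRUB_CMDLINE_LINUX=".*"$/ {
      args = $0
      sub(/^GRUB_CMDLINE_LINUX="/, "", args)   # strip the prefix
      sub(/"$/, "", args)                      # strip the closing quote
      if ((" " args " ") !~ / intel_iommu=on /)
        $0 = "GRUB_CMDLINE_LINUX=\"" (args == "" ? "" : args " ") "intel_iommu=on\""
    }
    { print }
  '
}
```

One way to apply it: add_intel_iommu < /etc/default/grub > /tmp/grub.new, review the change with diff /etc/default/grub /tmp/grub.new, then copy the file into place with sudo and run sudo update-grub before rebooting, as in the steps above.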

After performing those steps, the following command returned much more information:
bruzer@apocalypse:~$ dmesg | grep -e DMAR -e IOMMU
[ 0.000000] ACPI: DMAR 00000000bc843878 000B8 (v01 INTEL HSW 00000001 INTL 00000001)
[ 0.000000] Intel-IOMMU: enabled
[ 0.020485] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[ 0.020489] dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap d2008020660462 ecap f010da
[ 0.020556] IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
[ 0.394459] DMAR: No ATSR found
[ 0.394479] IOMMU 0 0xfed90000: using Queued invalidation
[ 0.394480] IOMMU 1 0xfed91000: using Queued invalidation
[ 0.394481] IOMMU: Setting RMRR:
[ 0.394489] IOMMU: Setting identity map for device 0000:00:02.0 [0xbf800000 - 0xcf9fffff]
[ 0.395687] IOMMU: Setting identity map for device 0000:00:1d.0 [0xbc6fa000 - 0xbc706fff]
[ 0.395708] IOMMU: Setting identity map for device 0000:00:1a.0 [0xbc6fa000 - 0xbc706fff]
[ 0.395726] IOMMU: Setting identity map for device 0000:00:14.0 [0xbc6fa000 - 0xbc706fff]
[ 0.395740] IOMMU: Prepare 0-16MiB unity mapping for LPC
[ 0.395745] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
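Before retrying the unbind, it can help to confirm both that the parameter reached the kernel and that the kernel acted on it. A minimal POSIX sh sketch with hypothetical function names; the log pattern assumes the "Intel-IOMMU: enabled" line shown above, as printed by this 3.x kernel (newer kernels word it differently, so the pattern may need adjusting).

```sh
#!/bin/sh
# Hypothetical helpers for verifying the intel_iommu=on setup.

# Succeeds if the kernel command line given as $1 contains intel_iommu=on.
cmdline_has_intel_iommu() {
  case " $1 " in
    *" intel_iommu=on "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if kernel log text on stdin reports that the Intel IOMMU is on.
# Pattern taken from the 3.x dmesg output above; newer kernels differ.
log_reports_iommu_enabled() {
  grep -q 'Intel-IOMMU: enabled'
}
```

For example: cmdline_has_intel_iommu "$(cat /proc/cmdline)" && dmesg | log_reports_iommu_enabled && echo "IOMMU enabled". The earlier v3.12 transcript in this report shows only the "dmar: IOMMU" lines without "Intel-IOMMU: enabled", which is the case the second check is meant to catch.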

Now I am able to successfully run the commands that unbind the device and bind it to the pci-stub driver.

# echo "10de 11c0" > /sys/bus/pci/drivers/pci-stub/new_id
# echo "0000:01:00.0" > /sys/bus/pci/devices/0000\:01\:00.0/driver/unbind
# echo "0000:01:00.0" > /sys/bus/pci/drivers/pci-stub/bind

These commands have no problems now that I am using the additional boot parameter. I believe we can close this bug report because there does not seem to be a problem with the unbind command.
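The three commands above can be wrapped so that the script reports which driver owns the device before and after the rebind, which makes an error such as "write error: No such device" easier to interpret. A minimal POSIX sh sketch, to be run as root; the function names are hypothetical, and the SYSFS variable exists only so the logic can be exercised against a mock directory tree instead of the real /sys. Incidentally, the escaped form 0000\:01\:00.0 used in the transcripts is the same word to the shell as 0000:01:00.0; the backslashes are harmless but unnecessary.

```sh
#!/bin/sh
# Hypothetical sketch of the pci-stub rebind sequence from this report.
# SYSFS defaults to /sys; override it only to test against a mock tree.
SYSFS=${SYSFS:-/sys}

# Print the driver bound to PCI device $1 (e.g. 0000:01:00.0), or nothing
# if the device has no driver. The "driver" entry is a symlink to the
# driver's directory, so the resolved directory name is the driver name.
pci_driver_of() {
  local link="$SYSFS/bus/pci/devices/$1/driver"
  if [ -e "$link" ]; then
    (cd -P "$link" && basename "$(pwd -P)")
  fi
}

# Register IDs $2 (e.g. "10de 11c0") with pci-stub, unbind device $1 from
# its current driver, then bind it to pci-stub.
pci_rebind_to_stub() {
  local dev="$1" ids="$2" drv
  drv=$(pci_driver_of "$dev")
  if [ "$drv" = pci-stub ]; then
    echo "$dev: already bound to pci-stub"
    return 0
  fi
  echo "$dev: currently bound to '${drv:-none}'"
  # Treated as non-fatal (assumption): a repeated run may find the IDs
  # already registered.
  echo "$ids" > "$SYSFS/bus/pci/drivers/pci-stub/new_id" ||
    echo "warning: writing new_id failed" >&2
  if [ -n "$drv" ]; then
    echo "$dev" > "$SYSFS/bus/pci/devices/$dev/driver/unbind" || return 1
  fi
  # Bind explicitly unless the device already ended up owned by pci-stub.
  if [ "$(pci_driver_of "$dev")" != pci-stub ]; then
    echo "$dev" > "$SYSFS/bus/pci/drivers/pci-stub/bind" || return 1
  fi
  echo "$dev: now bound to '$(pci_driver_of "$dev")'"
}
```

As root: pci_rebind_to_stub 0000:01:00.0 "10de 11c0". If the last line does not report pci-stub, the earlier lines show which driver held the device before the unbind.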

The PCI passthrough of the video card is still not working, but I do not believe that is related to anything in this bug report.

penalvch (penalvch) wrote :

Matt Bruzek, this bug report is being closed due to your last comment https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1190120/comments/28 regarding this being addressed via kernel parameter. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in linux (Ubuntu):
status: Confirmed → Invalid