Nvidia driver causing SIGSEGV in nvclock and smartdimmer

Bug #1039916 reported by TJ
This bug affects 10 people
Affects                                   Status     Importance  Assigned to  Milestone
nvclock (Ubuntu)                          Triaged    Medium      Unassigned
nvidia-graphics-drivers-updates (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

Changes by Nvidia in their proprietary driver have resulted in several reports of SIGSEGV on Precise and others using Nvidia driver version 295 and later.

Specifically, changes to external access to the memory-mapped registers that previously provided the memory address of the video BIOS now result in an invalid value being returned. The pointer to RAMIN (RAM INstance memory) is no longer valid.

This value is treated as a memory address and used by nvclock/smartdimmer to do a memory copy into its own memory.

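The SIGSEGV occurs when glibc's (the C system library's) memmove() is called; on this system it is dispatched to the SSSE3 implementation in memcpy-ssse3.S, as the trace below shows.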

(gdb) run
Starting program: /home/all/SourceCode/nvclock/nvclock-0.8b4+cvs20100914/src/smartdimmer -g

Breakpoint 1, check_driver () at back_linux.c:43
43 {
(gdb) c
Continuing.

Breakpoint 1, check_driver () at back_linux.c:43
43 {
(gdb) c
Continuing.

Breakpoint 3, load_bios_pramin (data=0x61e270 "") at bios.c:918
918 uint32_t old_bar0_pramin = 0;
(gdb) n
921 if(!nv_card->arch)
(gdb) n
925 if (nv_card->arch & NV5X)
(gdb) n
937 bios = (char*)nv_card->PRAMIN;
(gdb) n
938 memmove(data, bios, NV_PROM_SIZE);
(gdb) print bios
$1 = 0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>
(gdb) print nv_card->PRAMIN
$2 = (volatile unsigned int *) 0xffffffffffffffff
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__memmove_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:91
91 ../sysdeps/x86_64/multiarch/memcpy-ssse3.S: No such file or directory.
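
The value 0xffffffffffffffff seen in the session above is what mmap() returns on failure: MAP_FAILED, i.e. (void *)-1. As a minimal sketch only (copy_from_mapping() is a hypothetical helper, not a function in nvclock), a guard of the following kind would turn the crash into a reportable error:

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper, not part of nvclock: copy `len` bytes out of a
 * memory-mapped region only if the mapping actually succeeded.
 * Returns 0 on success, -1 if `src` is NULL or MAP_FAILED. */
static int copy_from_mapping(void *dst, const volatile void *src, size_t len)
{
    /* mmap() signals failure with MAP_FAILED, i.e. (void *)-1, which gdb
     * prints as 0xffffffffffffffff on a 64-bit system. */
    if (src == NULL || src == MAP_FAILED)
        return -1;

    /* The cast drops `volatile`, as the original memmove() call does. */
    memmove(dst, (const void *)src, len);
    return 0;
}
```

With such a check, a caller like load_bios_pramin() could report the failed PRAMIN mapping instead of passing the invalid pointer to memmove().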

grep '01:00.0' /var/log/dmesg
[ 0.396479] pci 0000:01:00.0: [10de:0398] type 00 class 0x030000
[ 0.396503] pci 0000:01:00.0: reg 10: [mem 0xd1000000-0xd1ffffff]
[ 0.396529] pci 0000:01:00.0: reg 14: [mem 0xb0000000-0xbfffffff 64bit pref]
[ 0.396554] pci 0000:01:00.0: reg 1c: [mem 0xd0000000-0xd0ffffff 64bit]
[ 0.396571] pci 0000:01:00.0: reg 24: [io 0x2000-0x207f]
[ 0.396589] pci 0000:01:00.0: reg 30: [mem 0x00000000-0x0001ffff pref]
[ 0.396684] pci 0000:01:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force'
[ 0.410100] vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.410100] vgaarb: bridge control possible 0000:01:00.0
[ 0.466741] pci 0000:01:00.0: BAR 6: can't assign mem pref (size 0x20000)
[ 0.473603] pci 0000:01:00.0: Boot video device
[ 33.246548] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=io+mem

lspci -vvvnn -s 01:00.0
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G73 [GeForce Go 7600] [10de:0398] (rev a1) (prog-if 00 [VGA controller])
 Subsystem: Sony Corporation Device [104d:81ef]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 16
 Region 0: Memory at d1000000 (32-bit, non-prefetchable) [size=16M]
 Region 1: Memory at b0000000 (64-bit, prefetchable) [size=256M]
 Region 3: Memory at d0000000 (64-bit, non-prefetchable) [size=16M]
 Region 5: I/O ports at 2000 [size=128]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: <access denied>
 Kernel driver in use: nvidia
 Kernel modules: nvidia_current_updates, nouveau, nvidiafb

The Nouveau (F/OSS nvidia driver) wiki[1] says:

"Video memory is split into normal memory and RAMIN, also known as instance memory. RAMIN is used to contain the card management objects (usually accessible only to the kernel), normal memory is for objects that normal applications access. "

According to the Nouveau wiki PRAMIN should be in PCI BAR 2[2]. The wiki may be out of date, but the lspci output above shows that BAR 2 is not in use; I would expect BAR 3 to be the location of PRAMIN.

[2] BAR is a PCI device's Base Address Register. PCI devices can have several of these, each defined in the device's PCI config space. In the lspci output above the BARs are the lines prefixed "Region".

[1] http://nouveau.freedesktop.org/wiki/HwIntroduction
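
The BAR ranges that lspci prints as "Region" lines can also be read from /sys/bus/pci/devices/<address>/resource, where each line holds a start address, an end address and a flags word in hexadecimal. The following is a rough sketch of parsing one such line; the struct and function names are made up for illustration, and the two flag values are my recollection of the kernel's include/linux/ioport.h, so check them against your kernel headers.

```c
#include <stdio.h>

/* Resource type bits, assumed to match include/linux/ioport.h. */
#define RES_IO  0x00000100ULL
#define RES_MEM 0x00000200ULL

/* Illustrative type: one BAR as described by a sysfs resource line. */
struct pci_region {
    unsigned long long start;
    unsigned long long end;
    unsigned long long flags;
};

/* Parse one line of the sysfs "resource" file ("start end flags", hex).
 * Returns 0 on success, -1 if the line is malformed. */
static int parse_resource_line(const char *line, struct pci_region *r)
{
    if (sscanf(line, "%llx %llx %llx", &r->start, &r->end, &r->flags) != 3)
        return -1;
    return 0;
}

/* Size in bytes; an unassigned region (start and end both 0) has size 0. */
static unsigned long long region_size(const struct pci_region *r)
{
    if (r->start == 0 && r->end == 0)
        return 0;
    return r->end - r->start + 1;
}
```

For the range d1000000-d1ffffff reported for Region 0 above, region_size() gives 0x1000000 bytes, which agrees with the 16M that lspci shows.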

Tags: precise
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvclock (Ubuntu):
status: New → Confirmed
TJ (tj)
Changed in nvclock (Ubuntu):
assignee: nobody → TJ (tj)
importance: Undecided → Medium
tags: added: precise
Revision history for this message
TJ (tj) wrote :

This bug causes a regression for bug #95444 "No Screen Backlight Control; Notebooks (Vaio, Macbook, HP/Compaq, Samsung, Zepto et al.) with Nvidia Geforce8/Geforce9/Quadro series graphics" which was marked 'Fix Released' 2009-03-27.

Changed in nvclock (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
TJ (tj) wrote :

For those users simply wanting to control the backlight (and currently using smartdimmer) a better alternative with active development is Guillaume Zin's "nvidiabl" kernel-module project. This presents a standard /sys/class/backlight/ node that the system can control using all existing tools.

https://github.com/guillaumezin/nvidiabl
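
To illustrate what "a standard /sys/class/backlight/ node" means in practice: such a node exposes, among others, the files max_brightness and brightness. The sketch below sets the brightness as a percentage of the maximum. The helper names are hypothetical, and the node directory name depends on the driver, so it is passed in rather than assumed.

```c
#include <stdio.h>

/* Read a non-negative integer from a sysfs-style file; -1 on error. */
static long read_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Set the backlight to `percent` of max_brightness.
 * `dir` is the node directory, e.g. "/sys/class/backlight/<node>"
 * (the node name is driver-specific and not assumed here).
 * Returns 0 on success, -1 on any error. */
static int set_backlight_percent(const char *dir, int percent)
{
    char path[512];
    long max;
    FILE *f;

    if (percent < 0 || percent > 100)
        return -1;

    snprintf(path, sizeof path, "%s/max_brightness", dir);
    max = read_long(path);
    if (max < 0)
        return -1;

    snprintf(path, sizeof path, "%s/brightness", dir);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%ld\n", max * percent / 100);
    return fclose(f) == 0 ? 0 : -1;
}
```

Writing to brightness normally requires root or suitable permissions on the sysfs file; desktop brightness controls do the equivalent through the same interface.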

Revision history for this message
TJ (tj) wrote :

The root of the problem is that the nvidia driver returns invalid values for PMC, PDISPLAY and PRAMIN.

Breakpoint 6, map_mem (dev_name=0x61e250 "/dev/nvidia0") at backend.c:34
34 dev_handle_t *fd = open_dev(dev_name);
(gdb) n
36 if(!fd) /* open_dev has already set the error */
(gdb) p fd
$2 = (dev_handle_t *) 0x61e0d0
(gdb) n
40 nv_card->PEXTDEV = map_dev_mem(fd, nv_card->reg_address + 0x101000, 0x1000);
(gdb) p dev_name
$3 = 0x61e250 "/dev/nvidia0"
(gdb) p /x nv_card->reg_address + 0x101000
$4 = 0xd5101000
(gdb) n
41 nv_card->PFB = map_dev_mem(fd, nv_card->reg_address + 0x100000, 0x1000);
(gdb) p /x nv_card->PEXTDEV
$5 = 0x7ffff7ff8000
(gdb) n
43 nv_card->PMC = map_dev_mem(fd, nv_card->reg_address + 0x000000, 0x2ffff);
(gdb) p /x nv_card->PFB
$6 = 0x7ffff7ff7000
(gdb) n
44 nv_card->PCIO = map_dev_mem(fd, nv_card->reg_address + 0x601000, 0x2000);
(gdb) p /x nv_card->PMC
$7 = 0xffffffffffffffff
(gdb) n
45 nv_card->PDISPLAY = map_dev_mem(fd, nv_card->reg_address + NV_PDISPLAY_OFFSET, NV_PDISPLAY_SIZE);
(gdb) p /x nv_card->PCIO
$8 = 0x7ffff7ff5000
(gdb) n
46 nv_card->PRAMDAC = map_dev_mem(fd, nv_card->reg_address + 0x680000, 0x2000);
(gdb) p /x nv_card->PDISPLAY
$9 = 0xffffffffffffffff
(gdb) n
47 nv_card->PRAMIN = map_dev_mem(fd, nv_card->reg_address + NV_PRAMIN_OFFSET, NV_PRAMIN_SIZE);
(gdb) p /x nv_card->PRAMDAC
$10 = 0x7ffff7fe3000
(gdb) n
48 nv_card->PROM = map_dev_mem(fd, nv_card->reg_address + 0x300000, 0xffff);
(gdb) p /x nv_card->PRAMIN
$11 = 0xffffffffffffffff
(gdb) n
51 if(nv_card->arch & NV5X)

Revision history for this message
TJ (tj) wrote :

A better summary:

(gdb) p /x *nv_card
$13 = {card_name = 0x41332c, number = 0x0, caps = 0x0, device_id = 0x398, subvendor_id = 0x0, arch = 0x8000,
  reg_address = 0xd5000000, dev_name = 0x61e250, devbusfn = 0x100, irq = 0x10, base_freq = 0x0, gpu = 0x3, debug = 0x0,
  bios = 0x0, have_coolbits = 0x0, state = 0x0, mem_mapped = 0x0, PFB = 0x7ffff7ff7000, PBUS = 0x0, PDISPLAY = 0xffffffffffffffff,
  PMC = 0xffffffffffffffff, PRAMDAC = 0x7ffff7fe3000, PRAMIN = 0xffffffffffffffff, PEXTDEV = 0x7ffff7ff8000,
  PROM = 0x7ffff7fe5000, PCIO = 0x7ffff7ff5000, nvclk_min = 0x0, nvclk_max = 0x0, memclk_min = 0x0, memclk_max = 0x0,
  nvclk_3d = 0x0, memclk_3d = 0x0, get_gpu_architecture = 0x0, get_gpu_revision = 0x0, set_gpu_pci_id = 0x0, mem_type = 0x0,
  get_memory_type = 0x0, get_memory_width = 0x0, get_memory_size = 0x0, get_bus_type = 0x0, get_bus_rate = 0x0,
  get_agp_status = 0x0, get_agp_fw_status = 0x0, get_agp_sba_status = 0x0, get_agp_supported_rates = 0x0,
  get_pcie_max_bus_rate = 0x0, num_busses = 0x0, busses = {0x0, 0x0, 0x0, 0x0}, sensor = 0x0, sensor_name = 0x0,
  get_board_temp = 0x0, get_gpu_temp = 0x0, get_fanspeed = 0x0, set_fanspeed = 0x0, get_i2c_fanspeed_mode = 0x0,
  set_i2c_fanspeed_mode = 0x0, get_i2c_fanspeed_rpm = 0x0, get_i2c_fanspeed_pwm = 0x0, set_i2c_fanspeed_pwm = 0x0,
  get_default_mask = 0x0, get_hw_masked_units = 0x0, get_sw_masked_units = 0x0, get_pixel_pipelines = 0x0,
  set_pixel_pipelines = 0x0, get_vertex_pipelines = 0x0, set_vertex_pipelines = 0x0, get_shader_speed = 0x0,
  set_shader_speed = 0x0, reset_shader_speed = 0x0, get_stream_units = 0x0, get_rop_units = 0x0, get_smartdimmer = 0x0,
  set_smartdimmer = 0x0, mpll = 0x0, mpll2 = 0x0, nvpll = 0x0, nvpll2 = 0x0, set_state = 0x0, get_gpu_speed = 0x0,
  set_gpu_speed = 0x0, get_memory_speed = 0x0, set_memory_speed = 0x0, reset_gpu_speed = 0x0, reset_memory_speed = 0x0,
  get_debug_info = 0x0}

Revision history for this message
TJ (tj) wrote :

The issue is caused by calls to the system's mmap() function failing to create the requested VMA mappings.

After some further debugging and playing with values I modified backend/back_linux.c::map_dev_mem() so that when mmap() returns MAP_FAILED the function adjusts the requested allocation to progressively smaller sizes (in steps of 256 bytes).

If the allocation differs from the request it is reported to stderr. If the allocation totally fails the program exits.

Here's the output on my amd64 system:

$ ./smartdimmer -g
OK: map_mem(): PEXTDEV requested 0x1000 and received 0x1000 after 0 attempts at 0xd5101000
OK: map_mem(): PEXTDEV requested 0x1000 and received 0x1000 after 0 attempts at 0xd5100000
OK: map_mem(): PEXTDEV requested 0x1000 and received 0x1000 after 0 attempts at 0xd5000000
OK: map_mem(): PEXTDEV requested 0x2000 and received 0x2000 after 0 attempts at 0xd5601000
ERR: map_mem(): PDISPLAY requested 0x10000 but received 0x8000 after 8 attempts at 0xd5610000
OK: map_mem(): PEXTDEV requested 0x2000 and received 0x2000 after 8 attempts at 0xd5680000
ERR: map_mem(): PRAMIN requested 0x100000 but received 0x0 after 264 attempts at 0xd5700000

So it looks like requests for large maps are causing problems.
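
For readers who want to reproduce the experiment, the retry strategy described in this comment can be sketched roughly as follows. This is not the actual change made to map_dev_mem(): map_shrinking(), the map_fn callback type and the fake_map() stand-in are all hypothetical names, and the mapping backend is abstracted behind a callback so the logic can be exercised without the nvidia device.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Mapping backend; returns MAP_FAILED on failure, like mmap(). */
typedef void *(*map_fn)(unsigned long phys_addr, size_t len);

/* Try to map `want` bytes at `phys_addr`, shrinking the request by `step`
 * bytes after every failure. On return *got holds the length that was
 * mapped (0 if everything failed) and *attempts the number of failures. */
static void *map_shrinking(map_fn map, unsigned long phys_addr,
                           size_t want, size_t step,
                           size_t *got, unsigned *attempts)
{
    size_t len = want;

    *got = 0;
    *attempts = 0;
    if (step == 0)
        return MAP_FAILED;

    while (len > 0) {
        void *p = map(phys_addr, len);
        if (p != MAP_FAILED) {
            *got = len;
            return p;
        }
        len = (len > step) ? len - step : 0;
        (*attempts)++;
    }
    return MAP_FAILED;
}

/* Stand-in backend for experiments only: refuses every request at
 * 0xd5700000 and any request larger than 0x8000 bytes elsewhere. */
static void *fake_map(unsigned long phys_addr, size_t len)
{
    static char fake_region[1];

    if (phys_addr == 0xd5700000UL || len > 0x8000)
        return MAP_FAILED;
    return fake_region;
}
```

Called with fake_map, a request of 0x10000 bytes and a step of 0x1000, this returns a 0x8000-byte mapping after 8 failed attempts, the same shape as the PDISPLAY line in the output above. A real backend would wrap mmap() on the opened /dev/nvidia0 descriptor.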

Revision history for this message
TJ (tj) wrote :

My last comment was messed up by my failing to refactor after a mass cut and paste operation. Here's a better, clearer view of what is going on:

$ ./smartdimmer -g
OK: map_mem(): PEXTDEV register 0xd5101000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7f3a97e76000
OK: map_mem(): PFB register 0xd5100000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7f3a97e75000
OK: map_mem(): PMC register 0xd5000000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7f3a97e74000
OK: map_mem(): PCIO register 0xd5601000 requested 0x2000 and received 0x2000 after 0 attempts at 0x7f3a97e72000
ERR: map_mem(): PDISPLAY register 0xd5610000 requested 0x10000 but received 0x8000 after 8 attempts at 0x7f3a97dff000
OK: map_mem(): PDISPLAY register 0xd5680000 requested 0x2000 and received 0x2000 after 8 attempts at 0x7f3a97e70000
ERR: map_mem(): PRAMIN register 0xd5700000 requested 0x100000 but received 0 after 264 attempts at 0xffffffffffffffff

Revision history for this message
TJ (tj) wrote :

More detail, including the process memory map.

$ sudo ./smartdimmer -g
OK: map_mem(): PEXTDEV register 0xd5101000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7f709064e000
OK: map_mem(): PFB register 0xd5100000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7f709064d000
OK: map_mem(): PMC register 0xd5000000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7f709064c000
OK: map_mem(): PCIO register 0xd5601000 requested 0x2000 and received 0x2000 after 0 attempts at 0x7f709064a000
ERR: map_mem(): PDISPLAY register 0xd5610000 requested 0x10000 but received 0x8000 after 8 attempts at 0x7f70905d7000
OK: map_mem(): PRAMDAC register 0xd5680000 requested 0x2000 and received 0x2000 after 0 attempts at 0x7f7090648000
ERR: map_mem(): PRAMIN register 0xd5700000 requested 0x100000 but received 0 after 256 attempts at 0xffffffffffffffff
ERR: map_mem(): memory-mapping PRAMIN failed

$ sudo cat /proc/7505/maps
[sudo] password for tj:
00400000-0041c000 r-xp 00000000 fc:03 817412 /home/all/SourceCode/nvclock/nvclock-0.8b4+cvs20100914/src/smartdimmer
0061c000-0061d000 r--p 0001c000 fc:03 817412 /home/all/SourceCode/nvclock/nvclock-0.8b4+cvs20100914/src/smartdimmer
0061d000-0061e000 rw-p 0001d000 fc:03 817412 /home/all/SourceCode/nvclock/nvclock-0.8b4+cvs20100914/src/smartdimmer
0061e000-0061f000 rw-p 00000000 00:00 0
01778000-01799000 rw-p 00000000 00:00 0 [heap]
7f708f302000-7f708f307000 r-xp 00000000 fc:0d 408436 /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0
7f708f307000-7f708f506000 ---p 00005000 fc:0d 408436 /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0
7f708f506000-7f708f507000 r--p 00004000 fc:0d 408436 /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0
7f708f507000-7f708f508000 rw-p 00005000 fc:0d 408436 /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0
7f708f508000-7f708f50a000 r-xp 00000000 fc:0d 408434 /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0
7f708f50a000-7f708f709000 ---p 00002000 fc:0d 408434 /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0
7f708f709000-7f708f70a000 r--p 00001000 fc:0d 408434 /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0
7f708f70a000-7f708f70b000 rw-p 00002000 fc:0d 408434 /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0
7f708f70b000-7f708f70d000 r-xp 00000000 fc:0d 12464 /lib/x86_64-linux-gnu/libdl-2.15.so
7f708f70d000-7f708f90d000 ---p 00002000 fc:0d 12464 /lib/x86_64-linux-gnu/libdl-2.15.so
7f708f90d000-7f708f90e000 r--p 00002000 fc:0d 12464 /lib/x86_64-linux-gnu/libdl-2.15.so
7f708f90e000-7f708f90f000 rw-p 00003000 fc:0d 12464 /lib/x86_64-linux-gnu/libdl-2.15.so
7f708f90f000-7f708f92c000 r-xp 00000000 fc:0d 404458 /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0
7f708f92c000-7f708fb2b000 ---p 0001d000 fc:0d 404458 /usr/lib/x86_64-linux-gnu/lib...

Revision history for this message
TJ (tj) wrote :

Further analysis shows the nvidia driver failing to allocate the requested memory mappings.

Enabling nvidia kernel module's debug output

$ cat /etc/modprobe.d/nvidia-current-updates.conf
options nvidia-current-updates NVreg_ResmanDebugLevel=0

and integrating it into the smartdimmer output:

$ sudo ./smartdimmer -g
[ 2212.993981] NVRM: more events: 0
[ 2213.014574] NVRM: nv_kern_open...
[ 2213.014583] NVRM: nv_kern_open on device 0
[ 2213.014624] NVRM: VM: nv_kern_mmap:246: 0x7fb74a1ad000 - 0x7fb74a1ae000, 0x00001000 bytes @ 0x00000000d5000000, 0x (null), 0xffff880050b95b00
OK: map_mem(): PMC register 0xd5000000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7fb74a1ad000
[ 2213.014693] NVRM: VM: nv_kern_mmap:246: 0x7fb74a1ac000 - 0x7fb74a1ad000, 0x00001000 bytes @ 0x00000000d5100000, 0x (null), 0xffff880050b95b00
OK: map_mem(): PFB register 0xd5100000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7fb74a1ac000
[ 2213.014734] NVRM: VM: nv_kern_mmap:246: 0x7fb74a1ab000 - 0x7fb74a1ac000, 0x00001000 bytes @ 0x00000000d5101000, 0x (null), 0xffff880050b95b00
OK: map_mem(): PEXTDEV register 0xd5101000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7fb74a1ab000
[ 2213.014775] NVRM: VM: nv_kern_mmap:246: 0x7fb74a19b000 - 0x7fb74a1ab000, 0x00010000 bytes @ 0x00000000d5300000, 0x (null), 0xffff880050b95b00
OK: map_mem(): PROM register 0xd5300000 requested 0xffff and received 0xffff after 0 attempts at 0x7fb74a19b000
[ 2213.014842] NVRM: VM: nv_kern_mmap:246: 0x7fb74a199000 - 0x7fb74a19b000, 0x00002000 bytes @ 0x00000000d5601000, 0x (null), 0xffff880050b95b00
OK: map_mem(): PCIO register 0xd5601000 requested 0x2000 and received 0x2000 after 0 attempts at 0x7fb74a199000
[ 2213.014893] NVRM: VM: nv_kern_mmap:246: 0x7fb74a189000 - 0x7fb74a199000, 0x00010000 bytes @ 0x00000000d5610000, 0x (null), 0xffff880050b95b00
map_dev_mem(): Invalid argument
[ 2213.014954] NVRM: VM: nv_kern_mmap:246: 0x7fb74a174000 - 0x7fb74a183000, 0x0000f000 bytes @ 0x00000000d5610000, 0x (null), 0xffff880050b95b00
map_dev_mem(): Invalid argument
[ 2213.014993] NVRM: VM: nv_kern_mmap:246: 0x7fb74a166000 - 0x7fb74a174000, 0x0000e000 bytes @ 0x00000000d5610000, 0x (null), 0xffff880050b95b00
map_dev_mem(): Invalid argument
[ 2213.015039] NVRM: VM: nv_kern_mmap:246: 0x7fb74a159000 - 0x7fb74a166000, 0x0000d000 bytes @ 0x00000000d5610000, 0x (null), 0xffff880050b95b00
map_dev_mem(): Invalid argument
[ 2213.015074] NVRM: VM: nv_kern_mmap:246: 0x7fb74a14d000 - 0x7fb74a159000, 0x0000c000 bytes @ 0x00000000d5610000, 0x (null), 0xffff880050b95b00
map_dev_mem(): Invalid argument
[ 2213.015112] NVRM: VM: nv_kern_mmap:246: 0x7fb74a142000 - 0x7fb74a14d000, 0x0000b000 bytes @ 0x00000000d5610000, 0x (null), 0xffff880050b95b00
map_dev_mem(): Invalid argument
[ 2213.015157] NVRM: VM: nv_kern_mmap:246: 0x7fb74a138000 - 0x7fb74a142000, 0x0000a000 bytes @ 0x00000000d5610000, 0x (null), 0xffff880050b95b00
map_dev_mem(): Invalid argument
[ 2213.015201] NVRM: VM: nv_kern_mmap:246: 0x7fb7...

Revision history for this message
TJ (tj) wrote :

After adding debug logging to the nvidia kernel module's source-code - at every point in nv-map.c::nv_kern_mmap() where it sets an error in the 'status' variable - I have been able to pin-point where the allocation is being refused.

It is this code:

    vma->vm_ops = &nv_vm_ops;

    if (IS_REG_OFFSET(nv, NV_VMA_OFFSET(vma), NV_VMA_SIZE(vma)))
    {
        if (IS_BLACKLISTED_REG_OFFSET(nv, NV_VMA_OFFSET(vma), NV_VMA_SIZE(vma)))
        {
            nv_printf(NV_DBG_ERRORS,
                "NVRM: requested mapping blacklisted\n");
            status = -EINVAL;
            goto done;
        }

The smartdimmer output, integrated with the kernel log, shows this:

$ sudo ./smartdimmer -g
[ 674.194158] NVRM: nv_kern_open...
[ 674.194171] NVRM: nv_kern_open on device 0
[ 674.194233] NVRM: VM: nv_kern_mmap:246: 0x7f4e90ad0000 - 0x7f4e90ad1000, 0x00001000 bytes @ 0x00000000d5000000, 0x (null), 0xffff88005357da00
OK: map_mem(): PMC register 0xd5000000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7f4e90ad0000
[ 674.194347] NVRM: VM: nv_kern_mmap:246: 0x7f4e90acf000 - 0x7f4e90ad0000, 0x00001000 bytes @ 0x00000000d5100000, 0x (null), 0xffff88005357da00
OK: map_mem(): PFB register 0xd5100000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7f4e90acf000
[ 674.194418] NVRM: VM: nv_kern_mmap:246: 0x7f4e90ace000 - 0x7f4e90acf000, 0x00001000 bytes @ 0x00000000d5101000, 0x (null), 0xffff88005357da00
OK: map_mem(): PEXTDEV register 0xd5101000 requested 0x1000 and received 0x1000 after 0 attempts at 0x7f4e90ace000
[ 674.194522] NVRM: VM: nv_kern_mmap:246: 0x7f4e90abe000 - 0x7f4e90ace000, 0x00010000 bytes @ 0x00000000d5300000, 0x (null), 0xffff88005357da00
OK: map_mem(): PROM register 0xd5300000 requested 0xffff and received 0xffff after 0 attempts at 0x7f4e90abe000
[ 674.194606] NVRM: VM: nv_kern_mmap:246: 0x7f4e90abc000 - 0x7f4e90abe000, 0x00002000 bytes @ 0x00000000d5601000, 0x (null), 0xffff88005357da00
OK: map_mem(): PCIO register 0xd5601000 requested 0x2000 and received 0x2000 after 0 attempts at 0x7f4e90abc000
[ 674.194692] NVRM: VM: nv_kern_mmap:246: 0x7f4e90aac000 - 0x7f4e90abc000, 0x00010000 bytes @ 0x00000000d5610000, 0x (null), 0xffff88005357da00
[ 674.194717] NVRM: requested mapping blacklisted
map_dev_mem(): Invalid argument
[ 674.194795] NVRM: VM: nv_kern_mmap:246: 0x7f4e90a96000 - 0x7f4e90aa5000, 0x0000f000 bytes @ 0x00000000d5610000, 0x (null), 0xffff88005357da00
[ 674.194823] NVRM: requested mapping blacklisted
map_dev_mem(): Invalid argument
[ 674.194861] NVRM: VM: nv_kern_mmap:246: 0x7f4e90a88000 - 0x7f4e90a96000, 0x0000e000 bytes @ 0x00000000d5610000, 0x (null), 0xffff88005357da00
[ 674.194886] NVRM: requested mapping blacklisted
map_dev_mem(): Invalid argument
[ 674.194935] NVRM: VM: nv_kern_mmap:246: 0x7f4e90a7b000 - 0x7f4e90a88000, 0x0000d000 bytes @ 0x00000000d5610000, 0x (null), 0xffff88005357da00
[ 674.194960] NVRM: requested mapping blacklisted
map_dev_mem(): Invalid argument
[ 674.195005] NVRM: VM: nv_kern_mmap:246: 0x7f4e90a6f000 - 0x7f4e90a...

Revision history for this message
TJ (tj) wrote :

The blacklisted entry was introduced by the latest security patch to the nvidia drivers.

nvidia-graphics-drivers-updates (295.49-0ubuntu0.2) precise-security; urgency=low

  * SECURITY UPDATE: privilege escalation via kernel memory access
    - debian/dkms/patches/blacklist-vga-pmu-registers.patch: blacklist
      more offsets in nv.{c,h}.
    - debian/dkms.conf{.in}: added new patch.
    - CVE number pending
 -- Marc Deslauriers <email address hidden> Sun, 05 Aug 2012 09:49:25 -0400

The code:

diff -ur kernel/nv.h kernel/nv.h
--- kernel/nv.h 2012-08-02 18:19:37.000000000 -0700
+++ kernel/nv.h 2012-08-02 18:19:37.000000000 -0700
@@ -448,7 +448,20 @@

 #define IS_BLACKLISTED_REG_OFFSET(nv, offset, length) \
              ((IS_REG_RANGE_WITHIN_MAPPING(nv, 0x1000, 0x1000, offset, length)) ||\
- (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x700000, 0x100000, offset, length)))
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x84000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x85000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x86000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x87000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x89000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0xa0000, 0x20000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x104000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x105000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x10a000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x1c2000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x1c3000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x618000, 0x2000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x627000, 0x1000, offset, length)) ||\
+ (IS_REG_RANGE_WITHIN_MAPPING(nv, 0x700000, 0x100000, offset, length)))

 /* duplicated from nvos.h for external builds */
 #ifndef NVOS_AGP_CONFIG_DISABLE_AGP
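
To see which nvclock requests the patched list rejects, here is a small sketch that applies the ranges from the diff above. The definition of IS_REG_RANGE_WITHIN_MAPPING is not shown in this report, so the sketch assumes it amounts to an overlap test between the requested range and the blacklisted range, with offsets relative to the start of the register BAR. The struct and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct reg_range {
    uint32_t start;
    uint32_t size;
};

/* Ranges listed in the patched IS_BLACKLISTED_REG_OFFSET macro. */
static const struct reg_range blacklist[] = {
    {0x1000, 0x1000},   {0x84000, 0x1000},  {0x85000, 0x1000},
    {0x86000, 0x1000},  {0x87000, 0x1000},  {0x89000, 0x1000},
    {0xa0000, 0x20000}, {0x104000, 0x1000}, {0x105000, 0x1000},
    {0x10a000, 0x1000}, {0x1c2000, 0x1000}, {0x1c3000, 0x1000},
    {0x618000, 0x2000}, {0x627000, 0x1000}, {0x700000, 0x100000},
};

/* Assumption: a request is refused if it overlaps a blacklisted range.
 * 64-bit arithmetic avoids overflow when adding start and size. */
static int ranges_overlap(uint64_t a_start, uint64_t a_size,
                          uint64_t b_start, uint64_t b_size)
{
    return a_start < b_start + b_size && b_start < a_start + a_size;
}

/* Returns 1 if [offset, offset + length) touches any blacklisted range. */
static int is_blacklisted(uint32_t offset, uint32_t length)
{
    size_t i;

    for (i = 0; i < sizeof blacklist / sizeof blacklist[0]; i++) {
        if (ranges_overlap(offset, length,
                           blacklist[i].start, blacklist[i].size))
            return 1;
    }
    return 0;
}
```

Under that assumption the results line up with the logs earlier in this report: PRAMIN (offset 0x700000) is refused at any size, PDISPLAY (offset 0x610000) is refused at 0x10000 bytes because it reaches the new 0x618000 entry but is accepted at 0x8000 bytes, and PMC is accepted at 0x1000 bytes but not at the original 0x2ffff.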

Revision history for this message
TJ (tj) wrote :

Two tests have confirmed that the nvidia driver DKMS package security patches are the cause of the breakage to nvclock/smartdimmer. The upshot is that maybe 'nvclock' should be removed from the archives since the security patches appear sane and are important to protect most users.

After reversing the recent security update patch 'smartdimmer' was able to request all but one memory region - PRAMIN.

After also removing from the base nvidia package the blacklist for IS_REG_RANGE_WITHIN_MAPPING(nv, 0x700000, 0x100000, offset, length) the program was able to map all memory successfully.

Unfortunately another SIGSEGV occurs whilst calling getpagesize() in unmap_dev_mem() for PCIO (140737354022912 is 0x7ffff7fe4000). I'm not clear how or why getpagesize() should SIGSEGV.

...
(gdb) c
Continuing.
OK: map_mem(): PCIO register 0xd5601000 requested 0x2000 and received 0x2000 after 0 attempts at 0x7ffff7fe4000

Breakpoint 7, map_dev_mem (dev=0x61f0d0, map=0x7fffffffe250) at back_linux.c:266
(gdb) c

...

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff75c35fb in __GI___getpagesize () at ../sysdeps/unix/sysv/linux/getpagesize.c:31
31 ../sysdeps/unix/sysv/linux/getpagesize.c: No such file or directory.
(gdb) bt full
#0 0x00007ffff75c35fb in __GI___getpagesize () at ../sysdeps/unix/sysv/linux/getpagesize.c:31
        __PRETTY_FUNCTION__ = "__getpagesize"
#1 0x000000000040c27a in unmap_dev_mem (Base=140737354022912, Size=8192) at back_linux.c:292
        alignOff = 0
#2 0x0000000000401f43 in unmap_mem () at backend.c:150
No locals.
#3 0x00000000004022dd in unset_card () at backend.c:252
No locals.
#4 0x000000000040dd83 in create_config (file=0x61f030 "/home/tj/.nvclock/config") at config.c:611
        section = "hw0"
        bios = "/home/tj/.nvclock/bios0.rom\000\377\177\000\000\020\360a\000\000\000\000\000\a\000\000\000\000\000\000\000\200\343\377\377\377\177\000\000\020\360a\000\000\000\000\000\001\000\000\000\000\000\000\000A\360a\000\000\000\000"
        base_freq = 27000
        i = 0
#5 0x000000000040d013 in open_config () at config.c:309
        file = 0x61f030 "/home/tj/.nvclock/config"
        home = 0x7fffffffef22 "/home/tj"
        stat_buf = {st_dev = 34, st_ino = 1417714, st_nlink = 2, st_mode = 16877, st_uid = 1000, st_gid = 1000, __pad0 = 0,
          st_rdev = 0, st_size = 4096, st_blksize = 4096, st_blocks = 8, st_atim = {tv_sec = 1345644438, tv_nsec = 0}, st_mtim = {
            tv_sec = 1345749948, tv_nsec = 0}, st_ctim = {tv_sec = 1345749948, tv_nsec = 0}, __unused = {0, 0, 0}}
#6 0x000000000040ba6c in init_nvclock () at back_linux.c:108
        nv_driver = 2
#7 0x00000000004016ff in sd_init () at smartdimmer.c:74
No locals.
#8 0x00000000004018a8 in main (argc=2, argv=0x7fffffffe5d8) at smartdimmer.c:132
        optind = 0
        options = 2
        setvalue = 0
        c = -1

Changed in nvclock (Ubuntu):
status: In Progress → Triaged
Revision history for this message
TJ (tj) wrote :

Discussion with Adam Conrad on IRC #ubuntu-devel:

<TJ-> A recent change to the nvidia DKMS drivers package has broken another package such that it won't conceivably ever work again. What's the procedure in such cases?
<infinity> TJ-: That might need to be slightly less vague.
<TJ-> infinity: bug #1039916 ... look at the last comment
<ubottu> Launchpad bug 1039916 in nvclock (Ubuntu) "Nvidia driver causing SIGSEGV in nvclock and smartdimmer" [Medium,Triaged] https://launchpad.net/bugs/1039916
<infinity> TJ-: But if it's "some third-party tool relied on a feature that the upstream binary driver no longer provides", the solution would be to (A) remove the third-party tool from the archive, and (B) make the nvidia packages conflict with it.
<infinity> Oh, in precise, we wouldn't do the removal bit.
<infinity> But certainly, having the driver conflict with the tool, if there's really no way to fix the tool, would work.
<TJ-> OK... it'll affect Q too of course
<infinity> TJ-: You might also want to talk to Marc, since it was his patch that broke things (now that I've read the whole bug report).
<infinity> TJ-: But if it's a tradeoff between dimmer control and priviledge escalation, with no way to compromise on fixing both, I suspect security wins.
<TJ-> infinity: Yes, I was planning on emailing him. I think the patch is a good one - nvclock is using not-quite-approved hacks to do its thing, so I think nvclock has to give
<TJ-> infinity: There is an alternative project developing for an nvida backlight DKMS kernel module that - when/if it works - does things properly with a /sys/class/backlight/ interface... so I think that should be the preferred route
<infinity> TJ-: That does seem to be the saner way forward, yeah.
<TJ-> infinity: besides which I think nvclock has been abandoned by its creator since 2009/10
<TJ-> infinity: thanks; I'll figure out who to contact and what to do to let users of nvclock aware and remove it for Q
<infinity> TJ-: I can remove it from the archive for Q right now, if you think that's the Right Thing To Do, but it'll still need the nvidia-* packaging updated to conflict and force removal on upgrades.
<TJ-> infinity: Let me check first... if nvidia proprietary isn't used by nouveau or the old nv is, nvclock may still work
<TJ-> infinity: And all I really wanted was the backlight dimming *sigh* :p

Revision history for this message
TJ (tj) wrote :

A ray of hope for the alternative. After fixing a bug in the nvidiabl kernel module[1] I can use the standard OS brightness controls to control the backlight. More work needs to be done on that, and it needs to be brought into Ubuntu, but it is the proper solution and it does work for many devices.

[1] https://github.com/guillaumezin/nvidiabl

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-updates (Ubuntu):
status: New → Confirmed
TJ (tj)
Changed in nvclock (Ubuntu):
assignee: TJ (tj) → nobody