package blcr-dkms 0.8.2-9 failed to install/upgrade: blcr kernel module failed to build against linux-rt kernel

Bug #534175 reported by pablomme
This bug affects 5 people
Affects: blcr (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Alan
Milestone: (none)

Bug Description

blcr does not build against the linux-rt kernel (which is at 2.6.31-10 as of up-to-date lucid installed from alpha 3), complaining that symbol __put_task_struct has not been defined.

If I am not mistaken, this causes dkms to refuse to automatically build other modules - please correct me if I'm wrong. I've had to invoke dkms manually to compile the nvidia binary driver support module, and I assume that the cause of this problem was the failure to compile blcr in the first place.

I suppose this may be hard to fix if blcr relies on features specific to 2.6.32, given that the only alternative to make compilation work would be to forward-port the rt patch to 2.6.32. In the past this has been problematic -- Intrepid and Jaunty had broken linux-rt packages which wouldn't work on more than one CPU core, besides other issues.

Would it be possible to blacklist blcr from being compiled against the -rt kernel? Does anything in the standard desktop install of lucid depend on having a working blcr?

ProblemType: Package
Architecture: amd64
Date: Mon Mar 8 04:25:41 2010
DistroRelease: Ubuntu 10.04
ErrorMessage: blcr kernel module failed to build
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Alpha amd64 (20100224.1)
Package: blcr-dkms 0.8.2-9
PackageArchitecture: all
PackageVersion: 0.8.2-9
ProcVersionSignature: Ubuntu 2.6.31-10.153-rt
SourcePackage: blcr
Title: package blcr-dkms 0.8.2-9 failed to install/upgrade: blcr kernel module failed to build
Uname: Linux 2.6.31-10-rt x86_64

pablomme (pablomme) wrote :
Alan (awoodland) wrote :

BLCR can in theory build against almost any standard 2.6.X kernel. I have no idea what -rt changes internally though. Do you have a config.log from the failed configure attempt that you could attach to this report?

As for nvidia modules not getting built because of this I'd say that was a bug with dkms itself. There's no inherent reason why failure of one module should prevent another one from getting built. The boot time script that causes these to get built is pretty simple from what I remember.

Alan
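
Building each module by hand, and carrying on when one of them fails, might look like the sketch below. It is not the DKMS boot-time script; the function name `build_all_for_kernel` and the `DKMS` override variable are made up for illustration, and only the `dkms build -m ... -v ... -k ...` invocation is the real tool.

```shell
#!/bin/sh
# DKMS can be overridden (for example for a dry run); it defaults to the real tool.
DKMS=${DKMS:-dkms}

# Build every "module/version" argument against kernel $1.
# A failed build is reported and counted, but does not stop the loop.
# The return status is the number of modules that failed to build.
build_all_for_kernel() {
    kernel=$1
    shift
    failed=0
    for mv in "$@"; do
        module=${mv%%/*}     # text before the first slash
        version=${mv#*/}     # text after the first slash
        if "$DKMS" build -m "$module" -v "$version" -k "$kernel"; then
            echo "built: $mv"
        else
            echo "FAILED: $mv" >&2
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}
```

For instance, `build_all_for_kernel 2.6.31-10-rt blcr/0.8.2` followed by whatever other module/version pairs `dkms status` lists on the machine.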

pablomme (pablomme) wrote :
pablomme (pablomme) wrote :

> BLCR can in theory build against almost any standard 2.6.X kernel. I have
> no idea what -rt changes internally though.

It changes quite a few things across the kernel, as far as I know. I've seen symbols being renamed between -generic and -rt before (e.g. there was a problem with building the nvidia module during the karmic alphas because of this, see bug #413296). It is possible that it's not a 2.6.32-specific feature that's missing, but a rename by the rt patch.

> As for nvidia modules not getting built because of this I'd say that was
> a bug with dkms itself. There's no inherent reason why failure of one
> module should prevent another one from getting built.

Ok, I'll assume this is a different problem then.

Alan (awoodland) wrote :

From a quick look just now, it looks like it should be possible to add support for this branch. I don't have enough time to develop and test this right now, but I'll forward the report upstream.

Alan

Paul H. Hargrove (phhargrove) wrote :

Hello from "upstream".

The DKMS log shows
    checking kernel symbol table for __put_task_struct... configure: error: Found symbol __put_task_struct but no declaration -- please file a bug report.

This says the symbol WAS found. So the problem is not a rename, but a missing or relocated prototype. This should be easier to resolve than if the symbol had been removed. However, since configure stops at the first error, I cannot be sure there are no additional, possibly harder to fix, errors lurking after this one.

So that I may take a look at this directly (rather than posting a potentially long iteration of patches), could somebody please clue me in on what I need to do to get a copy of the relevant linux-rt kernel sources on my Karmic/x86_64 machine?

Thanks,
-Paul (BLCR
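
The distinction drawn above (the symbol is present in the kernel symbol table, but no prototype is visible in the headers) can be checked by hand. What follows is a rough sketch, not BLCR's actual configure logic: the function names `ksym_in_map` and `ksym_declared` are made up for illustration, and the header search only looks for the symbol name followed by an opening parenthesis, so it cannot tell a prototype from a macro or a call.

```shell
#!/bin/sh
# Succeeds if SYMBOL is listed in a System.map-style file
# (three columns: address, type, name).
ksym_in_map() {
    # $1 = symbol, $2 = path to System.map
    awk -v sym="$1" '$3 == sym { found = 1 } END { exit !found }' "$2"
}

# Succeeds if any *.h file under DIR mentions SYMBOL followed by "(".
# This is only a hint that a declaration may exist. Requires GNU grep.
ksym_declared() {
    # $1 = symbol, $2 = header directory
    grep -rqE --include='*.h' "(^|[^A-Za-z0-9_])$1[[:space:]]*\(" "$2"
}
```

For example, `ksym_in_map __put_task_struct /boot/System.map-2.6.31-10-rt`, then `ksym_declared __put_task_struct <header dir>`, where the header directory depends on where the -rt kernel headers are installed.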

pablomme (pablomme) wrote :

On lucid I can do
  sudo apt-get install dpkg-dev
  apt-get source linux-image-2.6.31-10-rt
On karmic the version is 2.6.31-9. Should be nearly the same kernel, I suppose.

However if you prefer wgetting the lucid tarballs, I believe the correct files to fetch are:
  http://archive.ubuntu.com/ubuntu/pool/universe/l/linux-rt/linux-rt_2.6.31.orig.tar.gz
  http://archive.ubuntu.com/ubuntu/pool/universe/l/linux-rt/linux-rt_2.6.31-10.153.diff.gz

I'm sure there must be a git branch in some repository somewhere containing these sources, but I don't know what that is.

Paul H. Hargrove (phhargrove) wrote :

For the record: I can reproduce w/ the 2.6.31-9-rt kernel in karmic, and I am pursuing the problem there.
So, one may ignore my request (in the previous comment) for help getting the Lucid kernel sources.

-Paul

Alan (awoodland) wrote :

apt-get source linux-image-2.6.28-3-rt

(Oddly this package is at version 2.6.31-9.152)

linux-rt appears to be a meta-package which depends upon the latest linux-image-X-rt. I've never looked at the -rt packages until today.

Paul H. Hargrove (phhargrove) wrote :

Using the karmic -rt kernel headers I can confirm that this is the only symbol causing configure-time problems. The one-line patch below is sufficient to resolve that issue.

However, there are also non-trivial changes in linux/semaphore.h that break things at compile time and will take some time to sort out.

diff -u -r1.410.2.13 configure.ac
--- configure.ac 19 Dec 2009 00:55:46 -0000 1.410.2.13
+++ configure.ac 8 Mar 2010 18:31:08 -0000
@@ -986,7 +986,7 @@
 fi

 # put_task_struct() requires one of these:
-CR_FIND_KSYM([__put_task_struct],[CODE])
+CR_FIND_KSYM([__put_task_struct],[CODE],[extern void __put_task_struct(struct task_struct *);])
 CR_FIND_KSYM([__put_task_struct_cb],[CODE])

 CR_CHECK_KERNEL_MEMBER([mm.task_size],[#include <linux/sched.h>],

Paul H. Hargrove (phhargrove) wrote :

OK, that was not as bad as I had feared.

This attachment makes two minor changes to BLCR that are sufficient for me to manually (i.e. not using DKMS) configure and build for the linux-2.6.31-9-rt kernel for Karmic/x86_64. I think these changes are highly unlikely to break any other builds, though I have tried only compiling for the 2.6.31-9-rt and 2.6.31-16-generic kernels (in Karmic/x86_64).

Note that I have only RUN the resulting kernel modules with 2.6.31-16-generic.
It would be helpful if the original reporter could apply this patch and verify that BLCR runs correctly with the -rt kernel. Since the -rt kernel is significantly different from -generic, and neither Alan nor I have looked at the -rt kernels before, there does exist the possibility that BLCR could compile fine but could fail at runtime (perhaps as badly as a kernel Oops or lock-up, so be careful).

I would encourage Alan to wait for confirmation that this patch results in non-crashing BLCR kernel modules before pushing this patch to debian-unstable. It is my opinion that failing to configure/build is preferable to building a kernel module that crashes.

-Paul (the primary BLCR developer)

Alan (awoodland) wrote :

I will definitely wait for an ack on the functionality of the resulting patch. Seems odd that it would end up missing those macros.

If it works manually it should work with DKMS, all the bits do (in a slightly odd way) is call configure for the appropriate kernel and make, dressed up like a non-autoconf'd module.

I can only upload to Debian too, so I don't know right now what the chances of this getting into the upcoming release now it's been frozen would be. I will make inquiries once there's a positive test report from a system (or two) running -rt.

Alan

tags: added: patch
Alan (awoodland)
Changed in blcr (Ubuntu):
status: New → In Progress
assignee: nobody → Alan Woodland (awoodland)
Paul H. Hargrove (phhargrove) wrote :

Alan said "Seems odd that it would end up missing those macros."

For the curious:

At about 2.6.26 the type "struct mutex" was replaced by "struct semaphore", and the various mutex-related functions and macros were implemented in terms of wrappers around the semaphore code. These wrappers still exist today in the vanilla 2.6.33 kernel.

However, it looks as if the -rt kernel is in the midst of a transition from "struct semaphore" to "struct anon_semaphore". It has implemented the semaphore functions and macros in terms of the anon_semaphore versions. It appears that support for the mutex wrappers has been removed in that process.

Since BLCR aims to support kernels including those older than 2.6.26, we rely on the mutex wrappers rather than shifting our code to use the semaphore calls. The patch to cr_module/cr_kcompat.h simply reintroduces the 2 missing wrappers that we use.

pablomme (pablomme) wrote :

Ok, patch tested. The module compiles and loads correctly. It doesn't load automatically on startup (is it supposed to?) but it loads with modprobe afterwards, along with a "blcr_imports" module it pulls in.

I applied the patch by regenerating the diff so that dpkg-buildpackage would accept it and adding it under debian/patches, with the corresponding change to debian/patches/series, then generating the .debs and installing them. I'm attaching this version of the patch in case it's useful.
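
The packaging step described above (drop the patch under debian/patches and list it in debian/patches/series) can be sketched roughly as follows. The helper name `add_debian_patch` and the example paths are hypothetical; the unpacked source tree is assumed to use a plain series file, as the comment above implies.

```shell
#!/bin/sh
# Copy PATCH into SRCDIR/debian/patches and register it in the series file.
add_debian_patch() {
    srcdir=$1
    patch=$2
    name=$(basename "$patch")
    series=$srcdir/debian/patches/series

    mkdir -p "$srcdir/debian/patches" || return 1
    cp "$patch" "$srcdir/debian/patches/$name" || return 1

    # Append the patch name only once, so re-running the helper is harmless.
    if ! grep -qxF -- "$name" "$series" 2>/dev/null; then
        printf '%s\n' "$name" >> "$series"
    fi
}
```

After something like `add_debian_patch ./blcr-0.8.2 ~/linux-rt.patch` (both names are placeholders), running `dpkg-buildpackage -us -uc -b` from inside the source directory should produce unsigned binary .debs.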

Paul H. Hargrove (phhargrove) wrote :

pablomme,

I am unsure from your comment if you actually "used" the blcr kernel modules, or just loaded them?
If you only loaded them, then I suggest testing them via "make insmod check" from the blcr build directory. You should be cautious in case the kernel crashes (unlikely, but not impossible).

-Paul

pablomme (pablomme) wrote :

I meant I only loaded them.

Trying to run "make insmod check" under /var/lib/dkms/blcr/0.8.2/build produces

  /usr/bin/make --no-print-directory -C include
  make[1]: *** No targets specified and no makefile found. Stop.
  make: *** [modules] Error 2

so whatever that was supposed to do does not work in the dkms environment for some reason. I tried to reconstruct the command for the 'insmod' target, and got to this:

  # cat /boot/System.map-2.6.31-10-rt 2>/dev/null | env NM='/usr/bin/nm -B' /usr/bin/perl -- ./contrib/cr_depmod /var/lib/dkms/blcr/0.8.2/build/blcr_imports/kbuild/blcr_imports.ko /var/lib/dkms/blcr/0.8.2/build/cr_module/kbuild/blcr.ko >.depmod.err 2>&1 && echo yes || echo no
  yes

So that succeeds. However the 'check' target links to 'check-recursive', which leads me nowhere.

Is there any other way to do this, or shall I just compile blcr by hand (without dkms) and run the check there?

Paul H. Hargrove (phhargrove) wrote :

pablomme,
  Sorry to cause some confusion. I had not considered that the debian/ubuntu packaging for BLCR splits the kernel modules off into a DKMS package. Because of that split, the build directory you are using contains no test codes. So, it would be great if you could build from the debian blcr source package and then "make check" after using modprobe to load the blcr module you build via DKMS.

Alan,
  In the future, would you consider packaging the blcr-testsuite as is done in the RPM packaging? It greatly simplifies situations like the present one.

-Paul
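
Before running "make check" against DKMS-built modules as suggested above, it may be worth confirming that both blcr and blcr_imports are actually loaded. A minimal sketch, assuming the usual /proc/modules layout where the module name is the first field; the function name `module_loaded` is made up for illustration.

```shell
#!/bin/sh
# Succeeds if MODULE appears in the modules list.
module_loaded() {
    # $1 = module name, $2 = modules list (defaults to /proc/modules)
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

A typical use would be `sudo modprobe blcr` followed by `module_loaded blcr && module_loaded blcr_imports && make check` in the build tree of the blcr source package.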

pablomme (pablomme) wrote :

Sorry, I should have realized there was stuff missing in the dkms build directory. I've run the tests now, and the module passes them correctly:

PASS: atomics
SKIP: bug2524
PASS: cr_run
PASS: cr_targ
PASS: cr_targ2
PASS: cr_omit
PASS: dlopen
PASS: bug2003
PASS: run_on
PASS: save_exe
PASS: save_priv
PASS: save_share
PASS: save_all
PASS: reloc_exe
PASS: reloc_file
PASS: reloc_fifo
PASS: reloc_dir
PASS: reloc_all
PASS: clobber
PASS: stage0001.st
PASS: stage0002.st
PASS: stage0003.st
PASS: stage0004.st
PASS: critical_sections.st
PASS: replace_cb.st
PASS: failed_cb.st
PASS: failed_cb2.st
PASS: pid_in_use.st
cs_enter_leave: 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
PASS: cs_enter_leave.st
cs_enter_leave2: 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
PASS: cs_enter_leave2.st
cr_tryenter_cs: 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
PASS: cr_tryenter_cs.st
PASS: stopped.st
PASS: edeadlk.st
PASS: pid_restore.st
PASS: simple.ct
PASS: simple_pthread.ct
PASS: cwd.ct
PASS: dup.ct
PASS: filedescriptors.ct
PASS: pipe.ct
PASS: named_fifo.ct
PASS: cloexec.ct
PASS: get_info.ct
PASS: orphan.ct
PASS: overlap.ct
PASS: child.ct
PASS: mmaps.ct
No hugetlbfs mount point found (test skipped)
SKIP: hugetlbfs.ct
PASS: readdir.ct
PASS: dev_null.ct
PASS: cr_signal.ct
PASS: linked_fifo.ct
PASS: sigpending.ct
PASS: dpipe.ct
PASS: forward.ct
PASS: hooks.ct
PASS: math.ct
PASS: sigaltstack.ct
PASS: prctl.ct
PASS: lam.ct
======================
All 58 tests passed
(2 tests were not run)
======================
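
A saved copy of a log like the one above can be tallied with a few lines of shell instead of being counted by eye. This is only a sketch: the function name `summarize_check` is hypothetical, and it counts just the PASS:, SKIP: and FAIL: line prefixes, ignoring any other result types the test harness may emit.

```shell
#!/bin/sh
# Read "make check" output on stdin and print a one-line tally.
# Exits non-zero if any FAIL: line was seen.
summarize_check() {
    awk '
        /^PASS:/ { p++ }
        /^SKIP:/ { s++ }
        /^FAIL:/ { f++ }
        END {
            printf "pass=%d skip=%d fail=%d\n", p, s, f
            exit (f > 0)
        }
    '
}
```

For example, `make check 2>&1 | tee check.log | summarize_check` keeps the full log and prints the tally at the end.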

Paul H. Hargrove (phhargrove) wrote :

pablomme,
  Thanks for the good news!

Now, somebody needs to answer Alan's concern (comment #12) of how we might get this into the Lucid release: Alan is the Debian maintainer for the blcr package, but if he commits changes there they may not make it into Lucid this late, right?

-Paul

pablomme (pablomme) wrote :

> Alan is the Debian maintainer for the blcr package, but if he commits
> changes there they may not make it into Lucid this late, right?

I think he referred to the Debian freeze for 6.0, not the Ubuntu freeze for 10.04, no?

Alan (awoodland) wrote :

It was the Ubuntu freeze I was unsure about. I'm preparing -10 now with the patch for -lp kernels. I've checked up and there shouldn't be a problem for getting the fix pulled across from Debian. Once that hits testing and lucid I'll do another revision which adds an extra package with the testsuite.

Alan

Alan (awoodland) wrote :

Of course when I said -lp I really meant -rt...

New version has just been uploaded to Debian/Unstable. Will request getting it pulled into Lucid later today.

Alan

Alan (awoodland)
Changed in blcr (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package blcr - 0.8.2-10

---------------
blcr (0.8.2-10) unstable; urgency=low

  * Add patch from Paul Hargrove that fixes builds on linux-rt
    - Fixes build dkms build failure with linux-rt kernels.
    - Required autoreconf run
      LP: #534175
  * Bump to standards version 3.8.4, no changes needed
  * Add misc depends for dkms package
 -- Alan Woodland <email address hidden> Fri, 26 Mar 2010 16:45:08 +0000

Changed in blcr (Ubuntu):
status: Fix Committed → Fix Released