lockfile-create hangs inside lxc containers (potential buffer overflow?)

Bug #941968 reported by James Page on 2012-02-27
This bug affects 25 people
Affects               Importance  Assigned to
liblockfile (Ubuntu)  High        Unassigned
  Precise             Undecided   Unassigned
  Quantal             Undecided   Unassigned
  Raring              Undecided   Unassigned

Bug Description

I've hit this problem while testing juju charms that use ntp (specifically hbase - bug 800708).

The first instance in the first LXC container starts OK; however, subsequent instances in other LXC containers fail while ntp is being installed:

root 1157 416 0 14:48 ? 00:00:00 /usr/bin/dpkg --status-fd 49 --configure resolvconf:all openjdk-6-jre-headless:amd
root 1313 1 0 14:48 ? 00:00:00 /usr/sbin/libvirtd -d
root 1398 1157 0 14:48 ? 00:00:00 /bin/sh /var/lib/dpkg/info/ntp.postinst configure
root 1437 1398 0 14:48 ? 00:00:00 /bin/sh /usr/sbin/invoke-rc.d ntp start
root 1453 1437 0 14:48 ? 00:00:00 /bin/sh /etc/init.d/ntp start
root 1458 1453 0 14:48 ? 00:00:00 lockfile-create /var/lock/ntpdate

Running lockfile-create by hand after killing the hanging lockfile-create:

ubuntu@jamespage-hendrix-hbase-regioncluster-2:~$ lockfile-create /var/lock/ntpdate
*** glibc detected *** lockfile-create: malloc(): memory corruption (fast): 0x000000000105b0e0 ***

[Test Case]
Set a hostname of 64 characters (HOST_NAME_MAX is 64) and create a lock file:

$ lock=/var/lock/lockfile-create-test
$ lockfile-remove $lock
$ sudo hostname hostna01234567890123456789012345678901234567890123456789
$ lockfile-create $lock
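
To probe which hostname lengths trigger the crash (comments below report failures at 40 characters and success at 37 or fewer), it can help to generate a name of an exact length. The following is only a sketch: `make_hostname` is a hypothetical helper written for this illustration and is not part of lockfile-progs or liblockfile.

```shell
# Hypothetical helper: print a hostname-like string of exactly $1 characters,
# made of "host" followed by repeating digits. Lengths below 4 are rejected.
make_hostname() {
    len=$1
    name="host"
    if [ "$len" -lt "${#name}" ]; then
        echo "make_hostname: length must be at least ${#name}" >&2
        return 1
    fi
    i=${#name}
    while [ "$i" -lt "$len" ]; do
        name="${name}$(( i % 10 ))"   # append the last digit of the position
        i=$(( i + 1 ))
    done
    printf '%s\n' "$name"
}
```

On a throwaway machine, `sudo hostname "$(make_hostname 40)"` followed by `lockfile-create $lock` would then repeat the test case at a chosen length.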

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: lxc 0.7.5-3ubuntu30
ProcVersionSignature: Ubuntu 3.2.0-17.26-generic 3.2.6
Uname: Linux 3.2.0-17-generic x86_64
NonfreeKernelModules: fglrx
ApportVersion: 1.93-0ubuntu2
Architecture: amd64
Date: Mon Feb 27 14:49:30 2012
SourcePackage: lxc
UpgradeStatus: No upgrade log present (probably fresh install)

[Regression Potential]
Minimum. We've applied a patch to the same version of liblockfile in 13.04 and that has since been merged to debian with no reports of regressions.

James Page (james-page) wrote :
Changed in lxc (Ubuntu):
importance: Undecided → High
Serge Hallyn (serge-hallyn) wrote :

Thanks for submitting this bug, James.

I tried to reproduce it by simply installing ntp in two containers, but failed.

However, my hunch is that it is due to bug 925024.

The fix for that bug had been queued in the precise kernel source, but is not yet (I think) in the archive.

Could you disable the apparmor profile using

    sudo apparmor_parser -R /etc/apparmor.d/usr.bin.lxc-start

and see if you can still reproduce this?

Changed in lxc (Ubuntu):
status: New → Incomplete
tags: added: rls-mgr-p-tracking
Joe Breu (breu) wrote :

I played around with this a bit and derived that it is due to the length of the hostname (at least in my case). With a hostname of length 40 I can reproduce this error. Setting the hostname to <= 37 produces the correct results.

Joe Breu (breu) wrote :

I also do not believe that this bug is a duplicate. apparmor is only the culprit in that it is finding a potential overflow somewhere in the ntp code.

David Britton (davidpbritton) wrote :

If I turn apparmor off, I can no longer effectively install through juju if the package includes NTP. I'll work to reproduce with a minimal environment.

David Britton (davidpbritton) wrote :

I'm seeing the same as @breu. The juju charm I was using was making the hostname of my system 40 characters long. Reducing the charm name length solved the problem. I think this bug should be re-opened and fixed in NTP or in LXC or wherever the problem lies.

Serge Hallyn (serge-hallyn) wrote :

I've unmarked this as a duplicate and marked it confirmed as there are two people reporting this as an overflow in ntp.

I will mark this bug as affecting NTP and temporarily assign it to jjohansen, with apologies, for advice.

Changed in lxc (Ubuntu):
status: Incomplete → Confirmed
summary: - lockfile-create hangs inside lxc containers
+ lockfile-create hangs inside lxc containers (potential buffer overflow?)
Serge Hallyn (serge-hallyn) wrote :

I decided not to assign to jjohansen, only subscribe him :)

If no one else gets to the bottom of this before I clear some things off my plate, I will see if I can get to the bottom of it. I would assign myself to it, but don't want to stop someone else from jumping in if they have time.

Serge Hallyn (serge-hallyn) wrote :

Breu and davidpbritton,

could you please attach the charm you were using?

Serge Hallyn (serge-hallyn) wrote :

Simply running lockfile-create, killing it quickly, and re-running it in a container doesn't reproduce it for me. Can you please tell me which juju charm to use to reproduce this? Can you still reproduce this today?

Changed in lxc (Ubuntu):
status: Confirmed → Incomplete
Changed in ntp (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for lxc (Ubuntu) because there has been no activity for 60 days.]

Changed in lxc (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for ntp (Ubuntu) because there has been no activity for 60 days.]

Changed in ntp (Ubuntu):
status: Incomplete → Expired

I accidentally reproduced this, using:
username project64198
environment democluster
service hbase-master

project64198-democluster-hbase-master-2 has 40 characters. After I destroyed the environment and recreated under the name demo, everything was running fine.

Changed in lxc (Ubuntu):
status: Expired → Confirmed
Serge Hallyn (serge-hallyn) wrote :

Sebastian,

I'm sorry, could you please describe more specifically the steps you took? What was that username for - is that the logged-in username on the host? Is the environment the environment name in .juju/environments.yaml, and the service the name of the charm you invoked?

Sure. First, I ssh'd to project64198@deneb, a (metal) server of mine. There I prepared for local deployment:

sudo apt-get install python-software-properties
sudo add-apt-repository ppa:juju/pkgs
sudo apt-get update && sudo apt-get install juju
sudo apt-get install lxc apt-cacher-ng libzookeeper-java zookeeper

Now I ran juju bootstrap and got a .juju/environments.yaml, which I modified:

environments:
  democluster:
    type: local
    control-bucket: juju-c81e728d9d4c2f636f067f89cc14862c
    admin-secret: eccbc87e4b5ce2fe28308fd9f2a7baf3
    default-series: precise
    juju-origin: ppa
    data-dir: /home/project64198/democluster

Also, I ran ssh-keygen -t rsa and mkdir democluster, and prepared for hbase by writing this config.yaml:

hadoop-master:
    hbase: True
hadoop-workers:
    hbase: True
thrift:
    monitoring_port: 9090

After another juju bootstrap I deployed my cloud:

juju deploy zookeeper
juju deploy --config config.yaml hadoop hadoop-master
juju deploy --config config.yaml hadoop hadoop-workers
juju deploy hbase hbase-master
juju deploy hbase hbase-regions
juju add-relation hadoop-master:namenode hadoop-workers:datanode
juju add-relation hbase-master zookeeper
juju add-relation hbase-regions zookeeper
juju add-relation hadoop-master:namenode hbase-master:namenode
juju add-relation hadoop-master:namenode hbase-regions:namenode
juju add-relation hbase-master:master hbase-regions:regionserver

It was at this point that hbase-master would stick in agent-state: pending, public-address: null
So I ran juju destroy-environment, changed these lines in .juju/environments.yaml:

  demo:
    data-dir: /home/project64198/demo

After an mkdir demo, juju bootstrap and all the deploys and add-relations, my cloud was up.

Serge Hallyn (serge-hallyn) wrote :

@Sebastian,

what you are seeing is likely unrelated to this particular bug. However if I simply do something like:

juju deploy --repository=~/myrepo local:precise/ovs-lxc lxc-5678901234567890123456789012345678901

and very quickly

juju debug-log

then I see:

LXCError: lxc-start: node name '' is too long
lxc-start: failed to read configuration file
Traceback (most recent call last):

Could you confirm whether that is what you also see? And if so, file it as a bug against lxc and juju?

Serge Hallyn (serge-hallyn) wrote :

@jamespage,

do you still see this bug?

Changed in ntp (Ubuntu):
status: Expired → Confirmed
importance: Undecided → High
Serge Hallyn (serge-hallyn) wrote :

Never mind - simply doing

sudo hostname host567890123456789012345678901234567890
sudo apt-get install lxc

in precise reproduced this, giving me

 * Starting NTP server ntpd *** glibc detected *** lockfile-create: free(): invalid next size (fast): 0x0000000000d750a0 ***

Serge Hallyn (serge-hallyn) wrote :

Note I get the same thing in a kvm-based cloud instance.

Marking this bug as not affecting lxc.

no longer affects: lxc (Ubuntu)
Serge Hallyn (serge-hallyn) wrote :

I have reproduced this on precise and raring.

Tyler Hicks (tyhicks) wrote :

The problem is with string handling in liblockfile's lockfile_create_save_tmplock(). I'll start work on getting a debdiff prepared.

Changed in liblockfile (Ubuntu):
assignee: nobody → Tyler Hicks (tyhicks)
importance: Undecided → Medium
status: New → In Progress
Changed in lockfile-progs (Ubuntu):
status: New → Invalid
no longer affects: lockfile-progs (Ubuntu)
no longer affects: ntp (Ubuntu)
Tyler Hicks (tyhicks) wrote :

Here's a debdiff for Raring. It passes the [Test Case] that I added to the bug description, as well as other manual testing such as creating a lockfile on a system with a 1 character long hostname.

The debdiff also fixes bug #1011477. See that bug for info and test case.

description: updated
Tyler Hicks (tyhicks) on 2013-01-09
Changed in liblockfile (Ubuntu):
status: In Progress → Confirmed
assignee: Tyler Hicks (tyhicks) → nobody
Changed in liblockfile (Ubuntu):
importance: Medium → High
Michael Terry (mterry) wrote :

I just uploaded your patch to raring, and sent it on to Debian bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677225

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package liblockfile - 1.09-5ubuntu1

---------------
liblockfile (1.09-5ubuntu1) raring; urgency=low

  * debian/patches/fix-buffer-overflows.patch: Fix buffer overflows when
    building strings
    - Protect against overflows caused by long hostnames (LP: #941968)
    - Protect against overflows caused by large PID numbers (LP: #1011477)
 -- Tyler Hicks <email address hidden> Wed, 09 Jan 2013 12:23:07 -0800

Changed in liblockfile (Ubuntu):
status: Confirmed → Fix Released
Nick Moffitt (nick-moffitt) wrote :

this may be a dup of #971314

Nick Moffitt (nick-moffitt) wrote :

or do I have to type bug#971314 or LP #971314 or what

tags: added: canonistack
Haw Loeung (hloeung) on 2013-04-23
tags: added: canonical-webops-juju
Adam Gandelman (gandelman-a) wrote :

Hitting this on precise trying to deploy a more recent Ceph (which apparently uses liblockfile) on an openstack cloud configured to use its default hostname scheme, eg: server-e5476307-a3c3-4703-b28c-84fedce7078.novalocal

Adam Gandelman (gandelman-a) wrote :

The raring nomination can be removed/rejected. The patch that fixes the issue was released in raring circa liblockfile 1.09-5ubuntu1.

description: updated
Martin Pitt (pitti) on 2013-06-24
Changed in liblockfile (Ubuntu Raring):
status: New → Fix Released
Martin Pitt (pitti) wrote :

Sponsored precise upload. Unsubscribing sponsors.

Changed in liblockfile (Ubuntu Precise):
status: New → In Progress

Hello James, or anyone else affected,

Accepted liblockfile into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/liblockfile/1.09-3ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in liblockfile (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Colin Watson (cjwatson) wrote :

Hello James, or anyone else affected,

Accepted liblockfile into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/liblockfile/1.09-4ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in liblockfile (Ubuntu Quantal):
status: New → Fix Committed
Peter Schroeter (schroeter) wrote :

Came here by way of the ntp restart failing due to liblockfile errors if the hostname is too long. The 1.09-3ubuntu0.1 proposed package fixes the error for me, thanks!

tags: added: verification-done-precise
Joe Breu (breu) wrote :

I tested this and the bug still exists where lockfile-create segfaults with a long hostname. The problem here is the 23 characters allowed for the system name is still not sufficient. gethostname() can return a hostname up to 256 characters long.

on precise with liblockfile-bin 1.09-3ubuntu0.1 installed:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@areallylonghostnamethatshouldbreakeverything:~# lockfile-create /var/lock/ntpdate
*** glibc detected *** lockfile-create: free(): invalid next size (fast): 0x000000000128f0a0 ***
Segmentation fault (core dumped)

from the patch:
~~~~~~~~~~~~
#define TMPLOCKSTR ".lk"
#define TMPLOCKSTRSZ strlen(TMPLOCKSTR)
+#define TMPLOCKPIDSZ 5
#define TMPLOCKTIMESZ 1
#define TMPLOCKSYSNAMESZ 23
#define TMPLOCKFILENAMESZ (TMPLOCKSTRSZ + TMPLOCKPIDSZ + \
     TMPLOCKTIMESZ + TMPLOCKSYSNAMESZ)

TMPLOCKSYSNAMESZ needs to be much larger than 23. This should actually be the same as the size of sysname which in this case is 256.
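
The macros quoted above budget 3 + 5 + 1 + 23 = 32 characters for the temporary lock file name. As a rough illustration of what staying inside such a budget looks like, here is a shell sketch that bounds every component explicitly. It is not the liblockfile code: the function name `build_tmplock_name`, the field order, and the choice to reduce the PID modulo 100000 are all assumptions made for this example.

```shell
# Illustrative only: build a name of at most 3 + 5 + 1 + 23 = 32 characters.
# Arguments: PID, a time value, and the system (host) name.
build_tmplock_name() {
    pid=$1
    now=$2
    host=$3
    # %05d  : PID reduced to five digits (assumed strategy for large PIDs)
    # %x    : one hex digit taken from the time value
    # %.23s : system name cut off after 23 characters
    printf '.lk%05d%x%.23s\n' "$(( pid % 100000 ))" "$(( now % 16 ))" "$host"
}
```

With a 44-character hostname such as the one in the transcript above, the result still has 32 characters, because the name is truncated instead of being written past the end of the buffer.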

Alexander List (alexlist) wrote :

Can we please get this released in a precise SRU?

Philipp Kern (pkern) wrote :

Joseph, I cannot reproduce this with the version in precise-proposed. Are you sure that you installed 1.09-3ubuntu0.1? Also the maximum the kernel stores as a hostname is 64 characters, so there should never be 256 characters returned.
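
The limit mentioned here can be checked on a given system. A minimal sketch, assuming `getconf HOST_NAME_MAX` is available (it falls back to 64, the value quoted in this thread, if not); `host_name_max` and `hostname_fits` are hypothetical helper names:

```shell
# Print the system's HOST_NAME_MAX, falling back to 64 if getconf
# is missing or does not return a plain number.
host_name_max() {
    max=$(getconf HOST_NAME_MAX 2>/dev/null) || max=64
    case $max in
        ''|*[!0-9]*) max=64 ;;
    esac
    printf '%s\n' "$max"
}

# Succeed if NAME ($1) is no longer than LIMIT ($2) characters.
hostname_fits() {
    name=$1
    limit=$2
    [ "${#name}" -le "$limit" ]
}
```

For example, `hostname_fits "$(uname -n)" "$(host_name_max)" && echo ok` reports whether the current node name is within the limit.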

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package liblockfile - 1.09-3ubuntu0.1

---------------
liblockfile (1.09-3ubuntu0.1) precise-proposed; urgency=low

  * debian/patches/fix-buffer-overflows.patch: Fix buffer overflows when
    building strings
    - Protect against overflows caused by long hostnames (LP: #941968)
    - Protect against overflows caused by large PID numbers (LP: #1011477)
 -- Adam Gandelman <email address hidden> Thu, 20 Jun 2013 12:37:10 -0700

Changed in liblockfile (Ubuntu Precise):
status: Fix Committed → Fix Released
Joe Breu (breu) wrote :

Philipp,

I am, and it is still not fixed

root@breu-prec-jenkins-32675-api:~# apt-cache policy liblockfile1
liblockfile1:
  Installed: 1.09-3
  Candidate: 1.09-3ubuntu0.1
  Version table:
     1.09-3ubuntu0.1 0
        500 http://archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
 *** 1.09-3 0
        500 http://archive.ubuntu.com/ubuntu/ precise/main amd64 Packages
        100 /var/lib/dpkg/status
root@breu-prec-jenkins-32675-api:~# hostname
breu-prec-jenkins-32675-api
root@breu-prec-jenkins-32675-api:~# hostname areallylonghostnamethatshouldbreakeverything
root@breu-prec-jenkins-32675-api:~# hostname
areallylonghostnamethatshouldbreakeverything
root@breu-prec-jenkins-32675-api:~# lockfile-create /var/lock/123date
*** glibc detected *** lockfile-create: free(): invalid next size (fast): 0x00000000024bc0a0 ***
Segmentation fault (core dumped)

Joe Breu (breu) wrote :

this is also on precise

Joe Breu (breu) wrote :

never mind - my co-worker let me down and this totally works. my bad.
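
Note that the `apt-cache policy` output above shows Installed: 1.09-3 while the Candidate is 1.09-3ubuntu0.1, so the fixed package was available but not installed on that machine. A small sketch for catching this before retesting; `installed_is_candidate` is a hypothetical helper that only parses the policy text it is given on stdin:

```shell
# Read `apt-cache policy <package>` output on stdin and succeed only if
# the Installed and Candidate versions are present and identical.
installed_is_candidate() {
    awk '
        $1 == "Installed:" { inst = $2 }
        $1 == "Candidate:" { cand = $2 }
        END { exit !(inst != "" && inst == cand) }
    '
}
```

Usage would look like `apt-cache policy liblockfile1 | installed_is_candidate || echo "candidate version not installed yet"`.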

Rolf Leggewie (r0lf) wrote :

quantal has seen the end of its life and is no longer receiving any updates. Marking the quantal task for this ticket as "Won't Fix".

Changed in liblockfile (Ubuntu Quantal):
status: Fix Committed → Won't Fix