Libvirt FTBFS in Artful on x86

Bug #1718668 reported by Christian Ehrhardt 
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu Cloud Archive
Fix Released
Medium
Unassigned
Pike
Fix Released
Medium
Unassigned
libvirt (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Not yet sure what changed since the last upload.
It might even be a build environment change.

TL;DR no change to the current libvirt-3.6 FTBFS

Buildlog at:
https://launchpadlibrarian.net/337780146/buildlog_ubuntu-artful-amd64.libvirt_3.6.0-1ubuntu5~ppa01_BUILDING.txt.gz

The Fail is:
FAIL: test-getopt-posix
=======================

../../../../gnulib/tests/test-getopt.h:754: assertion 'strcmp (argv[1], "donald") == 0' failed
FAIL test-getopt-posix (exit status: 134)

tags: added: artful ftbfs
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

That will do to recreate:
autopkgtest -o test-libvirt-build --shell-fail --apt-upgrade libvirt_3.6.0-1ubuntu5~ppa01.dsc -- qemu ~/work/autopkgtest-artful-amd64.img

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The tests checks that argv isn't touched by getopt.
There are multiple calls which work before the breaking one:

The calls set different options:
setenv ("POSIXLY_CORRECT", "1", 1);
Works
then
unsetenv ("POSIXLY_CORRECT");
fails

In this env getopt_loop moves the args around:
(gdb) p argv
$14 = {0x555555559c14 "program", 0x555555559d3d "donald", 0x555555559cd9 "-p", 0x555555559d44 "billy", 0x555555559d4a "duck",
  0x555555559c1c "-a", 0x555555559c1f "bar", 0x0, 0x555555559cc9 "foo", 0x555555559cdc "-q"}
(gdb) n
753 ASSERT (strcmp (argv[0], "program") == 0);
(gdb) p argv
$15 = {0x555555559c14 "program", 0x555555559cd9 "-p", 0x555555559d44 "billy", 0x555555559c1c "-a", 0x555555559d3d "donald",
  0x555555559d4a "duck", 0x555555559c1f "bar", 0x0, 0x555555559cc9 "foo", 0x555555559cdc "-q"}

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So it checks for something not meant to be the way it is [1]
I'd assume the action to actually check is wrong.

That depends on the var "posixly", maybe that changed.
It is defined as:
  bool posixly = !!getenv ("POSIXLY_CORRECT");
  /* See comment in getopt.c:
     glibc gets a LSB-compliant getopt.
     Standalone applications get a POSIX-compliant getopt. */
#if defined __GETOPT_PREFIX || !(__GLIBC__ >= 2 || defined __MINGW32__)
  /* Using getopt from gnulib or from a non-glibc system. */
  posixly = true;
#endif

Need to check those defines on build

[1]: http://man7.org/linux/man-pages/man3/getopt.3.html

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

gcc -DHAVE_CONFIG_H -DEXEEXT=\"\" -I. -I../../../../gnulib/tests -I../.. -DIN_LIBVIRT_GNULIB_TESTS=1 -I. -I../../../../gnulib/tests -I../.. -I../../../../gnulib/tests/../.. -I../../gnulib/lib -I../../../../gnulib/tests/../../gnulib/lib -g -O2 -c -o test-getline.o ../../../../gnulib/tests/test-getline.c
gcc -DHAVE_CONFIG_H -DEXEEXT=\"\" -I. -I../../../../gnulib/tests -I../.. -DIN_LIBVIRT_GNULIB_TESTS=1 -I. -I../../../../gnulib/tests -I../.. -I../../../../gnulib/tests/../.. -I../../gnulib/lib -I../../../../gnulib/tests/../../gnulib/lib -g -O2 -c -o test-getopt-posix.o ../../../../gnulib/tests/test-getopt-posix.c

/bin/bash ../../libtool --tag=CC --preserve-dup-deps --mode=link gcc -g -O2 -o test-getopt-posix test-getopt-posix.o libtests.a ../../gnulib/lib/libgnu.la libtests.a -ldl

With -E it generates:
# 86 "../../../../gnulib/tests/test-getopt.h"
      posixly = !!getenv ("POSIXLY_CORRECT");
  posixly =
# 92 "../../../../gnulib/tests/test-getopt.h" 3 4
           1
# 92 "../../../../gnulib/tests/test-getopt.h"

So it is at build time.
A short check shows that it is __GETOPT_PREFIX that is defined and bringing the hard assign in.
Also disabling the assignment makes the test work.

Need to find where the __GETOPT_PREFIX now comes from.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Generated debian/build/config.h has:
debian/build/config.h:2942:#define __GETOPT_PREFIX rpl_

Didn't it before?
Unfortunately not in build logs e.g. config.log has that.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Building the same libvirt source on zetsy works as it did befor eon artful.
So likely one of the later lib updates.

Will use the zesty build as a comparison where in configure and co it might be different.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

grep __GETOPT_PREFIX debian/build/config.log | uniq
| #define __GETOPT_PREFIX rpl_
#define __GETOPT_PREFIX rpl_

The actual config defines the symbol in both cases.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Build with -O0 for debugging and -E to check precompiler
gcc -DHAVE_CONFIG_H -DEXEEXT=\"\" -I. -I../../../../gnulib/tests -I../.. -DIN_LIBVIRT_GNULIB_TESTS=1 -I. -I../../../../gnulib/tests -I../.. -I../../../../gnulib/tests/../.. -I../../gnulib/lib -I../../../../gnulib/tests/../../gnulib/lib -g -O0 -c -o test-getopt-posix.o ../../../../gnulib/tests/test-getopt-posix.c

/bin/bash ../../libtool --tag=CC --preserve-dup-deps --mode=link gcc -g -O0 -o test-getopt-posix test-getopt-posix.o libtests.a ../../gnulib/lib/libgnu.la libtests.a -ldl
libtool: link: gcc -g -O0 -o test-getopt-posix test-getopt-posix.o libtests.a ../../gnulib/lib/.libs/libgnu.a -lutil libtests.a -ldl -pthread

 gcc -DHAVE_CONFIG_H -DEXEEXT=\"\" -I. -I../../../../gnulib/tests -I../.. -DIN_LIBVIRT_GNULIB_TESTS=1 -I. -I../../../../gnulib/tests -I../.. -I../../../../gnulib/tests/../.. -I../../gnulib/lib -I../../../../gnulib/tests/../../gnulib/lib -g -O0 -E -o test-getopt-posix.E ../../../../gnulib/tests/test-getopt-posix.c

It turns out posixly was always defined to be overwritten as =1 due to the config setting __GETOPT_PREFIX as shown in c#6.

That implies that symptom and casuse is vice versa than I thought.
I expected posixly behavior all the time, but now the getopt_loop changes order.
That might be the .so binding resolved differently or something like that.

Checking the details of that execution, to track the right one we need:
b ../../../../gnulib/tests/test-getopt.h:747
run
c
c
# on the third hit go deeper
b getopt
c
c
p argv[1]
# at the end of the function order will be changed

TBD: check env, check which implementation is used

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Position slightly changed getopt.c, line 1188 -> getopt.c, line 735
But way more important the "good case" does not hit the getopt at all.

The backtrace finds the true reason which is that it stopped using its gnulib implementation.
The configure still detects to use the local gnulib thing and sets posixly, but debugging shows:

Artful:
#0 getopt (argc=7, argv=0x7fffffffe270, optstring=0x55555555bf5a "abp:q:") at getopt.c:735
#1 0x0000555555554f37 in getopt_loop (argc=7, argv=0x7fffffffe270, options=0x55555555bf5a "abp:q:", a_seen=0x7fffffffe1b0, b_seen=0x7fffffffe1b4, p_value=0x7fffffffe210,
    q_value=0x7fffffffe218, non_options_count=0x7fffffffe1b8, non_options=0x7fffffffe220, unrecognized=0x7fffffffe1bc, message_issued=0x7fffffffe1ae) at ../../../../gnulib/tests/test-getopt.h:43
#2 0x00005555555591f4 in test_getopt () at ../../../../gnulib/tests/test-getopt.h:748
#3 0x000055555555bb24 in main () at ../../../../gnulib/tests/test-getopt-main.h:65
(gdb) frame 1
#1 0x0000555555554f37 in getopt_loop (argc=7, argv=0x7fffffffe270, options=0x55555555bf5a "abp:q:", a_seen=0x7fffffffe1b0, b_seen=0x7fffffffe1b4, p_value=0x7fffffffe210,
    q_value=0x7fffffffe218, non_options_count=0x7fffffffe1b8, non_options=0x7fffffffe220, unrecognized=0x7fffffffe1bc, message_issued=0x7fffffffe1ae) at ../../../../gnulib/tests/test-getopt.h:43

Zesty:
#0 rpl_getopt (argc=7, argv=0x7fffffffe220, optstring=0x55555555ce0a "abp:q:") at ../../../../gnulib/lib/getopt.c:739
#1 0x000055555555508d in getopt_loop (argc=7, argv=0x7fffffffe220, options=0x55555555ce0a "abp:q:", a_seen=0x7fffffffe160, b_seen=0x7fffffffe164, p_value=0x7fffffffe1c0,
    q_value=0x7fffffffe1c8, non_options_count=0x7fffffffe168, non_options=0x7fffffffe1d0, unrecognized=0x7fffffffe16c, message_issued=0x7fffffffe15e) at ../../../../gnulib/tests/test-getopt.h:43
#2 0x000055555555934a in test_getopt () at ../../../../gnulib/tests/test-getopt.h:748
#3 0x000055555555bc7a in main () at ../../../../gnulib/tests/test-getopt-main.h:65

rpl_getopt in zesty (which behaves correctly, but getopt in artful now.
Also the ENV has not set the posixly_correct env as it expects gnulib which is correct

So the error is that it uses the glibc's getopt instead of it's bundled gnulib.
TBD: check linking and build in that regard

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Artful:
  symbol=getopt; lookup in file=./debian/build/gnulib/tests/test-getopt-posix [0]
  symbol=getopt; lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
  symbol=getopt; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
  binding file ./debian/build/gnulib/tests/test-getopt-posix [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `getopt' [GLIBC_2.2.5]
  symbol=getopt; lookup in file=./debian/build/gnulib/tests/test-getopt-posix [0]
  symbol=getopt; lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
  symbol=getopt; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
  binding file ./debian/build/gnulib/tests/test-getopt-posix [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `getopt' [GLIBC_2.2.5]

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

As always when you know what to look for one can trace further.
config.h and config.log has no massive difference that would indicate the issue.
But before the linking already on the preprocessor step it is broken.
So we can ignore the linker step.

The good case has:
# 91 "../../../../gnulib/tests/../../gnulib/lib/getopt-core.h"
  rpl_getopt
[...]
getopt_loop (int argc, const char **argv,
[...]
while ((c = rpl_getopt (argc, (char **) argv, options)) != -1)

OTOH the bad case has:
# nothing on rpl_getopt
while ((c = getopt (argc, (char **) argv, options)) != -1)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

commit e3461d1c21a99bcef1b8826f710434e0ffb5adea
Author: Paul Eggert <email address hidden>
Date: Sun Jun 11 15:53:09 2017 -0700

    getopt-posix: port to glibc 2.25.90

    Problem reported by Daniel P. Berrange in:
    http://lists.gnu.org/archive/html/bug-gnulib/2017-06/msg00003.html
    * lib/getopt-pfx-core.h (_GETOPT_CORE_H):
    * lib/getopt-pfx-ext.h (_GETOPT_EXT_H):
    #undef if __GETOPT_PREFIX is defined

Sounds very much like it right?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Well that sounded too good, but it is already applied in what I merged for artful.
It seems there is something new for glibc 2.26 we have now.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Unfortunately gnulib as-from-git builds fine even from the hash it was in libvirt.
/sigh

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

./gnulib-tool --test getopt-* all three working

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I simplified the code to be somewhat readable.
One can on zesty remove the path to reach ../../gnulib/lib/getopt.h and co and get the bad behavior to get to the systems unistd.h and run with that.

On Artful it is that behavior no matter if the path is added or not.
Since the source is the same there is a ../../gnulib/lib/unistd.h on artful as well, but nor more working the same way.

It might be interesting though that in zesty - extern int getopt is first requested from:
# 150 "/usr/include/getopt.h" 3 4
But on artful it is from
#91 "/usr/include/x86_64-linux-gnu/bits/getopt_core.h" 3 4

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

On zesty:
.. /usr/include/unistd.h
... /usr/include/features.h
.... /usr/include/x86_64-linux-gnu/sys/cdefs.h
..... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.... /usr/include/x86_64-linux-gnu/gnu/stubs.h
..... /usr/include/x86_64-linux-gnu/gnu/stubs-64.h
... /usr/include/x86_64-linux-gnu/bits/posix_opt.h
... /usr/include/x86_64-linux-gnu/bits/environments.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/bits/types.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.... /usr/include/x86_64-linux-gnu/bits/typesizes.h
... /usr/lib/gcc/x86_64-linux-gnu/6/include/stddef.h
... /usr/include/x86_64-linux-gnu/bits/confname.h
... ../../gnulib/lib/getopt.h
.... /usr/include/getopt.h
.... ../../gnulib/lib/getopt-cdefs.h
.... ../../../../gnulib/lib/getopt-pfx-core.h
..... ../../../../gnulib/lib/getopt-core.h
.... ../../../../gnulib/lib/getopt-pfx-ext.h
..... ../../../../gnulib/lib/getopt-ext.h

Artful:
.. /usr/include/unistd.h
... /usr/include/features.h
.... /usr/include/x86_64-linux-gnu/sys/cdefs.h
..... /usr/include/x86_64-linux-gnu/bits/wordsize.h
..... /usr/include/x86_64-linux-gnu/bits/long-double.h
.... /usr/include/x86_64-linux-gnu/gnu/stubs.h
..... /usr/include/x86_64-linux-gnu/gnu/stubs-64.h
... /usr/include/x86_64-linux-gnu/bits/posix_opt.h
... /usr/include/x86_64-linux-gnu/bits/environments.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/bits/types.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.... /usr/include/x86_64-linux-gnu/bits/typesizes.h
... /usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h
... /usr/include/x86_64-linux-gnu/bits/confname.h
... /usr/include/x86_64-linux-gnu/bits/getopt_posix.h
.... /usr/include/x86_64-linux-gnu/bits/getopt_core.h

Due to https://sourceware.org/ml/libc-alpha/2017-04/msg00115.html in glibc 2.26.

TL;DR: the new glibc's unistd does pull in glibc's header for getopt before we include gnulib.

I need to:
1. find a workaround for libvirt in artful to work "as before" pulling in gnulib - maybe pre-including, but that is hard as it could be so many places.
2. get to a reproducer that is good to show the problem
   Why does gnulib work on old and new level in artful/zesty - write a rather simple repro to
   check. With that report to mailing lists.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So summarizing, while I fail to see the way to a fix I can now quite well summarize the issue and provide steps to reproduce.
I'll go to the devel mailing list with that:

Hi,
there seems to be an incompatibility to the last glibc due to [1].

Eventually this breaks gnulib unittests (and maybe more).
Debugging went from an assert, to bidngin different symbols, to changed function names to different header resolution.

Because it expects it to behave "posixly" but arguments are changed differently.
FAIL: test-getopt-posix
=======================

../../../../gnulib/tests/test-getopt.h:754: assertion 'strcmp (argv[1], "donald") == 0' failed

# get latest libvirt to get their style of local gnulib
# (sorry I couldn't get git://git.sv.gnu.org/gnulib.git to work showing the issue, yet I also couldn't identify a fix there that I could leverage)
$ wget http://libvirt.org/sources/libvirt-3.7.0.tar.xz
$ tar xf libvirt-3.7.0.tar.xz
$ cd libvirt
$ apt build-dep libvirt
$ ./autogen.sh
$ make -j4
$ gcc -Wall -I ./gnulib/lib -I . test.c -o test -H

I really wanted to come up with the same against gnulib itself, but I was lost in the build system for too long and hope one can translate it if you think it is needed.
Well until then you can run the following simplified test derived from the unit test (it is much shorter, so easier to debug).

$ cat << EOF >> test1.c
#include <config.h>
#include <unistd.h>

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

int main (void)
{
        return 0;
}
EOF
$ gcc -I ./gnulib/lib -I . test1.c -H

You can see in -H output already the difference in the paths of the headers that it uses:

Glibc <2.26
. ./config.h
.. ./config-post.h
. /usr/include/unistd.h
[...]
.. /usr/include/getopt.h

Glibc >=2.26
. ./config.h
.. ./config-post.h
. ./gnulib/lib/unistd.h
[...]
... ./gnulib/lib/getopt.h

If you build with -E you'll also see that it now uses getopt from glibc instead of the prefixed rpl_getopt from gnulib.

Sorry, but I don't see the right fix here - I could easily silence the test but that is no fix. Especially since this might have all sort of implications due to handling the args (slightly) differently. Therefore 'm reaching out to you for your help and experience on the build system what could be done.

[1]: https://sourceware.org/ml/libc-alpha/2017-04/msg00115.html

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Upstream discussion at [1] and a preliminary fix building and testing atm.

[1]: https://www.redhat.com/archives/libvir-list/2017-September/msg01039.html

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Worked on all architectures, since the upstream discussion is going on I'm giving it a day if there is a better fix, otherwise I'll submit the interim fix to fix it for the time being.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

We are closing in on the upstream discussion and this is now reproducible with glibc/gnulib along without libvirt. Holding back an ugly fix for now until we know the right way to do it.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Discussion ended in a suggested fix in [1].
Builds are fine on all arches now [2], given the changes I'm running a shorter set of regression checks to be sure (no test would be too few, but no need for ALL on all-arch given an this is an optarg change).

Will update here and upload after tests are (hopefully) good.

[1]: https://www.redhat.com/archives/libvir-list/2017-October/msg00331.html
[2]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2987/+packages

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Being a pure bugfix upload (and required for any latter service of libvirt in Artful and UCA-Pike) it is ok to upload this fix into Artful despite the current Freezes in place.

Regression checks were good, so uploading 3.6.0-1ubuntu5 now.
Seeing it in unapproved [1] so upload was ok.

[1]: https://launchpadlibrarian.net/340501612/libvirt_3.6.0-1ubuntu5_source.changes

Changed in libvirt (Ubuntu):
status: New → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 3.6.0-1ubuntu5

---------------
libvirt (3.6.0-1ubuntu5) artful; urgency=medium

  * d/p/u/gnulib-getopt-posix-Fix-build-failure-when-using-ac_cv_head.patch:
    fix FTBFS with glibc 2.26 (LP: #1718668)

 -- Christian Ehrhardt <email address hidden> Thu, 28 Sep 2017 08:18:10 -0400

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello ChristianEhrhardt, or anyone else affected,

Accepted libvirt into pike-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:pike-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-pike-needed to verification-pike-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-pike-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-pike-needed
Revision history for this message
Corey Bryant (corey.bryant) wrote :

OpenStack regression tests have passed successfully.

pike-proposed with next charms:

======
Totals
======
Ran: 102 tests in 1904.7099 sec.
 - Passed: 93
 - Skipped: 9
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 794.0835 sec.

pike-proposed with stable charms:

======
Totals
======
Ran: 102 tests in 1976.6895 sec.
 - Passed: 93
 - Skipped: 9
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 854.3158 sec.

tags: added: verification-pike-done
removed: verification-pike-needed
Revision history for this message
Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package libvirt - 3.6.0-1ubuntu5~cloud0
---------------

 libvirt (3.6.0-1ubuntu5~cloud0) xenial-pike; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 libvirt (3.6.0-1ubuntu5) artful; urgency=medium
 .
   * d/p/u/gnulib-getopt-posix-Fix-build-failure-when-using-ac_cv_head.patch:
     fix FTBFS with glibc 2.26 (LP: #1718668)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.