SRU: vmware-guestd crashing

Bug #306835 reported by Martin
Affects                 Status        Importance  Assigned to  Milestone
open-vm-tools (Ubuntu)  Fix Released  Medium      Unassigned
Intrepid                Invalid       Undecided   Unassigned

Bug Description

The upstream version of open-vm-tools in Intrepid FTBFS because it has format string warnings and is built with -Werror. The Ubuntu “fix”, ubuntu_toolchain_FTBFS.dpatch, introduced many regressions relative to upstream by adding incorrect "%s\n" format strings. These spurious newlines cause vmware-guestd to crash on startup.
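To illustrate the kind of change involved, here is a minimal sketch. It assumes the build failure was the usual "format not a string literal" class of warning. Both helpers are hypothetical, and plain `snprintf` stands in for the tools' own `Str_Sprintf`. The warning-safe rewrite passes `"%s"` and nothing else, so the output bytes stay identical to the input. Adding `\n` changes the output by one byte.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst without treating src as a
 * format string. Passing src directly as the format is what the
 * compiler warns about; "%s" is the behaviour-preserving fix.
 * Returns what snprintf returns: the length src needs, excluding the
 * terminating NUL.
 */
int copy_label(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    return snprintf(dst, size, "%s", src);
}

/* The variant with a trailing newline: one extra byte of output. */
int copy_label_with_newline(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    return snprintf(dst, size, "%s\n", src);
}
```

With a 32-byte buffer, `copy_label(buf, sizeof buf, "eth0")` leaves `"eth0"` in the buffer, while the newline variant leaves `"eth0\n"`. That is harmless for log output but not for values that are stored or compared.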

The format string warnings have been fixed (correctly) upstream, so this patch is not needed in Jaunty and has already been removed. The attached debdiff (comment 22) fixes ubuntu_toolchain_FTBFS.dpatch for Intrepid.

This debdiff also fixes bug #289921 “network interface does not come up after installing open-vm-tools” and bug #302226 “vmware-user doesn't autostart”, and descriptions of the corresponding patches are posted there.

The patched package has been in my PPA for four weeks, and confirmed working by three other users. Risk of regression is low, especially given that these three bugs essentially prevent the current Intrepid package from being useful in any way.

===

Binary package hint: open-vm-tools

I have some amd64 machines running on VMware ESX, and vmware-guestd terminates itself after a few seconds. The kernel modules load fine, and while vmware-guestd is running the Infrastructure Client shows that the VMware Tools are running.

I used the latest version from -proposed:
open-vm-tools 2008.08.08-109361-1ubuntu2.1
linux-image-2.6.27-9-server 2.6.27-9.19 x86_64

I also attached the output of strace. I think the last few lines are interesting:

read(13, "Inter-| Receive "..., 1024) = 453
ioctl(12, SIOCGIFFLAGS, {ifr_name="lo", ifr_flags=IFF_UP|IFF_LOOPBACK|IFF_RUNNING}) = 0
ioctl(12, SIOCGIFMTU, {ifr_name="lo", ifr_mtu=16436}) = 0
ioctl(12, SIOCGIFADDR, {ifr_name="lo", ifr_addr={AF_INET, inet_addr("127.0.0.1")}}) = 0
ioctl(12, SIOCGIFNETMASK, {ifr_name="lo", ifr_netmask={AF_INET, inet_addr("255.0.0.0")}}) = 0
ioctl(12, SIOCGIFFLAGS, {ifr_name="eth0", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0
ioctl(12, SIOCGIFMTU, {ifr_name="eth0", ifr_mtu=1500}) = 0
ioctl(12, SIOCGIFADDR, {ifr_name="eth0", ifr_addr={AF_INET, inet_addr("131.234.137.92")}}) = 0
ioctl(12, SIOCGIFNETMASK, {ifr_name="eth0", ifr_netmask={AF_INET, inet_addr("255.255.255.128")}}) = 0
ioctl(12, SIOCGIFHWADDR, {ifr_name="eth0", ifr_hwaddr=00:50:56:aa:19:73}) = 0
write(2, "str.c:100 Buffer too small 0x7f7"..., 34) = 34
write(2, "str.c:100 Buffer too small 0x7f7"..., 34) = 34
write(2, "Backtrace:\n", 11) = 11
futex(0x7f77aa2a6190, 0x81 /* FUTEX_??? */, 2147483647) = 0
write(2, "Backtrace[0] 00007fffb4853200 ri"..., 177) = 177
write(2, "Backtrace[1] 00007fffb4853220 ri"..., 177) = 177
write(2, "Backtrace[2] 00007fffb4853640 ri"..., 177) = 177
write(2, "Backtrace[3] 00007fffb4853720 ri"..., 177) = 177
write(2, "Backtrace[4] 00007fffb4853810 ri"..., 177) = 177
write(2, "Backtrace[5] 00007fffb48538a0 ri"..., 177) = 177
write(2, "Backtrace[6] 00007fffb4857900 ri"..., 177) = 177
write(2, "Backtrace[7] 00007fffb4857920 ri"..., 177) = 177
write(2, "Backtrace[8] 00007fffb4857bc0 ri"..., 177) = 177
write(2, "Backtrace[9] 00007fffb4857be0 ri"..., 177) = 177
write(2, "Backtrace[10] 00007fffb4857fe0 r"..., 178) = 178
write(2, "Backtrace[11] 00007fffb4858050 r"..., 178) = 178
write(2, "Backtrace[12] 00007fffb4858100 r"..., 178) = 178
write(2, "str.c:100 Buffer too small 0x7f7"..., 34) = 34
exit_group(-1) = ?
Process 31194 detached

Martin (agima) wrote :
description: updated
Adar Dembo (adar-deactivatedaccount) wrote :

Looks like a crash in retrieving information from libdnet, used by the guestInfo subsystem in guestd.

Could you reproduce the bug and upload the dumped core as well as the guestd executable containing symbols? You may need to play with 'ulimit' to allow a core dump to be generated.

Martin (agima) wrote :

It seems that it doesn't produce a core file:

# ulimit -c unlimited
# ulimit -c
unlimited

# vmware-guestd
str.c:100 Buffer too small 0x7fe8
str.c:100 Buffer too small 0x7fe8
Backtrace:
Backtrace[0] 00007fff3d9d72c0 rip=000000000042c1aa rbx=0000000000000000 rbp=000000000042c380 r12=00007fff3d9dba60 r13=00007fff3d9d7970 r14=00007fff3d9d9970 r15=0000000000416600
Backtrace[1] 00007fff3d9d72e0 rip=000000000044074b rbx=00007fff3d9d7910 rbp=00007fff3d9d7970 r12=00007fff3d9dba60 r13=00007fff3d9d7970 r14=00007fff3d9d9970 r15=0000000000416600
Backtrace[2] 00007fff3d9d7700 rip=000000000044057b rbx=00007fff3d9d7910 rbp=00007fff3d9d7970 r12=00007fff3d9dba60 r13=00007fff3d9d7970 r14=00007fff3d9d9970 r15=0000000000416600
Backtrace[3] 00007fff3d9d77e0 rip=0000000000423f84 rbx=00007fff3d9d7910 rbp=00007fff3d9d7970 r12=00007fff3d9dba60 r13=00007fff3d9d7970 r14=00007fff3d9d9970 r15=0000000000416600
Backtrace[4] 00007fff3d9d78d0 rip=0000000000416697 rbx=00007fff3d9d7910 rbp=00007fff3d9d7970 r12=00007fff3d9dba60 r13=00007fff3d9d7970 r14=00007fff3d9d9970 r15=0000000000416600
Backtrace[5] 00007fff3d9d7960 rip=00007fe834d3a16a rbx=00007fff3d9d9972 rbp=0000000001e8f310 r12=0000000001e8dfb0 r13=00007fff3d9d7970 r14=00007fff3d9d9970 r15=0000000000416600
Backtrace[6] 00007fff3d9db9c0 rip=000000000041656d rbx=00007fff3d9dba60 rbp=0000000001e8f310 r12=0000000001e8cb50 r13=00007fff3d9dba60 r14=0000000000000001 r15=0000000000000001
Backtrace[7] 00007fff3d9db9e0 rip=0000000000415460 rbx=00007fff3d9dba70 rbp=00007fff3d9dbb70 r12=0000000001e8cb50 r13=00007fff3d9dba60 r14=0000000000000001 r15=0000000000000001
Backtrace[8] 00007fff3d9dbc80 rip=000000000040bb3d rbx=0000000001e882f0 rbp=00007fff3d9dc068 r12=0000000001e8cb50 r13=00007fff3d9dbcd0 r14=0000000000000001 r15=0000000000000001
Backtrace[9] 00007fff3d9dbca0 rip=00000000004095b9 rbx=00007fff3d9dc068 rbp=0000000001e88700 r12=0000000001e8ccc0 r13=00007fff3d9dbcd0 r14=0000000000000001 r15=0000000000000001
Backtrace[10] 00007fff3d9dc0a0 rip=0000000000409a2a rbx=0000000000000000 rbp=0000000000000000 r12=0000000000405b40 r13=00007fff3d9dc1d8 r14=0000000000000000 r15=0000000000000001
Backtrace[11] 00007fff3d9dc110 rip=00007fe832ebd466 rbx=0000000000440c80 rbp=0000000000000000 r12=0000000000405b40 r13=00007fff3d9dc1d0 r14=0000000000000000 r15=0000000000000000
Backtrace[12] 00007fff3d9dc1c0 rip=0000000000405b69 rbx=0000000000000000 rbp=0000000000000000 r12=0000000000405b40 r13=00007fff3d9dc1d0 r14=0000000000000000 r15=0000000000000000
str.c:100 Buffer too small 0x7fe8

There is no core file in my cwd.
I attached the guestd executable, anyway.

Dmitry Torokhov (dtor) wrote :

Does it help if you disable IPv6 in the guest?

Martin (agima) wrote :

Unfortunately it doesn't help. I changed the line "alias net-pf-10 ipv6" to "alias net-pf-10 off" in /etc/modprobe.d/aliases and made sure that the ipv6 kernel module was not loaded after reboot. vmware-guestd still crashes after a few seconds.

Travis Hegner (thegner) wrote :

I can confirm this bug.

Host: VMWare ESXi 3.5
Guest: Ubuntu Server 32bit

After getting the networking issue from bug #289921 fixed, I am now noticing that the vmware-guestd process crashes after roughly one to two minutes of apparently normal running.

This will continue to happen whether rebooting, running /etc/init.d/open-vm-tools start, or vmware-guestd directly in the foreground or background. My strace also showed the same "Buffer too small" error in str.c.

I can attach an strace if it will help, but it's pretty similar to what Martin attached already.

Thanks,
Travis

Adar Dembo (adar-deactivatedaccount) wrote :

Core dumps aren't going to work because I think our core dumping code is busted (see the definition of Panic_Panic if you're curious).

Travis or Martin: could one of you reproduce this crash within gdb and use symbols to get a more meaningful backtrace? I'm not sure if Ubuntu's vmware-guestd binary is stripped; if it is, you may need to rebuild vmware-guestd to get a build with embedded symbols. Assuming you have an unstripped vmware-guestd binary, it should just be a matter of 1) running 'gdb vmware-guestd', 2) setting up a breakpoint somewhere in Panic(), 3) 'run', 4) when gdb hits the breakpoint, 'bt' to get the full backtrace. It would also be nice to navigate up the stack to the caller of the Str_ function that panicked and see exactly which line that happened on, and why.

Dmitry Torokhov (dtor) wrote :

The binary that Martin posted is stripped, you'll have to recompile the package.

Dmitry Torokhov (dtor) wrote :

Oh, there is open-vm-tools-dbg package that has copies of all executables and puts them into /usr/lib/debug/usr/bin/. I suppose these are not stripped.

Travis Hegner (thegner) wrote :

I've been trying to follow Adar's instructions, but the vmware-guestd binary in the open-vm-tools-dbg package has some other error that prevents it from running at all.

Perhaps I'm doing something wrong though as I'm not familiar with using gdb.

thegner@tmh-dev1:/usr/lib/debug/usr/sbin$ sudo ./vmware-guestd --background /var/run/vmware-guestd.pid
./vmware-guestd: 1: Syntax error: "(" unexpected
thegner@tmh-dev1:/usr/lib/debug/usr/sbin$

And attempting to run this binary under gdb (in case the symbols are the reason I can't run it directly):

thegner@tmh-dev1:/usr/lib/debug/usr/sbin$ sudo gdb ./vmware-guestd
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) b Panic
Breakpoint 1 at 0x8087c60: file panic.c, line 53.
(gdb) enable
(gdb) run --background /var/run/vmware-guestd.pid
Starting program: /usr/lib/debug/usr/sbin/vmware-guestd --background /var/run/vmware-guestd.pid
/bin/bash: /usr/lib/debug/usr/sbin/vmware-guestd: cannot execute binary file
/bin/bash: /usr/lib/debug/usr/sbin/vmware-guestd: Success

Program exited with code 01.
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
You can't do that without a process to debug.
(gdb)

Is there something that I am doing wrong, or is the debug binary not compiled properly?

Thanks,

Travis

Martin (agima) wrote :

I rebuilt the open-vm-tools package with debug symbols and -O0 and got this far:

GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) list Panic
48 */
49
50 void
51 Panic(const char *fmt, // IN: message format
52 ...) // IN: message format arguments
53 {
54 va_list ap;
55
56 va_start(ap, fmt);
57 Panic_Panic(fmt, ap);
(gdb) break 54
Breakpoint 1 at 0x450465: file panic.c, line 54.
(gdb) run
Starting program: /usr/sbin/vmware-guestd
[Thread debugging using libthread_db enabled]
warning: Lowest section in /usr/lib/libicudata.so.38 is .hash at 0000000000000158
[New Thread 0x7fd70c73b6f0 (LWP 14596)]
[Switching to Thread 0x7fd70c73b6f0 (LWP 14596)]

Breakpoint 1, Panic (fmt=0x459385 "%s:%d Buffer too small 0x%x\n") at panic.c:56
56 va_start(ap, fmt);
(gdb) bt
#0 Panic (fmt=0x459385 "%s:%d Buffer too small 0x%x\n") at panic.c:56
#1 0x000000000042aedd in Str_Sprintf (buf=0x7fff147446f0 "00:50:56:aa:4a:55", maxSize=18, fmt=0x457351 "%s\n") at str.c:100
#2 0x0000000000418b0d in ReadInterfaceDetails (entry=0x7fff14744730, arg=0x7fff14748850) at guestInfoPosix.c:202
#3 0x00007fd70baa916a in intf_loop () from /usr/lib/libdumbnet.so.1
#4 0x0000000000418d4f in GuestInfoGetNicInfo (nicInfo=0x7fff14748850) at guestInfoPosix.c:281
#5 0x000000000041722e in GuestInfoGather (clientData=0x0) at guestInfoServer.c:285
#6 0x000000000040d1c2 in EventManager_ProcessNext (eventQueue=0xfc82f0, sleepUsecs=0x7fff14748ae0) at eventManager.c:283
#7 0x000000000040a5d4 in GuestdDaemon (pConfDict=0x7fff14748f08, gDaemonSignalPtr=0x6a63d8) at main.c:1100
#8 0x000000000040a6ed in GuestdDaemonWrapper (pConfDict=0x7fff14748f08) at main.c:1194
#9 0x000000000040ac18 in main (argc=1, argv=0x7fff14749008) at main.c:1504
(gdb) up
#1 0x000000000042aedd in Str_Sprintf (buf=0x7fff06663610 "00:50:56:aa:4a:55", maxSize=18, fmt=0x457351 "%s\n")
    at str.c:100
100 Panic("%s:%d Buffer too small 0x%x\n", __FILE__, __LINE__, stack[-1]);
(gdb) list
95
96 va_start(args,fmt);
97 i = Str_Vsnprintf(buf, maxSize, fmt, args);
98 va_end(args);
99 if (i < 0) {
100 Panic("%s:%d Buffer too small 0x%x\n", __FILE__, __LINE__, stack[-1]);
101 }
102 return i;
103 }
104
(gdb) print i
$1 = -1

I hope this helps.

Dmitry Torokhov (dtor) wrote :

Martin,

Thank you very much for the backtrace. The issue is caused by an Ubuntu-specific patch:

+--- open-vm-tools-2008.08.08-109361~/lib/guestInfo/guestInfoPosix.c 2008-08-08 07:01:52.000000000 +0000
++++ open-vm-tools-2008.08.08-109361/lib/guestInfo/guestInfoPosix.c 2008-08-15 20:17:39.000000000 +0000
+@@ -199,7 +199,7 @@
+ char macAddress[NICINFO_MAC_LEN];
+ char ipAddress[NICINFO_MAX_IP_LEN];
+
+- Str_Sprintf(macAddress, sizeof macAddress,
++ Str_Sprintf(macAddress, sizeof macAddress, "%s\n",
+ addr_ntoa(&entry->intf_link_addr));
+ nic = GuestInfoAddNicEntry(nicInfo, macAddress);

You need to either change "%s\n" to "%s" or remove it altogether.
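The arithmetic behind the crash can be sketched in plain C. The MAC string "00:50:56:aa:4a:55" is 17 characters, so it fits a buffer of 18 bytes (the maxSize shown in the backtrace) only when nothing is appended. `checked_sprintf` below is a hypothetical stand-in for `Str_Sprintf`, built on standard `vsnprintf`. It returns -1 where the real function would call Panic().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define MAC_BUF_LEN 18  /* maxSize=18 in the backtrace: 17 chars + NUL */

/*
 * Hypothetical stand-in for Str_Sprintf: format into buf and return the
 * number of characters written, or -1 if the result did not fit.
 */
int checked_sprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    assert(buf != NULL && fmt != NULL && size > 0);
    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args);
    va_end(args);

    /* vsnprintf reports the untruncated length; >= size means truncation. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

With `char mac[MAC_BUF_LEN]`, formatting the address with `"%s"` returns 17. Formatting it with `"%s\n"` needs 18 characters plus the NUL, so it returns -1, which corresponds to the `i < 0` branch in str.c.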

Alex (alexander-stehlik) wrote :

I can confirm this bug for my configuration:

Ubuntu 8.10 Server
Kernel: 2.6.27-9-server
open-vm-tools: 2008.08.08-109361-1ubuntu2.1

Ian Justman (ianj) wrote :

I picked up the source to follow up on Dmitry's suggestion.

First, I attempted to remove that particular patch. The build fails because, without the format string, the compiler generates a warning, and it is being run so that warnings become errors. The format string is therefore necessary, so I used his other recommendation: pulling out the newline, leaving just a bare %s. I built it and it seems to work fine. The problem seems to manifest itself when guestd presents the VM's IP information to the host to give to the management interface. I never got this information with the bugged version of guestd that still had the newline (it crashes before the info shows up in my browser, probably while it is in the act of sending it), whereas with the fixed build I received this information shortly after I fired up guestd.

--Ian.

Travis Hegner (thegner) wrote :

So what does it take to get this fix rolled up into the Ubuntu repos?

Anders Kaseorg (andersk) wrote :

ubuntu_toolchain_FTBFS.dpatch includes several incorrect changes besides the one pointed out by Dmitry. Here is a debdiff against Intrepid’s open-vm-tools 2008.08.08-109361-1ubuntu2.1 that fixes these problems. This also includes a fix for LP bug #289921.

The fixed package for Intrepid is available in my PPA:
<https://launchpad.net/~anders-kaseorg/+archive/ppa>
Please test.

Jaunty does not include the incorrect patch and should not be affected.

William Cattey (wdc-mit) wrote :

I have tested Anders' package. It works for me and gets guestd working.
It would be great to see this in Intrepid and Jaunty.
We're VERY close to a fully working set of open-vm-tools for modern Ubuntu.

William Cattey (wdc-mit) wrote :

Oops: Clarification: I have tested Anders' package under Intrepid.
I have confidence it will work under Jaunty, but have not tested it there.

Tom Duijf (tom-duijf) wrote :

Ubuntu 8.10 (intrepid) amd64.

Upgraded open-vm-tools with Anders' package (open-vm-tools_2008.08.08-109361-1ubuntu2.2~andersk5_amd64.deb) but did not upgrade open-vm-source (still running the stock Ubuntu open-vm-source package).

Seems to be working nicely

Anders Kaseorg (andersk) wrote :

Great, thanks for the feedback.
Note that my most recent debdiff (~andersk5 in my PPA) also fixes bug 302226 and is posted there.

Changed in open-vm-tools:
status: New → Confirmed
Ian Justman (ianj) wrote :

Tested working per my post for bug 289921.

--Ian.

Andreas Moog (ampelbein)
Changed in open-vm-tools:
importance: Undecided → Medium
Anders Kaseorg (andersk) wrote :

Here is the current debdiff. (This is the same debdiff that has been posted in bug 289921 and bug 302226, and has been in my PPA since 2009-02-20.)

Anders Kaseorg (andersk)
description: updated
Evan Broder (broder)
Changed in open-vm-tools (Ubuntu):
status: Confirmed → Fix Released
Changed in open-vm-tools (Ubuntu Intrepid):
status: New → Confirmed
Alex Valavanis (valavanisalex) wrote :

Intrepid Ibex reached end-of-life on 30 April 2010 so I am closing the
report. The bug has been fixed in newer releases of Ubuntu.

Changed in open-vm-tools (Ubuntu Intrepid):
status: Confirmed → Invalid