Segfault on rabbitmq-server start

Bug #1634989 reported by bugproxy on 2016-10-19
This bug affects 3 people
Affects                    Status         Importance   Assigned to   Milestone
rabbitmq-server (Ubuntu)   Fix Released   Medium       Jon Grimm
Xenial                     Fix Released   Medium       Unassigned
Yakkety                    Fix Released   Medium       Unassigned

Bug Description

[Impact]

 * rabbitmq-server can segfault along a code path that happens to "open a port with the same fd multiple times". Doing so is undefined (and unsafe) in erlang, though segfaulting is unintentional.

 * This only happens on specific versions of erlang, but the rabbitmq-server code is admittedly incorrect per erlang and has been fixed upstream.

* This only affects xenial & yakkety.

* The code path belongs to an internal helper for writing to stderr, so the segfault prevents useful diagnostic information from reaching the user.

[Test Case]

 * Make sure your hostname resolves to something unreachable (I've selected 192.168.122.2), install rabbitmq-server, and witness the segfault.

 * # hostname blah
 * # echo "192.168.122.2 blah" >> /etc/hosts

 * # ping blah
PING blah (192.168.122.2) 56(84) bytes of data.
From x1 (192.168.122.90) icmp_seq=1 Destination Host Unreachable

 * # apt install rabbitmq-server
  ...
Mar 22 22:12:41 blah systemd[1]: Starting RabbitMQ Messaging Server...
Mar 22 22:12:42 blah rabbitmq[17995]: Waiting for rabbit@blah ...
Mar 22 22:12:42 blah rabbitmq[17995]: pid is 18025 ...
Mar 22 22:12:45 blah systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=1/FAILURE
Mar 22 22:12:46 blah rabbitmq[17995]: Segmentation fault (core dumped)

  ...

 * Expected behavior is not to segfault, and instead to print a diagnostic message to stderr:

 * # dpkg -i rabbitmq-server_3.5.7-1ubuntu16.04.1_all.deb
  ...

Mar 22 22:15:16 blah systemd[1]: Starting RabbitMQ Messaging Server...
Mar 22 22:15:16 blah rabbitmq[18365]: Waiting for rabbit@blah ...
Mar 22 22:15:16 blah rabbitmq[18365]: pid is 18386 ...
Mar 22 22:15:19 blah systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=1/FAILURE
Mar 22 22:15:20 blah rabbitmq[18365]: Error: process_not_running

  ...

 * Note: This is just one error path that happens to hit the format_stderr() helper function.
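
The two outcomes in this test case differ only in the last rabbitmq line of the journal: "Segmentation fault" before the fix, an "Error: ..." diagnostic after it. A minimal sketch of a check for that, assuming a POSIX shell and grep; the function name journal_verdict is made up for illustration and is not part of the package or the SRU:

```shell
# Hypothetical helper: read rabbitmq-server journal lines on stdin and report
# whether the wait helper segfaulted or managed to print a diagnostic.
journal_verdict() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'Segmentation fault'; then
        # The format_stderr() path crashed before any message was written.
        echo "segfault"
    elif printf '%s\n' "$log" | grep -q 'rabbitmq\[[0-9]*]: Error: '; then
        # The helper survived and reported the real problem.
        echo "diagnostic"
    else
        echo "inconclusive"
    fi
}
```

One way to use it would be "journalctl -u rabbitmq-server.service --no-pager | journal_verdict". Note that the journal may still contain lines from earlier runs, so a "segfault" verdict can come from an attempt made before the fixed package was installed.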

[Regression Potential]

 * Limited to the diagnostic message path, so it's really only seen when something is configured incorrectly. That being said, any execution through this path today will segfault without any diagnostic information to figure out what went wrong, so the fix seems infinitely better.

 * This fix from upstream has been in place for over a year without any issue. The removed code was originally a workaround for a buggy/flaky erlang library that has (according to upstream reports) been fixed since erlang 17, and is thus unneeded.

[Other Info]

 * While the rabbitmq-server in trusty has this offending code, the version of erlang does not segfault. Additionally, the fix provided by upstream is not necessarily sufficient on erlang < 17 that is in trusty, so I have not fixed it there.

* Zesty is already fixed.

---Problem Description---
Starting rabbitmq-server triggers segfault.
The segfault happens, for instance, when the host is not reachable, which breaks the installation of the rabbitmq-server package.
It is understandable that an error occurs, but a segfault should not be the default behaviour.
This has been tested on 16.04 and 16.10, archs ppc64el and x86_64

---uname output---
Linux vm1 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:14:41 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

---Steps to Reproduce---
 # More reliably reproducible on a machine with 1 CPU

root@yakkety:~# echo "192.168.1.1 blah" >> /etc/hosts
root@yakkety:~# hostname blah
root@yakkety:~# apt-get install rabbitmq-server
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  rabbitmq-server
0 upgraded, 1 newly installed, 0 to remove and 2 not upgraded.
Need to get 0 B/4,251 kB of archives.
After this operation, 5,243 kB of additional disk space will be used.
Selecting previously unselected package rabbitmq-server.
(Reading database ... 63962 files and directories currently installed.)
Preparing to unpack .../rabbitmq-server_3.5.7-1_all.deb ...
Unpacking rabbitmq-server (3.5.7-1) ...
Processing triggers for ureadahead (0.100.0-19) ...
Setting up rabbitmq-server (3.5.7-1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/rabbitmq-server.service → /lib/systemd/system/rabbitmq-server.service.
Job for rabbitmq-server.service failed because the control process exited with error code.
See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
invoke-rc.d: initscript rabbitmq-server, action "start" failed.
● rabbitmq-server.service - RabbitMQ Messaging Server
   Loaded: loaded (/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2016-10-19 11:13:46 EDT; 7ms ago
  Process: 2818 ExecStartPost=/usr/lib/rabbitmq/bin/rabbitmq-server-wait (code=exited, status=139)
  Process: 2817 ExecStart=/usr/sbin/rabbitmq-server (code=exited, status=1/FAILURE)
 Main PID: 2817 (code=exited, status=1/FAILURE)

Oct 19 11:13:13 blah systemd[1]: Starting RabbitMQ Messaging Server...
Oct 19 11:13:13 blah rabbitmq[2818]: Waiting for rabbit@blah ...
Oct 19 11:13:13 blah rabbitmq[2818]: pid is 2826 ...
Oct 19 11:13:43 blah systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=1/FAILURE
Oct 19 11:13:46 blah rabbitmq[2818]: Segmentation fault
Oct 19 11:13:46 blah systemd[1]: rabbitmq-server.service: Control process exited, code=exited status=139
Oct 19 11:13:46 blah systemd[1]: Failed to start RabbitMQ Messaging Server.
Oct 19 11:13:46 blah systemd[1]: rabbitmq-server.service: Unit entered failed state.
Oct 19 11:13:46 blah systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.
dpkg: error processing package rabbitmq-server (--configure):
 subprocess installed post-installation script returned error exit status 1
Processing triggers for systemd (231-9git1) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for ureadahead (0.100.0-19) ...
Errors were encountered while processing:
 rabbitmq-server
E: Sub-process /usr/bin/dpkg returned an error code (1)

root@yakkety:~# dmesg -T
[Wed Oct 19 11:11:55 2016] async_10[2334]: unhandled signal 11 at 0000000000000000 nip 00000000206867bc lr 0000000020635648 code 30001
[Wed Oct 19 11:13:02 2016] random: crng init done
[Wed Oct 19 11:13:02 2016] systemd[1]: apt-daily.timer: Adding 3h 37min 32.381328s random time.
[Wed Oct 19 11:13:02 2016] systemd[1]: apt-daily.timer: Adding 11h 5min 8.314218s random time.
[Wed Oct 19 11:13:02 2016] systemd[1]: apt-daily.timer: Adding 11h 7min 37.045127s random time.
[Wed Oct 19 11:13:03 2016] systemd[1]: apt-daily.timer: Adding 8h 43min 50.771575s random time.
[Wed Oct 19 11:13:03 2016] systemd[1]: apt-daily.timer: Adding 2h 31min 33.179443s random time.
[Wed Oct 19 11:13:04 2016] systemd[1]: apt-daily.timer: Adding 4h 22min 42.585438s random time.
[Wed Oct 19 11:13:04 2016] systemd[1]: apt-daily.timer: Adding 36min 58.644429s random time.
[Wed Oct 19 11:13:04 2016] systemd[1]: apt-daily.timer: Adding 9h 16min 4.769857s random time.
[Wed Oct 19 11:13:12 2016] systemd[1]: apt-daily.timer: Adding 7h 48min 614.372ms random time.
[Wed Oct 19 11:13:12 2016] systemd[1]: apt-daily.timer: Adding 3h 13min 41.779132s random time.
[Wed Oct 19 11:13:12 2016] systemd[1]: apt-daily.timer: Adding 9h 39min 46.023823s random time.
[Wed Oct 19 11:13:45 2016] async_10[2912]: unhandled signal 11 at 0000000000000000 nip 000000004f0d67bc lr 000000004f085648 code 30001
[Wed Oct 19 11:13:45 2016] systemd[1]: apt-daily.timer: Adding 9h 5min 5.067674s random time.

Userspace tool common name: rabbitmq-server

The userspace tool has the following bit modes: 64

Userspace package: rabbitmq-server

I have just tested the patch in https://github.com/rabbitmq/rabbitmq-common/pull/54, which is present on v3.6.1 and prevents the segfault. The patch works and can be easily backported.
Thanks

bugproxy (bugproxy) on 2016-10-19
tags: added: architecture-ppc64le bugnameltc-147708 severity-medium targetmilestone-inin1610
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → rabbitmq-server (Ubuntu)
Nish Aravamudan (nacc) wrote :

Hello and thank you for filing this bug; given the testing of the upstream fix, this seems like an easy change to backport to xenial. I will add this to the server team backlog. Note that X, Y and Z will need the fix, as Z only has 3.5.7-1 currently.

tags: added: bitesize server-next
Changed in rabbitmq-server (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Manoj Iyer (manjo) wrote :

Since the server team already has this on their TODO list, I am removing Taco Screen Team and assigning it to the canonical-server team.

Changed in rabbitmq-server (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Server Team (canonical-server)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in rabbitmq-server (Ubuntu Xenial):
status: New → Confirmed
Changed in rabbitmq-server (Ubuntu Yakkety):
status: New → Confirmed
Jon Grimm (jgrimm) on 2017-03-20
Changed in rabbitmq-server (Ubuntu):
assignee: Canonical Server Team (canonical-server) → Jon Grimm (jgrimm)
Jon Grimm (jgrimm) wrote :

Zesty now at rabbitmq-server, verified that the referenced patch is contained in the source already and the testcase provided passes (no segfault observed), so marking the development task as released already.

I've additionally built a xenial deb with this fix and observed that the segfault vanishes with this patch.

Jon Grimm (jgrimm) on 2017-03-20
Changed in rabbitmq-server (Ubuntu):
status: Triaged → Fix Released
Jon Grimm (jgrimm) wrote :

xenial debdiff. upstream cherrypick.

Jon Grimm (jgrimm) wrote :

yakkety debdiff. upstream cherrypick.

Changed in rabbitmq-server (Ubuntu Xenial):
status: Confirmed → In Progress
Changed in rabbitmq-server (Ubuntu Yakkety):
status: Confirmed → In Progress
Jon Grimm (jgrimm) wrote :

Did a bit more reading into the root cause; it seems to be undefined behavior in erlang when one opens a port multiple times with the same fd. This is the path that rabbitmq-server currently triggers, and in _some_ versions of erlang this segfaults.

A standalone example of the faulty code in rabbitmq-server is:

erl -noshell -eval 'Port = open_port({fd, 0, 2}, [out]), Port2 = open_port({fd, 2, 2}, [out]), port_command(Port, "a"), port_close(Port), erlang:halt(10)'

Standalone test:
Trusty: OK, Xenial: segfault, Yakkety: segfault, Zesty: OK

rabbitmq-server has offending code:
Trusty: yes, Xenial: yes, Yakkety: yes, Zesty: no

Note: The rabbitmq-server offending code path is essentially anything that uses the format_stderr(Fmt, Args) helper function. The testcase provided in #1 is just a single specific instance that could trigger the segfault. IOW, the bug is somewhat broader than that testcase and description suggest, and thus more worthwhile to SRU a fix into Xenial/Yakkety.

As yakkety and xenial contain both a rabbitmq-server with the offending code && an erlang that will segfault with it, we should SRU there.

While trusty contains the offending code path, 1) it cannot be triggered with the erlang version in trusty (1.16.x) and 2) the proposed upstream commit for the fix claims to be safe only given changes made in erlang 17 or later, so this fix is not certain to avoid other issues on trusty. IOW, best to leave trusty alone.
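
For anyone repeating the standalone test on another release, the exit status is enough to tell the two outcomes apart: the one-liner ends with erlang:halt(10), while the crashing processes in this report exit with status 139 (128 + 11, SIGSEGV). A minimal sketch of a wrapper, assuming a POSIX shell that reports death by signal as 128 plus the signal number; the names classify_status and run_standalone_test are made up for illustration and are not part of the fix or the packaging:

```shell
# Hypothetical helper: map an exit status of the standalone test to a verdict.
#   10  -> the one-liner reached erlang:halt(10), i.e. no crash
#   139 -> 128 + 11 (SIGSEGV), the failure seen in this report
classify_status() {
    case "$1" in
        10)  echo "OK" ;;
        139) echo "segfault" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Hypothetical wrapper: run the one-liner quoted in this report, unchanged,
# and classify its exit status. Requires erl to be installed.
run_standalone_test() {
    erl -noshell -eval 'Port = open_port({fd, 0, 2}, [out]), Port2 = open_port({fd, 2, 2}, [out]), port_command(Port, "a"), port_close(Port), erlang:halt(10)'
    classify_status "$?"
}
```

Calling run_standalone_test on a machine with erlang installed should then print one verdict per run. Any status other than 10 or 139 is reported as-is rather than guessed at.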

Jon Grimm (jgrimm) on 2017-03-22
description: updated
Jon Grimm (jgrimm) wrote :

Added SRU template, seeking sponsorship.

Changed in rabbitmq-server (Ubuntu Xenial):
importance: Undecided → Medium
Changed in rabbitmq-server (Ubuntu Yakkety):
importance: Undecided → Medium
Jon Grimm (jgrimm) wrote :

xenial debdiff version number tweaked per review by nacc.

Jon Grimm (jgrimm) wrote :

yakkety debdiff with version number tweak per nacc review

Default Comment by Bridge

Jon Grimm (jgrimm) wrote :

xenial debdiff. tweaked changelog.

Jon Grimm (jgrimm) wrote :

yakkety debdiff

Hello bugproxy, or anyone else affected,

Accepted rabbitmq-server into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/rabbitmq-server/3.5.7-1ubuntu0.16.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in rabbitmq-server (Ubuntu Yakkety):
status: In Progress → Fix Committed
tags: added: verification-needed
Changed in rabbitmq-server (Ubuntu Xenial):
status: In Progress → Fix Committed
Brian Murray (brian-murray) wrote :

Hello bugproxy, or anyone else affected,

Accepted rabbitmq-server into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/rabbitmq-server/3.5.7-1ubuntu0.16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Jon Grimm (jgrimm) wrote :

Verified Yakkety fixed with testcase from bug description with:

Setting up rabbitmq-server (3.5.7-1ubuntu0.16.10.1)

tags: added: verification-done-yakkety verification-needed-xenial
removed: verification-needed
Jon Grimm (jgrimm) wrote :

Verified Xenial fixed using testcase from bug description with:

Setting up rabbitmq-server (3.5.7-1ubuntu0.16.04.1) ...

no segfault \o/

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rabbitmq-server - 3.5.7-1ubuntu0.16.04.1

---------------
rabbitmq-server (3.5.7-1ubuntu0.16.04.1) xenial; urgency=medium

  * debian/patches/0001-Remove-custom-stderr-formatting.patch: [PATCH]
    Remove custom stderr formatting. Thanks to Alexey Lebedeff
    <email address hidden>. Closes LP: #1634989.

 -- Jon Grimm <email address hidden> Tue, 28 Mar 2017 15:59:39 -0700

Changed in rabbitmq-server (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for rabbitmq-server has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rabbitmq-server - 3.5.7-1ubuntu0.16.10.1

---------------
rabbitmq-server (3.5.7-1ubuntu0.16.10.1) yakkety; urgency=medium

  * debian/patches/0001-Remove-custom-stderr-formatting.patch: [PATCH]
    Remove custom stderr formatting. Thanks to Alexey Lebedeff
    <email address hidden>. Closes LP: #1634989.

 -- Jon Grimm <email address hidden> Tue, 28 Mar 2017 16:06:38 -0700

Changed in rabbitmq-server (Ubuntu Yakkety):
status: Fix Committed → Fix Released