linuxbridge agent crash after R ->S upgrade

Bug #1844822 reported by Rick Cano
This bug affects 2 people
Affects: oslo.privsep
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

After upgrading neutron from Rocky to Stein (openstack-ansible deployment on Ubuntu 16), I ran into an issue where the linuxbridge agent would crash on startup:

root@bctlpicrouter01:/var/log/neutron# /openstack/venvs/neutron-19.0.4.dev1/bin/neutron-linuxbridge-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini

Exception in thread privsep_reader:

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/openstack/venvs/neutron-19.0.4.dev1/lib/python2.7/site-packages/oslo_privsep/comm.py", line 130, in _reader_main
    for msg in reader:
  File "/openstack/venvs/neutron-19.0.4.dev1/lib/python2.7/site-packages/six.py", line 564, in next
    return type(self).__next__(self)
  File "/openstack/venvs/neutron-19.0.4.dev1/lib/python2.7/site-packages/oslo_privsep/comm.py", line 77, in __next__
    return next(self.unpacker)
  File "msgpack/_unpacker.pyx", line 562, in msgpack._cmsgpack.Unpacker.__next__
  File "msgpack/_unpacker.pyx", line 493, in msgpack._cmsgpack.Unpacker._unpack
ValueError: 1870054 exceeds max_bin_len(1048576)

I was able to get around this problem by downgrading msgpack from 0.6.1 to 0.5.6
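The ValueError in the traceback comes from a length check inside the unpacker. The following is a minimal stdlib-only sketch of that kind of check, not msgpack's actual code. It assumes the oversized payload is encoded as a msgpack "bin 32" object (a 0xc6 type byte followed by a 4-byte big-endian length, per the msgpack format specification). The names `check_bin32_header` and `MAX_BIN_LEN` are hypothetical and chosen only for illustration.

```python
import struct

MAX_BIN_LEN = 1024 * 1024  # 1048576, the limit quoted in the traceback
BIN32_MARKER = 0xC6        # msgpack "bin 32" type byte


def check_bin32_header(header, max_bin_len=MAX_BIN_LEN):
    """Return the declared payload length, or raise if it exceeds the limit.

    Hypothetical helper: it only inspects the 5-byte header and never
    reads the payload itself.
    """
    marker, length = struct.unpack(">BI", header[:5])
    if marker != BIN32_MARKER:
        raise TypeError("not a bin 32 header")
    if length > max_bin_len:
        raise ValueError("%d exceeds max_bin_len(%d)" % (length, max_bin_len))
    return length


# Header for a 1870054-byte payload, the size reported in this bug.
header = struct.pack(">BI", BIN32_MARKER, 1870054)
try:
    check_bin32_header(header)
    result = "accepted"
except ValueError as exc:
    result = str(exc)
```

With the 1 MiB limit the header is rejected before any payload is decoded, which is consistent with the agent failing as soon as a single privsep reply grows past that size.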

Brian Haley (brian-haley) wrote :

msgpack is actually included as a requirement of oslo.privsep, with the following requirement:

msgpack>=0.5.0 # Apache-2.0

From the changelog at https://github.com/msgpack/msgpack-python/blob/master/ChangeLog.rst it looks like max_bin_len was removed in 0.6.1, so perhaps the required version needs an upper limit.

Will re-assign to oslo.privsep for further investigation.

affects: neutron → oslo.privsep
Rick Cano (canori01) wrote :

It says there that it's a "document only deprecation". Does that mean it's still in the code and its use is just discouraged, or that it has actually been removed?

Also, I should note that I upgraded another environment prior to this one (also an R->S upgrade), same version of msgpack but did not run into this issue there.

The environment that ran into the problem had far more neutron ports, tap interfaces, namespaces and linux bridges to wire up. So I wonder if that's why it ran into that limit while the other environment didn't.

Slawek Kaplonski (slaweq) wrote :

IMHO this error might be a side effect of some other issue. Maybe you hit it while trying to handle some other, real issue? Can you attach more neutron logs from this agent?

Rick Cano (canori01) wrote :

I'm attaching the logs. They're pretty large. I was experiencing some issues with the rootwrap filter, which you helped me with on IRC, but I'm not sure whether that was related to this: implementing the rootwrap filter you suggested took care of the permission errors I was seeing, yet the linuxbridge agent was still crashing afterwards.

If you like, I could bump msgpack back up on one of my three router nodes and generate a fresh set of logs, as I'm sure the ones I'm attaching are very convoluted.

Nathaniel Sherry (nsherry4) wrote :

We were having the same issue on Stein:

Exception in thread privsep_reader:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 765, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_privsep/comm.py", line 130, in _reader_main
    for msg in reader:
  File "/usr/lib/python2.7/site-packages/six.py", line 564, in next
    return type(self).__next__(self)
  File "/usr/lib/python2.7/site-packages/oslo_privsep/comm.py", line 77, in __next__
    return next(self.unpacker)
  File "msgpack/_unpacker.pyx", line 562, in msgpack._cmsgpack.Unpacker.__next__
  File "msgpack/_unpacker.pyx", line 493, in msgpack._cmsgpack.Unpacker._unpack
ValueError: 1068129 exceeds max_bin_len(1048576)

We were able to fix it by increasing the max_buffer_size of the msgpack.Unpacker created by oslo_privsep.comm.Deserializer from the default 1024*1024 to something larger:

import msgpack
import six


class Deserializer(six.Iterator):
    def __init__(self, readsock):
        self.readsock = readsock
        # Raise the buffer limit from the 1 MiB default to 8 MiB.
        self.unpacker = msgpack.Unpacker(use_list=False, encoding='utf-8',
                                         unicode_errors='surrogateescape',
                                         max_buffer_size=1024*1024*8)
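As a quick sanity check on the numbers, the sketch below compares the two message sizes reported in the tracebacks so far against the 1 MiB default and the 8 MiB value used in the patch above. It is plain arithmetic and makes no calls into msgpack or oslo.privsep. The variable names are illustrative only.

```python
DEFAULT_LIMIT = 1024 * 1024      # 1048576 bytes, the limit in both tracebacks
RAISED_LIMIT = 1024 * 1024 * 8   # 8388608 bytes, the value used in the patch

# Message sizes taken from the two ValueError lines in this report.
reported_sizes = [1870054, 1068129]

summary = {
    size: {
        "fits_default": size <= DEFAULT_LIMIT,
        "fits_raised": size <= RAISED_LIMIT,
    }
    for size in reported_sizes
}

for size, verdict in summary.items():
    print(size, verdict)
```

Both messages exceed the default and fit under 8 MiB. An environment with enough additional devices could still outgrow a fixed 8 MiB limit.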

Rick Cano (canori01) wrote :

Ah, thanks. We're actually still seeing this issue as of the Train release. We'll see about Ussuri soon.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.privsep (master)
Changed in oslo.privsep:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.privsep (master)

Reviewed: https://review.opendev.org/c/openstack/oslo.privsep/+/819996
Committed: https://opendev.org/openstack/oslo.privsep/commit/c223dbced7d5a8d1920fe764cbce42cf844538e1
Submitter: "Zuul (22348)"
Branch: master

commit c223dbced7d5a8d1920fe764cbce42cf844538e1
Author: Mohammed Naser <email address hidden>
Date: Wed Dec 1 11:19:26 2021 +0400

    Bump max_buffer_size for Deserializer

    Since msgpack 0.6.0, some limits were introduced for the
    deserializer which were put in to avoid any denial of service
    attacks using msgpack. These limits were raised to 100MiB
    in the release of msgpack 1.0.0.

    The default buffer sizes that were implemented were quite low
    and when running certain `privsep` commands, especially for
    Neutron when using linux bridge, where there is a large amount
    of netdevs, privsep would crash since msgpack would fail to
    decode the message since it considers it too big:

      ValueError: 1174941 exceeds max_str_len(1048576)

    In this commit, the `max_buffer_size` is bumped to the value
    that ships with msgpack==1.0.0 to allow for users who don't
    have that to continue to function. Also, since `msgpack` is
    only being used by the internal API, we're not worried about
    a third party coming in and overwhelming the system by
    deserializing calls.

    This fix also addresses some weird behaviour where privsep
    will die and certain OpenStack agents would start to behave
    in a strange way once they hit a certain number of ports (since
    any privsep calls would start to fail).

    Closes-Bug: #1844822
    Closes-Bug: #1896734
    Related-Bug: #1928764
    Closes-Bug: #1952611
    Change-Id: I135917522daff95377d07566317ef0fc0d16e7cb

Changed in oslo.privsep:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.privsep 2.8.0

This issue was fixed in the openstack/oslo.privsep 2.8.0 release.
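Going by the commit message above (limits introduced in msgpack 0.6.0 and raised in 1.0.0) and the 2.8.0 release note, a rough check for whether a given combination of versions may be exposed could look like the sketch below. This is an assumption-laden illustration: `parse_version` and `may_hit_limit` are hypothetical helpers, the parser ignores suffixes such as `.dev1`, and it does not account for any backports of the fix to stable branches.

```python
import re


def parse_version(text):
    """Turn '0.6.1' into (0, 6, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    # Pad so that '1.0' compares the same as '1.0.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def may_hit_limit(msgpack_version, privsep_version):
    """True if msgpack has the low default limits and privsep lacks the fix."""
    msgpack_low_default = (0, 6, 0) <= parse_version(msgpack_version) < (1, 0, 0)
    privsep_unpatched = parse_version(privsep_version) < (2, 8, 0)
    return msgpack_low_default and privsep_unpatched


# The oslo.privsep version here is a placeholder for any pre-2.8.0 release.
print(may_hit_limit("0.6.1", "1.32.0"))
print(may_hit_limit("0.5.6", "1.32.0"))
```

A False result only means the low default limit is unlikely to be the cause. msgpack 1.0.0 still enforces a limit, just a much larger one.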
