null characters in oslo.message cause mistral to fail

Bug #1787900 reported by Danil Zhigalin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mistral
Confirmed
Medium
Unassigned

Bug Description

description of problem:

we are facing problems with RedHat OpenStack Platform 13 overcloud installation. Undercloud has been installed successfully and is operational. We already have experience with osp12 and we reused templates that worked for us in the 12 version with required changes. This is a fresh installation of undercloud - i.e. not an upgrade of an existing osp12 environment.

version-release number of selected component (if applicable):

osp13, mistral 6.0.2, heat 10.0.1, oslo-messaging 5.35.0, python-tripleoclient 9.2.1-13 - these are the components that we are seeing problems with

how reproducible:

steps to reproduce:

1. deployment is started using the following command:

openstack overcloud deploy \
    --templates \
    --libvirt-type kvm \
    --timeout 120 \
    -r /home/stack/templates/roles_data.yaml \
    -e /home/stack/templates/fra1-ost2/network-environment.yaml \
    -e /home/stack/templates/scale-environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/templates/fra1-ost2/overcloud_images.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/host-config-and-reboot.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-sriov.yaml \
    -e /home/stack/templates/sriov-env.yaml \
    -e /home/stack/templates/extraconfig-environment.yaml \
    --ntp-server 172.16.95.210

2a. in some occasions deployment is running up to this point:

started mistral workflow tripleo.validations.v1.check_pre_deployment_validations. execution id: 7784ff99-6a0e-44fe-8193-adb07eeab91d
waiting for messages on queue 'tripleo' with no timeout.
removing the current plan files
uploading new plan files
started mistral workflow tripleo.plan_management.v1.update_deployment_plan. execution id: 845ae510-efc2-46d9-916a-ca8e7e4bbf2d
('the read operation timed out',)

2b. in some occations deployment is running up to this point:

started mistral workflow tripleo.validations.v1.check_pre_deployment_validations. execution id: 779147aa-3ff8-450b-8588-e9a146a9b016
waiting for messages on queue 'tripleo' with no timeout.
removing the current plan files
uploading new plan files
started mistral workflow tripleo.plan_management.v1.update_deployment_plan. execution id: 6cc91812-cbd1-445f-a82d-03eec9aa8e8d
plan updated.
processing templates in the directory /tmp/tripleoclient-mr0htd/tripleo-heat-templates
unable to establish connection to https://172.21.50.11:13989/v2/action_executions: ('connection aborted.', badstatusline("''",))

actual results:

deployment stops after this point. no stack is created and no attempt of creating the stack was done.

expected results:

deployment should continue and at least reach stack creation stage or gieve some meaningful answer why it can't continue.

additional info:

in case of 2a.:

there is the following trace in /var/log/mistral/engine.log

2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit [req-508ca646-1519-4cb5-9c82-5e021f680c16 f1e33f8f3cb947408e8c15f7bc7f9677 bdfdab82c3b04e49a2b06b52a67e955d - - -] failed to process message ... skipping it.: decodeerror: invalid control character at: line 1 column 1694931 (char 1694930)
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit traceback (most recent call last):
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 370, in _callback
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit self.callback(rabbitmessage(message))
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 249, in __init__
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit rpc_common.deserialize_msg(raw_message.payload))
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/message.py", line 207, in payload
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit return self._decoded_cache if self._decoded_cache else self.decode()
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/message.py", line 192, in decode
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit self._decoded_cache = self._decode()
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/message.py", line 197, in _decode
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit self.content_encoding, accept=self.accept)
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 263, in loads
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit return decode(data)
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit self.gen.throw(type, value, traceback)
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 54, in _reraise_errors
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit reraise(wrapper, wrapper(exc), sys.exc_info()[2])
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 50, in _reraise_errors
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit yield
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 263, in loads
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit return decode(data)
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/utils/json.py", line 96, in loads
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit return stdjson.loads(s)
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit return _default_decoder.decode(s)
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit file "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit obj, end = self.scan_once(s, idx)
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit decodeerror: invalid control character at: line 1 column 1694931 (char 1694930)
2018-08-15 12:04:30.088 59391 error oslo.messaging._drivers.impl_rabbit

in case of 2b.:

there is the following trace in /var/log/heat/heat-engine.log:

2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit traceback (most recent call last):
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 370, in _callback
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit self.callback(rabbitmessage(message))
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 249, in __init__
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit rpc_common.deserialize_msg(raw_message.payload))
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/message.py", line 207, in payload
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit return self._decoded_cache if self._decoded_cache else self.decode()
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/message.py", line 192, in decode
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit self._decoded_cache = self._decode()
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/message.py", line 197, in _decode
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit self.content_encoding, accept=self.accept)
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 263, in loads
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit return decode(data)
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit self.gen.throw(type, value, traceback)
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 54, in _reraise_errors
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit reraise(wrapper, wrapper(exc), sys.exc_info()[2])
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 50, in _reraise_errors
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit yield
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 263, in loads
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit return decode(data)
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib/python2.7/site-packages/kombu/utils/json.py", line 96, in loads
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit return stdjson.loads(s)
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit return _default_decoder.decode(s)
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit file "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit obj, end = self.scan_once(s, idx)
2018-08-15 11:28:14.077 2880 error oslo.messaging._drivers.impl_rabbit decodeerror: invalid control character at: line 1 column 1053443 (char 1053442)

further investigation shows that there are indeed invalid characters - null characters with code 'x00' in json that is being sent over the messaging bus. they are not present in the initial templates but get introduced by the message issuer. attached is the file with the intercepted message. this characters can be detected using "grep -op "\x00" message_body.txt | wc -l" or they also get showed up in vim. in the attached file first such character is encountered on the position 1694930, which complies with the stack trace referred in the case 2a. in the attached packet capture file amqp_mistral_messages.pcap this character is also seen in frame #90, packet bytes line 0000ddc0, there is this 00 byte.

in this situation it is expected that json parser will fail, since \0 characters are not allowed.

why does message sender introduce this characters and how can this be avoided?

Revision history for this message
Danil Zhigalin (danil.zhigalin) wrote :
Dougal Matthews (d0ugal)
Changed in mistral:
status: New → Confirmed
importance: Undecided → Medium
tags: added: tripleo
Changed in mistral:
milestone: none → stein-1
Ryan Brady (rbrady)
tags: added: d2a0f00e2d178a2b16055daa2d8dc5b9
Dougal Matthews (d0ugal)
Changed in mistral:
milestone: stein-1 → none
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.