persisting OpenStack metadata fails

Bug #1801364 reported by Robert Schweikert on 2018-11-02
This bug affects 4 people
Affects: cloud-init
Importance: Undecided
Assigned to: Unassigned

Bug Description

Persisting OpenStack metadata may fail.

OpenStack has an entry named "random_seed" in the metadata [1]. This entry is treated specially in the openstack.py helper when the metadata is read [2]. The attempt to decode the read data may result in a string that is not valid UTF-8, which eventually results in an error when we attempt to persist the metadata, as follows:

2018-11-02 12:14:13,755 - __init__.py[WARNING]: Error persisting instance-data.json: 'utf8' codec can't decode byte 0xc8 in position 2: invalid continuation byte

[1] https://docs.openstack.org/nova/latest/user/metadata-service.html
[2] https://github.com/cloud-init/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L294
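
The failure mode can be reproduced in isolation. The following is a minimal Python 3 sketch, not cloud-init code: it base64-decodes a seed value and checks whether the resulting bytes are valid UTF-8. The sample bytes are the first few bytes of the processed value quoted in the first comment of this report; the helper name `is_valid_utf8` is made up for illustration.

```python
import base64

# First bytes of the processed random_seed quoted in this report.
raw_seed = b"\xb8!')\xe0\x0e%<"

# The metadata service delivers the seed as a base64 string inside JSON.
encoded_seed = base64.b64encode(raw_seed).decode("ascii")

# Decoding that string yields arbitrary binary data again.
decoded_seed = base64.b64decode(encoded_seed)


def is_valid_utf8(data):
    """Return True if data (bytes) decodes as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xb8 cannot start a UTF-8 sequence, so the check fails for this seed.
print(is_valid_utf8(decoded_seed))
```

Any code path that later treats such a value as text, as the Python 2.7 JSON encoder does, will hit the same decode error.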

Robert Schweikert (rjschwei) wrote :

Example data for the case that triggered this issue:

From debug output inserted into util.load_json, which is called from read_v2() in openstack.py

"random_seed": "Cr7LysUKbg/wjTuPKS2qq7rZ9BCWJQAnckkvxCv7fmfOWmHARAWj8mPEfIT+tl8Q1v9Kz3QMKBC4SfgfjLF+doX84pjcDw87lL4UvDzzQfANmvUPeQSc8wpbMe9l74hDEvd/U7Rf1/ag7E/tlQZRwlhQFcHrMI1TUSLKe70vSDspK0NNB1p1GnwKB1Mp683iBtWcSW9zURrdXmYGvl+CvW/EJcesbC59XPX4AezVqejKwI4TtFLDQZ+A9HNLJOXi4E8/gFQO2hEhD56vDzqh1VZNGj2aUdJ7gBoBBE3qtEwdaAkwWS2hS6BV6e1kLgio2M6r/Qz44uf8im9JJdyFlDsVycr6+ZOdCIQTHdEhVQnjwvkxaZFFxp4CaVhzb75FgESkUoNzr/Hr7GILAThJvPLB+a94Q8uSjElSP9r5+0LbB6CGNvEAXDE0IlupzltNS1K9gHVm7mDziA7wU6jtNrMb22gXgVs4pi3QUJlBZYwR9dIPZIRwBCCrzWtyuW+0LMZmoADSosJ1ixFfX34thN/6CmVSjqOZNErBEcSnJ6apSQ8dpTgdn+yORZDs7O1cGGOzkMf4zS9t4TdLpmPTUyfFBi8ykjPLb0Ec8eAAAK/FwUN30LOqJ3GPRI9SN6UBu+6feg96NAu9Surwpkucdg3L94Z0hNi21E7G9ET19Gw="

After processing this turned into:

u'random_seed': '\xb8!\')\xe0\x0e%<\xac)E\xf6\xe0.kt>V0\x88\xdf\xb0\x02:\xbc\x192F+)P5\x9eL\xab\xbd\xf5\xa2g7\x02J\xfb\xd6\x14>^\xe9\x87\xf1\xa1LJ\xd1\xbc\xe3y\x0b\xd73\x93\xc3\xb8%3\xac_\x1cN\x02F\xf3\xec\xdaA.^\x7f\x1e\xf2\x958\x7f\xc8_f\x0c\xf3\x9e\xce\xb3\xe7>\xfa\x9a\xe6\xbbX/\x01>\r\xc8\xa1\xb1\x852\x05@:\xbb\xa3\x98\xa1\xf7\x11\x82\xbeCN\xe9.\xd7\xa6\x85\xf9\x0b\t\xa7\xcd\xc3U\'\xac\xaa\xc2\xea\r\x11\xe1\xdc\'\x0c\x9a\x7f\x85\xcc\xe3\xac]4S\xad\x08V\xcc\xba\xf6\xf2\x89\xc40\xad\xfc%\xab?\tP\xc9\x82\x87}z\x91\xd6\x94\x17\x8e\x83\xa3\x15\xc5\xfc\x01\x81E\xc0w|\x98Q4=\x9f\x07\xf1\xa3ot\xc6\x87X\xd0\xe1\x93\xae\xbe\x0f\\\xd3\x08VU\x1d\xf8H\xbb\xa4/\xc1\x96\x83\xb7\xed\x89\xb7\xa2\x10O\x1c*\xc4\xbc\xf6*2\x9cq\x9c\x10\xd8q\x93\xec\xd2\x0e\xd3\xb3\xed\xa2\xf8\x8b\xcb1VT\x9f 4\xdeg;\xf5\xe3\xdb\xd9\xb7\xe7\x90\xf3@\xf4qn\xcf\xf0x\xed*X`\x13YS\x12\xa6&\xfe\xb5)/\xf5|\x8bp\t\xc6\xfb3\x03/\xaf\xefa\x1b+\xa2F7\x10\x18\x91?&\xe8\x88\x1bY~%|\xb1\xe8\x1c\x96\xdf\xa1\xabH\xf7\xab\xb8\xc7\xdfgT#\xf2\xbd\x198j%\xba\x1f\xea\x96\x1e\r_\xaeRe)\xb5"-\xbf\xe7\xb9\x8c\xdb$\xf2{\x98\x04o\xd7D\x86\xce\x86\xf7F\xa4\x8e:OZv@K0\x0c\x0e\xf4\xf6\xce\x9c\xc8\x8ai\xa1\xf1Ec\x95U\x88.\xc1\xce\xb5\x92\x98\xb5\xe7\xf3\xc4\xd5\xc2\x1fR\xe5`-\xbd\xeaBC\xc0\xad\xdda\xb0oX\xf2b\xf88a\x12\x94R^\x0812\x7f\xa2v\x06Q\xc8\x13X\x8e\x8d-\x82\x1b\x82\x17\x18\x1c\xdd>\x8c\x13\x190\\"z\x8f\xa6\x18\xf3\xbf\x9e\x95\xa7\x8d\x89\x91\x83\x1a\xe5\xbe\xcfK\x08\xff[!\xe5\xba\x9f\xfah~\x85\x16\x18\xd9GX\x9f\x0c\xa4\xa9\xb7\xdcn\xd2\xaf5\xc6\xbc', u'uuid': u'4f9baeab-1f8c-48ca-8766-37c4c59927cf'

per debug output inserted into _crawl_metadata(). With this data in the metadata dictionary, persisting the data via persist_instance_data() fails with the message shown in the description.

The question is whether "random_seed" should be persisted in the first place. The value for "random_seed" is different every time the metadata server is accessed, so it is not useful for comparison. However, I do not know for what other reasons the persisted data would be used.

Robert Schweikert (rjschwei) wrote :

The patch is one possible implementation of avoiding the utf-8 error when persisting the metadata.

Depending on whether random_seed is actually useful to persist, it may also be possible to simply drop the data in openstack.py:

if 'random_seed' in metadata:
    del metadata['random_seed']

Scott Moser (smoser) on 2018-11-02
summary: - persisting OpenStack metadat fails
+ persisting OpenStack metadata fails
Scott Moser (smoser) wrote :

Robert,
Can you please attach logs?

cloud-init collect-logs

The 'persist_instance_data' in DataSource tries to accommodate this.

It uses util.json_dumps, which uses json_serialize_default.

Here is an example:

$ python3 -c 'from cloudinit import util; print(util.json_dumps({"bin": open("/dev/urandom", "rb").read(8)}))'
{
 "bin": "ci-b64:doWuT19jT3g="
}

Notice that the dump adds 'ci-b64' in front of the value.
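
The 'ci-b64' prefixing can be illustrated with a standalone sketch. This is not the cloud-init source: `b64_default` is a hypothetical stand-in for json_serialize_default, written for Python 3, where json.dumps hands bytes values to the `default` hook because it cannot serialize them natively.

```python
import base64
import json


def b64_default(obj):
    """Hypothetical 'default' hook: emit bytes as a 'ci-b64:' string."""
    if isinstance(obj, bytes):
        return "ci-b64:" + base64.b64encode(obj).decode("ascii")
    # Anything else the encoder cannot handle is still an error.
    raise TypeError("cannot serialize %s" % type(obj).__name__)


# json.dumps calls b64_default only for the bytes value.
dumped = json.dumps({"bin": b"\x00\xff"}, default=b64_default)
print(dumped)
```

A consumer of the dumped file can strip the prefix and base64-decode the rest to recover the original bytes.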

I'm guessing that there is a difference in Python versions that is causing the problem.

Please attach logs.
thanks.

Changed in cloud-init:
status: New → Incomplete
Robert Schweikert (rjschwei) wrote :

Yes, sorry, this is Python version dependent: it fails with Python 2.7 and works with Python 3.

Robert Schweikert (rjschwei) wrote :

Log file that shows the failure.

Note that in this case we ran with cloud-init 18.2, but the issue reproduces the same way in 18.4, to which the attached patch applies.

As stated in the previous comment, this happens with Python 2.7; Python 3.6 works with no modification.

Scott Moser (smoser) wrote :

I'm attaching a test case that shows the failure on python2.7.
The issue is in the way that we're serializing binary data into
json. The implementation we have (using default=json_serialize_default)
just does not work in python2.7.

I recall that I had seen a stackexchange question basically covering
this and the submitter eventually just throwing up their hands.

This adds the test case to show the problem, but does not do anything to fix it.
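
One generic way around this, sketched here as an assumption and not as the fix that was eventually committed, is to walk the data before calling json.dumps instead of relying on the `default` hook. On Python 2.7 binary data is a `str`, which the json module treats as text it can serialize natively, so the hook is never consulted and the encoder attempts the UTF-8 decode itself. The helper name `encode_binary` is hypothetical, and the sketch is written for Python 3.

```python
import base64
import json


def encode_binary(value):
    """Recursively convert bytes: valid UTF-8 becomes text,
    anything else becomes a 'ci-b64:' string (hypothetical helper)."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            return "ci-b64:" + base64.b64encode(value).decode("ascii")
    if isinstance(value, dict):
        return {key: encode_binary(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [encode_binary(item) for item in value]
    return value


# Shortened sample modeled on the data quoted in this report.
metadata = {
    "uuid": "4f9baeab-1f8c-48ca-8766-37c4c59927cf",
    "random_seed": b"\xb8!')\xe0",
}
print(json.dumps(encode_binary(metadata), sort_keys=True))
```

Because the conversion happens before the encoder sees the data, the result does not depend on which types a given Python version considers serializable.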

Changed in cloud-init:
status: Incomplete → Confirmed
Scott Moser (smoser) wrote :

If someone fixes this correctly/generically then they should probably remove the Azure specific fix at https://code.launchpad.net/~jasonzio/cloud-init/+git/cloud-init/+merge/365065 .

There is good discussion on the general dumps/JsonEncoder issue at https://stackoverflow.com/questions/16405969/how-to-change-json-encoding-behaviour-for-serializable-python-object

Just to be clear, our python 3 implementation works as designed.

Marcus Furlong (furlongm) wrote :

The patch in https://bugs.launchpad.net/cloud-init/+bug/1801364/comments/2 fixes this issue for me using python2.7 on rhel7 and centos7.

This bug is fixed with commit 067516d7 to cloud-init on branch master.
To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=067516d7

Changed in cloud-init:
status: Confirmed → Fix Committed