persisting OpenStack metadata fails

Bug #1801364 reported by Robert Schweikert on 2018-11-02
This bug affects 5 people
Affects: cloud-init
Importance: Undecided
Assigned to: Unassigned

Bug Description

Persisting OpenStack metadata may fail.

OpenStack has an entry named "random_seed" in the metadata [1]. This entry is treated specially in the openstack.py helper when the metadata is read [2]. The attempt to decode the read data may result in a string that is not UTF-8 compliant, which eventually results in an error when we attempt to persist the metadata, as follows:

2018-11-02 12:14:13,755 - __init__.py[WARNING]: Error persisting instance-data.json: 'utf8' codec can't decode byte 0xc8 in position 2: invalid continuation byte

[1] https://docs.openstack.org/nova/latest/user/metadata-service.html
[2] https://github.com/cloud-init/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L294
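
To illustrate the failure mode, here is a minimal sketch that assumes only the Python standard library and does not call any cloud-init code. It decodes the first 12 characters of the example random_seed value posted in the comments on this bug and checks whether the result is valid UTF-8 and whether json.dumps accepts it. The helper names is_utf8 and can_dump are made up for this example.

```python
import base64
import json

# First 12 characters of the example "random_seed" value from this report.
seed_b64 = "Cr7LysUKbg/w"
raw = base64.b64decode(seed_b64)


def is_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def can_dump(obj):
    """Return True if json.dumps accepts obj without a default hook."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, UnicodeDecodeError):
        # Python 3 rejects bytes with TypeError; Python 2.7 tries to
        # decode the byte string and raises UnicodeDecodeError instead.
        return False


print(is_utf8(raw))                        # decoded seed bytes
print(can_dump({"random_seed": raw}))      # decoded seed in a dict
print(can_dump({"random_seed": seed_b64})) # original base64 text
```

The base64 text as delivered by the metadata service is plain ASCII and serializes without trouble; only the decoded bytes are a problem for JSON.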

Robert Schweikert (rjschwei) wrote :

Example data for the case that triggered this issue:

From debug output inserted into util.load_json, which is called from read_v2() in openstack.py

"random_seed": "Cr7LysUKbg/wjTuPKS2qq7rZ9BCWJQAnckkvxCv7fmfOWmHARAWj8mPEfIT+tl8Q1v9Kz3QMKBC4SfgfjLF+doX84pjcDw87lL4UvDzzQfANmvUPeQSc8wpbMe9l74hDEvd/U7Rf1/ag7E/tlQZRwlhQFcHrMI1TUSLKe70vSDspK0NNB1p1GnwKB1Mp683iBtWcSW9zURrdXmYGvl+CvW/EJcesbC59XPX4AezVqejKwI4TtFLDQZ+A9HNLJOXi4E8/gFQO2hEhD56vDzqh1VZNGj2aUdJ7gBoBBE3qtEwdaAkwWS2hS6BV6e1kLgio2M6r/Qz44uf8im9JJdyFlDsVycr6+ZOdCIQTHdEhVQnjwvkxaZFFxp4CaVhzb75FgESkUoNzr/Hr7GILAThJvPLB+a94Q8uSjElSP9r5+0LbB6CGNvEAXDE0IlupzltNS1K9gHVm7mDziA7wU6jtNrMb22gXgVs4pi3QUJlBZYwR9dIPZIRwBCCrzWtyuW+0LMZmoADSosJ1ixFfX34thN/6CmVSjqOZNErBEcSnJ6apSQ8dpTgdn+yORZDs7O1cGGOzkMf4zS9t4TdLpmPTUyfFBi8ykjPLb0Ec8eAAAK/FwUN30LOqJ3GPRI9SN6UBu+6feg96NAu9Surwpkucdg3L94Z0hNi21E7G9ET19Gw="

After processing this turned into:

u'random_seed': '\xb8!\')\xe0\x0e%<\xac)E\xf6\xe0.kt>V0\x88\xdf\xb0\x02:\xbc\x192F+)P5\x9eL\xab\xbd\xf5\xa2g7\x02J\xfb\xd6\x14>^\xe9\x87\xf1\xa1LJ\xd1\xbc\xe3y\x0b\xd73\x93\xc3\xb8%3\xac_\x1cN\x02F\xf3\xec\xdaA.^\x7f\x1e\xf2\x958\x7f\xc8_f\x0c\xf3\x9e\xce\xb3\xe7>\xfa\x9a\xe6\xbbX/\x01>\r\xc8\xa1\xb1\x852\x05@:\xbb\xa3\x98\xa1\xf7\x11\x82\xbeCN\xe9.\xd7\xa6\x85\xf9\x0b\t\xa7\xcd\xc3U\'\xac\xaa\xc2\xea\r\x11\xe1\xdc\'\x0c\x9a\x7f\x85\xcc\xe3\xac]4S\xad\x08V\xcc\xba\xf6\xf2\x89\xc40\xad\xfc%\xab?\tP\xc9\x82\x87}z\x91\xd6\x94\x17\x8e\x83\xa3\x15\xc5\xfc\x01\x81E\xc0w|\x98Q4=\x9f\x07\xf1\xa3ot\xc6\x87X\xd0\xe1\x93\xae\xbe\x0f\\\xd3\x08VU\x1d\xf8H\xbb\xa4/\xc1\x96\x83\xb7\xed\x89\xb7\xa2\x10O\x1c*\xc4\xbc\xf6*2\x9cq\x9c\x10\xd8q\x93\xec\xd2\x0e\xd3\xb3\xed\xa2\xf8\x8b\xcb1VT\x9f 4\xdeg;\xf5\xe3\xdb\xd9\xb7\xe7\x90\xf3@\xf4qn\xcf\xf0x\xed*X`\x13YS\x12\xa6&\xfe\xb5)/\xf5|\x8bp\t\xc6\xfb3\x03/\xaf\xefa\x1b+\xa2F7\x10\x18\x91?&\xe8\x88\x1bY~%|\xb1\xe8\x1c\x96\xdf\xa1\xabH\xf7\xab\xb8\xc7\xdfgT#\xf2\xbd\x198j%\xba\x1f\xea\x96\x1e\r_\xaeRe)\xb5"-\xbf\xe7\xb9\x8c\xdb$\xf2{\x98\x04o\xd7D\x86\xce\x86\xf7F\xa4\x8e:OZv@K0\x0c\x0e\xf4\xf6\xce\x9c\xc8\x8ai\xa1\xf1Ec\x95U\x88.\xc1\xce\xb5\x92\x98\xb5\xe7\xf3\xc4\xd5\xc2\x1fR\xe5`-\xbd\xeaBC\xc0\xad\xdda\xb0oX\xf2b\xf88a\x12\x94R^\x0812\x7f\xa2v\x06Q\xc8\x13X\x8e\x8d-\x82\x1b\x82\x17\x18\x1c\xdd>\x8c\x13\x190\\"z\x8f\xa6\x18\xf3\xbf\x9e\x95\xa7\x8d\x89\x91\x83\x1a\xe5\xbe\xcfK\x08\xff[!\xe5\xba\x9f\xfah~\x85\x16\x18\xd9GX\x9f\x0c\xa4\xa9\xb7\xdcn\xd2\xaf5\xc6\xbc', u'uuid': u'4f9baeab-1f8c-48ca-8766-37c4c59927cf'

The above is from debug output inserted into _crawl_metadata(). With this data in the metadata dictionary, persisting the data via persist_instance_data() fails with the message shown in the description.

The question is whether "random_seed" should be persisted in the first place. The value for "random_seed" is different every time the metadata server is accessed. Thus the value is not useful for comparison. However I do not know for what other reasons the persisted data would be used.

Robert Schweikert (rjschwei) wrote :

The patch is one possible implementation of avoiding the utf-8 error when persisting the metadata.

Depending on whether random_seed is actually useful to persist, it may also be possible to simply drop the data in openstack.py:

if 'random_seed' in metadata:
    del metadata['random_seed']

Scott Moser (smoser) on 2018-11-02
summary: - persisting OpenStack metadat fails
+ persisting OpenStack metadata fails
Scott Moser (smoser) wrote :

Robert,
Can you please attach logs?

cloud-init collect-logs

The 'persist_instance_data' in DataSource tries to accommodate this.

It uses util.json_dumps, which uses json_serialize_default.

Here is an example:

$ python3 -c 'from cloudinit import util; print(util.json_dumps({"bin": open("/dev/urandom", "rb").read(8)}))'
{
 "bin": "ci-b64:doWuT19jT3g="
}

Notice that the dump adds 'ci-b64' in front of the value.
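
As a rough sketch of the idea (not the cloud-init implementation), a json.dumps "default" hook can turn bytes into prefixed base64 text. The names serialize_default and dumps below are hypothetical, and the sketch assumes Python 3, where bytes is a distinct type that json.dumps hands to the hook.

```python
import base64
import json

B64_PREFIX = "ci-b64:"  # prefix taken from the output shown above


def serialize_default(obj):
    """Hypothetical 'default' hook: base64-encode bytes values."""
    if isinstance(obj, bytes):
        return B64_PREFIX + base64.b64encode(obj).decode("ascii")
    raise TypeError("cannot serialize %r" % type(obj))


def dumps(data):
    """Dump data as JSON, routing unsupported types to the hook."""
    return json.dumps(data, default=serialize_default, indent=1,
                      sort_keys=True)


print(dumps({"bin": b"\x00\xff"}))
```

The hook is only consulted for types json.dumps does not already know how to handle. On Python 2.7 a byte string is a str, which json.dumps treats as text, so a hook like this would likely never be called for it.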

I'm guessing that there is a difference in Python versions that is causing a problem.

Please attach logs.
thanks.

Changed in cloud-init:
status: New → Incomplete
Robert Schweikert (rjschwei) wrote :

Yes, sorry, this is Python version dependent, fails with Python 2.7, works with Python 3.

Robert Schweikert (rjschwei) wrote :

Log file that shows the failure.

Note that in this case we ran cloud-init 18.2, but the issue reproduces the same way in 18.4, to which the attached patch applies.

As stated in the previous comment, this happens with Python 2.7; Python 3.6 works with no modification.

Scott Moser (smoser) wrote :

I'm attaching a test case that shows the failure on python2.7.
The issue is in the way that we're serializing binary data into
json. The implementation we have (using default=json_serialize_default)
just does not work in python2.7.

I recall that I had seen a Stack Exchange question basically covering
this and the submitter eventually just throwing up their hands.

This adds the test case to show the problem, but does not do anything to fix it.

Changed in cloud-init:
status: Incomplete → Confirmed
Scott Moser (smoser) wrote :

If someone fixes this correctly/generically then they should probably remove the Azure specific fix at https://code.launchpad.net/~jasonzio/cloud-init/+git/cloud-init/+merge/365065 .

There is good discussion on the general dumps/JsonEncoder issue at https://stackoverflow.com/questions/16405969/how-to-change-json-encoding-behaviour-for-serializable-python-object

Just to be clear, our python 3 implementation works as designed.

Marcus Furlong (furlongm) wrote :

The patch in https://bugs.launchpad.net/cloud-init/+bug/1801364/comments/2 fixes this issue for me using python2.7 on rhel7 and centos7.

This bug is fixed with commit 067516d7 to cloud-init on branch master.
To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=067516d7

Changed in cloud-init:
status: Confirmed → Fix Committed

I'm trying to understand how this bug is considered fixed by the following commit; https://git.launchpad.net/cloud-init/commit/?id=067516d7

We have resolved the issue by just not base64 decoding;
- if 'random_seed' in metadata:
-     random_seed = metadata['random_seed']
-     try:
-         metadata['random_seed'] = base64.b64decode(random_seed)
-     except (ValueError, TypeError) as e:
-         raise BrokenMetadata("Badly formatted metadata"
-                              " random_seed entry: %s" % e)

Could somebody help me with some information about why we try to get the random_seed and interpret it?

I can't figure it out; the commit where this was added was the same as the initial Openstack. Openstack documentation doesn't say if we should be injecting this somewhere either.

On 12/10/19 8:22 AM, Eric Lafontaine wrote:
> I'm trying to understand how this bug is considered fixed by the
> following commit; https://git.launchpad.net/cloud-init/commit/?id=067516d7
>
> We have resolved the issue by just not base64 decoding;
> - if 'random_seed' in metadata:
> -     random_seed = metadata['random_seed']
> -     try:
> -         metadata['random_seed'] = base64.b64decode(random_seed)
> -     except (ValueError, TypeError) as e:
> -         raise BrokenMetadata("Badly formatted metadata"
> -                              " random_seed entry: %s" % e)
>
> Could somebody help me with some information about why we try to get the
> random_seed and interpret it?

We had discussions at the last summit about not persisting the
random_seed to cache, which would avoid this problem. But I am not
certain we reached an actionable conclusion. Then again the topic
becomes somewhat academic at this point as Python 2 support will be
dropped at the end of this year with the release of 19.4.

Scott Moser (smoser) wrote :

@Eric,
It should be fixed in that it will no longer stack trace as it did.
The path to failure was:
a.) openstack has 'random_seed' which (not surprisingly) had non-utf8 data in it
b.) cloud-init tries to persist data from the metadata service by storing json
c.) json can only serialize/store utf-8 data.
d.) stack trace on attempt to do so.

cloud-init needs to be able to persist non-utf8 data and then re-load that data correctly.

The change we made was to fix cloud-init to catch the UnicodeDecodeError that is raised in step c, and then to "pre-serialize" the content so that it *can* be fed to json.dumps.

Do you still see the problem?
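
The "pre-serialize" step described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the commit: the function names preserialize_binary and restore_binary are made up here, the "ci-b64:" prefix is borrowed from the json_dumps example earlier in this thread, and the sketch always pre-serializes rather than doing so only after catching a UnicodeDecodeError. It is written for Python 3.

```python
import base64
import json

B64_PREFIX = "ci-b64:"


def preserialize_binary(data):
    """Return a copy of data with bytes values replaced by prefixed base64."""
    if isinstance(data, bytes):
        return B64_PREFIX + base64.b64encode(data).decode("ascii")
    if isinstance(data, dict):
        return {key: preserialize_binary(val) for key, val in data.items()}
    if isinstance(data, (list, tuple)):
        return [preserialize_binary(val) for val in data]
    return data


def restore_binary(data):
    """Reverse preserialize_binary on data loaded back from JSON."""
    if isinstance(data, str) and data.startswith(B64_PREFIX):
        return base64.b64decode(data[len(B64_PREFIX):])
    if isinstance(data, dict):
        return {key: restore_binary(val) for key, val in data.items()}
    if isinstance(data, list):
        return [restore_binary(val) for val in data]
    return data


metadata = {
    "random_seed": b"\n\xbe\xcb\xca",  # non-UTF-8 bytes
    "uuid": "4f9baeab-1f8c-48ca-8766-37c4c59927cf",
}
text = json.dumps(preserialize_binary(metadata))
print(restore_binary(json.loads(text)) == metadata)
```

Because every binary value is plain ASCII text by the time json.dumps sees it, the dump no longer depends on how the interpreter treats byte strings, and the prefix lets a reader recover the original bytes on load.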

Ryan Harper (raharper) wrote :

> I can't figure it out; the commit where this was added was the same as the initial Openstack. Openstack documentation doesn't say if we should be injecting this somewhere either.

When a platform provides a random_seed, cloud-init will add this to the host entropy pool. See:

https://github.com/canonical/cloud-init/blob/master/cloudinit/config/cc_seed_random.py
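
For readers unfamiliar with the mechanism, here is a minimal sketch of feeding seed data to the kernel by appending it to a seed file. It is not the module's code: the function name append_seed is hypothetical, and the default path /dev/urandom is an assumption about a typical Linux setup. Writing to that device mixes the data into the pool without crediting any entropy.

```python
import os
import tempfile


def append_seed(seed, seed_file="/dev/urandom"):
    """Append seed data to seed_file and return the number of bytes written."""
    if isinstance(seed, str):
        seed = seed.encode("utf-8")
    with open(seed_file, "ab") as stream:
        return stream.write(seed)


# Demonstrate against a temporary file rather than the real device.
fd, path = tempfile.mkstemp()
os.close(fd)
written = append_seed(b"\n\xbe\xcb\xca", seed_file=path)
print(written)
os.remove(path)
```

The seed is handled as raw bytes here, which is why the datasource originally base64-decoded the value before handing it on.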

Alright, thanks folks.

I'm on CentOS 7 with cloud-init 19.3. I'll confirm this afternoon whether I still have the issue, and if I do, I'll suggest a possible fix for it (or a colleague might do it :) ).

Thanks again, the cc_seed_random module is enlightening.

Just did a test and it does seem to persist with the latest commits on the master branch :).

All *my* tests passed for the image I generated, not that it means anything.

So, what would be needed to consider this bug closed?

Scott Moser (smoser) wrote :

> Just did a test and it *does* seem to persist with the latest commits
> on the master branch :).
... (emphasis added)
> So, what would be needed to consider this bug closed?

I'm confused. Did you mean "does not"?

> So, what would be needed to consider this bug closed?

The bug is fix-committed. It goes to "fix released", when it is present in a release.

This bug is believed to be fixed in cloud-init in version 19.2-53. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released