`cloud-init devel schema --annotate` fails for integer keys which do not roundtrip through string representation
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Expired | Low | Unassigned |
Bug Description
When using the new snap.commands schema (introduced in https:/
#cloud-config
snap:
  commands:
    01: ["foo", 123]
And then traceback during annotation:
$ cloud-init devel schema -c foo.yaml --annotate
Traceback (most recent call last):
File "/home/
validate_
File "/home/
raise SchemaValidatio
cloudinit.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/
load_
File "/home/
retval = util.log_time(
File "/home/
ret = func(*args, **kwargs)
File "/home/
validate_
File "/home/
print(
File "/home/
errors_
KeyError: 'snap.commands.1'
Note the `1` at the end of the key we're looking for (instead of 01). If we modify the input file to drop the leading 0:
#cloud-config
snap:
  commands:
    1: ["foo", 123]
then we don't see a traceback:
$ cloud-init devel schema -c foo.yaml --annotate
#cloud-config
snap:
  commands:
    1: ["foo", 123] # E1
# Errors: -------------
# E1: ['foo', 123] is not valid under any of the given schemas
So the problem we're seeing here is that YAML supports non-string keys and JSON doesn't. This means that:
commands:
  01: "value"
parses as {"commands": {1: "value"}} and _not_ {"commands": {"01": "value"}}.
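The key mismatch can be illustrated with the standard library alone. This is a minimal sketch, not cloud-init code: it assumes, as described above, that the YAML parser has already resolved the key 01 to the integer 1, and it starts from the parsed mapping rather than calling a YAML library.

```python
import json

# Assumption: the YAML parser resolved the plain key 01 to the integer 1,
# as described in the report. We start from the already-parsed mapping.
parsed = {"commands": {1: "value"}}

# JSON object keys must be strings, so the integer key is coerced on dump.
as_json = json.loads(json.dumps(parsed))
print(as_json)

# The original spelling "01" cannot be recovered from the parsed integer:
# converting back to a string yields "1", not "01".
original_key = "01"
roundtripped = str(int(original_key))
print(roundtripped == original_key)
```

The second check is the "does not roundtrip through string representation" condition from the title: any key whose parsed value prints differently from how it was written in the file is affected.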
Our code builds a mapping of path->line number in _schemapath_for_cloudconfig [0] from the original content and uses "01" in the path, for example:
{'snap': 2,
 'snap.commands': 3,
 'snap.commands.01': 4,
 'snap.commands.01.0': 4,
 'snap.commands.01.1': 4}
and the paths in the errors that we get from the jsonschema library use the integer that was (correctly) parsed from the YAML, giving us the 'snap.commands.1' we see as the KeyError in the report.
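The failing lookup can be reproduced in isolation. The sketch below is an illustration under assumptions, not the actual cloud-init code: `line_map` mirrors the mapping shown above, and `error_path` is a hand-written stand-in for the sequence of parsed keys that a validation error would carry.

```python
from collections import deque

# Path -> line number map, keyed by the key text as written in the file.
line_map = {
    "snap": 2,
    "snap.commands": 3,
    "snap.commands.01": 4,
    "snap.commands.01.0": 4,
    "snap.commands.01.1": 4,
}

# Hypothetical stand-in for a validation error path: it is built from the
# parsed document, so the key is the integer 1 rather than the text "01".
error_path = deque(["snap", "commands", 1])
error_key = ".".join(str(part) for part in error_path)

try:
    line = line_map[error_key]
except KeyError as exc:
    # Same failure mode as the traceback in this report.
    line = None
    missing = exc.args[0]
    print(f"KeyError: {missing!r}")
```

The two sides disagree only on how the key is spelled: the map was built from source text, the error path from parsed values.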
This fundamentally comes down to a logical problem in the way we validate schemas: YAML is a superset of JSON, so there will always be input that is valid YAML, and that we can process successfully, but that is not valid JSON.
So I think we should definitely stop using non-string keys in any of our examples, so that we aren't causing people to see validation tracebacks just from copy/pasting our examples and making (invalid) minor modifications.
I _think_ we can address this specific traceback by ensuring that the format used for paths in _schemapath_for_cloudconfig will be the same as the format used by the jsonschema errors (so presumably pass them through the YAML library somehow). But there may still be unexpected errors lurking because of the JSON/YAML mismatch.
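One possible shape for that normalization is sketched below. The helpers `normalize_key` and `normalize_path` are hypothetical names, not existing cloud-init functions. They approximate the parser with a decimal-integer check only, so they do not reproduce every YAML scalar rule (octal or boolean forms, for example), and splitting on "." assumes keys contain no dots. A real fix would more likely pass each key through the YAML library itself, as suggested above.

```python
import re

# Assumption: only plain decimal integers are normalized here. Other YAML
# scalar forms are left untouched, so this is an approximation.
_INT_RE = re.compile(r"[-+]?\d+")


def normalize_key(raw_key: str) -> str:
    """Hypothetical helper: spell a key the way its parsed value prints."""
    if _INT_RE.fullmatch(raw_key):
        return str(int(raw_key))
    return raw_key


def normalize_path(path: str) -> str:
    """Normalize every dot-separated component of a schema path."""
    return ".".join(normalize_key(part) for part in path.split("."))


raw_line_map = {
    "snap": 2,
    "snap.commands": 3,
    "snap.commands.01": 4,
    "snap.commands.01.0": 4,
    "snap.commands.01.1": 4,
}

# Rebuild the map with keys spelled the same way as the error paths.
line_map = {normalize_path(p): n for p, n in raw_line_map.items()}
print(line_map["snap.commands.1"])
```

With both sides spelled identically, the lookup for 'snap.commands.1' finds line 4 instead of raising. Using `line_map.get(...)` at the lookup site would additionally keep any remaining mismatch from surfacing as a traceback.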
[0] https://github.com/canonical/cloud-init/blob/master/cloudinit/config/schema.py#L224