`cloud-init devel schema --annotate` fails for integer keys which do not roundtrip through string representation

Bug #1879356 reported by Dan Watkins
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
cloud-init
Expired
Low
Unassigned

Bug Description

When using the new snap.commands schema (introduced in https://github.com/canonical/cloud-init/pull/364), it's possible to trigger a bug in our assertion code. Specifically, this file will fail validation (correctly, because `123` is an integer and not a string):

  #cloud-config
  snap:
      commands:
          01: ["foo", 123]

And then traceback during annotation:

  $ cloud-init devel schema -c foo.yaml --annotate
  Traceback (most recent call last):
    File "/home/daniel/dev/cloud-init/cloudinit/config/schema.py", line 217, in validate_cloudconfig_file
      validate_cloudconfig_schema(
    File "/home/daniel/dev/cloud-init/cloudinit/config/schema.py", line 121, in validate_cloudconfig_schema
      raise SchemaValidationError(errors)
  cloudinit.config.schema.SchemaValidationError: Cloud config schema errors: snap.commands.1: ['foo', 123] is not valid under any of the given schemas

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/home/daniel/.virtualenvs/cloud-init/bin/cloud-init", line 11, in <module>
      load_entry_point('cloud-init', 'console_scripts', 'cloud-init')()
    File "/home/daniel/dev/cloud-init/cloudinit/cmd/main.py", line 891, in main
      retval = util.log_time(
    File "/home/daniel/dev/cloud-init/cloudinit/util.py", line 2618, in log_time
      ret = func(*args, **kwargs)
    File "/home/daniel/dev/cloud-init/cloudinit/config/schema.py", line 446, in handle_schema_args
      validate_cloudconfig_file(
    File "/home/daniel/dev/cloud-init/cloudinit/config/schema.py", line 221, in validate_cloudconfig_file
      print(annotated_cloudconfig_file(
    File "/home/daniel/dev/cloud-init/cloudinit/config/schema.py", line 153, in annotated_cloudconfig_file
      errors_by_line[schemapaths[path]].append(msg)
  KeyError: 'snap.commands.1'

Note the `1` at the end of the key we're looking for (instead of 01). If we modify the input file to drop the leading 0:

  #cloud-config
  snap:
      commands:
          1: ["foo", 123]

then we don't see a traceback:

  $ cloud-init devel schema -c foo.yaml --annotate
  #cloud-config
  snap:
      commands:
          1: ["foo", 123] # E1

  # Errors: -------------
  # E1: ['foo', 123] is not valid under any of the given schemas

Revision history for this message
Dan Watkins (oddbloke) wrote :

So the problem we're seeing here is that YAML supports non-string keys and JSON doesn't. This means that:

  commands:
    01: "value"

parses as {"commands": {1: "value"}} and _not_ {"commands": {"01": "value"}}.

Our code builds a mapping of path->line number in _schemapath_for_cloudconfig[0] from the original content and uses "01" in the path, for example:

  {'snap': 2,
   'snap.commands': 3,
   'snap.commands.01.0': 4,
   'snap.commands.01.1': 4,
   'snap.commands.01': 4}

and the paths in the errors that we get from the jsonschema library use the integer that was (correctly) parsed from the YAML, giving us the 'snap.commands.1' we see as the KeyError in the report.

This fundamentally comes down to a logical problem we have in the way we validate schemas: YAML is a super-set of JSON. So we are always going to have valid input that we can process successfully, which is not valid JSON.

So I think we should definitely stop using non-string keys in any of our examples, so that we aren't causing people to see validation tracebacks just from copy/pasting our examples and making (invalid) minor modifications.

I _think_ we can address this specific traceback by ensuring that the format used for paths in _schemapath_for_cloudconfig will be the same as the format used by the jsonschema errors (so presumably pass them through the YAML library somehow). But there may still be unexpected errors lurking because of the JSON/YAML mismatch.

[0] https://github.com/canonical/cloud-init/blob/master/cloudinit/config/schema.py#L224

Changed in cloud-init:
status: New → Triaged
importance: Undecided → Low
Revision history for this message
James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Triaged → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.