[MIR] python-xmltodict

Bug #2002576 reported by Corey Bryant
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
ceilometer (Ubuntu)
Incomplete
Undecided
Christian Ehrhardt 
python-xmltodict (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

[Availability]
Currently in universe

[Rationale]
This is a new dependency used by OpenStack ceilometer.

[Security]
No security history

[Quality Assurance]
Package works out of the box with no prompting. There are no major bugs in Ubuntu and there are no major bugs in Debian. Unit tests are run during build.

[Dependencies]
All are in main

[Standards Compliance]
FHS and Debian Policy compliant

[Maintenance]
Simple python package that the OpenStack Team will take care of

[Background]
xmltodict is a Python module that makes working with XML feel like you are working with JSON. xmltodict is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps.

Changed in python-xmltodict (Ubuntu):
assignee: nobody → Christian Ehrhardt  (paelzer)
Revision history for this message
Seth Arnold (seth-arnold) wrote :

XML parsers quite often have two faults, XXE (external entity attack) and billion laughs:

https://en.wikipedia.org/wiki/XML_external_entity_attack
https://en.wikipedia.org/wiki/Billion_laughs_attack

It might be worth a quick skim of the history of this package, or documentation, etc, to see if either of these problems has already been handled, or can't happen due to design, or might still be an outstanding problem, before the security team is assigned this package for review.

Thanks

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Review for Package: python-xmltodict

[Summary]
MIR team NACK until alternatives have been pursued.

[Duplication]
There is another package in main providing the same functionality.

The reason for this request seems to be this change [0] in ceilometer.
This only uses the library for very trivial xml to dict conversion, no further
manipulation happens.

To some extend it could be said that even python3-bs4 provides that, but using
it is cumbersome. But OTOH [1] using lxml seems rather straight forward.
And it exists for a longer time, seems to have more adoption and is already
supported in main.

I've also found that it might be faster, but the reason I ask you (instead of
making my own judgement) in the first place is that I have no experience what
kind and structure of data you will usually get.
OTOH maybe you can use the speed argument to convince upstream to follow helping
to be not a one-off that exposes odd behavior.

And the std libs xml.etree also parses XML just fine and is supproted. Yet,
like bs4, it would be a bit more messy as it doesn't directly create dicts.

Could you please give it a try to check if using python3-lxml in this spot
might get you working without promoting an alternative that does the same
(actually a subset) of what is already supported?

[0]: https://github.com/openstack/ceilometer/commit/225f1cd7765ddb7b725c538944947ada8c52e73f
[1]: https://lxml.de/FAQ.html#how-can-i-map-an-xml-tree-into-a-dict-of-dicts
[2]: https://nickjanetakis.com/blog/how-i-used-the-lxml-library-to-parse-xml-20x-faster-in-python

Aborting review here until it has been given a try by the reporting team.
=> Incomplete

Changed in python-xmltodict (Ubuntu):
status: New → Incomplete
assignee: Christian Ehrhardt  (paelzer) → nobody
Revision history for this message
Corey Bryant (corey.bryant) wrote :

@Christian, thanks for evaluating this. Apologies for not taking a closer look at lxml. I'll submit a patch upstream to try switching to lxml. Otherwise I think we can live with this patched in the package if they don't accept a patch upstream.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

It looks like converting xml into a dict is also cumbersome using lxml. There are several attempts here:
https://gist.github.com/jacobian/795571

I'd like to open this MIR back up as it seems like it is going to be more error prone to code this using lxml.

I'm comfortable that the upstream project is actively being maintained. The project only has one file, xmltodict.py, that is 544 lines long. They also have a long history of releases up through 2022 [1]. And the maintainer seems to be active with responding to issues [2].

[1] https://pypi.org/project/xmltodict/#history
[2] https://github.com/martinblech/xmltodict/issues

Changed in python-xmltodict (Ubuntu):
status: Incomplete → New
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks for having a look Corey!
I'll leave the detailed judgement to you and follow your recommendation that using lxml isn't an option to be used for this.

Continuing review ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (3.6 KiB)

Review for Package: python-xmltodict

[Summary]
Other than the initial potential duplication this package is small and easy
and should be not hard to maintain.

MIR team ACK

This does need a security review, so I'll assign ubuntu-security

List of specific binary packages to be promoted to main: python3-xmltodict
Specific binary packages built, but NOT to be promoted to main: <none>

[Duplication]
There has been a former Nack and a check, but the conclusion was that none
of the alternatives is really usable for this use-case, therefore:
=> There is no other package in main providing the same functionality.

[Dependencies]
OK:
- no other Dependencies to MIR due to this
- no -dev/-debug/-doc packages that need exclusion
- No dependencies in main that are only superficially tested requiring
  more tests now.

Problems: None

[Embedded sources and static linking]
OK:
- no embedded source present
- no static linking
- does not have unexpected Built-Using entries
- not a go package, no extra constraints to consider in that regard
- not a rust package, no extra constraints to consider in that regard

Problems: None

[Security]
OK:
- history of CVEs does not look concerning
- does not run a daemon as root
- does not use webkit1,2
- does not use lib*v8 directly
- does not open a port/socket
- does not process arbitrary web content
- does not use centralized online accounts
- does not integrate arbitrary javascript into the desktop
- does not deal with system authentication (eg, pam), etc)
- does not deal with security attestation (secure boot, tpm, signatures)
- does not deal with cryptography (en-/decryption, certificates, signing, ...)

Problems:
- does parse data formats from an untrusted source.
  Ceilometer could be attacked throwing crafted data at it if there is a
  weakness in this code. I think this lib is small and a review should be quick,
  but it is worthwhile before adding it to the monitoring of all your instances.

[Common blockers]
OK:
- does not FTBFS currently
- does have a test suite that runs at build time
  - test suite fails will fail the build upon error.
- This does not need special HW for build or test
- no new python2 dependency
- Python package, but using dh_python
- Go package, but using dh-golang

Problems:
- Does not have a non-trivial test suite that runs as autopkgtest
  Given what the lib does and how the dependencies look like the only realistic
  case of changing dependencies causing issues would be changing python
  itself. That will need a no-change rebuild anyway and this would be covered
  by the build time tests which are present.
  So while it would be nice to have autopkgtests I think it is ok here to not
  have them.

[Packaging red flags]
OK:
- Ubuntu does not carry a delta
- symbols tracking not applicable for this kind of code.
- d/watch is present and looks ok
- Upstream update history is good
- Debian/Ubuntu update history is good
- the current release is packaged
- promoting this does not seem to cause issues for MOTUs that so far
  maintained the package (usually a sync, no MOTU changes so far)
- no massive Lintian warnings
- d/rules is rather clean
- It is not on the lto-disabled list

Problems: Non...

Read more...

Changed in python-xmltodict (Ubuntu):
assignee: nobody → Ubuntu Security Team (ubuntu-security)
Revision history for this message
Bryce Harrington (bryce) wrote :

[Adding ceilometer bugtask and tag update-excuse to track migration issue.]

tags: added: update-excuse
Changed in ceilometer (Ubuntu):
status: New → Incomplete
assignee: nobody → Christian Ehrhardt  (paelzer)
tags: added: sec-1613
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

grabbing the ceilometer tasks to be off the lists that are regularly checked - even though it is just for tracking

Revision history for this message
David Fernandez Gonzalez (litios) wrote :
Download full text (3.7 KiB)

I reviewed python-xmltodict 0.13.0-1 as checked into lunar. This shouldn't be
considered a full audit but rather a quick gauge of maintainability.

> python-xmltodict is a Python module that makes working with XML feel like you are working with JSON

- CVE History:
  - No known CVEs
- Build-Depends?
  - Python3 default libraries
- pre/post inst/rm scripts?
  - It is a Python library so it will be installed along with other Python libraries.
  - Upon deletion, it will remove everything with py3clean if available or manually rm everything.
- init scripts?
  - No
- systemd units?
  - No
- dbus services?
  - No
- setuid binaries?
  - No
- binaries in PATH?
  - No
- sudo fragments?
  - No
- polkit files?
  - No
- udev rules?
  - No
- unit tests / autopkgtests?
  - Embedded in the build process
  - Tested both with python 3.10 and 3.11
- cron jobs?
  - No
- Build logs:
  - No errors / warnings
  - Lintian errors/warnings:
    - W: python3-xmltodict: changelog-distribution-does-not-match-changes-file unstable != lunar [usr/share/doc/python3-xmltodict/changelog.Debian.gz:1]
    - W: python-xmltodict changes: distribution-and-changes-mismatch lunar unstable
    - E: python-xmltodict changes: inconsistent-maintainer David Fernandez Gonzalez <email address hidden> (changes vs. source) Sebastien Badia <email address hidden>

- Processes spawned?
  - No
- Memory management?
  - No
- File IO?
  - No I/O
- Logging?
  - No
- Environment variable usage?
  - No
- Use of privileged functions?
  - No
- Use of cryptography / random number sources etc?
  - No crypto
- Use of temp files?
  - No
- Use of networking?
  - No use of network but could be use for processing input from network.
  - Input is not sanitized.
- Use of WebKit?
  - No
- Use of PolicyKit?
  - No

- Any significant cppcheck results?
  - No cpp code
- Any significant Coverity results?
  - Nothing found
- Any significant shellcheck results?
  - Nothing found
- Any significant bandit results?
  - LOW: Using XMLGenerator to parse untrusted XML data is known to be vulnerable to XML attacks. Replace XMLGenerator with the equivalent defusedxml package, or make sure defusedxml.defuse_stdlib() is called.
  - In this case, XMLGenerator is not used to parse untrusted XML data but generate it from a dict.

The library relies on xml.parsers.expat to perform the parsing. Expat was considered vulnerable to XML attacks up to 2.4.1 (https://bugs.python.org/issue44394). Default Python version in Lunar already includes the patched expat.

(A request to migrate to defusedxml has been proposed to the developers in https://github.com/martinblech/xmltodict/issues/321 for older Python versions as defusedexpat is not supported since Python 3.4)

xml.sax is imported, which is known to be vulnerable to XML attacks, but it's never used for parsing untrusted XML data, only to generate the XML in unparse. Nevertheless, the unparse function seems to be vulnerable to a dict bomb (https://gist.github.com/AlanCoding/a20e4f5cff19edefd98a39554ff682cb) which will create a DoS.
This is a very edge case due to the restrictions when creating the payload so it is highly unlikely that this attack could occur. (Also, note...

Read more...

Changed in python-xmltodict (Ubuntu):
status: New → In Progress
assignee: Ubuntu Security Team (ubuntu-security) → nobody
Revision history for this message
Didier Roche-Tolomelli (didrocks) wrote :

$ ./change-override -c main -S python-xmltodict
Override component to main
python-xmltodict 0.13.0-1 in lunar: universe/misc -> main
python3-xmltodict 0.13.0-1 in lunar amd64: universe/python/optional/100% -> main
python3-xmltodict 0.13.0-1 in lunar arm64: universe/python/optional/100% -> main
python3-xmltodict 0.13.0-1 in lunar armhf: universe/python/optional/100% -> main
python3-xmltodict 0.13.0-1 in lunar i386: universe/python/optional/100% -> main
python3-xmltodict 0.13.0-1 in lunar ppc64el: universe/python/optional/100% -> main
python3-xmltodict 0.13.0-1 in lunar riscv64: universe/python/optional/100% -> main
python3-xmltodict 0.13.0-1 in lunar s390x: universe/python/optional/100% -> main
Override [y|N]? y
8 publications overridden.

Changed in python-xmltodict (Ubuntu):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.