[MIR] python-charset-normalizer

Bug #1977475 reported by Lena Voytek
Affects: python-charset-normalizer (Ubuntu)
Status: Won't Fix
Assigned to: Lena Voytek

Bug Description

The package python-charset-normalizer is already in Ubuntu universe.
The package python-charset-normalizer builds for the architectures it is designed to work on.
It currently builds and works for architectures: all
Link to package: https://launchpad.net/ubuntu/+source/python-charset-normalizer

- The package python-charset-normalizer will be required in Ubuntu main for the upcoming requests version 2.28
- The package will generally be useful for a large part of our user base
- The package python-charset-normalizer is a new runtime dependency of package requests that
  we already support

- No CVEs/security issues in this software in the past

- no `suid` or `sgid` binaries
- no executables in `/sbin` and `/usr/sbin`
- Package does not install services
- Package does not open privileged ports (ports < 1024)
- Package does not contain extensions to security-sensitive software

[Quality assurance - function/usage]
- The package works well right after install

[Quality assurance - maintenance]
- The package is well maintained in Debian/Ubuntu and does not have too many
  long-term critical bugs open
- Ubuntu https://bugs.launchpad.net/ubuntu/+source/python-charset-normalizer/+bug
- Debian https://bugs.debian.org/cgi-bin/pkgreport.cgi?src=python-charset-normalizer
- The package has no important open bugs
- The package does not deal with exotic hardware we cannot support

[Quality assurance - testing]
- The package runs a test suite at build time; if the suite fails,
  the build fails. Link to build log:

- The package does not run an autopkgtest because none have been implemented

[Quality assurance - packaging]
- debian/watch is present and works

- This package does not yield massive lintian Warnings, Errors
- lintian --pedantic output:
  W: python3-charset-normalizer: no-manual-page usr/bin/normalizer
  P: python-charset-normalizer source: uses-debhelper-compat-file [debian/compat]
  P: python-charset-normalizer source: very-long-line-length-in-source-file data/sample-spanish.txt line 16 is 1065 characters long (>512)
- Lintian overrides are not present

- This package does not rely on obsolete or about to be demoted packages.
- This package has no python2 or GTK2 dependencies

- The package will not be installed by default

[UI standards]
- Application is not end-user facing (does not need translation)

- No further depends or recommends dependencies that are not yet in main

[Standards compliance]
- This package correctly follows FHS and Debian Policy

- Owning Team will be Ubuntu Server
- The team will be subscribed before promotion

- The package does not use static builds
- The package does not use vendored code

[Background information]
Upstream Name is charset_normalizer
Link to upstream project https://github.com/ousret/charset_normalizer

Lena Voytek (lvoytek)
description: updated
Christian Ehrhardt (paelzer) wrote:

How urgently this is needed depends on when the next release of "requests" happens, as that release will require it.
To be on the safe side, let us plan for this cycle's Feature Freeze (FF, August); setting the milestone.

Also @Lena - I accepted the team subscription to the package.

Changed in python-charset-normalizer (Ubuntu):
milestone: none → ubuntu-22.08
Changed in python-charset-normalizer (Ubuntu):
assignee: nobody → Lukas Märdian (slyon)
Lukas Märdian (slyon) wrote:

Partial review for Package: src:python-charset-normalizer

python-charset-normalizer is a non-LGPL character detection library used by src:requests, that has been implemented as a drop-in replacement for src:chardet. "chardet" is in main already, the new dependency doesn't seem to be strictly required and is supposed to be dropped by the upstream developers in the future. So I'm stopping the MIR review at this point. I can continue doing a full review if you can show why this is really needed and what to do about the duplication in the archive in this case (see my [Duplication] comments below). If we'd move forward with this, we'd also need some autopkgtest that makes use of requests' character-detection feature (i.e. this library), FWIW.
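
To illustrate the drop-in relationship described above, here is a minimal sketch. It assumes both libraries expose a chardet-style `detect()` function that returns a dict with an `"encoding"` key. The helper names `load_detector` and `guess_encoding` are hypothetical and not part of requests, chardet, or charset-normalizer.

```python
import importlib


def load_detector(candidates=("chardet", "charset_normalizer")):
    """Return (module_name, detect) for the first importable candidate.

    Hypothetical helper: prefers chardet and falls back to
    charset_normalizer; returns (None, None) if neither is installed.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        detect = getattr(module, "detect", None)
        if callable(detect):
            return name, detect
    return None, None


def guess_encoding(data, default="utf-8",
                   candidates=("chardet", "charset_normalizer")):
    """Guess the encoding of `data` (bytes) with whichever detector exists.

    Falls back to `default` when no detector is installed or when the
    detector cannot decide.
    """
    _, detect = load_detector(candidates)
    if detect is None:
        return default
    result = detect(data)
    return result.get("encoding") or default


if __name__ == "__main__":
    name, _ = load_detector()
    print("detector in use:", name)
    print("guessed encoding:", guess_encoding("naïve café".encode("utf-8")))
```

Because the calling code only relies on `detect()`, either library can satisfy it, which is why keeping a single detector in main is plausible.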


[Duplication]
We already have src:chardet in "main", which src:requests has used
in the past. They switched to python-charset-normalizer for an easier
license story, while keeping chardet compatibility from what I can see:

I wonder if we really need this new dependency (and thus two charset
detectors/normalizers in main) or if we can stick to using chardet. Could
you please elaborate on this? If we really need it, would it be
possible to migrate the other two reverse-depends/recommends to
python-charset-normalizer as well, so we could demote src:chardet?

$ reverse-depends src:chardet -c main
* python3-bs4 (for python3-chardet)
* python3-debian (for python3-chardet)
* python3-requests (for python3-chardet)

This is especially true, as upstream requests developers plan to drop the
character detection feature (and this dependency) mid-term:

Changed in python-charset-normalizer (Ubuntu):
status: New → Won't Fix
assignee: Lukas Märdian (slyon) → Lena Voytek (lvoytek)
Lena Voytek (lvoytek) wrote :

Hi Lukas,

Thanks for the review. We should be able to maintain the requests package with only chardet using our patch for now, so the promotion of charset-normalizer shouldn't be necessary yet.

Looking into python3-bs4 and python3-debian, both should be fine if chardet is swapped for charset-normalizer in the future. If support for chardet is eventually completely dropped, then there may be more need for the promotion.

