Provide type annotations

Bug #1843791 reported by Daniel Hahler
This bug affects 5 people
Affects: Beautiful Soup
Status: In Progress
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

It would be useful to have type annotations for bs4, to be used with mypy etc.

I've quickly generated stubs using "stubgen" [1] provided by mypy and then started adding some manually, but there's nothing really worth publishing yet.

I've wondered if there are plans for this already, and thought it would be good to have an issue to discuss this / have a place for reference.

I ran "2to3" on the source first; it's not clear how this should be handled if the annotations were added in the repo itself.

1: https://mypy.readthedocs.io/en/latest/stubgen.html

Revision history for this message
Leonard Richardson (leonardr) wrote :

This is an interesting idea and I would like to get here eventually.

The sticking point, as you've found out, is that the canonical version of the Beautiful Soup code uses Python 2, and it's automatically converted to Python 3. I don't see a way to add these annotations without permanently switching to Python 3.

Because Beautiful Soup is frequently used in duct-tape environments I'm going to keep Python 2 support past the official end-of-life date, but eventually I will drop it, and we can pick up this issue then.

Changed in beautifulsoup:
status: New → Triaged
Revision history for this message
Daniel Hahler (blueyed) wrote :

So you plan to add them to the code directly already? (which is good!)

For Python 2, type hints could be written as comments, and hopefully they would carry over to the Python 3 version as well (though still as comments).
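
To illustrate the comment-based approach, here is a minimal sketch using PEP 484 type comments. The function `split_classes` is a hypothetical helper made up for this example, not part of bs4; the point is only that the signature stays valid Python 2 syntax while a type checker can still read the hint.

```python
# On Python 2 the "typing" module is a backport that has to be installed
# from PyPI; on Python 3 it is part of the standard library.
from typing import List, Optional


def split_classes(value, limit=None):
    # type: (str, Optional[int]) -> List[str]
    """Split a whitespace-separated class attribute (hypothetical helper)."""
    classes = value.split()
    # The type comment above is ignored at runtime; only a type checker
    # reads it, so this function has no runtime annotations at all.
    return classes if limit is None else classes[:limit]
```

Since the hint lives in a comment, a source-to-source conversion that preserves comments should leave it intact.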

btw: maybe it would be good to switch to Python 3 by default and auto-generate the Python 2 code then instead? But likely not worth the effort.

Revision history for this message
Alexander Regueiro (alexreg) wrote :

Yes please! It's now 2021 and type annotations are becoming popular in Python code, especially with the likes of mypy around.

Changed in beautifulsoup:
status: Triaged → In Progress
Revision history for this message
Florian Schulze (florian-schulze) wrote :

With Python 3.8 I currently get an error on the 4.13 branch:
```
../../beautifulsoup/bs4/__init__.py:141: in BeautifulSoup
    element_classes:Dict[type[PageElement], type[Any]] #: :meta private:
E TypeError: 'type' object is not subscriptable
```
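
For context: subscripting the builtin `type` only became possible in Python 3.9 (PEP 585), and a class-level annotation like the one above is evaluated when the class body runs. A sketch of one possible workaround, assuming the `typing` aliases are acceptable, is shown below. `PageElement` and `Soup` here are stand-ins defined for the example, not the real bs4 classes.

```python
from typing import Any, Dict, Type, get_type_hints


class PageElement:
    """Stand-in for bs4's PageElement, defined only for this sketch."""


class Soup:
    """Hypothetical class mirroring the annotation from the traceback."""

    # typing.Dict and typing.Type are subscriptable on Python 3.8,
    # unlike the builtin "type" used in the failing line.
    element_classes: Dict[Type[PageElement], Type[Any]]

    def __init__(self) -> None:
        self.element_classes = {}


# The annotation resolves without raising TypeError.
hints = get_type_hints(Soup)
```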

Revision history for this message
Florian Schulze (florian-schulze) wrote :

``from __future__ import annotations`` fixes it (https://peps.python.org/pep-0563/), but using ``|`` for types in expressions that are evaluated at runtime, such as the first argument to ``cast``, isn't supported before Python 3.10 (https://peps.python.org/pep-0604/):

```diff
diff --git a/bs4/__init__.py b/bs4/__init__.py
index 46c770f..b2c889a 100644
--- a/bs4/__init__.py
+++ b/bs4/__init__.py
@@ -13,6 +13,7 @@ and/or html5lib is installed, but they are not required.
 For more than you ever wanted to know about Beautiful Soup, see the
 documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
 """
+from __future__ import annotations

 __author__ = "Leonard Richardson (<email address hidden>)"
 __version__ = "4.12.2"
@@ -23,6 +24,7 @@ __license__ = "MIT"
 __all__ = ['BeautifulSoup']

 from collections import Counter
+from typing import Union
 import os
 import re
 import sys
@@ -376,7 +378,7 @@ class BeautifulSoup(Tag):

         # At this point we know markup is a string or bytestring. If
         # it was a file-type object, we've read from it.
-        markup = cast(str|bytes, markup)
+        markup = cast(Union[str, bytes], markup)

         rejections = []
         success = False
```
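
The difference between the two cases in the diff can be sketched as follows. The function `to_text` is a hypothetical example, and the quoted annotation stands in for what ``from __future__ import annotations`` does module-wide: the annotation is stored as a string and never evaluated, while the argument to ``cast`` is an ordinary expression that runs on every call.

```python
from typing import Union, cast


def to_text(markup: "str | bytes") -> str:
    """Decode bytes to str, pass str through (hypothetical helper)."""
    # The annotation above is only a string, so "str | bytes" is harmless
    # on any Python 3 version. The cast() argument below is evaluated at
    # runtime, so writing str | bytes here would raise TypeError before
    # Python 3.10; Union[str, bytes] works everywhere.
    data = cast(Union[str, bytes], markup)
    if isinstance(data, bytes):
        return data.decode("utf-8")
    return data
```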

Revision history for this message
Leonard Richardson (leonardr) wrote :

For anyone interested in trying it out, the 4.13 branch now contains pretty much all the type hints I'm going to add, given the current set of Python versions supported by Beautiful Soup.

I found some errors in the typeshed hints for html5lib and will be submitting a PR. My guess is no one noticed these errors because Beautiful Soup is the only consumer of those lower-level html5lib APIs that also uses type hints.

There are doubtless inconsistencies between the typeshed hints for beautifulsoup4 and the hints that I've added myself. I used the typeshed hints as an initial guide, but the final work is entirely my own. Hopefully the inconsistencies will be because my hints are more complete, but I haven't actually used Python type hints for anything except finding bugs through type-checking. So I don't know what effect this will have on people who *currently* rely on the typeshed hints.

Revision history for this message
Chris Papademetrious (chrispitude) wrote :

This was an impressive piece of work and I learned a lot by looking at what you did.

Revision history for this message
Leonard Richardson (leonardr) wrote :

My html5lib pull request to typeshed: https://github.com/python/typeshed/pull/11411
