html cleaner rewrites svg image tag to img tag

Bug #2025607 reported by Rémy Lavainne
This bug affects 4 people
Affects: lxml
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Hi,

I noticed that the Cleaner in lxml.html.clean rewrites <image> tags to <img> tags (see: https://github.com/lxml/lxml/blob/ec0b59be3074c9ce83ba01b71b7218d3d6069954/src/lxml/html/clean.py#L281). However, the <image> element is part of the SVG standard (see: https://developer.mozilla.org/fr/docs/Web/SVG/Element/image), and SVG can be inlined in HTML content (https://www.w3schools.com/html/html5_svg.asp). After HTML content has been cleaned with the Cleaner, an inline SVG that contained an <image> element no longer contains it.

Running this Python script reproduces the problem:

```
from lxml.html.clean import Cleaner

html_content = '''
<html>
<body>
  <svg>
    <image xlink:href="image_test.png"/>
  </svg>
</body>
</html>
'''
cleaner = Cleaner(safe_attrs_only=False)
html_clean = cleaner.clean_html(html_content)
print(html_clean)
```

The output of this script is:

```
<div>
<body>
  <svg>
    <img xlink:href="image_test.png">
  </svg>
</body>
</div>
```

The <image> tag was rewritten to an <img> tag, so the image is no longer displayed in browsers.

Here is the configuration I used to test this behavior:
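
One possible direction for a fix would be to rename <image> elements only when they are not inside an <svg> subtree. The snippet below is only a rough sketch of that idea, written against the standard library's xml.etree.ElementTree so that it runs on its own; it is not lxml's code, and the helper name rename_image_outside_svg is hypothetical. It also uses a plain href attribute instead of xlink:href so that the XML parser does not need a namespace declaration.

```python
import xml.etree.ElementTree as ET


def rename_image_outside_svg(element, inside_svg=False):
    """Rename <image> to <img>, except within an <svg> subtree.

    Hypothetical helper for illustration only; not part of lxml.
    """
    if element.tag == "svg":
        # Everything below this element is SVG content and is left alone.
        inside_svg = True
    elif element.tag == "image" and not inside_svg:
        # Legacy HTML <image> outside of SVG: normalize it to <img>.
        element.tag = "img"
    for child in element:
        rename_image_outside_svg(child, inside_svg)


# One legacy HTML <image> and one SVG <image> in the same document.
doc = ET.fromstring(
    "<body>"
    "<image src='legacy.png'/>"
    "<svg><image href='image_test.png'/></svg>"
    "</body>"
)
rename_image_outside_svg(doc)
print(ET.tostring(doc, encoding="unicode"))
```

With this approach the <image> directly under <body> would still be normalized to <img>, while the one nested in <svg> would keep its tag. An actual patch in lxml would also have to account for namespaced SVG elements, which this sketch ignores.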

Python : sys.version_info(major=3, minor=9, micro=2, releaselevel='final', serial=0)
lxml.etree : (4, 9, 2, 0)
libxml used : (2, 9, 14)
libxml compiled : (2, 9, 14)
libxslt used : (1, 1, 35)
libxslt compiled : (1, 1, 35)

Is this a bug, or is this behavior intended? If it is a bug, let me know and I'll be happy to fix it.

Thanks!
