Comment 0 for bug 1838877

Kamil Mahmood (kamilmahmood) wrote :

```
from bs4 import BeautifulSoup

markup = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://新2网址(www.ydsjyj.com)-时时彩平台,(www.xinyushishicai.com)-澳门赌场(www.amdc999.com)">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
    <title>时时彩娱乐-首页</title>
    <meta content="时时彩娱乐,时时彩娱乐网址,时时彩娱乐平台,时时彩娱乐官网" name="keywords" />
    <meta content="时时彩娱乐官网✅✅ 是全网最诚信,口碑最好的彩票平台!提款速度最快,赔率高达9.999 极力为您提供注册、登陆、下载、测速等服务.时时彩娱乐祝您玩的愉快开心。" name="description" />
    <title>时时彩娱乐-首页</title>
</head>

<body>
    <h1><a href="http://4b2s.com/">时时彩娱乐</a></h1>
</body>
</html>
"""

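# Passing the markup as str raises:
# TypeError: cannot use a bytes pattern on a string-like object
soup = BeautifulSoup(markup, features="lxml")

# Passing UTF-8 bytes with an explicit encoding does not raise,
# but the resulting document is empty.
soup = BeautifulSoup(markup.encode("utf-8"), features="lxml", from_encoding="utf-8")
# Prints an empty string
print(str(soup))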
```

The HTML markup above is a small portion of a much larger HTML file.
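
When narrowing down a large file like this, it may help to check whether the DOCTYPE declaration is the part that matters. The sketch below is only a diagnostic aid: it assumes, without this report confirming it, that the non-ASCII characters in the DOCTYPE are relevant to the failure. The helper names `doctype_has_non_ascii` and `strip_doctype` are hypothetical and are not part of Beautiful Soup or lxml.

```python
import re

# Matches the first DOCTYPE declaration up to its closing ">".
# Assumption: the declaration contains no ">" inside its quoted strings.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def doctype_has_non_ascii(markup: str) -> bool:
    """Return True if the markup has a DOCTYPE containing non-ASCII characters."""
    match = DOCTYPE_RE.search(markup)
    if match is None:
        return False
    # ord() check instead of str.isascii() so this also runs on Python 3.6.
    return any(ord(ch) > 127 for ch in match.group(0))


def strip_doctype(markup: str) -> str:
    """Return the markup with its first DOCTYPE declaration removed."""
    return DOCTYPE_RE.sub("", markup, count=1)
```

Calling `doctype_has_non_ascii(markup)` on the markup above should flag the declaration, and the result of `strip_doctype(markup)` can then be passed to `BeautifulSoup(..., features="lxml")` to see whether the behaviour changes. Whether that avoids the `TypeError` has not been verified here.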

System information
Uname Result: 5.0.0-23-generic #24~18.04.1-Ubuntu SMP Mon Jul 29 16:12:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Python Version: 3.6.8

Libraries
beautifulsoup4==4.7.1
lxml==4.3.3