Comment 3 for bug 1686408

Leonard Richardson (leonardr) wrote :
  • a (1.7 KiB, text/plain)

Thanks, I see what you're saying now. Your expected behavior is that strings should be combined whenever they become adjacent. Here's a simpler example that illustrates the same behavior:

from bs4 import BeautifulSoup
soup = BeautifulSoup("<b>foo</b>")
soup.b.append("bar")
soup.b.contents
# [u'foo', u'bar']

This is a reasonable request but I'm not going to make the change. It's easy to join strings yourself but impossible to separate them once they're joined. The way I use Beautiful Soup, it's more useful to keep track of strings separately and join them if necessary when outputting markup. So I'm not convinced that the users who notice this change would welcome it on balance.
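For anyone who does want the strings combined, here's a rough sketch of joining them yourself. This is not the attached patch and not part of Beautiful Soup: merge_adjacent_strings is a hypothetical helper name, it only looks at a tag's direct children, and passing "html.parser" is my own choice of parser. Treat it as an illustration under those assumptions.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def merge_adjacent_strings(tag):
    """Hypothetical helper: merge runs of adjacent plain strings
    among the direct children of `tag`. Not part of Beautiful Soup."""
    run = []  # the current run of adjacent plain strings
    # Iterate over a copy, because the loop modifies tag.contents.
    # The trailing None is a sentinel that flushes the final run.
    for child in list(tag.contents) + [None]:
        # The exact type check skips Comment, CData and other
        # NavigableString subclasses, which should stay separate.
        if child is not None and type(child) is NavigableString:
            run.append(child)
            continue
        if len(run) > 1:
            merged = NavigableString("".join(run))
            run[0].replace_with(merged)  # first string becomes the merged one
            for leftover in run[1:]:
                leftover.extract()       # drop the rest of the run
        run = []


soup = BeautifulSoup("<b>foo</b>", "html.parser")
soup.b.append("bar")
count_before = len(soup.b.contents)   # 'foo' and 'bar' are separate strings
markup_before = str(soup)

merge_adjacent_strings(soup.b)
count_after = len(soup.b.contents)    # a single 'foobar' string
markup_after = str(soup)
```

The serialized markup is <b>foobar</b> both before and after the call, so the merge only matters to code that inspects .contents directly.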

This is a change of moderate complexity, which I'm trying to avoid in a project that's in the maintenance phase of its lifecycle. The change would go in a place that's likely to create a lot of subtle edge-case bugs. html5lib does something similar on the initial document parse and it's caused me a lot of grief over the years.

That said, here's the patch I wrote while investigating this issue. It works in the simple case but, as I said, it has edge-case bugs. In particular, it breaks unwrap().