Comment 1 for bug 1902431

Fake Name (lemuix-2) wrote : Re: Query strings in link href attributes are being spuriously escaped

As a horrible hack, going into element.py and special-casing `href` attributes makes things work, though it's probably super buggy. I changed:

```
        for key, val in attributes:
            if val is None:
                decoded = key
            else:
                if isinstance(val, list) or isinstance(val, tuple):
                    val = ' '.join(val)
                elif not isinstance(val, str):
                    val = str(val)
                elif (
                        isinstance(val, AttributeValueWithCharsetSubstitution)
                        and eventual_encoding is not None
                ):
                    val = val.encode(eventual_encoding)

                text = formatter.attribute_value(val)
                decoded = (
                    str(key) + '='
                    + formatter.quoted_attribute_value(text))
            attrs.append(decoded)
        close = ''
        closeTag = ''
```

to

```
        for key, val in attributes:
            if val is None:
                decoded = key
            else:
                if isinstance(val, list) or isinstance(val, tuple):
                    val = ' '.join(val)
                elif not isinstance(val, str):
                    val = str(val)
                elif (
                        isinstance(val, AttributeValueWithCharsetSubstitution)
                        and eventual_encoding is not None
                ):
                    val = val.encode(eventual_encoding)
                if key == 'href':
                    text = str(val)
                else:
                    text = formatter.attribute_value(val)
                decoded = (
                    str(key) + '='
                    + formatter.quoted_attribute_value(text))
            attrs.append(decoded)
        close = ''
        closeTag = ''
```

The core of the issue appears to be that the "minimal" EntitySubstitution() instance still replaces "&" with "&amp;", even inside attribute values.
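For reference, here's a minimal sketch of the behaviour without patching element.py. It assumes `bs4` is installed and uses a made-up example.com URL. It compares the "minimal" formatter with `formatter=None`, which the BS4 docs describe as not modifying strings at all on output (with the caveat that it can then emit invalid markup elsewhere in the document):

```python
from bs4 import BeautifulSoup

# Hypothetical URL, for illustration only.
markup = '<a href="http://example.com/?foo=1&bar=2">link</a>'
soup = BeautifulSoup(markup, "html.parser")

# The "minimal" formatter escapes "&", "<" and ">", including in attributes.
escaped = soup.a.decode(formatter="minimal")

# formatter=None turns entity substitution off entirely.
raw = soup.a.decode(formatter=None)

print(escaped)
print(raw)
```

Note that `formatter=None` applies to the whole output, not just `href`, so text nodes containing "<" or "&" are left unescaped as well.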

I have no idea what's "correct" from a spec perspective here, but I can say that BS4 seems unable to generate the HTML output I need, and I don't *think* having multi-parameter query strings in an anchor tag is invalid in any variant of HTML.
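On the spec question, one data point that may help: as far as I can tell, HTML parsers decode character references inside attribute values, so an `href` written with "&amp;" should come back out as the same URL with a plain "&". A quick sketch using only the standard library's `html.parser` (the `HrefCollector` class and the URL are made up for illustration):

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href values of <a> tags as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entity and character
        # references in the values have already been decoded.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


collector = HrefCollector()
collector.feed('<a href="http://example.com/?foo=1&amp;bar=2">link</a>')
print(collector.hrefs)
```

This only shows what one parser does with the escaped form. It says nothing about consumers that read the raw markup without parsing it as HTML, which is presumably where the escaped output causes trouble.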