prettify(): order of attributes inside meta elements

Bug #1812422 reported by Greg Burek
This bug affects 1 person
Affects: Beautiful Soup
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

prettify() seems to sort the attributes alphabetically.
This causes problems in MadCap Flare, which expects <meta name="..." content="..." />, not <meta content="..." name="..." />.
Also, it is probably easier for humans to read when the "name" attribute comes first.
prettify() should probably have special handling of meta elements to ensure this order.

BS4 version: 4.6.3

Leonard Richardson (leonardr) wrote :

Thanks for filing this ticket.

I'm reluctant to answer bug reports with "you shouldn't be doing that", but I think there's a chunk of that here. The job of prettify() is to make the structure of a parsed document obvious to the human eye. It's not intended to produce a document that means the same thing as the original, and I don't recommend using it as the input into another program. The discussion of bug #1697296 shows a case where there's a significant difference between what the human eye expects to see and what's actually going on in a document.

But even if you stop using prettify(), the attributes are going to be output in a particular order, and it probably won't be the order that MadCap Flare can handle as input.

I agree that (name, content) is easier to read than (content, name), but that's a subjective opinion about one pair of attributes in one tag, and there's no necessary connection to what a given product can parse. In this particular case, MadCap Flare can only process the "more readable" order, but maybe some other product can only process the "less readable" order.

So this is a general problem, and a full solution would require giving the programmer the ability to define a sort order for attributes within a given tag. The Formatter class would be the place to put this.
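As a rough illustration of the per-tag sort order described above, here is a minimal sketch in plain Python. The names PREFERRED_ORDER and order_attributes are hypothetical and are not part of Beautiful Soup; the sketch only shows the kind of ordering logic a Formatter-level hook could apply.

```python
# Hypothetical per-tag attribute priorities; anything not listed falls back
# to alphabetical order after the preferred attributes.
PREFERRED_ORDER = {
    "meta": ["name", "content"],
}


def order_attributes(tag_name, attrs):
    """Return (name, value) pairs for one tag in the configured order."""
    preferred = PREFERRED_ORDER.get(tag_name, [])

    def sort_key(item):
        name = item[0]
        if name in preferred:
            # Preferred attributes come first, in their configured position.
            return (0, preferred.index(name), name)
        # Everything else follows, sorted alphabetically.
        return (1, 0, name)

    return sorted(attrs.items(), key=sort_key)


meta_attrs = {"content": "Example page", "charset": "utf-8", "name": "description"}
ordered_meta = order_attributes("meta", meta_attrs)

link_attrs = {"href": "/index.html", "class": "nav"}
ordered_link = order_attributes("a", link_attrs)
```

With this configuration, meta tags list "name" and "content" first, while tags without an entry keep plain alphabetical order.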

Greg Burek (gburek) wrote :

What about preserving the order the attributes were inserted via collections.OrderedDict instead?

Leonard Richardson (leonardr) wrote :

That sounds like a good compromise. I'll try it out.

Isaac Muse (facelessuser) wrote :

Interestingly, as of Python 3.7 dictionaries preserve insertion order by default. So a good test would be to see if Python 3.7 gives you what you're looking for. Since that is where the language is headed, I think it makes sense to preserve attribute order by default in Beautiful Soup.
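To make the ordering point concrete, here is a small standard-library sketch. It compares a plain dict (insertion-ordered on Python 3.7 and later), a collections.OrderedDict, and the alphabetical order produced by sorting. The attribute values are made up for illustration.

```python
from collections import OrderedDict

# Attribute pairs in the order they appear in the source markup.
pairs = [("name", "description"), ("content", "Example page")]

plain = dict(pairs)            # insertion-ordered on Python 3.7+
ordered = OrderedDict(pairs)   # insertion-ordered on older versions too

insertion_order = list(plain)      # keys as they were inserted
alphabetical_order = sorted(plain)  # keys after an explicit sort

print(insertion_order)     # ['name', 'content']
print(alphabetical_order)  # ['content', 'name']
```

Any explicit sort applied on output discards the insertion order, whichever dictionary type holds the attributes.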

Greg Burek (gburek) wrote :

Isaac,

Yes, 3.7 is what I use. The problem is that the attributes are force-sorted by bs4.element.Tag.decode, so I had to monkey-patch it with a copy that has the sorting removed.

Isaac Muse (facelessuser) wrote :

Ah, yeah, that does make sense. Forced sorting adds some predictability. I like the idea of preserving order though. If I get some time, I'll probably throw together a merge request.

Isaac Muse (facelessuser) wrote :

Unfortunately, looking into this a bit more, there isn't currently a way to make this sane across the board. It appears we could successfully force preserved ordering in html.parser, but not in lxml or html5lib (at least below Python 3.7), because we can't really control how those libraries return attributes. That said, in Python 3.7 and above you could theoretically just avoid sorting, as I imagine dictionary usage in lxml and html5lib would then be ordered (unless they opt to sort).
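One reason order can be preserved with html.parser is that the standard-library parser passes attributes to handle_starttag as a list of (name, value) pairs in source order. The sketch below shows only that stdlib behavior; the class name AttributeOrderRecorder is hypothetical, and nothing here demonstrates how lxml or html5lib behave.

```python
from html.parser import HTMLParser


class AttributeOrderRecorder(HTMLParser):
    """Record the attribute names of every start tag, in the order received."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, so source order survives.
        self.seen.append((tag, [name for name, _ in attrs]))


recorder = AttributeOrderRecorder()
recorder.feed('<meta name="description" content="Example page">')
recorder.close()

print(recorder.seen)  # [('meta', ['name', 'content'])]
```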

Leonard Richardson (leonardr) wrote :

I spent some time duplicating Isaac's investigation. We can't get consistent behavior across Python versions, but we can make it possible to turn off sorting.

This kind of presentation decision is the job of the formatter. In revision 505 I moved the sort functionality into the Formatter class. I haven't figured out the best way to document this yet. I don't know how often this feature is needed so I don't know how much time to spend on it.

As a side effect, this means that attributes won't be sorted if you explicitly tell Beautiful Soup not to use a formatter, or if you pass in a function as 'formatter' rather than a Formatter object.
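A minimal sketch of turning off sorting through the formatter, assuming a Beautiful Soup release that ships bs4.formatter.HTMLFormatter with an overridable attributes() method, and Python 3.7 or later so that tag.attrs keeps insertion order. The class name UnsortedAttributes and the sample markup are illustrative only.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


class UnsortedAttributes(HTMLFormatter):
    """Formatter that emits attributes in the order the tag stores them."""

    def attributes(self, tag):
        # The default implementation returns the attributes sorted by name;
        # yielding the stored items directly skips that sort.
        yield from tag.attrs.items()


soup = BeautifulSoup(
    '<meta name="description" content="Example page">', "html.parser"
)

default_output = soup.meta.decode()
unsorted_output = soup.meta.decode(formatter=UnsortedAttributes())

print(default_output)   # attributes sorted alphabetically: content, then name
print(unsorted_output)  # attributes in source order: name, then content
```

The same formatter object can also be passed as prettify(formatter=UnsortedAttributes()). Whether source order survives still depends on the parser that built the tree.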

Changed in beautifulsoup:
status: New → Confirmed
Changed in beautifulsoup:
status: Confirmed → Fix Released
Leonard Richardson (leonardr) wrote :

I moved a significant amount of other functionality into the Formatter class, so I'm now comfortable with the amount of documentation coverage I gave this feature.
