RPM

Bug #637227
Comment #14



In Red Hat Bugzilla #190363, David (david-redhat-bugs) wrote on 2006-05-03:

#14

You are correct. It is beyond my reach to make that assumption for everyone else
on the planet. That's why I restricted myself to doing so in a Fedora Core RFE,
in the Fedora Bugzilla.

In the context of _Fedora_ it's perfectly reasonable to label those who refuse
to use UTF-8 as Luddites. You just have to look at the quality of the
alternative 'solution' that was proposed: hacking every RPM format, from the
specfile through to the database, to tag data in random encodings instead of
just storing it in one consistent encoding in the first place.

Since you persist in trolling the Fedora bugzilla and talking about non-Fedora
issues, I suppose I might as well capitulate and discuss it...

There's no excuse for avoiding UTF-8 in RPM internals, even outside the context
of Fedora. 'Extending' the file formats to tag every string with its charset
would be pointless when we can just store data in UTF-8, which can represent
text from all the older encodings.
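To make the "UTF-8 can represent the older encodings" point concrete, here is a minimal Python sketch (the helper name `round_trips_via_utf8` is made up for illustration). It checks that bytes in a legacy single-byte charset survive a trip through UTF-8 storage and back, assuming the charset maps cleanly onto Unicode, as ISO-8859-1 and KOI8-R do:

```python
def round_trips_via_utf8(raw: bytes, legacy: str) -> bool:
    """Return True if `raw`, encoded in `legacy`, survives legacy -> UTF-8 -> legacy."""
    text = raw.decode(legacy)                          # interpret the legacy bytes
    stored = text.encode("utf-8")                      # what would be stored on disk
    restored = stored.decode("utf-8").encode(legacy)   # convert back for display
    return restored == raw


# Every possible ISO-8859-1 byte value, plus some Cyrillic text in KOI8-R.
print(round_trips_via_utf8(bytes(range(256)), "iso-8859-1"))      # → True
print(round_trips_via_utf8("Привет".encode("koi8-r"), "koi8-r"))  # → True
```

No information is lost, so a charset tag on each string buys nothing that a single agreed encoding doesn't already give you.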

We can quite happily fix rpmq to convert from UTF-8 to the current locale in its
output, and fix rpmbuild to convert _to_ UTF-8 from the current locale when
reading the specfile. We certainly wouldn't want the latter in Fedora, though.
If I check out the current libxml2/devel branch from CVS and attempt to build
it, for example, it should _fail_. It certainly shouldn't use _my_ locale (and
it would fail anyway, because my locale is of course UTF-8 and the specfile's
contents aren't).
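Roughly, the two conversions described above might look like this. It's a Python sketch with hypothetical helper names, not rpm's actual code or API. `strict=True` models the Fedora behaviour where a non-UTF-8 specfile makes the build fail instead of falling back to the builder's locale:

```python
# Hypothetical helpers illustrating the proposal; these are not rpm's real API.

def header_for_output(value: bytes, target: str) -> bytes:
    """rpmq-style output: convert a stored UTF-8 header string to `target`.

    Characters the target charset cannot represent become '?'.
    """
    return value.decode("utf-8").encode(target, errors="replace")


def specfile_to_utf8(raw: bytes, source: str, strict: bool) -> bytes:
    """rpmbuild-style input: return the specfile contents as UTF-8.

    strict=True: input must already be valid UTF-8, otherwise
    UnicodeDecodeError is raised and the build fails.
    strict=False: decode using `source` (e.g. the builder's locale charset).
    """
    if strict:
        raw.decode("utf-8")  # validate only; raises on non-UTF-8 bytes
        return raw
    return raw.decode(source).encode("utf-8")


stored = "Café".encode("utf-8")
print(header_for_output(stored, "ascii"))  # → b'Caf?'

latin1_spec = b"Summary: Caf\xe9"
print(specfile_to_utf8(latin1_spec, "iso-8859-1", strict=False))  # → b'Summary: Caf\xc3\xa9'
try:
    specfile_to_utf8(latin1_spec, "iso-8859-1", strict=True)
except UnicodeDecodeError:
    print("build fails: specfile is not UTF-8")
```

In a real tool the `target` and `source` charsets would come from the locale (in Python, for instance, `locale.getpreferredencoding(False)`). Replacing unrepresentable characters on output is one possible choice. Refusing to print would be another.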

You'd need a way to handle existing RPM databases, which may contain random data
in unknown encodings. Probably an 'rpm --rebuilddb --oldcharset=FOO' on RPM
upgrade? This isn't a new problem _anyway_, since an existing RPM database
with neither a consistent charset nor charset tagging is just line noise.
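Below is a rough Python model of what a `--rebuilddb --oldcharset=FOO` pass could do. `--oldcharset` is only the option suggested here, not an existing rpm flag. The database is modelled as a plain dict of package name to raw header strings, and the rule "leave anything that already decodes as UTF-8 alone" is my own assumption:

```python
from typing import Dict, List


# Hypothetical model of `rpm --rebuilddb --oldcharset=FOO`; not real rpm code.

def migrate_value(raw: bytes, oldcharset: str) -> bytes:
    """Convert one stored string to UTF-8.

    Assumption: bytes that already decode as UTF-8 are left untouched; only
    the rest are treated as `oldcharset`. This is a heuristic, since some
    legacy byte sequences also happen to be valid UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(oldcharset).encode("utf-8")


def rebuild_db(db: Dict[str, List[bytes]], oldcharset: str) -> Dict[str, List[bytes]]:
    """Return a copy of `db` with every string converted to UTF-8."""
    return {name: [migrate_value(v, oldcharset) for v in values]
            for name, values in db.items()}


old_db = {"example": [b"plain ASCII", b"Caf\xe9", "Café".encode("utf-8")]}
print(rebuild_db(old_db, "iso-8859-1"))
# → {'example': [b'plain ASCII', b'Caf\xc3\xa9', b'Caf\xc3\xa9']}
```

Skipping already-valid UTF-8 avoids double-encoding entries that packagers had already written in UTF-8. A stricter variant would convert everything from `oldcharset` unconditionally.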

And of course you might have to call it 'RPM-CHARSET' instead of 'UTF-8' to
appease those who have religious objections to UTF-8.