RPM

Duplicate dirnames in rpm headers and unsorted dirnames in general.

Bug #638618 reported by Jeff Johnson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
RPM
New
Undecided
Unassigned
Fedora
Won't Fix
Medium

Bug Description

tracker

Revision history for this message
In , Phil (phil-redhat-bugs) wrote :
Download full text (4.5 KiB)

Description of problem:

This bug has been around since the beginning of time. Actually not really, but since very early versions of rpm where the split of dirnames and basenames was introduced.

The way rpm generates dirnames for a build is simply put like this (in pseudocode):

Generate filelist with dirs
Sort filelist
for each file in filelist:
  Split dname and bname from file
  Search dirindex for dname in dirnames using glibc bsearch
  if found, use dirindex for that bname
  otherwise append dname to dirnames and use new dirindex for that bname
  append bname to basenames

Now on first glance this sounds correct. But if you look closely at how the filelist is generated you'll suddenly see that dirnames afterwards isn't actually a sorted list. Here a real life example of glibc-common.i386 from Fedora-9 release (glibc-common-2.8-3.i386.rpm)

Output of rpm -qpl glibc-common-2.8-3.i386.rpm:

/etc/default
/etc/default/nss
/usr/bin/catchsegv
/usr/bin/gencat
/usr/bin/getconf
/usr/bin/getent
/usr/bin/iconv
/usr/bin/ldd
/usr/bin/lddlibc4
/usr/bin/locale
/usr/bin/localedef
/usr/bin/rpcgen
/usr/bin/sprof
/usr/bin/tzselect
/usr/lib/locale
/usr/lib/locale/locale-archive
/usr/lib/locale/locale-archive.tmpl
/usr/libexec/pt_chown
/usr/sbin/build-locale-archive
/usr/sbin/tzdata-update
/usr/sbin/zdump
/usr/sbin/zic
/usr/share/doc/glibc-common-2.8
/usr/share/doc/glibc-common-2.8/ChangeLog.15.bz2
/usr/share/doc/glibc-common-2.8/ChangeLog.16.bz2
/usr/share/doc/glibc-common-2.8/ChangeLog.bz2
/usr/share/doc/glibc-common-2.8/README.timezone
/usr/share/doc/glibc-common-2.8/README.ufc-crypt
/usr/share/doc/glibc-common-2.8/gai.conf
/usr/share/i18n
/usr/share/i18n/charmaps
/usr/share/i18n/charmaps/ANSI_X3.110-1983.gz
/usr/share/i18n/charmaps/ANSI_X3.4-1968.gz
...
/usr/share/i18n/charmaps/UTF-8.gz
/usr/share/i18n/charmaps/VIDEOTEX-SUPPL.gz
/usr/share/i18n/charmaps/VISCII.gz
/usr/share/i18n/charmaps/WINDOWS-31J.gz
/usr/share/i18n/locales
/usr/share/i18n/locales/POSIX
/usr/share/i18n/locales/aa_DJ
...
/usr/share/i18n/locales/zh_SG
/usr/share/i18n/locales/zh_TW
/usr/share/i18n/locales/zu_ZA
/usr/share/locale
/usr/share/locale/be
/usr/share/locale/be/LC_MESSAGES
/usr/share/locale/be/LC_MESSAGES/libc.mo
/usr/share/locale/bg
/usr/share/locale/bg/LC_MESSAGES
/usr/share/locale/bg/LC_MESSAGES/libc.mo
/usr/share/locale/ca

Side by side comparison of dirnames and LANG=C sorted dirnames:

/etc/ /etc/
/etc/default/ /etc/default/
/usr/bin/ /usr/bin/
/usr/lib/ /usr/lib/
/usr/lib/locale/ /usr/lib/locale/
/usr/libexec/ /usr/libexec/
/usr/sbin/ /usr/sbin/
/usr/share/doc/ /usr/share/
/usr/share/doc/glibc-common-2.8/ /usr/share/
/usr/share/ /usr/share/doc/
/usr/share/i18n/ /usr/share/doc/glibc-common-2.8/
/usr/share/i18n/charmaps/ /usr/share/i18n/
/usr/share/i18n/locales/ /usr/share/i18n/charmaps/
/usr/share/ /usr/share/i18n/locales/
/usr/share/locale/ /usr/share/locale/

As you can see, /usr/share/ is listed twice in the dirnames list. This comes from the simple fact that after the splitting of dirnames and basenames for the files the resulting dirnames aren't sorted anymore:

...
/usr/share/doc/glibc-common-2.8 -> /usr/share/doc/ : glibc-common-2.8
...
/usr/share/i18n -> /usr/shar...

Read more...

Revision history for this message
In , R (r-redhat-bugs) wrote :

Phil

Why not just one pass, hashing each dir, and inserting into a b-tree, spotting [and avoiding] collisions (and thus dupes)?

-- Russ herrold

Revision history for this message
In , Phil (phil-redhat-bugs) wrote :

Hm, how would you manage the dirindexes for the basenames then? I suppose instead of using direct integers you could make it a pointer to a dirindex entry in a struct of the dirname and dirindex with which you populate the b-tree.

Sample:

struct dnames {
    int dirindex;
    char *dirname;
    struct dnames *left;
    struct dnames *right;
};

and have dirindexes:

int *dirindexes;
dirindexes = malloc(sizeof(int *) * len_of_filelist);

This way you can at least build the b-tree and the basenames list and the dirindexes pointer list in 1 pass and just need a quick 2nd pass to generate the final dirnames list together with the correct dirindexes in the b-tree and then save the values of the pointers in dirindexes. And as the dirnames should be a lot smaller than the filelist the time overhead should be minimal.

Revision history for this message
In , Jeff (jeff-redhat-bugs) wrote :

Let's try a reality check, shall we?

The savings in removing duplicates in RPMTAG_DIRNAMES is infintessimal. Do the math ...
E.g. the savings is (at least) an order of magnitude smaller than the added overhead
of attaching file capabilities to *every* file so that perhaps 100+ files can be installed
with capabilities.

The data in RPMTAG_DIRNAMES has *never* been guaranteed to be sorted. Examples
of unsorted RPMTAG_DIRNAMES are everywhere.

Pursuing a sort order for RPMTAG_DIRNAMES hardly changes anything. Unsorted
RPMTAG_DIRNAMES will remain in *.rpm packages forevermore. So rpm wil
continue to have to sort RPMTAG_DIRNAMES if bsearch is desired. Luckily, there's
no access on any critical path in rpm that would benefit from pre-sorted
RPMTAG_DIRNAMES.

There is no benefit -- to rpm, to developers, to Red Hat, to LSB, etc -- from
sorting RPMTAG_DIRNAMES. Nor are there any forseeable benefits from
a Newer! Better! Bestest! implementation that I can conceive of even after smoking
a lot of crack ...

Using a B-tree, or a data structure, to achieve a 1-pass generation of simultaneously
sorted RPMTAG_DIRNAMES and RPMTAG_FILENAMES with a RPMTAG_DIRINDEXES lookup
is just bleeping overkill. At least do the sort by embedding OCAML or using an Oracle
database optimized for parallel retrieval or something manly please ...

Revision history for this message
In , Bug (bug-redhat-bugs) wrote :

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Revision history for this message
In , Bug (bug-redhat-bugs) wrote :

This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 11 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Revision history for this message
In , Bug (bug-redhat-bugs) wrote :

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Jeff Johnson (n3npq)
tags: added: dirnames fedora
Changed in fedora:
importance: Unknown → Medium
status: Unknown → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.