rpm -Va --nofiles Segmentation fault

Bug #910708 reported by Jeff Johnson
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
RPM
Confirmed
High
Jeff Johnson
Mandriva
Confirmed
Critical

Bug Description

rpm -Va is segfaulting while retrieving a pubkey using hkp://

Revision history for this message
Jeff Johnson (n3npq) wrote :

Can't reproduce on i586.

Revision history for this message
Jeff Johnson (n3npq) wrote :
Jeff Johnson (n3npq)
tags: added: mandriva signatures
Denis Silakov (dsilakov)
Changed in rpm:
importance: Undecided → Medium
Revision history for this message
Denis Silakov (dsilakov) wrote :

I can reproduce this on both i586 and x86_64, in the similar way like Per Oyvind (in Mandriva bugzilla). Maybe not crucial, but distressing...

Revision history for this message
Jeff Johnson (n3npq) wrote :

As Per Oyvind has seen, the problem disappears when the
the actual header being retrieved from the database
is reinstalled. So the root cause is something other
than the segfault manifestation.

segfaults are the easiest thing in the world to repair *IF*
one has a reproducer.

I cannot reproduce the process by which the bad data
ends up in an rpmdb (but do not have x86_64 or virtual box
VM's which were part of either this or similar reports).

The code paths involved with signature verification are
quite well tested (details if necessary, see http://hw.jbj.org:8010
waterfalls), and I'm fairly confident that any implementation flaw
will be fixed as MANDATORY signature (if accepted) is deployed.

Meanwhile, its also quite simple to just rip out the code that
attempts to verify binding signatures on pub keys needed
to verify package header signatures during rpm --verify.
The functionality there is dependent on what threat model
one has wrto software distributed through *.rpm packages,
and YMMV, everyone's does.

But agreed "distressing" because of the artificially enhanced "security"
priority as well as the obvious flaw of segfaulting (though the root cause
isn't the segfault per se, but rather bad data being returned from an
rpmdb triggering a segfault, where the data was added through unknown
and likely buggy code paths).

Revision history for this message
Jeff Johnson (n3npq) wrote :

See also https://bugs.launchpad.net/rpm/+bug/910708

As long as developers at ROSA are attempting reproducers
and reporting bugs against RPM, I would question your opinion
     ... maybe not crucial ..
(and yes I realize that the snippet was completed by ", but distressing".

FYI: These issues were reported by me in early August, before mdv2011
was released, as a proposed release BLOCKER, with an offer to "fix".

The issues weren't deemed "crucial" (by Alex Burmashev) then either.

On Jan 20th, this bug importance will be marked "HIGH" and will be targeted
at the first available milestone, and coupled into the MANDATORY signature
checking already proposed as ESSENTIAL, no matter what other opinions ROSA
might have.

Revision history for this message
Denis Silakov (dsilakov) wrote :

Actually Anton Kirilenko (in recent Mandriva bug comments) met this problem not when performing 'request for information' such as 'rpm -qa', but when implementing a new feature in urpmi/urpm-tools. He claims that this problem is a stopper for him.

Revision history for this message
Jeff Johnson (n3npq) wrote :

Note that the code path of rpm -qa does not attempt
to verify signatures and will not call rpmhkpValidate()
afaik.

Anton Kirilenko is encouraged to report his issues either
at the "bankrupt" qa.mandriva.com or here at launchpad.net/rpm,
particularly if he is claiming "Me too." anecdotally and also
claiming that RPM is somehow a "stopper" for his efforts.

I cannot fix what isn't reported. Nor can I fix (until January 20th,
your date for milestone assigment -> implementation) passes.

Hint: You can escalate the priority/importance of this bug more
easily than I can at the moment. I will wait to January 20th
to assess what else is needed: I see this issue as CRITICAL and
switching to MANDATORY signature checking as ESSENTIAL
no matter what.

I'm not at all surprised at the issue appearing. Note that the
core issue has nothing to do with whether RPM segaults
(although RPM has to be made immune to segfaults no
matter what).

But the core issue is that there is bad data being added to
an rpmdb through a different code path. As soon as the
segfault here with --Va (and with the alleged -qa issue) is
"fixed", the core issue -- bad data being added to an rpmdb --
will become invisible and will not be reported ever again (and
will still exist).

Many (I would claim most) segfaults in RPM are data related. In
spite of MANDATORY signature/digest checking on all
data read from *.rpm packages that is being routinely
disabled, as is ACID behavior.

Revision history for this message
Denis Silakov (dsilakov) wrote :

Yes, I agree with concerns about bad data. As for 'stopper' - I'm sure it's a stopper not for the development process, but for usage - i.e., when the tool is invoked in real-life environment, it leads to segfault.

As for the technical side - if I download 4 rpms attached to https://qa.mandriva.com/show_bug.cgi?id=64378 and run rpm only against them (e.g., 'rpm -qp --verify *rpm' - without installing them and then accessing rpmdb), then I also get a segfault.
Perhaps it is easier to perform debugging using this limited set of rpms then trying to bring rpmdb to a state that will cause problems. Though this can be a different issue, just with the same symptoms...

Revision history for this message
Jeff Johnson (n3npq) wrote :

Several FYI comments:

1) if you show me a stack trace and/or -vv output, I can
usually diagnose all issues related to RPM segfaults.

Without that data all I can do is "rpm therapy".

2) The likely culprit for "bad data" here is URPMI. The correct
engineering imho is to get UPMI "make check" under a CI buildbot
soonestly.

Note that Mageia is having a quite similar issue (to lack of automated CI)
wrto usage of timedRead() in genhdlist2 this weekend. Look for Oliver Blin
comments in this weekend's Launcpad sewage from the Mageia tracking bugs.

Make sense?

Revision history for this message
Denis Silakov (dsilakov) wrote :

Yes, sounds reasonable.

Revision history for this message
Jeff Johnson (n3npq) wrote :

Specific to your reported reproducer:

And rpmdb is stateful and persistent: the root cause may
have occurred long before your reproducer.

And rpmdb issues come in waves and pulses when, say, someone
has a "brain fart" on some patch. With the number of patches
that Mandriva (and Per Oyvind) have added, its likely time
to start the process of integration. I.e. I haven't any clue
what issues Mandriva <-> ROSA are seeing because of the number
of patches that are not integrated, and because RPM as used by
Mandriva/ROSA is seriously different from the code I have released
with lots of changes under
     #ifdef RPM_VENDOR_MANDRIVA
patching.

There is a blueprint here to integrate and/or reject per-vendor
"Have it our own way!" patches that I (as project leader @rpm5.org)
am actually forbidden (technically) to comment on or change.

The reality of "rpm therapy" is still the same:
    Show me the backtrace and/or -vv output please.

I actually prefer valgrind because if there is one bug, there's often
others, and other root causes, that need to be considered. But
I have no time to instruct developers on the usage of valgrind while
doing "rpm therapy".

Revision history for this message
Jeff Johnson (n3npq) wrote :

Reproduced (using rather older rpm-5.3.12):

$ rpm --version
rpm (RPM) 5.3.12

$ rpm -Kvv

D: ========== DSA pubkey id 9aa8d0d0 22458a98 (h#2238[0])
drakconf-icons-12.19.2-1.2-mdv2011.0.noarch.rpm:
    Header V3 DSA signature: OK, key ID 22458a98
    Header SHA1 digest: OK (f8edf85cedbba67cb38a79c235f06a86d6105c8f)
    MD5 digest: OK (00be5d2d6637f1ee8181406d029ec2f2)
D: Expected size: 2362939 = lead(96)+sigs(344)+pad(0)+data(2362499)
D: Actual size: 2362939
D: PUB: DD684D7A 26752624 V4 DSA
D: SIG: DD684D7A 26752624 V4 DSA-SHA1 POSITIVE
D: SUB: DD684D7A 26752624 V4 ELG(Encrypt-Only)
D: SIG: DD684D7A 26752624 V4 DSA-SHA1 SUBKEY_BIND
Segmentation fault

Adding --nosignature "fixes" (and there is means to do this persistently)
$ rpm -qp --nosignature *.rpm
doxygen-1.7.4-1-mdv2011.0.x86_64
drakconf-icons-12.19.2-1.2-mdv2011.0.noarch
ktorrent-4.1.1-3-mdv2011.0.x86_64
libgpod-0.8.2-2-mdv2011.0.x86_64

In fact no --signature is needed to do queries (related to Kirilenko's report)
$ rpm -qp *.rpm
doxygen-1.7.4-1-mdv2011.0.x86_64
drakconf-icons-12.19.2-1.2-mdv2011.0.noarch
ktorrent-4.1.1-3-mdv2011.0.x86_64
libgpod-0.8.2-2-mdv2011.0.x86_64

but this is an older rpm-5.3.12 version of rpm (likely what is in mdv2011).

No valgrind on this box (which is also in need of upgrade if I
am to actually diagnose current Cooker bugs in rpm).

Revision history for this message
Jeff Johnson (n3npq) wrote :

Adding 0x26752624 at this URL
     http://sks-keyservers.net/i/
indicates that the pub key is typical (rpm expects self-signed
pubkeys aka POSITIVE_CERTIFICATION, and does not support
subkey signing through gpg).

That pub key is widely used, so there's something specific
to how these pkgs were built (or there would be zillions of
bug reports).

What version of gpg was used to sign these packages? What is
different about how the
     drakconf-icons-12.19.2-1.2-mdv2011.0.noarch.rpm
was produced and signed and distributed?

I will delve into the RFC 2440/4880 packets included in the
drakconf-icons-12.19.2-1.2-mdv2011.0.noarch.rpm tags next
on the way to identifying the root cause.

Revision history for this message
Jeff Johnson (n3npq) wrote :

Note that there is a process flaw here as well. AFAICT,
adding a
      rpm -Kvv *.rpm
for all packages that are about to be distributed would
have caught this issue before all developers started
reporting "Me too" behaviors.

The issue needs to be fixed in RPM no matter what (been there
for months even if unreported).

Revision history for this message
Jeff Johnson (n3npq) wrote :

Hmmm ... there's also something specific in this bug to
the ordering in which the package signatures are verified.

A simple loop doesn't reproduce the flaw:
     for i in *.rpm; do echo "$i --"; rpm -Kvv $i; done

The pubkeys are cached to address concerns about rpm
doing network (or rpmdb) retrievals is the reason for the different
behavior imho.

Off to find valgrind to repair the flaw: I predict that it will also be
dependent on the order in which the 4 pkg signatures are verified
(and I will undertake the testing to confirm that guess).

Revision history for this message
Jeff Johnson (n3npq) wrote :

Simpler reproducer:
    rpm -Kvv drakconf*.rpm ktorrent*.rpm

But even there the problem is transient somehow (so valgrind is
MUSTHAVE to identify).

Revision history for this message
Jeff Johnson (n3npq) wrote :

I should point out that there is a very obvious "fix" here:
      Rip out the code in lib/verify.c that attempts to verify signatures.

This is new functionality in rpm-5.4 (that was back ported to rpm-5.3).

There's no reason why the code MUST be included in Mandriva/ROSA
for any reason that I am aware of.

The code WILL be repaired as part of mandatory signature checking
and multithreading, if those "features" survive into ROSA 2012 deliverables
(the features are ESSENTIAL in @rpm5.org ROADMAP's).

Revision history for this message
Jeff Johnson (n3npq) wrote :

There is also an intermediate solution:

     Make the functionality to verify signatures using --verify "opt-in" rather than "opt-out"
     (as currently implemented) using --nosignatures.

There's no reason to "dog food" all users with code that is segfaulting where the
priorities get skewed towards MUSTFIX and BLOCKER artificially.

The final behavior WILL be "opt-out" with an explicit --nosignatures disabler. What
is making the issue critical is that most developers are unprepared to add explicit disablers.

Denis Silakov (dsilakov)
Changed in rpm:
milestone: none → 5.4.7
Jeff Johnson (n3npq)
Changed in rpm:
milestone: 5.4.7 → 5.4.8
Revision history for this message
Anton Kirilenko (anton-kirilenko) wrote :

It looks like I've found a solution. Structure *rpmhkp_s* has some fields which in fact are the private variables for the function rpmhkpValidate from rpmio/rpmhkp.c. At the moment of the first call these fields are initialized with -1. When this function is executed the second time (to check the second package), these fields are not reinitialized and have some values that can affect the validation of package.

For example (look at rpmhkp.c:1010), field *tvalid* has the last value of *thistime* of pkt. Every pkt's *thistime* have to be greater than the previous one. But the *tvalid* comes from the previous package (it's not 0)! It causes pkt rejection and uvalidx is not updated (in the next line). At the end of the function (line 1058) we check (hkp->uidx >= 0 && hkp->uidx < hkp->npkts) is True and get hkp->pkts[hkp->uvalidx], where uvalidx is given from the previous package. Here segfault is :)

Revision history for this message
Anton Kirilenko (anton-kirilenko) wrote :

And of cause, I forgot to describe the solution: all the *hkp* fields have to be reinitialized before validation. I don't even understand why are these fields needed for. It can be local function variables (because it's an only place these fields are used in)

Revision history for this message
Denis Silakov (dsilakov) wrote :

The patch is included in ROSA 2012 LTS and it doesn't seem to produce any problems.

Why not to integrate the patch to upstream?

Revision history for this message
Jeff Johnson (n3npq) wrote :

The patch is not integrated because the bias in the existing test
cases that did not detect this flaw needs to be addressed.

The patch isn't forgotten and will be integrated as part of MANDATORY
signature checking. Meanwhile there has been little interest
in forward progress since February.

In fact "LSB package format" adoption and revisiting header+payload
signatures re-re-re-adding header+payload signatures hasn't started,
and so I haven't added this patch.

Jeff Johnson (n3npq)
Changed in rpm:
milestone: 5.4.8 → 5.4.11
Changed in mandriva:
importance: Unknown → Critical
status: Unknown → Confirmed
Jeff Johnson (n3npq)
Changed in rpm:
status: New → Confirmed
assignee: nobody → Jeff Johnson (n3npq)
importance: Medium → High
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.