RPM

memory leak in rpmlib

Bug #651509 reported by Jeff Johnson
This bug affects 1 person
Affects   Status      Importance   Assigned to   Milestone
RPM       New         Undecided    Unassigned
Fedora    Won't Fix   Medium

Bug Description

tracker

Revision history for this message
In , Miroslav (miroslav-redhat-bugs) wrote :

Description of problem:
In /lib/misc.c, within the routine rpmHeaderGetEntry (which commonly shows up in the Signal 6 core files):

        /* XXX FIXME: memory leak. */
        msgstr = headerSprintf(h, fmt, rpmTagTable, rpmHeaderFormats,
                               &errstr);
        if (msgstr) {
            *p = (void *) msgstr;
            if (type) *type = RPM_STRING_TYPE;
            if (c) *c = 1;
            return 1;
        } else {
            if (c) *c = 0;
            return 0;
        }

I checked the latest rpm (rpm-4.4.2-48) and it seems that this code is still there. According to our findings (see BZ 173424), this leak causes problems in RHN Satellite in the long run.

Version-Release number of selected component (if applicable):
rpm-4.4.2-48

How reproducible:
With difficulty; see 173424 for details.

Steps to Reproduce:
1. Install RHN Satellite with the specspo package.
2. Put it under high load.
3. Try to rhnpush some package several times.
4. rpmlib start to leaks and it will result in seg faults of httpd.

Actual results:
seg faults of httpd

Expected results:
rpmlib does not leak

In , Panu (panu-redhat-bugs) wrote :

AFAICT the memleak comment in rpmHeaderGetEntry() refers to the fact that for summary, description and group it returns a malloced string of RPM_STRING_TYPE which headerFreeData() doesn't free. The python bindings "know" this funky little detail and take care of it, and at least I'm not able to reproduce leakage from that.

Looking at the dumps in bug 173424, it seems to me more like setenv() related memory corruption, not leak. The rpm tag translation fiddles LANGUAGE environment variable back and forth for each translated item, and increments _nl_msg_cat_cntr on each change. On a very busy box, I could imagine _nl_msg_cat_cntr possibly wrapping around and maybe something can't handle that - I dunno, that's just a wild guess but there's all sorts of things piled up in here, for example perl doing something in this area:

==4465== Invalid free() / delete / delete[]
==4465== at 0x1B8FF382: free (vg_replace_malloc.c:235)
==4465== by 0x1BFFA0DE: Perl_safesysfree (in /usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE/libperl.so)
==4465== by 0x1BFFDC07: Perl_my_setenv (in /usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE/libperl.so)
==4465== by 0x1BDF9C0A: mod_perl_pass_env (perl_config.c:207)

To put it another way, obviously the rpm translation code is causing problems (the way it works is pretty wicked), but whether that's the bug or it is just triggering problems elsewhere is not that clear.
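
As a rough illustration of the pattern described above (this is not rpm's actual code), the following sketch switches LANGUAGE once per translated item and restores it afterwards. The names temporary_language, translate_all and lookup are hypothetical. The point is only that every translated item costs two changes to the process environment, which is what makes the approach fragile when other code in the same process (perl, in this case) also manages the environment.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_language(lang):
    """Set LANGUAGE for the duration of one lookup, then restore it."""
    saved = os.environ.get("LANGUAGE")  # None if it was not set at all
    os.environ["LANGUAGE"] = lang       # modifies the real process environment
    try:
        yield
    finally:
        if saved is None:
            os.environ.pop("LANGUAGE", None)  # put it back to "unset"
        else:
            os.environ["LANGUAGE"] = saved    # put the old value back


def translate_all(items, lang, lookup):
    """Translate each item, flipping LANGUAGE back and forth once per item.

    `lookup` stands in for whatever message-catalog lookup is used; it is
    an assumption of this sketch, not an rpm API.
    """
    translated = []
    for item in items:
        with temporary_language(lang):
            translated.append(lookup(item))
    return translated
```

With N translated tags per header and many headers per request, a busy server ends up modifying its environment a very large number of times.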

In the meantime, there's a much less intrusive way to disable the translations than having spacewalk conflict with specspo:

rpm.delMacro("_i18ndomains")

I'm not familiar with the spacewalk codebase so I can't suggest where exactly to put it, but somewhere after the rpm module has been loaded will do.
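
A minimal sketch of what such a call site might look like, assuming only that the rpm Python module has already been imported by the application. The helper name disable_rpm_translations is hypothetical, and the module is passed in as an argument purely so the sketch can be exercised without rpm installed.

```python
def disable_rpm_translations(rpm_module):
    """Delete the _i18ndomains macro so rpm stops translating tags.

    rpm_module is the already-imported rpm Python module. Passing it in,
    rather than importing it here, is a choice of this sketch only.
    """
    rpm_module.delMacro("_i18ndomains")


# Hypothetical call site, somewhere early in application startup:
#   import rpm
#   disable_rpm_translations(rpm)
```

This must run in every process that reads headers, before the first header is read, since the macro is per-process state.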

In , Panu (panu-redhat-bugs) wrote :

FWIW, perl's environment handling seems to be somewhat controversial. Possibly related:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=142523, for which the upstream report is here: http://rt.perl.org/rt3/Public/Bug/Display.html?id=1170

Also mod_perl has some interesting commentary:
     /* Force the environment to be copied out of its original location
        above argv[]. This fixes a crash caused when a module called putenv()
        before any Perl modified the environment - environ would change to a
        new value, and the check in my_setenv() to duplicate the environment
        would fail, and then setting some environment value which had a previous
        value would cause perl to try to free() something from the original env.
        This crashed free(). */
     my_setenv("MODPERL_ENV_FIXUP", "0");
     my_setenv("MODPERL_ENV_FIXUP", NULL);

CC'ing perl maintainer for possible comments.

In , Marcela (marcela-redhat-bugs) wrote :

I suppose rhn is using mod_perl for httpd? CC'ing mod_perl maintainer for his thoughts.

In , Miroslav (miroslav-redhat-bugs) wrote :

Yes, we use mod_perl for httpd.
For RHEL5 we use mod_perl from RHEL.
For RHEL4 we package our own mod_perl (src.rpm taken from Red Hat Web Application Stack) since we need 2.0 and plain RHEL4 has 1.99.

In , Joe (joe-redhat-bugs) wrote :

The env var handling in mod_perl 1.x looks like serious voodoo to me. Do you have some PerlPassEnv configured here?

There's lots I don't understand here.

1) this is reported against RHEL5 but bug 173424 seems to be talking about RHEL4/3 only. Is this problem reproducible on RHEL5 at all? With the RHEL5 httpd/mod_perl stack? mod_perl 1.x is *way* different from 2.x.

2) The report says:

"rpmlib start to leaks and it will result in seg faults of httpd."

Is this two separate problems? A memory leak, and an unrelated crash? A memory leak which is leading to OOM and hence httpd crashing? Or do you not mean "start to leaks" but "starts to corrupt memory", or what? Or is it just conjecture that rpmlib is involved?

In , Miroslav (miroslav-redhat-bugs) wrote :

> Is this problem reproducible on RHEL5 at all?
I will try to reproduce it for RHEL5. I will try to find time for this next week.

> A memory leak, and an unrelated crash? A
> memory leak which is leading to OOM and hence httpd crashing? Or do you not
> mean "start to leaks" but "starts to corrupt memory", or what?

I think it was an OOM crash, but I am pinging cperry, who has been working on that issue, to clarify. Cliff?

In , Jan (jan-redhat-bugs) wrote :

(In reply to comment #6)
> > Is this problem reproducible on RHEL5 at all?
> I will try to reproduce it for RHEL5. I will try to find time for this next
> week.

Mirek, what's the status of getting a reproducer for this?

If we do not have a reproducer, I intend to just ask QA to start testing Satellite with specspo installed, and hopefully their automation tests will be able to get some reproducer for us. Or not, in which case the problem simply does not materialize with the latest Apache / rpm / whatever.

But I do not like the fact that we are removing specspo in install.pl, never giving QA a chance to test the thing.

In , Denise (denise-redhat-bugs) wrote :

Since there is no agreement on the problem or the fix, this is moving out for consideration in 5.5.

In , Clifford (clifford-redhat-bugs) wrote :

I will be honest in saying that I have not looked at this issue for 2+ years, since we just removed specspo from the OS of Satellite systems to stop the Apache 1.3 segmentation faults from occurring. The Sig 11's were happening when mod_python called rpmlib to read rpm headers of rpms being uploaded into a Satellite via apache. When specspo was installed we did some sort of in-memory string translation which *somewhere* messed things up, eventually leading to corruption and sig11 of Apache. At the time the current apache and rpm maintainers (over email communications) were unable to provide a solution other than the one I ultimately chose for Satellite.

I think the specspo would try to re-allocate a region of memory that had not been freed up yet, or buffer overflow (do not remember exactly). Somewhere/somehow things got confused and just crashed :)

While it would be great if a new OS (RHEL 3 vs 4/5) or a new Apache (1.3 vs 2.x) helped, I doubt it.

If during Mirek's testing he is unable to replicate the issue any more, maybe glibc or Apache is doing something better, rpm is better, or it just disappeared. I would agree with comment #7 in allowing Satellite QE time to test with specspo installed and seeing if the issue is still reproducible.

Cliff

In , Jan (jan-redhat-bugs) wrote :

I'd like to point out that for QE to test with specspo, we should make it easier for them by not silently removing specspo in install.pl.

In fact, I wonder if any of those

   php|piranha|squirrelmail|specspo

packages listed there pose a problem.

In , RHEL (rhel-redhat-bugs) wrote :

Development Management has reviewed and declined this request. You may appeal
this decision by reopening this request.

Jeff Johnson (n3npq)
tags: added: memleak rhel
Jeff Johnson (n3npq)
tags: added: i18n specspo
Changed in fedora:
importance: Unknown → Medium
status: Unknown → Won't Fix