RPM

memory leak in rpmlib

Bug #651509 reported by Jeff Johnson on 2010-09-29

This bug affects 1 person

Affects   Status      Importance   Assigned to            Milestone
RPM       New         Undecided    Unassigned
Fedora    Won't Fix   Medium       redhat-bugs #480127

Bug Description

tracker

Tags:


In Red Hat Bugzilla #480127, Miroslav (miroslav-redhat-bugs) wrote on 2009-01-15:

Description of problem:
In /lib/misc.c, within the routine rpmHeaderGetEntry (which is commonly seen in the Signal 6 core files):

        /* XXX FIXME: memory leak. */
        msgstr = headerSprintf(h, fmt, rpmTagTable, rpmHeaderFormats,
                               &errstr);
        if (msgstr) {
            *p = (void *) msgstr;
            if (type) *type = RPM_STRING_TYPE;
            if (c) *c = 1;
            return 1;
        } else {
            if (c) *c = 0;
            return 0;
        }

I checked the latest rpm (rpm-4.4.2-48) and it seems that this code is still there. According to our findings (see BZ 173424), this leak causes problems in RHN Satellite in the long run.

Version-Release number of selected component (if applicable):
rpm-4.4.2-48

How reproducible:
Hard to reproduce - see bug 173424 for details

Steps to Reproduce:
1. Install RHN Satellite with the specspo package.
2. Put it under high load.
3. Try to rhnpush some package several times.
4. rpmlib start to leaks and it will result in seg faults of httpd.

Actual results:
seg faults of httpd

Expected results:
rhnlib does not leak


In Red Hat Bugzilla #480127, Panu (panu-redhat-bugs) wrote on 2009-01-16:

AFAICT the memleak comment in rpmHeaderGetEntry() refers to the fact that for summary, description and group it returns a malloced string of RPM_STRING_TYPE which headerFreeData() doesn't free. The python bindings "know" this funky little detail and take care of it, and at least I'm not able to reproduce leakage from that.

Looking at the dumps in bug 173424, it seems to me more like setenv()-related memory corruption than a leak. The rpm tag translation fiddles the LANGUAGE environment variable back and forth for each translated item, and increments _nl_msg_cat_cntr on each change. On a very busy box, I could imagine _nl_msg_cat_cntr possibly wrapping around and maybe something can't handle that. I dunno, that's just a wild guess, but there are all sorts of things piled up in here, for example perl doing something in this area:

==4465== Invalid free() / delete / delete[]
==4465== at 0x1B8FF382: free (vg_replace_malloc.c:235)
==4465== by 0x1BFFA0DE: Perl_safesysfree (in /usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE/libperl.so)
==4465== by 0x1BFFDC07: Perl_my_setenv (in /usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE/libperl.so)
==4465== by 0x1BDF9C0A: mod_perl_pass_env (perl_config.c:207)

To put it another way, obviously the rpm translation code is causing problems (the way it works is pretty wicked), but whether that's the bug or whether it is just triggering problems elsewhere is not that clear.

In the meantime, there's a much less intrusive way to disable the translations than having spacewalk conflict with specspo:

rpm.delMacro("_i18ndomains")

I'm not familiar with the spacewalk codebase so can't suggest where exactly to put it, but anywhere after the rpm module has been loaded will do.


In Red Hat Bugzilla #480127, Panu (panu-redhat-bugs) wrote on 2009-01-23:

FWIW, perl's environment handling seems to be somewhat controversial. Possibly related:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=142523, for which the upstream report is here: http://rt.perl.org/rt3/Public/Bug/Display.html?id=1170

Also mod_perl has some interesting commentary:
     /* Force the environment to be copied out of its original location
        above argv[]. This fixes a crash caused when a module called putenv()
        before any Perl modified the environment - environ would change to a
        new value, and the check in my_setenv() to duplicate the environment
        would fail, and then setting some environment value which had a previous
        value would cause perl to try to free() something from the original env.
        This crashed free(). */
     my_setenv("MODPERL_ENV_FIXUP", "0");
     my_setenv("MODPERL_ENV_FIXUP", NULL);

CC'ing perl maintainer for possible comments.


In Red Hat Bugzilla #480127, Marcela (marcela-redhat-bugs) wrote on 2009-01-26:

I suppose rhn is using mod_perl for httpd? CC'ing mod_perl maintainer for his thoughts.


In Red Hat Bugzilla #480127, Miroslav (miroslav-redhat-bugs) wrote on 2009-01-26:

Yes, we use mod_perl for httpd.
For RHEL5 we use mod_perl from RHEL.
For RHEL4 we pack our own mod_perl (src.rpm taken from Red Hat Web Application Stack) since we need 2.0 and plain RHEL4 has 1.99


In Red Hat Bugzilla #480127, Joe (joe-redhat-bugs) wrote on 2009-01-28:

The env var handling in mod_perl 1.x looks like serious voodoo to me. Do you have some PerlPassEnv configured here?

There's lots I don't understand here.

1) this is reported against RHEL5 but bug 173424 seems to be talking about RHEL4/3 only. Is this problem reproducible on RHEL5 at all? With the RHEL5 httpd/mod_perl stack? mod_perl 1.x is *way* different from 2.x.

2) The report says:

"rpmlib start to leaks and it will result in seg faults of httpd."

Is this two separate problems? A memory leak, and an unrelated crash? A memory leak which is leading to OOM and hence httpd crashing? Or do you not mean "start to leaks" but "starts to corrupt memory", or what? Or is it just conjecture that rpmlib is involved?


In Red Hat Bugzilla #480127, Miroslav (miroslav-redhat-bugs) wrote on 2009-01-28:

> Is this problem reproducible on RHEL5 at all?
I will try to reproduce it for RHEL5. I will try to find time for this next week.

> A memory leak, and an unrelated crash? A
> memory leak which is leading to OOM and hence httpd crashing? Or do you not
> mean "start to leaks" but "starts to corrupt memory", or what?

I think it was an OOM crash, but I'm pinging cperry, who has been working on that issue, to clarify. Cliff?


In Red Hat Bugzilla #480127, Jan (jan-redhat-bugs) wrote on 2009-02-09:

(In reply to comment #6)
> > Is this problem reproducible on RHEL5 at all?
> I will try to reproduce it for RHEL5. I will try to find time for this next
> week.

Mirek, what's the status of getting a reproducer for this?

If we do not have a reproducer, I intend to just ask QA to start testing Satellite with specspo installed, and hopefully their automation tests will be able to get a reproducer for us. Or not, in which case the problem simply does not materialize with the latest Apache / rpm / whatever.

But I do not like the fact that we are removing specspo in install.pl, never giving QA a chance to test the thing.


In Red Hat Bugzilla #480127, Denise (denise-redhat-bugs) wrote on 2009-03-13:

Since there is no agreement on the problem or the fix, this is moving out for consideration in 5.5.


In Red Hat Bugzilla #480127, Clifford (clifford-redhat-bugs) wrote on 2009-03-16:

#10

I will be honest in saying that I have not looked at this issue for over 2 years, since we just removed specspo from the OS of Satellite systems to stop the Apache 1.3 segmentation faults from occurring. The Sig 11s were happening when mod_python called rpmlib to read the rpm headers of rpms being uploaded into a Satellite via Apache. When specspo was installed we did some sort of in-memory string translation which *somewhere* messed things up, eventually leading to corruption and a sig11 of Apache. At the time the current Apache and rpm maintainers (over email communications) were unable to provide a solution other than the one I ultimately chose for Satellite.

I think the specspo would try to re-allocate a region of memory that had not been freed up yet, or buffer overflow (do not remember exactly). Somewhere/somehow things got confused and just crashed :)

While it would be great if a new OS (RHEL 3 vs 4/5) or a new Apache (1.3 vs 2.x) helped, I doubt it.

If Mirek is unable to replicate the issue any more during his testing, maybe glibc or Apache is doing something better, rpm is better, or it just disappeared. I would agree with comment #7 in allowing Satellite QE time to test with specspo installed and seeing if the issue is still reproducible.

Cliff


In Red Hat Bugzilla #480127, Jan (jan-redhat-bugs) wrote on 2009-03-23:

#11

I'd like to point out that for QE to test with specspo, we should make it easier for them by not silently removing specspo in install.pl.

In fact, I wonder if any of those

php|piranha|squirrelmail|specspo

packages listed there pose a problem.


In Red Hat Bugzilla #480127, RHEL (rhel-redhat-bugs) wrote on 2009-09-23:

#12

Development Management has reviewed and declined this request. You may appeal
this decision by reopening this request.

Jeff Johnson (n3npq) on 2010-09-29

tags:

added: memleak rhel

Jeff Johnson (n3npq) on 2010-10-18

tags:

added: i18n specspo

Bug Watch Updater (bug-watch-updater) on 2017-10-27

Changed in fedora:
importance:	Unknown → Medium
status:	Unknown → Won't Fix


Remote bug watches

debbugs #142523
[open normal]
redhat-bugs #480127
[CLOSED WONTFIX]
auto-rt.perl.org #1170
