Python crashes with error "Inconsistent interned string state"

Bug #798561 reported by Rik Baeten
This bug affects 1 person
Affects: PyMQI
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

The OS is Solaris 10, with pymqi 1.2 (with the string filter patch). The same code works perfectly on Linux, but when run on Solaris the Python interpreter crashes with the following error and a core file:
<error>
Fatal Python error: Inconsistent interned string state.
Abort (core dumped)
</error>

This is a stacktrace of the core:
<stacktrace core>
 ff04c8d0 _lwp_kill (6, 0, ff0b4fc8, ff02c158, ffffffff, 6) + 8
 fefc1a4c abort (38, 1, ff01f8e4, eea74, ff0b3418, 0) + 110
 000c90e4 Py_FatalError (10e480, 0, 10e400, 0, 18, 7f) + 1c
 00060704 string_dealloc (c11020, 1, 138a38, 735ce8, fffffffe, 606dc) + 28
 000f5124 frame_dealloc (735e40, 6, 1, 5, 153e6c, c11020) + 24c
 000a2b84 PyEval_EvalFrameEx (63a7b0, ffbff254, 271f50, 1, 499ef0, 266598) + 71c0
 000a2b50 PyEval_EvalFrameEx (63a7b0, ffbff3bc, 1e7f50, 1, 49a6b0, 266598) + 718c
 000a2b50 PyEval_EvalFrameEx (2dc1a0, ffbff524, 1e7f98, 1, 49c770, 1dce60) + 718c
 000a2b50 PyEval_EvalFrameEx (2dc1a0, ffbff68c, 1eb068, 1, 49c7b0, 2dd0e0) + 718c
 000a2b50 PyEval_EvalFrameEx (4948a0, ffbff7f4, 1eb0f8, 1, 49c7f0, 2dad40) + 718c
 000a377c PyEval_EvalCodeEx (1eb0f8, 15d030, 1ad7d0, 0, 0, 0) + 73c
 000a3910 PyEval_EvalCode (1eb0f8, 176d20, 176d20, fffffffc, ff0b03bc, 0) + 28
 000c8c64 PyRun_FileExFlags (0, ffbffcda, 166248, 176d20, 176d20, 1) + c4
 000c9ba0 PyRun_SimpleFileExFlags (15c4e8, ffbffcda, 1, ffbffadc, 1, 15c400) + 158
 0001de50 Py_Main (2, 0, 1, ffbffcda, 15c400, 1) + 8a0
 0001d438 _start (0, 0, 0, 0, 0, 0) + 5c
</stacktrace core>

At the moment of the crash, my app wasn't doing anything special. For a list of around 2000 physical queues, it checks which aliases point to each one (using the string filter) and performs some other small actions. After roughly 1200 queues, Python suddenly crashes.

Running the same application again, it will crash at exactly the same moment.

Now I've read the following somewhere in a forum:
<verbatim>
The only way it can happen is if some C code is doing a wild store,
corrupting memory it shouldn't be touching at all. C code may be in
core Python, or in any extension modules you use. Since reports of
this error remain so exceedingly rare, it's probably not in core
Python. Are you using any extensions? You really don't supply much
info here.

Wild stores are a big deal -- they can cause anything to happen.
Python string objects happen to have a field that should contain only
one of 3 possible values. You're getting the message because the
field doesn't have one of those 3 values. That means some C code has
gone insane.
</verbatim>

This statement indirectly points to pymqe. Could the root cause be somewhere inside it?

I've also tried the program on Solaris without the string filter implementation (it's a LOT slower), and that works correctly.
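
For reference, the check that the quoted post describes can be modelled in a few lines of Python. This is only a simplified sketch of what CPython 2's `string_dealloc` does in C, not the real implementation: the three constants mirror the `SSTATE_*` values from CPython 2's `stringobject.h`, while `FatalPythonError` and `check_dealloc_state` are hypothetical names invented for illustration (the real interpreter calls `Py_FatalError` and aborts).

```python
# Simplified model of the interned-state check in CPython 2's string_dealloc.
# The constants mirror SSTATE_* from stringobject.h; everything else is
# a hypothetical stand-in for illustration only.
SSTATE_NOT_INTERNED = 0
SSTATE_INTERNED_MORTAL = 1
SSTATE_INTERNED_IMMORTAL = 2


class FatalPythonError(Exception):
    """Stands in for Py_FatalError, which aborts the real interpreter."""


def check_dealloc_state(ob_sstate):
    """Describe what deallocation does for a given state field value."""
    if ob_sstate == SSTATE_NOT_INTERNED:
        return "free the string"
    if ob_sstate == SSTATE_INTERNED_MORTAL:
        return "remove from the interned dict, then free the string"
    if ob_sstate == SSTATE_INTERNED_IMMORTAL:
        raise FatalPythonError("Immortal interned string died.")
    # Any other value means the field was overwritten by something else,
    # for example a wild store from C code in an extension module.
    raise FatalPythonError("Inconsistent interned string state.")
```

Passing any value other than 0, 1 or 2 (say `0x41`, as if a stray byte had been written over the field) takes the last branch, which is the message seen in the crash above.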

Dariusz Suchojad (dsuch) wrote :

Rik,

that Solaris box, is it SPARC or x86? 64-bit or 32-bit one? Also, what does the code look like exactly? I understand it's not doing anything special but what does it do anyway? :)

Rik Baeten (rik-baeten) wrote :

Dariusz,

Thanks already for your quick involvement.

The Solaris box is SPARC 64-bit.

It's difficult to post all code, since there are several abstractions in my framework and this particular app. In total I would have to post around 3000 lines of code.

Basically in this app I'm building browsable webpages that describe queue setups.

This is the part that directly calls pymqi:
<verbatim>
    def getMqObjectNamesQaWithTargq(self, ql):
        """Returns a list of the names of all alias queues pointing to the given physical queue."""
        # String filters are now implemented in pymqi: https://answers.launchpad.net/pymqi/+question/149408
        pcfQmgr = ql.getQueueManager().getConnectedPcfQueueManager()
        attrs = {MQCA_Q_NAME: '*', MQIA_Q_TYPE: MQQT_ALIAS, MQIACF_Q_ATTRS: MQCA_Q_NAME}
        # The crash probably occurs at one of the following two lines. See below for more info.
        f1 = pymqi.Filter(MQCA_BASE_Q_NAME).equal(ql.getPropName())
        pcfRes = pcfQmgr.MQCMD_INQUIRE_Q(attrs, [f1])
        # Queue names come back padded with trailing blanks; strip them.
        return [pcfq[MQCA_Q_NAME].rstrip() for pcfq in pcfRes]
</verbatim>

This method is then (indirectly) used in this code (adapted for readability). Around the 1200th queue iteration of this loop, python crashes as described above. It's at the
<verbatim>
        ...
        for ql in qls:
            # determine which alias queues point to the given ql (standard or non-standard names)
            allStandardQasWithTargq=ql.getMqObjectsStandardQaWithTargq()
            allQasWithTargq=ql.getMqObjectsQaWithTargq()
            nonStandardQasWithTargq=[q for q in allQasWithTargq if q not in allStandardQasWithTargq]
            self._webPages.append(WebPageQl(ql,allStandardQasWithTargq,nonStandardQasWithTargq,environment))
            self._standardApps.add(ql.getConsumerApp())
        ...
</verbatim>

I hope this clarifies a bit what I'm doing.

Rik Baeten (rik-baeten) wrote :

Now I'm also having the same issue under Ubuntu 10.10 64-bit (not just Solaris SPARC 64-bit). Before, I was using Ubuntu 32-bit.

I still suspect the pymqe C code (a buffer overflow).

To get more information on what is happening, I ran the Python program under the Valgrind memcheck tool, but this gives a lot of false positives (caused by the pymalloc mechanism that sits on top of the normal malloc). I have read that ideally you need to recompile the Python interpreter to be able to deactivate pymalloc. I haven't done this yet, though.

How could we get this bug solved? I can easily reproduce it and then send the Valgrind output, or assist in another way, but I'm not sure whether that would be useful for you.

What are your thoughts on this?
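
For the Valgrind run mentioned above, a small helper along these lines could assemble the command and environment. It is a sketch under assumptions: `valgrind_command` and `valgrind_env` are hypothetical helper names, the suppressions path must point at a copy of `Misc/valgrind-python.supp` from the CPython source tree, and the `PYTHONMALLOC=malloc` setting is only honoured by CPython 3.6 and later. On older interpreters, pymalloc can only be switched off by rebuilding Python with `--without-pymalloc`.

```python
import os


def valgrind_command(script, suppressions=None, python="python"):
    """Build the argument list for running a script under memcheck."""
    cmd = ["valgrind", "--tool=memcheck"]
    if suppressions:
        # Typically a copy of Misc/valgrind-python.supp from the CPython sources.
        cmd.append("--suppressions=" + suppressions)
    cmd += [python, script]
    return cmd


def valgrind_env(base_env=None):
    """Copy the environment and ask CPython to bypass pymalloc.

    Assumption: the interpreter is CPython 3.6+, which reads PYTHONMALLOC.
    Older interpreters ignore the variable.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONMALLOC"] = "malloc"
    return env
```

The two results can then be handed to `subprocess.call(valgrind_command("app.py", "valgrind-python.supp"), env=valgrind_env())`, where `app.py` is a placeholder for the script that reproduces the crash.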

Dariusz Suchojad (dsuch) wrote :

Hi Rik,

unfortunately I won't be able to look into it for at least several weeks, I'm really pressed for time now. In the meantime, though, could you please come up with a self-contained script that I could run on my 64-bit Ubuntu install when I have some more time? Many, many thanks!
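
A self-contained reproducer of the kind requested here might look like the sketch below. It repeats the filtered alias inquiry from the method posted earlier for each base queue name and reports a running counter, so the iteration at which the interpreter dies is visible. The function names (`aliases_pointing_to`, `run`, `main`) are hypothetical, and the connection details passed to `main` are placeholders to be filled in. Depending on the pymqi and Python versions in use, MQ string values such as `"*"` and the queue names may need to be passed as bytes.

```python
from __future__ import print_function


def aliases_pointing_to(pymqi_mod, pcf, base_queue):
    """Return the names of alias queues whose base queue is base_queue."""
    cmqc, cmqcfc = pymqi_mod.CMQC, pymqi_mod.CMQCFC
    attrs = {
        cmqc.MQCA_Q_NAME: "*",  # may need to be b"*" on some versions
        cmqc.MQIA_Q_TYPE: cmqc.MQQT_ALIAS,
        cmqcfc.MQIACF_Q_ATTRS: cmqc.MQCA_Q_NAME,
    }
    name_filter = pymqi_mod.Filter(cmqc.MQCA_BASE_Q_NAME).equal(base_queue)
    rows = pcf.MQCMD_INQUIRE_Q(attrs, [name_filter])
    # Queue names come back padded with trailing blanks; strip them.
    return [row[cmqc.MQCA_Q_NAME].rstrip() for row in rows]


def run(pymqi_mod, pcf, base_queues, report=print):
    """Run the inquiry once per base queue, reporting progress each time."""
    count = 0
    for count, name in enumerate(base_queues, 1):
        aliases = aliases_pointing_to(pymqi_mod, pcf, name)
        report("%d %s -> %d alias(es)" % (count, name, len(aliases)))
    return count


def main(queue_manager, channel, conn_info, base_queues):
    """Connect, run the loop, and disconnect. All arguments are placeholders."""
    import pymqi  # imported here so the helpers above load without MQ installed
    qmgr = pymqi.connect(queue_manager, channel, conn_info)
    try:
        return run(pymqi, pymqi.PCFExecute(qmgr), base_queues)
    finally:
        qmgr.disconnect()
```

Passing the pymqi module in as a parameter keeps the loop independent of a live queue manager, so the same functions can be exercised with stand-in objects before pointing them at the real system.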

Dariusz Suchojad (dsuch) wrote :

Hi Rik,

does that still pose an issue? I'd hate to release 1.3 while this isn't closed but I guess I'll have to. I just have no access to a SPARC Solaris and have no means to reproduce it.

Dariusz Suchojad (dsuch) wrote :

I've just released PyMQI 1.3 and the development effort has moved to GitHub - https://github.com/dsuch/pymqi
