Comment 16 for bug 1643911

Revision history for this message
Ian Wienand (iwienand) wrote :

(i just sent this to the list, but putting here too)

While I agree that a coredump is not that likely to help, I would also
like to come to that conclusion after inspecting a coredump :) I've
found things in the heap before that give clues as to what real
problems are.

To this end, I've proposed [2] to keep coredumps. It's a little
hackish but I think gets the job done. [3] enables this and saves any
dumps to the logs in d-g.

As suggested, running under valgrind would be great but probably
impractical until we narrow it down a little. Another thing I've had
some success with is electric fence [4] which puts boundaries around
allocations so out-of-bounds access hits at the time of access. I've
proposed [5] to try this out, but it's not looking particularly
promising unfortunately. I'm open to suggestions, for example maybe
something like tcalloc might give us a different failure and could be
another clue. If we get something vaguely reliable here, our best bet
might be to run a parallel non-voting job on all changes to see what
we can pick up.

[1] https://bugs.launchpad.net/nova/+bug/1643911
[2] https://review.openstack.org/451128
[3] https://review.openstack.org/451219
[4] http://elinux.org/Electric_Fence
[5] https://review.openstack.org/451136