Comment 39 for bug 2026757

Julia Kreger (juliaashleykreger) wrote:

Petr messaged me and suggested we try using rr to capture the execution and failure to aid in debugging. Unfortunately, the CPU performance events it relies on are unavailable on the machine I'm attempting reproduction on.

I did manage to spend a little time over the last two days adding some additional debug logging to a source build of 2.90, which includes the patch Brian posted to the dnsmasq mailing list regarding DHCPv6.

I was still able to reproduce this issue using one of Ironic's combined scenario test jobs, which exercises the DHCP configuration a number of times. I also turned off inotify updates and DHCPv6 in my local build, and was still able to reproduce the failure.

I also tried sending a HUP signal a substantial number of times, and tried massaging the configuration files which were being loaded for static entries, but I was unable to reproduce the crash that way. There *IS* a distinct possibility I just didn't do it "enough", but in the runs where the crash did reproduce, dnsmasq had sometimes barely been running and sometimes been running for a long time before it crashed.

From what I've seen, it appears the crash can happen after a DHCP offer response has been sent back to a v4 client. However, at least from looking through the code, setting netids appears to be limited to configuration loading and do_options in src/rfc2131.c. Unfortunately, I don't have the context to understand what is being done in do_options, or why.

I have also been able to work out a change that prevents the SIGABRT by only proceeding to the next iteration if netid->next is not null. That seems to prevent the crash, but it only masks the root cause, and there is no telling what impact it has over the long term.
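
To illustrate the shape of that guard, here is a minimal sketch. It is not the actual dnsmasq code or my actual patch: struct tag_node and walk_guarded are hypothetical names made up for the example, standing in for a singly linked list of tags and the loop that walks it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a singly linked list of tags.
   This is an assumption for illustration, not copied from dnsmasq. */
struct tag_node {
  const char *name;
  struct tag_node *next;
};

/* Walk the list, only proceeding to the next iteration when the
   next pointer is non-NULL. Returns the number of nodes visited. */
static size_t walk_guarded(const struct tag_node *node)
{
  size_t visited = 0;

  while (node != NULL) {
    visited++;

    if (node->next == NULL)
      break; /* guard: stop here instead of advancing */

    node = node->next;
  }

  return visited;
}
```

A check like this can only catch a NULL pointer. If a node has been freed or overwritten and its next pointer is stale but non-NULL, the walk continues into invalid memory, which is why a guard of this kind would hide the symptom rather than fix whatever is corrupting the list.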