snmpd crashes with segfault (libnetsnmpmibs.so.15.1.2)

Bug #720638 reported by rhhoek
This bug affects 7 people
Affects            Status        Importance  Assigned to    Milestone
net-snmp (Ubuntu)  Fix Released  Medium      Athos Ribeiro
Jammy              Triaged       Medium      Athos Ribeiro

Bug Description

[ Impact ]

snmpd crashes irregularly with segfaults when running an internal _check_interface_entry_for_updates function.

[ Test Plan ]

As shown in the upstream bug at https://github.com/net-snmp/net-snmp/issues/107, this has no clear/trivial reproducer. We will rely on affected users to test our fix, as we have been doing while investigating the issue.

[ Where problems could occur ]

Apart from the usual concerns about rebuilding the package against new versions of its dependencies, the segfault in question may take days to manifest. Therefore, we may end up applying upstream fixes that do not fix the exact issue observed. In that case there will be no need to revert the changes, provided they do not cause regressions, since they are genuine bug fixes; however, we may need to re-open this bug and keep chasing the proper fix.

[ Other Info ]

We are currently relying on WBTMagnum's feedback for this SRU. They have been providing great feedback on this bug for the past 18 months.

[ Original message ]

snmpd crashes with segfault:

Feb 15 19:54:23 linux060 kernel: [622538.454874] snmpd[1958]: segfault at 0 ip 00007f62f56d6167 sp 00007fffafb63060 error 6 in libnetsnmpmibs.so.15.1.2[7f62f5625000+10f000]
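
For readers unfamiliar with the kernel's segfault line: the "error" field is the x86 page-fault error code. The following is a minimal sketch of decoding its three low bits; decode_pf_error is a made-up helper name, and higher bits (such as the instruction-fetch bit) are deliberately ignored.

```shell
# decode_pf_error CODE: describe the low three bits of an x86 page-fault error code.
# Hypothetical helper for illustration; it is not part of any tool mentioned here.
decode_pf_error() {
  code=$1
  # bit 0: 0 = page not present, 1 = protection violation
  if [ $(( code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
  # bit 1: 0 = read access, 1 = write access
  if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
  # bit 2: 0 = kernel mode, 1 = user mode
  if [ $(( code & 4 )) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
  printf '%s, %s, %s\n' "$mode" "$access" "$cause"
}

decode_pf_error 6   # prints: user mode, write, page not present
```

Read together with "segfault at 0", error 6 describes a user-space write to address 0, which is the usual signature of a NULL pointer dereference.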

Description: Ubuntu 10.04.2 LTS
Release: 10.04

snmpd:
  Installed: 5.4.2.1~dfsg0ubuntu1-0ubuntu2.1
  Candidate: 5.4.2.1~dfsg0ubuntu1-0ubuntu2.1
  Version table:
 *** 5.4.2.1~dfsg0ubuntu1-0ubuntu2.1 0
        500 http://nl.archive.ubuntu.com/ubuntu/ lucid-updates/main Packages
        500 http://security.ubuntu.com/ubuntu/ lucid-security/main Packages
        100 /var/lib/dpkg/status
     5.4.2.1~dfsg0ubuntu1-0ubuntu2 0
        500 http://nl.archive.ubuntu.com/ubuntu/ lucid/main Packages

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: snmpd 5.4.2.1~dfsg0ubuntu1-0ubuntu2.1
ProcVersionSignature: Ubuntu 2.6.32-27.49-server 2.6.32.26+drm33.12
Uname: Linux 2.6.32-27-server x86_64
Architecture: amd64
Date: Thu Feb 17 11:54:37 2011
InstallationMedia: Ubuntu-Server 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
ProcEnviron: SHELL=/bin/bash
SNMPVersion:
 NET-SNMP version: 5.4.2.1
 Web: http://www.net-snmp.org/
 Email: <email address hidden>
SourcePackage: net-snmp
SyslogSnmptrapd:

mtime.conffile..etc.default.snmpd: 2011-01-28T09:26:16
mtime.conffile..etc.snmp.snmpd.conf: 2011-02-17T11:54:26.656267

Clint Byrum (clint-fewbar) wrote :

rhhoek, I've marked this bug private, as it includes your snmpd community name, which you may not have wanted to be publicly available. If that is the case, I suggest uploading new versions with the community name redacted before marking the report public again.

I'm also marking this as Confirmed. I've just tested this on a lucid amd64 chroot, and something in your snmpd.conf causes the segfault. I was able to get a core dump by starting the service with your snmpd.conf after setting ulimit -c unlimited.

Also setting Importance to high.

visibility: public → private
Changed in net-snmp (Ubuntu):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Clint Byrum (clint-fewbar)
rhhoek (r-h-hoek) wrote :

Hi Clint,

Thanks for your concern about the snmp community name. In the originally uploaded config the community name was already a fake one (zwaargeheim = verysecret). I have now changed the IP address too.
It is no problem for me to change the visibility back to public.
Good to see that the bug is reproducible.

visibility: private → public
Changed in net-snmp (Ubuntu):
assignee: Clint Byrum (clint-fewbar) → nobody
Andreas (andreas-stollar) wrote :

I am seeing the same segfault on my lucid boxes, config file attached with sensitive information removed. None of the extended checks are actually being done on the machines that are crashing, as the first hosts we upgraded to lucid do not do anything that complex -- run lighttpd and that is all. They do push 200Mbps of traffic fairly constantly, but load is always below 1.0.

Chuck Short (zulcss) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. Please try to obtain a backtrace following the instructions at http://wiki.ubuntu.com/DebuggingProgramCrash and upload the backtrace (as an attachment) to the bug report. This will greatly help us in tracking down your problem.

rhhoek (r-h-hoek) wrote : Re: [Bug 720638] Re: snmpd crashes with segfault (libnetsnmpmibs.so.15.1.2)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2011-03-07 15:01, Chuck Short wrote:
> Thank you for taking the time to report this bug and helping to make
> Ubuntu better. Please try to obtain a backtrace following the
> instructions at http://wiki.ubuntu.com/DebuggingProgramCrash and upload

I have followed the instructions. Until now, snmpd has crashed only
once. Do I have to wait for another crash before sending in a backtrace?

> the backtrace (as an attachment) to the bug report. This will greatly
> help us in tracking down your problem.
>

- --

Kind regards,

Roel Hoek
ICT Service Centre
University of Twente, P.O.Box 217, 7500 AE Enschede, The Netherlands
Telephone +31 53 489 4598, Fax +31 53 489 2383
<email address hidden>; http://www.utwente.nl/icts
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk12GtUACgkQJwlRSGnYBcY1RgCfT06hvgFfGDFniywLKXDZ1oTV
a6YAniRS9k0xsvjPweo1LyaHexh4LIiL
=roZM
-----END PGP SIGNATURE-----

Christoph Roeder (brightdroid) wrote :

Crashes on Ubuntu 20.04 too:
---

...
Aug 19 18:30:07 server snmpd[1308748]: Cannot statfs /run/docker/netns/cf01dc8e9bbc: Permission denied
Aug 19 18:30:16 server snmpd[1308748]: error on subcontainer 'ifTable container' remove (-1)
Aug 19 18:30:19 server kernel: snmpd[1308748]: segfault at 91 ip 00007f6d89fde775 sp 00007fff92349a00 error 4 in libnetsnmpmibs.so.35.0.0[7f6d89f44000+d3000]
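
The kernel line above carries enough information to locate the faulting instruction within the mapping: subtract the address in square brackets from the "ip" value. The following is a minimal sketch; fault_offset is a made-up helper name, and the library path in the trailing comment is an assumption for an amd64 Ubuntu system.

```shell
# fault_offset IP BASE: offset of the faulting instruction inside the mapping.
# Both arguments are hex strings without a 0x prefix, as printed by the kernel.
# Hypothetical helper for illustration.
fault_offset() {
  printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

fault_offset 00007f6d89fde775 7f6d89f44000   # prints: 0x9a775

# With debug symbols installed, the offset could then be looked up, for example:
#   addr2line -f -e /usr/lib/x86_64-linux-gnu/libnetsnmpmibs.so.35.0.0 0x9a775
# Caveat: the bracketed address is the start of the mapping that contains the
# instruction, which is not necessarily the start of the file, so the value may
# need adjusting before addr2line resolves it correctly.
```

Comparing this offset across reports is a cheap way to check whether two crashes of the same library build hit the same instruction.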

Lucas Kanashiro (lucaskanashiro) wrote :

@Christoph could you please try to provide a backtrace? Also, if any settings in your config file differ from the defaults, could you share them with us? You are using a newer version than the one originally reported in this bug, so this would help determine whether it is a similar bug or something else.

Here you can find some information about debugging:

https://wiki.ubuntu.com/DebuggingProgramCrash

Branko Grubic (bgrubicatwork) wrote :

Hi, I'm seeing these on one of the servers running docker. 20.04 running snmpd-5.8+dfsg-2ubuntu2.3:

kernel: snmpd[1866629]: segfault at 6f ip 00007fa535732775 sp 00007fffaf58c470 error 4 in libnetsnmpmibs.so.35.0.0[7fa535698000+d3000]

I was able to get some details from the crash data, though probably not enough:

#0 __GI___libc_free (mem=0x2) at malloc.c:3102
        ar_ptr = <optimized out>
        p = <optimized out>
        hook = 0x0
#1 0x00007f4170726f6b in netsnmp_access_interface_entry_free (entry=0x55cbcc574880) at if-mib/data_access/interface.c:320
        __func__ = "netsnmp_access_interface_entry_free"
#2 0x00007f41706fc718 in ifTable_rowreq_ctx_cleanup (rowreq_ctx=rowreq_ctx@entry=0x55cbcc587fe0) at if-mib/ifTable/ifTable.c:228
        __func__ = "ifTable_rowreq_ctx_cleanup"
#3 0x00007f417072ae22 in ifTable_release_rowreq_ctx (rowreq_ctx=0x55cbcc587fe0) at if-mib/ifTable/ifTable_interface.c:626
        __func__ = "ifTable_release_rowreq_ctx"
#4 0x00007f4170578848 in _ssll_for_each (c=<optimized out>, f=0x7f417072bbd0 <_delete_missing_interface>, context=0x55cbcc3e7c20) at container_list_ssll.c:284
        sl = <optimized out>
        curr = 0x55cbcc594a50
#5 0x00007f417072cef8 in ifTable_container_load (container=0x55cbcc3e7c20) at if-mib/ifTable/ifTable_data_access.c:643
        cdc = {current = 0x55cbcc59b720, deleted = 0x55cbcc59c910}
        __func__ = "ifTable_container_load"
#6 0x00007f41709074ad in _cache_load (cache=0x55cbcc3e7b90) at helpers/cache_handler.c:735
        ret = -1
#7 0x00007f41705524b7 in run_alarms () at snmp_alarm.c:218
        a = 0x55cbcc3e9820
        clientreg = 5
        t_now = {tv_sec = 1922816, tv_usec = 13682}
        __func__ = "run_alarms"
#8 0x000055cbcba3a864 in receive () at snmpd.c:1376
        numfds = 8
        readfds = {lfs_setsize = 1024, lfs_setptr = 0x7fff01800d50, lfs_set = {fds_bits = {0 <repeats 16 times>}}}
        writefds = {lfs_setsize = 1024, lfs_setptr = 0x7fff01800de0, lfs_set = {fds_bits = {0 <repeats 16 times>}}}
        exceptfds = {lfs_setsize = 1024, lfs_setptr = 0x7fff01800e70, lfs_set = {fds_bits = {0 <repeats 16 times>}}}
        timeout = {tv_sec = 0, tv_usec = 0}
        tvp = <optimized out>
        count = 0
        block = 0
        i = <optimized out>
        sd = <optimized out>
        __func__ = "receive"
#9 0x000055cbcba3a10d in main (argc=<optimized out>, argv=<optimized out>) at snmpd.c:1125
        options = "aAc:CdD::fhHI:l:L:m:M:n:p:P:qrsS:UvV-:Y:g:u:x:X"
        arg = <optimized out>
        i = <optimized out>
        ret = <optimized out>
        exit_code = 1
        dont_fork = <optimized out>
        do_help = <optimized out>
        log_set = <optimized out>
        agent_mode = -1
        pid_file = 0x0
        option_compatability = "-Le"
        fd = <optimized out>
        PID = <optimized out>
        __func__ = "main"

The call trace seems similar to the one in one of the comments on the upstream issue:
https://github.com/net-snmp/net-snmp/issues/107#issuecomment-886216752

Lucas Kanashiro (lucaskanashiro) wrote :

Thanks for the investigation Branko.

Checking the upstream issue you mentioned, it seems that the bug was both introduced and fixed after the release of the version we have, even in the Ubuntu development release, so I am not sure this is the cause of your failure. Could you please try the upstream master branch to check whether the issue is fixed there?

Branko Grubic (bgrubicatwork) wrote :

Hi Lucas,

I'm not able to build the latest code from source at the moment. I tried to look for snmpd in backports ..., but no luck. If there are any more details I can get from the coredump, I could try to provide them.

Regarding the upstream issue, it is really confusing: some people reported using 5.8 (even on Ubuntu), some 5.9. Also, what was fixed is not clearly stated, so there are no easy candidates to try to "backport"/patch onto 5.8.

Thanks,
Branko

Paride Legovini (paride) wrote :

Hi Branko,

I prepared a PPA with the new upstream release (amd64 only):

https://launchpad.net/~paride/+archive/ubuntu/net-snmp/

Please let us know if you can reproduce the issue with this one.

Note: some d/patches didn't apply cleanly, so I just removed them from d/p/series. They looked safe to remove (upstreamed, typo fixes and similar), but I didn't do a real check. The packages are experimental builds to test if v5.9.1 has a fix for this, please take them as such.

Changed in net-snmp (Ubuntu):
status: Confirmed → Incomplete
Branko Grubic (bgrubicatwork) wrote :

Hi Paride,

Thank you very much for making the package. Unfortunately, the server in question is a customer's production server, and I have no test server on which I can reproduce this issue, so I'm not able to test anymore. I had limited time to solve this, so I applied an ugly workaround that restarts the daemon after it is killed, instead of leaving it stopped until someone intervenes manually.

By creating a systemd service override:

/etc/systemd/system/snmpd.service.d/override.conf
[Service]
Restart=always

It's not perfect: there is still a very small chance that the service will not be running, or will crash, while being polled via SNMP.
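
For anyone wanting to apply the same workaround, here is a minimal sketch of creating that drop-in from a shell. write_restart_override is a made-up helper name; the default directory is the one quoted above, and the follow-up systemctl commands are shown only as comments because they need root on a systemd host.

```shell
# write_restart_override [DIR]: create the drop-in described above.
# Hypothetical helper; DIR defaults to the path quoted in this comment.
write_restart_override() {
  dir="${1:-/etc/systemd/system/snmpd.service.d}"
  mkdir -p "$dir" &&
  cat > "$dir/override.conf" <<'EOF'
[Service]
Restart=always
EOF
}

# Typical use as root (not executed here):
#   write_restart_override
#   systemctl daemon-reload
#   systemctl restart snmpd.service
```

Running "systemctl edit snmpd.service" and pasting the two lines achieves the same result interactively. Either way this only hides the crash; it does not fix it.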

Maybe other reporter who had these issues on 20.04 can test it. ( @brightdroid )

WBTMagnum (nemecek) wrote :

One of our servers ran into the same issue, with snmpd frequently failing with:

snmpd[766]: error on subcontainer 'ifTable container' remove (-1)
kernel: [367589.916607] snmpd[766]: segfault at 91 ip 00007f2af4a20845 sp 00007ffcbd99d3e0 error 4 in libnetsnmpmibs.so.35.0.0[7f2af4986000+d3000]
systemd[1]: snmpd.service: Main process exited, code=dumped, status=11/SEGV
systemd[1]: snmpd.service: Failed with result 'core-dump'.

Running Ubuntu 20.04.6 LTS serving several docker containers. Since there seems to be no permanent solution so far, I applied the systemd service override. Thanks for the tip, @Branko Grubic.

Andreas Zweili (zweili-contria-gmbh) wrote :

We're seeing this as well on Ubuntu 22.04:

snmpd[2271493]: error on subcontainer 'ifTable container' remove (-1)
kernel: show_signal_msg: 22 callbacks suppressed
kernel: snmpd[2271493]: segfault at 60 ip 00007f4c9c168423 sp 00007fffa460cdd0 error 4 in libnetsnmpmibs.so.40.1.0[7f4c9c0cc000+d300>
Sep 05 06:31:09 co-srv-runnercontainer1 kernel: Code: 55 41 54 49 89 f4 55 53 48 89 fb 48 83 ec 68 4c 8b 2e 48 89 fe 64 48 8b 04 25 28 00 00 00 48 89 44 24 58 48 8b 47 78
systemd[1]: snmpd.service: Main process exited, code=dumped, status=11/SEGV
systemd[1]: snmpd.service: Failed with result 'core-dump'.
systemd[1]: snmpd.service: Consumed 1min 13.654s CPU time.

Combined with a lot of these errors:

snmpd[2006336]: ioctl 35123 returned -1
snmpd[2006336]: ioctl 35123 returned -1
snmpd[2006336]: ioctl 35111 returned -1
snmpd[2006336]: ioctl 35091 returned -1
snmpd[2006336]: ioctl 35105 returned -1
snmpd[2006336]: ioctl 35123 returned -1
snmpd[2006336]: ioctl 35123 returned -1
snmpd[2006336]: ioctl 35111 returned -1
snmpd[2006336]: ioctl 35091 returned -1
snmpd[2006336]: ioctl 35105 returned -1

I don't mind testing any possible fixes, it just might take a few days for them to show up on our system.
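
The numbers in the "ioctl N returned -1" lines above are decimal ioctl request codes. The following is a minimal sketch that maps the four values seen in this log to their names in Linux's linux/sockios.h; ioctl_name is a made-up helper name and covers only these four codes.

```shell
# ioctl_name DECIMAL: name of a socket ioctl request code from linux/sockios.h.
# Hypothetical helper; only the four codes seen in the log above are covered.
ioctl_name() {
  hex=$(printf '0x%04X' "$1")
  case "$hex" in
    0x8913) echo "SIOCGIFFLAGS" ;;   # get interface flags
    0x8921) echo "SIOCGIFMTU" ;;     # get interface MTU
    0x8927) echo "SIOCGIFHWADDR" ;;  # get hardware address
    0x8933) echo "SIOCGIFINDEX" ;;   # get interface index
    *)      echo "unknown ($hex)" ;;
  esac
}

for n in 35123 35111 35091 35105; do
  printf '%s %s\n' "$n" "$(ioctl_name "$n")"
done
```

All four are per-interface queries. That would be consistent with snmpd asking about an interface that has just disappeared (for example a short-lived container interface), although nothing in this thread confirms that reading.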

Sergio Durigan Junior (sergiodj) wrote :

Thank you for the update.

Would you be able to provide a coredump, a backtrace and/or a reproducer? I did some research and found upstream's https://github.com/net-snmp/net-snmp/issues/107 which seems to be related, but the patch that fixes their issue doesn't make sense on Jammy's net-snmp, so I'm thinking that there might be more to this bug than meets the eye.

Thank you.

Tanguy Pelado (tanguypelado) wrote :

Hey there,

Running ubuntu 22.04, also seeing this issue.

SNMPD version : snmpd/jammy-updates,now 5.9.1+dfsg-1ubuntu2.6 amd64 [installed]

I've got docker containers running on those servers, and I'm using librenms to monitor them.

I've installed systemd-coredump and will follow through when I get one.

BR

Jan Wagner (waja) wrote :

Same on Debian bullseye running docker. But only sometimes, and not on every instance. Strange!

granjerox (granjerox) wrote :

Same problem here on three production servers.

 sudo journalctl -u snmpd.service -n 50
Feb 12 16:15:14 accadb001 systemd[1]: Starting Simple Network Management Protocol (SNMP) Daemon....
Feb 12 16:15:14 accadb001 systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35123 returned -1
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35123 returned -1
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35111 returned -1
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35091 returned -1
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35105 returned -1
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35123 returned -1
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35123 returned -1
Feb 12 16:25:50 accadb001 snmpd[1675639]: IfIndex of an interface changed. Such interfaces will appear multiple times in IF-MIB.
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35111 returned -1
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35091 returned -1
Feb 12 16:25:50 accadb001 snmpd[1675639]: ioctl 35105 returned -1
Feb 12 16:30:53 accadb001 snmpd[1675639]: error on subcontainer 'ifTable container' remove (-1)
Feb 12 16:30:56 accadb001 snmpd[1675639]: error on subcontainer 'ifTable container' remove (-1)
Feb 12 16:30:57 accadb001 systemd[1]: snmpd.service: Main process exited, code=dumped, status=11/SEGV
Feb 12 16:30:57 accadb001 systemd[1]: snmpd.service: Failed with result 'core-dump'.
Feb 12 16:30:57 accadb001 systemd[1]: snmpd.service: Consumed 1.355s CPU time.

Linux accadb001 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy

ii libsnmp-base 5.9.1+dfsg-1ubuntu2.6 all SNMP configuration script, MIBs and documentation
ii libsnmp40:amd64 5.9.1+dfsg-1ubuntu2.6 amd64 SNMP (Simple Network Management Protocol) library
ii snmpd 5.9.1+dfsg-1ubuntu2.6 amd64 SNMP (Simple Network Management Protocol) agents

Lucas Kanashiro (lucaskanashiro) wrote :

Hi,

Could you please share the core dump with us? That would help us investigate this issue better.

granjerox (granjerox) wrote :

Sure, attached the crash file from /var/crash.

BloodyIron (bloodyiron) wrote :

I'm having the same problem, seemingly only on my k8s nodes (running on docker, not containerd), on Ubuntu 22.04, at erratic intervals.

snmpd/jammy-updates,now 5.9.1+dfsg-1ubuntu2.6 amd64 [installed]

Can't find a solution anywhere on the internet, so I guess this is me subscribing to this thread

Mitchell Dzurick (mitchdz) wrote :

Thanks for the crash report! Looking into the crash report I do think this is related to https://github.com/net-snmp/net-snmp/issues/107. I tried applying the patch to our Jammy version and unfortunately it does not apply completely due to differences in the codebase.

I did apply the lines I could and put them in a PPA[0] if you would like to test this out to see if your issues get resolved, although I'm not confident these changes alone will fix it.

I spent a little time trying to reproduce it in a VM and could not find a way to trigger this error, so a reproducer would be very helpful here.

[0] - https://launchpad.net/~mitchdz/+archive/ubuntu/lp720638-snmpd-segv

WBTMagnum (nemecek) wrote :

@mitchdz I gave your PPA a spin on one of our systems running Ubuntu 22.04. Unfortunately snmpd segfaulted again after a few hours.

Paride Legovini (paride) wrote :

Hello, I backported net-snmp_5.9.4+dfsg-1 to Jammy; it will shortly be available in this PPA:

  https://launchpad.net/~paride/+archive/ubuntu/net-snmp

Can you please check if the crash still happens with this version? This will be an important data point to understand how to move the investigation forward. Thanks!

WBTMagnum (nemecek) wrote :

@paride I installed the packages from the PPA you provided on an affected system shortly after you released it. Usually snmpd would crash within a day or two on that system. The longest period snmpd had run without crashing was a bit more than 5 days. With the patched version, snmpd has now been running for 6 days without segfaulting. I deem this a good sign.

Paride Legovini (paride) wrote :

Great, however it is probably worth waiting a bit longer before drawing a conclusion, please keep us posted.

Should we conclude that 5.9.4+dfsg-1 is fixed, I think we should try (sort of) bisecting, e.g. by backporting 5.9.3+dfsg-1 to Jammy and having you test it. Given that it takes several days to reproduce the problem, this will be a long process. Are you willing to help us with it by testing more PPA packages and providing feedback?

Thanks!

WBTMagnum (nemecek) wrote :

Update: the snmpd service has been running for 12 days without segfaulting.

And yes, if it helps I can test further snmpd versions. It might also make sense to revert to the faulty snmpd version at some time to verify if the segfaults still happen.

Paride Legovini (paride)
tags: added: server-todo
Changed in net-snmp (Ubuntu):
status: Incomplete → Triaged
importance: High → Medium
Changed in net-snmp (Ubuntu):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
Athos Ribeiro (athos-ribeiro) wrote :

Hi, nemecek!

As suggested by Paride, I backported 5.9.3+dfsg-1 to jammy in the following PPA:
https://launchpad.net/~athos-ribeiro/+archive/ubuntu/net-snmp

Would you mind testing it so we can start working on Paride's bisecting idea? Once we find the last bad version we can start searching for the patch (set) that fixes the issue.

WBTMagnum (nemecek) wrote :

Dear Athos,

I downgraded the system to the version provided in your PPA. I'll report back any issues.

JFTR: Up until now, v5.9.4.pre2 was still running without segfaulting.

Bryce Harrington (bryce) wrote :

Hi nemecek, how has the PPA version been running so far?

WBTMagnum (nemecek) wrote :

Dear Bryce,
NET-SNMP version 5.9.3 has been up and running without incident for almost a week now.

Bryce Harrington (bryce) wrote :

Thanks WBTMagnum.

So it sounds like the bisect has narrowed to 5.9.1 showing the issue, and 5.9.3 showing the fix. From the CHANGES file, it sounds like there was no 5.9.2 release:

https://github.com/net-snmp/net-snmp/blob/master/CHANGES

Presumably, then, the fix would be one of the ones in this list:

https://github.com/net-snmp/net-snmp/compare/v5.9.1...v5.9.3

There are a number of memory leak fixes, which could fit the characteristics of this bug, but those were found mostly via Coverity and fuzz testing so they may be more theoretical issues. It could be worth cherrypicking all the commits in that range mentioning "memory leak", applying them to 5.9.1+dfsg-1ubuntu2.6, and testing that.

Other fixes that look interesting to my eye:
https://github.com/net-snmp/net-snmp/commit/167f3116cd552e71c4a746f3c63ddb710ec05332
https://github.com/net-snmp/net-snmp/commit/16be05e0cad51bd5b0e905066ea2092e574377fd
https://github.com/net-snmp/net-snmp/commit/24c519bf899b92049e19d84293929d0253cfb9e8
https://github.com/net-snmp/net-snmp/commit/5ecd04c215f0cbe416434cb5ee0e36ec81a3a63f
https://github.com/net-snmp/net-snmp/commit/42fe5ee281beba9793017018aa200a5183a8671e
https://github.com/net-snmp/net-snmp/commit/f1a8f8545a9d9a370487c196db230d5bc8f1adae
https://github.com/net-snmp/net-snmp/commit/1a560274d0f440d5c604c36b2c32b36e2a3d9a28

BloodyIron (bloodyiron) wrote :

So when are we going to see SNMPD v5.9.3 pushed out to currently-active LTS editions of Ubuntu? I have kubernetes nodes on 22.04 that have SNMPD crashing but not seeing v5.9.3 available, only v5.9.1 on the main repos. Seems like the fix is already available with a newer version... is there something holding back v5.9.3 being pushed to currently active LTS versions?

Athos Ribeiro (athos-ribeiro) wrote :

Hi Bloodyiron,

We usually do not update versions in stable releases in Ubuntu. What we are doing here is trying to isolate the exact version where the fix was introduced so we can find the actual patch set that fixes the issue.

Next, I am going to provide a fix with some cherry-picks, as suggested by bryce, until we can isolate the fix. If you want to help, please, test the provided PPAs as nemecek has been doing (this would be of great help here).

Athos Ribeiro (athos-ribeiro) wrote :

Although there is a 5.9.2 version, it has no meaningful changes for our search here. As Bryce already mentioned, there are too many commits in between 5.9.1 and 5.9.2 (ideally, we should bisect there).

Since Bryce already did some digging into those commits, I will start checking them. The most promising one there seems to be https://github.com/net-snmp/net-snmp/commit/16be05e0cad51bd5b0e905066ea2092e574377fd. So let's start with that one.

WBTMagnum, I have prepared another PPA at https://launchpad.net/~athos-ribeiro/+archive/ubuntu/net-snmp-segfault. Would you mind trying the package in there (net-snmp_5.9.1+dfsg-1ubuntu2.7~ppa1)?

WBTMagnum (nemecek) wrote :

Dear Athos,

I installed the version you provided (net-snmp_5.9.1+dfsg-1ubuntu2.7~ppa1) almost immediately (just had to wait for the builds to complete).

Sadly snmpd crashed a few hours ago:
> Aug 26 10:35:57 gitlab-tools snmpd[3740395]: error on subcontainer 'ifTable container' remove (-1)
> Aug 26 10:36:00 gitlab-tools snmpd[3740395]: error on subcontainer 'ifTable container' remove (-1)
> Aug 26 10:36:00 gitlab-tools systemd[1]: snmpd.service: Main process exited, code=dumped, status=11/SEGV
> Aug 26 10:36:00 gitlab-tools systemd[1]: snmpd.service: Failed with result 'core-dump'.
> Aug 26 10:36:00 gitlab-tools systemd[1]: snmpd.service: Consumed 13min 24.761s CPU time.

So it seems the culprit must be in another commit. I'll keep this version running though, to see if it crashes again (as I would expect) now.

Athos Ribeiro (athos-ribeiro) wrote :

Thanks for testing! Let's start a bisect then. I am bisecting using tag v5.9.3 as the "new" version and tag v5.9.2 as the "old" version in the upstream project git repository.

The first commit to test is 2cd0e7d72a.

For this, I needed to include https://github.com/net-snmp/net-snmp/commit/d30d63523bfd9ccc85175e484fea821815273237 to fix an FTBFS issue, and to backport https://salsa.debian.org/debian/net-snmp/-/blob/master/debian/patches/makefile_trap_needs_agent as a fix for https://github.com/net-snmp/net-snmp/issues/434.

It is available in https://launchpad.net/~athos-ribeiro/+archive/ubuntu/net-snmp-segfault as well (latest version in the PPA).
net-snmp - 5.9.1+dfsg-1ubuntu2.7~ppa2

Please, let me know how this goes so we can keep searching for a commit set with the proper fix here.
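
Because a crash can take days to appear, each bisect step needs a soak period before a build can be called fixed. The following is a minimal sketch of such a decision rule. bisect_verdict and its 7-day default are assumptions made for illustration (the thread only reports that the longest crash-free run of the unfixed package was a bit more than 5 days); the git commands in the trailing comment use git's built-in old/new bisect terms.

```shell
# bisect_verdict CRASHED DAYS_RUNNING [SOAK_DAYS]
# Hypothetical rule of thumb, not something this thread prescribes:
#   crashed with the known signature -> "old"  (fix not yet present)
#   survived at least SOAK_DAYS      -> "new"  (fix presumably present)
#   otherwise                        -> "wait" (too early to tell)
bisect_verdict() {
  crashed=$1
  days=$2
  soak=${3:-7}   # assumed default soak period, in days
  if [ "$crashed" = yes ]; then
    echo old
  elif [ "$days" -ge "$soak" ]; then
    echo new
  else
    echo wait
  fi
}

bisect_verdict no 8    # prints: new
bisect_verdict no 3    # prints: wait

# The matching commands in an upstream checkout would look like:
#   git bisect start
#   git bisect new v5.9.3    # tag used as the "new" end in this comment
#   git bisect old v5.9.2    # tag used as the "old" end in this comment
#   git bisect new           # or "git bisect old", per the verdict for the tested commit
```

A longer soak period lowers the risk of wrongly marking a commit as "new", at the cost of a slower bisect.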

WBTMagnum (nemecek) wrote :

Dear Athos,

Just wanted to report that 5.9.1+dfsg-1ubuntu2.7~ppa2 has been running without problems for more than a week now. I'll keep you posted if anything changes.

Best regards,
Sascha

Athos Ribeiro (athos-ribeiro) wrote :

Thanks for testing!

I will mark that one as "new" in the bisecting and proceed to the next step, which is commit "500d763".

For this, I needed to include https://github.com/net-snmp/net-snmp/commit/d30d63523bfd9ccc85175e484fea821815273237 to fix an FTBFS issue, and to backport https://salsa.debian.org/debian/net-snmp/-/blob/master/debian/patches/makefile_trap_needs_agent as a fix for https://github.com/net-snmp/net-snmp/issues/434.

It is available in https://launchpad.net/~athos-ribeiro/+archive/ubuntu/net-snmp-segfault as well (latest version in the PPA).
net-snmp - 5.9.1+dfsg-1ubuntu2.7~ppa3

Please, let me know how this goes so we can keep searching for a commit set with the proper fix here.

WBTMagnum (nemecek) wrote :

I installed version 5.9.1+dfsg-1ubuntu2.7~ppa3 on 2024-09-04 20:59:42.

After almost 6 days, the service crashed today:
{{{#!bash
$ systemctl status snmpd.service
× snmpd.service - Simple Network Management Protocol (SNMP) Daemon.
     Loaded: loaded (/lib/systemd/system/snmpd.service; enabled; vendor preset: enabled)
     Active: failed (Result: core-dump) since Tue 2024-09-10 14:03:46 CEST; 14min ago
    Process: 3102483 ExecStart=/usr/sbin/snmpd -LOw -u Debian-snmp -g Debian-snmp -I -smux,mteTrigger,mt>
   Main PID: 3102483 (code=dumped, signal=SEGV)
        CPU: 16min 37.126s

Sep 10 13:58:39 gitlab-tools snmpd[3102483]: ioctl 35105 returned -1
Sep 10 13:58:39 gitlab-tools snmpd[3102483]: ioctl 35123 returned -1
Sep 10 13:58:39 gitlab-tools snmpd[3102483]: ioctl 35123 returned -1
Sep 10 13:58:39 gitlab-tools snmpd[3102483]: ioctl 35111 returned -1
Sep 10 13:58:39 gitlab-tools snmpd[3102483]: ioctl 35091 returned -1
Sep 10 13:58:39 gitlab-tools snmpd[3102483]: ioctl 35105 returned -1
Sep 10 14:03:42 gitlab-tools snmpd[3102483]: error on subcontainer 'ifTable container' remove (-1)
Sep 10 14:03:46 gitlab-tools systemd[1]: snmpd.service: Main process exited, code=dumped, status=11/SEGV
Sep 10 14:03:46 gitlab-tools systemd[1]: snmpd.service: Failed with result 'core-dump'.
Sep 10 14:03:46 gitlab-tools systemd[1]: snmpd.service: Consumed 16min 37.126s CPU time.
}}}

Bryce Harrington (bryce)
Changed in net-snmp (Ubuntu Jammy):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
importance: Undecided → Medium
status: New → Triaged
Changed in net-snmp (Ubuntu):
status: Triaged → Fix Released
status: Fix Released → Triaged
Athos Ribeiro (athos-ribeiro) wrote :

Thanks, Sascha.

I will mark that one as "old" in the bisecting and proceed to the next step, which is commit "a137fe6bd3".

For this, I needed to backport https://salsa.debian.org/debian/net-snmp/-/blob/master/debian/patches/makefile_trap_needs_agent as a fix for https://github.com/net-snmp/net-snmp/issues/434.

It is available in https://launchpad.net/~athos-ribeiro/+archive/ubuntu/net-snmp-segfault as well (latest version in the PPA).
net-snmp - 5.9.1+dfsg-1ubuntu2.7~ppa4

Please, let me know how this goes so we can keep searching for a commit set with the proper fix here.

Changed in net-snmp (Ubuntu):
status: Triaged → Fix Released
WBTMagnum (nemecek) wrote :

Installed the new version this morning. snmpd died a few hours later with:

Sep 13 09:12:37 gitlab-tools systemd[1]: Starting Simple Network Management Protocol (SNMP) Daemon....
Sep 13 09:12:37 gitlab-tools systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..
Sep 13 13:03:54 gitlab-tools snmpd[608137]: Name of an interface changed. Such interfaces will keep its old name in IF-MIB.
Sep 13 13:03:57 gitlab-tools snmpd[608137]: malloc(): smallbin double linked list corrupted
Sep 13 13:03:57 gitlab-tools systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
Sep 13 13:03:57 gitlab-tools systemd[1]: snmpd.service: Failed with result 'core-dump'.
Sep 13 13:03:57 gitlab-tools systemd[1]: snmpd.service: Consumed 39.201s CPU time.

Please note that the error message is different from before.

Athos Ribeiro (athos-ribeiro) wrote :

Well, this is interesting now... A different issue may completely hinder our bisecting approach (i.e., is that other bug fixed at this point or not?).

After giving this some thought, I think the bisecting approach is no longer feasible here. Re-reading the whole bug history, I realized that granjerox left a core dump here a while ago.

Analyzing the coredump, I got

#0 0x00007f3c55540423 in _check_interface_entry_for_updates (rowreq_ctx=0x55aab05e9230, cdc=0x7ffd7ec287d0) at mibgroup/if-mib/ifTable/ifTable_data_access.c:317
317 int lastchanged = rowreq_ctx->data.ifLastChange;

and then checked the contents in data:
(gdb) p rowreq_ctx->data
$3 = {ifLinkUpDownTrapEnable = 0, ifAlias = '\000' <repeats 63 times>, ifAlias_len = 0, ifCounterDiscontinuityTime = 0, ifentry = 0x0}
(gdb) p rowreq_ctx->data.ifLastChange
There is no member named ifLastChange.

Which shows the cause of the error we are seeing: the ifentry pointer in "data" is NULL (0x0). Then, I found https://github.com/net-snmp/net-snmp/issues/107, which describes the exact same symptoms we are dealing with here, due to the same member of "data" not being set.

The fix for that issue is available at https://github.com/net-snmp/net-snmp/commit/d4b58c60367a262d829eb33e7888d28cd4337481. However, the fix description says that the bug was introduced in an improvement available at https://github.com/net-snmp/net-snmp/commit/600c54135b1015d56070f702d878772dd9f0d51e. The issue here is that this last change is __not__ present in 5.9.1 (jammy), where we are experiencing the problem. However, the latter was introduced to fix a race condition when scanning network interfaces, which could be the culprit here.

Since the fix for the upstream issue cannot be applied without the second commit I mentioned, I am proposing a new PPA version of net-snmp with both of these patches on top of jammy's current version.

Moreover, since these would not apply cleanly, I am also including
https://github.com/net-snmp/net-snmp/commit/8da919e4ad66dec376f54a6d2f7dd7a7fe68b8f0, which is a small refactoring in the code (it saves one function call when searching for network interfaces), and https://github.com/net-snmp/net-snmp/commit/8bb544fbd2d6986a9b73d3fab49235a4baa96c23, which fixes a memory leak when reading network interfaces.

Therefore, I am providing a PPA with the following patches:
https://github.com/net-snmp/net-snmp/commit/8da919e4ad66dec376f54a6d2f7dd7a7fe68b8f0
https://github.com/net-snmp/net-snmp/commit/8bb544fbd2d6986a9b73d3fab49235a4baa96c23
https://github.com/net-snmp/net-snmp/commit/600c54135b1015d56070f702d878772dd9f0d51e
https://github.com/net-snmp/net-snmp/commit/d4b58c60367a262d829eb33e7888d28cd4337481
applied in the order they appear above.
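
For anyone wanting to reproduce this patch stack locally, here is a minimal sketch that turns the four commits above into ordered patch URLs. It relies on GitHub serving a commit as a patch when ".patch" is appended to its URL; patch_url and list_patch_urls are made-up helper names, and the file name in the trailing comment is hypothetical.

```shell
# Commit IDs from the list above, in the order they are applied.
COMMITS="8da919e4ad66dec376f54a6d2f7dd7a7fe68b8f0
8bb544fbd2d6986a9b73d3fab49235a4baa96c23
600c54135b1015d56070f702d878772dd9f0d51e
d4b58c60367a262d829eb33e7888d28cd4337481"

# patch_url SHA: URL of the commit rendered as a patch by GitHub.
patch_url() {
  printf 'https://github.com/net-snmp/net-snmp/commit/%s.patch\n' "$1"
}

# list_patch_urls: print one numbered line per patch so the order stays explicit.
list_patch_urls() {
  n=0
  for sha in $COMMITS; do
    n=$(( n + 1 ))
    printf '%02d %s\n' "$n" "$(patch_url "$sha")"
  done
}

list_patch_urls

# Downloading one of them could then look like (not executed here; the
# output file name is hypothetical):
#   curl -fsSL -o debian/patches/0001-example.patch "$(patch_url "$sha")"
```

As noted above, the patches did not apply cleanly to 5.9.1 on their own, so the downloaded files may still need manual adjustment.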

The new package is available in the same PPA at https://launchpad.net/~athos-ribeiro/+archive/ubuntu/net-snmp-segfault

net-snmp - 5.9.1+dfsg-1ubuntu2.7~ppa5

WBTMagnum (nemecek) wrote :

@Athos Thank you for your efforts. ppa5 has been deployed. I'll report back with the results.

description: updated