Syslogd locks up server

Bug #333962 reported by msp3k
2
Affects Status Importance Assigned to Milestone
sysklogd (Ubuntu)
New
Undecided
Unassigned

Bug Description

Binary package hint: sysklogd

Setup: Several servers and workstations logging to a single server acting as a loghost (named origo).

# lsb_release -rd
Description: Ubuntu 8.04.2
Release: 8.04

# dpkg-query --show | grep sysklogd
sysklogd 1.5-1ubuntu1

Initial Symptom: After several days of uptime, the loghost server would become unresponsive: NFS would break, SSH would hang on login, console logins would hang. Rebooting would fix the problem for another few days. Tried moving the loghost service to other machines, the problem would follow.

Additional investigation: Left the administrator account logged in at the console and sudo'd as root, in hopes that the next time the lockup occurred I would be able to investigate further. This worked. After some investigation, I have found that when the server becomes unresponsive, issuing the command "invoke-rc.d sysklogd restart" would fix the problem immediately. This has been confirmed over several lockup events.

An strace of the syslogd process during the last lockup shows the following:
--------------------------------------------------------------------
8162 00:51:20.683278 write(15, "\x30\x72\x02\x01\x02\x63\x6d\x04\x11\x64\x63\x3d\x6e\x69"..., 116) = 116
 | 00000 30 72 02 01 02 63 6d 04 11 64 63 3d 6e 69 6d 62 0r...cm. .dc=nimb |
 | 00010 69 6f 73 2c 64 63 3d 6f 72 67 0a 01 02 0a 01 00 ios,dc=o rg...... |
 | 00020 02 01 01 02 01 00 01 01 00 a0 35 a3 15 04 0b 6f ........ ..5....o |
 | 00030 62 6a 65 63 74 43 6c 61 73 73 04 06 69 70 48 6f bjectCla ss..ipHo |
 | 00040 73 74 a3 1c 04 0c 69 70 48 6f 73 74 4e 75 6d 62 st....ip HostNumb |
 | 00050 65 72 04 0c 31 36 30 2e 33 36 2e 34 36 2e 37 31 er..160. 36.46.71 |
 | 00060 30 12 04 02 63 6e 04 0c 69 70 48 6f 73 74 4e 75 0...cn.. ipHostNu |
 | 00070 6d 62 65 72 mber |
8162 00:51:20.683328 poll([{fd=15, events=POLLIN|POLLPRI|POLLERR|POLLHUP, revents=POLLIN}], 1, -1) = 1
8162 00:51:20.683731 read(15, "\x30\x0c\x02\x01\x02\x65\x07\x0a", 8) = 8
 | 00000 30 0c 02 01 02 65 07 0a 0....e.. |
8162 00:51:20.683771 read(15, "\x01\x00\x04\x00\x04\x00", 6) = 6
 | 00000 01 00 04 00 04 00 ...... |
8162 00:51:20.683820 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
8162 00:51:20.683870 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:51:39.613026 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:51:39.613109 rt_sigprocmask(SIG_BLOCK, [HUP ALRM], NULL, 8) = 0
8162 00:51:39.613164 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
8162 00:51:39.613206 rt_sigprocmask(SIG_UNBLOCK, [HUP ALRM], NULL, 8) = 0
8162 00:51:39.613231 writev(9, [{"Feb 24 00:51:15", 15}, {" ", 1}, {"tortoise", 8}, {" ", 1}, {"last message repeated 9 times", 29}, {"\n", 1}], 6) = 55
 * 15 bytes in buffer 0
 | 00000 46 65 62 20 32 34 20 30 30 3a 35 31 3a 31 35 Feb 24 0 0:51:15 |
 * 1 bytes in buffer 1
 | 00000 20 |
 * 8 bytes in buffer 2
 | 00000 74 6f 72 74 6f 69 73 65 tortoise |
 * 1 bytes in buffer 3
 | 00000 20 |
 * 29 bytes in buffer 4
 | 00000 6c 61 73 74 20 6d 65 73 73 61 67 65 20 72 65 70 last mes sage rep |
 | 00010 65 61 74 65 64 20 39 20 74 69 6d 65 73 eated 9 times |
 * 1 bytes in buffer 5
 | 00000 0a . |
8162 00:51:39.613303 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:51:39.613334 alarm(30) = 0
8162 00:51:39.613357 rt_sigreturn(0x1e) = 44
8162 00:51:39.613379 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:52:09.615370 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:52:09.615414 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:52:09.615467 alarm(30) = 0
8162 00:52:09.615487 rt_sigreturn(0x1e) = 44
8162 00:52:09.615508 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:52:39.627751 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:52:39.627793 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:52:39.627840 alarm(30) = 0
8162 00:52:39.627866 rt_sigreturn(0x1e) = 44
8162 00:52:39.627887 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:53:09.640132 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:53:09.640170 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:53:09.640217 alarm(30) = 0
8162 00:53:09.640245 rt_sigreturn(0x1e) = 44
8162 00:53:09.640266 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:53:39.652530 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:53:39.652581 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:53:39.652627 alarm(30) = 0
8162 00:53:39.652649 rt_sigreturn(0x1e) = 44
8162 00:53:39.652670 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:54:09.664887 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:54:09.664927 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:54:09.664975 alarm(30) = 0
8162 00:54:09.665002 rt_sigreturn(0x1e) = 44
8162 00:54:09.665023 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:54:39.677284 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:54:39.677323 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:54:39.677371 alarm(30) = 0
8162 00:54:39.677398 rt_sigreturn(0x1e) = 44
8162 00:54:39.677419 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:55:09.689668 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:55:09.689707 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:55:09.689754 alarm(30) = 0
8162 00:55:09.689781 rt_sigreturn(0x1e) = 44
8162 00:55:09.689802 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:55:39.702046 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:55:39.702085 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:55:39.702132 alarm(30) = 0
8162 00:55:39.702160 rt_sigreturn(0x1e) = 44
8162 00:55:39.702180 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:56:09.714431 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:56:09.714473 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
8162 00:56:09.714526 alarm(30) = 0
8162 00:56:09.714547 rt_sigreturn(0x1e) = 44
8162 00:56:09.714567 sendto(1, "<86>Feb 24 00:51:20 syslogd: nss"..., 111, MSG_NOSIGNAL, NULL, 0) = ? ERESTARTSYS (To be restarted)
8162 00:56:39.726806 --- SIGALRM (Alarm clock) @ 0 (0) ---
8162 00:56:39.726845 rt_sigaction(SIGALRM, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, {0x4048e0, [ALRM], SA_RESTORER|SA_RESTART, 0x7fc406e88100}, 8) = 0
--------------------------------------------------------------------

I have as yet been unsuccessful in reproducing the problem on demand.

Update: Tried "pkill -HUP syslogd", this has no effect. Only "invoke-rc.d sysklogd restart" will fix the problem.

Update: This bug exists in 8.10 amd64 as well:
$ lsb_release -rd
Description: Ubuntu 8.10
Release: 8.10

Update: Recompiled syslogd w/ -g, attached debugger and got a stack trace of syslogd while hung:
(gdb) info stack
#0 0x00007f977995ccb5 in send () from /lib/libc.so.6
#1 0x00007f9779957ff0 in __vsyslog_chk () from /lib/libc.so.6
#2 0x00007f9779958283 in __syslog_chk () from /lib/libc.so.6
#3 0x00007f977944f73b in ?? () from /lib/libnss_ldap.so.2
#4 0x00007f977945046e in ?? () from /lib/libnss_ldap.so.2
#5 0x00007f9779450aff in ?? () from /lib/libnss_ldap.so.2
#6 0x00007f97794534d5 in _nss_ldap_gethostbyaddr_r ()
   from /lib/libnss_ldap.so.2
#7 0x00007f97799753a2 in gethostbyaddr_r () from /lib/libc.so.6
#8 0x00007f977997513f in gethostbyaddr () from /lib/libc.so.6
#9 0x0000000000404ed8 in cvthname (f=0x7fff81e060a0) at syslogd.c:2149
#10 0x0000000000402e7d in main (argc=0, argv=0x7fff81e061a8) at syslogd.c:1236

Syslog appears to be attempting to resolve a hostname via LDAP. This may mean that the bug is not in syslog itself but rather in the resolver libraries?

msp3k (peek-nimbios)
description: updated
msp3k (peek-nimbios)
description: updated
msp3k (peek-nimbios)
description: updated
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.