smtp "fatal: unknown service: smtp/tcp", probably after libc upgrade

Bug #1916541 reported by Eugene Crosser on 2021-02-22
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
glibc (Ubuntu)
Undecided
Unassigned
postfix (Ubuntu)
High
Unassigned

Bug Description

On hirsute, on Jan 20, postfix stopped sending outgoing mails with the error

Feb 22 23:35:18 pccross postfix/smtp[9582]: fatal: unknown service: smtp/tcp

There were no changes in the system other than daily upgrades.

Strace shows that `smtp` process (while chrooted), after successful open of the file in the queue, tried to use NSS, ran `fstat()` against `/etc/nsswitch.conf`, that succeeded (presumably using the copy in the /var/spool/postfix/etc/ directory), but there was no subsequent `open()`, and the next syscall is `sendto()` of the error message to syslog.

There was a big upgrade between the last successful and the first unsuccessful email delivery, and among other things libc was updated from 2.32-0ubuntu6 to 2.33-0ubuntu2. I suspect that there was some change in the NSS in libc that resulted in the failure of `getservbyname()`.

Is anybody else observing this problem?

ProblemType: Bug
DistroRelease: Ubuntu 21.04
Package: postfix 3.5.6-1
ProcVersionSignature: Ubuntu 5.10.0-14.15-generic 5.10.11
Uname: Linux 5.10.0-14-generic x86_64
ApportVersion: 2.20.11-0ubuntu59
Architecture: amd64
CasperMD5CheckResult: unknown
CurrentDesktop: X-Cinnamon
Date: Tue Feb 23 00:02:13 2021
EtcMailname: pccross.average.org
Hostname: pccross
PostconfMydomain: localdomain
PostconfMyhostname: pccross.localdomain
PostconfMyorigin: /etc/mailname
ResolvConf:
 # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
 # DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN

 nameserver 127.0.0.1
 search average.org
SourcePackage: postfix
UpgradeStatus: No upgrade log present (probably fresh install)

Eugene Crosser (crosser) wrote :
description: updated

Thank you for taking the time to file a bug report.

I tried to reproduce your bug in a Ubuntu Hirsute container without success. Moreover, I looked for an upstream bug report about any incompatibility with glibc 2.33 but found nothing. Could you please share your postfix config file to help us understand what is going on here?

Since there is not enough information in your report to begin triage or to
differentiate between a local configuration problem and a bug in Ubuntu, I
am marking this bug as "Incomplete". We would be grateful if you would:
provide a more complete description of the problem, explain why you
believe this is a bug in Ubuntu rather than a problem specific to your
system, and then change the bug status back to "New".

For local configuration issues, you can find assistance here:
http://www.ubuntu.com/support/community

Changed in postfix (Ubuntu):
status: New → Incomplete
Eugene Crosser (crosser) wrote :
Download full text (3.3 KiB)

Hello @lucaskanashiro,

it may work differently in a container because the problem may be triggered by a combination of libc components and possibly the kernel (only a speculation). I can reproduce the problem on another hirsute machine, postfix was never used there. I `dpkg-reconfigure`d it as a "satellite system" and sent an email. Sure enough, it got stuck in the queue the same way:

crosser@journey:~$ mailq
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
6712B36270B 422 Tue Feb 23 21:36:05 <email address hidden>
                                                (unknown mail transport error)
                                         <email address hidden>

-- 0 Kbytes in 1 Request.

I am attaching all relevant files.

> not enough information in your report to begin triage or to differentiate between a local configuration problem and a bug in Ubuntu

Well, nothing in the local configuration changed between Feb 20 when delivery worked and Feb 21 when it stopped to work...

> provide a more complete description of the problem
The most complete I could provide short of recompiling postfix and adding debug printouts was that strace looks like libc's `getservbyname()` returned failure when it was called from a chrooted process, without trying to access `/etc/services` or even opening `/etc/nsswitch.conf` (but checked `stat` of the latter, which was successful).

> explain why you believe this is a bug in Ubuntu rather than a problem specific to your
system

As I explained in my first report (and another time above), there were no local changes (other than package upgrades) between the moment deliver worked and the moment after which it stopped working.

?field.comment=Hello @lucaskanashiro,

it may work differently in a container because the problem may be triggered by a combination of libc components and possibly the kernel (only a speculation). I can reproduce the problem on another hirsute machine, postfix was never used there. I `dpkg-reconfigure`d it as a "satellite system" and sent an email. Sure enough, it got stuck in the queue the same way:

crosser@journey:~$ mailq
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
6712B36270B 422 Tue Feb 23 21:36:05 <email address hidden>
                                                (unknown mail transport error)
                                         <email address hidden>

-- 0 Kbytes in 1 Request.

I am attaching all relevant files.

> not enough information in your report to begin triage or to differentiate between a local configuration problem and a bug in Ubuntu

Well, nothing in the local configuration changed between Feb 20 when delivery worked and Feb 21 when it stopped to work...

> provide a more complete description of the problem
The most complete I could provide short of recompiling postfix and adding debug printouts was that strace looks like libc's `getservbyname()` returned failure when it was called from a chrooted process, without trying to access `/etc/services` or even opening `/etc/nsswitch.conf` (but checked `stat` of the latter, which was successful).

> explain why you believe this is a bug in Ubuntu rather than a problem sp...

Read more...

Eugene Crosser (crosser) on 2021-02-23
Changed in postfix (Ubuntu):
status: Incomplete → New
Eugene Crosser (crosser) wrote :

I did some more investigation, and did not find an explanation, but I am writing it here to maybe save somebody else some effort.

Postfix uses normal(?) `getservbyname(service, protocol)` in the file src/smtp/smtp_connect.c line 363:

        if ((sp = getservbyname(service, protocol)) == 0)
            msg_fatal("unknown service: %s/%s", service, protocol);

and I see this message in the log.

On the other hand, a simple program that performs `chroot("/var/spool/postfix")` and calls `getservbyname()` _works correctly_ on hirsute (C code in attachment):

crosser@pccross:~/src$ sudo ./getsrvname
[sudo] password for crosser:
s_name="smtp", s_port=25, s_proto="tcp"

Note though that in such form the the program does not try to use NSS, it just opens `/etc/services` directly. While postfix's `smtp` goes through nsswitch, as can be seen in the strace log from my previous attachment.

Thanks for providing more information.

Unfortunately, I still cannot reproduce the problem. This time I tried using a Hirsute VM instead of a container, but everything worked as expected. I was able to send an email from inside the VM, the email went through without problems, and I received it successfully.

You mentioned that you experience the issue even if you configure postfix to act as a satellite system. Could you perhaps provide the configuration files you used to do that? I did the same here, and I think it is unlikely that the problem is in some configuration file, but hey, here goes nothing...

Also, and I apologize if you have already done this, but please make sure that your Hirsute system is up-to-date. I'm using a completely up-to-date Hirsute VM with absolutely nothing extra installed except postfix and sasl2-bin.

Thanks.

Changed in postfix (Ubuntu):
status: New → Incomplete
Eugene Crosser (crosser) wrote :
Download full text (4.3 KiB)

Hello @sergiodj,

> Could you perhaps provide the configuration files you used to do that?

My complete configuration is included in the tar attached to this ticket.

> please make sure that your Hirsute system is up-to-date

Yes, updated daily, _plus_ by hand right before the investigation.
Note the that problem (quite certainly) _was triggered by an update_ of the libc package.

Perhaps worth to mention, my systems are being upgraded between distro versions, not fresh-installed.

In the meanwhile, I did more investigation. First, I checked that the problem is, as I suspected, connected to running `getservbyname()` under chroot. When I change smtp line in `master.cf` to not run under chroot, i.e.

smtp unix - - y - - smtp

is changed to

smtp unix - - n - - smtp

MAIL DELIVERY STARTS TO WORK!

So at least I have a workaround now. But the problem still stands.

Now here is something interesting: strace of the `smtp` process when it is run under chroot and without chroot:

===== under chroot =====
[pid 224329] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 14
[pid 224329] connect(14, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 224329] close(14) = 0
[pid 224329] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 14
[pid 224329] connect(14, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 224329] close(14) = 0
[pid 224329] newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=699, ...}, 0) = 0
[pid 224329] newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
[pid 224329] getpid() = 224329
[pid 224329] sendto(7, "<18>Feb 24 20:48:11 postfix/smtp"..., 92, MSG_NOSIGNAL, NULL, 0) = 92
========

(the last line is the error message about "smtp"/"tcp" service. No attempt to load any nsswitch modules!)

==== no chroot ====
[pid 224736] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 14
[pid 224736] connect(14, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 224736] close(14) = 0
[pid 224736] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 14
[pid 224736] connect(14, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 224736] close(14) = 0
[pid 224736] newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=699, ...}, 0) = 0
[pid 224736] openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 14
[pid 224736] newfstatat(14, "", {st_mode=S_IFREG|0644, st_size=177082, ...}, AT_EMPTY_PATH) = 0
[pid 224736] mmap(NULL, 177082, PROT_READ, MAP_PRIVATE, 14, 0) = 0x7f35ad920000
[pid 224736] close(14) = 0
[pid 224736] access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
[pid 224736] openat(AT_FDCWD, "/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libnss_db.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 224736] newfsta...

Read more...

Paride Legovini (paride) wrote :

Hi Eugene,

I could reproduce the issue from a clean Hirsute VM. Steps:

1. lxc launch ubuntu-daily:hirsute paride-h-vm --vm
2. lxc exec paride-h-vm bash
3. apt update
4. apt full-upgrade
5. apt install postfix
   # Debconf config as: Satellite system.
   # All the rest left with default values
6. apt install bsd-mailx
7. echo x | mailx <email address hidden>
8. mailq. Check the "unknown mail transport error":

-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
3D4CB3F4DB 370 Thu Feb 25 13:52:53 <email address hidden>
                                                (unknown mail transport error)
                                         <email address hidden>

9. cat /var/log/mail.log. There it is:

Feb 25 13:52:53 paride-h-vm postfix/smtp[10307]: fatal: unknown service: smtp/tcp

I'm not sure if this should be considered a postfix bug or a glibc bug at this point. If this is a regression caused by a recent glibc upload then other packages may be affected, and action should be taken on that side.

I'm adding a "glibc" task to get attention from the glibc maintainers, and raising the importance of this bug. Thanks a lot for sharing all the details of your debugging work!

Changed in postfix (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → Critical
importance: Critical → High
Paride Legovini (paride) wrote :

Note: the very same steps in a ubuntu:focal VM do not lead to the same issue.

Eugene Crosser (crosser) wrote :

Thanks Paride,
I am glad that you could reproduce it, and it's not like either I or my system is crazy :)

I noticed something that may or may not be relevant. I have another machine with hirsute, that was purchased a year+ ago and has undergone less package upgrades: mail delivery _worked_ there. I did _not_ reconfigure it as "satellite" and I don't remember how it was configured at the start, probably as all default.

The difference that I noticed is that the contents of /var/spool/postfix/usr/lib (as filled by posttix's startup procedure) is significantly different. One wild speculation would be that some shared lib there is opportunistically loaded that refers to an earlier glibc version, and that subsequently somehow results in impossibility to load nss libraries.

tags: added: server-next
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers