nmap hang due to BADF ioctl inside select call is returning good rc

Bug #1861389 reported by Christian Ehrhardt 
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
glibc (Ubuntu)
New
Undecided
Unassigned
nmap (Ubuntu)
Incomplete
Undecided
Unassigned

Bug Description

Hi,
the reproducibility of this is hard, but I've seen it twice in a week now so it is not a stray cosmic ray after all.

I've seen nmap hang that was running with a timeout already which made me wonder.

$ nmap -Pn 192.168.122.0/24 --host-timeout 10

In PS the output looks normal usually in a wchan with a timeout:

$ ps axlf
...
4 0 10155 0 20 0 25372 17320 poll_s S ? 0:07 nmap -Pn 192.168.122.0/24 --host-timeout 10

Full wchan:
cat /proc/10155/wchan
poll_schedule_timeout.constprop.0

I found that nmap isn't dead, it is looping it seems.
Obviously on GDB you always find it at the timeout

0x00007fe5b73a725a in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffec6db1540, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0,
    timeout=timeout@entry=0x7ffec6db1670) at ../sysdeps/unix/sysv/linux/select.c:41
41 ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
(gdb) bt
#0 0x00007fe5b73a725a in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffec6db1540, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0,
    timeout=timeout@entry=0x7ffec6db1670) at ../sysdeps/unix/sysv/linux/select.c:41
#1 0x000055900ff4f838 in pcap_select (p=p@entry=0x55901131ec10, timeout=timeout@entry=0x7ffec6db1670) at netutil.cc:1012
#2 0x000055900ff4fa15 in pcap_select (usecs=999993, p=0x55901131ec10) at netutil.cc:1031
#3 read_reply_pcap (pd=0x55901131ec10, to_usec=999993, accept_callback=accept_callback@entry=0x55900ff4e530 <accept_arp(unsigned char const*, pcap_pkthdr const*, int, size_t)>,
    p=p@entry=0x7ffec6db16f0, head=head@entry=0x7ffec6db1700, rcvdtime=rcvdtime@entry=0x7ffec6db1770, datalink=0x7ffec6db16ec, offset=0x7ffec6db16f8) at netutil.cc:4234
#4 0x000055900ff56014 in read_arp_reply_pcap (pd=<optimized out>, sendermac=sendermac@entry=0x7ffec6db1792 "", senderIP=senderIP@entry=0x7ffec6db176c, to_usec=<optimized out>,
    rcvdtime=rcvdtime@entry=0x7ffec6db1770, trace_callback=0x55900ff0ef70 <PacketTrace::traceArp(int, unsigned char const*, unsigned int, timeval*)>) at netutil.cc:4324
#5 0x000055900fefce80 in get_arp_result (USI=0x7ffec6db1890, stime=0x7ffec6db1860) at scan_engine_raw.cc:1557
#6 0x000055900fef5e67 in ultra_scan(std::vector<Target*, std::allocator<Target*> >&, scan_lists*, stype, timeout_info*) () at scan_engine.cc:2546
#7 0x000055900ff0e1cc in arpping (hostbatch=<optimized out>, num_hosts=255) at targets.cc:187
#8 0x000055900ff0ed00 in refresh_hostbatch (hs=<optimized out>, exclude_group=0x559010b87780, ports=0x559010142980 <ports>, pingtype=1) at targets.cc:591
#9 0x000055900ff0ee5d in nexthost (hs=hs@entry=0x7ffec6db2010, exclude_group=exclude_group@entry=0x559010b87780, ports=ports@entry=0x559010142980 <ports>, pingtype=<optimized out>)
    at targets.cc:644
#10 0x000055900fec8ec3 in nmap_main (argc=<optimized out>, argv=0x7ffec6db3058) at nmap.cc:2063
#11 0x000055900fe9e215 in main (argc=5, argv=0x7ffec6db3058) at main.cc:237

But strace has shown that the syscall is bad
# strace -rT -p 10155
strace: Process 10155 attached
     0.000000 select(5, [4], NULL, NULL, {tv_sec=0, tv_usec=438201}) = 0 (Timeout) <0.438773>
     0.439032 ioctl(-1, TIOCGPGRP, 0x7ffec6db1784) = -1 EBADF (Bad file descriptor) <0.000028>
     0.000170 getpgrp() = 0 <0.000021>
     0.000092 select(5, [4], NULL, NULL, {tv_sec=0, tv_usec=999992}) = 0 (Timeout) <1.001123>
     1.001253 ioctl(-1, TIOCGPGRP, 0x7ffec6db1784) = -1 EBADF (Bad file descriptor) <0.000019>
     0.000075 getpgrp() = 0 <0.000018>

I expected nmap to give up if the select call went bad - and indeed the code would:
1008 FD_SET(fd, &rfds);
1009
1010 do {
1011 errno = 0;
1012 ret = select(fd + 1, &rfds, NULL, NULL, timeout);
1013 if (ret == -1) {
1014 if (errno == EINTR)
1015 netutil_error("%s: %s", __func__, strerror(errno));
1016 else
1017 netutil_fatal("Your system does not support select()ing on pcap devices (%s). PLEASE REPORT THIS ALONG WITH DETAILED SYSTEM INFORMATION TO THE nmap-dev MAILING LIST!", strerror(errno));

But glibc returns a good RC on this ioctl error:
1012 ret = select(fd + 1, &rfds, NULL, NULL, timeout);
(gdb) n
1013 if (ret == -1) {
(gdb) p ret
$1 = 1

There isn't much more code in glbic shown in GDB
Breakpoint 1, pcap_select (p=p@entry=0x55901131ec10, timeout=timeout@entry=0x7ffec6db1670) at netutil.cc:976
976 in netutil.cc
(gdb) c
Continuing.

Breakpoint 2, __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffec6db1540, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffec6db1670)
    at ../sysdeps/unix/sysv/linux/select.c:39
39 {
(gdb) n
41 return SYSCALL_CANCEL (select, nfds, readfds, writefds, exceptfds,
(gdb) s

Is it a bug in glibc, would it need to return -1 in that case?

Note: it also is a bug in nmap to loop on it but they could say "but the answer fromt he socket call is wrong", so lets sort that out first.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

nmap task is incomplete until we know a hard reason glibc is returning rc=1 on a underlying EBADF ioctl.

Changed in nmap (Ubuntu):
status: New → Incomplete
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

There are too many moving parts to have this be a hard-fact. But I only saw it on x86 with kernel 5.4 so far. While other architectures or in former times this didn't happen.

Revision history for this message
Zdzisław Krajewski (d21d3q) wrote :

Similar thing. Debian 9. Haven't tried to reproduce it, bu I managed to attach to running process and grab some info:

# uname -a
Linux debian 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64 GNU/Linux

nmap -n -Pn -oG /tmp/nmap-<ip>-1432834534 -sS --defeat-rst-ratelimit -sU -p T:<long list of ports> --scan-delay 10 <ip>

# strace -p1212 -s9999
strace: Process 1212 attached
select(6, [5], NULL, NULL, {tv_sec=9, tv_usec=170055}) = 0 (Timeout)
ioctl(-1, TIOCGPGRP, 0x7ffe4a632bdc) = -1 EBADF (Bad file descriptor)
getpgrp() = 872
sendto(4, "E\0\0,/&\0\0005\6X\321\300\250\n\332P\370\341Z\3206\10:y\221\273\257\0\0\0\0`\2\4\0\210\237\0\0\2\4\5\264", 44, 0, {sa_family=AF_INET, sin_port=htons(2106), sin_addr=inet_addr("<ip>")}, 16) = 44
select(6, [5], NULL, NULL, {tv_sec=9, tv_usec=999728}) = 1 (in [5], left {tv_sec=9, tv_usec=964264})
ioctl(-1, TIOCGPGRP, 0x7ffe4a632bdc) = -1 EBADF (Bad file descriptor)
getpgrp() = 872
select(6, [5], NULL, NULL, {tv_sec=9, tv_usec=963845}^Cstrace: Process 1212 detached
 <detached ...>

and this repeats

(gdb) attach 1212
(gdb) bt
#0 0x00002ad3702905e3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x0000563913f0c0b9 in pcap_select(pcap*, timeval*) ()
#2 0x0000563913f0c47d in pcap_select(pcap*, long) ()
#3 0x0000563913edfb74 in readip_pcap(pcap*, unsigned int*, long, timeval*, link_header*, bool) ()
#4 0x0000563913ecbbd1 in get_pcap_result(UltraScanInfo*, timeval*) ()
#5 0x0000563913ec43b1 in ultra_scan(std::vector<Target*, std::allocator<Target*> >&, scan_lists*, stype, timeout_info*) ()
#6 0x0000563913e969b1 in nmap_main(int, char**) ()
#7 0x0000563913e68c41 in main ()

# lsof -p 1212
(...)
nmap 1212 root mem REG 0,8 28012 socket:[28012] (stat: No such file or directory)
(...)

# ls -lh /proc/1212/fd
total 0
lr-x------ 1 root root 64 May 12 10:14 0 -> /dev/null
lrwx------ 1 root root 64 May 12 10:14 1 -> socket:[28003]
lrwx------ 1 root root 64 May 12 10:14 2 -> socket:[28003]
l-wx------ 1 root root 64 May 12 10:14 3 -> /tmp/nmap-<ip>-1432834534
lrwx------ 1 root root 64 May 12 10:14 4 -> socket:[28010]
lrwx------ 1 root root 64 May 12 10:14 5 -> socket:[28012]

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.