Close all unnecessary inherited file descriptors

Bug #1096975 reported by Attila Fazekas
This bug affects 1 person
Affects                            Status    Importance   Assigned to   Milestone
OpenStack Object Storage (swift)   Expired   Undecided    Unassigned

Bug Description

In Unix and Unix-like systems, the child process inherits the parent process's file descriptors after a fork(2) (or clone(2)) call.

execve(2) preserves the file descriptors as well, unless FD_CLOEXEC is set on the given fd.
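
The behaviour described above can be reproduced with a small script. This is only a sketch, assuming Python 3.4 or later on a POSIX system (where newly created descriptors get FD_CLOEXEC by default); the names CHILD, child_sees_fd and demo are made up for illustration and are not part of swift.

```python
import os
import subprocess
import sys

# Tiny child program: report whether the fd number passed in argv is open.
CHILD = (
    "import os, sys; fd = int(sys.argv[1])\n"
    "try:\n"
    "    os.fstat(fd); print('open')\n"
    "except OSError:\n"
    "    print('closed')"
)


def child_sees_fd(fd):
    """Spawn a child (fork + execve) and return True if `fd` is open there."""
    out = subprocess.check_output(
        [sys.executable, "-c", CHILD, str(fd)],
        close_fds=False,  # do not let subprocess close the extra fds for us
    )
    return out.strip() == b"open"


def demo():
    """Return (leaked, protected) for one pipe fd in both FD_CLOEXEC states."""
    r, w = os.pipe()
    try:
        os.set_inheritable(w, True)   # clears FD_CLOEXEC: fd survives execve()
        leaked = child_sees_fd(w)
        os.set_inheritable(w, False)  # sets FD_CLOEXEC: execve() closes the fd
        protected = not child_sees_fd(w)
        return leaked, protected
    finally:
        os.close(r)
        os.close(w)
```

Calling demo() should show that the same descriptor reaches the child only while FD_CLOEXEC is cleared.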

Currently devstack leaks file descriptors, so you can see that fds 3 and 6 are inherited by all swift processes.

Example /proc/$pid/fd listing:
lrwx------. 1 stack stack 64 Jan 7 18:36 0 -> /dev/null
lrwx------. 1 stack stack 64 Jan 7 18:36 1 -> /dev/null
lrwx------. 1 stack stack 64 Jan 7 18:36 2 -> /dev/null
lrwx------. 1 stack stack 64 Jan 7 18:36 3 -> /dev/pts/0
lrwx------. 1 stack stack 64 Jan 7 18:36 5 -> socket:[197825]
lrwx------. 1 stack stack 64 Jan 7 18:36 6 -> /dev/pts/0
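
A listing like the one above can also be collected from code. The following is a rough sketch that assumes a Linux system with /proc mounted; list_fds is a hypothetical helper name, not an existing swift function.

```python
import os


def list_fds(pid="self"):
    """Return {fd_number: target} read from /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    fds = {}
    for name in os.listdir(fd_dir):
        try:
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd vanished between listdir() and readlink(), for example
            # the descriptor listdir() itself used to scan the directory.
            continue
    return dict(sorted(fds.items()))
```

For example, list_fds(39684) would inspect another process (given sufficient permissions), while list_fds() inspects the caller.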

Although the parent process is primarily responsible for avoiding this, the service processes should also make sure they do not waste fd resources.

swift-init should not let its children inherit the file descriptors that it inherited itself!

Note:
In bash you can create a new inheritable fd with a command similar to this:
exec 7>&1
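
For comparison, roughly the same effect can be produced from Python. This is a sketch assuming Python 3.4 or later; dup_inheritable is a hypothetical name used only here.

```python
import os


def dup_inheritable(src_fd, new_fd):
    """Rough equivalent of bash `exec NEW>&SRC`.

    Makes new_fd an inheritable duplicate of src_fd, so it will survive
    execve() in every child unless somebody closes it.
    """
    os.dup2(src_fd, new_fd, inheritable=True)
    return new_fd
```

dup_inheritable(1, 7) would mirror the `exec 7>&1` line above.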

Revision history for this message
Kun Huang (academicgareth) wrote :

Hi Attila,
I'm not sure what you mean by "waste":

Here is what I see (for the proxy process):
$/proc/39684/fd: ls -alh
dr-x------ 2 swift swift 0 Mar 13 16:06 .
dr-xr-xr-x 9 swift swift 0 Mar 12 13:14 ..
lrwx------ 1 swift swift 64 Mar 13 16:06 0 -> /dev/null
lrwx------ 1 swift swift 64 Mar 13 16:06 1 -> /dev/null
lrwx------ 1 swift swift 64 Mar 13 16:06 2 -> /dev/pts/5
lrwx------ 1 swift swift 64 Mar 13 16:06 3 -> socket:[394661]
lrwx------ 1 swift swift 64 Mar 13 16:06 4 -> socket:[394662]
lrwx------ 1 swift swift 64 Mar 13 16:06 5 -> socket:[394663]
lrwx------ 1 swift swift 64 Mar 13 16:06 6 -> socket:[394664]
lrwx------ 1 swift swift 64 Mar 13 16:06 7 -> socket:[394665]

and I checked which of these sockets are actually in use like this:
$cat /proc/net/tcp | grep 39466
4: 00000000:22B8 00000000:0000 0A 00000000:00000000 00:00000000 00000000 1000 0 394665 1 0000000000000000 100 0 0 10 -1

That socket is (0.0.0.0:8888, 0.0.0.0:0), which belongs to the proxy process. So in /proc/39684/fd, are file descriptors 3-6 the wasted ones?

Is that what you mean?
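
The address pair above comes from decoding the hex fields of the /proc/net/tcp line. A small sketch of that decoding, assuming the usual Linux IPv4 layout of the file; decode_addr and parse_tcp_line are illustrative names only.

```python
import socket
import sys


def decode_addr(hex_addr):
    """Turn 'AAAAAAAA:PPPP' from /proc/net/tcp into (ip, port)."""
    ip_hex, port_hex = hex_addr.split(":")
    # The address is printed as a native-endian 32-bit integer,
    # the port as a plain hexadecimal number.
    raw = int(ip_hex, 16).to_bytes(4, sys.byteorder)
    return socket.inet_ntoa(raw), int(port_hex, 16)


def parse_tcp_line(line):
    """Extract the fields used in this discussion from one table row."""
    fields = line.split()
    return {
        "local": decode_addr(fields[1]),
        "remote": decode_addr(fields[2]),
        "state": int(fields[3], 16),  # 0x0A is TCP_LISTEN
        "inode": int(fields[9]),      # matches socket:[inode] in /proc/<pid>/fd
    }
```

Applied to the row above, the inode field (394665) is what ties the listening socket to fd 7 of the proxy process.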

Attila Fazekas (afazekas) wrote :

Yes, fds 3 and 6 in my example do not need to be there. They are completely unnecessary.

The /dev/null usage is also interesting, but let's say it is normal, because some odd library might have the idea to misuse fds 0..2.
[root@new32 ~]# ls -l /proc/5298/fd
total 0
lrwx------. 1 root root 64 Mar 13 11:58 0 -> /dev/null
lrwx------. 1 root root 64 Mar 13 11:58 1 -> /dev/null
lrwx------. 1 root root 64 Mar 13 11:58 2 -> /dev/null
lrwx------. 1 root root 64 Mar 13 11:58 3 -> socket:[1012889]
lrwx------. 1 root root 64 Mar 13 11:58 4 -> socket:[1012891]
lrwx------. 1 root root 64 Mar 13 11:58 5 -> socket:[1012897]
lrwx------. 1 root root 64 Mar 13 11:58 6 -> socket:[1012898]

[root@new32 ~]# exec 7>&1
[root@new32 ~]# exec 8>&1
[root@new32 ~]# exec 9>&1
[root@new32 ~]# exec 6>&1
[root@new32 ~]# exec 5>&1
[root@new32 ~]# swift-init restart all

ls -l /proc/6129/fd
total 0
lrwx------. 1 root root 64 Mar 13 12:08 0 -> /dev/null
lrwx------. 1 root root 64 Mar 13 12:08 1 -> /dev/null
lr-x------. 1 root root 64 Mar 13 12:08 10 -> pipe:[1017372]
lrwx------. 1 root root 64 Mar 13 12:08 11 -> socket:[1017447]
lrwx------. 1 root root 64 Mar 13 12:08 12 -> socket:[1017452]
lrwx------. 1 root root 64 Mar 13 12:08 13 -> socket:[1017494]
lrwx------. 1 root root 64 Mar 13 12:08 14 -> socket:[1017501]
lrwx------. 1 root root 64 Mar 13 12:08 15 -> socket:[1017504]
lrwx------. 1 root root 64 Mar 13 12:08 16 -> socket:[1017529]
lrwx------. 1 root root 64 Mar 13 12:08 2 -> /dev/null
lr-x------. 1 root root 64 Mar 13 12:08 3 -> pipe:[1017366]
lr-x------. 1 root root 64 Mar 13 12:08 4 -> pipe:[1017369]
lrwx------. 1 root root 64 Mar 13 12:08 5 -> /dev/pts/0
lrwx------. 1 root root 64 Mar 13 12:08 6 -> /dev/pts/0
lrwx------. 1 root root 64 Mar 13 12:08 7 -> /dev/pts/0
lrwx------. 1 root root 64 Mar 13 12:08 8 -> /dev/pts/0
lrwx------. 1 root root 64 Mar 13 12:08 9 -> /dev/pts/0

[root@new32 ~]# tty
/dev/pts/0

It looks like those /dev/pts/0 descriptors will be referenced in every child process and will never be used for anything useful.

Kun Huang (academicgareth) wrote :

I think I understand the waste of sockets: if bad code opens a socket for reading or writing and never closes it, a useless socket fd is left behind.
But in your case, I want to understand what kind of bad code is responsible. I restarted swift with swift-init, and everything looks clean to me.
I restarted swift with swift-init main start -n. Here is the pstree (workers is the default, 1):
        ├─tmux(55319)─┬─bash(55411)───keystone-all(64641)
        │             └─bash(62276)───swift-init(47727)─┬─swift-proxy-ser(47728)───swift-proxy-ser(47800)
        │                                               ├─swift-container(47729)───swift-container(47798)
        │                                               ├─swift-container(47730)───swift-container(47797)
        │                                               ├─swift-container(47731)───swift-container(47799)
        │                                               ├─swift-container(47732)───swift-container(47790)
        │                                               ├─swift-account-s(47733)───swift-account-s(47782)
        │                                               ├─swift-account-s(47734)───swift-account-s(47768)
        │                                               ├─swift-account-s(47735)───swift-account-s(47805)
        │                                               ├─swift-account-s(47736)───swift-account-s(47793)
        │                                               ├─swift-object-se(47737)───swift-object-se(47788)
        │                                               ├─swift-object-se(47738)───swift-object-se(47770)
        │                                               ├─swift-object-se(47739)───swift-object-se(47796)
        │                                               └─swift-object-se(47740)───swift-object-se(47787)
Taking the proxy-server process as an example:
swift-init's fd:
lrwx------ 1 swift swift 64 Mar 15 13:35 0 -> /dev/pts/5
lrwx------ 1 swift swift 64 Mar 15 13:35 1 -> /dev/pts/5
lrwx------ 1 swift swift 64 Mar 15 13:35 2 -> /dev/pts/5
proxy-server's fd:
lrwx------ 1 swift swift 64 Mar 15 13:37 0 -> /dev/null
lrwx------ 1 swift swift 64 Mar 15 13:37 1 -> /dev/null
lrwx------ 1 swift swift 64 Mar 15 13:37 2 -> /dev/pts/5
lrwx------ 1 swift swift 64 Mar 15 13:37 3 -> socket:[456183]
lrwx------ 1 swift swift 64 Mar 15 13:37 4 -> socket:[456199]
lrwx------ 1 swift swift 64 Mar 15 13:37 5 -> socket:[456290]
lrwx------ 1 swift swift 64 Mar 15 13:37 6 -> socket:[456291]
lrwx------ 1 swift swift 64 Mar 15 13:37 7 -> socket:[456292]
proxy worker's fd:
lrwx------ 1 swift swift 64 Mar 15 13:37 0 -> /dev/null
lrwx------ 1 swift swift 64 Mar 15 13:37 1 -> /dev/null
lrwx------ 1 swift swift 64 Mar 15 13:37 2 -> /dev/pts/5
lrwx------ 1 swift swift 64 Mar 15...


Attila Fazekas (afazekas) wrote :

For example, you can see the fd leak in a devstack installation: https://review.openstack.org/#/c/19145/

In your example above, if the proxy-server wants to write anything to stderr (fd=2), it will try to write to a pseudo terminal (/dev/pts/5).
Another possible issue is that the ssh session which created the pty may no longer exist, so nobody will see the messages, and the pty number and its associated resources will not be reclaimed by the system.

The usual advice for handling inherited file descriptors in the early init phase:
File descriptors 0-2: should be replaced by another safe fd such as /dev/null. The related syscalls are close(2), dup(2), dup2(2) and dup3(2).
(Another possibility is redirecting them to a log file, or just closing them.)
File descriptors 3-<max_fd>: should be closed.
The traditional solution is to blindly call close(2) on every fd up to getdtablesize(). Typically you have just 1024 file descriptors, but the limit can be raised. AFAIK Linux has no well-known library call for listing the current process's open file descriptors, but you can do something similar to 'ls -l /proc/self/fd'.
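
The advice above could be sketched in Python roughly as follows. This is not swift's actual code: redirect_std_fds and close_inherited_fds are hypothetical helpers, and the sketch assumes a POSIX system (using /proc/self/fd where available and falling back to the blind sweep otherwise).

```python
import os


def redirect_std_fds(target=os.devnull):
    """Point fds 0-2 at `target` (default /dev/null) via open(2) + dup2(2)."""
    fd = os.open(target, os.O_RDWR)
    try:
        for std_fd in (0, 1, 2):
            os.dup2(fd, std_fd)
            # Covers the corner case fd == std_fd, where dup2() is a no-op
            # and the close-on-exec flag set by os.open() would remain.
            os.set_inheritable(std_fd, True)
    finally:
        if fd > 2:
            os.close(fd)


def close_inherited_fds(lowest=3):
    """Close every open fd >= lowest and return the fds that were closed."""
    try:
        # Linux: ask the kernel which fds are open instead of guessing.
        candidates = [int(name) for name in os.listdir("/proc/self/fd")]
    except OSError:
        # Fallback: the traditional blind sweep up to the fd limit.
        candidates = range(os.sysconf("SC_OPEN_MAX"))
    closed = []
    for fd in candidates:
        if fd < lowest:
            continue
        try:
            os.close(fd)
        except OSError:
            continue  # not open, e.g. the fd listdir() used and closed itself
        closed.append(fd)
    return sorted(closed)
```

A service would call both helpers once, early in its main process and before it opens its own sockets or log files, since close_inherited_fds() closes everything above fd 2 without distinction.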

You do not need to repeat these actions after every fork; you just need to do them in the "main"/"mother"/"parent" swift-{proxy,container,account,object} process as early as possible.

John Dickinson (notmyname) wrote :

This report is 2 years old and needs confirmation/verification.

Changed in swift:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Object Storage (swift) because there has been no activity for 60 days.]

Changed in swift:
status: Incomplete → Expired