Thomas Hood wrote:

>On Wed, 2004-07-07 at 12:09, Helge Hafting wrote:
>>My misunderstanding then.  I tried this, and put
>>"echo" statements between the others.  I found that
>>the initial "ifdown lo" hangs, so the rest does not happen.
>
>Please run ifdown with "-v" and send the output.
>--
>Thomas

Timing summary from strace -c for a bad run of "ifdown lo":
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.44    0.017983        8992         2           waitpid
  5.00    0.001333           9       147        21 open
  4.29    0.001143         229         5           execve
  3.66    0.000976           8       118           read
  3.53    0.000942           5       174           old_mmap
  2.56    0.000683           3       208           brk
  1.80    0.000481           4       135           close
  1.64    0.000437           6        75        72 access
  1.53    0.000409           3       120           fstat64
  1.49    0.000397           7        54           munmap
  1.28    0.000341           7        51        33 stat64
  1.08    0.000288          26        11        11 connect
  0.90    0.000240          17        14           socket
  0.65    0.000173          58         3           clone
  0.62    0.000166           5        36           mmap2
  0.42    0.000111           3        33           rt_sigaction
  0.38    0.000100           4        28           fcntl64
  0.32    0.000084           4        20           rt_sigprocmask
  0.22    0.000058           4        13           uname
  0.17    0.000046           6         8           getdents64
  0.14    0.000036           4         9           getpid
  0.10    0.000026           5         5           time
  0.10    0.000026           5         5           _llseek
  0.09    0.000025          13         2           ioctl
  0.08    0.000021           4         6           set_thread_area
  0.08    0.000021          11         2           shutdown
  0.06    0.000017           3         6           geteuid32
  0.06    0.000017           6         3           setsockopt
  0.06    0.000015           5         3           gettimeofday
  0.06    0.000015           5         3           getcwd
  0.04    0.000010           3         3           getrlimit
  0.03    0.000009           3         3           getuid32
  0.03    0.000008           3         3           getppid
  0.03    0.000008           3         3           getgid32
  0.03    0.000008           3         3           getegid32
  0.02    0.000006           3         2           getpgrp
  0.02    0.000004           2         2           select
  0.01    0.000003           3         1           umask
------ ----------- ----------- --------- --------- ----------------
100.00    0.026666                  1319       137 total

Most of the time (67%) goes to waitpid?

I have also attached a strace -T -f (not the same run)
of a slow-running "ifdown lo". It eventually completed, but
running "ifdown lo" in an xterm (not in runlevel 1) is much
faster; there it completes in less than a second.

When the machine is taken down to runlevel 1, ifdown is slow:
it needs half a minute to complete, or it even runs for hours
without completing.

ifdown ran process 2299, which spent 29.99 s in a select()
that timed out. That is why process 2299 as a whole took 30 s.
Process 2299 is run-parts /etc/network/if-post-d.
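Picking the one slow call out of a long strace -f -T log can be automated. The following is only a rough sketch: the helper name `slow_calls` and its regex are my own, not part of strace. It assumes each completed call sits on one line ending in the -T elapsed time, e.g. `<29.994926>`, and it also accepts the `<... NAME resumed>` lines that strace -f prints when a call is interrupted by another process's output.

```python
import re
from typing import Iterable, List, Tuple

# One completed call from `strace -f -T`, e.g.
# [pid  2299] select(1024, NULL, [5], NULL, {30, 0}) = 0 (Timeout) <29.994926>
LINE_RE = re.compile(
    r"^(?:\[pid\s+(?P<pid>\d+)\]\s+)?"            # optional pid prefix added by -f
    r"(?:<\.\.\.\s+(?P<resumed>\w+)\s+resumed>"   # continuation of an interrupted call
    r"|(?P<call>\w+)\()"                          # or a normal "name(" start
    r".*<(?P<secs>\d+\.\d+)>\s*$"                 # -T elapsed seconds at end of line
)


def slow_calls(lines: Iterable[str], threshold: float) -> List[Tuple[str, str, float]]:
    """Return (pid, syscall, seconds) for calls taking >= threshold, slowest first."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # "<unfinished ...>" lines and other output carry no timing
        secs = float(m.group("secs"))
        if secs >= threshold:
            name = m.group("call") or m.group("resumed")
            hits.append((m.group("pid") or "?", name, secs))
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

Feeding it the log with a threshold of about one second should leave just a handful of lines, with the select() in process 2299 near the top.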

Here is the part with the timeout:
[pid  2299] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5 <0.000028>
[pid  2299] setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000013>
[pid  2299] fcntl64(5, F_GETFL)         = 0x2 (flags O_RDWR) <0.000011>
[pid  2299] fcntl64(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000010>
[pid  2299] connect(5, {sa_family=AF_INET, sin_port=htons(389), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000085>
[pid  2299] select(1024, NULL, [5], NULL, {30, 0}) = 0 (Timeout) <29.994926>
[pid  2299] shutdown(5, 2 /* send and receive */) = 0 <0.000016>
[pid  2299] close(5)                    = 0 <0.000020>
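The traced calls follow the usual non-blocking connect pattern: socket, TCP_NODELAY, O_NONBLOCK, connect() returning EINPROGRESS, then select() waiting for the socket to become writable. Below is a rough Python reproduction for illustration only; the function name `connect_with_timeout` is hypothetical, and this is not the code that run-parts (or whatever library it loads) actually runs.

```python
import errno
import select
import socket


def connect_with_timeout(host: str, port: int, timeout: float) -> str:
    """Mimic the traced sequence; return "connected", "timeout", or an errno name."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        sock.setblocking(False)            # fcntl64(F_SETFL, O_RDWR|O_NONBLOCK)
        err = sock.connect_ex((host, port))
        if err == 0:
            return "connected"
        if err != errno.EINPROGRESS:
            return errno.errorcode.get(err, str(err))
        # select(..., [fd], ..., timeout): wait for the socket to become writable.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return "timeout"               # the 30 s case seen in the trace
        # Writable only means the attempt finished; SO_ERROR says how.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return "connected" if err == 0 else errno.errorcode.get(err, str(err))
    finally:
        try:
            sock.shutdown(socket.SHUT_RDWR)   # shutdown(5, 2) in the trace
        except OSError:
            pass                              # never connected; nothing to shut down
        sock.close()
```

With "lo" up and nothing listening, the same sequence fails at once with ECONNREFUSED; it sits for the full timeout only when no answer comes back at all, which matches the trace with "lo" down.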


I see. It tries to connect to port 389 at address 127.0.0.1. Of course
it times out, because "lo" is down at this point. Port 389 is the LDAP
server, which I use for experimental user authentication.
LDAP shuts down before the network goes down, though.
Now I wonder: does run-parts use PAM in any way, even when the
directory turns out to be empty? What for?

Should I file a bug against run-parts instead? Or against PAM?
If both behave "correctly", then run-parts cannot be used after "lo"
goes down, or whenever LDAP isn't up.

Helge Hafting

