I just added a new artful version to the ppa, version 4.13.0-38.43+hf1711407v20180413b1; it's building right now. It should avoid the hang just like the last version from yesterday; but the last one blocked freeing any future net namespaces after one hangs; this one fixes that so even after a net namespace hangs, future ones will correctly clean up and get freed.
I also added debug to this new build that will show each net namespace alloc and free, like:
[ 207.625857] net_namespace: alloced net ffff8d2eb3cae000
...
[ 208.152882] net_namespace: freeing net ffff8d2eb3cae000
and it has the same dbg during reproduction of this bug, where it hangs, like:
[ 1363.377355] unregister_netdevice: possible namespace leak (netns ffff96ddf1c10000), have been waiting for 131032 seconds
[ 1363.377393] (netns ffff96ddf1c10000): open sk ffff96ddf1e77380 family 2 state 4 dst_cache (null) rx_dst ffff96ddf532d800 creator generic_ip_connect+0x257/0x4f0 [cifs]
[ 1363.377406] (netns ffff96ddf1c10000): dst ffff96ddf532d800 expires 0 error 0 obsolete 2 flags 0 refcnt 1 ops ipv4_dst_ops+0x0/0xc0 creator ip_route_input_rcu+0x6b3/0xd80
If you're able to test with today's build, let me know if you are able to reproduce the problem (see the 'namespace leak' message) and if it has worked around the container hang correctly. Thanks!
Jiri,
I just added a new artful version to the ppa, version 4.13.0- 38.43+hf1711407 v20180413b1; it's building right now. It should avoid the hang just like the last version from yesterday; but the last one blocked freeing any future net namespaces after one hangs; this one fixes that so even after a net namespace hangs, future ones will correctly clean up and get freed.
I also added debug to this new build that will show each net namespace alloc and free, like:
[ 207.625857] net_namespace: alloced net ffff8d2eb3cae000
...
[ 208.152882] net_namespace: freeing net ffff8d2eb3cae000
and it has the same dbg during reproduction of this bug, where it hangs, like:
[ 1363.377355] unregister_ netdevice: possible namespace leak (netns ffff96ddf1c10000), have been waiting for 131032 seconds ip_connect+ 0x257/0x4f0 [cifs] ops+0x0/ 0xc0 creator ip_route_ input_rcu+ 0x6b3/0xd80
[ 1363.377393] (netns ffff96ddf1c10000): open sk ffff96ddf1e77380 family 2 state 4 dst_cache (null) rx_dst ffff96ddf532d800 creator generic_
[ 1363.377406] (netns ffff96ddf1c10000): dst ffff96ddf532d800 expires 0 error 0 obsolete 2 flags 0 refcnt 1 ops ipv4_dst_
If you're able to test with today's build, let me know if you are able to reproduce the problem (see the 'namespace leak' message) and if it has worked around the container hang correctly. Thanks!