docker network connection only working after daemon restart

Bug #1509867 reported by Viktor Pal
This bug affects 7 people
Affects: docker.io (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Kick In

Bug Description

I have just upgraded to Ubuntu 15.10 and realized when I try to ssh into my container I get the following error message:
ssh_exchange_identification: read: Connection reset by peer
Sometimes I get this message immediately, sometimes it takes longer.

It looks like this using telnet:
#####################
$ telnet 127.0.0.1 55555
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.

Connection closed by foreign host.
#####################
No ssh banner.

This is resolved by restarting the docker daemon and starting the container again.
This is completely reproducible: I just have to restart my desktop and then start the container, and the ssh connection always fails. After restarting the daemon it always succeeds.

Just restarting the container doesn't help, I have to restart the daemon to solve the problem.

ProblemType: Bug
DistroRelease: Ubuntu 15.10
Package: docker.io 1.6.2~dfsg1-1ubuntu4
ProcVersionSignature: Ubuntu 4.2.0-16.19-generic 4.2.3
Uname: Linux 4.2.0-16-generic x86_64
ApportVersion: 2.19.1-0ubuntu3
Architecture: amd64
CurrentDesktop: GNOME-Flashback:Unity
Date: Sun Oct 25 21:52:25 2015
SourcePackage: docker.io
UpgradeStatus: Upgraded to wily on 2015-10-22 (2 days ago)

Revision history for this message
Viktor Pal (deere) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in docker.io (Ubuntu):
status: New → Confirmed
Viktor Pal (deere)
description: updated
Dennis Straffin (dbstraffin) wrote :

It seems to be a networking issue as I am unable to connect to a forwarded port until I reload the daemon.

For example, if you run `sudo docker run -it -p 1080:1080 -P jamesdbloom/mockserver` and then try to connect to http://localhost:1080 in your browser, you should get a 404 (MockServer's default response) and see a log entry appear in the MockServer interactive session. Instead, it fails to connect and you don't see any log entries.

Kick In (kick-d)
Changed in docker.io (Ubuntu):
assignee: nobody → Kick In (kick-d)
Kick In (kick-d) wrote :

Hi,

Thanks for your report.

I tried to reproduce on a minimal ubuntu install. Steps I've done so far:

1- install docker on vivid host (1.5.0, for now), start an openssh-server container
2- do-release-upgrade of the vivid host to wily (upgraded to docker.io-1.6.2)
3- after reboot, start the container; I can log in to the container (after a proper docker port to get the right port to connect to).
4- reboot again, start the container and try to connect: working.

Will try with redirects and, if I still can't reproduce, with a full-blown ubuntu.
You did the upgrade from vivid to wily?

Viktor Pal (deere) wrote :

Yep, upgrade was done from vivid to wily.
Can you point me to some tools to discover the state of namespace networks so I can do some investigation myself?
From what I have seen iptables rules are in place and the docker0 interface exists when network connection fails.
I was thinking that maybe docker is somehow started too early, before some of the resources it needs to start properly are available.
I have EXPOSE 2222 in my Dockerfile and then run the container with -p 2222:55555.

Kick In (kick-d) wrote :

Hi,

I could reproduce with a full-blown ubuntu; I'm trying to find the issue.

With docker inspect, you can get the Pid of the container you want to debug.
Then, as root, create /var/run/netns if it doesn't exist, and symlink /proc/$pid/ns/net to /var/run/netns/container_name.

Then you can issue commands like ip netns exec container_name ip addr, ip netns exec container_name ping, etc.
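
The steps above can be sketched as a small shell function. This is only a minimal sketch: the function name expose_netns and the NETNS_DIR override are made up for illustration, and it assumes docker inspect reports the container's PID under .State.Pid.

```shell
# Sketch: make a container's network namespace visible to `ip netns`.
# expose_netns and NETNS_DIR are hypothetical names, not part of docker or iproute2.
expose_netns() {
    name="$1"
    dir="${NETNS_DIR:-/var/run/netns}"
    # PID of the container's main process; docker reports 0 for a stopped container.
    pid="$(docker inspect --format '{{.State.Pid}}' "$name")" || return 1
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        echo "container $name is not running" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    # ip netns looks up namespaces by file name in /var/run/netns.
    ln -sfn "/proc/$pid/ns/net" "$dir/$name"
}

# Usage, as root (mycontainer is a placeholder):
#   expose_netns mycontainer
#   ip netns exec mycontainer ip addr
#   ip netns exec mycontainer ip route
```

The symlink goes stale once the container stops, so it has to be recreated after every container restart.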

I'm still looking at what is creating this behavior.

Changed in docker.io (Ubuntu):
importance: Undecided → Medium
Viktor Pal (deere) wrote :

It seems that the issue is that the route to the docker network namespace does not exist after the OS starts up, so this route is missing on my desktop:
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0

Because this is missing, packets that target the container go to the default gateway instead of the namespace network.
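
A quick way to check for the missing route could look like the sketch below. It assumes the routing table is printed in the `route -n` layout shown above, with the interface name in the last column. The helper name has_bridge_route is made up for illustration.

```shell
# Sketch: succeed if the routing table on stdin has a row for the bridge.
# has_bridge_route is a hypothetical helper; the interface defaults to docker0.
has_bridge_route() {
    # The last column of `route -n` output is the interface name.
    awk -v ifc="${1:-docker0}" '$NF == ifc { found = 1 } END { exit !found }'
}

# Usage:
#   route -n | has_bridge_route docker0 || echo "docker0 route is missing"
```

This only checks that some route points at the bridge; it does not verify the subnet.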

The problem might be that docker first tries to add the bridge and then the route.
When docker first starts it creates the docker0 interface and this does not get deleted when you stop and start or restart docker.
But if you stop docker and explicitly remove the interface and the bridge you will see that the route is not added on startup.

To reproduce.
1.) Restart machine.
2.) After the OS restart docker network is not working (route is missing).
3.) Restart docker service.
4.) Docker network is working again.
5.) Stop docker service
6.) Run: ip link set docker0 down; brctl delbr docker0; brctl show; ip link show
7.) Start docker.
8.) After docker start docker network is not working (route is missing).
9.) Restart docker.
10.) Route gets added and everything is fine.
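
Steps 5 to 10 can be sketched as a script. This is a sketch under assumptions, not a verified reproducer: it assumes the daemon is managed with `service docker ...`, the helper names run and reproduce_missing_route are made up, and by default it only prints the commands instead of running them.

```shell
# Sketch of steps 5-10. DRY_RUN defaults to 1, so commands are only printed;
# set DRY_RUN=0 and run as root to actually execute them.
# Assumption: the docker daemon is controlled via `service docker ...`.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_missing_route() {
    run service docker stop
    run ip link set docker0 down
    run brctl delbr docker0
    run service docker start
    run ip route show dev docker0   # per the report: no route listed here
    run service docker restart
    run ip route show dev docker0   # per the report: the route is back
}

# Usage:
#   reproduce_missing_route              # print the commands
#   DRY_RUN=0; reproduce_missing_route   # execute them (as root)
```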

This is what I have been able to figure out so far.

Viktor Pal (deere) wrote :

I have no idea what happened but I can't reproduce this any more.
I checked my apt logs and there is no sign that the docker.io package was updated.
Maybe it was caused by some other bug in another package that was fixed meanwhile.

Can anyone else who could reproduce this confirm that this isn't reproducible any more?

Nagy Ferenc László (nfl) wrote :

I still have the problem.

Nagy Ferenc László (nfl) wrote :

Seems to be fixed in 16.04 beta.

Глеб Майоров (gmaiorov) wrote :

Hello.
The same behavior as Viktor Pal described in #7 appears on a Debian Jessie laptop with docker-engine from https://apt.dockerproject.org/repo.
I found that in my case the problem appears only if there is an /etc/network/interfaces entry like 'iface docker0 inet manual', which I used to prevent Network-Manager from interfering with the docker0 interface.
Removing the entry from /etc/network/interfaces solved the problem for me, and the route for docker0 is now added as it should be.
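
To see whether such an entry exists on a machine, something like the following sketch might help. The helper name find_iface_entry is made up for illustration, and the sketch only looks at the one file, not at anything under /etc/network/interfaces.d.

```shell
# Sketch: print (with line numbers) any `iface <name> ...` stanza header.
# find_iface_entry is a hypothetical helper; the file defaults to /etc/network/interfaces.
find_iface_entry() {
    iface="$1"
    file="${2:-/etc/network/interfaces}"
    grep -nE "^[[:space:]]*iface[[:space:]]+${iface}([[:space:]]|\$)" "$file"
}

# Usage:
#   find_iface_entry docker0 || echo "no docker0 entry"
```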

Is there anyone that used the same entry?

ai_ja_nai (albertomassidda) wrote :

It is affecting package docker-engine now, since yesterday on my pc. Same symptoms (docker networking works only for a single shot after each restart), no /etc/network/interfaces entry.
