Not able to ssh to amphora or curl the vip after rebooting

Bug #1517290 reported by Banashankar on 2015-11-18
This bug affects 1 person
Affects: octavia
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: none

Bug Description

Steps to reproduce
1. Create a load balancer.
2. SSH to the amphora: ssh -i /etc/octavia/.ssh/octavia_ssh_key -l ubuntu Amphora_IP (this should succeed).
3. Run sudo reboot on the amphora.
4. Once the amphora comes back up, try to SSH to it again.

I was not able to SSH and did not see any errors. The security groups allow SSH. The failure is intermittent: sometimes I was able to SSH to it.

[update1]
What I observed is that the amphora receives the SYN but does not send back a SYN-ACK.

[update2]
Test environment: Vagrant, Ubuntu 14.04, local.conf: https://gist.github.com/banveerad/2c30233a07bf17fd9ea1

Works fine from the router namespace.

Banashankar (bkalebe) on 2015-11-18
description: updated
Banashankar (bkalebe) on 2015-11-19
description: updated
Banashankar (bkalebe) on 2015-11-20
summary: - Not able to ssh to amphora after rebooting
+ Not able to ssh to amphora or curl the vip after rebooting
description: updated
Changed in octavia:
importance: Undecided → High
Michael Johnson (johnsom) wrote :

My guess is the route table is not persisting across the reboot.

Changed in octavia:
assignee: nobody → Michael Johnson (johnsom)
Stephen Balukoff (sbalukoff) wrote :

Do we need to support rebooting amphorae? It seems to me that if they're not running, they should just be destroyed. Is there any reason we'd actually reboot an amphora in production instead of just destroying and recreating it?

Changed in octavia:
assignee: Michael Johnson (johnsom) → nobody
Eran Raichstein (eranra) wrote :

This happened to me today:

1. I was still able to ping the amphora after the reboot.
2. SSH fails after the reboot with "Connection refused".

tcpdump output (after the reboot); the amphora is at 192.168.0.4:

[Interface:o-hm0] 13:30:22.419964 IP 192.168.0.3.61021 > 192.168.0.4.22: Flags [S], seq 3886653544, win 28200, options [mss 1410,nop,wscale 4], length 0
[Interface:o-hm0] 13:30:22.420464 IP 192.168.0.4.22 > 192.168.0.3.61021: Flags [R.], seq 0, ack 3886653545, win 0, length 0
[Interface:qbr5cb295ee-3b] 13:30:22.420151 IP 192.168.0.3.61021 > 192.168.0.4.22: Flags [S], seq 3886653544, win 28200, options [mss 1410,nop,wscale 4], length 0
[Interface:qbr5cb295ee-3b] 13:30:22.420388 IP 192.168.0.4.22 > 192.168.0.3.61021: Flags [R.], seq 0, ack 3886653545, win 0, length 0
[Interface:qvo5cb295ee-3b] 13:30:22.420145 IP 192.168.0.3.61021 > 192.168.0.4.22: Flags [S], seq 3886653544, win 28200, options [mss 1410,nop,wscale 4], length 0
[Interface:qvo5cb295ee-3b] 13:30:22.420412 IP 192.168.0.4.22 > 192.168.0.3.61021: Flags [R.], seq 0, ack 3886653545, win 0, length 0
[Interface:qvb5cb295ee-3b] 13:30:22.420151 IP 192.168.0.3.61021 > 192.168.0.4.22: Flags [S], seq 3886653544, win 28200, options [mss 1410,nop,wscale 4], length 0
[Interface:tap5cb295ee-3b] 13:30:22.420204 IP 192.168.0.3.61021 > 192.168.0.4.22: Flags [S], seq 3886653544, win 28200, options [mss 1410,nop,wscale 4], length 0
[Interface:tap5cb295ee-3b] 13:30:22.420388 IP 192.168.0.4.22 > 192.168.0.3.61021: Flags [R.], seq 0, ack 3886653545, win 0, length 0
[Interface:qvb5cb295ee-3b] 13:30:22.420411 IP 192.168.0.4.22 > 192.168.0.3.61021: Flags [R.], seq 0, ack 3886653545, win 0, length 0
[Interface:qvo5cb295ee-3b] 13:30:24.823043 IP 192.168.0.4.22299 > 192.168.0.3.5555: UDP, length 110
[Interface:qbr5cb295ee-3b] 13:30:24.822995 IP 192.168.0.4.22299 > 192.168.0.3.5555: UDP, length 110
[Interface:o-hm0] 13:30:24.823205 IP 192.168.0.4.22299 > 192.168.0.3.5555: UDP, length 110
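
In the capture above, each SYN to port 22 is answered with a reset ([R.]) instead of a SYN-ACK, which matches the "Connection refused" error: the packet reaches the amphora, but nothing accepts the connection on that port at that moment. The following is a minimal sketch for classifying such a capture, assuming the line format shown above (an interface prefix followed by standard tcpdump output). The name classify_handshake is hypothetical and not part of Octavia or tcpdump.

```python
import re

# Matches TCP lines such as:
#   ... IP 192.168.0.3.61021 > 192.168.0.4.22: Flags [S], seq ...
# UDP lines carry no "Flags [...]" field and are skipped.
LINE_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+\.\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+\.\d+): Flags \[(?P<flags>[^\]]+)\]"
)


def classify_handshake(lines):
    """Return 'refused', 'accepted', 'no-reply' or 'no-syn' for a capture."""
    packets = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            packets.append(match.group("src", "dst", "flags"))

    # First pass: remember every (client, server) pair that sent a bare SYN.
    syns = {(src, dst) for src, dst, flags in packets if flags == "S"}
    if not syns:
        return "no-syn"

    # Second pass: look for a reply travelling in the opposite direction.
    for src, dst, flags in packets:
        if (dst, src) in syns:
            if "R" in flags:
                return "refused"   # RST: port closed or service not listening
            if flags == "S.":
                return "accepted"  # SYN-ACK: handshake is progressing
    return "no-reply"              # SYN seen, but nothing came back


capture = [
    "[Interface:o-hm0] 13:30:22.419964 IP 192.168.0.3.61021 > 192.168.0.4.22: "
    "Flags [S], seq 3886653544, win 28200, options [mss 1410,nop,wscale 4], length 0",
    "[Interface:o-hm0] 13:30:22.420464 IP 192.168.0.4.22 > 192.168.0.3.61021: "
    "Flags [R.], seq 0, ack 3886653545, win 0, length 0",
]
print(classify_handshake(capture))  # refused
```

A result of 'no-reply' would instead correspond to the symptom in the bug description, where the SYN arrives but no SYN-ACK is sent.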

Eran Raichstein (eranra) wrote :

[update] I waited about 5 more minutes, and after that I was able to SSH into the machine.
It looks like transient behavior.
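
Two different symptoms appear in this thread: a SYN with no reply (bug description) and a reset that produces "Connection refused" (the capture above). To see which one occurs, and for how long, a probe can record the outcome of each connection attempt while waiting for the amphora to come back. This is a minimal sketch using only the Python standard library; probe_tcp and wait_for_port are hypothetical helper names, and the timeout, deadline, and interval values are arbitrary assumptions.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connect and return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # an RST came back: host reachable, port not listening
    except socket.timeout:
        return "timeout"   # no reply at all: packets dropped or unroutable
    except OSError as exc:
        return f"error: {exc}"


def wait_for_port(host, port, deadline_s=600.0, interval_s=5.0):
    """Probe until the port opens or the deadline passes; return all outcomes."""
    start = time.monotonic()
    history = []
    while True:
        result = probe_tcp(host, port)
        history.append(result)
        if result == "open" or time.monotonic() - start >= deadline_s:
            return history
        time.sleep(interval_s)
```

For example, wait_for_port("192.168.0.4", 22) would poll the amphora address from the capture above. A history of 'refused' entries ending in 'open' would be consistent with a service that starts late, while repeated 'timeout' entries would point more towards routing or filtering.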

Bharath (bharathm) wrote :

If it's working from the router namespace, it's likely a routing table persistence issue across reboots. I have sometimes observed the routing entries for private networks getting wiped out after a devstack reboot.
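
One way to check the routing-table hypothesis is to save the output of `ip route show` before the reboot and compare it with the output afterwards. The sketch below only compares two saved text snapshots; it does not run any command itself. The helper names are hypothetical, and the sample routes are invented for illustration rather than taken from the affected system.

```python
def parse_routes(ip_route_output):
    """Return the set of non-empty route lines with whitespace normalised."""
    return {
        " ".join(line.split())
        for line in ip_route_output.splitlines()
        if line.strip()
    }


def diff_routes(before, after):
    """Return (lost, gained) route lines between two `ip route show` snapshots."""
    old, new = parse_routes(before), parse_routes(after)
    return sorted(old - new), sorted(new - old)


# Illustrative snapshots only (not from the reporter's environment).
before_reboot = """\
default via 192.168.0.1 dev eth0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.4
10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.5
"""
after_reboot = """\
default via 192.168.0.1 dev eth0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.4
"""

lost, gained = diff_routes(before_reboot, after_reboot)
print("lost:", lost)
print("gained:", gained)
```

Any entry reported as lost is a route that did not survive the reboot.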

Also, as Stephen pointed out, if the amphora has issues or is slow to start after a reboot, that would trigger a failover anyway.

Adam Harwell (adam-harwell) wrote :

I want to mark this as Invalid or Wontfix, but in the spirit of open discussion I've opted for Opinion instead. Basically, as has been mentioned, Amphorae are not intended to be rebooted. Anything that would require a reboot would be better served by just failing over to a newly created Amphora (at least per our original design). If that's something we want to revisit, should we discuss this during next week's meeting?

Changed in octavia:
status: New → Opinion
Adam Harwell (adam-harwell) wrote :

I added it to the agenda so we won't lose track of this question.

Adam Harwell (adam-harwell) wrote :

The result of additional discussion was an action item to re-evaluate this bug and determine whether it is still valid, prior to making a firm decision about supporting reboots or not.

Michael Johnson (johnsom) wrote :

I have tested this under devstack. After shutting down the health manager (so a failover does not start), logging into the amphora, and issuing a 'sudo reboot', I see the amphora come back up and continue to service load balancing requests. I am also able to ssh back into the instance.

Can you please re-test and provide more details? Are you running on a host with virtualization/nested virtualization available for nova?

Changed in octavia:
status: Opinion → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for octavia because there has been no activity for 60 days.]

Changed in octavia:
status: Incomplete → Expired