Not able to ssh to amphora or curl the vip after rebooting
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| octavia | Expired | High | Unassigned | |
Bug Description
Steps to reproduce
1. Create a Load balancer.
2. SSH to it: ssh -i /etc/octavia/
3. Run sudo reboot.
4. Once the amphora comes back up, try to SSH to it again.
I was not able to SSH in, and I did not see any errors. The security groups allow SSH. Sometimes I was able to SSH to it.
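Because the failure is intermittent, it helps to measure how long the amphora takes to accept connections on port 22 after the reboot rather than trying once by hand. The following is a minimal sketch, assuming the script runs somewhere that can reach the amphora's management address; the function name `wait_for_port` and its default timeout and polling interval are hypothetical choices, not part of Octavia.

```python
import socket
import time
from typing import Optional


def wait_for_port(host: str, port: int = 22, timeout: float = 300.0,
                  interval: float = 5.0) -> Optional[float]:
    """Poll host:port until a TCP handshake succeeds or `timeout` expires.

    Returns the number of seconds waited on success, or None if the port
    never accepted a connection within `timeout`.
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        try:
            # A completed handshake means something (e.g. sshd) is listening.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Covers ConnectionRefusedError (an RST came back) and timeouts
            # (no SYN-ACK at all); both mean "not reachable yet".
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        time.sleep(min(interval, remaining))
```

A result of None after several minutes points at a persistent problem (routing, security groups), while a long but finite wait suggests the guest is simply slow to start its services.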
[update1]
What I observed is that the amphora receives the SYN but does not send back a SYN-ACK.
[update2]
Test environment: Vagrant, Ubuntu 14.04, local.conf : https:/
Works fine from the router namespace.
description: | updated |
description: | updated |
summary: | Not able to ssh to amphora after rebooting → Not able to ssh to amphora or curl the vip after rebooting |
description: | updated |
Changed in octavia: | |
importance: | Undecided → High |
Michael Johnson (johnsom) wrote : | #1 |
Changed in octavia: | |
assignee: | nobody → Michael Johnson (johnsom) |
Stephen Balukoff (sbalukoff) wrote : | #2 |
Do we need to support rebooting amphorae? It seems to me that if they're not running, they should just be destroyed. Is there any reason we'd actually reboot an amphora in production instead of just destroying and recreating it?
Changed in octavia: | |
assignee: | Michael Johnson (johnsom) → nobody |
Eran Raichstein (eranra) wrote : | #3 |
This happened to me today:
1. I was able to continue and ping after reboot
2. SSH fails after reboot with "Connection refused"
tcpdump output (after reboot); the amphora is at 192.168.0.4:
[Interface:o-hm0] 13:30:22.419964 IP 192.168.0.3.61021 > 192.168.0.4.22: Flags [S], seq 3886653544, win 28200, options [mss 1410,nop,wscale 4], length 0
[Interface:o-hm0] 13:30:22.420464 IP 192.168.0.4.22 > 192.168.0.3.61021: Flags [R.], seq 0, ack 3886653545, win 0, length 0
[...]
[Interface:o-hm0] 13:30:24.823205 IP 192.168.0.4.22299 > 192.168.0.3.5555: UDP, length 110
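The capture distinguishes two different failure modes. Here the amphora answers the SYN with `[R.]` (RST plus ACK), which the client reports as "Connection refused": the guest is up and routable (it is also sending UDP to port 5555), but nothing is listening on port 22 yet. In update1, by contrast, the SYN arrived but no SYN-ACK came back at all, which points at a dropped or misrouted reply. The helper below is a hypothetical sketch that classifies tcpdump text output into these cases; it assumes tcpdump's usual flag notation (`S` = SYN, `R` = RST, `.` = ACK) and only inspects the first bare SYN it finds.

```python
import re
from typing import List

# Matches "IP <src> > <dst>: Flags [<flags>]" in tcpdump's text output.
FLAGS_RE = re.compile(r"IP (\S+) > (\S+): Flags \[([^\]]*)\]")


def classify_handshake(lines: List[str]) -> str:
    """Classify the server's answer to the first bare SYN in tcpdump output.

    Returns "open" (SYN-ACK seen), "refused" (RST seen),
    "no-reply" (SYN unanswered) or "no-syn" (no SYN in the capture).
    """
    syn = None  # (client, server) endpoints of the first bare SYN
    for line in lines:
        m = FLAGS_RE.search(line)
        if not m:
            continue  # e.g. UDP lines have no TCP flags
        src, dst, flags = m.groups()
        if syn is None:
            if flags == "S":
                syn = (src, dst)
            continue
        # Only packets travelling back from the server to the client matter.
        if (dst, src) != syn:
            continue
        if "R" in flags:
            return "refused"  # host is up, but no listener on the port
        if "S" in flags and "." in flags:
            return "open"     # SYN-ACK: the handshake can complete
    if syn is None:
        return "no-syn"
    return "no-reply"         # dropped, filtered, or reply misrouted
```

Run against the two TCP lines of the capture, this reports "refused", whereas the behaviour described in update1 would come out as "no-reply".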
Eran Raichstein (eranra) wrote : | #4 |
[update] I waited about 5 more minutes, and after that I was able to SSH into the machine.
It looks like transient behavior.
Bharath (bharathm) wrote : | #5 |
If it's working from the router namespace, it's likely a routing-table persistence issue across reboots. I have sometimes observed the routes to private networks being wiped out after a devstack reboot.
Also, as Stephen pointed out, if an amphora has issues or delays coming up after a reboot, it would trigger a failover anyway.
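One way to test the routing theory is to capture `ip route` output on the devstack host before and after the reboot and ask which entry would carry traffic to the VIP. The sketch below performs a simple longest-prefix match over that text; the function name and the sample routes and addresses in the test are hypothetical, and it ignores metrics and policy routing, so `ip route get <address>` remains the authoritative kernel answer.

```python
import ipaddress
from typing import List, Optional


def covering_route(route_lines: List[str], dest: str) -> Optional[str]:
    """Return the most specific `ip route` line whose prefix contains dest."""
    addr = ipaddress.ip_address(dest)
    best = None
    best_len = -1
    for line in route_lines:
        fields = line.split()
        if not fields:
            continue
        prefix = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            net = ipaddress.ip_network(prefix, strict=False)
        except ValueError:
            continue  # skip entries that do not start with a prefix
        # Longest prefix wins, as in ordinary destination-based routing.
        if addr in net and net.prefixlen > best_len:
            best, best_len = line.strip(), net.prefixlen
    return best
```

If the answer before the reboot is a specific route via the router gateway and afterwards only the default route matches, the route to the private network was lost, which would explain why the VIP is reachable from the router namespace but not from the host.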
Adam Harwell (adam-harwell) wrote : | #6 |
I want to mark this as Invalid or Wontfix, but in the spirit of open discussion I've opted for Opinion instead. Basically, as has been mentioned, Amphorae are not intended to be rebooted. Anything that would require a reboot would be better served by just failing over to a newly created Amphora (at least per our original design). If that's something we want to revisit, should we discuss this during next week's meeting?
Changed in octavia: | |
status: | New → Opinion |
Adam Harwell (adam-harwell) wrote : | #7 |
I added it to the agenda so we won't lose track of this question.
Adam Harwell (adam-harwell) wrote : | #8 |
The result of additional discussion was an action item to re-evaluate this bug and determine whether it is still valid, prior to making a firm decision about supporting reboots or not.
Michael Johnson (johnsom) wrote : | #9 |
I have tested this under devstack. I shut down the health manager (so a failover does not start), logged into the amphora, and issued a 'sudo reboot'. The amphora came back up and continued to service load-balancing requests, and I was also able to SSH back into the instance.
Can you please re-test and provide more details? Are you running on a host with virtualization/
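Stopping the health manager matters because, with it running, a rebooting amphora stops sending its UDP heartbeats (the packets to port 5555 in the earlier capture appear to be these) and is then treated as failed. The following is a simplified, hypothetical model of that staleness check, not Octavia's code; the 60-second `heartbeat_timeout` default is an assumption, so check the health-manager settings in your octavia.conf for the real value.

```python
from typing import Dict, List


def stale_amphorae(last_heartbeat: Dict[str, float], now: float,
                   heartbeat_timeout: float = 60.0) -> List[str]:
    """Return IDs of amphorae whose last heartbeat is older than the timeout.

    Simplified model: any amphora silent for longer than heartbeat_timeout
    (for example, during a slow reboot) becomes a failover candidate.
    """
    return sorted(amp_id for amp_id, seen in last_heartbeat.items()
                  if now - seen > heartbeat_timeout)
```

For example, with heartbeats last seen at t=100 s and t=30 s, a check at t=120 s flags only the second amphora, since it has been silent for 90 s.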
Changed in octavia: | |
status: | Opinion → Incomplete |
Launchpad Janitor (janitor) wrote : | #10 |
[Expired for octavia because there has been no activity for 60 days.]
Changed in octavia: | |
status: | Incomplete → Expired |
My guess is the route table is not persisting across the reboot.