Juju HA upgrade 2.1.x -> 2.2.x never finishes.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | Undecided | Unassigned |
2.1 | Won't Fix | Undecided | Unassigned |
2.2 | Fix Released | Undecided | Unassigned |
Bug Description
Hi,
On a Juju HA deployment, I tried to upgrade the controller model using juju upgrade-juju:
juju upgrade-juju --agent-version 2.2.4 -m controller
while restricting the traffic with iptables rules such as the following for the primary node:
-A INPUT -s <juju_client_ip>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
-A INPUT -s <controller_2>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
-A INPUT -s <controller_3>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 17070 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 17070 -j DROP
And similar rules for the rest of the controllers:
-A INPUT -s <controller_1>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
-A INPUT -s <controller_3>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 17070 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 17070 -j DROP
-A INPUT -s <controller_2>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
-A INPUT -s <controller_3>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 17070 -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 17070 -j DROP
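The rule sets above follow one pattern: accept new connections to port 17070 from the permitted sources, accept established traffic, and drop everything else. A minimal Python sketch of that pattern follows. The function names and the sample addresses are hypothetical, and the code only builds rule strings in the format shown above; it does not apply them.

```python
def api_port_rules(allowed_sources, port=17070):
    """Build iptables-save style rules that restrict a TCP port.

    allowed_sources: IPv4 addresses allowed to open new connections.
    """
    rules = [
        f"-A INPUT -s {src}/32 -p tcp -m tcp --dport {port} -j ACCEPT"
        for src in allowed_sources
    ]
    # Keep already-established connections working.
    rules.append(
        f"-A INPUT -p tcp -m tcp --dport {port} "
        "-m state --state RELATED,ESTABLISHED -j ACCEPT"
    )
    # Everything else aimed at the port is dropped.
    rules.append(f"-A INPUT -p tcp -m tcp --dport {port} -j DROP")
    return rules


def rules_per_controller(controllers, primary=None, client_ip=None):
    """Map each controller name to rules admitting every other controller.

    controllers: dict of name -> IPv4 address (hypothetical inventory).
    primary/client_ip: only the primary additionally admits the Juju client.
    """
    result = {}
    for name in controllers:
        sources = [addr for other, addr in controllers.items() if other != name]
        if name == primary and client_ip:
            sources.insert(0, client_ip)
        result[name] = api_port_rules(sources)
    return result
```

Deriving each list from a single inventory means every controller admits both of its peers, which is easier to verify than three hand-written rule sets.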
juju status answers the client by reporting all nodes as down, and nothing further
changed for nearly 12 hours. The logs show that all controllers regularly try to
reach each other without success, because each one answers that it is in
upgrade state:
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
....
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
2017-09-16 04:03:44 DEBUG juju.apiserver request_
This kind of output can be seen on all controllers, and it is not tied to the controllers
themselves: units show up in the log with the same answer. Please let me know which
output you need to push this forward.
Best regards.
José.
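To gauge which agents are being turned away, the apiserver log can be scanned for the rejection message. This is a minimal sketch, assuming a plain-text log with one entry per line, in which rejected logins contain the phrase "upgrade in progress" and a quoted agent tag such as machine-9. The function name, the tag pattern, and any sample lines fed to it are assumptions for illustration, not confirmed Juju output.

```python
import re
from collections import Counter

# Assumed shape of the error text: login for \"machine-9\" blocked ...
# The backslashes before the quotes are optional, to cope with JSON escaping.
TAG_RE = re.compile(r'login for \\?"((?:machine|unit)-[\w-]+)\\?" blocked')


def blocked_logins(lines):
    """Count rejected logins per agent tag in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if "upgrade in progress" not in line:
            continue  # not a rejection caused by the pending upgrade
        match = TAG_RE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Calling `blocked_logins(open(path))` on a controller's log (the path is left to the reader) returns a counter keyed by agent tag, which shows at a glance whether only controllers or also units are being rejected.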
Changed in juju:
status: New → Fix Released
tags: added: cpe-onsite
This sounds like a repeat of https://bugs.launchpad.net/bugs/1697956
which we believe is fixed in 2.2.1
Presumably a couple of your controllers have come up and are waiting for
the 3rd but it is failing because of the "upgrade in progress" bug.
There is a manual fix for this if you just want things working, but we've
never managed to reproduce the failure directly, so it would be good to
know if we really did fix the underlying issue.
If you just want to restore the system, you can look in /var/lib/juju/tools
(I believe). There should be a symlink for the machine agent pointing
(currently) to the old agent version. Controllers wait for an indication
that all controllers are ready to move to the next upgrade state.
I'm hesitant to give off-the-top-of-my-head advice in case I get something
wrong. Are you able to talk more later today?
John
=:->
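Before touching anything, the symlinks mentioned above can be listed read-only to see which agent version each one points to. This is a minimal sketch, assuming the tools directory is /var/lib/juju/tools (the comment itself is unsure of the path) and that each agent has a symlink directly inside it. The function name is hypothetical, and nothing here modifies the system.

```python
import os


def agent_tool_links(tools_dir="/var/lib/juju/tools"):
    """Return {link name: link target} for symlinks directly in tools_dir.

    The default path is taken from the comment above and is unverified.
    """
    links = {}
    for entry in sorted(os.listdir(tools_dir)):
        path = os.path.join(tools_dir, entry)
        if os.path.islink(path):
            # readlink reports where the link points without following it
            links[entry] = os.readlink(path)
    return links
```

Printing the result of `agent_tool_links()` on each controller shows whether the machine agent link still points at the old version directory.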
On Sep 18, 2017 16:11, "José Pekkarinen" <email address hidden>
wrote:
> Public bug reported:
>
> Hi,
>
> On a juju ha deployment, I tried to upgrade the model controller using
> juju upgrade
>
> juju upgrade-juju --agent-version 2.2.4 -m controller
>
> meanwhile controlling the traffic using iptables rules such as the
> following for the primary
> node:
>
> -A INPUT -s <juju_client_ip>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
> -A INPUT -s <controller_2>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
> -A INPUT -s <controller_3>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 17070 -m state --state RELATED,ESTABLISHED
> -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 17070 -j DROP
>
> And similar for the rest of controllers:
>
> -A INPUT -s <controller_1>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
> -A INPUT -s <controller_3>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 17070 -m state --state RELATED,ESTABLISHED
> -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 17070 -j DROP
>
> -A INPUT -s <controller_2>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
> -A INPUT -s <controller_3>/32 -p tcp -m tcp --dport 17070 -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 17070 -m state --state RELATED,ESTABLISHED
> -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 17070 -j DROP
>
> Juju status will answer the client reporting all nodes are down, and no
> further
> change will happen for nearly 12h. From the logs, it's possible to capture
> all controllers
> are trying to reach each other regularly without success, as all answer
> telling they are in
> upgrade state:
>
> 2017-09-16 04:03:44 DEBUG juju.apiserver request_notifier.go:93 [B3E5] API connection from $controller3:34268
> 2017-09-16 04:03:44 DEBUG juju.apiserver request_notifier.go:93 [B3E6] API connection from $controller3:52890
> 2017-09-16 04:03:44 DEBUG juju.apiserver request_notifier.go:145 <- [B3E5] {"request-id":1,"type":"Admin","version":3,"request":"Login","params":"'params redacted'"}
> 2017-09-16 04:03:44 DEBUG juju.apiserver request_notifier.go:106 [B3E6] API connection terminated after 785.69µs
> 2017-09-16 04:03:44 DEBUG juju.apiserver request_notifier.go:171 -> [B3E5] 263.965µs {"request-id":1,"error":"login for \"machine-9\" blocked because upgrade in progress","response":"'body redacted'"}
> ...