remove-instance force=true only works if mysql service is not running
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL InnoDB Cluster Charm | Triaged | High | Unassigned |
Bug Description
On a fresh 3-unit jammy deployment using charm revision 39 from the 8.0/stable channel, I attempted to remove an instance (hitting bug LP#1954306) and then tried to add it back. The removed instance eventually got stuck in the following state:
{"address": "10.5.3.85:3306",
"
Fatal error: Invalid (empty) username when attempting to connect to the master
server. Connection attempt terminated. (13117) at 2023-02-09 15:42:58.656640"],
"mode": "R/O", "readReplicas": {}, "recovery": {"receiverError": "Fatal error:
Invalid (empty) username when attempting to connect to the master server. Connection
attempt terminated.", "receiverErrorN
Now, attempting to remove this instance from the cluster results in the following error:
output: "Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\
\
the cluster, it must be brought back ONLINE. If not possible, use the 'force'
option to remove it anyway.\nTraceback (most recent call last):\n File \"<string>\",
line 3, in <module>
Instance is not ONLINE and cannot be safely removed\n"
I retried this on a fresh deployment where the instance was not in an error state. After trying to remove the instance and hitting bug LP#1954306 again, the instance is now offline, and the error message with force=true is:
output: "Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\
\
removed from the InnoDB cluster. Depending on the instance\nbeing the Seed or
not, the Metadata session might become invalid. If so, please\nstart a new session
to the Metadata Storage R/W instance.
std:
>, std::vector<
>, std::allocator<
> > >, bool> mysqlsh:
const: Assertion `!recovery_
The only way to finally remove the instance is to ssh to the instance, stop the mysql service, and then retry the removal with force=true.
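For anyone needing to repeat this workaround, here is a minimal sketch of the two steps as commands. It only builds and prints the command lines; it does not run them. The unit names are hypothetical, the `address` parameter value is assumed to be the instance IP from the status output above, and the `systemctl stop mysql` form is an assumption based on "stop the mysql service". The Juju syntax is also an assumption about the client in use: `juju run-action --wait` on Juju 2.9, `juju run` on Juju 3.x.

```python
import shlex


def workaround_commands(target_unit, runner_unit, address, juju_major=2):
    """Return the two workaround commands, in the order they must be run."""
    # The action has to run on a unit other than the one being removed.
    if target_unit == runner_unit:
        raise ValueError("run the action from a unit that is not being removed")

    # Step 1: stop MySQL on the unit being removed (service name assumed).
    stop = ["juju", "ssh", target_unit, "sudo", "systemctl", "stop", "mysql"]

    # Step 2: retry the removal with force=true from a remaining unit.
    if juju_major >= 3:
        run = ["juju", "run", runner_unit, "remove-instance"]
    else:
        run = ["juju", "run-action", "--wait", runner_unit, "remove-instance"]
    run += [f"address={address}", "force=true"]

    return [stop, run]


if __name__ == "__main__":
    # Hypothetical unit names; the address is the one from this report.
    for cmd in workaround_commands(
        "mysql-innodb-cluster/2", "mysql-innodb-cluster/0", "10.5.3.85"
    ):
        print(shlex.join(cmd))
```

Review the printed commands before running them, since stopping mysql on the wrong unit would take a healthy member out of the cluster.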
Also, the only situation where force=true works without stopping mysql is when the instance is healthy (has ONLINE status). In that case, however, force=true should not be needed, as it hits bug LP#1983158.
I would expect force=true to force the removal regardless of the instance's state, without failing and without requiring the mysql service to be stopped. If the current behaviour is expected, then it should be documented in the action description for the force parameter.
Triaging to high; the charm action ought to do what it says it should do, even if that requires the action to perform additional checks beforehand. The problem is that the action is run on a unit other than the one being removed, which means that it is at the mercy of what mysql-shell is capable of with respect to the mysql instances running on the other units. That is, if the other unit's mysql isn't running and the mysql-shell command fails, then there's not a lot we can do (I suspect).
However, perhaps we could be better at documenting what to do in each scenario, and ensure that we provide actions to resolve those situations.