Comment 0 for bug 2006759

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

On a fresh jammy 3-unit deployment using charm revision 39 from the 8.0/stable channel, I attempted to remove an instance (hitting bug LP#1954306) and then tried to add it back. The removed instance eventually got stuck in the following state:

{"address": "10.5.3.85:3306",
      "instanceErrors": ["ERROR: GR Recovery channel receiver stopped with an error:
      Fatal error: Invalid (empty) username when attempting to connect to the master
      server. Connection attempt terminated. (13117) at 2023-02-09 15:42:58.656640"],
      "mode": "R/O", "readReplicas": {}, "recovery": {"receiverError": "Fatal error:
      Invalid (empty) username when attempting to connect to the master server. Connection
      attempt terminated.", "receiverErrorNumber": 13117, "state": "CONNECTION_ERROR"}

Now, attempting to remove this instance from the cluster results in the following error:

output: "Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\n\e[31mERROR:
      \e[0m10.5.3.85:3306 is reachable but has state ERROR\nTo safely remove it from
      the cluster, it must be brought back ONLINE. If not possible, use the 'force'
      option to remove it anyway.\nTraceback (most recent call last):\n File \"<string>\",
      line 3, in <module>\nmysqlsh.Error: Shell Error (51004): Cluster.remove_instance:
      Instance is not ONLINE and cannot be safely removed\n"

I retried this on a fresh deployment where the instance was not in an error state. After trying to remove the instance and hitting bug LP#1954306 again, the instance was left OFFLINE, and the error message with force=true is:

    output: "Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\n\e[36mNOTE:
      \e[0m10.5.2.0:3306 is reachable but has state OFFLINE\nThe instance will be
      removed from the InnoDB cluster. Depending on the instance\nbeing the Seed or
      not, the Metadata session might become invalid. If so, please\nstart a new session
      to the Metadata Storage R/W instance.\n\nmysqlsh: /build/mysql-shell/parts/mysql-shell/src/modules/adminapi/cluster/cluster_impl.cc:1831:
      std::tuple<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
      >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
      >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
      > > >, bool> mysqlsh::dba::Cluster_impl::get_replication_user(const mysqlshdk::mysql::IInstance&)
      const: Assertion `!recovery_user.empty()' failed.\n"

The only way I found to finally remove the instance is to ssh into the instance, stop the mysql service, and then retry the removal with force=true.
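For reference, the workaround above can be sketched roughly as follows. This is only an illustration under assumptions: the application name (mysql-innodb-cluster), the unit name, the remove-instance action with its address parameter, and the Juju 2.9-style run-action syntax are not stated in this report and may differ in your deployment. The script prints the commands by default and runs them only when RUN is set to an empty value.

```shell
#!/bin/sh
# Sketch only: names below are assumptions, not taken from the report.
APP="${APP-mysql-innodb-cluster}"      # assumed application name
UNIT="${UNIT-mysql-innodb-cluster/2}"  # hypothetical unit hosting the stuck instance
ADDRESS="${ADDRESS-10.5.2.0}"          # instance address from the second reproduction
RUN="${RUN-echo}"                      # dry run by default; use RUN= to execute

remove_stuck_instance() {
    # 1. Stop the mysql service on the stuck unit.
    $RUN juju ssh "$UNIT" sudo systemctl stop mysql
    # 2. Retry the removal from the leader unit with force=true
    #    (action and parameter names are assumed).
    $RUN juju run-action --wait "$APP/leader" remove-instance address="$ADDRESS" force=true
}

remove_stuck_instance
```

Running it unchanged only echoes the two commands, so they can be reviewed and adjusted before anything touches the cluster.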

Also, the only situation where force=true works is when the instance is healthy (has ONLINE status). In that case, however, force=true should not be needed, and it hits bug LP#1983158.

I would expect force=true to force the removal regardless of the instance's state, without failing and without needing the mysql service to be stopped first. If the current behavior is expected, then it should be documented in the action description for the force parameter.