Can't login to node on Sahara cluster from queens

Bug #1763241 reported by Zhuang Changkun
This bug affects 1 person
Affects: Sahara
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I used a JSON file to create a cluster on Queens, but the cluster always stays in the "waiting" state. When I check the log, I see:
2018-04-11 19:15:45.572 17182 DEBUG sahara.utils.ssh_remote [req-e4c4ba37-a1f2-447e-843e-430fa1b0ebdb dc765092cbf54cf1a387fcc1daf2460d 89caddb625934909b096895bfe3eff4d - - -] [instance: 02326501-8160-47b7-a750-b08efefc2984, cluster: 5e508219-aadf-409f-8727-6bd9417ebe40] "Executing "ls .ssh/authorized_keys"" took 129.8 seconds to complete _log_command /usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py:932
2018-04-11 19:15:45.573 17182 DEBUG sahara.service.engine [req-e4c4ba37-a1f2-447e-843e-430fa1b0ebdb dc765092cbf54cf1a387fcc1daf2460d 89caddb625934909b096895bfe3eff4d - - -] [instance: none, cluster: 5e508219-aadf-409f-8727-6bd9417ebe40] Can't login to node, IP: 172.24.4.12, reason error: [Errno 110] Connection timed out
Error ID: 18b133a4-0495-4888-85fc-219351fca325 _is_accessible /usr/lib/python2.7/site-packages/sahara/service/engine.py:130
2018-04-11 19:15:45.581 17182 DEBUG sahara.utils.ssh_remote [req-e4c4ba37-a1f2-447e-843e-430fa1b0ebdb dc765092cbf54cf1a387fcc1daf2460d 89caddb625934909b096895bfe3eff4d - - -] [instance: 02326501-8160-47b7-a750-b08efefc2984, cluster: 5e508219-aadf-409f-8727-6bd9417ebe40] "Executing "ls .ssh/authorized_keys"" took 129.8 seconds to complete _log_command /usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py:932
2018-04-11 19:15:45.581 17182 DEBUG sahara.service.engine [req-e4c4ba37-a1f2-447e-843e-430fa1b0ebdb dc765092cbf54cf1a387fcc1daf2460d 89caddb625934909b096895bfe3eff4d - - -] [instance: 02326501-8160-47b7-a750-b08efefc2984, cluster: 5e508219-aadf-409f-8727-6bd9417ebe40] Can't login to node, IP: 172.24.4.9, reason error: [Errno 110] Connection timed out

And at the end of the log:

Error ID: 4cd00335-ed6b-4e9c-8195-4dd9af0794b5): ThreadException: An error occurred in thread 'wait-for-ssh-hello-worker-0': 'Operation with name 'Wait for instance accessibility'' timed out after 10800 second(s) and following timeout was violated: wait_until_accessible
Error ID: 07f83f07-62ba-4100-9163-5788b2edaf34
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/sahara/context.py", line 167, in _wrapper
    func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 139, in handler
    add_fail_event(instance, e)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/sahara/utils/cluster_progress_ops.py", line 136, in handler
    value = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/sahara/service/engine.py", line 137, in _wait_until_accessible
    self._is_accessible(instance)
  File "/usr/lib/python2.7/site-packages/sahara/utils/poll_utils.py", line 161, in handler
    poll(**poll_description)
  File "/usr/lib/python2.7/site-packages/sahara/utils/poll_utils.py", line 125, in poll
    raise ex.TimeoutException(timeout, operation_name, timeout_name)
TimeoutException: 'Operation with name 'Wait for instance accessibility'' timed out after 10800 second(s) and following timeout was violated: wait_until_accessible
Error ID: 07f83f07-62ba-4100-9163-5788b2edaf34
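
The log above shows connections to the node IPs failing with [Errno 110] until the wait_until_accessible timeout expires. A quick way to reproduce that symptom outside Sahara is to probe the SSH port directly from the host running the Sahara engine. The following is only a minimal sketch, not Sahara's own code: the helper names `can_reach` and `wait_until_reachable` are hypothetical, and port 22 is assumed because the log goes through ssh_remote.

```python
import socket
import time


def can_reach(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (Errno 110), refused connections and unreachable hosts.
        return False


def wait_until_reachable(host, port=22, total_timeout=60.0, interval=5.0,
                         probe=can_reach):
    """Poll probe(host, port) until it succeeds or total_timeout seconds pass."""
    deadline = time.monotonic() + total_timeout
    while True:
        if probe(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Node IPs taken from the log lines in this report.
    for ip in ("172.24.4.12", "172.24.4.9"):
        state = "reachable" if can_reach(ip) else "NOT reachable"
        print(ip, state, "on port 22")
```

If the probe fails from the Sahara host while the instances are up, the cause is more likely networking (floating IPs, routing, security groups) than anything in the cluster JSON.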

------------------------------------------------------------------------------------------------

The JSON file:
{
    "name": "hello",
    "plugin_name": "vanilla",
    "hadoop_version": "2.7.1",
    "default_image_id": "ee772c70-1523-411c-bff1-5e5afc28808d",
    "node_groups": [
        {
            "name": "master",
            "node_processes":
            [
                "namenode",
                "resourcemanager"
            ],
            "flavor_id": "2",
            "floating_ip_pool": "1883e48a-2b46-42ad-b5ba-32fe2a89ef43",
            "use_autoconfig": true,
            "count": 1
        },
        {
            "name": "worker",
            "node_processes":
            [
                "nodemanager",
                "datanode"
            ],
            "flavor_id": "2",
            "floating_ip_pool": "1883e48a-2b46-42ad-b5ba-32fe2a89ef43",
            "use_autoconfig": true,
            "count": 2
        }
    ],
    "neutron_management_network": "aedc90d6-cd9f-4241-91ad-18a3aacbca93",
    "user_keypair_id": "data-keypair"
}

Zhuang Changkun (zchkun)
affects: fuel-plugin-contrail → sahara
Luigi Toscano (ltoscano) wrote :

We don't track bugs on Launchpad anymore; please report it on storyboard.openstack.org.

That said, this is 99% likely a configuration issue: Sahara tries to reach the nodes through the floating IP pool "1883e48a-2b46-42ad-b5ba-32fe2a89ef43", and for some reason that does not work (is 172.24.4.12 associated with the floating IP pool?).
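
One rough first check for the question above is whether the node IPs from the log fall inside the subnet that backs the floating IP pool. This is a sketch under a stated assumption: the report never gives the pool's CIDR, so `172.24.4.0/24` below is a placeholder, and `ip_in_network` is a hypothetical helper name. Subnet membership alone does not prove the address is actually associated with the instance.

```python
import ipaddress


def ip_in_network(ip, cidr):
    """Return True if ip falls inside the network given in CIDR notation."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


# Assumption: placeholder CIDR for the floating IP pool's subnet;
# replace it with the real subnet of the external network.
POOL_CIDR = "172.24.4.0/24"

if __name__ == "__main__":
    # Node IPs taken from the log lines in this report.
    for node_ip in ("172.24.4.12", "172.24.4.9"):
        print(node_ip, "in pool subnet:", ip_in_network(node_ip, POOL_CIDR))
```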

Changed in sahara:
status: New → Invalid
Zhuang Changkun (zchkun) wrote :

The problem is solved: I forgot to open the security group, so I will abandon the bug.
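
For anyone hitting the same timeout, the missing piece was an ingress rule on the instances' security group. The sketch below shows one possible way to add such a rule with openstacksdk. It rests on several assumptions not stated in the report: that SSH on TCP port 22 is the port that needed opening, that openstacksdk is installed and a cloud entry exists in clouds.yaml, and that the security group ID is known. `ssh_ingress_rule` and `open_ssh` are hypothetical helper names.

```python
def ssh_ingress_rule(security_group_id, remote_ip_prefix="0.0.0.0/0"):
    """Build the attributes of an ingress rule that allows TCP port 22."""
    return {
        "security_group_id": security_group_id,
        "direction": "ingress",
        "protocol": "tcp",
        "port_range_min": 22,
        "port_range_max": 22,
        # 0.0.0.0/0 allows any source; narrow this to the Sahara host if possible.
        "remote_ip_prefix": remote_ip_prefix,
    }


def open_ssh(cloud_name, security_group_id):
    """Create the rule via openstacksdk (assumed installed and configured)."""
    import openstack  # imported lazily so ssh_ingress_rule works without the SDK

    conn = openstack.connect(cloud=cloud_name)
    rule = ssh_ingress_rule(security_group_id)
    return conn.network.create_security_group_rule(**rule)
```

Calling `open_ssh("mycloud", "<security-group-id>")` would then create the rule, where both arguments are placeholders for values from your own deployment.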
