Some containers never complete kolla_start

Bug #1672207 reported by Dave Walker
This bug affects 2 people
Affects        Status        Importance  Assigned to   Milestone
kolla-ansible  Fix Released  Critical    Dave Walker
  Ocata        Fix Released  Critical    Dai Dang Van
  Pike         Fix Released  Critical    Dave Walker

Bug Description

This issue seems to have two parts...

1) ssh is being attempted to port 22, rather than 8022
2) sshd in nova_ssh isn't running :/

2017-03-12 15:54:35.203 7 ERROR oslo_messaging.rpc.server ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
2017-03-12 15:54:35.203 7 ERROR oslo_messaging.rpc.server Command: ssh -o BatchMode=yes 10.131.161.201 mkdir -p /var/lib/nova/instances/09b8e896-d87f-4ca0-a9dc-782ce9b79d49
2017-03-12 15:54:35.203 7 ERROR oslo_messaging.rpc.server Exit code: 255
2017-03-12 15:54:35.203 7 ERROR oslo_messaging.rpc.server Stdout: u''
2017-03-12 15:54:35.203 7 ERROR oslo_messaging.rpc.server Stderr: u'Host key verification failed.\r\n'

--- daidv ---
Swift is also affected when starting swift-object-server,...

Revision history for this message
Satya Sanjibani Routray (satroutr) wrote : Re: [Bug 1672207] [NEW] non-shared storage live migation failing

Make sure you have put the ssh key in passwords.yml.

Invalid bug


Changed in kolla:
status: New → Invalid
Dave Walker (davewalker)
description: updated
Revision history for this message
Dave Walker (davewalker) wrote : Re: non-shared storage live migation failing

@Satya, it certainly isn't that...

$ grep -i 'ssh-rsa' /etc/kolla/passwords.yml | wc -l
4

$ grep -i 'BEGIN RSA' /etc/kolla/passwords.yml | wc -l
4
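
The same check can be made without counting lines. This is only a minimal sketch: the function name `has_key_material` is hypothetical, and it looks for the same two markers as the grep commands above without validating the keys themselves.

```shell
# Hypothetical helper: succeed only if the file contains at least one
# public key marker ("ssh-rsa") and one private key header ("BEGIN RSA").
has_key_material() {
    local file="$1"
    grep -qi 'ssh-rsa' "$file" && grep -qi 'BEGIN RSA' "$file"
}

# Usage (path taken from the commands above):
#   has_key_material /etc/kolla/passwords.yml && echo "keys present"
```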

Changed in kolla:
status: Invalid → New
Revision history for this message
Dave Walker (davewalker) wrote :

I'd expect to see sshd running here:

$ sudo docker exec -ti nova_ssh bash
(nova-ssh)[root@compute1 /]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 17:51 ? 00:00:00 /usr/local/bin/dumb-init /bin/bash /usr/local/bin/kolla_start
root 7 1 0 17:51 ? 00:00:00 /bin/bash /usr/local/bin/kolla_start
root 4226 0 1 19:00 ? 00:00:00 bash
root 4251 7 0 19:00 ? 00:00:00 sleep 1

I'd also expect to see key files here:
(nova-ssh)[root@compute1 ~]# ls /etc/ssh/
moduli ssh_config sshd_config

This implies the extended start hasn't run... as running it by hand:
(nova-ssh)[root@compute1 ~]# /usr/local/bin/kolla_extend_start

Creates the ssh keys:
(nova-ssh)[root@compute1 ~]# ls /etc/ssh/
moduli ssh_config sshd_config ssh_host_dsa_key ssh_host_dsa_key.pub ssh_host_ecdsa_key ssh_host_ecdsa_key.pub ssh_host_ed25519_key ssh_host_ed25519_key.pub ssh_host_rsa_key ssh_host_rsa_key.pub

The main kolla_start script should have created /run_command which is also absent. Running kolla_set_configs by hand does create /run_command and also populates ssh configs.
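
The two symptoms described here (no /run_command, no ssh host keys) could be checked in one go. A rough sketch under assumptions: `check_kolla_start` is a hypothetical name, and the optional root-directory argument exists only so the function can be tried outside a container.

```shell
# Hypothetical helper: report the symptoms of a kolla_start that never
# got past its early steps. Pass no argument to check the real root.
check_kolla_start() {
    local root="${1:-}"
    local rc=0
    local found=0
    local key

    # kolla_start is expected to have created /run_command.
    if [ ! -e "${root}/run_command" ]; then
        echo "missing: /run_command"
        rc=1
    fi

    # The extended start is expected to have generated ssh host keys.
    for key in "${root}"/etc/ssh/ssh_host_*_key; do
        if [ -e "$key" ]; then
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "missing: ssh host keys in /etc/ssh"
        rc=1
    fi

    return "$rc"
}
```

Run inside the container with no argument, a non-zero exit status would suggest the start script is still stuck.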

Something is certainly up here..

Revision history for this message
Dave Walker (davewalker) wrote :

Ah, this is never satisfied... Stuck in a sleep loop. Seems the bug is really related to not using central logging.

/usr/local/bin/kolla_start:

...

# Wait for the log socket
if [[ ! "${!SKIP_LOG_SETUP[@]}" && -e /var/lib/kolla/heka ]]; then
    while [[ ! -S /var/lib/kolla/heka/log ]]; do
        sleep 1
    done
fi

...
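
To illustrate why the loop above can wedge a container, here is a sketch of the same wait with an upper bound. This is not the change proposed for this bug; `wait_for_socket` and its timeout argument are hypothetical and only show the difference between an unbounded and a bounded wait.

```shell
# Hypothetical bounded variant of the wait loop: give up after
# a number of seconds instead of sleeping forever.
wait_for_socket() {
    local path="$1"
    local timeout="${2:-30}"
    local waited=0

    # -S is true only if the path exists and is a socket.
    while [ ! -S "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "gave up waiting for $path after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (socket path taken from the snippet above):
#   wait_for_socket /var/lib/kolla/heka/log 10 || echo "log socket never appeared"
```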

Revision history for this message
Satya Sanjibani Routray (satroutr) wrote : Re: [Bug 1672207] Re: non-shared storage live migation failing

The ssh key generation should have produced proper contents for id_rsa.pub and id_rsa, not 4 and 4.


Revision history for this message
Satya Sanjibani Routray (satroutr) wrote :

What is the issue with the nova_ssh container?

What do the logs show?


Revision history for this message
Dave Walker (davewalker) wrote : Re: non-shared storage live migation failing

@Satya, the 4 and 4 is because I tried to demonstrate I had valid values by "wc -l"ing a grep that showed those values were filled... So it caught "ssh-rsa ......" and "RSA PRIVATE KEY".

The issue isn't because of ssh keys not being there, it is a byproduct of the kolla_start not progressing due to the sleep condition I outlined above.

I've pushed up a changeset that should resolve this issue:
https://review.openstack.org/444771

Changed in kolla:
assignee: nobody → Dave Walker (davewalker)
status: New → In Progress
summary: - non-shared storage live migation failing
+ nova-ssh container never completes kolla_start
Dai Dang Van (daikk115)
summary: - nova-ssh container never completes kolla_start
+ Some containers never complete kolla_start
description: updated
affects: kolla → kolla-ansible
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/448871

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/ocata)

Reviewed: https://review.openstack.org/448871
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=237663df732be95d66ed2d26457872b5a0fb946a
Submitter: Jenkins
Branch: stable/ocata

commit 237663df732be95d66ed2d26457872b5a0fb946a
Author: Dave Walker (Daviey) <email address hidden>
Date: Mon Mar 13 09:07:06 2017 +0000

    Remove heka_socket vol and unwedge some containers

    The presence of heka_socket:/var/lib/kolla/heka with
    containers that log to /dev/log, such as nova-ssh cause a
    wedge on starting as /var/lib/kolla/heka/log is never
    created due to the removal of heka.

    This means that ssh data, such as config and keys are never
    sync'd and sshd is never started.

    Change-Id: Ia561526e6caf82eebd18c6e31cbeb1738b9ff602
    Closes-Bug: #1672207
    Co-Authored-By: Dai Dang Van <email address hidden>
    Signed-off-by: Dave Walker (Daviey) <email address hidden>
    (cherry picked from commit 936722f01ce0673a2b78ab5a7f27244eb98f92a7)
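
One way to check whether a container still carries the volume described in this commit message is to look for the heka mount point in its mount list. A minimal sketch, assuming the list arrives on stdin with one destination path per line; `has_heka_mount` is a hypothetical helper, and the `docker inspect` format string in the comment is an assumption about how that list might be produced.

```shell
# Hypothetical helper: read mount destinations from stdin (one per line)
# and succeed if /var/lib/kolla/heka is among them.
has_heka_mount() {
    grep -qx '/var/lib/kolla/heka'
}

# Possible usage (assumes the container is named nova_ssh):
#   docker inspect --format '{{range .Mounts}}{{println .Destination}}{{end}}' nova_ssh \
#       | has_heka_mount && echo "heka_socket volume still mounted"
```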

Revision history for this message
Eduardo Gonzalez (egonzalez90) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 4.0.1

This issue was fixed in the openstack/kolla-ansible 4.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 5.0.0.0b2

This issue was fixed in the openstack/kolla-ansible 5.0.0.0b2 development milestone.
