neutron_server container suspended in health:starting state

Bug #2042598 reported by Dariusz Bursztynowski
This bug affects 1 person
Affects: neutron
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I installed OpenStack (zed) on a Raspberry Pi cluster with kolla-ansible (version tagged for zed). All containers are healthy except neutron_server, which is stuck in the 'health: starting' state. The network-related part of OpenStack does not work, while some other commands work as expected (e.g., I can create an image, which 'openstack image list' reports as 'active').

There are four Raspberry Pi 4B boards in the cluster (2 x 4GB RAM and 2 x 8GB RAM). They run Debian 11 (bullseye), and kolla-ansible has been used for the installation.
Notably, I'm using a specific host network configuration on my Pis to mimic the two network interfaces kolla-ansible expects on each host. These are provided as endpoints of veth pairs (more details on that below, too).
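
For reference, the same kind of pair can be created ad hoc with plain iproute2 (a non-persistent sketch; the persistent systemd-networkd setup actually used is shown in section 1):

$ sudo ip link add veth0 type veth peer name veth0br    # create the pair
$ sudo ip link set veth0 up && sudo ip link set veth0br up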

Below, one can find:

1. configuration commands I used to configure my Pi hosts (this panel)
2. environment details related to the Pis (the one serving as controller in OpenStack) and kolla-ansible install information (this panel)
3. ml2_conf.ini and nova-compute.conf configuration used in kolla-ansible
4. kolla-ansible files: globals.yml (4.1) and inventory multinode (4.2)
   - changed parts - this panel
   - complete versions - attachments
5. HttpException: 503 message from running init-runonce (kolla-ansible test script for new installation) (this panel)
6. status of containers on the control node as reported by 'docker ps -a' (this panel)
7. output from the 'docker inspect neutron_server' command (attachment)
8. log from the neutron_server container (attachment)

*************************************************************
1. Debian configuration on the Pis
*************************************************************

Selected details of the configuration are given in the following. Basically, most of them are needed to configure the Pis' host networking using netplan; another one relates to qemu-kvm.

(Note: initial configs to enable ssh access should be done locally (keyboard, monitor) on each Pi, in particular:
PermitRootLogin yes
PasswordAuthentication yes
I skip the details of enabling ssh access, though. Below, I assume ssh access as a regular (non-root) user.
)

=========== Preparation for host networking setup ===========

$ sudo apt-get remove unattended-upgrades -y
$ sudo apt-get update -y && sudo apt-get upgrade -y

- updating $PATH for a user (note the quoted 'EOT': $PATH must stay literal in .bashrc and expand at login time, not at write time)
$ sudo tee -a ~/.bashrc << 'EOT'
export PATH=$PATH:/usr/local/sbin:/usr/sbin:/sbin
EOT
$ source ~/.bashrc

- enable systemd-networkd and configure eth0 for ssh access (needed to use ssh; not needed if one does everything locally, attaching a keyboard and monitor to each Pi)
  - enabling systemd-networkd
$ sudo mv /etc/network/interfaces /etc/network/interfaces.save
$ sudo mv /etc/network/interfaces.d /etc/network/interfaces.d.save
$ sudo systemctl enable systemd-networkd && sudo systemctl start systemd-networkd
$ sudo systemctl status systemd-networkd

- configure eth0 (in my case, I've configured a static DHCP lease for each Pi on my DHCP server)
$ sudo tee /etc/systemd/network/20-wired.network << EOT
[Match]
Name=eth0

[Network]
DHCP=yes
EOT
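
A quick sanity check that systemd-networkd picked the file up (a sketch; 'networkctl reload' needs systemd 244+, which bullseye provides):

$ sudo networkctl reload     # or: sudo systemctl restart systemd-networkd
$ networkctl status eth0     # should report 'routable' with a DHCP address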

- install netplan
$ sudo apt update && sudo apt -y install netplan.io
$ sudo reboot

- enable ip forwarding
$ sudo nano /etc/sysctl.conf
 ===> uncomment the line: net.ipv4.ip_forward=1
$ sudo sysctl -p

========= Host networking setup ==========
- network setup on each Pi host - drawing:

 192.168.1.xy/24      (no IP address)
   +---------+         +---------+
   |  veth0  |         |  veth1  |   <==== network_interface and neutron_external_interface in kolla-ansible
   +---------+         +---------+
        |     veth pairs    |
   +---------+         +---------+
   | veth0br |         | veth1br |
   +---------+         +---------+
      +-┴-------------------┴-+
      |         brmux         |
      +-----------┬-----------+
                  |
             +---------+
             |  eth0   |
             +---------+

========== commands for achieving the above network setup (on each Pi)

-------- first veth pair

$ sudo tee /etc/systemd/network/veth-openstack-net-itf-veth0.netdev << EOT
#network_interface in kolla-ansible globals.yml
[NetDev]
Name=veth0
Kind=veth
[Peer]
Name=veth0br
EOT

--------- second veth pair

$ sudo tee /etc/systemd/network/veth-openstack-net-itf-veth1.netdev << EOT
#neutron_external_interface in kolla-ansible globals.yml
[NetDev]
Name=veth1
Kind=veth
[Peer]
Name=veth1br
EOT
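
After writing both .netdev files, systemd-networkd must re-read its configuration before the veth devices appear; a quick sketch of that step:

$ sudo networkctl reload     # or: sudo systemctl restart systemd-networkd
$ ip -br link | grep veth    # expect veth0/veth0br and veth1/veth1br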

--------- set up the networking (wlan0 is not necessary, configured just in case)

$ sudo tee /etc/netplan/50-cloud-init.yaml << EOT

network:
  version: 2
  renderer: networkd

#-----------------------------------------#
# enable wlan just in case                #
#-----------------------------------------#

# wlan0 will get an IP address from the Linksys router via DHCP.
  wifis:
    wlan0:
      access-points:
        FreshTomato06:
          password: klasterek
      dhcp4: true
      optional: true

#-----------------------------------------#
# Kolla-Ansible required #
#-----------------------------------------#

# Interfaces

  ethernets:
    eth0:
      dhcp4: false
      dhcp6: false

    # kolla-ansible network-interface
    veth0:
      addresses:
        - 192.168.1.xy/24 # set appropriate address
      nameservers:
        addresses:
          - 192.168.1.1 # dns on the local/home network router
          - 8.8.8.8
      routes:
        - to: 0.0.0.0/0
          via: 192.168.1.1
    veth0br:
      dhcp4: false
      dhcp6: false

    # kolla-ansible network-external-interface
    veth1:
      dhcp4: false
      dhcp6: false
    veth1br:
      dhcp4: false
      dhcp6: false

# Bridge - logically, a switch on the side of data center provider network

  bridges:
    brmux:
      interfaces:
        - eth0
        - veth0br
        - veth1br
EOT

$ sudo netplan generate
$ sudo netplan apply
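
Some quick checks (a verification sketch) that the result matches the drawing above:

$ ip -br addr show veth0      # expect 192.168.1.xy/24
$ ip link show master brmux   # expect eth0, veth0br and veth1br as bridge ports
$ ping -c 1 192.168.1.1       # default gateway should be reachable via veth0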

(now ssh using the new IP address of veth0)

============ Remaining Pi configurations ============

$ sudo visudo ==> add NOPASSWD: ALL as shown below
# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) NOPASSWD: ALL

$ sudo apt-get update
$ sudo apt-get install -y qemu-kvm
$ sudo apt-get upgrade -y
- when prompted:
A new version (/tmp/tmp.EofC83AObD) of configuration file /etc/ssh/sshd_config is available
  I chose ==> keep the local version currently installed

$ sudo apt-get install sshpass
$ sudo apt-get install ufw
$ sudo apt-get install net-tools <=== not necessary, I install it on the Pis just in case

$ sudo nano /etc/sysctl.conf
  ===> uncomment the line: net.ipv4.ip_forward=1
$ sudo sysctl -p
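
A one-line check that forwarding is actually on:

$ sysctl net.ipv4.ip_forward   # expect: net.ipv4.ip_forward = 1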

************ End of Debian configuration *********

**************************************************
2. Environment details
**************************************************
(where details relate to a specific Pi, it is the one serving as the control node)

* OS (e.g. from /etc/os-release):
ubuntu@ost64:~$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

* Kernel (e.g. `uname -a`):
ubuntu@ost64:~$ uname -a
Linux ost64 5.10.0-20-arm64 #1 SMP Debian 5.10.158-2 (2022-12-13) aarch64 GNU/Linux

* Docker version if applicable (e.g. `docker version`):
ubuntu@ost64:~$ sudo docker version
Client: Docker Engine - Community
 Version: 24.0.7
 API version: 1.43
 Go version: go1.20.10
 Git commit: afdd53b
 Built: Thu Oct 26 09:08:29 2023
 OS/Arch: linux/arm64
 Context: default

Server: Docker Engine - Community
 Engine:
  Version: 24.0.7
  API version: 1.43 (minimum version 1.12)
  Go version: go1.20.10
  Git commit: 311b9ff
  Built: Thu Oct 26 09:08:29 2023
  OS/Arch: linux/arm64
  Experimental: false
 containerd:
  Version: 1.6.24
  GitCommit: 61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
 runc:
  Version: 1.1.9
  GitCommit: v1.1.9-0-gccaecfc
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

* Kolla-Ansible version (e.g. `git head or tag or stable branch` or pip package version if using release):
  - kolla-ansible installed according to: https://docs.openstack.org/kolla-ansible/zed/user/quickstart.html), in particular:
  $ pip install -U 'ansible>=4,<6'
  $ pip install git+https://opendev.org/openstack/kolla-ansible@stable/zed
  - kolla-ansible was run in a virtual environment, strictly following the instructions from quickstart.html

* Docker image Install type of Openstack containers (source/binary): binary

* Docker image distribution: quay.io, zed-debian-bullseye-aarch64, e.g. (from Ansible log during installation)
  changed: [ost64] => (item={'key': 'fluentd', 'value': {'container_name': 'fluentd', 'group': 'fluentd', 'enabled': True, 'image': 'quay.io/openstack.kolla/fluentd:zed-debian-bullseye-aarch64', 'environment>

* Are you using official images from Docker Hub or self built?
  I assume they are from quay.io (see above).

************************************************
3. ml2_conf.ini and nova-compute.conf are configured as follows
(Note: vlans enabled just in case)
************************************************

=========== ml2_conf.ini
$ sudo tee /etc/kolla/config/neutron/ml2_conf.ini << EOT
[ml2]
type_drivers = flat,vlan
tenant_network_types = vxlan

[ml2_type_vlan]
network_vlan_ranges = physnet1:100:200

[ml2_type_flat]
flat_networks = physnet1

EOT
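
To double-check what the neutron-server container actually loads, one can inspect the merged file inside it (a sketch, assuming the usual kolla path /etc/neutron/plugins/ml2/ml2_conf.ini):

$ sudo docker exec neutron_server grep -A 2 '^\[ml2\]' /etc/neutron/plugins/ml2/ml2_conf.ini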

=========== nova-compute.conf
$ sudo tee /etc/kolla/config/nova/nova-compute.conf << EOT
[DEFAULT]
resume_guests_state_on_host_boot = true
EOT

*************************************************
4. globals.yml and inventory multinode files
*************************************************

4.1 globals.yml - basic settings
================================
Notes:
- remaining settings are default
- complete globals.yml - in attachment

------------
kolla_base_distro: "debian"
#openstack_release: "zed" <==== kolla-ansible has been generated for zed
openstack_tag_suffix: "-aarch64"
kolla_internal_vip_address: "192.168.1.60" <==== comment: ping 192.168.1.60 works well
network_interface: "veth0"
neutron_external_interface: "veth1"
enable_neutron_provider_networks: "yes"
#nova_compute_virt_type: "kvm" <===== comment: default (kvm) is left

4.2 inventory multinode - basic settings
========================================
Notes:
- remaining settings are default
- complete multinode - in attachment

------------
# These initial groups are the only groups required to be modified. The
# additional groups are for more control of the environment.
[control]
# These hostname must be resolvable from your deployment host
#control01
#control02
#control03
ost64 ansible_user=ubuntu ansible_password=ubuntu ansible_become=true

# The above can also be specified as follows:
#control[01:03] ansible_user=kolla

# The network nodes are where your l3-agent and loadbalancers will run
# This can be the same as a host in the control group
[network]
#network01
#network02
ost64

[compute]
#compute01
ost[61:63] ansible_user=ubuntu ansible_password=ubuntu ansible_become=true
ost64

[monitoring]
ost64

# When compute nodes and control nodes use different interfaces,
# you need to comment out "api_interface" and other interfaces from the globals.yml
# and specify like below:
#compute01 neutron_external_interface=eth0 api_interface=em1 tunnel_interface=em1

[storage]
#storage01
#ost61

[deployment]
localhost ansible_connection=local

=================================================
5. HttpException - message from init-runonce on the kolla-ansible node
(Note: cirros image has been stored successfully in glance and is reported as 'active' by openstack image list.)
=================================================
...
Configuring neutron.
HttpException: 503: Server Error for url: http://192.168.1.60:9696/v2.0/routers, No server is available to handle this request.: 503 Service Unavailable
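
The 503 most likely comes from haproxy, which has no healthy neutron-server backend to forward to while the container keeps failing its health check. A sketch for watching that health check from the control node:

$ sudo docker inspect --format '{{json .State.Health}}' neutron_server
$ sudo docker logs --tail 100 neutron_server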

=================================================================================
6. Status of containers reported by sudo docker ps -a on the control node
=================================================================================
ubuntu@ost64:~$ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4f6598955a8d quay.io/openstack.kolla/horizon:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) horizon
77802d295317 quay.io/openstack.kolla/heat-engine:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) heat_engine
23dcd7083007 quay.io/openstack.kolla/heat-api-cfn:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) heat_api_cfn
3c7d7fb7dd85 quay.io/openstack.kolla/heat-api:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) heat_api
86420274224a quay.io/openstack.kolla/neutron-metadata-agent:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) neutron_metadata_agent
98a0f60fe048 quay.io/openstack.kolla/neutron-l3-agent:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) neutron_l3_agent
359fce2450aa quay.io/openstack.kolla/neutron-dhcp-agent:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) neutron_dhcp_agent
4c5e18147b87 quay.io/openstack.kolla/neutron-openvswitch-agent:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) neutron_openvswitch_agent
b5218aa4fb46 quay.io/openstack.kolla/neutron-server:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 8 seconds (health: starting) neutron_server
aed0c1ec47a4 quay.io/openstack.kolla/openvswitch-vswitchd:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) openvswitch_vswitchd
463c6e360b8d quay.io/openstack.kolla/openvswitch-db-server:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) openvswitch_db
a53f653df1d1 quay.io/openstack.kolla/nova-compute:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) nova_compute
ce794c8d037c quay.io/openstack.kolla/nova-libvirt:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) nova_libvirt
16285faf8013 quay.io/openstack.kolla/nova-ssh:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) nova_ssh
c1e51c6bff9d quay.io/openstack.kolla/nova-novncproxy:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) nova_novncproxy
ca27ffc8d401 quay.io/openstack.kolla/nova-conductor:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) nova_conductor
2ca934ee1aa6 quay.io/openstack.kolla/nova-api:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) nova_api
f168dc3cc91f quay.io/openstack.kolla/nova-scheduler:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) nova_scheduler
690628e6cdb9 quay.io/openstack.kolla/placement-api:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) placement_api
b2676ebcc73a quay.io/openstack.kolla/glance-api:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) glance_api
2d5e935b8d4c quay.io/openstack.kolla/keystone:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) keystone
7a76f857056e quay.io/openstack.kolla/keystone-fernet:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) keystone_fernet
75c27f2df24f quay.io/openstack.kolla/keystone-ssh:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) keystone_ssh
ecbe00610982 quay.io/openstack.kolla/rabbitmq:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) rabbitmq
bfe9f55e263a quay.io/openstack.kolla/memcached:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) memcached
86a6a9a81e8d quay.io/openstack.kolla/mariadb-clustercheck:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours mariadb_clustercheck
b377bd08e204 quay.io/openstack.kolla/mariadb-server:zed-debian-bullseye-aarch64 "dumb-init -- kolla_…" 4 hours ago Up 4 hours (healthy) mariadb
13c6eef85c13 quay.io/openstack.kolla/keepalived:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours keepalived
5f0939c125a8 quay.io/openstack.kolla/haproxy:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours (healthy) haproxy
8539f6ebdc7a quay.io/openstack.kolla/cron:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours cron
1fbe92f1c594 quay.io/openstack.kolla/kolla-toolbox:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours kolla_toolbox
444b3c2dd8b7 quay.io/openstack.kolla/fluentd:zed-debian-bullseye-aarch64 "dumb-init --single-…" 4 hours ago Up 4 hours fluentd

========================================================
7. Output from 'docker inspect neutron_server'
========================================================
see attachment

=================================================================================
8. Log from the neutron_server container (opening part of the log and ... the last part of the log)
=================================================================================
see attachment

############# end of problem description ###############

Revision history for this message
Dariusz Bursztynowski (dburszty) wrote :

globals.yml

Revision history for this message
Bence Romsics (bence-romsics) wrote :

Hi,

Thanks for the report!

At first glance this looks like a deployment problem, not a neutron bug. From a neutron perspective there's no clear error symptom described (other than "networking does not work"), and no neutron log (the attached "log from neutron_server" stops right when neutron-server is started). Even if there is a neutron bug, this is not enough to identify and/or debug it.

I'm no kolla expert (not even a kolla user), but I would recommend that you turn with your questions to kolla folks, for example on their irc channel (#kolla on irc.oftc.net, archives: https://meetings.opendev.org/) or on the mailing list (https://lists.openstack.org/mailman3/lists/openstack-discuss.lists.openstack.org/). It would also help in debugging if you collected actual neutron-server logs to see why it did not start properly.

Hope this helps,
Bence
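
For reference, on a kolla-ansible host the service logs usually land under /var/log/kolla on the host (assuming kolla defaults), so a sketch for collecting the neutron-server log would be:

$ sudo tail -n 200 /var/log/kolla/neutron/neutron-server.log
$ sudo docker logs neutron_server 2>&1 | tail -n 200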

Changed in neutron:
status: New → Invalid
Revision history for this message
Dariusz Bursztynowski (dburszty) wrote :

Thanks Bence for a quick response. I've turned to kolla folks already on https://launchpad.net/kolla-ansible, but will try the options you have suggested.

Best,
Darek

Revision history for this message
Bence Romsics (bence-romsics) wrote :

Hi Dariusz,

Thanks for the neutron-server log! From this it's quite clear that the generated neutron configuration is wrong. I don't know whether it's wrong because of a kolla bug or because of wrong input given to kolla. The latter is more likely, though.

The point is: There is a mismatch in neutron-server's config (usually in /etc/neutron/plugins/ml2/ml2_conf.ini) between:

[ml2]
tenant_network_types = ...
type_drivers = ...

Whichever types are used in 'tenant_network_types' must be loaded in 'type_drivers'. IIRC it is safe to leave 'type_drivers' unset, because then we just load all available type drivers.

Hope this helps,
Bence
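
Applied to the ml2_conf.ini from section 3, this advice translates into an untested sketch like the following: either leave type_drivers unset, or include every value that tenant_network_types uses, e.g.:

[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan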
