Fresh install - neutron processes fail on controller nodes

Bug #2047593 reported by birbilakos
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

This is a fresh install, using OVS as the neutron backend. The controller nodes (where the network nodes are also collocated) continuously spit out:
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12112]: neutron-l3-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-l3-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12112]: neutron-l3-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-l3-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12113]: neutron-metadata-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-metadata-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12113]: neutron-metadata-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-metadata-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-l3-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12114]: neutron-metering-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-metering-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12114]: neutron-metering-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-metering-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12115]: neutron-openvswitch-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-openvswitch-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-metadata-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12115]: neutron-openvswitch-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-openvswitch-agent: Permission denied

An immediate effect of this is that there is no DHCP functionality for created networks.

user_variables:
neutron_plugin_type: ml2.ovs
neutron_ml2_drivers_type: "flat,vlan,vxlan"
neutron_plugin_base:
  - router
  - metering

openstack_user_config.yml attached.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Can you kindly provide the output of "ls -l /etc/neutron/rootwrap.d"?
Also, can you check that the file /etc/sudoers.d/neutron_sudoers exists?

Revision history for this message
birbilakos (birbilis) wrote :

root@sjc-lnxserver-121:~# ls -l /etc/neutron/rootwrap.d
total 12
-rw-r----- 1 root root 503 Dec 27 18:32 ovn-plugin.filters
-rw-r----- 1 root root 2223 Dec 27 18:32 rootwrap.filters
-rw-r----- 1 root root 839 Dec 27 18:32 vpnaas.filters

root@sjc-lnxserver-121:~# ls -l /etc/sudoers.d/neutron_sudoers
-r--r----- 1 root root 434 Dec 27 18:33 /etc/sudoers.d/neutron_sudoers
root@sjc-lnxserver-121:~# cat /etc/sudoers.d/neutron_sudoers
# Ansible managed

Defaults:neutron !requiretty
Defaults:neutron secure_path="/openstack/venvs/neutron-28.0.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

neutron ALL = (root) NOPASSWD: /openstack/venvs/neutron-28.0.0/bin/neutron-rootwrap
neutron ALL = (root) NOPASSWD: /openstack/venvs/neutron-28.0.0/bin/neutron-rootwrap-daemon
neutron ALL = (root) NOPASSWD: /openstack/venvs/neutron-28.0.0/bin/privsep-helper

Revision history for this message
birbilakos (birbilis) wrote :

Things tried so far (to no avail):
- chown neutron:neutron everything under: /openstack/venvs/neutron-28.0.0/bin/
- changed /etc/systemd/system/neutron-dhcp-agent.service and enabled debug

[Service]
Type = simple
User = neutron
Group = neutron

to User = root

which just produced a different error:
Dec 28 11:38:21 sjc-lnxserver-122 neutron-dhcp-agent[273064]: 2023-12-28 11:38:21.929 273086 DEBUG oslo.privsep.daemon [-] privsep: reply[c31cb48f-3757-4894-b4e1-1b0614e1cc40]: (5, 'builtins.ModuleNotFoundError', ("No module named 'neutron.privileged.agent'",)) _call_back /openstack/venvs/neutron-28.0.0/lib/python3.10/site-packages/oslo_privsep/daemon.py:499
Dec 28 11:38:21 sjc-lnxserver-122 neutron-dhcp-agent[273064]: 2023-12-28 11:38:21.930 273064 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 57f3ebe1-6310-40ba-a908-c99db6b4650f.: ModuleNotFoundError: No module named 'neutron.privileged.agent'
                                                              2023-12-28 11:38:21.930 273064 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
                                                              2023-12-28 11:38:21.930 273064 ERROR neutron.agent.dhcp.agent File "/openstack/venvs/neutron-28.0.0/lib/python3.10/site-packages/neutron/agent/dhcp/agent.py", line 270, in _call_driver

Revision history for this message
birbilakos (birbilis) wrote :

Forgot to mention that the OS is Ubuntu 22.04 LTS; SELinux is not enabled, and neither is ufw.

Revision history for this message
birbilakos (birbilis) wrote :

# cat /etc/neutron/rootwrap.d/rootwrap.filters
# Command filters to allow privsep daemon to be started via rootwrap.
#
# This file should be owned by (and only-writeable by) the root user

[Filters]

# By installing the following, the local admin is asserting that:
#
# 1. The python module load path used by privsep-helper
# command as root (as started by sudo/rootwrap) is trusted.
# 2. Any oslo.config files matching the --config-file
# arguments below are trusted.
# 3. Users allowed to run sudo/rootwrap with this configuration(*) are
# also allowed to invoke python "entrypoint" functions from
# --privsep_context with the additional (possibly root) privileges
# configured for that context.
#
# (*) ie: the user is allowed by /etc/sudoers to run rootwrap as root
#
# In particular, the oslo.config and python module path must not
# be writeable by the unprivileged user.

# PRIVSEP
# oslo.privsep default neutron context
privsep: PathFilter, privsep-helper, root,
 --config-file, /etc/(?!\.\.).*,
 --privsep_context, neutron.privileged.default,
 --privsep_sock_path, /

# NOTE: A second `--config-file` arg can also be added above. Since
# many neutron components are installed like that (eg: by devstack).
# Adjust to suit local requirements.

# DEBUG
sleep: RegExpFilter, sleep, root, sleep, \d+

# EXECUTE COMMANDS IN A NAMESPACE
ip: IpFilter, ip, root
ip_exec: IpNetnsExecFilter, ip, root

# METADATA PROXY
haproxy: RegExpFilter, haproxy, root, haproxy, -f, .*
haproxy_env: EnvFilter, env, root, PROCESS_TAG=, haproxy, -f, .*

# DHCP
dnsmasq: CommandFilter, dnsmasq, root
dnsmasq_env: EnvFilter, env, root, PROCESS_TAG=, dnsmasq

# DIBBLER
dibbler-client: CommandFilter, dibbler-client, root
dibbler-client_env: EnvFilter, env, root, PROCESS_TAG=, dibbler-client

# L3
radvd: CommandFilter, radvd, root
radvd_env: EnvFilter, env, root, PROCESS_TAG=, radvd
keepalived: CommandFilter, keepalived, root
keepalived_env: EnvFilter, env, root, PROCESS_TAG=, keepalived
keepalived_state_change: CommandFilter, neutron-keepalived-state-change, root
keepalived_state_change_env: EnvFilter, env, root, PROCESS_TAG=, neutron-keepalived-state-change

# OPEN VSWITCH
ovs-ofctl: CommandFilter, ovs-ofctl, root
ovsdb-client: CommandFilter, ovsdb-client, root

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

So things look pretty much correct to me from your pastes; missing or wrong sudoers or rootwrap filters would have been the main culprits I would expect.

Using the root user is, I think, not expected by privsep, so it can indeed crash if you try to run it as root without further configuration (by just replacing the username for the service).

I think the last thing worth checking would be AppArmor profiles on Debian/Ubuntu or SELinux on CentOS/Rocky Linux. We also do not currently support installation with SELinux enabled, for instance.

With that said, the AppArmor profiles for haproxy, dnsmasq and ping should be disabled so they can run inside namespaces:
https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/tasks/neutron_apparmor.yml#L35-L61
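For reference, a quick way to check whether AppArmor is still confining the relevant binaries, and to unload a profile by hand, is sketched below (assuming Ubuntu 22.04; the dnsmasq profile name is the stock one and may differ on your hosts):

# list loaded profiles that could interfere with the agents
aa-status | grep -E 'dnsmasq|haproxy'
# unload the dnsmasq profile from the kernel ...
apparmor_parser -R /etc/apparmor.d/usr.sbin.dnsmasq
# ... and keep it disabled across reboots
ln -s /etc/apparmor.d/usr.sbin.dnsmasq /etc/apparmor.d/disable/usr.sbin.dnsmasq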

Revision history for this message
birbilakos (birbilis) wrote :

I have not installed/enabled SELinux, so it can't be the culprit.

I'd assume that if AppArmor is causing this, disabling it would do the trick?

# systemctl disable apparmor

Revision history for this message
birbilakos (birbilis) wrote :

Well, this didn't seem to make any difference:
# systemctl status apparmor.service
○ apparmor.service - Load AppArmor profiles
     Loaded: loaded (/lib/systemd/system/apparmor.service; disabled; vendor preset: enabled)

Dec 28 15:36:29 sjc-lnxserver-121 systemd[4565]: neutron-l3-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-l3-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4566]: neutron-metadata-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-metadata-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4565]: neutron-l3-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-l3-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-l3-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4566]: neutron-metadata-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-metadata-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4567]: neutron-metering-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-metering-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-metadata-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4567]: neutron-metering-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-metering-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4568]: neutron-openvswitch-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-openvswitch-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[1]: neutron-l3-agent.service: Main process exited, code=exited, status=203/EXEC
Dec 28 15:36:29 sjc-lnxserver-121 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-metering-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 28 15:36:29 sjc-lnxserver-121 systemd[1]: neutron-l3-agent.service: Failed with result 'exit-code'.
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4568]: neutron-openvswitch-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-openvswitch-agent: Permission denied

Revision history for this message
birbilakos (birbilis) wrote :

I just did a complete reinstall of the environment with the same config (haven't yet logged in to Horizon) and am still hitting this issue, so it appears to happen consistently, at least on Ubuntu 22.04 Server.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Can you kindly paste the contents of the following resulting config files:
* /etc/neutron/plugins/ml2/ml2_conf.ini
* /etc/neutron/plugins/ml2/openvswitch_agent.ini

And also some more insight into your network configuration, like `ip a` and/or `brctl show`, since commenting out `network_interface: "br-ext"` with a comment about having connectivity issues afterwards smells somewhat fishy.

We don't see any issues in CI, so I'd assume that something is actually off with this specific configuration, given that the playbooks are not failing during deployment.

Revision history for this message
birbilakos (birbilis) wrote (last edit ):

root@sjc-lnxserver-121:~# cat /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vlan,flat,vxlan
mechanism_drivers = openvswitch
extension_drivers = port_security
# ML2 flat networks

[ml2_type_flat]
flat_networks = flat
# ML2 VLAN networks

[ml2_type_vlan]
network_vlan_ranges = physnet1:40:400
# ML2 VXLAN networks

[ml2_type_vxlan]
vxlan_group = 239.1.1.1
vni_ranges = 1:1000

[ml2_type_geneve]
vni_ranges =
max_header_size = 38
# Security groups

[securitygroup]
enable_security_group = True
enable_ipset = True

root@sjc-lnxserver-121:~# cat /etc/neutron/plugins/ml2/openvswitch_agent.ini
[ovs]
local_ip = 172.29.240.121
bridge_mappings = flat:br-public,physnet1:br-provider

[agent]
l2_population = False
tunnel_types = vxlan
enable_distributed_routing = False
extensions =
# Security groups

[securitygroup]
firewall_driver = iptables_hybrid
enable_security_group = True
enable_ipset = True

Here's my netplan config:
network:
  version: 2
  ethernets:
    eno1:
      mtu: 1500
    eno2:
      mtu: 1500
  bonds:
    bond0:
      interfaces:
        - eno1
      mtu: 1500
      parameters:
        lacp-rate: fast
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer3+4
  vlans:
    bond0.10:
      id: 10
      link: bond0
    bond0.20:
      id: 20
      link: bond0
    bond0.30:
      id: 30
      link: bond0
    bond0.40:
      id: 40
      link: bond0
  bridges:
    br-mgmt:
      addresses:
        - "172.29.236.121/22"
      interfaces:
        - bond0.10
      mtu: 1500
      nameservers:
        addresses:
          - DNS1
          - DNS2
        search:
            - test.net
    br-storage:
      addresses:
        - "172.29.244.121/22"
      interfaces:
        - bond0.20
      mtu: 1500
      openvswitch: {}
    br-vxlan:
      addresses:
        - "172.29.240.121/22"
      interfaces:
        - bond0.30
      mtu: 1500
    br-ext:
      addresses:
        - "10.xxx.yyy/24"
      interfaces:
        - bond0
      routes:
        - to: default
          via: 10.xxx.yyy
      nameservers:
        addresses:
          - DNS1
          - DNS2
    br-vlan:
      interfaces: [bond0.40]
      mtu: 1500

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

OK, that does not look right to me. I'm not sure whether it's related to the errors you see, but it's worth fixing the network config first anyway.

Can you also provide the output of "ovs-vsctl show", just in case? I will try to come up with a proposal tomorrow morning.

Revision history for this message
birbilakos (birbilis) wrote :

I appreciate all the help, Dmitriy :)

root@sjc-lnxserver-121:~# ovs-vsctl show
197d0057-3e0f-45d3-ae24-7b3fdd64d37c
    Bridge br-storage
        fail_mode: standalone
        Port "662d3225_eth2"
            Interface "662d3225_eth2"
        Port "6134f004_eth2"
            Interface "6134f004_eth2"
        Port br-storage
            Interface br-storage
                type: internal
        Port bb08d3b9_eth2
            Interface bb08d3b9_eth2
        Port bond0.20
            Interface bond0.20
    Bridge br-provider
        fail_mode: secure
        Port br-vlan
            Interface br-vlan
        Port br-provider
            Interface br-provider
                type: internal
    Bridge br-public
        fail_mode: secure
        Port br-public
            Interface br-public
                type: internal
    ovs_version: "2.17.8"

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

1. It's weird that you don't have a `container_bridge_type: openvswitch` definition for the br-storage bridge, even though this bridge appears to be an OVS one rather than a Linux bridge (the default). I'm not sure whether that's intentional, but it is not related to the neutron error you see (still something worth checking).
At the same time, netplan defines br-storage as an OVS bridge. So I'd suggest either editing netplan to make the storage bridge a regular Linux bridge, or adding `container_bridge_type: openvswitch` to the `provider_networks` entry for br-storage.

2. br-ext cannot sit directly on the bond0 device if you want to have br-vlan. The idea of br-vlan is that Neutron will spawn VLANs on the interface in question, which is impossible to do from bond0.40.
So basically, br-vlan should have bond0, while br-ext should have some VLAN sub-interface such as bond0.40 (see the netplan sketch after point 4 below).

3. You really should not use br-ext as the default gateway. Mainly because the default interface should carry different networks, while br-ext is designed to handle customer public networks (passed to VMs). If you're limited in the number of VLANs, it's better to combine something with br-mgmt (for example, drop br-storage).

4. Then in openstack_user_config you define br-public and br-provider, while in fact you have only br-ext. I'm not sure what the intention behind that is, but it results in configuring neutron mappings with non-existent interfaces. That could easily produce the errors you report, since neutron attempts to use interfaces that do not exist.
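To illustrate points 2-4, a minimal netplan sketch of the layout described above could look like this (interface names and VLAN ID 40 are taken from your paste; addresses are placeholders):

  bridges:
    br-vlan:
      interfaces:
        - bond0          # neutron spawns provider VLANs on top of this
      mtu: 1500
    br-ext:
      addresses:
        - "10.xxx.yyy.zzz/24"
      interfaces:
        - bond0.40       # a VLAN sub-interface, not the raw bond
      mtu: 1500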

Revision history for this message
birbilakos (birbilis) wrote :

Thank you Dmitriy. I'm trying to adapt based on your advice but still need some help understanding how things would work. I only have one bond available, so I removed br-ext from the equation and instead routed internet traffic through the mgmt VLAN bridge. I'm unsure, though, what to use as haproxy_keepalived_external_interface so that my setup is reachable from outside the OpenStack perimeter.

Here's the relevant updated netplan config. Please let me know what you think.

network:
  version: 2
  ethernets:
    eno1:
      mtu: 1500
    eno2:
      mtu: 1500
  bonds:
    bond0:
      interfaces:
        - eno1
      mtu: 1500
      parameters:
        lacp-rate: fast
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer3+4
  vlans:
    bond0.10:
      id: 10
      link: bond0
    bond0.20:
      id: 20
      link: bond0
    bond0.30:
      id: 30
      link: bond0
  bridges:
    br-mgmt:
      addresses:
        - "172.29.236.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.10
      mtu: 1500
      routes:
        - to: default
          via: 172.29.236.120
      nameservers:
        addresses:
          - DNS1
          - DNS
    br-storage:
      addresses:
        - "172.29.244.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.20
      mtu: 1500
    br-vxlan:
      addresses:
        - "172.29.240.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.30
      mtu: 1500
    br-vlan:
      interfaces:
        - bond0
      mtu: 1500

Relevant openstack_user_config sections:
cidr_networks: &cidr_networks
  management: 172.29.236.0/22
  tunnel: 172.29.240.0/22
  storage: 172.29.244.0/22

used_ips:
  - "172.29.236.99,172.29.236.200"
  - "172.29.240.100,172.29.240.200"
  - "172.29.244.100,172.29.244.200"

global_overrides:
  cidr_networks: *cidr_networks
  internal_lb_vip_address: 172.29.236.99
  #
  # The below domain name must resolve to an IP address
  # in the CIDR specified in haproxy_keepalived_external_vip_cidr.
  # If using different protocols (https/http) for the public/internal
  # endpoints the two addresses must be different.
  #
  external_lb_vip_address: 10.222.112.40
  management_bridge: "br-mgmt"
  provider_networks:
    - network:
        container_bridge: "br-mgmt"
        container_type: "veth"
        container_interface: "eth1"
        ip_from_q: "management"
        type: "raw"
        group_binds:
          - all_containers
          - hosts
        is_management_address: true
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage"
        type: "raw"
        group_binds:
          - glance_api
          - cinder_api
          - cinder_volume
          - nova_compute
    - network:
        container_bridge: "br-vlan"
        container_type: "veth"
        type: "vlan"
        range: "40:400"
        net_name: "vlan"
        network_interface: "br-vlan"
        group_binds:
          - neutron_openvswitch_agent
    - network:
        container_bridge: "br-public"
        container_type: "veth"
        type: "flat"
        net_name: "f...


Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

OK, let me provide you with our typical configuration then - maybe it will explain a bit better what I mean.

In our case we do not have a separate "flat" network - we provide the public network through a VLAN, so we don't have any flat network at all.
However, you can add a flat one to your configuration if you want to; it still has to sit on a VLAN sub-interface, though:
    - network:
        container_bridge: "br-public"
        container_type: "veth"
        type: "flat"
        net_name: "flat"
        network_interface: "bond0.50"
        group_binds:
          - neutron_openvswitch_agent

So our typical configuration would look like this.

openstack_user_config:

  management_bridge: br-mgmt
  tunnel_bridge: br-vxlan
  provider_networks:
    - network:
        group_binds:
          - all_containers
          - hosts
        type: raw
        container_bridge: br-mgmt
        container_interface: eth1
        container_type: veth
        ip_from_q: container
        is_container_address: true
        is_ssh_address: true
    - network:
        group_binds:
          - glance_api
          - cinder_volume
          - nova_compute
        type: raw
        container_bridge: br-storage
        container_type: veth
        container_interface: eth2
        container_mtu: 9000
        ip_from_q: storage
    - network:
        group_binds:
          - neutron_openvswitch_agent
        container_bridge: br-vxlan
        container_type: veth
        container_interface: eth10
        container_mtu: 9000
        ip_from_q: tunnel
        type: vxlan
        range: 65537:69999
        net_name: vxlan
    - network:
        group_binds:
          - neutron_openvswitch_agent
        container_bridge: br-vlan
        container_type: veth
        container_interface: eth11
        type: vlan
        range: 40:400
        net_name: vlan

And here is what the netplan config looks like:
network:
    bonds:
        bond0:
            interfaces:
            - eno1
            - eno2
            macaddress: MAC
            mtu: 9000
            parameters:
                down-delay: 0
                lacp-rate: slow
                mii-monitor-interval: 100
                mode: 802.3ad
                transmit-hash-policy: layer3+4
                up-delay: 0
    bridges:
        br-mgmt:
            addresses:
            - 172.29.236.X/22
            interfaces:
            - bond0.10
            macaddress: MAC
            mtu: 1500
            parameters:
                forward-delay: 15
                stp: false
        br-storage:
            addresses:
            - 172.29.244.X/22
            interfaces:
            - bond0.20
            macaddress: MAC
            mtu: 9000
            parameters:
                forward-delay: 15
                stp: false
        br-vxlan:
            addresses:
            - 172.29.240.X/22
            interfaces:
            - bond0.30
            macaddress: MAC
            mtu: 9000
            parameters:
                forward-delay: 15
                stp: false
    ethernets:
        eno1:
            match:
                macaddress: MAC
            mtu: 9000
            set-name: eno2
        eno2:
            match:
     ...


Revision history for this message
birbilakos (birbilis) wrote :

I don't see br-vlan defined in the netplan config, while there is one defined in openstack_user_config. Is that intentional?

Also, what's the difference between these two options in the mgmt network:
        is_ssh_address: true
        is_management_address: true

Revision history for this message
birbilakos (birbilis) wrote :

I should also clarify that switch limitations only allow me to have public access via untagged traffic, i.e. I cannot use a VLAN to reach outside of my OpenStack perimeter. As such, traffic would either need to go through the untagged interface/bond or via a router, for which I'm using my deployment node.

Hence the question about what to use for haproxy_keepalived_external_interface ...

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

IIRC, br-vlan is an OVS bridge that's created and handled by neutron.

But if you can't use a VLAN tag to get outside the perimeter and have only one interface, then I think you can't have VLAN networks in neutron.
Then I'm really not sure how to handle outgoing traffic from VMs in a good way either.

I mean, you should probably have only VXLAN then. VMs will then have only internal networks and will be able to reach out only through the net nodes (through neutron routers and floating IPs).

And then you likely do need br-ext as an OVS bridge with the IP/default route on it, and use it as a flat network that is not shared (so usable only for floating IPs).
A potential alternative would be to create a veth pair, add one end to the bridge that carries the public network, and give the other end to a br-ext managed by neutron.

So there are potential ways through, but you do need to understand what you are doing.

Revision history for this message
birbilakos (birbilis) wrote :

Yes, the environment is quite limited for now. Given the constraints, maybe I can use VLANs only for internal traffic and not for getting outside the perimeter - e.g. the mgmt and storage networks, basically. I understand that VXLAN would still allow me to give floating IPs to VMs (?), which is good enough for me.

I'm still unsure, though, about the config for this: "And then you likely do need br-ext as an OVS bridge with the IP/default route on it, and use it as a flat network that is not shared (so usable only for floating IPs)."

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

For floating IPs to work you still need some "public" network in neutron. The difference is that you don't need to have that public network on the computes - only on the net nodes (where neutron_l3_agent runs).
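As a rough sketch of what creating such a public network could look like once the cloud is up (the network/router names and the CIDR are only examples; the physical network label "flat" matches the bridge_mappings above):

openstack network create --external --provider-network-type flat --provider-physical-network flat public
openstack subnet create --network public --subnet-range 10.xxx.yyy.0/24 --no-dhcp --allocation-pool start=10.xxx.yyy.100,end=10.xxx.yyy.200 public-subnet
openstack router create router1
openstack router set --external-gateway public router1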

Revision history for this message
birbilakos (birbilis) wrote (last edit ):

Hi Dmitriy,

Could you advise whether the configuration below makes sense:
- removed any vlan type networks, kept vxlan and flat in hopes that routing would be achievable via the network nodes (?)
- br-ext is part of the br-public network and used for the haproxy_keepalived_external_interface setting

openstack_user_config:
  management_bridge: br-mgmt
  tunnel_bridge: br-vxlan

  provider_networks:
    - network:
        container_bridge: "br-mgmt"
        container_type: "veth"
        container_interface: "eth1"
        ip_from_q: "management"
        type: "raw"
        group_binds:
          - all_containers
          - hosts
        is_management_address: true
        is_ssh_address: true
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage"
        type: "raw"
        group_binds:
          - glance_api
          - cinder_api
          - cinder_volume
          - nova_compute
    - network:
        container_bridge: "br-public"
        container_type: "veth"
        type: "flat"
        net_name: "flat"
        network_interface: "br-ext"
        group_binds:
          - neutron_openvswitch_agent
    - network:
        container_bridge: "br-vxlan"
        container_type: "veth"
        container_interface: "eth10"
        ip_from_q: "tunnel"
        type: "vxlan"
        range: "1:1000"
        net_name: "vxlan"
        group_binds:
          - neutron_openvswitch_agent

netplan config (same for both controller/network as well as compute nodes):
network:
  version: 2
  ethernets:
    eno1:
      mtu: 1500
    eno2:
      mtu: 1500
  bonds:
    bond0:
      interfaces:
        - eno1
      mtu: 1500
      parameters:
        lacp-rate: fast
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer3+4
  vlans:
    bond0.10:
      id: 10
      link: bond0
    bond0.20:
      id: 20
      link: bond0
    bond0.30:
      id: 30
      link: bond0
  bridges:
    br-mgmt:
      addresses:
        - "172.29.236.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.10
      mtu: 1500
    br-storage:
      addresses:
        - "172.29.244.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.20
      mtu: 1500
    br-vxlan:
      addresses:
        - "172.29.240.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.30
      mtu: 1500
    br-ext:
      addresses:
        - "10.xxx.yyy.{{ target_host_last_octet }}/24"
      interfaces:
        - bond0
      mtu: 1500
      routes:
        - to: default
          via: 10.xxx.yyy
      nameservers:
        addresses:
          - DNS1
          - DNS2

Revision history for this message
birbilakos (birbilis) wrote :

I went ahead and reconfigured things using a veth pair (vethb1, vethb2), created via systemd-networkd configuration:
[NetDev]
Name=vethb1
Kind=veth
[Peer]
Name=vethb2
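One way to bring the host side of the pair up and attach it to the bridge that carries the public network is a companion systemd-networkd .network unit; a minimal sketch, assuming the networkd renderer and that br-ext is that bridge (names here mirror the pair above):

[Match]
Name=vethb1

[Network]
Bridge=br-ext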

netplan:

network:
  version: 2
  ethernets:
    eno1:
      mtu: 1500
    eno2:
      mtu: 1500
    vethb1: {}
    vethb2: {}
  bonds:
    bond0:
      interfaces:
        - eno1
      mtu: 1500
      parameters:
        lacp-rate: fast
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer3+4
  vlans:
    bond0.10:
      id: 10
      link: bond0
    bond0.20:
      id: 20
      link: bond0
    bond0.30:
      id: 30
      link: bond0
  bridges:
    br-mgmt:
      addresses:
        - "172.29.236.121/22"
      interfaces:
        - bond0.10
      mtu: 1500
    br-storage:
      addresses:
        - "172.29.244.121/22"
      interfaces:
        - bond0.20
      mtu: 1500
    br-vxlan:
      addresses:
        - "172.29.240.121/22"
      interfaces:
        - bond0.30
      mtu: 1500
    br-ext:
      addresses:
        - "10.xxx.yyy.zzz/24"
      interfaces:
        - bond0
      mtu: 1500
      routes:
        - to: default
          via: 10.xxx.yyy.1
      nameservers:
        addresses:
          - DNS1
          - DNS2
        search:
          - test.net

openstack_user_config:
  management_bridge: br-mgmt
  tunnel_bridge: br-vxlan

  provider_networks:
    - network:
        container_bridge: "br-mgmt"
        container_type: "veth"
        container_interface: "eth1"
        ip_from_q: "management"
        type: "raw"
        group_binds:
          - all_containers
          - hosts
        is_management_address: true
        is_ssh_address: true
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage"
        type: "raw"
        group_binds:
          - glance_api
          - cinder_api
          - cinder_volume
          - nova_compute
    - network:
        container_bridge: "br-public"
        container_type: "veth"
        type: "flat"
        net_name: "flat"
        network_interface: "vethb2"
        group_binds:
          - neutron_openvswitch_agent
    - network:
        container_bridge: "br-vxlan"
        container_type: "veth"
        container_interface: "eth10"
        ip_from_q: "tunnel"
        type: "vxlan"
        range: "1:1000"
        net_name: "vxlan"
        group_binds:
          - neutron_openvswitch_agent

user_variables:
---
debug: false
# global_overrides:
# enable_logging: "yes"

install_method: source
haproxy_keepalived_external_vip_cidr: "10.xxx.yyy.zzz/32"
haproxy_keepalived_internal_vip_cidr: "172.29.236.99/32"
haproxy_keepalived_external_interface: vethb2
haproxy_keepalived_internal_interface: br-mgmt

neutron_plugin_type: ml2.ovs
neutron_ml2_drivers_type: "flat,vlan,vxlan"
neutron_plugin_base:
  - router
  - metering

And yet, I still get the same 'permission denied' errors on the controller/network nodes. As such, I don't see how the network configuration could play a role here...

For completeness, here's the openstack-ansible code base that I use:
commit 8bcd9198ff00363363fa3335e25c9fb6ece41847 (HEAD, origin/stabl...


Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

So what's the resulting configuration of
* /etc/neutron/plugins/ml2/ml2_conf.ini
* /etc/neutron/plugins/ml2/openvswitch_agent.ini

Revision history for this message
birbilakos (birbilis) wrote :

root@sjc-lnxserver-121:~# cat /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = flat,vxlan
mechanism_drivers = openvswitch
extension_drivers = port_security
# ML2 flat networks

[ml2_type_flat]
flat_networks = flat
# ML2 VLAN networks

[ml2_type_vlan]
network_vlan_ranges =
# ML2 VXLAN networks

[ml2_type_vxlan]
vxlan_group = 239.1.1.1
vni_ranges = 1:1000

[ml2_type_geneve]
vni_ranges =
max_header_size = 38
# Security groups

[securitygroup]
enable_security_group = True
enable_ipset = True

root@sjc-lnxserver-121:~# cat /etc/neutron/plugins/ml2/openvswitch_agent.ini
[ovs]
local_ip = 172.29.240.121
bridge_mappings = flat:br-public

[agent]
l2_population = False
tunnel_types = vxlan
enable_distributed_routing = False
extensions =
# Security groups

[securitygroup]
firewall_driver = iptables_hybrid
enable_security_group = True
enable_ipset = True

Revision history for this message
birbilakos (birbilis) wrote :

Btw, I do not see why these services would attempt to run on the host, as opposed to in a container (?)
neutron-dhcp-agent
neutron-l3-agent
neutron-metadata-agent
neutron-metering-agent

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Yes, these services run on bare metal hosts, not in containers by default

Revision history for this message
birbilakos (birbilis) wrote :

I have not found anything related to them in the AppArmor profiles, which seem to be installed during the OpenStack deployment. These processes just refuse to start with 'permission denied' for user neutron, so I doubt an interface misconfiguration can be at play here.

Revision history for this message
birbilakos (birbilis) wrote :

Well... it turns out that for some reason the /openstack folder doesn't have the right permissions to allow anyone besides root to access it (the 'x' bit is missing). Once I did chmod +x /openstack, the processes started as expected!

As such I believe that this is a legitimate bug.

I'm still unable, though, to access the haproxy_keepalived_external_vip_cidr IP address, which does not seem to have been assigned to the vethb2 interface during installation...
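For anyone hitting the same symptom, a sketch of the check and the workaround described above (0755 is the expected mode on /openstack):

stat -c '%a %U:%G %n' /openstack /openstack/venvs
chmod 0755 /openstack
systemctl restart neutron-l3-agent neutron-metadata-agent neutron-metering-agent neutron-openvswitch-agent neutron-dhcp-agent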

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

I'm not able to reproduce that behavior then...

In all my test deployments from 28.0.0 I see proper permissions on /openstack:

# stat /openstack/
  File: /openstack/
  Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: fc01h/64513d Inode: 3141121 Links: 4
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2023-11-17 13:55:06.942741664 +0000
Modify: 2023-11-13 12:39:44.415935380 +0000
Change: 2023-11-13 12:39:44.415935380 +0000
 Birth: 2023-11-13 12:22:25.355064045 +0000
#

And I am not able to reproduce the behavior you're talking about.

Do you perhaps have /openstack accidentally set up as a separate mount point, or pre-created in some way?

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote (last edit ):

Can you kindly provide the output of `ls -l /openstack`?

I'm trying to narrow down which component might mess up the permissions, but I have failed to find one so far.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-lxc_hosts (master)
Changed in openstack-ansible:
status: New → In Progress
Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

OK, I finally tracked down what messes up the permissions; it is a regression caused by this patch: https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180
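For context, the regression amounts to the base directories being created with mode 0644 instead of 0755, which on a directory strips the execute/search bit and prevents non-root users such as neutron from traversing /openstack. A hedged sketch of the kind of task involved (the task name is illustrative, not quoted from the role):

- name: Create base directories
  ansible.builtin.file:
    path: "/openstack"
    state: directory
    owner: root
    group: root
    mode: "0755"  # the linter cleanup had changed this to "0644", dropping the x bit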

Revision history for this message
birbilakos (birbilis) wrote :

Hi Dmitriy,

I'm happy you have RCAed the issue :)
I now have a working OpenStack env. Thank you so much for all your help and commitment to this great project!

Changed in openstack-ansible:
importance: Undecided → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-lxc_hosts (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/904756
Committed: https://opendev.org/openstack/openstack-ansible-lxc_hosts/commit/bd011b0eeef76c450cf32cafc542948a769adcd1
Submitter: "Zuul (22348)"
Branch: master

commit bd011b0eeef76c450cf32cafc542948a769adcd1
Author: Dmitriy Rabotyagov <email address hidden>
Date: Thu Jan 4 15:31:46 2024 +0100

    Fix permissions for base directories

    With fixing linters [1] I have accidentally set incorrect mode for base directories
    to 0644 while it should be 0755.

    [1] https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180

    Closes-Bug: #2047593
    Change-Id: Ied402f4f22ac333573c7144877da669251eccf8c

Changed in openstack-ansible:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-lxc_hosts (stable/2023.2)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-lxc_hosts (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/904738
Committed: https://opendev.org/openstack/openstack-ansible-lxc_hosts/commit/b0a0a7ce82b6ff9cd0560436258cdf9e0d35cf66
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit b0a0a7ce82b6ff9cd0560436258cdf9e0d35cf66
Author: Dmitriy Rabotyagov <email address hidden>
Date: Thu Jan 4 15:31:46 2024 +0100

    Fix permissions for base directories

    With fixing linters [1] I have accidentally set incorrect mode for base directories
    to 0644 while it should be 0755.

    [1] https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180

    Closes-Bug: #2047593
    Change-Id: Ied402f4f22ac333573c7144877da669251eccf8c
    (cherry picked from commit bd011b0eeef76c450cf32cafc542948a769adcd1)
