Fresh install - neutron processes fail on controller nodes

Bug #2047593 reported by birbilakos
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

This is a fresh install, using OVS as the neutron backend. The controller nodes (where the network nodes are also collocated) continuously spit out:
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12112]: neutron-l3-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-l3-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12112]: neutron-l3-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-l3-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12113]: neutron-metadata-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-metadata-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12113]: neutron-metadata-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-metadata-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-l3-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12114]: neutron-metering-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-metering-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12114]: neutron-metering-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-metering-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12115]: neutron-openvswitch-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-openvswitch-agent: Permission denied
Dec 27 22:19:24 sjc-lnxserver-122 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-metadata-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 27 22:19:24 sjc-lnxserver-122 systemd[12115]: neutron-openvswitch-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-openvswitch-agent: Permission denied

An immediate effect of this is that there is no DHCP functionality for created networks.

user_variables:
neutron_plugin_type: ml2.ovs
neutron_ml2_drivers_type: "flat,vlan,vxlan"
neutron_plugin_base:
  - router
  - metering

openstack_user_config.yml attached.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Can you kindly provide the output of "ls -l /etc/neutron/rootwrap.d"?
Also, can you check that the file /etc/sudoers.d/neutron_sudoers exists?

Revision history for this message
birbilakos (birbilis) wrote :

root@sjc-lnxserver-121:~# ls -l /etc/neutron/rootwrap.d
total 12
-rw-r----- 1 root root 503 Dec 27 18:32 ovn-plugin.filters
-rw-r----- 1 root root 2223 Dec 27 18:32 rootwrap.filters
-rw-r----- 1 root root 839 Dec 27 18:32 vpnaas.filters

root@sjc-lnxserver-121:~# ls -l /etc/sudoers.d/neutron_sudoers
-r--r----- 1 root root 434 Dec 27 18:33 /etc/sudoers.d/neutron_sudoers
root@sjc-lnxserver-121:~# cat /etc/sudoers.d/neutron_sudoers
# Ansible managed

Defaults:neutron !requiretty
Defaults:neutron secure_path="/openstack/venvs/neutron-28.0.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

neutron ALL = (root) NOPASSWD: /openstack/venvs/neutron-28.0.0/bin/neutron-rootwrap
neutron ALL = (root) NOPASSWD: /openstack/venvs/neutron-28.0.0/bin/neutron-rootwrap-daemon
neutron ALL = (root) NOPASSWD: /openstack/venvs/neutron-28.0.0/bin/privsep-helper

Revision history for this message
birbilakos (birbilis) wrote :

Things tried so far (to no avail):
- chown neutron:neutron everything under: /openstack/venvs/neutron-28.0.0/bin/
- changed /etc/systemd/system/neutron-dhcp-agent.service and enabled debug

[Service]
Type = simple
User = neutron
Group = neutron

to User = root

which just produced a different error:
Dec 28 11:38:21 sjc-lnxserver-122 neutron-dhcp-agent[273064]: 2023-12-28 11:38:21.929 273086 DEBUG oslo.privsep.daemon [-] privsep: reply[c31cb48f-3757-4894-b4e1-1b0614e1cc40]: (5, 'builtins.ModuleNotFoundError', ("No module named 'neutron.privileged.agent'",)) _call_back /openstack/venvs/neutron-28.0.0/lib/python3.10/site-packages/oslo_privsep/daemon.py:499
Dec 28 11:38:21 sjc-lnxserver-122 neutron-dhcp-agent[273064]: 2023-12-28 11:38:21.930 273064 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 57f3ebe1-6310-40ba-a908-c99db6b4650f.: ModuleNotFoundError: No module named 'neutron.privileged.agent'
                                                              2023-12-28 11:38:21.930 273064 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
                                                              2023-12-28 11:38:21.930 273064 ERROR neutron.agent.dhcp.agent File "/openstack/venvs/neutron-28.0.0/lib/python3.10/site-packages/neutron/agent/dhcp/agent.py", line 270, in _call_driver

Revision history for this message
birbilakos (birbilis) wrote :

Forgot to mention that the OS is Ubuntu 22.04 LTS; SELinux is not enabled, and neither is ufw.

Revision history for this message
birbilakos (birbilis) wrote :

# cat /etc/neutron/rootwrap.d/rootwrap.filters
# Command filters to allow privsep daemon to be started via rootwrap.
#
# This file should be owned by (and only-writeable by) the root user

[Filters]

# By installing the following, the local admin is asserting that:
#
# 1. The python module load path used by privsep-helper
# command as root (as started by sudo/rootwrap) is trusted.
# 2. Any oslo.config files matching the --config-file
# arguments below are trusted.
# 3. Users allowed to run sudo/rootwrap with this configuration(*) are
# also allowed to invoke python "entrypoint" functions from
# --privsep_context with the additional (possibly root) privileges
# configured for that context.
#
# (*) ie: the user is allowed by /etc/sudoers to run rootwrap as root
#
# In particular, the oslo.config and python module path must not
# be writeable by the unprivileged user.

# PRIVSEP
# oslo.privsep default neutron context
privsep: PathFilter, privsep-helper, root,
 --config-file, /etc/(?!\.\.).*,
 --privsep_context, neutron.privileged.default,
 --privsep_sock_path, /

# NOTE: A second `--config-file` arg can also be added above. Since
# many neutron components are installed like that (eg: by devstack).
# Adjust to suit local requirements.

# DEBUG
sleep: RegExpFilter, sleep, root, sleep, \d+

# EXECUTE COMMANDS IN A NAMESPACE
ip: IpFilter, ip, root
ip_exec: IpNetnsExecFilter, ip, root

# METADATA PROXY
haproxy: RegExpFilter, haproxy, root, haproxy, -f, .*
haproxy_env: EnvFilter, env, root, PROCESS_TAG=, haproxy, -f, .*

# DHCP
dnsmasq: CommandFilter, dnsmasq, root
dnsmasq_env: EnvFilter, env, root, PROCESS_TAG=, dnsmasq

# DIBBLER
dibbler-client: CommandFilter, dibbler-client, root
dibbler-client_env: EnvFilter, env, root, PROCESS_TAG=, dibbler-client

# L3
radvd: CommandFilter, radvd, root
radvd_env: EnvFilter, env, root, PROCESS_TAG=, radvd
keepalived: CommandFilter, keepalived, root
keepalived_env: EnvFilter, env, root, PROCESS_TAG=, keepalived
keepalived_state_change: CommandFilter, neutron-keepalived-state-change, root
keepalived_state_change_env: EnvFilter, env, root, PROCESS_TAG=, neutron-keepalived-state-change

# OPEN VSWITCH
ovs-ofctl: CommandFilter, ovs-ofctl, root
ovsdb-client: CommandFilter, ovsdb-client, root

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

So things look pretty much correct to me from your pastes; missing or wrong sudoers or rootwrap filters would have been the main culprits I would expect.

Using the root user is, I think, not expected by privsep, so it can indeed crash if you try to run it as root without further configuration (by just replacing the username for the service).

I think the last thing worth checking would be AppArmor profiles on Debian/Ubuntu or SELinux on CentOS/Rocky Linux. We also do not currently support installation with SELinux enabled, for instance.

With that said, the AppArmor profiles for haproxy, dnsmasq and ping should be disabled so they can run inside namespaces:
https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/tasks/neutron_apparmor.yml#L35-L61
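For reference, a quick way to check whether AppArmor is still confining the relevant binaries, and to unload a profile by hand, is sketched below (assuming Ubuntu 22.04; the dnsmasq profile name is the stock one and may differ on your hosts):

# list loaded profiles that could interfere with the agents
aa-status | grep -E 'dnsmasq|haproxy'
# unload the dnsmasq profile from the kernel ...
apparmor_parser -R /etc/apparmor.d/usr.sbin.dnsmasq
# ... and keep it disabled across reboots
ln -s /etc/apparmor.d/usr.sbin.dnsmasq /etc/apparmor.d/disable/usr.sbin.dnsmasq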

Revision history for this message
birbilakos (birbilis) wrote :

I have not installed/enabled SELinux, so it can't be the culprit.

I'd assume that if AppArmor is causing this, disabling it would do the trick?

# systemctl disable apparmor

Revision history for this message
birbilakos (birbilis) wrote :

Well, this didn't seem to make any difference:
# systemctl status apparmor.service
○ apparmor.service - Load AppArmor profiles
     Loaded: loaded (/lib/systemd/system/apparmor.service; disabled; vendor preset: enabled)

Dec 28 15:36:29 sjc-lnxserver-121 systemd[4565]: neutron-l3-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-l3-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4566]: neutron-metadata-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-metadata-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4565]: neutron-l3-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-l3-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-l3-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4566]: neutron-metadata-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-metadata-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4567]: neutron-metering-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-metering-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-metadata-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4567]: neutron-metering-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-metering-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4568]: neutron-openvswitch-agent.service: Failed to execute /openstack/venvs/neutron-28.0.0/bin/neutron-openvswitch-agent: Permission denied
Dec 28 15:36:29 sjc-lnxserver-121 systemd[1]: neutron-l3-agent.service: Main process exited, code=exited, status=203/EXEC
Dec 28 15:36:29 sjc-lnxserver-121 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=neutron-metering-agent comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Dec 28 15:36:29 sjc-lnxserver-121 systemd[1]: neutron-l3-agent.service: Failed with result 'exit-code'.
Dec 28 15:36:29 sjc-lnxserver-121 systemd[4568]: neutron-openvswitch-agent.service: Failed at step EXEC spawning /openstack/venvs/neutron-28.0.0/bin/neutron-openvswitch-agent: Permission denied

Revision history for this message
birbilakos (birbilis) wrote :

I just did a complete reinstall of the environment with the same config (haven't yet logged in to Horizon) and am still hitting this issue, so it appears to happen consistently, at least on Ubuntu 22.04 Server.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Can you kindly paste the contents of the following resulting config files:
* /etc/neutron/plugins/ml2/ml2_conf.ini
* /etc/neutron/plugins/ml2/openvswitch_agent.ini

And also some more insight into your network configuration, like `ip a` and/or `brctl show`, since commenting out `network_interface: "br-ext"` with a comment about having connectivity issues afterwards smells somewhat fishy.

We don't see any issues in CI, so I'd assume that something is actually off with this specific configuration, given that the playbooks are not failing during deployment.

Revision history for this message
birbilakos (birbilis) wrote (last edit ):

root@sjc-lnxserver-121:~# cat /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vlan,flat,vxlan
mechanism_drivers = openvswitch
extension_drivers = port_security
# ML2 flat networks

[ml2_type_flat]
flat_networks = flat
# ML2 VLAN networks

[ml2_type_vlan]
network_vlan_ranges = physnet1:40:400
# ML2 VXLAN networks

[ml2_type_vxlan]
vxlan_group = 239.1.1.1
vni_ranges = 1:1000

[ml2_type_geneve]
vni_ranges =
max_header_size = 38
# Security groups

[securitygroup]
enable_security_group = True
enable_ipset = True

root@sjc-lnxserver-121:~# cat /etc/neutron/plugins/ml2/openvswitch_agent.ini
[ovs]
local_ip = 172.29.240.121
bridge_mappings = flat:br-public,physnet1:br-provider

[agent]
l2_population = False
tunnel_types = vxlan
enable_distributed_routing = False
extensions =
# Security groups

[securitygroup]
firewall_driver = iptables_hybrid
enable_security_group = True
enable_ipset = True

Here's my netplan config:
network:
  version: 2
  ethernets:
    eno1:
      mtu: 1500
    eno2:
      mtu: 1500
  bonds:
    bond0:
      interfaces:
        - eno1
      mtu: 1500
      parameters:
        lacp-rate: fast
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer3+4
  vlans:
    bond0.10:
      id: 10
      link: bond0
    bond0.20:
      id: 20
      link: bond0
    bond0.30:
      id: 30
      link: bond0
    bond0.40:
      id: 40
      link: bond0
  bridges:
    br-mgmt:
      addresses:
        - "172.29.236.121/22"
      interfaces:
        - bond0.10
      mtu: 1500
      nameservers:
        addresses:
          - DNS1
          - DNS2
        search:
            - test.net
    br-storage:
      addresses:
        - "172.29.244.121/22"
      interfaces:
        - bond0.20
      mtu: 1500
      openvswitch: {}
    br-vxlan:
      addresses:
        - "172.29.240.121/22"
      interfaces:
        - bond0.30
      mtu: 1500
    br-ext:
      addresses:
        - "10.xxx.yyy/24"
      interfaces:
        - bond0
      routes:
        - to: default
          via: 10.xxx.yyy
      nameservers:
        addresses:
          - DNS1
          - DNS2
    br-vlan:
      interfaces: [bond0.40]
      mtu: 1500

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

OK, that does not look right to me. I'm not sure whether it's related to the errors you see, but it's worth fixing the network config first anyway.

Can you also provide the output of "ovs-vsctl show", just in case? I will try to come up with a proposal tomorrow morning.

Revision history for this message
birbilakos (birbilis) wrote :

I appreciate all the help, Dmitriy :)

root@sjc-lnxserver-121:~# ovs-vsctl show
197d0057-3e0f-45d3-ae24-7b3fdd64d37c
    Bridge br-storage
        fail_mode: standalone
        Port "662d3225_eth2"
            Interface "662d3225_eth2"
        Port "6134f004_eth2"
            Interface "6134f004_eth2"
        Port br-storage
            Interface br-storage
                type: internal
        Port bb08d3b9_eth2
            Interface bb08d3b9_eth2
        Port bond0.20
            Interface bond0.20
    Bridge br-provider
        fail_mode: secure
        Port br-vlan
            Interface br-vlan
        Port br-provider
            Interface br-provider
                type: internal
    Bridge br-public
        fail_mode: secure
        Port br-public
            Interface br-public
                type: internal
    ovs_version: "2.17.8"

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

1. It's weird that you don't have a `container_bridge_type: openvswitch` definition for the br-storage bridge, even though this bridge appears to be an OVS one rather than a Linux bridge (the default). I'm not sure whether that's intentional, but it is not related to the neutron error you see (still something worth checking).
At the same time, netplan defines br-storage as an OVS bridge. So I'd suggest either editing netplan to make the storage bridge a regular Linux bridge, or adding `container_bridge_type: openvswitch` to the `provider_networks` entry for br-storage.

2. br-ext cannot sit directly on the bond0 device if you want to have br-vlan. The idea of br-vlan is that Neutron will spawn VLANs on the interface in question, which is impossible to do from bond0.40.
So basically, br-vlan should have bond0, while br-ext should have some VLAN sub-interface such as bond0.40 (see the netplan sketch after point 4 below).

3. You really should not use br-ext as the default gateway. Mainly because the default interface should carry different networks, while br-ext is designed to handle customer public networks (passed to VMs). If you're limited in the number of VLANs, it's better to combine something with br-mgmt (for example, drop br-storage).

4. Then in openstack_user_config you define br-public and br-provider, while in fact you have only br-ext. I'm not sure what the intention behind that is, but it results in configuring neutron mappings with non-existent interfaces. That could easily produce the errors you report, since neutron attempts to use interfaces that do not exist.
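To illustrate points 2-4, a minimal netplan sketch of the layout described above could look like this (interface names and VLAN ID 40 are taken from your paste; addresses are placeholders):

  bridges:
    br-vlan:
      interfaces:
        - bond0          # neutron spawns provider VLANs on top of this
      mtu: 1500
    br-ext:
      addresses:
        - "10.xxx.yyy.zzz/24"
      interfaces:
        - bond0.40       # a VLAN sub-interface, not the raw bond
      mtu: 1500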

Revision history for this message
birbilakos (birbilis) wrote :

Thank you Dmitriy. I'm trying to adapt based on your advice but still need some help understanding how things would work. I only have one bond available, so I removed br-ext from the equation and instead routed internet traffic through the mgmt VLAN bridge. I'm unsure, though, what to use as haproxy_keepalived_external_interface so that my setup is reachable from outside the OpenStack perimeter.

Here's the relevant updated netplan config. Please let me know what you think.

network:
  version: 2
  ethernets:
    eno1:
      mtu: 1500
    eno2:
      mtu: 1500
  bonds:
    bond0:
      interfaces:
        - eno1
      mtu: 1500
      parameters:
        lacp-rate: fast
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer3+4
  vlans:
    bond0.10:
      id: 10
      link: bond0
    bond0.20:
      id: 20
      link: bond0
    bond0.30:
      id: 30
      link: bond0
  bridges:
    br-mgmt:
      addresses:
        - "172.29.236.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.10
      mtu: 1500
      routes:
        - to: default
          via: 172.29.236.120
      nameservers:
        addresses:
          - DNS1
          - DNS
    br-storage:
      addresses:
        - "172.29.244.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.20
      mtu: 1500
    br-vxlan:
      addresses:
        - "172.29.240.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.30
      mtu: 1500
    br-vlan:
      interfaces:
        - bond0
      mtu: 1500

Relevant openstack_user_config sections:
cidr_networks: &cidr_networks
  management: 172.29.236.0/22
  tunnel: 172.29.240.0/22
  storage: 172.29.244.0/22

used_ips:
  - "172.29.236.99,172.29.236.200"
  - "172.29.240.100,172.29.240.200"
  - "172.29.244.100,172.29.244.200"

global_overrides:
  cidr_networks: *cidr_networks
  internal_lb_vip_address: 172.29.236.99
  #
  # The below domain name must resolve to an IP address
  # in the CIDR specified in haproxy_keepalived_external_vip_cidr.
  # If using different protocols (https/http) for the public/internal
  # endpoints the two addresses must be different.
  #
  external_lb_vip_address: 10.222.112.40
  management_bridge: "br-mgmt"
  provider_networks:
    - network:
        container_bridge: "br-mgmt"
        container_type: "veth"
        container_interface: "eth1"
        ip_from_q: "management"
        type: "raw"
        group_binds:
          - all_containers
          - hosts
        is_management_address: true
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage"
        type: "raw"
        group_binds:
          - glance_api
          - cinder_api
          - cinder_volume
          - nova_compute
    - network:
        container_bridge: "br-vlan"
        container_type: "veth"
        type: "vlan"
        range: "40:400"
        net_name: "vlan"
        network_interface: "br-vlan"
        group_binds:
          - neutron_openvswitch_agent
    - network:
        container_bridge: "br-public"
        container_type: "veth"
        type: "flat"
        net_name: "f...


Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

OK, let me provide you with our typical configuration then - maybe it will explain a bit better what I mean.

In our case we do not have a separate "flat" network - we provide the public network through a VLAN, so we don't have any flat network at all.
However, you can add a flat one to your configuration if you want to; it still has to sit on a VLAN sub-interface, though:
    - network:
        container_bridge: "br-public"
        container_type: "veth"
        type: "flat"
        net_name: "flat"
        network_interface: "bond0.50"
        group_binds:
          - neutron_openvswitch_agent

So our typical configuration would look like this.

openstack_user_config:

  management_bridge: br-mgmt
  tunnel_bridge: br-vxlan
  provider_networks:
    - network:
        group_binds:
          - all_containers
          - hosts
        type: raw
        container_bridge: br-mgmt
        container_interface: eth1
        container_type: veth
        ip_from_q: container
        is_container_address: true
        is_ssh_address: true
    - network:
        group_binds:
          - glance_api
          - cinder_volume
          - nova_compute
        type: raw
        container_bridge: br-storage
        container_type: veth
        container_interface: eth2
        container_mtu: 9000
        ip_from_q: storage
    - network:
        group_binds:
          - neutron_openvswitch_agent
        container_bridge: br-vxlan
        container_type: veth
        container_interface: eth10
        container_mtu: 9000
        ip_from_q: tunnel
        type: vxlan
        range: 65537:69999
        net_name: vxlan
    - network:
        group_binds:
          - neutron_openvswitch_agent
        container_bridge: br-vlan
        container_type: veth
        container_interface: eth11
        type: vlan
        range: 40:400
        net_name: vlan

And here is what the netplan config looks like:
network:
    bonds:
        bond0:
            interfaces:
            - eno1
            - eno2
            macaddress: MAC
            mtu: 9000
            parameters:
                down-delay: 0
                lacp-rate: slow
                mii-monitor-interval: 100
                mode: 802.3ad
                transmit-hash-policy: layer3+4
                up-delay: 0
    bridges:
        br-mgmt:
            addresses:
            - 172.29.236.X/22
            interfaces:
            - bond0.10
            macaddress: MAC
            mtu: 1500
            parameters:
                forward-delay: 15
                stp: false
        br-storage:
            addresses:
            - 172.29.244.X/22
            interfaces:
            - bond0.20
            macaddress: MAC
            mtu: 9000
            parameters:
                forward-delay: 15
                stp: false
        br-vxlan:
            addresses:
            - 172.29.240.X/22
            interfaces:
            - bond0.30
            macaddress: MAC
            mtu: 9000
            parameters:
                forward-delay: 15
                stp: false
    ethernets:
        eno1:
            match:
                macaddress: MAC
            mtu: 9000
            set-name: eno2
        eno2:
            match:
     ...


Revision history for this message
birbilakos (birbilis) wrote :

I don't see br-vlan defined in the netplan config, while there is one defined in openstack_user_config. Is that intentional?

Also, what's the difference between these two options in the mgmt network:
        is_ssh_address: true
        is_management_address: true

Revision history for this message
birbilakos (birbilis) wrote :

I should also clarify that switch limitations only allow me to have public access via untagged traffic, i.e. I cannot use a VLAN to reach outside of my OpenStack perimeter. As such, traffic would either need to go through the untagged interface/bond or via a router, for which I'm using my deployment node.

Hence the question about what to use for haproxy_keepalived_external_interface ...

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

IIRC, br-vlan is an OVS bridge that's created and handled by neutron.

But if you can't use a VLAN tag to get outside the perimeter and have only one interface, then I think you can't have VLAN networks in neutron.
Then I'm really not sure how to handle outgoing traffic from VMs in a good way either.

I mean, you should probably have only VXLAN then. VMs will then have only internal networks and will be able to reach out only through the net nodes (through neutron routers and floating IPs).

And then you likely do need br-ext as an OVS bridge with the IP/default route on it, and use it as a flat network that is not shared (so usable only for floating IPs).
A potential alternative would be to create a veth pair, add one end to the bridge that carries the public network, and give the other end to a br-ext managed by neutron.

So there are potential ways through, but you do need to understand what you are doing.

Revision history for this message
birbilakos (birbilis) wrote :

Yes, the environment is quite limited for now. Given the constraints, maybe I can use VLANs only for internal traffic and not for getting outside the perimeter - e.g. the mgmt and storage networks, basically. I understand that VXLAN would still allow me to give floating IPs to VMs (?), which is good enough for me.

I'm still unsure, though, about the config for this: "And then you likely do need br-ext as an OVS bridge with the IP/default route on it, and use it as a flat network that is not shared (so usable only for floating IPs)."

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

For floating IPs to work you still need some "public" network in neutron. The difference is that you don't need to have that public network on the computes - only on the net nodes (where neutron_l3_agent runs).
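As a rough sketch of what creating such a public network could look like once the cloud is up (the network/router names and the CIDR are only examples; the physical network label "flat" matches the bridge_mappings above):

openstack network create --external --provider-network-type flat --provider-physical-network flat public
openstack subnet create --network public --subnet-range 10.xxx.yyy.0/24 --no-dhcp --allocation-pool start=10.xxx.yyy.100,end=10.xxx.yyy.200 public-subnet
openstack router create router1
openstack router set --external-gateway public router1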

Revision history for this message
birbilakos (birbilis) wrote (last edit ):

Hi Dmitriy,

Could you advise whether the configuration below makes sense:
- removed any vlan type networks, kept vxlan and flat in hopes that routing would be achievable via the network nodes (?)
- br-ext is part of the br-public network and used for the haproxy_keepalived_external_interface setting

openstack_user_config:
  management_bridge: br-mgmt
  tunnel_bridge: br-vxlan

  provider_networks:
    - network:
        container_bridge: "br-mgmt"
        container_type: "veth"
        container_interface: "eth1"
        ip_from_q: "management"
        type: "raw"
        group_binds:
          - all_containers
          - hosts
        is_management_address: true
        is_ssh_address: true
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage"
        type: "raw"
        group_binds:
          - glance_api
          - cinder_api
          - cinder_volume
          - nova_compute
    - network:
        container_bridge: "br-public"
        container_type: "veth"
        type: "flat"
        net_name: "flat"
        network_interface: "br-ext"
        group_binds:
          - neutron_openvswitch_agent
    - network:
        container_bridge: "br-vxlan"
        container_type: "veth"
        container_interface: "eth10"
        ip_from_q: "tunnel"
        type: "vxlan"
        range: "1:1000"
        net_name: "vxlan"
        group_binds:
          - neutron_openvswitch_agent

netplan config (same for both controller/network as well as compute nodes):
network:
  version: 2
  ethernets:
    eno1:
      mtu: 1500
    eno2:
      mtu: 1500
  bonds:
    bond0:
      interfaces:
        - eno1
      mtu: 1500
      parameters:
        lacp-rate: fast
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer3+4
  vlans:
    bond0.10:
      id: 10
      link: bond0
    bond0.20:
      id: 20
      link: bond0
    bond0.30:
      id: 30
      link: bond0
  bridges:
    br-mgmt:
      addresses:
        - "172.29.236.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.10
      mtu: 1500
    br-storage:
      addresses:
        - "172.29.244.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.20
      mtu: 1500
    br-vxlan:
      addresses:
        - "172.29.240.{{ target_host_last_octet }}/22"
      interfaces:
        - bond0.30
      mtu: 1500
    br-ext:
      addresses:
        - "10.xxx.yyy.{{ target_host_last_octet }}/24"
      interfaces:
        - bond0
      mtu: 1500
      routes:
        - to: default
          via: 10.xxx.yyy
      nameservers:
        addresses:
          - DNS1
          - DNS2

Revision history for this message
birbilakos (birbilis) wrote :

I went ahead and reconfigured things using a veth pair (vethb1, vethb2), created via systemd-networkd configuration:
[NetDev]
Name=vethb1
Kind=veth
[Peer]
Name=vethb2
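One way to bring the host side of the pair up and attach it to the bridge that carries the public network is a companion systemd-networkd .network unit; a minimal sketch, assuming the networkd renderer and that br-ext is that bridge (names here mirror the pair above):

[Match]
Name=vethb1

[Network]
Bridge=br-ext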

netplan:

network:
  version: 2
  ethernets:
    eno1:
      mtu: 1500
    eno2:
      mtu: 1500
    vethb1: {}
    vethb2: {}
  bonds:
    bond0:
      interfaces:
        - eno1
      mtu: 1500
      parameters:
        lacp-rate: fast
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer3+4
  vlans:
    bond0.10:
      id: 10
      link: bond0
    bond0.20:
      id: 20
      link: bond0
    bond0.30:
      id: 30
      link: bond0
  bridges:
    br-mgmt:
      addresses:
        - "172.29.236.121/22"
      interfaces:
        - bond0.10
      mtu: 1500
    br-storage:
      addresses:
        - "172.29.244.121/22"
      interfaces:
        - bond0.20
      mtu: 1500
    br-vxlan:
      addresses:
        - "172.29.240.121/22"
      interfaces:
        - bond0.30
      mtu: 1500
    br-ext:
      addresses:
        - "10.xxx.yyy.zzz/24"
      interfaces:
        - bond0
      mtu: 1500
      routes:
        - to: default
          via: 10.xxx.yyy.1
      nameservers:
        addresses:
          - DNS1
          - DNS2
        search:
          - test.net

openstack_user_config:
  management_bridge: br-mgmt
  tunnel_bridge: br-vxlan

  provider_networks:
    - network:
        container_bridge: "br-mgmt"
        container_type: "veth"
        container_interface: "eth1"
        ip_from_q: "management"
        type: "raw"
        group_binds:
          - all_containers
          - hosts
        is_management_address: true
        is_ssh_address: true
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "eth2"
        ip_from_q: "storage"
        type: "raw"
        group_binds:
          - glance_api
          - cinder_api
          - cinder_volume
          - nova_compute
    - network:
        container_bridge: "br-public"
        container_type: "veth"
        type: "flat"
        net_name: "flat"
        network_interface: "vethb2"
        group_binds:
          - neutron_openvswitch_agent
    - network:
        container_bridge: "br-vxlan"
        container_type: "veth"
        container_interface: "eth10"
        ip_from_q: "tunnel"
        type: "vxlan"
        range: "1:1000"
        net_name: "vxlan"
        group_binds:
          - neutron_openvswitch_agent

user_variables:
---
debug: false
# global_overrides:
# enable_logging: "yes"

install_method: source
haproxy_keepalived_external_vip_cidr: "10.xxx.yyy.zzz/32"
haproxy_keepalived_internal_vip_cidr: "172.29.236.99/32"
haproxy_keepalived_external_interface: vethb2
haproxy_keepalived_internal_interface: br-mgmt

neutron_plugin_type: ml2.ovs
neutron_ml2_drivers_type: "flat,vlan,vxlan"
neutron_plugin_base:
  - router
  - metering

And yet, I still get the same 'permission denied' errors on the controller/network nodes. As such, I don't see how the network configuration could play a role here...

For completeness, here's the openstack-ansible code base that I use:
commit 8bcd9198ff00363363fa3335e25c9fb6ece41847 (HEAD, origin/stabl...


Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

So what's the resulting configuration of
* /etc/neutron/plugins/ml2/ml2_conf.ini
* /etc/neutron/plugins/ml2/openvswitch_agent.ini

Revision history for this message
birbilakos (birbilis) wrote :

root@sjc-lnxserver-121:~# cat /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = flat,vxlan
mechanism_drivers = openvswitch
extension_drivers = port_security
# ML2 flat networks

[ml2_type_flat]
flat_networks = flat
# ML2 VLAN networks

[ml2_type_vlan]
network_vlan_ranges =
# ML2 VXLAN networks

[ml2_type_vxlan]
vxlan_group = 239.1.1.1
vni_ranges = 1:1000

[ml2_type_geneve]
vni_ranges =
max_header_size = 38
# Security groups

[securitygroup]
enable_security_group = True
enable_ipset = True

root@sjc-lnxserver-121:~# cat /etc/neutron/plugins/ml2/openvswitch_agent.ini
[ovs]
local_ip = 172.29.240.121
bridge_mappings = flat:br-public

[agent]
l2_population = False
tunnel_types = vxlan
enable_distributed_routing = False
extensions =
# Security groups

[securitygroup]
firewall_driver = iptables_hybrid
enable_security_group = True
enable_ipset = True

Revision history for this message
birbilakos (birbilis) wrote :

Btw, I do not see why these services would attempt to run on the host, as opposed to in a container (?)
neutron-dhcp-agent
neutron-l3-agent
neutron-metadata-agent
neutron-metering-agent

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Yes, these services run on bare metal hosts, not in containers by default

Revision history for this message
birbilakos (birbilis) wrote :

I have not found anything related to them in the AppArmor profiles, which seem to be installed during the OpenStack deployment. These processes just refuse to start with 'permission denied' for user neutron, so I doubt an interface misconfiguration can be at play here.

Revision history for this message
birbilakos (birbilis) wrote :

Well... it turns out that for some reason the /openstack folder doesn't have the right permissions to allow anyone besides root to access it (the 'x' bit is missing). Once I did chmod +x /openstack, the processes started as expected!

As such I believe that this is a legitimate bug.

I'm still unable, though, to access the haproxy_keepalived_external_vip_cidr IP address, which does not seem to have been assigned to the vethb2 interface during installation...
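For anyone hitting the same symptom, a sketch of the check and the workaround described above (0755 is the expected mode on /openstack):

stat -c '%a %U:%G %n' /openstack /openstack/venvs
chmod 0755 /openstack
systemctl restart neutron-l3-agent neutron-metadata-agent neutron-metering-agent neutron-openvswitch-agent neutron-dhcp-agent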

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

I'm not able to reproduce that behavior then...

In all my test deployments from 28.0.0 I see proper permissions on /openstack:

# stat /openstack/
  File: /openstack/
  Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: fc01h/64513d Inode: 3141121 Links: 4
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2023-11-17 13:55:06.942741664 +0000
Modify: 2023-11-13 12:39:44.415935380 +0000
Change: 2023-11-13 12:39:44.415935380 +0000
 Birth: 2023-11-13 12:22:25.355064045 +0000
#

And I am not able to reproduce the behavior you're talking about.

Do you perhaps have /openstack accidentally set up as a separate mount point, or pre-created in some way?

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote (last edit ):

Can you kindly provide the output of `ls -l /openstack`?

I'm trying to narrow down which component might mess up the permissions, but I have failed to find one so far.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-lxc_hosts (master)
Changed in openstack-ansible:
status: New → In Progress
Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

OK, I finally tracked down what messes up the permissions; it is a regression caused by this patch: https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180
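For context, the regression amounts to the base directories being created with mode 0644 instead of 0755, which on a directory strips the execute/search bit and prevents non-root users such as neutron from traversing /openstack. A hedged sketch of the kind of task involved (the task name is illustrative, not quoted from the role):

- name: Create base directories
  ansible.builtin.file:
    path: "/openstack"
    state: directory
    owner: root
    group: root
    mode: "0755"  # the linter cleanup had changed this to "0644", dropping the x bit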

Revision history for this message
birbilakos (birbilis) wrote :

Hi Dmitriy,

I'm happy you have RCAed the issue :)
I now have a working OpenStack env. Thank you so much for all your help and commitment to this great project!

Changed in openstack-ansible:
importance: Undecided → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-lxc_hosts (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/904756
Committed: https://opendev.org/openstack/openstack-ansible-lxc_hosts/commit/bd011b0eeef76c450cf32cafc542948a769adcd1
Submitter: "Zuul (22348)"
Branch: master

commit bd011b0eeef76c450cf32cafc542948a769adcd1
Author: Dmitriy Rabotyagov <email address hidden>
Date: Thu Jan 4 15:31:46 2024 +0100

    Fix permissions for base directories

    With fixing linters [1] I have accidentally set incorrect mode for base directories
    to 0644 while it should be 0755.

    [1] https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180

    Closes-Bug: #2047593
    Change-Id: Ied402f4f22ac333573c7144877da669251eccf8c

Changed in openstack-ansible:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-lxc_hosts (stable/2023.2)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-lxc_hosts (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/904738
Committed: https://opendev.org/openstack/openstack-ansible-lxc_hosts/commit/b0a0a7ce82b6ff9cd0560436258cdf9e0d35cf66
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit b0a0a7ce82b6ff9cd0560436258cdf9e0d35cf66
Author: Dmitriy Rabotyagov <email address hidden>
Date: Thu Jan 4 15:31:46 2024 +0100

    Fix permissions for base directories

    With fixing linters [1] I have accidentally set incorrect mode for base directories
    to 0644 while it should be 0755.

    [1] https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180

    Closes-Bug: #2047593
    Change-Id: Ied402f4f22ac333573c7144877da669251eccf8c
    (cherry picked from commit bd011b0eeef76c450cf32cafc542948a769adcd1)
