`ovn-metadata-agent` not starting due to missing module `neutron.privileged.agent`

Bug #1966858 reported by Cristian Le
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| devstack | Fix Released | Undecided | Unassigned | |
| neutron | Incomplete | Undecided | Unassigned | |

Bug Description

I haven't found any other information out there about this and I am unable to figure out how to debug it, so I am sending the relevant journal log file of the service.

The error message is `ModuleNotFoundError: No module named 'neutron.privileged.agent'`. However, this module is not an explicit dependency of the relevant callers; instead, its name is constructed in `oslo_privsep/daemon.py`. I have tried older versions of `oslo.privsep` with no luck.

My distro is Fedora 35. I would try the stable/Xena branch, but that gives an error when installing Yappi, which breaks later installations back on master. This is also the only remaining start-up error when running `devstack` from scratch according to the guides.

Cristian Le (lecris) wrote :
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Cristian:

The error during the installation could be affecting the Neutron code installation. "neutron.privileged.agent" is just part of the Neutron code, so I think you need to go back and install Neutron correctly again.

I see you are using python3.10. This is not yet supported in OpenStack. If you check [1], you'll see the supported versions are 3.6 to 3.9. I would recommend making 3.8 or 3.9 the default Python binary on your system and reinstalling.

Regards.

[1]https://github.com/openstack/neutron/blob/28cabb8ccb09f28e149caa39b4bed01a7e1c0c58/setup.cfg#L16-L21

Changed in neutron:
status: New → Incomplete
Cristian Le (lecris) wrote :

Thank you, your comment was very helpful. I will report back if Python 3.9 does not have this issue.

Hopefully this will help bring Python 3.10 support there.

Unrelated, but the opendev.org gitea environment is currently bottlenecked, so any git clone operations fail. ~~Is there a quick fix to point devstack to the github.com upstream?~~ Nvm, I found `GIT_BASE`.
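
For readers hitting the same clone failures, here is a minimal sketch of the `GIT_BASE` override mentioned above. It assumes the line goes into the `[[local|localrc]]` section of `local.conf`, and that the github.com mirrors of the OpenStack repositories are acceptable for your deployment. The mirror URL is an assumption, not something stated in this thread.

```shell
# Goes in the [[local|localrc]] section of local.conf (assumption: devstack
# reads this section as plain shell variable assignments).
# Points devstack's git clones at the GitHub mirror instead of opendev.org.
GIT_BASE=https://github.com
```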

Cristian Le (lecris) wrote :

Small update. Because Fedora 35 ships with Python 3.10 and we suspect the Python version is the cause of this issue, should the supported version be bumped down to Fedora 34 until this issue is resolved?

Cristian Le (lecris) wrote :

Ok, I have tested master on Fedora 34 which has python 3.9, and the same issue appears. It seems to be a different issue. I will try Fedora 34 and stable/xena next.

Update: Confirmed it is still present in stable/xena and stable/yoga

Cristian Le (lecris) wrote :
no longer affects: fedora
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Cristian:

Do you have the log of ovn-metadata-agent when running on Fedora 34 and python3.9? Is it the same problem as before?

Please, check your python binary and set one version as the default one. In the stack logs, I see you are installing the OpenStack projects for python3.8. E.g.:
2022-04-04 05:34:09.682 | Requirement already satisfied: Jinja2===3.0.3 in /usr/local/lib/python3.8/dist-packages (from -c /opt/stack/requirements/upper-constraints.txt (line 11)) (3.0.3)

If you are using python3.9 to execute the code, this is the reason for the "ModuleNotFoundError" exception. Check where the "/usr/bin/python" link is pointing.
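
The check suggested here can be sketched as a small shell helper. This is only an illustration: `check_import` is a hypothetical name, and it assumes the interpreter you pass in is the one the service actually runs with.

```shell
# Hypothetical helper: report whether INTERPRETER can import MODULE.
# Prints one of: no-interpreter, importable, not-importable.
check_import() {
    local py="$1" module="$2"
    # Bail out early if the interpreter itself cannot be found.
    if ! command -v "$py" >/dev/null 2>&1; then
        echo "no-interpreter"
        return 2
    fi
    # Ask that interpreter to import the module; only the exit status matters.
    if "$py" -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$module" >/dev/null 2>&1; then
        echo "importable"
    else
        echo "not-importable"
        return 1
    fi
}
```

For example, `check_import /usr/bin/python3 neutron.privileged.agent` run as the stack user shows whether that interpreter sees the Neutron code, and `readlink -f /usr/bin/python3` (GNU coreutils) shows which binary the link resolves to.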

Regards.

yatin (yatinkarel) wrote :

I see that a non-voting fedora-35 python3.10 job with OVN is running, and I see a couple of successes [1], like [2]. So at least the issue is not caused by a specific Python version.
And yes, it could be related to what Rodolfo said, such as having multiple Python versions installed with the default pointing to a version other than 3.10. I also found a similar old bug [3], but that was for Ubuntu and neutron-dhcp-agent.

[1] https://zuul.opendev.org/t/openstack/builds?job_name=devstack-platform-fedora-latest&project=openstack/devstack
[2] https://0040485a3ab53b03d16a-74c6967b9c388ca77dd0a83c1c662326.ssl.cf2.rackcdn.com/837207/1/check/devstack-platform-fedora-latest/3b8d081/controller/logs/devstacklog.txt
[3] https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1825872

Cristian Le (lecris) wrote :

Sorry for the long delay. I have re-run the tests on F34 and F35 (for xena and yoga respectively). My testing approach was to spin up instances (2 cores, 7 GB RAM) from OVH (with private networks attached to get them started) and perform the exact steps indicated in the devstack manual [1], i.e. with `local.conf` looking like:
```
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
DEST=/opt/stack
LOGFILE=$DEST/logs/stack.sh.log
```
(Also setting `chmod 760 /opt/stack`)

For F35, I also had to fix the `pyscss` version in horizon. Otherwise everything should be reproducible as I've described. I have attached the log directory of this, along with `journalctl` of the service, `dnf list --installed 'python*'`, and a copy of `neutron-ovn-metadata-agent` that the service is executing. I can't find why/where python3.8 is used as described.

One thing to note is that the ansible roles that zuul uses and `stack.sh` are not equivalent. The ansible script probably enforces the Python version much more strictly than pip or the `functions.sh` scripts do.

[1] https://docs.openstack.org/devstack/latest/index.html

Cristian Le (lecris) wrote :

Here are the logs for F35. There are more of them because I forgot to add some fixes and had to stack -> unstack -> stack -> clean -> stack.

I don't know if I can come back with more debug data for this issue before the 24th, so I hope these logs and reproduction steps are helpful for now.

`PIP_ENV` should be fixing this issue, but I couldn't get it to work for other reasons. That should also be reproducible in this setup.

PS: I did `dnf upgrade`, and also tried without it, with the same results.

yatin (yatinkarel) wrote :

@Cristian I was able to reproduce it. The issue is caused by the wrong permissions on /opt/stack, i.e. the 760 that you set.
On RHEL-based distros, /opt/stack gets 700 permissions when the stack user is created, and with that devstack fails right at the beginning. So the permissions of /opt/stack need to be fixed. 755 works fine, and the same permissions are set on all other nested directories in /opt/stack.

I think devstack doc can be updated to fix it.

The following steps worked for me in F34 and F35:-

sudo dnf update -y
sudo dnf install -y git-core

sudo useradd -s /bin/bash -d /opt/stack -m stack
echo "stack ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/stack
sudo -u stack -i
chmod 755 /opt/stack

git clone https://git.openstack.org/openstack-dev/devstack /opt/stack/devstack

cat > /opt/stack/devstack/local.conf << END
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=\$ADMIN_PASSWORD
RABBIT_PASSWORD=\$ADMIN_PASSWORD
SERVICE_PASSWORD=\$ADMIN_PASSWORD
DEST=/opt/stack
LOGFILE=\$DEST/logs/stack.sh.log
disable_service horizon
FORCE=yes
END

cd /opt/stack/devstack
bash stack.sh

If you want to deploy without creating a stack user, e.g. with the default fedora user in the image, you can do the following instead of the useradd and sudo settings:
sudo mkdir -p /opt/stack
sudo chown $USER /opt/stack

In CI the same is done for the stack user: https://opendev.org/openstack/devstack/src/branch/master/roles/setup-stack-user/tasks/main.yaml#L24-L30
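
The permission requirement described in this comment (760 and 700 fail, 755 works) comes down to the execute bit being present for owner, group and others. A minimal sketch of that test on an octal mode string follows. `mode_is_traversable` is a hypothetical helper name, not part of devstack.

```shell
# Hypothetical helper: succeed only if an octal mode such as 755 grants the
# execute (traverse) bit to owner, group and others.
mode_is_traversable() {
    # Interpret the argument as octal and require all three execute bits (0111).
    [ $(( 0$1 & 0111 )) -eq $(( 0111 )) ]
}

# Example use, assuming GNU stat is available:
#   mode_is_traversable "$(stat -c '%a' /opt/stack)" || echo "fix /opt/stack permissions"
```

With this test, 755 and 711 pass, while 700, 750 and 760 are rejected.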

Cristian Le (lecris) wrote :

Oh, lol, I don't know why I set it to 6 instead of 5, but maybe the relevant part is the others permission being 0. Devstack had a check for the directory permissions, but judging from this issue, it should be refined to check specifically for `755`. I'll give it another try as soon as I can. Related to Fedora, when can we get support for it in production deployments? Even CentOS-based installations are hit or miss, so why not enable us to work on Fedora and bring support closer to mainstream, e.g. via copr repositories?

Cristian Le (lecris) wrote :

@yatin, thank you for the guidance. Indeed I have confirmed that this works. If an appropriate check is added to `stack.sh` this issue is resolved.
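
A check of the kind proposed here could look roughly like the sketch below. It is an assumption about what such a check might do, not the code that was merged into devstack. `ensure_traversable` is a hypothetical name, and GNU `stat` is assumed.

```shell
# Hypothetical helper: add the execute (traverse) bit for owner, group and
# others on DIR when any of them is missing. Assumes GNU stat (-c '%a').
ensure_traversable() {
    local dir="$1" mode
    mode="$(stat -c '%a' "$dir")" || return 1
    # 0111 is the octal mask of the three execute bits.
    if [ $(( 0$mode & 0111 )) -ne $(( 0111 )) ]; then
        echo "Execute permission missing on $dir (mode $mode), adding it"
        chmod a+x "$dir"
    fi
}
```

Calling `ensure_traversable /opt/stack` before stacking would turn a 760 directory into 771 and leave a 755 directory untouched.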

yatin (yatinkarel)
Changed in devstack:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/devstack/+/838645

Changed in devstack:
status: Confirmed → In Progress
yatin (yatinkarel) wrote :

Ubuntu is also switching to similar default permissions: https://discourse.ubuntu.com/t/private-home-directories-for-ubuntu-21-04-onwards/19533. I faced the same issue with Ubuntu jammy (22.04).

OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.opendev.org/c/openstack/devstack/+/838645
Committed: https://opendev.org/openstack/devstack/commit/c64ea4f213afebd1602d05cdd4d5bc14eaf5356b
Submitter: "Zuul (22348)"
Branch: master

commit c64ea4f213afebd1602d05cdd4d5bc14eaf5356b
Author: yatinkarel <email address hidden>
Date: Wed Apr 20 12:30:09 2022 +0530

    Fix doc and user create script to set homedir permissions

    RHEL based distros set homedir permissions to 700,
    and Ubuntu 21.04+ to 750[1], i.e missing executable
    permission for group or others, this results into failures
    as defined in the below bug.

    Since in doc we add useradd command, it's good to
    add instructions to fix the permissions there itself
    instead of getting failures during installation and then
    fixing it.

    Also update user create script to fix permissions
    by adding executable bit to DEST directory if missing.

    [1] https://discourse.ubuntu.com/t/private-home-directories-for-ubuntu-21-04-onwards/19533

    Closes-Bug: #1966858
    Change-Id: Id2787886433281238eb95ee11a75eddeef514293

Changed in devstack:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/867008

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/867008
Committed: https://opendev.org/openstack/neutron/commit/aaae0798832e8892d3b592c2b30a48b1d896bfa8
Submitter: "Zuul (22348)"
Branch: master

commit aaae0798832e8892d3b592c2b30a48b1d896bfa8
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sat Dec 3 06:40:53 2022 +0100

    Fix homedir permissions

    RHEL based distros set homedir permissions to 700,
    and Ubuntu 21.04+ to 750[1], i.e missing executable
    permission for group or others, this results into failures
    as defined in the below bug.

    This patch fixes the homedir permissions for local (non-gate)
    installations, using the devstack patch as reference.

    Check patch [1] for more information.

    [1]https://review.opendev.org/c/openstack/devstack/+/838645

    Related-Bug: #1966858
    Change-Id: I9f701e1015fc7cfa954eba12e55fd3544a4d95d2

Yu Jae IL (vanhome) wrote :

The same happens on Ubuntu 22.04 LTS with devstack zed.

Harsh Ailani (haailani) wrote :

I am still facing this issue on Ubuntu 20.04 LTS with the devstack stable/yoga branch.
