unable to send audit message

Bug #1989247 reported by Cristian Le
This bug affects 1 person
Affects: tripleo
Status: In Progress
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Seems to be similar to #1942076 but now it affects keystone_db_sync

Cristian Le (lecris) wrote :

As far as I can tell, this is caused by `CAP_AUDIT_WRITE` being dropped:
```
$ sudo podman inspect keystone_db_sync | jq -C .[0].HostConfig.CapDrop
[
  "CAP_AUDIT_WRITE",
  "CAP_MKNOD"
]
```
I have added the ansible.log and container log.

Other notes:
- Happens on both the `master` and `yoga` releases
- Occurred only on a tls-e deployment

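A small check along these lines can confirm whether a given capability shows up in a container's drop list. This is only a sketch: `cap_is_dropped` is a hypothetical helper name, and it simply reads the pretty-printed `CapDrop` array (as produced by the `jq` command above) on standard input.

```shell
# Hypothetical helper: succeeds if the capability named in $1 appears in the
# CapDrop array read from standard input.
cap_is_dropped() {
    # -F matches the literal string, -q reports via exit status only.
    # The surrounding double quotes avoid matching a longer capability name.
    grep -Fq "\"$1\""
}

# Sample input copied from the inspect output above.
sample='[
  "CAP_AUDIT_WRITE",
  "CAP_MKNOD"
]'

if printf '%s\n' "$sample" | cap_is_dropped CAP_AUDIT_WRITE; then
    echo "CAP_AUDIT_WRITE is dropped"
fi
```

Against a live container, the output of `sudo podman inspect keystone_db_sync | jq '.[0].HostConfig.CapDrop'` could be piped into the helper instead of the sample.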
Cristian Le (lecris) wrote :

Attached are the full scripts (scripts folder) and configuration (data folder) used for the run.
They cover both the master and yoga releases.

The ansible.log is very long. The relevant deployment should start after `2022-09-12 03:08:26,009`

Jiri Podivin (jpodivin)
Changed in tripleo:
importance: Undecided → High
importance: High → Medium
importance: Medium → Undecided
Cristian Le (lecris) wrote :

As expected, adding the capability fixes the issue. I don't know how or where to add it for a proper deployment, but my runtime edit was:
- Copy the container CreateCommand:
```
$ sudo podman inspect keystone_db_sync | jq '.[0].Config.CreateCommand|join(" ")'
```
- Delete the container and re-run the command with `--cap-add CAP_AUDIT_WRITE` added
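
The runtime edit described above could be sketched roughly as follows. `with_extra_cap` is a hypothetical helper, and the sketch assumes the recorded create command begins with `podman run`. The sample command string is shortened and illustrative, not taken from a real deployment. The sketch only prints the modified command for review rather than executing it, since joining the arguments with spaces loses the original quoting.

```shell
# Hypothetical helper: read a container create command on standard input and
# print it with "--cap-add <capability>" inserted right after "podman run".
# Assumption: the command starts with "podman run"; otherwise it is printed
# unchanged.
with_extra_cap() {
    sed "s/^podman run /podman run --cap-add $1 /"
}

# Illustrative, shortened command. A real one would come from:
#   sudo podman inspect keystone_db_sync | jq -r '.[0].Config.CreateCommand|join(" ")'
create_cmd='podman run --name keystone_db_sync example-image'

printf '%s\n' "$create_cmd" | with_extra_cap CAP_AUDIT_WRITE
```

The `-r` flag makes `jq` emit the raw string instead of a JSON-quoted one. After reviewing the printed command, the old container would be removed (for example with `sudo podman rm keystone_db_sync`) before running it.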

Jakob Meng (jm1337) wrote :

This error has occurred in our periodic jobs before:

* periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-clients-master [1]
* periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master [2]

But this error is **intermittent**, i.e. rerunning those jobs "fixed" the issue. Maybe this message is a red herring and not the real cause of the overcloud deployment failure?

[1] https://logserver.rdoproject.org/57/44657/9/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-clients-master/3fb6e42/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
[2] https://logserver.rdoproject.org/56/44656/5/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master/007c1da/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Changed in tripleo:
status: New → Triaged
milestone: none → zed-1
tags: added: ci promotion-blocker
Jakob Meng (jm1337) wrote :

Example output from last comment's log files:

  2022-09-01 11:13:33 | 2022-09-01 11:13:33.804022 | | WARNING | ERROR: Can't run container keystone_db_sync
  2022-09-01 11:13:33 | stderr: time="2022-09-01T11:12:50Z" level=info msg="podman filtering at log level info"
  2022-09-01 11:13:33 | time="2022-09-01T11:12:50Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"
  2022-09-01 11:13:33 | time="2022-09-01T11:12:50Z" level=info msg="Setting parallel job count to 13"
  2022-09-01 11:13:33 | time="2022-09-01T11:12:50Z" level=info msg="Sysctl net.ipv4.ping_group_range=0 0 ignored in containers.conf, since Network Namespace set to host"
  2022-09-01 11:13:33 | time="2022-09-01T11:12:50Z" level=info msg="User mount overriding libpod mount at \"/etc/hosts\""
  2022-09-01 11:13:33 | time="2022-09-01T11:12:50Z" level=info msg="Running conmon under slice machine.slice and unitName libpod-conmon-3f6fd0110150d4f2e8717bee1fb40cb8744ec1aa24b44a5bdddbad0ee39c80a7.scope"
  2022-09-01 11:13:33 | time="2022-09-01T11:12:51Z" level=info msg="Got Conmon PID as 43271"
  2022-09-01 11:13:33 | time="2022-09-01T11:12:51Z" level=info msg="Received shutdown.Stop(), terminating!" PID=43259
  2022-09-01 11:13:33 | + sudo -E kolla_set_configs
  2022-09-01 11:13:33 | sudo: unable to send audit message: Operation not permitted
  2022-09-01 11:13:33 | INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
  2022-09-01 11:13:33 | INFO:__main__:Validating config file
  2022-09-01 11:13:33 | INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
  2022-09-01 11:13:33 | INFO:__main__:Copying service configuration files
  ...
  2022-09-01 11:13:33 | ++ EXTRA_KEYSTONE_MANAGE_ARGS=
  2022-09-01 11:13:33 | ++ [[ -n 0 ]]
  2022-09-01 11:13:33 | ++ sudo -H -u keystone keystone-manage db_sync
  2022-09-01 11:13:33 | sudo: unable to send audit message: Operation not permitted
  2022-09-01 11:13:33 | 2022-09-01 11:13:33.807922 | fa163e33-4fb2-7254-0cc9-000000008849 | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/keystone-db-sync | overcloud-controller-0 | error={"changed": false, "msg": "Failed containers: keystone_db_sync"}

https://logserver.rdoproject.org/57/44657/9/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-clients-master/3fb6e42/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Jakob Meng (jm1337) wrote :

Example of a job which passed and still shows the same "error" message [1]:

 stderr F INFO:__main__:Writing out command to execute
 stderr F sudo: unable to send audit message: Operation not permitted
 stderr P chown:
 stderr P cannot access '/auth_pam_tool_dir/auth_pam_tool'
 stderr P : No such file or directory

and simply continues execution after that message.

[1] https://logserver.rdoproject.org/56/44656/16/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master/a6e423d/logs/overcloud-controller-2/var/log/containers/stdouts/mysql_bootstrap.log.txt.gz
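
Since the message also appears in passing jobs, it can help to filter it out when scanning a log for the actual failure. A minimal sketch, where `drop_audit_noise` is a hypothetical helper name and the sample lines are taken from the excerpt quoted above:

```shell
# Hypothetical helper: copy standard input to standard output, leaving out
# the sudo audit warning so the remaining lines are easier to scan.
drop_audit_noise() {
    # -F treats the pattern as a literal string, -v inverts the match.
    grep -Fv 'sudo: unable to send audit message: Operation not permitted'
}

# Sample lines copied from the mysql_bootstrap log excerpt.
sample='stderr F INFO:__main__:Writing out command to execute
stderr F sudo: unable to send audit message: Operation not permitted
stderr P chown:'

printf '%s\n' "$sample" | drop_audit_noise
```

Note that `grep -v` exits with a non-zero status when every input line is filtered out, which matters if the helper is used in a script running under `set -e`.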

Jakob Meng (jm1337)
tags: removed: ci promotion-blocker
Cristian Le (lecris) wrote :

Thank you for pointing out that this is just a red herring. I redeployed and found in `/var/log/container/keystone/keystone-manage.log` that the actual problem is a TLS issue: a certificate file is missing. I will open a separate issue about that.

I hope this report helps others debug similar failures and look for the real cause.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)
Changed in tripleo:
status: Triaged → In Progress
Cédric Jeanneret (cjeanner) wrote :

Hello there,

imho, the audit write denial should be avoidable by a proper sudo configuration.

After some digging, it seems sudo.conf (and therefore sudo) loads a "sudoers_audit" plugin:
           Plugin sudoers_audit sudoers.so

IMHO, we should be able to not load it, or maybe address its logging in a better way.

Apparently, if no plugin is mentioned in /etc/sudo.conf, this is the default block:
           Plugin sudoers_policy sudoers.so
           Plugin sudoers_io sudoers.so
           Plugin sudoers_audit sudoers.so

Don't really know who may be in charge of the global "sudo" configuration, but imho a good candidate is Security - ppl like "xek", or "ade_lee" on the #tripleo channel.
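
To see which plugins a given sudo.conf actually names, something like the following could be used. This is a sketch only: `list_sudo_plugins` is a hypothetical helper, and the temporary file stands in for `/etc/sudo.conf` so the example can run anywhere.

```shell
# Hypothetical helper: print the uncommented "Plugin" lines of the sudo.conf
# file given as $1. If there are none, say so; as noted above, sudo then
# falls back to its default plugin block.
list_sudo_plugins() {
    grep -E '^[[:space:]]*Plugin[[:space:]]' "$1" ||
        echo "no Plugin lines in $1 (defaults apply)"
}

# Stand-in for /etc/sudo.conf with one commented and one active plugin line.
conf=$(mktemp)
printf '%s\n' '#Plugin sudoers_io sudoers.so' 'Plugin sudoers_audit sudoers.so' > "$conf"

list_sudo_plugins "$conf"
rm -f "$conf"
```

Running the helper inside the affected container image (for example against the image's own `/etc/sudo.conf`) would show whether the audit plugin is listed explicitly or only loaded by default.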

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (master)

Change abandoned by "Ghanshyam <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/857338
Reason: TripleO project is retiring now, for details, please see https://review.opendev.org/c/openstack/governance/+/905145 or reach out to OpenStack TC.
