td-agent not found in containers path

Bug #2056104 reported by Antony Messerli
This bug affects 5 people
Affects  Status     Importance  Assigned to      Milestone
kolla    Confirmed  Undecided   Antony Messerli

Bug Description

When testing out 2023.2 we ran into an issue during deployment that resulted in the following error:

The error message in the Docker logs, exec: td-agent: not found, indicates that the Fluentd executable td-agent is not in the container's PATH. This typically happens when the Fluentd package (td-agent) is either not installed in the Docker image or is installed in a location that is not on the container's PATH.

Upon investigation, it turned out that td-agent had been renamed to fluentd in newer containers. To handle this migration, some labels were added to the newer containers to help Ansible determine which binary and username to use. Because these values were added to the label block in the Dockerfile, our customized containers, which set their own labels in that block, did not set the fluentd_binary and fluentd_user labels.

In looking through the various other labels in all of the other Dockerfiles, this seems to be the one location where additional values are set in the labels block.

As they are specific to the fluentd container operation, the LABEL for those two values should probably be moved outside of the block in order to not break operators that are customizing their containers. In our case I was able to fix the issue in our containers by moving those values outside of the block, but I think we should probably get this corrected upstream as well.

Changed in kolla:
assignee: nobody → Antony Messerli (antonym)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/kolla/+/911014

Maksim Malchuk (mmalchuk) wrote :

If you build your own customised containers you should set all needed labels yourself.

Changed in kolla:
status: In Progress → Invalid
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla (master)

Change abandoned by "Michal Nasiadka <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/kolla/+/911014
Reason: not needed

Noel Ashford (nashford77) wrote :

Out of interest, why is this not needed? A stock 2023.2 deployment does not work with the fluentd container; it will never start because of this:

+++ stat -c %U:%G /var/lib/fluentd
++ [[ fluentd:kolla != \f\l\u\e\n\t\d\:\k\o\l\l\a ]]
+ echo 'Running command: '\''td-agent -c /etc/td-agent/td-agent.conf -o /var/log/kolla/fluentd/fluentd.log'\'''
Running command: 'td-agent -c /etc/td-agent/td-agent.conf -o /var/log/kolla/fluentd/fluentd.log'
+ exec td-agent -c /etc/td-agent/td-agent.conf -o /var/log/kolla/fluentd/fluentd.log
/usr/local/bin/kolla_start: line 24: exec: td-agent: not found

How can this be fixed? The kolla 17.3.0 images are still broken in this way.

Travis Best (tb2097) wrote :

Confirming this is broken for the official 2023.2 default fluentd image:

https://quay.io/repository/openstack.kolla/fluentd

I have been testing with the rocky-9 build. I built a test cluster using the previous image about a month ago, and it did not have this issue. If the official container is not going to be updated, can you please provide documentation on how the fix can be applied? My preference would be to run a stable release rather than master, and stable is what I expected 2023.2 to be.

Antony Messerli (antonym) wrote :

We were operating our own build of Rocky 9 containers of 2023.2 (https://docs.openstack.org/kolla/latest/admin/image-building.html), and the issue we ran into was that we were adding our own label to the label block for our customizations.

The original binary was named td-agent and has been renamed to fluentd in newer versions. This is where it's set in kolla-ansible:

https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/common/tasks/config.yml#L65

and here is where the variables are now set at the container build:

https://github.com/openstack/kolla/blob/master/docker/fluentd/Dockerfile.j2#L6

The code checks for these labels and uses them if they are found; otherwise it defaults to the original td-agent.
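The fallback described above can be sketched in a few lines of Python. This is only an illustration of the lookup-with-default behaviour, not the actual kolla-ansible task; the function name resolve_fluentd_binary and the sample label values are hypothetical.

```python
from typing import Mapping, Optional

# Legacy binary name used when the image carries no fluentd_binary label.
DEFAULT_BINARY = "td-agent"


def resolve_fluentd_binary(labels: Optional[Mapping[str, str]]) -> str:
    """Return the binary named by the fluentd_binary label, else the legacy default.

    `labels` stands in for the image's label dictionary (hypothetical input;
    how the labels are fetched from the image is out of scope here).
    """
    if not labels:
        return DEFAULT_BINARY
    return labels.get("fluentd_binary") or DEFAULT_BINARY


# Image whose labels survived the build: the new binary name is used.
print(resolve_fluentd_binary({"fluentd_binary": "fluentd", "fluentd_user": "fluentd"}))

# Customized image whose label block replaced the upstream labels:
# the lookup falls back to td-agent, which no longer exists in the image.
print(resolve_fluentd_binary({"maintainer": "ops"}))
```

The second call shows the failure mode from this report: the label is missing, so the deployment tries to exec td-agent inside an image that only ships fluentd.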

In our case, we build our own containers and use a LABEL override so we can apply our own settings to the build. Most of the other LABEL sections upstream are fairly similar to one another, but in this case the section was overloaded with critical variables that are needed at deploy time. Because we were setting things in the LABEL section, our values ended up overriding what was upstream.

Our fix for our container build was to put those settings on a separate LABEL line so that our labels would not collide with and override the needed variables:

{% block fluentd_header %}
LABEL fluentd_binary="fluentd" fluentd_user="{{ fluentd_user }}"
{% endblock %}
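For operators who build their own images, one way to catch this before deployment is to check the rendered Dockerfile for the two labels. The following is a minimal sketch under stated assumptions: the helper names label_keys and missing_labels are hypothetical, the sample Dockerfile text and the fluentd_user value are made up for illustration, and LABEL instructions split across lines with backslash continuations are not handled.

```python
import shlex

# Labels that this report identifies as required by the deployment.
REQUIRED_LABELS = ("fluentd_binary", "fluentd_user")


def label_keys(dockerfile_text: str) -> set:
    """Collect the keys of all single-line LABEL instructions in the text."""
    keys = set()
    for line in dockerfile_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].upper() != "LABEL":
            continue
        # shlex keeps quoted values such as maintainer="ops team" in one token.
        for token in shlex.split(parts[1]):
            key, sep, _value = token.partition("=")
            if sep:
                keys.add(key)
    return keys


def missing_labels(dockerfile_text: str, required=REQUIRED_LABELS) -> list:
    """Return the required labels that the Dockerfile text does not set."""
    present = label_keys(dockerfile_text)
    return [name for name in required if name not in present]


# Hypothetical rendered Dockerfile where a custom label replaced the upstream ones.
broken = 'FROM base\nLABEL maintainer="ops team"\n'
# Same file with the two labels restored on their own LABEL line.
fixed = broken + 'LABEL fluentd_binary="fluentd" fluentd_user="fluentd"\n'

print(missing_labels(broken))
print(missing_labels(fixed))
```

A non-empty result for a fluentd image would indicate that the custom label override dropped the values the deployment relies on.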

I had proposed decoupling the LABELs from each other so that operators building their own containers wouldn't run into that issue.

Changed in kolla:
status: Invalid → Confirmed