kolla rabbitmq container setup should not try to delete /etc/hosts

Bug #1709689 reported by John Fulton
This bug affects 2 people
Affects   Status         Importance   Assigned to     Milestone
tripleo   Fix Released   Medium       Jiří Stránský

Bug Description

OS::Heat::StructuredDeployment failed on Controller during Step 1

Controller's /var/log/messages [1] contained the following about rabbitmq container:

  INFO:__main__:Deleting /etc/hosts
  ...
  OSError: [Errno 16] Device or resource busy: '/etc/hosts'

Running docker-puppet.py directly [2] produced the same error message again in the logs [3]. jistr suggested this might be a side effect of how docker/services/rabbitmq.yaml [4] sets source and dest of config_files.

Workaround: re-running the same deployment command got the deployment past the issue, and docker ps then showed the rabbitmq container running.

[1] http://paste.openstack.org/show/617946
[2] http://paste.openstack.org/show/617953
[3] http://paste.openstack.org/show/617951
[4] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/rabbitmq.yaml#L84-L85
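
For context on the error itself: Errno 16 is EBUSY, which Linux returns from unlink when the target is a mount point, for example a file bind-mounted into a container. The following is only a minimal sketch of how a delete step could tell that case apart from others. It is not Kolla's actual code, and the name try_remove and its injectable remove parameter are hypothetical.

```python
import errno
import os


def try_remove(path, remove=os.remove):
    """Attempt to delete `path` and report what happened.

    Returns "removed", "missing", or "busy"; any other OSError is re-raised.
    `remove` is injectable (hypothetical helper design) so the EBUSY branch
    can be exercised without a real bind mount.
    """
    try:
        remove(path)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            # Same condition as "[Errno 16] Device or resource busy"
            return "busy"
        if exc.errno == errno.ENOENT:
            return "missing"
        raise
    return "removed"
```

With a bind-mounted /etc/hosts, a call such as try_remove("/etc/hosts") would be expected to take the "busy" branch rather than delete the file.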

Revision history for this message
John Fulton (jfulton-org) wrote :

Ran into this again on a new quickstart install. The docker run command below shows the mapping: "/etc/hosts:/etc/hosts:ro". I am unable to install an overcloud without hitting this.

Aug 10 13:06:15 localhost os-collect-config: "Error running ['docker', 'run', '--name', 'rabbitmq_bootstrap', '--label', 'config_id=tripleo_step1', '--label', 'container_name=rabbitmq_bootstrap', '--label', 'managed_by=paunch', '--label', 'config_data={\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"KOLLA_BOOTSTRAP=True\", \"RABBITMQ_CLUSTER_COOKIE=5GcPkrumAykgrLxIodDt\"], \"start_order\": 1, \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/var/lib/kolla/config_files/rabbitmq.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/rabbitmq/:/var/lib/kolla/config_files/src:ro\", \"/var/lib/rabbitmq:/var/lib/rabbitmq\", \"/var/log/containers/rabbitmq:/var/log/rabbitmq\"], \"image\": \"tripleoupstream/centos-binary-rabbitmq:latest\", \"detach\": false, \"net\": \"host\", \"privileged\": false}', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=KOLLA_BOOTSTRAP=True', '--env=RABBITMQ_CLUSTER_COOKIE=5GcPkrumAykgrLxIodDt', '--net=host', '--privileged=false', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', 
'--volume=/var/lib/kolla/config_files/rabbitmq.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data
...
Aug 10 13:06:15 localhost os-collect-config: "INFO:__main__:Deleting /etc/hosts",
Aug 10 13:06:15 localhost os-collect-config: "OSError: [Errno 16] Device or resource busy: '/etc/hosts'",
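
To see at a glance which container paths are bind mounts in a command like the one logged above, the --volume arguments can be pulled out of the argument list. This is a rough sketch that assumes only the plain --volume=SRC:DST[:MODE] form seen in the log; the function name bind_mount_targets is made up for illustration.

```python
def bind_mount_targets(docker_args):
    """Map container path -> (host path, mode) for --volume=SRC:DST[:MODE] args.

    Assumption: only the '--volume=' form with a host path is handled;
    other volume syntaxes are skipped.
    """
    prefix = "--volume="
    targets = {}
    for arg in docker_args:
        if not arg.startswith(prefix):
            continue
        parts = arg[len(prefix):].split(":")
        if len(parts) < 2:
            continue  # no explicit host:container pair, not a bind mount
        host, container = parts[0], parts[1]
        # Docker mounts read-write unless a mode such as "ro" is given
        mode = parts[2] if len(parts) > 2 else "rw"
        targets[container] = (host, mode)
    return targets
```

Applied to the arguments in the log, the result would include an entry for /etc/hosts with mode "ro", which is the path the container later fails to delete.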

Revision history for this message
John Fulton (jfulton-org) wrote :

Deployment was done using:

time openstack overcloud deploy --templates ~/templates \
-e ~/templates/environments/docker.yaml \
-e ~/templates/environments/ceph-ansible/ceph-ansible.yaml \
-e ~/templates/environments/low-memory-usage.yaml \
-e ~/templates/environments/disable-telemetry.yaml \
-e ~/templates/environments/docker-centos-tripleoupstream.yaml \
-e ~/tripleo-ceph-ansible/tht/overcloud-ceph-ansible.yaml

(undercloud) [stack@undercloud tripleo-ceph-ansible]$ cat ~/tripleo-ceph-ansible/tht/overcloud-ceph-ansible.yaml
resource_registry:
  OS::TripleO::NodeUserData: first-boot-template.yaml

parameter_defaults:
  NtpServer: 10.5.26.10

  OvercloudControlFlavor: control
  ControllerCount: 1
  OvercloudComputeFlavor: compute
  ComputeCount: 1
  OvercloudCephStorageFlavor: ceph-storage
  CephStorageCount: 3
  CephMdsCount: 0
  OvercloudCephMdsFlavor: ceph-mds
  CephRgwCount: 0
  OvercloudCephMdsFlavor: ceph-rgw

  CephPoolDefaultSize: 1
  CephAnsibleDisksConfig:
    devices:
      - /dev/vdb
      - /dev/vdc
    raw_journal_devices:
      - /dev/vdd
      - /dev/vdd
    journal_size: 256 # vdd is 1024M
    journal_collocation: false
    raw_multi_journal: true
  CephPoolDefaultPgNum: 32

  #DockerNamespace: 192.168.24.1:8787/tripleoupstream
  #DockerNamespaceIsRegistry: true
(undercloud) [stack@undercloud tripleo-ceph-ansible]$

(undercloud) [stack@undercloud tripleo-ceph-ansible]$ ls -l ~ | grep templates
lrwxrwxrwx. 1 stack stack 34 Aug 10 13:39 templates -> /home/stack/tripleo-heat-templates
drwxrwxr-x. 20 stack stack 4096 Aug 10 13:41 tripleo-heat-templates
(undercloud) [stack@undercloud tripleo-ceph-ansible]$

tripleo-heat-templates was a git checkout with "git review -d 492082" applied, so it contained the templates from the change I was testing:

https://review.openstack.org/#/c/492082

Revision history for this message
Jiří Stránský (jistr) wrote :

We found out why the container tries to delete /etc/hosts: Kolla does indeed try to replace it. This happens because `/var/lib/config-data/puppet-generated/rabbitmq/etc/hosts` exists. We don't know yet *why* it exists, though. The rsync in docker-puppet.py should leave alone all files that haven't been touched by Puppet. As far as we know, in CI and other environments /etc/hosts is not touched by Puppet, and the rsync doesn't pick it up into .../puppet-generated/rabbitmq.
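
One way to check an affected node for this condition is to look for files in the generated config tree whose destination is also a bind-mount target. The sketch below assumes, as described in this comment, that each file under the source tree maps to the same path relative to / inside the container. The function name colliding_files is hypothetical, and this is a diagnostic aid, not part of Kolla or docker-puppet.py.

```python
import os


def colliding_files(src_root, mount_targets):
    """Return destination paths present under src_root that are mount targets.

    src_root:      generated config tree, e.g. .../puppet-generated/rabbitmq
    mount_targets: container paths that are bind-mounted, e.g. {"/etc/hosts"}
    """
    hits = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), src_root)
            dest = os.path.join("/", rel)  # assumed destination in the container
            if dest in mount_targets:
                hits.append(dest)
    return sorted(hits)
```

Running it against /var/lib/config-data/puppet-generated/rabbitmq with the container's bind-mount targets would report /etc/hosts on a node that has the stray file, and nothing on a node where the rsync left /etc/hosts alone.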

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Looks like a dup / same root cause as for https://bugs.launchpad.net/tripleo/+bug/1709339

Revision history for this message
John Fulton (jfulton-org) wrote :

I am having this issue in my local environment stood up by quickstart, where changes in openstack-infra/tripleo-ci [1] should not come into play.

[1] https://review.openstack.org/#/c/481233

Revision history for this message
John Fulton (jfulton-org) wrote :

I am not seeing this in the latest builds, so I'd say I can't reproduce it. I am going to close this issue for now but will re-open it if I see it again. Thanks.

Changed in tripleo:
status: Triaged → Incomplete
Changed in tripleo:
milestone: pike-rc1 → pike-rc2
Revision history for this message
John Fulton (jfulton-org) wrote :

This bug no longer happens under the same conditions in which I reported it, so I am marking it fixed.

Changed in tripleo:
status: Incomplete → Fix Released