we shouldn't ignore facter cache generation failures

Bug #1861917 reported by Alex Schultz on 2020-02-04
This bug affects 1 person
Affects: tripleo
Importance: Medium
Assigned to: Alex Schultz

Bug Description

If the host's facter pre-cache step fails, the container-puppet executions later fail with seemingly unrelated errors about missing facts. If we want to allow skipping the cache, we must not mount the cache folder read-only, because a read-only mount prevents facter inside the container from creating its own cache.

An example would be:

+ mkdir -p /etc/puppet
+ cp -a /tmp/puppet-etc/auth.conf /tmp/puppet-etc/hiera.yaml /tmp/puppet-etc/hieradata /tmp/puppet-etc/modules /tmp/puppet-etc/puppet.conf /tmp/puppet-etc/puppet.conf.rpmnew /etc/puppet
+ rm -Rf /etc/puppet/ssl
+ echo '{"step": 6}'
+ TAGS=
+ '[' -n file,file_line,concat,augeas,cron,neutron_plugin_ml2,neutron_config,neutron_l3_agent_config,neutron_config,neutron_metadata_agent_config,neutron_config,neutron_agent_ovs,neutron_plugin_ml2 ']'
+ TAGS='--tags file,file_line,concat,augeas,cron,neutron_plugin_ml2,neutron_config,neutron_l3_agent_config,neutron_config,neutron_metadata_agent_config,neutron_config,neutron_agent_ovs,neutron_plugin_ml2'
+ origin_of_time=/var/lib/config-data/neutron.origin_of_time
+ touch /var/lib/config-data/neutron.origin_of_time
+ sync
+ set +e
+ export FACTER_deployment_type=containers
+ FACTER_deployment_type=containers
++ cat /sys/class/dmi/id/product_uuid
++ tr '[:upper:]' '[:lower:]'
+ export FACTER_uuid=4c4c4544-0052-4310-8038-c2c04f305732
+ FACTER_uuid=4c4c4544-0052-4310-8038-c2c04f305732
+ FACTER_hostname=compute-d-048
+ /usr/bin/puppet apply --summarize --detailed-exitcodes --color=false --logdest syslog --logdest console --modulepath=/etc/puppet/modules:/usr/share/openstack-puppet/modules --tags file,file_line,concat,augeas,cron,neutron_plugin_ml2,neutron_config,neutron_l3_agent_config,neutron_config,neutron_metadata_agent_config,neutron_config,neutron_agent_ovs,neutron_plugin_ml2 /etc/config.pp
Error: Facter: Facter.value uncaught exception: boost::filesystem::create_directories: Read-only file system: "/opt/puppetlabs/facter/cache/cached_facts"
Error: Facter: Facter.value uncaught exception: boost::filesystem::create_directories: Read-only file system: "/opt/puppetlabs/facter/cache/cached_facts"
Error: Could not autoload puppet/provider/service/init: undefined method `downcase' for nil:NilClass
Error: Could not autoload puppet/provider/service/bsd: Could not autoload puppet/provider/service/init: undefined method `downcase' for nil:NilClass
Error: Facter: error while resolving custom facts in /usr/share/openstack-puppet/modules/stdlib/lib/facter/service_provider.rb: Could not autoload puppet/provider/service/bsd: Could not autoload puppet/provider/service/init: undefined method `downcase' for nil:NilClass
Error: Facter: Facter.add uncaught exception: boost::filesystem::create_directories: Read-only file system: "/opt/puppetlabs/facter/cache/cached_facts"
Error: Facter: Facter.value uncaught exception: boost::filesystem::create_directories: Read-only file system: "/opt/puppetlabs/facter/cache/cached_facts"
Error: Could not autoload puppet/provider/service/init: undefined method `downcase' for nil:NilClass
Error: Could not autoload puppet/provider/service/debian: Could not autoload puppet/provider/service/init: undefined method `downcase' for nil:NilClass
Error: Facter: error while resolving custom facts in /usr/share/openstack-puppet/modules/stdlib/lib/facter/service_provider.rb: Could not autoload puppet/provider/service/debian: Could not autoload puppet/provider/service/init: undefined method `downcase' for nil:NilClass
Error: Facter: Facter.fact uncaught exception: boost::filesystem::create_directories: Read-only file system: "/opt/puppetlabs/facter/cache/cached_facts"
Error: Facter: Facter.value uncaught exception: boost::filesystem::create_directories: Read-only file system: "/opt/puppetlabs/facter/cache/cached_facts"
Error: Facter: Facter.value uncaught exception: boost::filesystem::create_directories: Read-only file system: "/opt/puppetlabs/facter/cache/cached_facts"
Error: Facter: error while resolving custom fact "java_version": undefined method `downcase' for nil:NilClass
Warning: Found multiple default providers for package: norpm, yum, pip3; using norpm
Warning: Could not retrieve fact fqdn
Warning: Could not retrieve fact ipaddress
Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend
Warning: Undefined variable 'deploy_config_name';
   (file & line not available)
Warning: Undefined variable 'osfamily';
   (file & line not available)
Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend
Warning: Unknown variable: '::hostname'. at /etc/puppet/modules/tripleo/manifests/profile/base/neutron/plugins/ml2.pp:39:6
Warning: ModuleLoader: module 'neutron' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules
   (file & line not available)
Warning: Unknown variable: '::puppetversion'. at /etc/puppet/modules/openstacklib/manifests/defaults.pp:9:17
Warning: ModuleLoader: module 'openstacklib' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules
   (file & line not available)
Error: Evaluation Error: Error while evaluating a Function Call, 'versioncmp' parameter 'a' expects a String value, got Undef at /etc/puppet/modules/openstacklib/manifests/defaults.pp:9:6 on node
+ rc=1
+ set -e
+ '[' 1 -ne 2 -a 1 -ne 0 ']'
+ exit 1

In this case, the host had facter2, which does not support a facter.conf, while the containers had a facter version that does. The pre-cache step on the host therefore failed, and the attempt to generate the cache inside the container then failed as well, because facter could not create the cache folders on the read-only mount.
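The mount decision discussed in this description can be sketched as follows. This is a minimal illustration and not the TripleO implementation: the helper name `facter_cache_volumes` and the "at least one cached fact file exists" check are assumptions made for the example. Only the container path is taken from the log above. The idea is that the cache is bind-mounted read-only when the host pre-cache step produced something, and not mounted at all otherwise, so that facter in the container can create its own cache directory.

```python
import os
import tempfile


def facter_cache_volumes(host_cache_dir,
                         container_cache_dir="/opt/puppetlabs/facter/cache"):
    """Return bind-mount specs for the facter cache (hypothetical helper).

    The host cache is mounted read-only only if the pre-cache step left
    at least one file under cached_facts. With no usable cache, no mount
    is returned, so facter in the container can write its own cache.
    """
    cached_facts = os.path.join(host_cache_dir, "cached_facts")
    if os.path.isdir(cached_facts) and os.listdir(cached_facts):
        return ["%s:%s:ro" % (host_cache_dir, container_cache_dir)]
    return []


if __name__ == "__main__":
    # Demonstrate both outcomes with a throwaway directory.
    with tempfile.TemporaryDirectory() as host_dir:
        print(facter_cache_volumes(host_dir))
        os.makedirs(os.path.join(host_dir, "cached_facts"))
        open(os.path.join(host_dir, "cached_facts", "kernel"), "w").close()
        print(facter_cache_volumes(host_dir))
```

A caller would pass each returned spec to the container runtime's volume option. An empty list means the container runs without the shared cache, which is slower but avoids the read-only failure.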

Changed in tripleo:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: Triaged → In Progress

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/705954

Changed in tripleo:
status: In Progress → Triaged
assignee: Bogdan Dobrelya (bogdando) → nobody
wes hayutin (weshayutin) on 2020-02-10
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin) on 2020-04-13
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
Alex Schultz (alex-schultz) wrote :

This happened when packages were out of sync (facter2 vs facter3). We did improve some logic around it but this happens because of an invalid configuration.

Changed in tripleo:
status: Triaged → Invalid
Alex Schultz (alex-schultz) wrote :

Resurrecting this bug because I've seen other failures. For example, if EC2 is available in the environment, the fact generation can fail with:

2020-05-13 22:38:30.574339 ERROR puppetlabs.facter - EC2 user data
request failed: Timeout was reached

Then the puppet execution fails because the facts don't exist. I've added logic to retry, to exclude the EC2 facts, and to fail hard if caching fails, because that is a better UX than letting the container-puppet runs fail later.
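The retry and fail-hard behaviour can be sketched roughly like this. It is an illustration under assumptions, not the actual patch: the function name, the exception class, and the default retry count and delay are made up for the example, and the real change lives in the deployment tooling rather than in a standalone script. The `run` and `sleep` parameters are injectable so the logic can be exercised without facter installed.

```python
import subprocess
import time


class FacterCacheError(RuntimeError):
    """Raised when the facter cache could not be generated."""


def generate_facter_cache(cmd, retries=5, delay=1.0,
                          run=subprocess.run, sleep=time.sleep):
    """Run the cache-generation command, retrying on a non-zero exit.

    Returns the attempt number that succeeded. Raises FacterCacheError
    once all attempts are used up, so the deployment stops here instead
    of failing later inside container-puppet.
    """
    last_rc = None
    for attempt in range(1, retries + 1):
        result = run(cmd)
        last_rc = result.returncode
        if last_rc == 0:
            return attempt
        if attempt < retries:
            sleep(delay)  # brief pause before the next attempt
    raise FacterCacheError(
        "facter cache generation failed after %d attempts (rc=%s)"
        % (retries, last_rc)
    )
```

A caller might invoke it as `generate_facter_cache(["facter", "--config", conf_path])`, where `conf_path` is a hypothetical path to the facter.conf used for caching.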

Changed in tripleo:
status: Invalid → Fix Released
assignee: nobody → Alex Schultz (alex-schultz)

Reviewed: https://review.opendev.org/733147
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c0267e0c35ffd05ea166886608db66e430a2f725
Submitter: Zuul
Branch: stable/queens

commit c0267e0c35ffd05ea166886608db66e430a2f725
Author: Alex Schultz <email address hidden>
Date: Wed Jun 3 07:44:17 2020 -0600

    Improve facter cache reliability

    We don't need to be caching EC2 metadata in our facter runs because we
    don't use this. By leaving this on, it can cause problems when run on
    VMs in a cloud that might support EC2. By having this enabled, we can
    see deployments failing to configure nodes because it timesout:

    2020-05-13 22:38:30.574339 ERROR puppetlabs.facter - EC2 user data
    request failed: Timeout was reached

    Additionally if facter fails to generate the cache, the subsequent
    puppet runs will fail. This change also added retries to the facter
    cache call to ensure that we should be able to generate the cache if an
    external fact call fails for some reason.

    NOTE: The master patch for this is against tripleo-ansible

    Closes-Bug: #1861917
    Change-Id: Iaaed0dcf747ca4a08f8e200b43d0f2259ad0ed39
    (cherry-picked from c3b57d6a6c26acf35168fea7303fc485e1dcd13f)
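For readers unfamiliar with how facts are excluded, the sketch below renders a facter.conf that blocks the EC2 fact group. It is an illustration only: the helper name `render_facter_conf` and the example TTL values are assumptions, and the file written by the actual commit may differ. It relies on the `facts.blocklist` and `facts.ttls` settings that Facter 3 reads from facter.conf.

```python
def render_facter_conf(blocklist, ttls):
    """Render a minimal facter.conf (HOCON) as a string.

    blocklist: fact group names to skip entirely, e.g. ["EC2"].
    ttls: (fact group, duration) pairs to cache, e.g. ("kernel", "8 hour").
    """
    blocked = ", ".join('"%s"' % group for group in blocklist)
    ttl_lines = ",\n".join(
        '    { "%s" : %s }' % (group, ttl) for group, ttl in ttls
    )
    return (
        "facts : {\n"
        "  blocklist : [ %s ],\n"
        "  ttls : [\n"
        "%s\n"
        "  ]\n"
        "}\n"
    ) % (blocked, ttl_lines)


if __name__ == "__main__":
    # Example values only; the cached groups here are illustrative.
    print(render_facter_conf(["EC2"], [("kernel", "8 hour"),
                                       ("memory", "8 hour")]))
```

With the EC2 group blocked, facter never issues the metadata request that produced the timeout shown in the commit message.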

tags: added: in-stable-queens
tags: added: in-stable-rocky

Reviewed: https://review.opendev.org/733146
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=44636dde6164d149d77c33ca9a460964bd1dead9
Submitter: Zuul
Branch: stable/rocky

commit 44636dde6164d149d77c33ca9a460964bd1dead9
Author: Alex Schultz <email address hidden>
Date: Wed Jun 3 07:44:17 2020 -0600

Reviewed: https://review.opendev.org/733145
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a67db9df202fa9eac173cff097ffd52fb4740bbe
Submitter: Zuul
Branch: stable/stein

commit a67db9df202fa9eac173cff097ffd52fb4740bbe
Author: Alex Schultz <email address hidden>
Date: Wed Jun 3 07:44:17 2020 -0600

tags: added: in-stable-stein

Reviewed: https://review.opendev.org/729326
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=024afc22a07d94b186fff2d99f2600c9bc170034
Submitter: Zuul
Branch: stable/train

commit 024afc22a07d94b186fff2d99f2600c9bc170034
Author: Alex Schultz <email address hidden>
Date: Tue May 19 10:38:10 2020 -0600

tags: added: in-stable-train

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.

This issue was fixed in the openstack/tripleo-heat-templates rocky-eol release.
