undercloud fails with " [ERROR] Could not lock /var/run/os-refresh-config.lock."

Bug #1669110 reported by Alfredo Moralejo
This bug affects 6 people
Affects: tripleo
Status: Won't Fix
Importance: High
Assigned to: Unassigned

Bug Description

We are hitting an intermittent issue in RDO-CI jobs using oooq where the undercloud install fails with the following trace:

2017-03-01 14:52:54,672 INFO: dib-run-parts Wed Mar 1 14:52:54 UTC 2017 --------------------- END PROFILING ---------------------
2017-03-01 14:52:54,682 INFO: INFO: 2017-03-01 14:52:54,673 -- ############### End stdout/stderr logging ###############
2017-03-01 14:52:54,682 INFO: INFO: 2017-03-01 14:52:54,673 -- Running hook post-install
2017-03-01 14:52:54,682 INFO: INFO: 2017-03-01 14:52:54,673 -- Skipping hook post-install, the hook directory doesn't exist at /tmp/tmpbC7P9Y/post-install.d
2017-03-01 14:52:54,683 INFO: INFO: 2017-03-01 14:52:54,679 -- Ending run of instack.
2017-03-01 14:52:54,692 INFO: Instack completed successfully
2017-03-01 14:52:54,693 INFO: Running os-refresh-config
2017-03-01 14:52:54,814 INFO: [2017-03-01 14:52:54,814] (os-refresh-config) [ERROR] Could not lock /var/run/os-refresh-config.lock. [Errno 11] Resource temporarily unavailable
2017-03-01 14:52:54,823 ERROR:
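The "[Errno 11] Resource temporarily unavailable" (EAGAIN) above is the classic signature of a non-blocking flock attempt failing because another process already holds the lock. As a minimal standalone sketch (not the actual os-refresh-config code, and using a temporary path rather than /var/run/os-refresh-config.lock), this is the failure mode:

```python
# A minimal sketch (NOT the actual os-refresh-config code) showing how a
# non-blocking flock attempt produces "[Errno 11] Resource temporarily
# unavailable" when another holder already has the lock.
import errno
import fcntl
import os
import tempfile

lock_path = os.path.join(tempfile.gettempdir(), "orc-demo.lock")

def try_lock(path):
    """Return a locked fd, or None if the lock is already held (EAGAIN)."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB: fail immediately instead of waiting for the holder.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except OSError as e:
        os.close(fd)
        if e.errno == errno.EAGAIN:  # this is "Errno 11"
            return None
        raise

holder = try_lock(lock_path)     # first caller takes the lock
contender = try_lock(lock_path)  # concurrent caller gets EAGAIN -> None
print(holder is not None, contender is None)  # prints: True True
```

In this bug's context, the likely second holder is an os-refresh-config run already in flight (the journal excerpts later in the report show the os-collect-config service also exiting with status 11), so the manual run and the service race for the same lock.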

We are seeing this in ocata and master deployments. Logs for some of the failed jobs:

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-ocata-rdo_trunk-minimal_pacemaker-10/undercloud/home/stack/undercloud_install.log.gz

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-ocata-rdo_trunk-minimal_pacemaker-22/undercloud/home/stack/undercloud_install.log.gz

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal_pacemaker-486/undercloud/home/stack/undercloud_install.log.gz

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-132/undercloud/home/stack/undercloud_install.log.gz

Tags: ci quickstart
tags: added: ci quickstart
Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
milestone: none → pike-1
Changed in tripleo:
milestone: pike-1 → pike-2
Changed in tripleo:
milestone: pike-2 → pike-3
Changed in tripleo:
importance: Medium → High
Changed in tripleo:
milestone: pike-3 → pike-rc1
Changed in tripleo:
milestone: pike-rc1 → pike-rc2
Changed in tripleo:
milestone: pike-rc2 → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Revision history for this message
Matt Young (halcyondude) wrote :

Added a patch for a recheck query here: https://review.openstack.org/#/c/527559/

Revision history for this message
Ameed Ashour (ameeda) wrote :

I faced the same issue; I could work around it by removing the file:
$ sudo rm /var/run/os-refresh-config.lock

What if I add a script that checks whether the file exists and, if it does, deletes it?
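A caution on the delete-the-file approach: an flock lives on the open file description, not on the directory entry, so deleting the lock file does not release a lock held by a still-running process. It only lets a second run create and lock a fresh inode, after which both runs proceed concurrently. A bounded retry is a safer workaround; the sketch below is hypothetical (the helper name, attempt counts, and demo path are all assumptions, not os-refresh-config's own code):

```python
# Hedged sketch: wait for the lock with a bounded retry instead of
# deleting the lock file. Deleting does NOT release an flock held by a
# live process -- the lock is on the open file descriptor, so a delete
# just lets two runs proceed at once on different inodes.
import errno
import fcntl
import os
import tempfile
import time

def acquire_with_retry(path, attempts=5, delay=2.0):
    """Take an exclusive flock, retrying on EAGAIN instead of deleting."""
    for _ in range(attempts):
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd  # keep this fd open for as long as the lock is needed
        except OSError as e:
            os.close(fd)
            if e.errno != errno.EAGAIN:
                raise
            time.sleep(delay)
    raise RuntimeError("could not lock %s; another run may still be active"
                       % path)

demo = os.path.join(tempfile.gettempdir(), "orc-retry-demo.lock")
held = acquire_with_retry(demo, attempts=1, delay=0)  # uncontended: succeeds
```

Once the process holding `held` exits (or closes the fd), the lock is released automatically, which is why waiting is preferable to deleting.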

Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Revision history for this message
Moisés Guimarães de Medeiros (moguimar) wrote :

Hitting this on a fresh tripleo-quickstart installation using the master-tripleo-ci release.

I see this in the journal logs:

Apr 26 10:05:44 undercloud os-collect-config[680]: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /latest/meta-data/ (Caused by ConnectTimeoutError(<requests.packages
Apr 26 10:05:44 undercloud os-collect-config[680]: Source [ec2] Unavailable.
Apr 26 10:05:44 undercloud os-collect-config[680]: Source [request] Unavailable.
Apr 26 10:05:44 undercloud os-collect-config[680]: /var/lib/os-collect-config/local-data not found. Skipping
Apr 26 10:05:44 undercloud os-collect-config[680]: No auth_url configured.
Apr 26 10:05:44 undercloud os-collect-config[680]: [2018-04-26 10:05:44,210] (os-refresh-config) [ERROR] Could not lock /var/run/os-refresh-config.lock. [Errno 11] Resource temporarily unavailable
Apr 26 10:05:44 undercloud os-collect-config[680]: Command failed, will not cache new data. Command 'os-refresh-config' returned non-zero exit status 11
Apr 26 10:05:57 undercloud os-collect-config[680]: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /latest/meta-data/ (Caused by ConnectTimeoutError(<requests.packages
Apr 26 10:05:57 undercloud os-collect-config[680]: Source [ec2] Unavailable.
Apr 26 10:05:57 undercloud os-collect-config[680]: Source [request] Unavailable.
Apr 26 10:05:57 undercloud os-collect-config[680]: /var/lib/os-collect-config/local-data not found. Skipping

This is the end of the undercloud install logs:

2018-04-26 10:12:13 | 2018-04-26 10:12:13,496 INFO: + exit 1
2018-04-26 10:12:13 | 2018-04-26 10:12:13,502 INFO: [2018-04-26 10:12:13,496] (os-refresh-config) [ERROR] during configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/configure.d']' returned non-zero exit status 1]
2018-04-26 10:12:13 | 2018-04-26 10:12:13,502 INFO:
2018-04-26 10:12:13 | 2018-04-26 10:12:13,502 INFO: [2018-04-26 10:12:13,496] (os-refresh-config) [ERROR] Aborting...
2018-04-26 10:12:13 | 2018-04-26 10:12:13,506 DEBUG: An exception occurred
2018-04-26 10:12:13 | Traceback (most recent call last):
2018-04-26 10:12:13 | File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 2285, in install
2018-04-26 10:12:13 | _run_orc(instack_env)
2018-04-26 10:12:13 | File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1592, in _run_orc
2018-04-26 10:12:13 | _run_live_command(args, instack_env, 'os-refresh-config')
2018-04-26 10:12:13 | File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 671, in _run_live_command
2018-04-26 10:12:13 | raise RuntimeError('%s failed. See log for details.' % name)
2018-04-26 10:12:13 | RuntimeError: os-refresh-config failed. See log for details.
2018-04-26 10:12:13 | 2018-04-26 10:12:13,508 ERROR:
2018-04-26 10:12:13 | #############################################################################
2018-04-26 10:12:13 | Undercloud install failed.
2018-04-26 10:12:13 |
2018-04-26 10:12:13 | Reason: os-refresh-config failed. See log for details.
2018-04-26 10:12:13 |
2018-04-26 10:12:13 | See the previous output for detai...


Revision history for this message
Ronelle Landy (rlandy) wrote :

This problem still shows up sporadically.

Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Revision history for this message
Martin Kopec (mkopec) wrote :
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

We do not support the instack undercloud for Stein, do we?

Changed in tripleo:
status: Triaged → Won't Fix
Revision history for this message
Marios Andreou (marios-b) wrote :

@Bogdan, the most recent trace from Martin in comment #12 points to an ocata gate check, and we do use instack-undercloud there: https://review.openstack.org/#/c/564291/.

Having said that, it seems we aren't seeing it any more; that scenario job is green there. From the discussion and the intermittent reports above, it might be a race condition, since it doesn't happen consistently, only sporadically. Eventually instack-undercloud _will_ go away, but we will continue to use it until it goes end of life.

I'm not pushing back on Won't Fix for now, but I won't be surprised if we re-open it.
