cleanup_file_locks does not remove stale sentinel files

Bug #1018586 reported by Branan Purvine-Riley on 2012-06-27
This bug affects 3 people
Affects                       Importance   Assigned to
OpenStack Compute (nova)      Undecided    Eugene Kirpichov
  Essex                       High         Unassigned
nova (Ubuntu)                 High         Unassigned
  Precise                     Undecided    Unassigned

Bug Description

Related to https://bugs.launchpad.net/nova/+bug/785955

The patch for that issue has an incorrect regex for sentinel files.

The correct regex is "hostname + r'-.*\.(\d+$)'"
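Here is a minimal Python sketch of what the corrected pattern does. It assumes a sentinel filename of the form hostname-threadid.pid, where the final capture group yields the PID. The helper name `sentinel_pid` is hypothetical. Wrapping the hostname in `re.escape` is an addition of this sketch, so that dots in the hostname match only literal dots; the pattern above concatenates the raw hostname.

```python
import re


def sentinel_pid(hostname, filename):
    """Hypothetical helper: return the PID encoded in a sentinel filename, or None.

    Uses the corrected pattern (hostname, '-', anything, '.', trailing digits).
    re.escape() is an assumption of this sketch, not part of the original fix.
    """
    pattern = re.escape(hostname) + r'-.*\.(\d+$)'
    match = re.match(pattern, filename)
    return int(match.group(1)) if match else None


print(sentinel_pid('warhead.home.base', 'warhead.home.base-2ae619-2a025a0.24791'))
```

The greedy `.*` backtracks to the last dot, so a thread id that itself contains hyphens does not confuse the PID capture.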

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
Pádraig Brady (p-draigbrady) wrote :

shouldn't you just reopen 785955 and add the comment there?

Fix proposed to branch: master
Review: https://review.openstack.org/10095

Changed in nova:
assignee: nobody → Eugene Kirpichov (ekirpichov)
status: New → In Progress
Michael Still (mikal) on 2012-07-21
tags: added: canonistack ops
Eugene Kirpichov (ekirpichov) wrote :

(oops, didn't notice it was already linked to automatically)

Reviewed: https://review.openstack.org/10095
Committed: http://github.com/openstack/nova/commit/974417b75f5f839ce4daaf080147ad154d727f10
Submitter: Jenkins
Branch: master

commit 974417b75f5f839ce4daaf080147ad154d727f10
Author: Eugene Kirpichov <email address hidden>
Date: Sat Jul 21 23:17:55 2012 +0000

    Fix wrong regex in cleanup_file_locks.

    The sentinel filename actually has form hostname-threadid.pid,
    not hostname.threadid-pid.
    Launchpad bug 1018586.
    Change-Id: I09c01e0e63ee704b1485c196dc0b396ee03b2e5c
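The PID captured by the pattern is presumably what allows the cleanup to decide that a sentinel is stale, meaning its owning process has exited. Below is a minimal POSIX-only sketch of such a pass. The names `pid_alive` and `clean_stale_sentinels` are hypothetical. Probing liveness with `os.kill(pid, 0)` is an assumption of this sketch and does not describe nova's actual implementation.

```python
import os
import re


def pid_alive(pid):
    """POSIX liveness probe (assumption of this sketch): signal 0 checks existence only."""
    if pid <= 0:
        return False  # 0 or negative would address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clean_stale_sentinels(lock_dir, hostname):
    """Hypothetical cleanup pass: delete sentinels whose owning PID is not running."""
    pattern = re.compile(re.escape(hostname) + r'-.*\.(\d+$)')
    removed = []
    for name in os.listdir(lock_dir):
        match = pattern.match(name)
        if match and not pid_alive(int(match.group(1))):
            os.remove(os.path.join(lock_dir, name))
            removed.append(name)
    return removed
```

With the original pattern, no sentinel in the hostname-threadid.pid form matches. A pass like this would then remove nothing without reporting any error, which is the symptom described in the bug title.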

Changed in nova:
status: In Progress → Fix Committed
Changed in ubuntu:
importance: Undecided → High
tags: added: essex-backport

Reviewed: https://review.openstack.org/10321
Committed: http://github.com/openstack/nova/commit/f2bc403879234aaaeeb61e1dca1affe18192cfa1
Submitter: Jenkins
Branch: stable/essex

commit f2bc403879234aaaeeb61e1dca1affe18192cfa1
Author: Eugene Kirpichov <email address hidden>
Date: Sat Jul 21 23:17:55 2012 +0000

    Fix wrong regex in cleanup_file_locks.

    The sentinel filename actually has form hostname-threadid.pid,
    not hostname.threadid-pid.

    Launchpad bug 1018586.

    Update: Add Eugene to Authors for stable/essex.

    Change-Id: I09c01e0e63ee704b1485c196dc0b396ee03b2e5c
    (cherry picked from commit 974417b75f5f839ce4daaf080147ad154d727f10)

tags: added: in-stable-essex
Eugene Kirpichov (ekirpichov) wrote :

Hm, I'm confused. I just noticed that in Ubuntu Precise, the python-lockfile package ships lockfile 0.8, for which the original regex IS CORRECT. Where did I, and the other person affected by this bug, get a more up-to-date version of lockfile?

Adam Gandelman (gandelman-a) wrote :

Hey Eugene-

I'm not sure where you and the other guy got a more up-to-date version of lockfile. python-lockfile has remained at 0.8 in Ubuntu since the package was introduced in Lucid. That said, as far as I can see, I'm not sure any of this is lockfile-related: nova.utils.GreenLockFile overrides lockfile's naming scheme for sentinel files anyway, and the sentinel regexp depends on that scheme, not on lockfile's.

I did a quick test locally and found that a system named 'warhead.home.base' leaves a sentinel file named 'warhead.home.base-2ae619-2a025a0.24791', which your new regexp matches and the original does not:

#!/usr/bin/python
import re

hostname = 'warhead.home.base'
filename = "warhead.home.base-2ae619-2a025a0.24791"
orig_sentinel_re = hostname + r'\..*-(\d+$)'
new_sentinel_re = hostname + r'-.*\.(\d+$)'
print(re.match(orig_sentinel_re, filename))
print(re.match(new_sentinel_re, filename))

output:
None
<_sre.SRE_Match object at 0x7f69e74ad558>

Pádraig Brady (p-draigbrady) wrote :

tl;dr The current code should be correct.

old naming = blah-pid
new naming = blah.pid
That was changed upstream in:
http://code.google.com/p/pylockfile/source/detail?r=102
That was released upstream in 0.9.1

But nova has overridden lockfile's naming since essex-1-2022-geb42e7f,
and the new regexp is correct for that scheme.
I.e. diablo lock files are named according to the lockfile version,
but diablo doesn't have the cleanup code, so that is moot.
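The following comparison makes the two naming schemes concrete. It assumes the lockfile 0.8 form hostname.threadid-pid and nova's overriding form hostname-threadid.pid, as described in the commit message. The thread id and PID values are made up for illustration, and `captured_pids` is a hypothetical helper.

```python
import re

HOST = 'warhead.home.base'
# Original pattern: fits the lockfile 0.8 form hostname.threadid-pid
OLD_RE = HOST + r'\..*-(\d+$)'
# Corrected pattern: fits nova's overriding form hostname-threadid.pid
NEW_RE = HOST + r'-.*\.(\d+$)'


def captured_pids(name):
    """Return (PID captured by OLD_RE, PID captured by NEW_RE); None means no match."""
    old = re.match(OLD_RE, name)
    new = re.match(NEW_RE, name)
    return (old.group(1) if old else None, new.group(1) if new else None)


lockfile_08_name = '%s.%s-%d' % (HOST, '2ae619', 24791)  # hypothetical values
nova_name = '%s-%s.%d' % (HOST, '2ae619', 24791)

print(captured_pids(lockfile_08_name))  # ('24791', None)
print(captured_pids(nova_name))         # (None, '24791')
```

Each pattern matches exactly one of the two schemes, which is why the regex fix depends on which naming code produced the sentinel.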

P.S. I think this cleanup code doesn't work on Windows,
as it assumes file locks rather than directory locks.
Hard links may be available on Windows, but I don't think
os.link is available there in Python yet.
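The reliance on os.link comes from hard-link file locking. Below is a minimal POSIX sketch of that general technique. Each process creates its own unique file and then tries to hard-link it to the shared lock path. The function names and paths are hypothetical, and this is not lockfile's or nova's actual code. In this sketch, if a process dies without releasing the lock, its unique file (the sentinel) is left behind, and a cleanup pass then has to remove it.

```python
import os


def try_acquire(lock_path, unique_path):
    """Hypothetical single attempt at a hard-link lock; return True if held.

    unique_path is this process's own file (the 'sentinel' in this sketch).
    """
    with open(unique_path, 'w'):
        pass  # create the per-process unique file
    try:
        # Fails with FileExistsError if another process already holds the lock.
        os.link(unique_path, lock_path)
    except FileExistsError:
        pass
    # Two links to our unique file means lock_path now refers to it.
    if os.stat(unique_path).st_nlink == 2:
        return True
    os.remove(unique_path)
    return False


def release(lock_path, unique_path):
    """Drop the lock by removing both names of the shared inode."""
    os.remove(lock_path)
    os.remove(unique_path)
```

The link-count check, rather than trusting only the return of os.link, is the conventional way this technique verifies ownership.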

Thierry Carrez (ttx) on 2012-08-16
Changed in nova:
milestone: none → folsom-3
status: Fix Committed → Fix Released
Dave Walker (davewalker) on 2012-08-24
affects: ubuntu → nova (Ubuntu)
Changed in nova (Ubuntu):
status: Confirmed → Fix Released
Dave Walker (davewalker) on 2012-08-24
Changed in nova (Ubuntu Precise):
status: New → Confirmed

Please find the attached test log from the Ubuntu Server Team's CI infrastructure. As part of the verification process for this bug, Nova has been deployed and configured across multiple nodes using precise-proposed as an installation source. After successful bring-up and configuration of the cluster, a number of exercises and smoke tests have been invoked to ensure the updated package did not introduce any regressions. A number of test iterations were carried out to catch any possible transient errors.

Please note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the Jenkins links in the comments of the relevant upstream code-review(s):

Trunk review: https://review.openstack.org/10095
Stable review: https://review.openstack.org/10321

As per the provisional Micro Release Exception granted to this package by the Technical Board, we hope this contributes toward verification of this update.

Adam Gandelman (gandelman-a) wrote :

Test coverage log.

tags: added: verification-done
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.1.3+stable-20120827-4d2a4afe-0ubuntu1

---------------
nova (2012.1.3+stable-20120827-4d2a4afe-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot, fixes FTBFS in -proposed. (LP: #1041120)
  * Resynchronize with stable/essex (4d2a4afe):
    - [5d63601] Inappropriate exception handling on kvm live/block migration
      (LP: #917615)
    - [ae280ca] Deleted floating ips can cause instance delete to fail
      (LP: #1038266)

nova (2012.1.3+stable-20120824-86fb7362-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot. (LP: #1041120)
  * Dropped, superseded by new snapshot:
    - debian/patches/CVE-2012-3447.patch: [d9577ce]
    - debian/patches/CVE-2012-3371.patch: [25f5bd3]
    - debian/patches/CVE-2012-3360+3361.patch: [b0feaff]
  * Resynchronize with stable/essex (86fb7362):
    - [86fb736] Libvirt driver reports incorrect error when volume-detach fails
      (LP: #1029463)
    - [272b98d] nova delete lxc-instance umounts the wrong rootfs (LP: #971621)
    - [09217ab] Block storage connections are NOT restored on system reboot
      (LP: #1036902)
    - [d9577ce] CVE-2012-3361 not fully addressed (LP: #1031311)
    - [e8ef050] pycrypto is unused and the existing code is potentially insecure
      to use (LP: #1033178)
    - [3b4ac31] cannot umount guestfs (LP: #1013689)
    - [f8255f3] qpid_heartbeat setting in ineffective (LP: #1030430)
    - [413c641] Deallocation of fixed IP occurs before security group refresh
      leading to potential security issue in error / race conditions
      (LP: #1021352)
    - [219c5ca] Race condition in network/deallocate_for_instance() leads to
      security issue (LP: #1021340)
    - [f2bc403] cleanup_file_locks does not remove stale sentinel files
      (LP: #1018586)
    - [4c7d671] Deleting Flavor currently in use by instance creates error
      (LP: #994935)
    - [7e88e39] nova testsuite errors on newer versions of python-boto (e.g.
      2.5.2) (LP: #1027984)
    - [80d3026] NoMoreFloatingIps: Zero floating ips available after repeatedly
      creating and destroying instances over time (LP: #1017418)
    - [4d74631] Launching with source groups under load produces lazy load error
      (LP: #1018721)
    - [08e5128] API 'v1.1/{tenant_id}/os-hosts' does not return a list of hosts
      (LP: #1014925)
    - [801b94a] Restarting nova-compute removes ip packet filters (LP: #1027105)
    - [f6d1f55] instance live migration should create virtual_size disk image
      (LP: #977007)
    - [4b89b4f] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [6e873bc] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [7b215ed] Use default qemu-img cluster size in libvirt connection driver
    - [d3a87a2] Listing flavors with marker set returns 400 (LP: #956096)
    - [cf6a85a] nova-rootwrap hardcodes paths instead of using
      /sbin:/usr/sbin:/usr/bin:/bin (LP: #1013147)
    - [2efc87c] affinity filters don't work if scheduler_hints is None
      (LP: #1007573)
  ...


Changed in nova (Ubuntu Precise):
status: Confirmed → Fix Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Thierry Carrez (ttx) on 2012-09-27
Changed in nova:
milestone: folsom-3 → 2012.2