object-updater should be more tolerant of already-removed async pendings

Bug #1877924 reported by Tim Burke
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

So I'm not entirely sure *why* this happened (since each worker should have been looking only at its own disk), but I've seen some tracebacks in a prod cluster:

May 10 00:21:40 ss0192 object-updater: STDERR: Traceback (most recent call last):
May 10 00:21:40 ss0192 object-updater: STDERR: File "/opt/ss/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 111, in wait
May 10 00:21:40 ss0192 object-updater: STDERR: listener.cb(fileno)
May 10 00:21:40 ss0192 object-updater: STDERR: File "/opt/ss/lib/python2.7/site-packages/eventlet/greenthread.py", line 221, in main
May 10 00:21:40 ss0192 object-updater: STDERR: result = function(*args, **kwargs)
May 10 00:21:40 ss0192 object-updater: STDERR: File "/opt/ss/lib/python2.7/site-packages/swift/obj/updater.py", line 351, in process_object_update
May 10 00:21:40 ss0192 object-updater: STDERR: renamer(update_path, target_path, fsync=False)
May 10 00:21:40 ss0192 object-updater: STDERR: File "/opt/ss/lib/python2.7/site-packages/swift/common/utils.py", line 1595, in renamer
May 10 00:21:40 ss0192 object-updater: STDERR: os.rename(old, new)
May 10 00:21:40 ss0192 object-updater: STDERR: OSError: [Errno 2] No such file or directory
May 10 00:21:40 ss0192 object-updater: STDERR: Removing descriptor: 7
May 10 00:22:03 ss0192 object-updater: UNCAUGHT EXCEPTION
Traceback (most recent call last):
  File "/opt/ss/bin/swift-object-updater", line 23, in <module>
    run_daemon(ObjectUpdater, conf_file, **options)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 316, in run_daemon
    DaemonStrategy(klass(conf), logger).run(once=once, **kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 156, in run
    self._run(once=once, **kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 226, in _run
    return self._run_inline(once, **kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 150, in _run_inline
    self.daemon.run(once=once, **kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 65, in run
    self.run_forever(**kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/obj/updater.py", line 166, in run_forever
    self.object_sweep(dev_path)
  File "/opt/ss/lib/python2.7/site-packages/swift/obj/updater.py", line 298, in object_sweep
    for update in ap_iter:
  File "/opt/ss/lib/python2.7/site-packages/swift/common/utils.py", line 1731, in next
    next_value = next(self.iterator)
  File "/opt/ss/lib/python2.7/site-packages/swift/obj/updater.py", line 274, in _iter_async_pendings
    os.unlink(update_path)
OSError: [Errno 2] No such file or directory: '/srv/node/d776/async_pending/73e/452b233828da81484e718d8fd8e9173e-1588834613.87109'
May 10 00:29:36 ss0192 object-updater: UNCAUGHT EXCEPTION
Traceback (most recent call last):
  File "/opt/ss/bin/swift-object-updater", line 23, in <module>
    run_daemon(ObjectUpdater, conf_file, **options)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 316, in run_daemon
    DaemonStrategy(klass(conf), logger).run(once=once, **kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 156, in run
    self._run(once=once, **kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 226, in _run
    return self._run_inline(once, **kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 150, in _run_inline
    self.daemon.run(once=once, **kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/common/daemon.py", line 65, in run
    self.run_forever(**kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/obj/updater.py", line 166, in run_forever
    self.object_sweep(dev_path)
  File "/opt/ss/lib/python2.7/site-packages/swift/obj/updater.py", line 298, in object_sweep
    for update in ap_iter:
  File "/opt/ss/lib/python2.7/site-packages/swift/common/utils.py", line 1731, in next
    next_value = next(self.iterator)
  File "/opt/ss/lib/python2.7/site-packages/swift/obj/updater.py", line 274, in _iter_async_pendings
    os.unlink(update_path)
OSError: [Errno 2] No such file or directory: '/srv/node/d771/async_pending/329/9efcd9bceed4654551d722daea539329-1588818164.40581'

The renamer failure may or may not be worth logging a traceback for, but the ENOENT from os.unlink aborts the sweep of the *entire disk*, and we then have to wait a full updater cycle before retrying any of the remaining pendings. It'd be much better to ignore the error and move on.
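
A minimal sketch of the "ignore the error and move on" behavior, for illustration only. This is not the actual patch to swift/obj/updater.py; the helper names unlink_ignoring_missing and sweep_stale are hypothetical. The idea is to swallow only ENOENT (someone else already removed the file) and keep iterating, while still raising any other OSError:

```python
import errno
import os
import tempfile


def unlink_ignoring_missing(path):
    """Remove path. Return True if removed, False if it was already gone.

    Hypothetical helper: only ENOENT is tolerated; other errors propagate.
    """
    try:
        os.unlink(path)
    except OSError as err:
        if err.errno != errno.ENOENT:
            raise
        return False
    return True


def sweep_stale(paths):
    """Try to unlink every path; a missing file does not stop the sweep."""
    removed = 0
    for path in paths:
        if unlink_ignoring_missing(path):
            removed += 1
    return removed


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    present = os.path.join(tmpdir, "pending-a")
    missing = os.path.join(tmpdir, "pending-b")  # never created
    with open(present, "w"):
        pass
    # The missing file is listed first, yet the sweep still reaches the next one.
    print(sweep_stale([missing, present]))
    os.rmdir(tmpdir)
```

With a check like this, one already-removed pending file costs a single skipped entry instead of the rest of the disk's sweep.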

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/726738
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=f57d4cfa71888c887e0e8e0ce349f2a5befb57a5
Submitter: Zuul
Branch: master

commit f57d4cfa71888c887e0e8e0ce349f2a5befb57a5
Author: Tim Burke <email address hidden>
Date: Mon May 11 00:09:49 2020 -0700

    object-updater: Ignore ENOENT when trying to unlink stale pending files

    Change-Id: Iaac1fb891d70707af38c567d9cca5913b8355b7d
    Closes-Bug: #1877924

Changed in swift:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/losf)

Fix proposed to branch: feature/losf
Review: https://review.opendev.org/735381

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/losf)

Reviewed: https://review.opendev.org/735381
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=481f126e6b59689599f438e5d27f7328f5b3e813
Submitter: Zuul
Branch: feature/losf

commit 51a587ed8dd5700b558ad26d70dcb7facc0f91e4
Author: Tim Burke <email address hidden>
Date: Tue Jun 16 11:34:01 2020 -0700

    Use ensure-pip role

    Hopefully this will fix the currently-broken probe test gate?

    Depends-On: https://review.opendev.org/#/c/736070/
    Change-Id: Ib652534b35236fdb6bcab131c7dc08a079bf72f6

commit 79811df34c84b416ce9f445926b31a23a32ea1a4
Author: Tim Burke <email address hidden>
Date: Fri Apr 10 22:02:57 2020 -0700

    Use ini_file to update timeout instead of crudini

    crudini seems to have trouble on py3 -- still not sure *why* it's using
    py3 for the losf job, though...

    Change-Id: Id98055994c8d59e561372417c9eb4aec969afc6a

commit e4586fdcde5267f39056bb1b5f413a411bb8e7a0
Author: Tim Burke <email address hidden>
Date: Tue Jun 9 10:50:07 2020 -0700

    memcached: Plumb logger into MemcacheRing

    This way proxies log memcached errors in the normal way instead of
    to the root logger (which eventually gets them out on STDERR).

    If no logger is provided, fall back to the root logger behavior.

    Change-Id: I2f7b3e7d5b976fab07c9a2d0a9b8c0bd9a840dfd

commit 1dfa41dada30c139129cb2771b0d68c95fd84e32
Author: Tim Burke <email address hidden>
Date: Tue Apr 28 10:45:27 2020 -0700

    swift-get-nodes: Allow users to specify either quoted or unquoted paths

    Now that we can have null bytes in Swift paths, we need a way for
    operators to be able to locate such containers and objects. Our usual
    trick of making sure the name is properly quoted for the shell won't
    suffice; running something like

       swift-get-nodes /etc/swift/container.ring.gz $'AUTH_test/\0versions\0container'

    has the path get cut off after "AUTH_test/" because of how argv works.

    So, add a new option, --quoted, to let operators indicate that they
    already quoted the path.

    Drive-bys:

      * If account, container, or object are explicitly blank, treat them
        as though they were not provided. This provides better errors when
        account is explicitly blank, for example.
      * If account, container, or object are not provided or explicitly
        blank, skip printing them. This resolves abiguities about things
        like objects whose name is actually "None".
      * When displaying account, container, and object, quote them (since
        they may contain newlines or other control characters).

    Change-Id: I3d10e121b403de7533cc3671604bcbdecb02c795
    Related-Change: If912f71d8b0d03369680374e8233da85d8d38f85
    Closes-Bug: #1875734
    Closes-Bug: #1875735
    Closes-Bug: #1875736
    Related-Bug: #1791302

commit 1b6c8f7fdf630458affe2778fc7be86df3ef1674
Author: Tim Burke <email address hidden>
Date: Fri Jun 5 16:36:32 2020 -0700

    Remove etag-quoter from 2.25.0 release notes

    This was released in 2.24.0, which already has a release note for it.

    Change-Id: I9837df281ec8baa19e8e4a7976f415e8add4a2da

commi...

tags: added: in-feature-losf