container partition directories never removed

Bug #1396152 reported by Caleb Tennis
This bug affects 2 people
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

This was thought to have been originally fixed in https://bugs.launchpad.net/swift/+bug/768816; however, the actual partition directories never seem to be cleaned up (the suffix directories are).

The issue is that the container replicator loops over the partition directories at the start of a run, so having a large number of empty directories across many disks significantly degrades the performance of the run.

(This may also be a problem for accounts, but I'm not entirely sure.)
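
For context, a minimal sketch of the kind of partition walk the replicator does before a run (the /srv/node layout and helper name here are assumptions for illustration, not Swift's actual replicator code); every leftover empty partition directory still costs a directory listing, so thousands of them across many disks add measurable startup time:

    import os

    # Illustrative only: gather every partition directory under each device's
    # "containers" tree, the way a replicator builds its job list up front.
    def collect_container_partitions(devices_root="/srv/node"):
        partitions = []
        for device in os.listdir(devices_root):
            containers_dir = os.path.join(devices_root, device, "containers")
            if not os.path.isdir(containers_dir):
                continue
            for partition in os.listdir(containers_dir):
                # Empty partition dirs are still listed and stat'd here,
                # even though they contain no work to replicate.
                partitions.append(os.path.join(containers_dir, partition))
        return partitions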

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/138524
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=d40cebfe55a2ef63e0fdceb21548587bad497a69
Submitter: Jenkins
Branch: master

commit d40cebfe55a2ef63e0fdceb21548587bad497a69
Author: Caleb Tennis <email address hidden>
Date: Tue Dec 2 15:28:26 2014 -0500

    Clean up empty account and container partitions directories.

    Because we iterate over these directories on a replication run,
    and they are not (previously) cleaned up, the time to start the
    replication increases incrementally for each stale directory
    lying around. Thousands of directories across dozens of disks
    on a single machine can make for non-trivial startup times.

    Plus it just seems like good housekeeping.
    Closes-Bug: #1396152

    Change-Id: Iab607b03b7f011e87b799d1f9af7ab3b4ff30019
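
The idea behind the fix is opportunistic housekeeping: once a partition's suffix directories are gone, drop the now-empty partition directory as well. A hedged sketch of that idea (not the merged patch itself), using a guarded rmdir so that a racing writer recreating the partition is harmless:

    import errno
    import os

    def remove_partition_if_empty(partition_path):
        """Best-effort removal of an empty partition directory (sketch only)."""
        try:
            os.rmdir(partition_path)  # succeeds only if the directory is empty
        except OSError as err:
            # Not empty or already gone: nothing to do. Anything else is real.
            if err.errno not in (errno.ENOTEMPTY, errno.ENOENT):
                raise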

Changed in swift:
status: New → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/139255

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/139870

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/139870
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=132f8b3169cd0b5ba094736b16fbc75ccc11551e
Submitter: Jenkins
Branch: feature/ec

commit cc2f0f4ed6f12554b7d8e8cb61e14f2b103445a0
Author: Samuel Merritt <email address hidden>
Date: Thu Dec 4 18:37:24 2014 -0800

    Speed up reading and writing xattrs for object metadata

    Object metadata is stored as a pickled hash: first the data is
    pickled, then split into strings of length <= 254, then stored in a
    series of extended attributes named "user.swift.metadata",
    "user.swift.metadata1", "user.swift.metadata2", and so forth.

    The choice of length 254 is odd, undocumented, and dates back to the
    initial commit of Swift. From talking to people, I believe this was an
    attempt to fit the first xattr in the inode, thus avoiding a
    seek. However, it doesn't work. XFS _either_ stores all the xattrs
    together in the inode (local), _or_ it spills them all to blocks
    located outside the inode (extents or btree). Using short xattrs
    actually hurts us here; by splitting into more pieces, we end up with
    more names to store, thus reducing the metadata size that'll fit in
    the inode.

    [Source: http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/Extended_Attributes.html]

    I did some benchmarking of read_metadata with various xattr sizes
    against an XFS filesystem on a spinning disk, no VMs involved.

    Summary:

     name | rank | runs |      mean |        sd | timesBaseline
    ------|------|------|-----------|-----------|--------------
    32768 |    1 | 2500 | 0.0001195 |  3.75e-05 | 1.0
    16384 |    2 | 2500 | 0.0001348 | 1.869e-05 | 1.12809122912
     8192 |    3 | 2500 | 0.0001604 | 2.708e-05 | 1.34210998858
     4096 |    4 | 2500 | 0.0002326 | 0.0004816 | 1.94623473988
     2048 |    5 | 2500 | 0.0003414 | 0.0001409 | 2.85674781189
     1024 |    6 | 2500 | 0.0005457 | 0.0001741 | 4.56648611635
      254 |    7 | 2500 |  0.001848 |  0.001663 | 15.4616067887

    Here, "name" is the chunk size for the pickled metadata. A total
    metadata size of around 31.5 KiB was used, so the "32768" runs
    represent storing everything in one single xattr, while the "254" runs
    represent things as they are without this change.

    Since bigger xattr chunks make things go faster, the new chunk size is
    64 KiB. That's the biggest xattr that XFS allows.

    Reading of metadata from existing files is unaffected; the
    read_metadata() function already handles xattrs of any size.

    On non-XFS filesystems, this is no worse than what came before:

    ext4 has a limit of one block (typically 4 KiB) for all xattrs (names
    and values) taken together [1], so this change slightly increases the
    amount of Swift metadata that can be stored on ext4.

    ZFS let me store an xattr with an 8 MiB value, so that's plenty. It'll
    probably go further, but I stopped there.

    [1] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Extended_Attributes

    Change-Id: Ie22db08ac0050eda693de4c30d4bc0d...
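
For readers unfamiliar with the scheme this commit tunes, a rough sketch of chunked xattr metadata storage as described above (Linux-only os.setxattr/os.getxattr; an illustration of the approach, not Swift's actual diskfile code):

    import os
    import pickle

    XATTR_KEY = "user.swift.metadata"
    CHUNK_SIZE = 65536  # the new 64 KiB chunk size; the old value was 254

    def write_metadata(path, metadata):
        # Pickle once, then spread the bytes across numbered xattrs:
        # user.swift.metadata, user.swift.metadata1, user.swift.metadata2, ...
        blob = pickle.dumps(metadata, protocol=2)
        for offset in range(0, len(blob), CHUNK_SIZE):
            suffix = "" if offset == 0 else str(offset // CHUNK_SIZE)
            os.setxattr(path, XATTR_KEY + suffix, blob[offset:offset + CHUNK_SIZE])

    def read_metadata(path):
        # Reassemble the chunks in order until a key is missing, then unpickle.
        chunks, index = [], 0
        while True:
            suffix = "" if index == 0 else str(index)
            try:
                chunks.append(os.getxattr(path, XATTR_KEY + suffix))
            except OSError:
                break
            index += 1
        return pickle.loads(b"".join(chunks))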


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/juno)

Reviewed: https://review.openstack.org/139255
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=70e35c6084ddb62a6124cdd5ba35f29748dfd746
Submitter: Jenkins
Branch: stable/juno

commit 70e35c6084ddb62a6124cdd5ba35f29748dfd746
Author: Caleb Tennis <email address hidden>
Date: Tue Dec 2 15:28:26 2014 -0500

    Clean up empty account and container partitions directories.

    Because we iterate over these directories on a replication run,
    and they are not (previously) cleaned up, the time to start the
    replication increases incrementally for each stale directory
    lying around. Thousands of directories across dozens of disks
    on a single machine can make for non-trivial startup times.

    Plus it just seems like good housekeeping.
    Closes-Bug: #1396152

    Change-Id: Iab607b03b7f011e87b799d1f9af7ab3b4ff30019

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.2.1
status: Fix Committed → Fix Released
Revision history for this message
clayg (clay-gerrard) wrote :

This issue was finally addressed/closed when we fixed https://bugs.launchpad.net/swift/+bug/1583719
