Reconstructor remaining time is incorrect because the total job count increases continually

Bug #1468298 reported by Charles Hsu
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Low
Assigned to: Charles Hsu

Bug Description

There are a lot of reconstruction jobs that need to be processed, but the total job count increases continually, which makes the reported remaining time meaningless. The user cannot tell how long the reconstruction process will take to finish.

It seems the reconstructor counts the jobs while it is checking each partition:
https://github.com/openstack/swift/blob/0009a43eb45fdee6716d1272b346cfc76d946e4b/swift/obj/reconstructor.py#L866-L884
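To illustrate the suspected cause, here is a minimal sketch, not the reconstructor's actual code. It assumes a worker that only learns about a partition's jobs when it reaches that partition, so the reported total always runs one partition ahead of the completed count. The names `eta_seconds` and `simulate` and all numbers are hypothetical.

```python
def eta_seconds(done, total, elapsed):
    """Seconds remaining, extrapolated from average throughput so far."""
    if done <= 0 or elapsed <= 0:
        return None
    return (total - done) * elapsed / done


def simulate(parts, jobs_per_part, secs_per_job, lazy):
    """Return [(done, total, eta)] as reported after each partition.

    lazy=True:  total only covers partitions seen so far plus the next
                one (an assumed model of counting jobs while walking).
    lazy=False: total is known before any work starts.
    """
    full_total = parts * jobs_per_part
    report = []
    done = 0
    elapsed = 0.0
    for visited in range(1, parts + 1):
        done += jobs_per_part
        elapsed += jobs_per_part * secs_per_job
        if lazy:
            total = min(visited + 1, parts) * jobs_per_part
        else:
            total = full_total
        report.append((done, total, eta_seconds(done, total, elapsed)))
    return report


if __name__ == "__main__":
    for lazy in (True, False):
        print("lazy total" if lazy else "fixed total")
        for done, total, eta in simulate(5, 2, 10, lazy):
            print("  %d/%d jobs, %.0fs remaining" % (done, total, eta))
```

With the lazily counted total the estimate stays flat until the very last partition, while with a total fixed up front it counts down. That flat estimate is the same pattern as in the log below.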

Jun 24 09:48:32 localhost.localdomain object-reconstructor: 255/278 (91.73%) partitions reconstructed in 2826.57s (0.09/sec, 4m remaining)
Jun 24 09:48:32 localhost.localdomain object-reconstructor: Partition times: max 2638.2300s, min 0.0126s, med 92.2764s
Jun 24 09:49:03 localhost.localdomain object-reconstructor: 256/280 (91.43%) partitions reconstructed in 2857.90s (0.09/sec, 4m remaining)
Jun 24 09:49:03 localhost.localdomain object-reconstructor: Partition times: max 2638.2300s, min 0.0126s, med 92.2764s
Jun 24 09:49:34 localhost.localdomain object-reconstructor: 256/282 (90.78%) partitions reconstructed in 2888.71s (0.09/sec, 4m remaining)
Jun 24 09:49:34 localhost.localdomain object-reconstructor: Partition times: max 2638.2300s, min 0.0126s, med 92.2764s
Jun 24 09:50:10 localhost.localdomain object-reconstructor: 260/283 (91.87%) partitions reconstructed in 2924.78s (0.09/sec, 4m remaining)
Jun 24 09:50:10 localhost.localdomain object-reconstructor: Partition times: max 2638.2300s, min 0.0126s, med 92.2764s
Jun 24 09:50:10 localhost.localdomain object-reconstructor: Removing partition: /srv/node/d40/objects-3/2921
Jun 24 09:50:54 localhost.localdomain object-reconstructor: 262/285 (91.93%) partitions reconstructed in 2968.50s (0.09/sec, 4m remaining)
Jun 24 09:50:54 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 92.2860s
Jun 24 09:51:43 localhost.localdomain object-reconstructor: 265/289 (91.70%) partitions reconstructed in 3018.14s (0.09/sec, 4m remaining)
Jun 24 09:51:43 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 92.2860s
Jun 24 09:52:14 localhost.localdomain object-reconstructor: 268/295 (90.85%) partitions reconstructed in 3049.26s (0.09/sec, 5m remaining)
Jun 24 09:52:14 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 94.9995s
Jun 24 09:52:46 localhost.localdomain object-reconstructor: 269/295 (91.19%) partitions reconstructed in 3081.00s (0.09/sec, 4m remaining)
Jun 24 09:52:46 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 94.9995s
Jun 24 09:53:17 localhost.localdomain object-reconstructor: 272/295 (92.20%) partitions reconstructed in 3111.83s (0.09/sec, 4m remaining)
Jun 24 09:53:17 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 94.9995s
Jun 24 09:53:17 localhost.localdomain object-reconstructor: Removing partition: /srv/node/d40/objects-3/93
Jun 24 09:53:47 localhost.localdomain object-reconstructor: 274/299 (91.64%) partitions reconstructed in 3142.11s (0.09/sec, 4m remaining)
Jun 24 09:53:47 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 95.2591s
Jun 24 09:54:20 localhost.localdomain object-reconstructor: 276/302 (91.39%) partitions reconstructed in 3175.10s (0.09/sec, 4m remaining)
Jun 24 09:54:20 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 95.2591s
Jun 24 09:54:52 localhost.localdomain object-reconstructor: 278/304 (91.45%) partitions reconstructed in 3207.38s (0.09/sec, 4m remaining)
Jun 24 09:54:52 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 94.9995s
Jun 24 09:55:36 localhost.localdomain object-reconstructor: 284/308 (92.21%) partitions reconstructed in 3250.66s (0.09/sec, 4m remaining)
Jun 24 09:55:36 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 95.2591s
Jun 24 09:56:06 localhost.localdomain object-reconstructor: 285/310 (91.94%) partitions reconstructed in 3281.01s (0.09/sec, 4m remaining)
Jun 24 09:56:06 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 95.2591s
Jun 24 09:56:12 localhost.localdomain object-reconstructor: Removing partition: /srv/node/d40/objects-3/3741
Jun 24 09:57:31 localhost.localdomain object-reconstructor: 288/314 (91.72%) partitions reconstructed in 3366.22s (0.09/sec, 5m remaining)
Jun 24 09:57:31 localhost.localdomain object-reconstructor: Partition times: max 2741.8954s, min 0.0126s, med 95.2591s
Jun 24 09:58:07 localhost.localdomain object-reconstructor: 290/316 (91.77%) partitions reconstructed in 3402.30s (0.09/sec, 5m remaining)
Jun 24 09:58:07 localhost.localdomain object-reconstructor: Partition times: max 2920.7774s, min 0.0126s, med 99.2620s
Jun 24 09:58:40 localhost.localdomain object-reconstructor: 297/323 (91.95%) partitions reconstructed in 3434.70s (0.09/sec, 5m remaining)
Jun 24 09:58:40 localhost.localdomain object-reconstructor: Partition times: max 2920.7774s, min 0.0126s, med 101.9376s
Jun 24 09:59:10 localhost.localdomain object-reconstructor: 300/324 (92.59%) partitions reconstructed in 3464.94s (0.09/sec, 4m remaining)
Jun 24 09:59:10 localhost.localdomain object-reconstructor: Partition times: max 2920.7774s, min 0.0126s, med 109.4912s
Jun 24 09:59:40 localhost.localdomain object-reconstructor: 305/331 (92.15%) partitions reconstructed in 3495.04s (0.09/sec, 4m remaining)
Jun 24 09:59:40 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 110.4927s
Jun 24 10:00:22 localhost.localdomain object-reconstructor: 313/340 (92.06%) partitions reconstructed in 3537.11s (0.09/sec, 5m remaining)
Jun 24 10:00:22 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 102.8712s
Jun 24 10:01:06 localhost.localdomain object-reconstructor: 313/340 (92.06%) partitions reconstructed in 3580.96s (0.09/sec, 5m remaining)
Jun 24 10:01:06 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 102.8712s
Jun 24 10:01:36 localhost.localdomain object-reconstructor: 315/343 (91.84%) partitions reconstructed in 3611.08s (0.09/sec, 5m remaining)
Jun 24 10:01:36 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 109.4912s
Jun 24 10:02:06 localhost.localdomain object-reconstructor: 318/344 (92.44%) partitions reconstructed in 3641.09s (0.09/sec, 4m remaining)
Jun 24 10:02:06 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 111.0776s
Jun 24 10:02:40 localhost.localdomain object-reconstructor: 321/348 (92.24%) partitions reconstructed in 3675.14s (0.09/sec, 5m remaining)
Jun 24 10:02:40 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 111.0776s
Jun 24 10:03:14 localhost.localdomain object-reconstructor: 323/350 (92.29%) partitions reconstructed in 3708.94s (0.09/sec, 5m remaining)
Jun 24 10:03:14 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 111.0776s
Jun 24 10:03:57 localhost.localdomain object-reconstructor: 329/352 (93.47%) partitions reconstructed in 3751.59s (0.09/sec, 4m remaining)
Jun 24 10:03:57 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 112.1430s
Jun 24 10:04:30 localhost.localdomain object-reconstructor: 330/356 (92.70%) partitions reconstructed in 3784.79s (0.09/sec, 4m remaining)
Jun 24 10:04:30 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 112.1430s
Jun 24 10:04:49 localhost.localdomain object-reconstructor: Removing partition: /srv/node/d40/objects-3/4049
Jun 24 10:05:04 localhost.localdomain object-reconstructor: 334/358 (93.30%) partitions reconstructed in 3819.12s (0.09/sec, 4m remaining)
Jun 24 10:05:04 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 112.3085s
Jun 24 10:05:35 localhost.localdomain object-reconstructor: 334/360 (92.78%) partitions reconstructed in 3850.00s (0.09/sec, 4m remaining)
Jun 24 10:05:35 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 112.3085s
Jun 24 10:06:05 localhost.localdomain object-reconstructor: 335/361 (92.80%) partitions reconstructed in 3880.02s (0.09/sec, 5m remaining)
Jun 24 10:06:05 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 112.3085s
Jun 24 10:06:36 localhost.localdomain object-reconstructor: 339/367 (92.37%) partitions reconstructed in 3910.78s (0.09/sec, 5m remaining)
Jun 24 10:06:36 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 112.3085s
Jun 24 10:07:06 localhost.localdomain object-reconstructor: 342/368 (92.93%) partitions reconstructed in 3940.80s (0.09/sec, 4m remaining)
Jun 24 10:07:06 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 114.2424s
Jun 24 10:07:36 localhost.localdomain object-reconstructor: 345/371 (92.99%) partitions reconstructed in 3970.86s (0.09/sec, 4m remaining)
Jun 24 10:07:36 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 117.7179s
Jun 24 10:08:06 localhost.localdomain object-reconstructor: 348/374 (93.05%) partitions reconstructed in 4000.88s (0.09/sec, 4m remaining)
Jun 24 10:08:06 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 118.5659s
Jun 24 10:08:36 localhost.localdomain object-reconstructor: 349/376 (92.82%) partitions reconstructed in 4030.95s (0.09/sec, 5m remaining)
Jun 24 10:08:36 localhost.localdomain object-reconstructor: Partition times: max 3398.9383s, min 0.0126s, med 118.5659s
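
As a cross-check of the numbers above, the reported rate and remaining time look consistent with a plain extrapolation from average throughput. The sketch below assumes that formula (the helper `eta_seconds` is hypothetical, not Swift's own function) and applies it to the first and last stats lines in the log.

```python
def eta_seconds(done, total, elapsed):
    """Seconds remaining, extrapolated from average throughput so far."""
    return (total - done) * elapsed / done


# (done, total, elapsed seconds) copied from the 09:48:32 and 10:08:36 lines.
first = (255, 278, 2826.57)
last = (349, 376, 4030.95)

eta_first = eta_seconds(*first)
eta_last = eta_seconds(*last)
wall_clock = last[2] - first[2]

print("rate at 09:48:32: %.2f/sec" % (first[0] / first[2]))
print("estimate at 09:48:32: %.0fs" % eta_first)
print("estimate at 10:08:36: %.0fs" % eta_last)
print("time elapsed between the two lines: %.0fs" % wall_clock)
print("total grew by %d, done grew by %d" % (last[1] - first[1],
                                             last[0] - first[0]))
```

Under that assumption the estimate is about 4 minutes at the first line and about 5 minutes at the last one, although roughly 20 minutes passed in between, because the total grew about as fast as the completed count.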

tags: added: ec
clayg (clay-gerrard)
Changed in swift:
status: New → Confirmed
importance: Undecided → Low
clayg (clay-gerrard) wrote :
Changed in swift:
assignee: nobody → Charles Hsu (charles0126)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/195275
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=39b6ef6e4fd515d81e688ef365d26d5e0499be7c
Submitter: Jenkins
Branch: master

commit 39b6ef6e4fd515d81e688ef365d26d5e0499be7c
Author: Charles Hsu <email address hidden>
Date: Thu Jun 25 02:06:54 2015 +0800

    Fix reconstructor stats mssage.

    Calculate reconstruction job count and remaining time that
    would be inappropriate for user. Use real partition count would
    be suitable for user.

    Change-Id: I6b025854baf4757dddf9d7fe7bc2cece58a49157
    Closes-Bug: #1468298

Changed in swift:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/crypto)

Fix proposed to branch: feature/crypto
Review: https://review.openstack.org/208513

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/crypto)

Reviewed: https://review.openstack.org/208513
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=942c9bb45c6b8124bcaa407eb9b9ac7f0589c743
Submitter: Jenkins
Branch: feature/crypto

commit 207dd9b49d7d53a9faa4849af2c40bb875416fce
Author: Darrell Bishop <email address hidden>
Date: Thu Jul 30 14:32:08 2015 -0700

    Fix regression in WSGI server SIGHUP behavior

    The SIGHUP receipt used to pop us out of an os.wait() where now, it's in
    a "green" wait() and Timeout() combo, some part of which eats the signal
    receipt. This causes the while loop condition to never get checked and
    SIGHUP no longer works as a server reload command.

    The fix is to loop at least every 0.5 seconds, as a trade-off between
    not busy-waiting and checking the "keep running" condition often enough
    to feel responsive.

    Change-Id: I95283b8b7cfc2998ab5813e0ad3ca1fa231696c8
    Closes-Bug: #1479972

commit bcd00d9461603db1477c5f1e9f8dd6405a319eb9
Author: Alistair Coles <email address hidden>
Date: Mon Jun 8 19:40:56 2015 +0100

    Refactor diskfile

    This patch mostly eliminates the duplicate code that was
    deliberately left in place during EC review to avoid major
    churn of the diskfile module prior to the kilo release.

    This focuses on obvious de-duplication and shuffling code
    between classes. It deliberately does not attempt to
    hammer out every last piece of de-duplication where that
    would introduce more complex changes - that can come later.

    Code is moved from the module level and from ECDiskFile*
    classes into new BaseDiskFile* classes.

    Concrete classes for replication and EC policy retain their
    existing names i.e. DiskFile[Manager|Writer|Reader|] and
    ECDiskFile[Manager|Writer|Reader|] respectively.

    Knock-on changes:

    - fix bug whereby get_hashes was ignoring self.reclaim_age
      and always using the default arg value.

    - replication diskfile manager now deletes a tombstone that is older
      than reclaim_age even when there is a newer .meta file.

    - replication diskfile manager will no longer raise an
      AssertionError if only a .meta file is found during
      hash_cleanup_listdir.

    - fix stale test in test_auditor.py: test_with_tombstone test
      setup was convoluted (probably dates back to when object puts
      did not clean up the object dir). Now that they do you have to
      try harder to create a dir with a tombstone and a data file.

    Change-Id: I963e0d0ae0d6569ad1de605034c529529cbb4f9a

commit 9cb7eb4a4b6cdab8a5f16b3dc800b39ab4068522
Author: Victor Stinner <email address hidden>
Date: Mon Jul 27 11:34:07 2015 +0200

    Update hacking to 0.10.0

    Replace the whitelist of flake8 checks (select) with a blacklist
    (ignore). It makes possible to disable a single check, which was not
    possible before. This new approach permits to enable new tests more
    easily and see which checks are currently disabled.

    Only new checks are disabled, this change doesn't run less checks than
    before. Currently, many checks are disabled, but following changes will
    ...

tags: added: in-feature-crypto
Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.4.0
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/hummingbird)

Fix proposed to branch: feature/hummingbird
Review: https://review.openstack.org/221410

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/hummingbird)

Reviewed: https://review.openstack.org/221410
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=eb8f1f83f1cfc63d8452bc30096fd1c145781527
Submitter: Jenkins
Branch: feature/hummingbird

commit cb683d391cb66d0f52830de16760c80fd2afedf9
Author: OpenStack Proposal Bot <email address hidden>
Date: Sat Sep 5 06:17:51 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I2d92b8e34a665fb0bb4c048cfb0c59de295dfce6

commit e4542455c8a07b7981c247df8b737816062c1655
Author: Emett Speer <email address hidden>
Date: Wed Sep 2 17:18:03 2015 -0700

    [Labs] Update links to Cloud Admin Guide

    Update links to the Cloud Admin Guide after the
    RST conversion of that book altered URLs.

    Change-Id: I899f8938498b744e62887968a65e58c00ef27f1b

commit 58fcc07523978306cd3889ada73af5d9e664cf59
Author: Christian Schwede <email address hidden>
Date: Wed Sep 2 10:52:34 2015 +0000

    Test if container_sweep is executed on unmounted devices

    This change ensures that container_sweep is not run if a device is not mounted
    and mount_check is set to True.

    Change-Id: I823083c8431d9e61fd426508033ec9188503957b

commit e02609c66a804845672413b06830b87395afef31
Author: Samuel Merritt <email address hidden>
Date: Tue Sep 1 15:19:50 2015 -0700

    Preserve traceback in swift-dispersion-report

    Commit c690bcb fixed a bug in the dispersion report, but changed this
    from a bare "raise" to "raise err", which loses the traceback. Not a
    big deal, but worth putting back IMO.

    Change-Id: Id5b72153a4b8df8e3faaf1fa3fb2040e28ba85cc

commit d06d4ad0fd2dfe69da8008e729651264522c6c06
Author: Minwoo Bae <email address hidden>
Date: Tue Sep 1 15:08:44 2015 -0500

    Included reference in swift.obj.diskfile to enumerate the string
    used for data file paths.

    Change-Id: Ie22caa678bc00dfc43fabec7efbbb9f34490f1b5

commit 615c7a204b9386e05c5bab658bfe96766ad1e680
Author: Brian Cline <email address hidden>
Date: Tue Sep 1 10:51:20 2015 -0500

    Adds useful dispersion info from changelog

    Change-Id: I1a45088fc32620b02ff9a754b02ec1eb75a59d6e

commit 3b8755098a1786c5447abf158bd686293a82977c
Author: janonymous <email address hidden>
Date: Sun Aug 2 21:29:13 2015 +0530

    Replace a / b with a // b to use integer division where needed

    Change-Id: I72c81faa62786e140b0de00e3a04934bf1b5adbd

commit 524c89b7eeff037b8a6b421888771e15f98c2da2
Author: John Dickinson <email address hidden>
Date: Fri Aug 21 13:39:41 2015 -0700

    Updated CHANGELOG, AUTHORS, and .mailmap for 2.4.0 release.

    Change-Id: Ic6301146b839c9921bb85c4f4c1e585c9ab66661

commit 05de1305a903ee4ce9c8c50fde53c552d5b90d51
Author: Clay Gerrard <email address hidden>
Date: Thu Aug 27 18:35:09 2015 -0700

    Make ssync_sender send valid chunked requests

    The connect method of ssync_sender tells the remote connection that it's
    going to send a valid HTTP chunked request, but if the remote end needs
    to respond with an error of any kind sender th...

tags: added: in-feature-hummingbird