Swift recon does not work in kilo (quarantine check)

Bug #1453599 reported by Bjoern
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Undecided
Andy McCrae
OpenStack-Ansible
Invalid
Medium
Andy McCrae
Kilo
Invalid
Medium
Andy McCrae
Trunk
Invalid
Medium
Andy McCrae

Bug Description

swift-recon -q
===============================================================================
--> Starting reconnaissance on 3 hosts
===============================================================================
[2015-05-10 19:37:38] Checking quarantine
Traceback (most recent call last):
  File "/usr/local/bin/swift-recon", line 24, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/swift/cli/recon.py", line 1087, in main
    reconnoiter.main()
  File "/usr/local/lib/python2.7/dist-packages/swift/cli/recon.py", line 1077, in main
    self.quarantine_check(hosts)
  File "/usr/local/lib/python2.7/dist-packages/swift/cli/recon.py", line 776, in quarantine_check
    if response['policies']:
KeyError: 'policies'

Packages :
python-swiftclient==2.3.1
swift==2.3.0rc1.post1

Changed in openstack-ansible:
importance: Undecided → High
status: New → Confirmed
Changed in swift:
assignee: nobody → Andy McCrae (andrew-mccrae)
Revision history for this message
Andy McCrae (andrew-mccrae) wrote :

I can't recreate this.

:~# pip list | grep swift
python-swiftclient (2.3.1)
swift (2.3.0rc1.post23)

:~# swift-recon -q
===============================================================================
--> Starting reconnaissance on 4 hosts
===============================================================================
[2015-05-14 10:48:24] Checking quarantine
[quarantined_objects] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 4
[quarantined_accounts] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 4
[quarantined_containers] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 4
===============================================================================

On 2 separate installs, second install I had swift (2.3.0rc1.post29)

I'm not sure if the swift pip package would be the cause of the issue, it seems unlikely, can you recreate the issue, is this happening on all installs you've seen?

Revision history for this message
Christian Schwede (cschwede) wrote :

Bjoern, any chance that there was an older server process still running? The replication key is included in the recon response since ~ 6 months (commit f8fa1a92), and if you use a newer swift-recon cli with an older server process you will see this error.

I submit a patch to make this backward-compatible.

Revision history for this message
Christian Schwede (cschwede) wrote :
Changed in swift:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/187119
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=b4c1d73ad56e0705d1c5dd1fc4f1d89b09fceff1
Submitter: Jenkins
Branch: master

commit b4c1d73ad56e0705d1c5dd1fc4f1d89b09fceff1
Author: Christian Schwede <email address hidden>
Date: Mon Jun 1 06:50:33 2015 +0000

    Make swift-recon compatible for servers without storage policies

    Swift recon introduced a new key for storage policies, and the CLI expected this
    key in the server response. However, if one updates the CLI but not yet the
    server an exception will be raised, because there is no default value and no
    check if the key is included in the response.

    This change checks if the policies key is included in the response and updates
    one test to ensure backward compability.

    Closes-Bug: 1453599

    Change-Id: I7c7a90f9933bec2ab45595df9dc600a6cba65666

Changed in swift:
status: Confirmed → Fix Committed
Revision history for this message
Bjoern (bjoern-t) wrote :

This is still an issue.
Our deployment updated the swift version to :

python-swiftclient (2.3.1)
swift (2.3.0)

and it was a deployment updated from Juno to Kilo.
The commit b4c1d73ad56e0705d1c5dd1fc4f1d89b09fceff1 did not went into the kilo branch so I assume it will be continue to be an issue

Revision history for this message
Bjoern (bjoern-t) wrote :

Same error:

===============================================================================
--> Starting reconnaissance on 3 hosts
===============================================================================
[2015-06-10 13:33:29] Checking quarantine
Traceback (most recent call last):
  File "/usr/local/bin/swift-recon", line 24, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/swift/cli/recon.py", line 1087, in main
    reconnoiter.main()
  File "/usr/local/lib/python2.7/dist-packages/swift/cli/recon.py", line 1077, in main
    self.quarantine_check(hosts)
  File "/usr/local/lib/python2.7/dist-packages/swift/cli/recon.py", line 776, in quarantine_check
    if response['policies']:
KeyError: 'policies'

Revision history for this message
Andy McCrae (andrew-mccrae) wrote :

Can you try with a new version of recon.py?

As Christian noted, this shouldn't be possible unless your version of recon doesn't match up with your version of swift that is running, although the patch he added to swift means that it will have backwards compatibility.

https://github.com/openstack/swift/blob/master/swift/cli/recon.py

You can just run this in the same way, so download that to somewhere and do a python recon.py -q and see if it outputs? I'd be interested to see if that works or not.

I'm not able to recreate the bug myself at this point.

Revision history for this message
Bjoern (bjoern-t) wrote :

Yes this version of recon is working.

Revision history for this message
Andy McCrae (andrew-mccrae) wrote :

Hmm ok this is odd then.

The issue here is that it should never be possible for the version of recon.py with your swift install to be incompatible, unless you had the server running and it didnt restart after the update (which i guess is possible). E.g. the server is still running on older code, and thats why the recon isn't working, because until Christian's patch it wasn't backwards compatible.

So I'm wondering how that happened - e.g. how is the recon.py that is setup on the server (most likely /usr/local/lib/python2.7/dist-packages/swift/cli/recon.py) not working with the rest of the swift code installed, since we build the swift pip package from 1 sha it shouldn't be 2.

I assume it is possible for the update to happen and for swift services not to restart, but thats just a theory at this point. Will think about it perhaps Matt O can weigh in.

Revision history for this message
Matthew Oliver (matt-0) wrote :

Andy are you building from the Kilo branch/tag?

If so then the Christian's fix wont be there as it was deployed to master. It will be in the next swift release (which will hopefully be not too far away as swift releases more regularly then the rest of OpenStack).

Having said that, the bug should only exist if the CLI version doesn't match the recon middleware in the storage servers (unless your running the CLI that Christian has patched).

The part of swift that pulls the quarantined data is the recon middleware which should be in the pipelines on the storage servers (account, container, object). Becuase it's a middleware it runs with the servers. This code, in kilo, will default storage policies to {}, meaning so long as your CLI and servers are all running the same code, they should work. If you have rolled out the new code to the entire environment, then the CLI and recon middleware should match.. if you have rolled out and still erroring, then the server in question hasn't been restarted (as it is using the old recon middleware in memory) or missed the upgrade.

That's my 2 cents, how ever if you have found a bug, awesome! I will go fix it.

Revision history for this message
Steve Ribble (steve-ribble) wrote :

Bjoern will run point working with Devs for remediation

Revision history for this message
Kevin Carter (kevin-carter) wrote :

marked invalid as there's been nothing to the effect that this is a problem after upstream changes in swift and no confirmation from the bug reporter that this still an issue.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/hummingbird)

Fix proposed to branch: feature/hummingbird
Review: https://review.openstack.org/202227

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (feature/hummingbird)

Change abandoned by Michael Barton (<email address hidden>) on branch: feature/hummingbird
Review: https://review.openstack.org/202227
Reason: Apparently I did this wrong.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/hummingbird)

Fix proposed to branch: feature/hummingbird
Review: https://review.openstack.org/202230

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/hummingbird)
Download full text (72.8 KiB)

Reviewed: https://review.openstack.org/202230
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=f7cb1777e1b514b3345b9e516ed8f89ad1a4ae87
Submitter: Jenkins
Branch: feature/hummingbird

commit 51f806d3e3d3a1fcbc80d2f7d7ddbe5cc4c024c9
Author: John Dickinson <email address hidden>
Date: Tue Jul 14 20:49:08 2015 -0700

    remove Python 2.6 from the classifier

    Change-Id: I67233e9c7b69826242546bd6bd98c24b81070579

commit 278adf5c20101a191979ce1e4d6277e5f209149e
Author: Hisashi Osanai <email address hidden>
Date: Tue Jul 14 15:33:45 2015 +0900

    Make logic of unit tests responsive to the method names

    The two methods, test_authorize_succeeds_for_tenant_name_in_roles and
    test_authorize_succeeds_for_tenant_id_in_roles, have names that don't
    match what they are testing. tenant_name and tenant_id need to be
    switched.

    Change-Id: I7cb0a7d2b2111127fd5d6b55f2da6a3eadf2235d

commit 1cc3eff958fdd4fb07c2b74c52df7829d3125466
Author: Victor Stinner <email address hidden>
Date: Fri Jul 10 13:04:44 2015 +0200

    Fixes for mock 1.1

    The new release of mock 1.1 is more strict. It helped to find bugs in
    tests.

    Closes-Bug: #1473369
    Change-Id: Id179513c6010d827cbcbdda7692a920e29213bcb

commit ff192cfe5705324497a389aa2f22227d75dc0f8e
Author: janonymous <email address hidden>
Date: Wed Jul 8 18:38:22 2015 +0530

    Replace reduce and unichr , these are no longer available in py3

    * Replace reduce() with six.moves.reduce()
    * Replace unichr with six.unichr

    Change-Id: I2038e47e0a6522dd992fd2a4aeff981cf7750fe0

commit 4beceab4f4be99f14025815cf7ed4510ea77f460
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Jul 9 06:14:56 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I9ff1dde06be45fc7d6c441a1e1c07221f839a9a1

commit 56ee39a7e13417203c5e1816d7a3184a07f85826
Author: Matthew Oliver <email address hidden>
Date: Thu Jul 9 15:19:32 2015 +1000

    Ring builder code clean up follow up patch

    This is a simple change that cleans up a NIT from Sam's 'stop moving
    partitions unnecessarily when overload is on' patch.

    Change-Id: I9d9f1cc23e2bb625d8e158f4d3f64e10973176a1

commit 6cafd0a4c0bb8f311fc59df580b42e801214effd
Author: Oshrit Feder <email address hidden>
Date: Wed Jul 8 15:18:22 2015 +0300

    Fix Container Sync example

    Container-sync realm uses cluster_ as a prefix to specify clusters'
    names. At use, the prefix should not be included. Fixing the examples
    and sample conf to make it clearer that only the name of the cluster
    should be passed.

    Change-Id: I2e521d86faffb59e1b45d3f039987ee023c5e939

commit 125238612f58481316db68d7087252bb7729f447
Author: Janie Richling <email address hidden>
Date: Sat Jul 4 17:08:32 2015 -0500

    Add CORS unit tests to base

    In earlier versions of swift when a request was made with an
    existing origin, but without any CORS settings in the container,
    it was possible to get an u...

tags: added: in-feature-hummingbird
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/crypto)

Fix proposed to branch: feature/crypto
Review: https://review.openstack.org/205579

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/crypto)
Download full text (80.5 KiB)

Reviewed: https://review.openstack.org/205579
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=8ab46b64365b8eab80680f2562f81e8adb3032a3
Submitter: Jenkins
Branch: feature/crypto

commit 89f705e8aab144092d40a13fc4ef19ffef5f3eba
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Jul 23 06:11:27 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I94cf347564cb33977f33b1e64259bcb39a8cf809

commit a65e9db8752793ec37b594dc9eca5066171825db
Author: Christian Schwede <email address hidden>
Date: Wed Jul 22 10:43:17 2015 +0000

    Removing commented out code in test/unit/account/test_backend.py

    Noticed this while reviewing another change. Looks like the test itself already
    ensures correct functionality of the reclaim() method in AccountBroker without
    the commented code, thus removing this stale code.

    Change-Id: I6a26a7591adef9fd794ca68a4e9c493d1127f93c

commit 99d052772a9585e0befdfd292fd03aefde77180a
Author: Kota Tsuyuzaki <email address hidden>
Date: Mon Jul 13 01:12:43 2015 -0700

    Fix 499 client disconnected on COPY EC object

    Currently, a COPY request for an EC object might go to fail as 499 Client
    disconnected because of the difference between destination request content
    length and actual transferred bytes.

    That is because the conditional response status and content length for
    an EC object range GET is handled at calling the response instance on
    proxy server. Therefore the calling response instance (resp()) will change
    the conditional status from 200 (HTTP_OK) to 206 (PartialContent) and will
    change the content length for the range GET.

    In EC case, sometimes Swift needs whole stored contents to decode a segment.
    It will make 200 HTTP OK response from object-server and proxy-server
    will unfortunately set whole content length to the destination content
    length and it makes the bug 1467677.

    This patch introduces a new method "fix_conditional_response" for
    swift.common.swob.Response that calling _response_iter() and cached the
    iter in the Response instance. By calling it, Swift can set correct condtional
    response any time after setting whole content_length to the response
    instance like EC case.

    Change-Id: If85826243f955d2f03c6ad395215c73daab509b1
    Closes-Bug: #1467677

commit 62ed4f81ef80440550633eaaaa962a4f9383c2d3
Author: Timur Alperovich <email address hidden>
Date: Tue Jul 14 16:56:44 2015 -0700

    Add two functional tests for delimiter.

    The first test verifies that a delimiter will trim entries beyond the
    first matching instance of delimiter (after the given matching prefix,
    if any) and squash duplicates. So, when setting the delimiter
    to "-", given blobs "test", "test-foo" and "test-bar-baz", we expect
    only "test" (no matching delim) and "test-" (trim all characters after
    the first "-", and squash duplicates).

    The second test verifies that when a prefix is provid...

tags: added: in-feature-crypto
Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.4.0
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.