Errors during initial indexing should cause fatal error

Bug #1557003 reported by Steve McLellan
This bug affects 1 person
Affects               | Status       | Importance | Assigned to | Milestone
OpenStack Searchlight | Fix Released | High       | Rick Aulino |

Bug Description

If a plugin fails to index correctly, the error is essentially ignored and manage.py moves on to the next one. I can't remember whether this was a conscious design decision, but it seems like the wrong one, especially once zero-downtime reindexing is implemented. A failure should cause the listener alias to be reset to the old index, the new index to be left in place for debugging (along with a message saying that it exists), and the old index to be kept.

In a setup with multiple indices, any indexing operations that succeeded should then be left alone; a message can tell the user what failed and that the successful indices do not need to be reindexed.
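A minimal sketch of the failure policy proposed above, in Python since that is what Searchlight is written in. It does not talk to Elasticsearch: aliases are modelled as a hypothetical in-memory dict from alias name to a set of index names, the `<group>-listener`/`<group>-search` alias naming is an assumption, and `index_plugin`, `reindex`, `reindex_all` and `ReindexError` are illustrative names, not Searchlight's code.

```python
class ReindexError(Exception):
    """A plugin failed; the new index is left in place for debugging."""

    def __init__(self, group, plugin, new_index):
        super().__init__(
            f"{group}: plugin {plugin!r} failed; "
            f"new index {new_index!r} was kept for debugging")
        self.group = group
        self.plugin = plugin
        self.new_index = new_index


def reindex(aliases, group, old_index, new_index, plugins, index_plugin):
    """Reindex one group; on failure point the listener alias back at old_index."""
    listener, search = f"{group}-listener", f"{group}-search"
    # During a zero-downtime reindex the listeners write to both indices.
    aliases[listener] = {old_index, new_index}
    for plugin in plugins:
        try:
            index_plugin(plugin, new_index)
        except Exception as exc:
            # Fatal: do not move on to the next plugin.
            aliases[listener] = {old_index}
            # The search alias was never moved, so it still serves old_index,
            # and new_index is deliberately not deleted.
            raise ReindexError(group, plugin, new_index) from exc
    # Every plugin succeeded: move both aliases to the new index.
    aliases[listener] = {new_index}
    aliases[search] = {new_index}


def reindex_all(aliases, jobs, index_plugin):
    """Reindex several groups, stopping at the first failure.

    Groups that already succeeded are left alone, and the error message says
    they do not need to be reindexed again.
    """
    done = []
    for group, old_index, new_index, plugins in jobs:
        try:
            reindex(aliases, group, old_index, new_index, plugins, index_plugin)
        except ReindexError as err:
            raise RuntimeError(
                f"{err}. Already reindexed (no action needed): {done}") from err
        done.append(group)
    return done
```

Raising instead of logging and continuing is what makes the error fatal, as the bug title asks. The sketch does not delete the orphaned new index; that cleanup question is what the following comments are about.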

Changed in searchlight:
milestone: none → mitaka-rc1
importance: Undecided → High
Travis Tripp (travis-tripp) wrote :

This is actually particularly difficult. I do agree we should stop, but my primary concern is that the zero downtime indexing code is just going to leave around a bunch of bad indexes and the listeners will keep populating bad indexes. Soon we'll have dozens of indexes that aren't removed.

Travis Tripp (travis-tripp) wrote :

^ just a note saying we have to handle cleanup better.

Travis Tripp (travis-tripp) wrote :

From: https://review.openstack.org/#/c/277860/12/searchlight/cmd/manage.py

If there are any errors anywhere in this processing, there is no cleanup logic. The indexes will continue to exist and remain part of the alias, meaning the listeners will keep populating outdated indexes until somebody happens to notice.
Just in my testing, stopping and starting at breakpoints a few times, I ended up with multiple dead indices.

I guess this can be handled in the following bug (but this will require discussion):

https://bugs.launchpad.net/searchlight/+bug/1557003

E.g. GET searchlight-listener/_status
 {
   "_shards": {
      "total": 30,
      "successful": 15,
      "failed": 0
   },
   "indices": {
      "searchlight-2016_03_14_18_14_12": {
      },
      "searchlight-2016_03_14_16_39_01": {
      },
      "searchlight-2016_03_14_18_12_06": {
      }
   }
 }
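One way to spot the dead indices described above is to compare the indices behind the listener alias with the ones the search alias actually serves. This sketch parses a body shaped like the `_status` response shown (only the `indices` key is used) and assumes the caller already knows the search alias targets; `stale_listener_indices` is a hypothetical helper that only reports indices and never deletes them.

```python
import json


def stale_listener_indices(status_body, search_targets):
    """Return indices behind the listener alias that the search alias does not serve.

    `status_body` is the JSON text of a response with an "indices" object keyed
    by index name. Names like searchlight-2016_03_14_18_14_12 embed a
    zero-padded timestamp, so plain string sorting lists them oldest first.
    """
    indices = json.loads(status_body)["indices"]
    return sorted(name for name in indices if name not in search_targets)
```

An index that is still being built by a reindex in progress would also be reported, so a real cleanup step would need to exclude it.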

Changed in searchlight:
assignee: nobody → Rick Aulino (rick-aulino)
Changed in searchlight:
status: New → In Progress
Changed in searchlight:
milestone: mitaka-rc1 → mitaka-rc2
OpenStack Infra (hudson-openstack) wrote : Fix merged to searchlight (master)

Reviewed: https://review.openstack.org/293079
Committed: https://git.openstack.org/cgit/openstack/searchlight/commit/?id=bc9eeaa9bac867eb28c07077fe7ee4cc4af9db5f
Submitter: Jenkins
Branch: master

commit bc9eeaa9bac867eb28c07077fe7ee4cc4af9db5f
Author: Rick Aulino <email address hidden>
Date: Tue Mar 15 12:01:54 2016 -0600

    Zero Downtime Re-indexing Error Handling.

    See spec:
    http://specs.openstack.org/openstack/searchlight-specs/specs/mitaka/zero-downtime-reindexing.html

    The Zero Downtime code is deficient in the error handling department.
    One cleanup method too few. If we get any unrecoverable exceptions
    while creating/preparing the indexes or aliases, we cannot continue.
    In this case we need to roll back and undo any changes we made.

    Change-Id: I5591d5208f75015f93b8431e73d2d7c7f5f2e167
    Closes-Bug: #1557003
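The commit describes rolling back every change made while creating or preparing indexes and aliases. A common way to do that is an undo log: register an undo action after each successful change and, if anything raises, run the registered actions in reverse order before re-raising. `Rollback` and `prepare` below are a minimal, hypothetical sketch of that pattern against in-memory state, not the code from the commit.

```python
import logging

LOG = logging.getLogger(__name__)


class Rollback:
    """Context manager that undoes recorded changes if the block raises."""

    def __init__(self):
        self._undo = []

    def record(self, undo):
        """Register a zero-argument callable that reverses the last change."""
        self._undo.append(undo)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Undo newest-first so later changes are reversed before earlier ones.
            for undo in reversed(self._undo):
                try:
                    undo()
                except Exception:
                    # Keep going so one failed undo does not block the others.
                    LOG.exception("rollback step failed")
        return False  # never swallow the original exception


def prepare(state, fail_on_alias=False):
    """Hypothetical setup step: create an index, then point an alias at it."""
    with Rollback() as rb:
        state["indexes"].add("new")
        rb.record(lambda: state["indexes"].discard("new"))
        if fail_on_alias:
            raise RuntimeError("alias update failed")
        state["aliases"]["listener"] = "new"
        rb.record(lambda: state["aliases"].pop("listener"))
```

Whether the new index should be undone or kept for debugging is a policy choice; the bug description asks for it to be kept once indexing itself has started.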

Changed in searchlight:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to searchlight (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/298466

OpenStack Infra (hudson-openstack) wrote : Fix merged to searchlight (stable/mitaka)

Reviewed: https://review.openstack.org/298466
Committed: https://git.openstack.org/cgit/openstack/searchlight/commit/?id=531447d28045e0da15a85bf942ee03d7707b3679
Submitter: Jenkins
Branch: stable/mitaka

commit 531447d28045e0da15a85bf942ee03d7707b3679
Author: Rick Aulino <email address hidden>
Date: Tue Mar 15 12:01:54 2016 -0600

    Zero Downtime Re-indexing Error Handling.

    See spec:
    http://specs.openstack.org/openstack/searchlight-specs/specs/mitaka/zero-downtime-reindexing.html

    The Zero Downtime code is deficient in the error handling department.
    One cleanup method too few. If we get any unrecoverable exceptions
    while creating/preparing the indexes or aliases, we cannot continue.
    In this case we need to roll back and undo any changes we made.

    Change-Id: I5591d5208f75015f93b8431e73d2d7c7f5f2e167
    Closes-Bug: #1557003
    (cherry picked from commit bc9eeaa9bac867eb28c07077fe7ee4cc4af9db5f)

tags: added: in-stable-mitaka
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/searchlight 0.2.0.0rc2

This issue was fixed in the openstack/searchlight 0.2.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to searchlight (master)

Fix proposed to branch: master
Review: https://review.openstack.org/302579

OpenStack Infra (hudson-openstack) wrote : Fix merged to searchlight (master)

Reviewed: https://review.openstack.org/302579
Committed: https://git.openstack.org/cgit/openstack/searchlight/commit/?id=73b8efe4addac065faa489b5d4ef5dc11ee2706b
Submitter: Jenkins
Branch: master

commit 2a44b637b2ce1202a873cd8d136cc373c3cfc7c6
Author: Lakshmi N Sampath <email address hidden>
Date: Wed Mar 30 20:37:22 2016 -0700

    Backward compatibility with designate v1 api

    Adds backward compatibility for designate v1 api

    dns.record.create, dns.record.update and dns.record.delete
    events need to be handled if a CLI user switches between
    v1 and v2 api. Currently those are ignored and indexed records
    go out of sync.

    Also switch dns.domain.xxxxx to dns.zone.xxxxx events since
    zone is v2 api.

    Change-Id: I459841423df776c5afa41255798dacc97b7d9e3e
    Closes-Bug: #1550506
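The change amounts to widening the set of notification types the DNS plugin listens for. The sketch below expresses that as a routing table; the `.create/.update/.delete` suffixes for `dns.zone.*` and the handler callables are assumptions for illustration, not Searchlight's actual handler code.

```python
# Designate v1 record events, which the commit says must now be handled.
V1_RECORD_EVENTS = ("dns.record.create", "dns.record.update", "dns.record.delete")
# Zone events replace the old dns.domain.* names (suffixes assumed).
ZONE_EVENTS = ("dns.zone.create", "dns.zone.update", "dns.zone.delete")


def build_routes(on_record, on_zone):
    """Map each notification type to its (hypothetical) handler callable."""
    routes = {event: on_record for event in V1_RECORD_EVENTS}
    routes.update({event: on_zone for event in ZONE_EVENTS})
    return routes


def dispatch(routes, event_type, payload):
    """Call the handler for event_type; return False if the type is unhandled."""
    handler = routes.get(event_type)
    if handler is None:
        return False
    handler(event_type, payload)
    return True
```

Anything missing from the table, such as the old `dns.domain.*` types, falls through and is ignored, which is consistent with how the commit says v1 record events were being dropped.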

commit 4abf9618720a6c0b2250a92e6869d49dc6aa2032
Author: Steve McLellan <email address hidden>
Date: Tue Mar 29 10:33:02 2016 -0500

    Disable oslo_config file discovery in tests

    oslo_config looks for files in some standard places
    (~/.searchlight.conf, /etc/searchlight/searchlight.conf,
    ./etc/searchlight.conf and possibly more). This means that
    unit tests were using one of these config files which could
    cause failures under some circumstances. This patch monkey
    patches the find_config_files function during tests which
    prevents this behavior.

    Similar to https://review.openstack.org/#/c/172354

    Change-Id: I2b3c98ccd8f3540d5fe8bb1e21596c665600a7f6
    Closes-Bug: #1563053
    (cherry picked from commit 9fecc29a51ac06fa3d27bdd59e5016d77df4d12c)
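The commit says it monkeypatches `find_config_files` during tests. A minimal sketch of the same idea with `unittest.mock`: patch `oslo_config.cfg.find_config_files` to return an empty list for every test. The base class name is hypothetical, and the `ImportError` fallback exists only so the sketch runs where oslo.config is not installed; it is not how Searchlight's test suite is structured.

```python
import unittest
from unittest import mock

try:
    from oslo_config import cfg
except ImportError:
    # Stand-in so the sketch runs without oslo.config installed.
    import types

    cfg = types.SimpleNamespace(
        find_config_files=lambda project=None, prog=None, extension=".conf": [
            "/etc/searchlight/searchlight.conf"])


class IsolatedConfigTestCase(unittest.TestCase):
    """Hypothetical base class: tests never pick up config files from disk."""

    def setUp(self):
        super().setUp()
        # Replace discovery with a mock returning no files; undone after each test.
        patcher = mock.patch.object(cfg, "find_config_files", return_value=[])
        patcher.start()
        self.addCleanup(patcher.stop)


class ExampleTest(IsolatedConfigTestCase):
    def test_no_config_files_discovered(self):
        self.assertEqual(cfg.find_config_files(project="searchlight"), [])
```

Using `addCleanup` rather than `tearDown` means the patch is removed even if a subclass's `setUp` fails after calling `super().setUp()`.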

commit bcc82435ef20edf6fc884b012844f112527161d8
Author: Steve McLellan <email address hidden>
Date: Thu Mar 31 13:50:24 2016 -0500

    Increase heap for elasticsearch in func tests

    We're seeing a number of outofmemory exceptions from elasticsearch.
    Increasing the java heap space to 50m to see if that helps temporarily;
    longer term we need to look at using fewer processes.

    Change-Id: I32277b0613b1ee1a25005655b108fe2b6530679b
    (cherry picked from commit 8a287fcc0d083d133ca4b4d0a75180eeb618ce68)

commit 1e80553aa46153d5a3c0e60047b3d0cc9dbcf5d9
Author: Steve McLellan <email address hidden>
Date: Tue Mar 29 14:27:52 2016 -0500

    Fix _is_multiple_alias_exception signature

    Remove erroneous 'self' from function signature
    to _is_multiple_alias_exception.

    Change-Id: I6fcdd13f360b31be66b553ff8e88011107ecd2b7
    Closes-Bug: #1563516
    (cherry picked from commit 600b691bf950108e93beafe8ae4691503e5b8108)

commit 574a095ebe897cdab935a803ae036c43de32e52f
Author: Steve McLellan <email address hidden>
Date: Mon Mar 28 11:29:54 2016 -0500

    Add missing zero-downtime indexing documentation

    Adds documentation that was not added with zero-downtime reindexing,
    describing the index model and the mechanisms by which plugins are
    indexed.

    Change-Id: Ibd21724d14f2d92a78b908581687ae6811ee3611
    Closes-Bug: #1561190
    (cherry picked from commit a73cd1d85cbf172cfc5882d92d269e1fd5401bdb)

commit bb5ff0701301d6d4b86c313a3d221e7d9c757f75
A...


Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/searchlight 1.0.0.0b1

This issue was fixed in the openstack/searchlight 1.0.0.0b1 development milestone.
