OpenStack Searchlight

Errors during initial indexing should cause fatal error

Bug #1557003 reported by Steve McLellan on 2016-03-14

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
OpenStack Searchlight	Fix Released	High	Rick Aulino	OpenStack Searchlight mitaka-rc2 "RC2"

Bug Description

If a plugin fails to index correctly, the error is essentially ignored and manage.py moves on to the next plugin. I can't remember whether this was a conscious design decision, but it seems like the wrong one, especially once zero-downtime reindexing is implemented. A failure should cause the listener alias to be reset to the old index, the new index to be left alone for debugging (along with a message indicating that it exists), and the old index to be left untouched.

In a setup with multiple indices any successful indexing operations should then be left alone; a message can indicate to the user what failed and that it's not necessary to reindex the successful indices.
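The behaviour proposed above can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not Searchlight's actual code: `reindex_group`, `reindex_all`, `IndexingError`, the `-listener`/`-search` alias naming, and the index names are all hypothetical, and a plain dict stands in for Elasticsearch aliases.

```python
class IndexingError(Exception):
    """Raised when a plugin fails during initial indexing (hypothetical)."""


def reindex_group(aliases, group, new_index, plugins):
    """Reindex one index group into new_index.

    `aliases` maps alias name -> set of index names and is only an
    in-memory stand-in for Elasticsearch aliases (assumption).
    """
    listener = group + "-listener"
    search = group + "-search"
    # During reindexing, listeners write to both the old and new index.
    aliases[listener].add(new_index)
    try:
        for plugin in plugins:
            plugin(new_index)  # each plugin indexes its own documents
    except Exception as exc:
        # Reset the listener alias to the old index only. The new index
        # is deliberately not deleted so it can be inspected.
        aliases[listener].discard(new_index)
        raise IndexingError(
            "%s: indexing failed (%s); index %s left for debugging"
            % (group, exc, new_index))
    # Success: both aliases now point at the new index only.
    aliases[listener] = {new_index}
    aliases[search] = {new_index}


def reindex_all(aliases, groups):
    """Stop at the first failure and report which groups already succeeded."""
    done = []
    for group, (new_index, plugins) in groups.items():
        try:
            reindex_group(aliases, group, new_index, plugins)
        except IndexingError as exc:
            return done, str(exc)
        done.append(group)
    return done, None
```

The returned list of completed groups is what a message to the user could be built from, so that successful indices are not reindexed again.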

Tags:

Travis Tripp (travis-tripp) on 2016-03-14

Changed in searchlight:
milestone:	none → mitaka-rc1
importance:	Undecided → High

Travis Tripp (travis-tripp) wrote on 2016-03-14:

This is actually particularly difficult. I do agree we should stop, but my primary concern is that the zero downtime indexing code is just going to leave around a bunch of bad indexes and the listeners will keep populating bad indexes. Soon we'll have dozens of indexes that aren't removed.

Travis Tripp (travis-tripp) wrote on 2016-03-14:

^ just a note saying we have to handle cleanup better.

Travis Tripp (travis-tripp) wrote on 2016-03-14:

From: https://review.openstack.org/#/c/277860/12/searchlight/cmd/manage.py

If there is an error anywhere in the processing along here, there is no cleanup logic. The indexes will continue to exist and remain part of the alias, meaning the listeners will keep populating outdated indexes until somebody happens to notice.
Just in my testing, stopping and starting at breakpoints a few times, I ended up with multiple dead indices.

I guess this can be handled in the following bug (but this will require discussion):

https://bugs.launchpad.net/searchlight/+bug/1557003

E.g. GET searchlight-listener/_status
{
   "_shards": {
      "total": 30,
      "successful": 15,
      "failed": 0
   },
   "indices": {
      "searchlight-2016_03_14_18_14_12": {
      },
      "searchlight-2016_03_14_16_39_01": {
      },
      "searchlight-2016_03_14_18_12_06": {
      }
   }
}
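
As a rough illustration of how leftover indices like these could be spotted, the sketch below orders index names by the timestamp embedded in them and returns everything except the newest. The helper name `stale_indices` is hypothetical, and treating the newest index as the live one is an assumption; a real cleanup would need to check which index the search alias actually points to.

```python
from datetime import datetime


def stale_indices(index_names, prefix="searchlight-"):
    """Return every index except the newest one, oldest first.

    Assumes names look like '<prefix>YYYY_MM_DD_HH_MM_SS', as in the
    status output above, and that the newest index is the live one.
    """
    def created(name):
        return datetime.strptime(name[len(prefix):], "%Y_%m_%d_%H_%M_%S")

    ordered = sorted(index_names, key=created)
    return ordered[:-1]
```

Called with the three index names from the status output, this would flag the 16:39:01 and 18:12:06 indices as candidates for cleanup.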

Rick Aulino (rick-aulino) on 2016-03-17

Changed in searchlight:
assignee:	nobody → Rick Aulino (rick-aulino)

OpenStack Infra (hudson-openstack) on 2016-03-17

Changed in searchlight:
status:	New → In Progress

Travis Tripp (travis-tripp) on 2016-03-22

Changed in searchlight:
milestone:	mitaka-rc1 → mitaka-rc2

OpenStack Infra (hudson-openstack) wrote on 2016-03-28: Fix merged to searchlight (master)

Reviewed: https://review.openstack.org/293079
Committed: https://git.openstack.org/cgit/openstack/searchlight/commit/?id=bc9eeaa9bac867eb28c07077fe7ee4cc4af9db5f
Submitter: Jenkins
Branch: master

commit bc9eeaa9bac867eb28c07077fe7ee4cc4af9db5f
Author: Rick Aulino <email address hidden>
Date: Tue Mar 15 12:01:54 2016 -0600

Zero Downtime Re-indexing Error Handling.

    See spec:
    http://specs.openstack.org/openstack/searchlight-specs/specs/mitaka/zero-downtime-reindexing.html

    The Zero Downtime code is deficient in the error handling department.
    One cleanup method too few. If we get any unrecoverable exceptions
    while creating/preparing the indexes or aliases, we cannot continue.
    In this case we need to roll back and undo any changes we made.

    Change-Id: I5591d5208f75015f93b8431e73d2d7c7f5f2e167
    Closes-Bug: #1557003

Changed in searchlight:
status:	In Progress → Fix Released
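
The "roll back and undo any changes we made" idea from the commit message can be illustrated with a generic undo stack. This is only a sketch of the pattern, not the code from the change itself: `prepare_indexes` and the `(do, undo)` step pairs are hypothetical names, while `contextlib.ExitStack` is the standard-library helper used to run the undo callbacks in reverse order.

```python
from contextlib import ExitStack


def prepare_indexes(steps):
    """Run (do, undo) pairs in order.

    If any `do` raises, the `undo` callbacks of the steps that already
    completed run in reverse order, and the exception propagates.
    """
    with ExitStack() as stack:
        for do, undo in steps:
            do()
            # Register the undo only after the step succeeded.
            stack.callback(undo)
        # Everything succeeded: drop the callbacks without running them.
        stack.pop_all()
```

Each step might be something like "create index" paired with "delete index", or "add alias" paired with "remove alias"; those pairings are assumptions made for illustration.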

OpenStack Infra (hudson-openstack) wrote on 2016-03-28: Fix proposed to searchlight (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/298466

OpenStack Infra (hudson-openstack) wrote on 2016-03-28: Fix merged to searchlight (stable/mitaka)

Reviewed: https://review.openstack.org/298466
Committed: https://git.openstack.org/cgit/openstack/searchlight/commit/?id=531447d28045e0da15a85bf942ee03d7707b3679
Submitter: Jenkins
Branch: stable/mitaka

commit 531447d28045e0da15a85bf942ee03d7707b3679
Author: Rick Aulino <email address hidden>
Date: Tue Mar 15 12:01:54 2016 -0600

Zero Downtime Re-indexing Error Handling.

    See spec:
    http://specs.openstack.org/openstack/searchlight-specs/specs/mitaka/zero-downtime-reindexing.html

    Change-Id: I5591d5208f75015f93b8431e73d2d7c7f5f2e167
    Closes-Bug: #1557003
    (cherry picked from commit bc9eeaa9bac867eb28c07077fe7ee4cc4af9db5f)

tags:

added: in-stable-mitaka

Doug Hellmann (doug-hellmann) wrote on 2016-03-31: Fix included in openstack/searchlight 0.2.0.0rc2

This issue was fixed in the openstack/searchlight 0.2.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote on 2016-04-07: Fix proposed to searchlight (master)

Fix proposed to branch: master
Review: https://review.openstack.org/302579

OpenStack Infra (hudson-openstack) wrote on 2016-04-07: Fix merged to searchlight (master)

Reviewed:  https://review.openstack.org/302579
Committed: https://git.openstack.org/cgit/openstack/searchlight/commit/?id=73b8efe4addac065faa489b5d4ef5dc11ee2706b
Submitter: Jenkins
Branch:    master

commit 2a44b637b2ce1202a873cd8d136cc373c3cfc7c6
Author: Lakshmi N Sampath <lakshmi.sampath@hp.com>
Date:   Wed Mar 30 20:37:22 2016 -0700

Backward compatibility with designate v1 api
    
    Adds backward compatibility for designate v1 api
    
    dns.record.create, dns.record.update and dns.record.delete
    events need to be handled if cli user switches between
    v1 and v2 api. Currently those are ignored and indexed records
    go out of sync.
    
    Also switch dns.domain.xxxxx to dns.zone.xxxxx events since
    zone is v2 api.
    
    Change-Id: I459841423df776c5afa41255798dacc97b7d9e3e
    Closes-Bug: #1550506

commit 4abf9618720a6c0b2250a92e6869d49dc6aa2032
Author: Steve McLellan <steven.j.mclellan@gmail.com>
Date:   Tue Mar 29 10:33:02 2016 -0500

Disable oslo_config file discovery in tests
    
    oslo_config looks for files in some standard places
    (~/.searchlight.conf, /etc/searchlight/searchlight.conf,
    ./etc/searchlight.conf and possibly more). This means that
    unit tests were using one of these config files which could
    cause failures under some circumstances. This patch monkey
    patches the find_config_files function during tests which
    prevents this behavior.
    
    Similar to https://review.openstack.org/#/c/172354
    
    Change-Id: I2b3c98ccd8f3540d5fe8bb1e21596c665600a7f6
    Closes-Bug: #1563053
    (cherry picked from commit 9fecc29a51ac06fa3d27bdd59e5016d77df4d12c)

commit bcc82435ef20edf6fc884b012844f112527161d8
Author: Steve McLellan <steven.j.mclellan@gmail.com>
Date:   Thu Mar 31 13:50:24 2016 -0500

Increase heap for elasticsearch in func tests
    
    We're seeing a number of outofmemory exceptions from elasticsearch.
    Increasing the java heap space to 50m to see if that helps temporarily;
    longer term we need to look at using fewer processes.
    
    Change-Id: I32277b0613b1ee1a25005655b108fe2b6530679b
    (cherry picked from commit 8a287fcc0d083d133ca4b4d0a75180eeb618ce68)

commit 1e80553aa46153d5a3c0e60047b3d0cc9dbcf5d9
Author: Steve McLellan <steven.j.mclellan@gmail.com>
Date:   Tue Mar 29 14:27:52 2016 -0500

Fix _is_multiple_alias_exception signature
    
    Remove erroneous 'self' from function signature
    to _is_multiple_alias_exception.
    
    Change-Id: I6fcdd13f360b31be66b553ff8e88011107ecd2b7
    Closes-Bug: #1563516
    (cherry picked from commit 600b691bf950108e93beafe8ae4691503e5b8108)

commit 574a095ebe897cdab935a803ae036c43de32e52f
Author: Steve McLellan <steven.j.mclellan@gmail.com>
Date:   Mon Mar 28 11:29:54 2016 -0500

Add missing zero-downtime indexing documentation
    
    Adds documentation that was not added with zero-downtime reindexing,
    describing the index model and the mechanisms by which plugins are
    indexed.
    
    Change-Id: Ibd21724d14f2d92a78b908581687ae6811ee3611
    Closes-Bug: #1561190
    (cherry picked from commit a73cd1d85cbf172cfc5882d92d269e1fd5401bdb)

commit bb5ff0701301d6d4b86c313a3d221e7d9c757f75
Author: Rick Aulino <rick.aulino@hp.com>
Date:   Mon Mar 21 15:37:22 2016 -0600

Re-indexing optimization for doc_type
    
    When the user calls the "index sync" command and specifies a
    document type (i.e. OS::Nova::Server) we will go to all plugins
    and have them re-index. Instead only the specified doc type(s)
    should re-index through their plugins. The other doc types should
    be transferred using internal ElasticSearch functionality, like
    the Helper reindex method. If no type is specified we will reindex
    through the plugins.
    
    Change-Id: Ida85002c306a52ccb6b5ec73a3dbe021bca333bc
    Closes-Bug: #1558618
    (cherry picked from commit cb2137bd23f112ff6cb64c13eb9a76f36eb3c9c8)
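
The split described in this commit message can be pictured as a small planning step. The sketch below is an assumption-laden illustration, not the code from the change: `plan_reindex` is a hypothetical helper that decides which document types would be re-indexed through their plugins and which would be copied using Elasticsearch's own reindex functionality.

```python
def plan_reindex(all_types, requested=None):
    """Split document types into (via_plugins, via_elasticsearch).

    With no requested types, everything is re-indexed through plugins.
    Otherwise only the requested types go through plugins and the rest
    are copied from the existing index.
    """
    if not requested:
        return sorted(all_types), []
    requested = set(requested)
    unknown = requested - set(all_types)
    if unknown:
        raise ValueError("unknown document types: %s" % sorted(unknown))
    via_plugins = sorted(t for t in all_types if t in requested)
    via_es = sorted(t for t in all_types if t not in requested)
    return via_plugins, via_es
```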

commit 25ca1311be0363e1650d530e1f833871143b80a2
Author: Steve McLellan <steven.j.mclellan@gmail.com>
Date:   Thu Mar 24 12:21:36 2016 -0500

Add missing mitaka release notes
    
    Add release notes for zero downtime reindexing and for role-based
    index separation.
    
    Change-Id: I77a0e26b326aa54c07830570bb9f351ee5e171a8
    Closes-Bug: #1561189
    (cherry picked from commit ba67c2cdb100819c1462f446c4736503d409ad7d)

commit 531447d28045e0da15a85bf942ee03d7707b3679
Author: Rick Aulino <rick.aulino@hp.com>
Date:   Tue Mar 15 12:01:54 2016 -0600

Zero Downtime Re-indexing Error Handling.
    
    See spec:
    http://specs.openstack.org/openstack/searchlight-specs/specs/mitaka/zero-downtime-reindexing.html
    
    The Zero Downtime code is deficient in the error handling department.
    One cleanup method too few. If we get any unrecoverable exceptions
    while creating/preparing the indexes or aliases, we cannot continue.
    In this case we need to roll back and undo any changes we made.
    
    Change-Id: I5591d5208f75015f93b8431e73d2d7c7f5f2e167
    Closes-Bug: #1557003
    (cherry picked from commit bc9eeaa9bac867eb28c07077fe7ee4cc4af9db5f)

commit b1d339b8da178e87aeda09cae6783b3b6fef504f
Author: Steve McLellan <steven.j.mclellan@gmail.com>
Date:   Tue Mar 22 11:04:13 2016 -0500

Don't index DHCP ports
    
    We don't have any reliable way to index DHCP ports so for now we won't
    index them on initial indexing; patch also disables their indexing from
    notifications for consistency. If neutron starts sending notifications
    for them both pieces of code should be removed.
    
    Change-Id: Ia783a9b59fa05b7d89f5add95797fd509ad391a2
    Closes-Bug: #1558790
    (cherry picked from commit 22e05a88a4c68ff2e3d81aadc7b1c965bfb1e80c)

commit a5ccab8a9db1105fde5f6f3005b78cd05fcf4829
Author: Thierry Carrez <thierry@openstack.org>
Date:   Wed Mar 23 11:05:46 2016 +0100

Update .gitreview for stable/mitaka
    
    Change-Id: Ib2bff9b50a7d1b8aad132ec496bde1ae71be82cc

Doug Hellmann (doug-hellmann) wrote on 2016-06-02: Fix included in openstack/searchlight 1.0.0.0b1

This issue was fixed in the openstack/searchlight 1.0.0.0b1 development milestone.
