The method that pulls replication snapshot info from the master site fails due to timeouts on the RPC call. The timeout depends on the size of the data stored at the master site.

Bug #1362062 reported by Denis M.
This bug affects 1 person
Affects: OpenStack DBaaS (Trove)
Status: Triaged
Importance: Medium
Assigned to: Doug Shelley

Bug Description

This is a potential breaking point once we have a master that holds many GB of data. We should not use the RPC CALL method, because we can potentially run into an RPC timeout.

As far as I can see, we should do what we did with the backup API: use RPC CAST and poll the backup status at the taskmanager side (expecting one of two statuses, COMPLETED or FAILED) until it is ready. Another concern with RPC CALL: what if the AMQP service goes down while we are waiting for a response from the guest? How do we handle that? With the given code we do nothing, and in the end we would have a backup record that hangs in the NEW state.
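A minimal sketch of the cast-and-poll idea, not the actual Trove code. The names `wait_for_snapshot`, `get_status` and `SnapshotTimeout` are hypothetical; `get_status` stands in for whatever the taskmanager would use to read the backup record's state after the RPC CAST has been sent. The point is that the wait is bounded by a deadline chosen by the caller rather than by the RPC timeout, and a record that never leaves NEW produces an explicit error instead of hanging.

```python
import time

# Terminal states named in this report; anything else means "still running".
TERMINAL_STATES = {"COMPLETED", "FAILED"}


class SnapshotTimeout(Exception):
    """Raised when the snapshot never reaches a terminal state (hypothetical)."""


def wait_for_snapshot(get_status, timeout, poll_interval=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns COMPLETED or FAILED.

    get_status is a hypothetical callable that reads the current backup
    state. sleep and clock are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            # The record is stuck (for example in NEW): surface it.
            raise SnapshotTimeout("snapshot still %r after %s s" % (status, timeout))
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated sequence of states, as if read from the backup record.
    states = iter(["NEW", "BUILDING", "COMPLETED"])
    print(wait_for_snapshot(lambda: next(states), timeout=5, poll_interval=0))
```

Because the deadline is an argument, the caller could derive it from the size of the master's data, which is not possible with a timeout fixed at service start.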

The RPC call timeout (the higher one) is static for each launch of the taskmanager service.
So there is no way to extend the base timeout to take the size of the master site's data into account.
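To illustrate why a fixed timeout cannot fit every master, here is a rough back-of-the-envelope sketch. All numbers (data sizes, the 50 MB/s throughput, the 600 s timeout) are assumptions chosen for illustration, not measurements or Trove defaults, and the function names are hypothetical.

```python
def estimated_snapshot_seconds(data_size_gb, throughput_mb_per_s):
    """Rough time to snapshot data_size_gb at a steady throughput."""
    return data_size_gb * 1024 / throughput_mb_per_s


def exceeds_rpc_timeout(data_size_gb, throughput_mb_per_s, rpc_timeout_s):
    """True if the estimated snapshot time is longer than a fixed timeout."""
    return estimated_snapshot_seconds(data_size_gb, throughput_mb_per_s) > rpc_timeout_s


if __name__ == "__main__":
    # Assumed values: 50 MB/s throughput, 600 s static RPC timeout.
    for size_gb in (5, 100):
        seconds = estimated_snapshot_seconds(size_gb, 50)
        print(size_gb, "GB:", seconds, "s, exceeds timeout:",
              exceeds_rpc_timeout(size_gb, 50, 600))
```

Under these assumed numbers a small master finishes well within the timeout while a large one does not, and no single static value covers both without being needlessly long for the small case.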

See https://review.openstack.org/#/c/109687/11..12/trove/taskmanager/models.py,cm

Denis M. (dmakogon)
description: updated
Greg Lucas (glucas-q)
Changed in trove:
assignee: nobody → Greg Lucas (glucas-q)
Changed in trove:
milestone: ongoing → juno-rc1
Changed in trove:
milestone: juno-rc1 → ongoing
Changed in trove:
status: Confirmed → Triaged
Amrith Kumar (amrith)
Changed in trove:
assignee: Greg Lucas (glucas-q) → Doug Shelley (0-doug)