Insufficient RPC default timeout for pre_live_migration

Bug #1243601 reported by Loganathan Parthipan
This bug affects 5 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

pre_live_migration on the destination node takes a long time if the base image of the instance being moved is large and not yet cached there. The default RPC timeout of 60s is insufficient for this, so the call fails with a timeout error and the migration is aborted. A typical use case is an instance spawned from a snapshot for the first time.

However, changing the default timeout globally is not recommended since it's difficult to model the system behaviour changes that can be caused by this. In addition, we have seen that we need timeouts of over 1200s in certain scenarios and this is obviously unsuitable for a global timeout.

It would be good to change the timeout just for the pre_live_migration RPC API and keep it configurable.
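
A minimal sketch of the kind of per-method override requested above, not nova's actual code. The `RpcTimeouts` and `FakeRpcClient` classes, the `other_method` name, and the 1200s override value are hypothetical and chosen only for illustration; the 60s default is the figure cited in this report.

```python
# Hypothetical sketch: a per-method RPC timeout override with a global fallback.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

DEFAULT_RPC_TIMEOUT = 60  # seconds; the global default cited in the report


@dataclass
class RpcTimeouts:
    """Global default plus optional per-method overrides (hypothetical)."""
    default: int = DEFAULT_RPC_TIMEOUT
    overrides: Dict[str, int] = field(default_factory=dict)

    def for_method(self, method: str) -> int:
        # Fall back to the global default when no override is configured.
        return self.overrides.get(method, self.default)


class FakeRpcClient:
    """Stand-in for a real RPC client; records the timeout each call would use."""

    def __init__(self, timeouts: RpcTimeouts) -> None:
        self.timeouts = timeouts
        self.calls: List[Tuple[str, int]] = []

    def call(self, method: str, **kwargs) -> int:
        timeout = self.timeouts.for_method(method)
        self.calls.append((method, timeout))
        return timeout


# Only pre_live_migration gets the long timeout; everything else keeps 60s.
timeouts = RpcTimeouts(overrides={"pre_live_migration": 1200})
client = FakeRpcClient(timeouts)
client.call("pre_live_migration", instance="vm-1")
client.call("other_method", instance="vm-1")
```

The point of the design is that the long timeout stays scoped to one method, so the behaviour of every other RPC call is unchanged.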

tags: added: compute
melanie witt (melwitt)
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Yaguang Tang (heut2008)
Changed in nova:
assignee: nobody → Yaguang Tang (heut2008)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/61728

Changed in nova:
status: Confirmed → In Progress
Matt Riedemann (mriedem)
Changed in nova:
status: In Progress → New
assignee: Yaguang Tang (heut2008) → nobody
melanie witt (melwitt)
Changed in nova:
status: New → Confirmed
jichenjc (jichenjc)
Changed in nova:
assignee: nobody → jichencom (jichenjc)
tags: added: live-migrate
Timofey Durakov (tdurakov) wrote :

@jichenjc, are you working on this patch?

jichenjc (jichenjc) wrote :

Removed myself, thanks~

Changed in nova:
assignee: jichenjc (jichenjc) → nobody
Paul Murray (pmurray)
tags: added: live-migration
removed: live-migrate
lvmxh (shaohef)
Changed in nova:
assignee: nobody → lvmxh (shaohef)
lvmxh (shaohef) wrote :

https://review.openstack.org/61728 was abandoned on Dec 22, 2013.

It simply adds a config option, live_migration_rpc_timeout.

Joe Gordon prefers to fix the logic so that we don't use a call that can potentially take so long to respond.

It should be async.
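
A rough sketch of what "async" could mean here, under stated assumptions and not nova's actual code: the source sends a fire-and-forget request and later receives a completion message, instead of blocking on an RPC reply. The function names, the in-process queue standing in for a message bus, and the timings are all hypothetical.

```python
# Hypothetical sketch of cast-then-notify using threads and a queue.
import queue
import threading
import time

completions = queue.Queue()  # stands in for a message bus back to the source


def pre_live_migration(instance, fetch_seconds):
    """Destination-side work: simulate a slow base-image fetch, then report."""
    time.sleep(fetch_seconds)
    completions.put((instance, "ready"))


def cast_pre_live_migration(instance, fetch_seconds):
    """Fire-and-forget: start the work and return without waiting for a reply."""
    worker = threading.Thread(
        target=pre_live_migration, args=(instance, fetch_seconds), daemon=True
    )
    worker.start()


cast_result = cast_pre_live_migration("vm-1", fetch_seconds=0.2)

# The source waits for the completion message with its own migration-level
# deadline, which is independent of any RPC reply timeout.
result = completions.get(timeout=5)
```

With this shape, the slow image fetch no longer has to fit inside an RPC reply window; the deadline becomes a property of the migration itself.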

Changed in nova:
assignee: lvmxh (shaohef) → nobody
stgleb (gstepanov)
Changed in nova:
assignee: nobody → stgleb (gstepanov)
Markus Zoeller (markus_z) (mzoeller) wrote :

Solving an inconsistency: This bug report has an assignee and it looks
like this could result in a patch. Therefore I switch the status to
"In Progress".
Dear assignee, please provide a patch in the next 2 weeks. If you stop
working on this report, please remove yourself as assignee and switch
the status back. If you need assistance, reach out on the IRC channel
#openstack-nova or use the mailing list.

Changed in nova:
status: Confirmed → In Progress
Paul Carlton (paul-carlton2) wrote :

This is still a live issue. HPE is carrying a patch for this in our product that I'd like to remove, so I will progress this issue.

Changed in nova:
assignee: stgleb (gstepanov) → Paul Carlton (paul-carlton2)
Paul Carlton (paul-carlton2) wrote :

stgleb, if you are working on this then feel free to assign the bug back to yourself.

Paul Carlton (paul-carlton2) wrote :

Cancel that; apparently we should have dropped this patch.

Changed in nova:
assignee: Paul Carlton (paul-carlton2) → nobody
Pushkar Umaranikar (pushkar-umaranikar) wrote :

Solving an inconsistency: changing the bug status from "In Progress" to "Confirmed", as it is not assigned to anyone.

Changed in nova:
status: In Progress → Confirmed
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Medium → Undecided
status: Confirmed → Expired
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 0.0.10

This issue was fixed in the openstack/python-tripleoclient 0.0.10 release.
