undercloud install fails (nova-db-sync timeout) on VM on an SATA disk hypervisor

Bug #1661396 reported by Alex Schultz
This bug affects 1 person
Affects                    Status         Importance   Assigned to    Milestone
OpenStack Compute (nova)   Incomplete     Undecided    Unassigned
tripleo                    Fix Released   Critical     Alex Schultz

Bug Description

2017-02-01 15:24:49,084 INFO: Error: Command exceeded timeout
2017-02-01 15:24:49,084 INFO: Error: /Stage[main]/Nova::Db::Sync/Exec[nova-db-sync]/returns: change from notrun to 0 failed: Command exceeded timeout

The nova-db-sync command exceeds the 300-second timeout when installing the undercloud on a VM that uses SATA-based storage. This seems to be related to switching innodb_file_per_table to ON, which has doubled the time the db sync takes on this class of hardware. To unblock folks doing Ocata testing, we need to skip this change in Ocata and revisit enabling it in Pike.

See Bug 1660722 for details as to why we enabled this.

Changed in tripleo:
assignee: nobody → Alex Schultz (alex-schultz)
status: New → In Progress
Alex Schultz (alex-schultz) wrote :
Mike Bayer (zzzeek) wrote :

Why aren't we opening a bug in Nova saying that their migrations are, overall, exceedingly slow?

Emilien Macchi (emilienm) wrote :

Mike: done.

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/428435
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=3f7e74ab24bb43f9ad7e24e0efd4206ac6a3dd4e
Submitter: Jenkins
Branch: master

commit 3f7e74ab24bb43f9ad7e24e0efd4206ac6a3dd4e
Author: Alex Schultz <email address hidden>
Date: Thu Feb 2 21:29:32 2017 +0000

    Revert "set innodb_file_per_table to ON for MySQL / Galera"

    This reverts commit 621ea892a299d2029348db2b56fea1338bd41c48.

    We're getting performance problems on SATA disks.

    Change-Id: I30312fd5ca3405694d57e6a4ff98b490de388b92
    Closes-Bug: #1661396
    Related-Bug: #1660722

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to instack-undercloud (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/428843

Matt Riedemann (mriedem) wrote :

Do you have a log file or something that shows us which of the migrations are taking the longest so someone could dig into this?
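One rough way to answer this from an install log is to look at the time gaps between consecutive timestamped lines. The sketch below is only an illustration, not an existing nova or tripleo tool: the function name `slowest_gaps` is hypothetical, and it assumes every relevant log line starts with a timestamp in the same format as the error lines quoted in the bug description (for example `2017-02-01 15:24:49,084`). Each gap is attributed to the earlier of the two lines, which is only meaningful if the tool logs when a step starts.

```python
from datetime import datetime

# Timestamp layout assumed from the log excerpt in the bug description.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def slowest_gaps(lines, top=5):
    """Return the `top` largest (seconds, line) gaps between timestamped lines."""
    stamped = []
    for line in lines:
        try:
            # The timestamp occupies the first 23 characters of the line.
            ts = datetime.strptime(line[:23], TS_FORMAT)
        except ValueError:
            continue  # skip continuation lines that carry no timestamp
        stamped.append((ts, line.rstrip("\n")))

    # Pair each line with its successor and measure the time between them.
    gaps = [
        ((cur_ts - prev_ts).total_seconds(), prev_line)
        for (prev_ts, prev_line), (cur_ts, _) in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]
```

Calling `slowest_gaps(open(path))` on an install log (the path is whatever log file is available) would list the lines that were followed by the longest silences, which is a starting point for spotting a slow migration rather than a precise per-migration timing.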

Changed in nova:
status: New → Incomplete
tags: added: db
OpenStack Infra (hudson-openstack) wrote : Related fix merged to instack-undercloud (master)

Reviewed: https://review.openstack.org/428843
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=578599f2eed3611b66ed96031a8e6260716b4b79
Submitter: Jenkins
Branch: master

commit 578599f2eed3611b66ed96031a8e6260716b4b79
Author: Alex Schultz <email address hidden>
Date: Fri Feb 3 11:09:34 2017 -0700

    Increase sync timeout for nova db syncs

    We have seen on lower quality hardware that the nova db syncs can take
    an excessive amount of time. In order to still support deploying on this
    hardware, let's increase the timeout from the default 300 seconds to 900
    seconds to allow for this less performant gear. It is not recommended to
    increase this past 900 as if we start hitting this then we need to be
    understanding what is occurring in these db syncs. 300 seconds should be
    enough time to setup a database especially on install. But there are
    cases for upgrades or slower disks where it can exceed 300 seconds.

    Change-Id: I77507c638237072e38d9888aff3da884aeff0b59
    Related-Bug: #1660722
    Related-Bug: #1661396
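The reasoning in this commit message (keep 300 seconds as the expected ceiling, allow up to 900 seconds before failing) can be sketched as a soft limit plus a hard limit. This is a minimal Python illustration of that idea, not the instack-undercloud or Puppet implementation; `run_with_limits`, `SOFT_LIMIT`, and `HARD_LIMIT` are hypothetical names, and only the 300 and 900 second values come from the commit message.

```python
import subprocess
import time

SOFT_LIMIT = 300  # seconds; the default timeout named in the commit message
HARD_LIMIT = 900  # seconds; the raised timeout


def run_with_limits(cmd, soft=SOFT_LIMIT, hard=HARD_LIMIT):
    """Run cmd and return (status, elapsed_seconds).

    status is 'ok' if the command finished within the soft limit, 'slow' if
    it finished but took longer than the soft limit, and 'timeout' if it was
    killed at the hard limit. A non-zero exit raises CalledProcessError.
    """
    start = time.monotonic()
    try:
        subprocess.run(cmd, check=True, timeout=hard)
    except subprocess.TimeoutExpired:
        return "timeout", time.monotonic() - start
    elapsed = time.monotonic() - start
    return ("slow" if elapsed > soft else "ok"), elapsed
```

A 'slow' result corresponds to the case the commit message warns about: the sync still succeeds, but it is a signal that the db sync itself deserves investigation rather than a further timeout increase.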

Matt Riedemann (mriedem) wrote :

I wonder if https://review.openstack.org/#/c/430390/ would indirectly help here in case it makes the base (and biggest) DB schema migration in nova run faster.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 6.2.0

This issue was fixed in the openstack/puppet-tripleo 6.2.0 release.

Sean Dague (sdague) wrote :

Automatically discovered version ocata in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.ocata