[integration tests] Nova live migration test TimeoutExpired: Timeout of 300 seconds expired waiting for instances to be ready

Bug #1584749 reported by Georgy Dyuldin
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Invalid
Importance: High
Assigned to: Georgy Dyuldin
Milestone: 9.0

Bug Description

Test result:

https://mirantis.testrail.com/index.php?/tests/view/6048960

Trace:

self = <mos_tests.nova.live_migration_test.TestLiveMigration object at 0x7f0050f94f10>
big_hypervisors = [<Hypervisor: 1>, <Hypervisor: 2>], block_migration = True
big_port_quota = None, with_volume = False

    @pytest.mark.testrail_id('838028', block_migration=True)
    @pytest.mark.testrail_id('838257',
                             block_migration=False,
                             with_volume=False)
    @pytest.mark.testrail_id('838231', block_migration=False, with_volume=True)
    @pytest.mark.parametrize(
        'block_migration, with_volume',
        [(True, False), (False, False), (False, True)],
        ids=['block LM w/o vol', 'true LM w/o vol', 'true LM w vol'],
        indirect=['block_migration'])
    @pytest.mark.usefixtures('unlimited_live_migrations', 'cleanup_instances',
                             'cleanup_volumes')
    def test_live_migration_max_instances_with_all_flavors(
            self, big_hypervisors, block_migration, big_port_quota,
            with_volume):
        """LM of maximum allowed amount of instances created with all available
                flavors

            Scenario:
                1. Allow unlimited concurrent live migrations
                2. Restart nova-api services on controllers and
                    nova-compute services on computes
                3. Create maximum allowed number of instances on a single
                    compute node
                4. Initiate serial block LM of previously created instances
                    to another compute node and estimate total time elapsed
                5. Check that all live-migrated instances are hosted on target host
                    and are in Active state:
                6. Send pings between pairs of VMs to check that network
                    connectivity between these hosts is still alive
                7. Initiate concurrent block LM of previously created instances
                    to another compute node and estimate total time elapsed
                8. Check that all live-migrated instances are hosted on target host
                    and are in Active state
                9. Send pings between pairs of VMs to check that network
                    connectivity between these hosts is alive
                10. Repeat pp.3-9 for every available flavor
            """
        project_id = self.os_conn.session.get_project_id()
        image = self.os_conn._get_cirros_image()

        instances_create_args = []
        if with_volume:
            max_volumes = self.os_conn.cinder.quotas.get(project_id).volumes
            for i in range(max_volumes):
                vol = common.create_volume(self.os_conn.cinder,
                                           image['id'],
                                           size=10,
                                           timeout=5,
                                           name='volume_i'.format(i))
                self.volumes.append(vol)
                instances_create_args.append(
                    dict(block_device_mapping={'vda': vol.id}))

        zone = self.os_conn.nova.availability_zones.find(zoneName="nova")
        hypervisor1, hypervisor2 = big_hypervisors
        flavors = sorted(self.os_conn.nova.flavors.list(),
                         key=lambda x: -x.ram)
        for flavor in flavors:
            # Skip small flavors
            if flavor.ram < 512:
                continue

            instances_count = min(
                self.os_conn.get_hypervisor_capacity(hypervisor1, flavor),
                self.os_conn.get_hypervisor_capacity(hypervisor2, flavor))

            instance_zone = '{}:{}'.format(zone.zoneName,
                                           hypervisor1.hypervisor_hostname)
            if with_volume:
                instances_count = min(instances_count, max_volumes)
                create_args = instances_create_args[:instances_count]
            else:
                create_args = None
            self.create_instances(instance_zone,
                                  flavor,
                                  instances_count,
>                                 create_args=create_args)

mos_tests/nova/live_migration_test.py:375:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <mos_tests.nova.live_migration_test.TestLiveMigration object at 0x7f0050f94f10>
zone = 'nova:node-2.test.domain.local', flavor = <Flavor: m1.tiny>
instances_count = 38, image_id = None, userdata = None
create_args = [{}, {}, {}, {}, {}, {}, ...]

    def create_instances(self,
                         zone,
                         flavor,
                         instances_count,
                         image_id=None,
                         userdata=None,
                         create_args=None):
        boot_marker = 'INSTANCE BOOT COMPLETED'

        logger.info('Start with flavor {0.name}, '
                    'creates {1} instances'.format(flavor, instances_count))
        if userdata is not None:
            userdata += '\necho "{marker}"'.format(marker=boot_marker)

        if create_args is not None:
            assert len(create_args) == instances_count
        else:
            create_args = [{}] * instances_count
        for i in range(instances_count):
            kwargs = create_args[i]
            instance = self.os_conn.create_server(
                name='server%02d' % i,
                image_id=image_id,
                userdata=userdata,
                flavor=flavor,
                availability_zone=zone,
                key_name=self.keypair.name,
                nics=[{'net-id': self.network['network']['id']}],
                security_groups=[self.security_group.id],
                wait_for_active=False,
                wait_for_avaliable=False,
                **kwargs)
            self.instances.append(instance)
        predicates = [lambda: self.os_conn.is_server_active(x)
                      for x in self.instances]
        common.wait(
            ALL(predicates),
            timeout_seconds=5 * 60,
            waiting_for="instances to became to ACTIVE status")

        if userdata is None:
            predicates = [lambda: self.os_conn.is_server_ssh_ready(x)
                          for x in self.instances]
        else:
            predicates = [lambda: boot_marker in x.get_console_output()
                          for x in self.instances]
        common.wait(
            ALL(predicates),
            timeout_seconds=5 * 60,
>           waiting_for="instances to be ready")
E           TimeoutExpired: Timeout of 300 seconds expired waiting for instances to be ready

mos_tests/nova/live_migration_test.py:219: TimeoutExpired
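
For readers unfamiliar with the helper that raised the error, below is a minimal, self-contained sketch of the poll-until-true pattern the trace relies on. It is an approximation for illustration only; the real common.wait and ALL helpers from mos-tests are not reproduced here, and the helper names in the sketch are assumptions.

    # Approximation of the polling pattern behind common.wait(ALL(...), ...).
    # Not the mos-tests implementation; illustrative only.
    import time


    class TimeoutExpired(Exception):
        """Raised when the condition does not become true within the time budget."""


    def all_of(predicates):
        # Combine per-instance checks into one condition, similar in spirit to ALL(predicates).
        return lambda: all(p() for p in predicates)


    def wait(predicate, timeout_seconds, sleep_seconds=5, waiting_for=""):
        deadline = time.time() + timeout_seconds
        while time.time() < deadline:
            if predicate():
                return True
            time.sleep(sleep_seconds)
        # This is the error seen in the trace: the whole batch of instances
        # (38 for m1.tiny) shares a single 300-second budget to become ready.
        raise TimeoutExpired('Timeout of {0} seconds expired waiting for {1}'
                             .format(timeout_seconds, waiting_for))

With one shared 5-minute budget covering every instance booted on the hypervisor, a slow boot or a congested compute node can exhaust the timeout even if each individual instance would eventually become ready.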

Tags: area-qa
Changed in mos:
assignee: nobody → MOS QA Team (mos-qa)
Changed in mos:
milestone: 10.0 → 9.0
Anna Babich (ababich) wrote :

Cannot reproduce it on my env. In http://cz7776.bud.mirantis.net:8080/jenkins/job/9.0_Neutron_Nova/15/testReport/mos_tests.nova.live_migration_test/TestLiveMigration/test_live_migration_max_instances_with_all_flavors_block_LM_w_o_vol___838028__/ I see that the booted instances went to ACTIVE status successfully, but there are a lot of "Instance unavailable yet: Error reading SSH protocol banner" and "Authentication failed." messages for the 'waiting_for="instances to be ready"' step. I also see that some of the instances nevertheless became ready and were migrated. It looks like an environment resources/configuration issue.
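
For context on why those messages are usually transient rather than fatal: an SSH readiness probe of this kind typically treats banner-read and authentication errors as "not ready yet" while the guest is still booting and cloud-init has not injected the key, and keeps retrying until the overall timeout. The sketch below is illustrative only and assumes a paramiko-based check roughly in the spirit of os_conn.is_server_ssh_ready; the username, key handling and printed message are assumptions, not taken from the mos-tests code.

    # Hedged sketch of a per-instance SSH readiness probe.
    # Transient paramiko errors (banner read failures, auth failures while
    # the key is not yet injected) are treated as "not ready yet".
    import socket

    import paramiko


    def is_ssh_ready(host, username="cirros", key_filename=None, timeout=10):
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            client.connect(host, username=username, key_filename=key_filename,
                           timeout=timeout, banner_timeout=timeout)
            return True
        except (paramiko.AuthenticationException,
                paramiko.SSHException,
                socket.error) as exc:
            # e.g. "Error reading SSH protocol banner" or "Authentication failed."
            print("Instance unavailable yet: {0}".format(exc))
            return False
        finally:
            client.close()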

Changed in mos:
assignee: MOS QA Team (mos-qa) → Georgy Dyuldin (g-dyuldin)
status: Confirmed → Incomplete
Dina Belova (dbelova) wrote :

More than a month in the Incomplete state, moving to Invalid. Please move it back to Confirmed if it is reproduced again.

Changed in mos:
status: Incomplete → Invalid