[9.2] [SWARM] RH compute tests are blocked

Bug #1640093 reported by Yury Tregubov
Affects            | Status  | Importance | Assigned to   | Milestone
-------------------+---------+------------+---------------+----------
Fuel for OpenStack | Invalid | High       | Yury Tregubov |

Bug Description

Since the 9.2 snapshot #465, several OSTF tests in deploy_rh_compute_ha_one_controller_tun have been failing.

======================================================================
FAIL: Deploy RH-based compute in HA One Controller mode
.......
AssertionError: Failed 3 OSTF tests; should fail 0 tests. Names of failed tests:
  - Create volume and attach it to instance (failure) Time limit exceeded while waiting for volume becoming 'available' to finish. Please refer to OpenStack logs for more details.
  - Check network connectivity from instance via floating IP (failure) Instance is not reachable by IP. Please refer to OpenStack logs for more details.
  - Launch instance with file injection (failure) Execution command on Instance fails with unexpected result. Please refer to OpenStack logs for more details.

The tests are run right after the RH compute is deployed.

The diagnostic snapshots are available in the job:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.rh/117/

However, the compute node seems to be operational after the deployment.

[root@nailgun ~]# fuel nodes
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---+--------+---------------------+---------+------------+-------------------+------------+---------------+--------+---------
 3 | ready | slave-03_cinder | 1 | 10.109.0.5 | 64:c5:d8:ee:48:28 | cinder | | 1 | 1
 1 | ready | slave-01_controller | 1 | 10.109.0.3 | 64:a9:81:0a:ed:e2 | controller | | 1 | 1
 2 | ready | slave-02_compute | 1 | 10.109.0.4 | 64:5b:5b:cb:8b:6d | compute | | | 1

It is even possible to log in to it via ssh:

[root@nailgun ~]# ssh 10.109.0.4
Warning: Permanently added '10.109.0.4' (ECDSA) to the list of known hosts.
Last login: Thu Jun 2 14:31:17 2016
[root@rh-1 ~]# uname -a
Linux rh-1.test.domain.local 3.10.0-327.18.2.el7.x86_64 #1 SMP Fri Apr 8 05:09:53 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

But in the Fuel UI it is shown as OFFLINE, and this ERROR is seen in its logs:
nova.compute.manager [req-d11c43d1-e704-4bdd-9ac8-4517e9e64991 - - - - -] No compute node record for host node-2.test.domain.local

Changed in fuel:
milestone: none → 9.2
importance: Undecided → High
tags: added: swarm-blocker
Changed in fuel:
assignee: nobody → MOS Maintenance (mos-maintenance)
status: New → Confirmed
Oleksiy Molchanov (omolchanov) wrote :

Linux team, could you please check? There seems to be a problem with qemu or libvirt on the RHEL node.

Changed in fuel:
assignee: MOS Maintenance (mos-maintenance) → MOS Linux (mos-linux)
tags: added: area-linux
Changed in fuel:
assignee: MOS Linux (mos-linux) → Fuel Sustaining (fuel-sustaining-team)
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → MOS Linux (mos-linux)
Roman Podoliaka (rpodolyaka) wrote :

Together with Oleksii Molchanov, we took a look at a live environment where this problem was reproduced and found out that VMs get stuck very early on boot (please see the attached screenshot). This looks like a problem with qemu. I'm not sure how to proceed here. Any advice is welcome.

Ivan Suzdal (isuzdal) wrote :

Hmm… First of all, this seems a little strange.

[root@nailgun ~]# ssh node-2 rpm -q nailgun-agent
package nailgun-agent is not installed

Is that really how it is supposed to be?

Changed in fuel:
assignee: MOS Linux (mos-linux) → Yury Tregubov (ytregubov)
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Confirmed → Incomplete
Alexander Kurenyshev (akurenyshev) wrote :

Guys, there is a new reproduction here: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.rh/126/console

If anybody wants, I could revert the env until Nov 18.

Changed in fuel:
status: Incomplete → Confirmed
assignee: Yury Tregubov (ytregubov) → MOS Linux (mos-linux)
Ivan Suzdal (isuzdal) wrote :

Long debugging story short:

Libvirt decides which 'accel' value should be passed to the qemu command line by reading the 'domain type' from the XML (if kvm => accel=kvm, if qemu => accel=tcg).

Qemu parses the command line, and if it contains 'enable-kvm', it sets the accel key value to kvm.

In Ubuntu, 'kvm' is a wrapper which appends the '-enable-kvm' key to the qemu command line. In RH it is not.
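
As an illustration of the selection rule described above, here is a simplified Python sketch. It is not libvirt's actual code: the function name accel_for_domain and the minimal domain XML strings are hypothetical, and only the two mappings mentioned in this comment (kvm => kvm, qemu => tcg) are modelled.

```python
import xml.etree.ElementTree as ET

# Mapping described in the comment: domain type -> qemu accelerator.
ACCEL_BY_DOMAIN_TYPE = {"kvm": "kvm", "qemu": "tcg"}


def accel_for_domain(domain_xml: str) -> str:
    """Return the accel value implied by the <domain type='...'> attribute."""
    root = ET.fromstring(domain_xml)
    domain_type = root.get("type")
    if domain_type not in ACCEL_BY_DOMAIN_TYPE:
        raise ValueError(f"unhandled domain type: {domain_type!r}")
    return ACCEL_BY_DOMAIN_TYPE[domain_type]


# Hypothetical minimal domain definitions, for illustration only.
print(accel_for_domain("<domain type='qemu'><name>vm1</name></domain>"))  # tcg
print(accel_for_domain("<domain type='kvm'><name>vm1</name></domain>"))   # kvm
```

With a domain type of qemu, the guest would run under TCG emulation unless something else (such as the Ubuntu 'kvm' wrapper) adds '-enable-kvm' to the command line.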

My proposal is to change virt_type to kvm in nova.conf.
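
A minimal sketch of what that change could look like, assuming virt_type lives in the [libvirt] section of nova.conf (as in recent Nova releases). The helper name set_virt_type and the sample config text are hypothetical. Note that Python's configparser drops comments and rejects duplicate keys, so this is an illustration rather than a safe way to edit a production nova.conf.

```python
import configparser
import io


def set_virt_type(nova_conf_text: str, virt_type: str = "kvm") -> str:
    """Return nova.conf text with [libvirt] virt_type set to the given value."""
    # interpolation=None keeps literal '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(nova_conf_text)
    if not parser.has_section("libvirt"):
        parser.add_section("libvirt")
    parser.set("libvirt", "virt_type", virt_type)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical nova.conf excerpt, for illustration only.
sample = "[DEFAULT]\ndebug = false\n\n[libvirt]\nvirt_type = qemu\n"
print(set_virt_type(sample))
```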

Changed in fuel:
assignee: MOS Linux (mos-linux) → Yury Tregubov (ytregubov)
Victor Ryzhenkin (vryzhenkin) wrote :

JFYI: RH or Oracle nodes can't be managed by nailgun.

Timur Nurlygayanov (tnurlygayanov) wrote :

It is a blocker for the QA team: OL-based compute nodes don't work.

It is also a blocker for verification of the fix for https://bugs.launchpad.net/fuel/+bug/1640771

tags: added: blocker-for-qa
Vladimir Jigulin (vjigulin) wrote :

The last 2 runs were not affected.

Ekaterina Shutova (eshutova) wrote :

Checked run: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.rh/150/console
The OSTF tests pass, but the test now fails at the next step, "Prepare node for Puppet run":
DevopsCalledProcessError: Command '/usr/sbin/subscription-manager attach --pool=8a85f98151631b320151648b7333720b' returned exit code 1 while expected [0]
 STDOUT:
Pool with id 8a85f98151631b320151648b7333720b could not be found

Vladimir Jigulin (vjigulin) wrote :

The "rh-based compute" feature is no longer supported in 9.x and 10.0. These tests are disabled, so this bug is no longer relevant.

Changed in fuel:
status: Confirmed → Invalid