Deploying ISO triggers to an Error: DVR, NFV, Ceilometer, Sahara, Ironic

Bug #1600791 reported by Sergii Turivnyi
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Invalid
Importance: High
Assigned to: Unassigned

Bug Description

Deployment of an environment with NFV, Neutron DVR, Sahara, Ceilometer and Ironic fails in 100% of cases. The root cause may be the experimental NFV feature or the updated Puppet manifests.

Detailed bug description:
Deploying the ISO triggers an error:
========================
Error

Deployment has failed. All nodes are finished. Failed tasks: Task[ironic_upload_images/1] Stopping the deployment process!

(/Stage[main]/Openstack_tasks::Openstack_cinder::Create_cinder_types/Osnailyfacter::Openstack::Manage_cinder_types[volumes_lvm]/Cinder_type[volumes_lvm]/ensure) change from absent to present failed: Command: 'openstack ["volume type", "create", "--format", "shell", ["--property", "volume_backend_name=LVM-backend", "volumes_lvm"]]' has been running for more then 20 seconds!
========================
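
When triaging several snapshots, the failed task names can be pulled out of this message mechanically. The following is a minimal sketch, assuming the `Failed tasks: Task[<name>/<id>]` layout seen in the message above; the function name `failed_tasks` is hypothetical, and reading the part after the slash as a node identifier is an assumption, not something this report states.

```python
import re

# Matches "Task[<name>/<id>]"; the meaning of <id> (assumed: node id) is a guess.
TASK_RE = re.compile(r"Task\[([^\]/]+)/([^\]]+)\]")


def failed_tasks(message):
    """Return (task_name, id) pairs listed after 'Failed tasks:' in message."""
    _, marker, tail = message.partition("Failed tasks:")
    if not marker:
        # The message does not contain a failed-task list at all.
        return []
    return TASK_RE.findall(tail)


# The message text is copied from this report.
msg = ("Deployment has failed. All nodes are finished. "
       "Failed tasks: Task[ironic_upload_images/1] "
       "Stopping the deployment process!")
print(failed_tasks(msg))
```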

Steps to reproduce:
Task: https://mirantis.jira.com/browse/PROD-4355
1. Get ISO: #577
2. Nodes = 6
    Controller + Mongo = 3
    Compute + Cinder = 2
    Ironic = 1
    Neutron DVR
    Nova quotas
    Cinder LVM over iSCSI for volumes
    OpenStack debug logging
    Install Sahara
    Install Ceilometer and Aodh
    Install Ironic
3. Add Repo
`nfv deb http://172.18.162.63/feature-nfv-repos/ubuntu/9.0/ mos9.0 main 1200`
4. Deploy

Expected results:
Deployment finishes successfully

Actual result:
Error

Deployment has failed. All nodes are finished. Failed tasks: Task[ironic_upload_images/1] Stopping the deployment process!

Reproducibility:
100%

Workaround:
--

Impact:
--

Description of the environment:
see attachments

Additional information:
[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 577
cat /etc/fuel_build_number:
 577
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 fuelmenu-9.0.0-1.mos274.noarch
 fuel-notify-9.0.0-1.mos8460.noarch
 fuel-ostf-9.0.0-1.mos936.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8743.noarch
 fuel-mirror-9.0.0-1.mos141.noarch
 fuel-openstack-metadata-9.0.0-1.mos8743.noarch
 rubygem-astute-9.0.0-1.mos750.noarch
 fuel-misc-9.0.0-1.mos8460.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-utils-9.0.0-1.mos8460.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 nailgun-mcagents-9.0.0-1.mos750.noarch
 fuel-library9.0-9.0.0-1.mos8460.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 fuel-migrate-9.0.0-1.mos8460.noarch
 python-packetary-9.0.0-1.mos141.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-nailgun-9.0.0-1.mos8743.noarch

Related bug:
https://bugs.launchpad.net/mos/+bug/1592019/
Snapshot: http://172.18.198.49:8231/fuel-snapshot-2016-07-08_10-58-46.tar.gz

Changed in mos:
assignee: nobody → MOS Puppet Team (mos-puppet)
status: New → Confirmed
Dmitry Klenov (dklenov)
tags: added: area-library
Denis Egorenko (degorenko) wrote :

Cannot reproduce on https://product-ci.infra.mirantis.net/job/9.0-mos.all/597/

The deployment was successful.

Changed in mos:
status: Confirmed → Incomplete
Denis Egorenko (degorenko) wrote :

I was able to reproduce this bug on a very slow environment, where an ordinary Keystone operation takes more than 30 seconds:

root@node-2:~# time openstack endpoint list
+----------------------------------+-----------+----------------+-----------------+
| ID                               | Region    | Service Name   | Service Type    |
+----------------------------------+-----------+----------------+-----------------+
| 181ca598b76c4fa1a8cc010413a57f31 | RegionOne | neutron        | network         |
  ...
| 06acaa28cc914a5ab9a7fa6393363e9c | RegionOne | keystone       | identity        |
+----------------------------------+-----------+----------------+-----------------+

real 0m33.146s
user 0m0.791s
sys 0m0.227s

The VM has a single core, which is loaded at 100%.
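
The timing check above can be scripted to tell quickly whether an environment is too slow for the deployment timeouts. This is a minimal sketch, assuming Python 3 is available on the node; the helper name `run_timed` is hypothetical, and the 20-second limit is taken from the Puppet error message in the bug description rather than read from any configuration.

```python
import subprocess
import sys
import time


def run_timed(cmd, limit_seconds):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exceeded_limit)."""
    start = time.monotonic()
    # Output is discarded; only the wall-clock duration matters here.
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    elapsed = time.monotonic() - start
    return elapsed, elapsed > limit_seconds


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed limit: 20 seconds, as quoted in the error message of this report.
    elapsed, exceeded = run_timed(sys.argv[1:], 20.0)
    print(f"{elapsed:.1f}s", "over limit" if exceeded else "within limit")
```

Saved under a name of your choice and invoked with `openstack endpoint list` as its arguments, it would report the 33-second run shown above as over the limit.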

It also doesn't make sense to raise the timeouts again: they are already reasonable,
and we can't raise them every time for each slow environment.

Changed in mos:
status: Incomplete → Invalid
Curtis Hovey (sinzui)
Changed in mos:
assignee: Registry Administrators (registry) → nobody