Scaling an ironic cluster with an ironic+controller node breaks instance creation

Bug #1589389 reported by Georgy Dyuldin
This bug affects 1 person
Affects             Status                    Importance  Assigned to   Milestone
Mirantis OpenStack  Status tracked in 10.0.x
  10.0.x            Fix Committed             High        Fuel Toolbox

Bug Description

Detailed bug description:

After adding a new ironic+controller node to the cluster and deploying changes, I can't create a baremetal instance (the instance goes to ERROR status with a MessagingTimeout exception).

Steps to reproduce:

1. Deploy MOS with 1 controller, 1 compute, 2 ironic-conductor+ceph nodes
2. Add new node (with fuel-devops)
3. Wait for this node to be discovered
4. Assign ironic and controller roles to this node
5. Run network check
6. Deploy changes
7. Run network check
8. Start baremetal instance

Expected results:

All steps should pass

Actual result:

After some time, the instance created in step 8 reaches ERROR status.

Reproducibility:

Always

Description of the environment:
- MOS 9.0 ISO build (at least from build #432)
- Network model: VLAN

Additional information:

This bug doesn't appear if only the ironic role is assigned to the node in step 4.

Tags: area-ironic
Georgy Dyuldin (g-dyuldin) wrote :
Roman Podoliaka (rpodolyaka) wrote :

I checked the logs and see the following:

1) request to boot an instance indeed fails with MessagingTimeout in nova-api on node-5

2016-06-03T14:33:04.297454+00:00 err: 2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions [req-3ff12a0b-d829-4217-adb3-c870e88a3b34 71d4f3e3083643ddb2b2eb1a7dcb74b4 8fa6f6fac6d149dfb84fc021c32619b1 - - -] Unexpected exception in API method
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/openstack/extensions.py", line 478, in wrapped
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions return f(*args, **kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions return func(*args, **kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/servers.py", line 629, in create
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions **create_kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 154, in inner
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions rv = f(*args, **kwargs)
2016-06-03 14:33:04.281 9426 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/compute/api.p

The request fails because nova-api tries to make an RPC call to nova-network, which we do not actually deploy.

2) looks like nova.conf was updated, but nova-api was not restarted after that:

$ grep use_neutron ./node-5/etc/nova/nova.conf
use_neutron=True

2016-06-03T14:07:47.358934+00:00 debug: 2016-06-03 14:07:47.358 6789 DEBUG oslo_service.service [-] use_neutron = False log_opt_values /usr/lib/python2.7/dist-packages/oslo_config/cfg.py:2517

which makes nova-api think it should call nova-network, not neutron.

The workaround is to restart nova-api on all controller nodes. We'll need to check the Puppet manifests to make sure nova-api is properly restarted after any change to its config files.
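
The stale-configuration check described in this comment can be done mechanically. Below is a minimal sketch in Python, assuming the option dump that the service writes at start-up (the `log_opt_values` lines) is present in its log. The helper names `configured_value`, `logged_value` and `needs_restart` are illustrative only and are not part of any OpenStack tool; the sample strings are the two lines quoted above.

```python
import re


def configured_value(conf_text, option):
    """Return the last value assigned to `option` in INI-style text, or None.

    Simplification (assumption): section headers are ignored, which is
    enough for an option that appears only once in the file.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == option:
            value = val.strip()
    return value


def logged_value(log_text, option):
    """Return the value from the last '<option> = <value> log_opt_values' line."""
    pattern = re.compile(
        r"\s" + re.escape(option) + r"\s+=\s+(\S+)\s+log_opt_values"
    )
    matches = pattern.findall(log_text)
    return matches[-1] if matches else None


def needs_restart(conf_text, log_text, option):
    """True if the value on disk differs from the value the service logged at start."""
    on_disk = configured_value(conf_text, option)
    running = logged_value(log_text, option)
    return on_disk is not None and running is not None and on_disk != running


# Sample data copied from the comment above.
NOVA_CONF = "use_neutron=True\n"
NOVA_API_LOG = (
    "2016-06-03T14:07:47.358934+00:00 debug: 2016-06-03 14:07:47.358 6789 "
    "DEBUG oslo_service.service [-] use_neutron = False log_opt_values "
    "/usr/lib/python2.7/dist-packages/oslo_config/cfg.py:2517"
)

if __name__ == "__main__":
    print(needs_restart(NOVA_CONF, NOVA_API_LOG, "use_neutron"))
```

With the two sample lines, the file says `True` while the running service logged `False`, so the check reports that a restart is needed.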

Roman Podoliaka (rpodolyaka) wrote :

This is very similar to https://bugs.launchpad.net/mos/+bug/1570819/

MOS Ironic team, could you please take a look?

Dina Belova (dbelova)
Changed in mos:
assignee: nobody → MOS Ironic (mos-ironic)
tags: added: area-ironic
Vasyl Saienko (vsaienko) wrote :

The task 'openstack-network-server-nova' was skipped on the newly added controller (node-5).
But it should have been executed: https://github.com/openstack/fuel-library/blob/11842efcff53a1fa076dd47b7a86e4fc0e1b7d64/deployment/puppet/openstack_tasks/examples/openstack-network/tasks.yaml#L228

2016-06-03 14:11:07 INFO [27696] Task[ceph-mon/5]: Run on node: Node[5]
2016-06-03 14:11:07 INFO [27696] Casting message to Nailgun:
{"method"=>"deploy_resp",
 "args"=>
  {"task_uuid"=>"69115af1-58ba-44a0-b0a3-fa7c9f61d322",
   "nodes"=>
    [{"uid"=>"5",
      "status"=>"deploying",
      "progress"=>66,
      "deployment_graph_task_name"=>"openstack-network-server-nova",
      "task_status"=>"skipped",
      "custom"=>{}}]}}

2016-06-03 14:11:07 DEBUG [27696] Waiting for puppet to finish deployment on node 5 (timeout = 3600 sec)...
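
To find every skipped task in a log like the one above, the `deploy_resp` messages can be scanned with a regular expression. This is only a sketch in Python: it assumes each node entry lists `uid`, `deployment_graph_task_name` and `task_status` in that order, as in the excerpt, and the function name `skipped_tasks` is made up for illustration.

```python
import re

# Matches one node entry: uid, then task name, then task status, in that order.
NODE_ENTRY_RE = re.compile(
    r'"uid"=>"(?P<uid>[^"]+)".*?'
    r'"deployment_graph_task_name"=>"(?P<task>[^"]+)".*?'
    r'"task_status"=>"(?P<status>[^"]+)"',
    re.DOTALL,
)


def skipped_tasks(log_text):
    """Return (node uid, task name) pairs whose task_status is 'skipped'."""
    return [
        (m.group("uid"), m.group("task"))
        for m in NODE_ENTRY_RE.finditer(log_text)
        if m.group("status") == "skipped"
    ]


# The message quoted above, copied verbatim.
SAMPLE = '''{"method"=>"deploy_resp",
 "args"=>
  {"task_uuid"=>"69115af1-58ba-44a0-b0a3-fa7c9f61d322",
   "nodes"=>
    [{"uid"=>"5",
      "status"=>"deploying",
      "progress"=>66,
      "deployment_graph_task_name"=>"openstack-network-server-nova",
      "task_status"=>"skipped",
      "custom"=>{}}]}}'''

if __name__ == "__main__":
    for uid, task in skipped_tasks(SAMPLE):
        print("node %s skipped %s" % (uid, task))
```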

Changed in mos:
assignee: MOS Ironic (mos-ironic) → Fuel Library (Deprecated) (fuel-library)
importance: Undecided → High
status: New → Confirmed
milestone: none → 9.0
Michael Polenchuk (mpolenchuk) wrote :

This will be fixed once https://review.openstack.org/325341 gets merged.
Alternatively, change the task condition to reflect the addition of a new node (e.g. + $.network_metadata.nodes.keys()).
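
The idea behind the second option can be illustrated with a small Python sketch. It is not Fuel's condition evaluator. It assumes the task is guarded by a condition that fires only when selected parts of the deployment data change between two deployments, and the key `some_network_setting` is a hypothetical stand-in for whatever the real condition watches.

```python
def should_run(old_data, new_data, selectors):
    """Run the task if any watched selector yields a different value."""
    return any(select(old_data) != select(new_data) for select in selectors)


def node_keys(data):
    """Sorted node names, mirroring $.network_metadata.nodes.keys()."""
    return sorted(data["network_metadata"]["nodes"].keys())


# Hypothetical deployment data before and after adding node-5.
OLD = {
    "some_network_setting": "vlan",
    "network_metadata": {"nodes": {"node-1": {}, "node-2": {}}},
}
NEW = {
    "some_network_setting": "vlan",
    "network_metadata": {"nodes": {"node-1": {}, "node-2": {}, "node-5": {}}},
}

# Condition that only watches settings: nothing changed, so the task is skipped.
WATCH_SETTINGS_ONLY = [lambda d: d["some_network_setting"]]

# Condition that also watches the node list: adding node-5 triggers the task.
WATCH_SETTINGS_AND_NODES = WATCH_SETTINGS_ONLY + [node_keys]

if __name__ == "__main__":
    print(should_run(OLD, NEW, WATCH_SETTINGS_ONLY))
    print(should_run(OLD, NEW, WATCH_SETTINGS_AND_NODES))
```

Under these assumptions the first condition does not fire when only the node list changes, which matches the skipped task seen in the log, while the second one does.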

Andrey Maximov (maximov)
Changed in mos:
assignee: Fuel Library (Deprecated) (fuel-library) → Fuel Toolbox (fuel-toolbox)
Alexey Shtokolov (ashtokolov) wrote :
Changed in mos:
status: Confirmed → Fix Committed
tags: added: on-verification
Changed in mos:
status: Fix Committed → Fix Released
tags: removed: on-verification
Dmitry Belyaninov (dbelyaninov) wrote :

Verified on 9.0:
[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 460
cat /etc/fuel_build_number:
 460
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 python-packetary-9.0.0-1.mos140.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 fuel-migrate-9.0.0-1.mos8450.noarch
 rubygem-astute-9.0.0-1.mos750.noarch
 fuel-mirror-9.0.0-1.mos140.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-openstack-metadata-9.0.0-1.mos8741.noarch
 fuel-notify-9.0.0-1.mos8450.noarch
 nailgun-mcagents-9.0.0-1.mos750.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8741.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-utils-9.0.0-1.mos8450.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 fuel-misc-9.0.0-1.mos8450.noarch
 fuel-library9.0-9.0.0-1.mos8450.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 fuel-ostf-9.0.0-1.mos935.noarch
 fuelmenu-9.0.0-1.mos273.noarch
 fuel-nailgun-9.0.0-1.mos8741.noarch
[root@nailgun ~]#

Steps:
cd mos-ci-deployment-scripts
export ISO_PATH='....'
export ENV_NAME='env1'
./deploy_template.sh templates/ironic/default.yaml
cd ../mos-integration-tests
tox -e ironic -- -v -ra -I '10.109.22.2' -E 'env1' -S 'ha_deploy_ironic_default' -s -k '631897' --pdb

The test with the described scenario passed.
