When a customer selects Ironic, some parameters need to be tuned

Bug #1553214 reported by Sergii Turivnyi
This bug affects 1 person
Affects              Status                     Importance  Assigned to     Milestone
Mirantis OpenStack   Status tracked in 10.0.x
  10.0.x             Fix Released               High        Vasyl Saienko
  8.0.x              Confirmed                  High        Vasyl Saienko
  9.x                Fix Released               High        Vasyl Saienko

Bug Description

When a customer selects Ironic in MOS, they assume that Ironic will work out of the box.
Some parameters need to be tuned to get maximum capability.

1) Nova:
max_concurrent_builds can be increased to 50.
scheduler_host_subset_size = 10000
This parameter should be close to the total number of bare-metal servers in the cloud. It controls how the nova scheduler maps an instance to a hypervisor: the scheduler picks the target at random from a subset of the best hosts of this size. If the value is less than the number of simultaneous boot requests to nova, there is a high chance that nova will try to map two instances to the same hypervisor. One of those instances then fails once scheduler_max_attempts is reached (default 3), and the instance is marked as ERROR.
Claims in the Nova API should solve this problem.
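
To illustrate why a small subset size causes collisions, here is a rough sketch in Python. It is not how nova computes anything; it assumes every concurrent request picks a host uniformly and independently from the same subset, and it ignores retries. The function name collision_probability and its arguments are hypothetical.

```python
def collision_probability(subset_size: int, concurrent_requests: int) -> float:
    """Chance that at least two requests pick the same host.

    Simplifying assumption: each request chooses uniformly and independently
    from the same subset of `subset_size` hosts. Retries are not modeled.
    """
    if subset_size < 1:
        raise ValueError("subset_size must be at least 1")
    if concurrent_requests > subset_size:
        # Pigeonhole: more requests than hosts to choose from.
        return 1.0
    p_all_distinct = 1.0
    for i in range(concurrent_requests):
        # The (i+1)-th request must avoid the i hosts already taken.
        p_all_distinct *= (subset_size - i) / subset_size
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # 50 simultaneous boot requests against different subset sizes.
    for size in (1, 10, 100, 10000):
        print(size, round(collision_probability(size, 50), 3))
```

Under these assumptions a collision is certain whenever the subset is smaller than the number of simultaneous requests, and it stays likely until the subset is much larger than that number.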

2) ironic.nova.compute.manager.ClusteredComputeManager: at the moment, Ironic initiates a resource update for all nodes during instance termination. This leads to severe performance degradation under intensive cloud usage (adding and deleting nodes simultaneously) with a large number of nodes. The resource update should be disabled on instance termination, or we should update resources for the specific instance only [1], [2].

3) Fuel-agent: doesn't support the UEFI shell. The administrator must disable the UEFI shell manually before enrolling a node in Ironic.

===NOTE===
The above actually describes 3 different bugs/improvement requests,
so to track these separately:
1) this bug is left for the Nova parameter tuning only
2) is a duplicate of https://bugs.launchpad.net/mos/+bug/1552120
3) is tracked as a separate bug https://bugs.launchpad.net/mos/+bug/1587036

Timur Nurlygayanov (tnurlygayanov) wrote :

The priority is high because we currently have to spend several hours reconfiguring Ironic, Nova, and other services after an out-of-the-box deployment before they can manage large Ironic clusters.

Customers should get a properly configured Ironic and Nova out of the box, without any additional manual configuration.

Changed in mos:
milestone: none → 9.0
tags: added: area-ironic
Pavlo Shchelokovskyy (pshchelo) wrote :

Per-item comments:

1) can be tweaked in puppet manifests, although an appropriate value for scheduler_host_subset_size is hard to guess beforehand (Fuel does not manage the nodes to be enrolled in Ironic, so it does not know how many of them there will be)
2) already fixed upstream after the Mitaka release, so 10.0 is not affected. We could probably cherry-pick those two patches (one in Ironic, one in Nova) to 9.0.
3) we have no control over it, again because Fuel does not manage those nodes
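
For item 1), a minimal sketch of how the two values could be rendered once an operator knows the node count. This is only an illustration in Python, not the puppet change itself. The helper render_nova_overrides and its baremetal_node_count argument are hypothetical, and placing both options in the [DEFAULT] section of nova.conf is an assumption that should be checked against the Nova release in use.

```python
import configparser
import io


def render_nova_overrides(baremetal_node_count: int,
                          max_concurrent_builds: int = 50) -> str:
    """Render a nova.conf fragment with the two tuned options.

    Assumption: both options live in [DEFAULT] for the targeted release.
    The subset size is set to the operator-supplied number of bare-metal
    nodes, following the advice in the bug description.
    """
    if baremetal_node_count < 1:
        raise ValueError("baremetal_node_count must be at least 1")
    config = configparser.ConfigParser()
    config["DEFAULT"] = {
        "max_concurrent_builds": str(max_concurrent_builds),
        "scheduler_host_subset_size": str(baremetal_node_count),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Example: an operator who plans to enroll 200 nodes in Ironic.
    print(render_nova_overrides(200))
```

Because the node count is not known at deployment time, the value has to come from the operator, which is why it is an explicit argument here rather than a computed default.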

Pavlo Shchelokovskyy (pshchelo) wrote :

2) is addressed by these two cherry-picks
https://review.fuel-infra.org/#/q/topic:bug/1552120

Serge Kovaleff (serge-kovaleff) wrote :

The max_concurrent_builds change from 1) is merged here: https://review.openstack.org/#/c/299929/

Pavlo Shchelokovskyy (pshchelo) wrote :

1) `max_concurrent_builds` is addressed by https://review.openstack.org/#/c/299929/ (merged)

description: updated
Sergii Turivnyi (sturivnyi) wrote :

Waiting for scale lab for verification
