[TripleO-Docs] The nic-config related documentation should mention that unused interfaces in overcloud should be set to use_dhcp: false

Bug #1673882 reported by Sai Sindhur Malleni
This bug affects 2 people
Affects | Status  | Importance | Assigned to | Milestone
tripleo | Expired | Undecided  | Unassigned  |

Bug Description

Consider this example:
My inspection range is 192.0.2.5-192.0.2.50.
My DHCP range is 192.0.2.52-192.0.2.250.
I have already deployed a few nodes and have an overcloud stack. Some nodes that weren't part of the overcloud initially now need to be introspected. Each overcloud node has 2 NICs, but I used only 1 and deployed with the single-nic-vlans templates. As a result, the other, unused NIC is left at its default of DHCP. When introspection is attempted, it fails because the unused NICs on the overcloud nodes consume the introspection IPs handed out by the undercloud. The documentation should state that unused NICs on overcloud nodes need to be included in the nic-config template with use_dhcp: false.
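
As a rough illustration of the requested documentation note, a single-NIC VLAN nic-config could list the second NIC explicitly. This is only a sketch modeled on the single-nic-vlans layout, not the shipped template verbatim; it assumes the unused NIC is nic2 and leaves out the VLAN entries for the isolated networks:

```yaml
network_config:
- type: ovs_bridge
  name: bridge_name
  use_dhcp: false
  dns_servers:
    get_param: DnsServers
  addresses:
  - ip_netmask:
      list_join:
      - /
      - - get_param: ControlPlaneIp
        - get_param: ControlPlaneSubnetCidr
  routes:
  - ip_netmask: 169.254.169.254/32
    next_hop:
      get_param: EC2MetadataIp
  members:
  - type: interface
    name: nic1
    # force the MAC address of the bridge to this interface
    primary: true
  # VLAN members for the isolated networks are omitted in this sketch
# Assumption: nic2 is the unused second NIC described above
- type: interface
  name: nic2
  use_dhcp: false  # keeps the unused NIC from requesting a DHCP lease
```

With the unused NIC declared this way, it should no longer compete with introspecting nodes for addresses from the undercloud.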

summary: [TripleO-Docs] The nic-config related documentation should mention that
- unused interfaces in overcloud should be set to dhcp_false
+ unused interfaces in overcloud should be set to use_dhcp: false
description: updated
Revision history for this message
Dan Sneddon (dsneddon) wrote :

The recommendation is to only use a single NIC on the provisioning interface, or to disable other NICs, or to bond all provisioning NICs together.

I will add a note to this effect in tripleo-docs.
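
The third option above (bonding the provisioning NICs) might look roughly like the following sketch. It assumes nic1 and nic2 are both cabled to the provisioning network, and it uses a Linux bond in active-backup mode, which is an illustrative choice rather than something stated in this report; whether PXE boot works over the bonded ports depends on the switch configuration:

```yaml
network_config:
- type: linux_bond
  name: bond0                               # hypothetical bond name
  use_dhcp: false
  bonding_options: "mode=active-backup"     # assumed mode, needs no switch-side LACP
  dns_servers:
    get_param: DnsServers
  addresses:
  - ip_netmask:
      list_join:
      - /
      - - get_param: ControlPlaneIp
        - get_param: ControlPlaneSubnetCidr
  routes:
  - ip_netmask: 169.254.169.254/32
    next_hop:
      get_param: EC2MetadataIp
  members:
  - type: interface
    name: nic1
    primary: true   # the bond takes the MAC address of this interface
  - type: interface
    name: nic2
```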

Changed in tripleo:
status: New → Triaged
assignee: nobody → Dan Sneddon (dsneddon)
importance: Undecided → Low
Sai Sindhur Malleni (smalleni) wrote :

Just for clarity, you would also see this if all the NICs of all the systems are in the same QinQ VLAN (used because different compute nodes have different numbers of NICs, and this ensures that we use as many NICs as possible from each compute).

Dan Sneddon (dsneddon) wrote :

One suggestion for using more NICs might be to configure the Neutron bridge on a separate NIC from the provisioning NIC. The NIC carrying the bridge should be configured to not use DHCP, and the bridge doesn't need to have an IP address if it is used only for hosting Tenant networks. In fact, you can create two bridges, one with the default name of br-ex on NIC2, and one with the name br-tenant on NIC3. That would allow you to host Tenant networks on br-tenant, and External networks on br-ex. Here is an example of the configuration for a Controller:

              network_config:
              - type: interface
                name: nic1
                use_dhcp: false
                dns_servers:
                  get_param: DnsServers
                addresses:
                - ip_netmask:
                    list_join:
                    - /
                    - - get_param: ControlPlaneIp
                      - get_param: ControlPlaneSubnetCidr
                routes:
                - ip_netmask: 169.254.169.254/32
                  next_hop:
                    get_param: EC2MetadataIp
              - type: ovs_bridge
                name: bridge_name
                use_dhcp: false # No IP address needed on the bridge
                dns_servers:
                  get_param: DnsServers
                members:
                - type: interface
                  name: nic2
                  use_dhcp: false
                  # force the MAC address of the bridge to this interface
                  primary: true
                - type: vlan
                  vlan_id:
                    get_param: ExternalNetworkVlanID
                  addresses:
                  - ip_netmask:
                      get_param: ExternalIpSubnet
                  routes:
                  - default: true
                    next_hop:
                      get_param: ExternalInterfaceDefaultRoute
              - type: ovs_bridge
                name: br-tenant
                dns_servers:
                  get_param: DnsServers
                use_dhcp: false
                members:
                - type: interface
                  name: nic3
                  use_dhcp: false
                  primary: true
              - type: interface
                name: nic4
                use_dhcp: false # This effectively disables NIC4

And here is the corresponding example for a Compute node:

              network_config:
              - type: interface
                name: nic1
                use_dhcp: false
                dns_servers:
                  get_param: DnsServers
                addresses:
                - ip_netmask:
                    list_join:
                    - /
                    - - get_param: ControlPlaneIp
                      - get_param: ControlPlaneSubnetCidr
                routes:
                - ip_netmask: 169.254.169.254/32
                  next_hop:
                    get_param: EC2MetadataIp
                - default: true
                  next_hop:
                    get_param: ControlPlaneDefaultRoute
              - type: interface
                name: nic2
                use_dhcp: false # This ef...


Dan Sneddon (dsneddon) wrote :

Here is another similar suggestion, but this one creates a bond for a bridge using OVS balance-slb, which will use only one link in the bond for any particular VLAN at any given moment. It requires no special configuration on the switch, although I don't think it's been tested with Q-in-Q, so results might vary:

Controller:

              network_config:
              - type: interface
                name: nic1
                use_dhcp: false
                dns_servers:
                  get_param: DnsServers
                addresses:
                - ip_netmask:
                    list_join:
                    - /
                    - - get_param: ControlPlaneIp
                      - get_param: ControlPlaneSubnetCidr
                routes:
                - ip_netmask: 169.254.169.254/32
                  next_hop:
                    get_param: EC2MetadataIp
              - type: ovs_bridge
                name: bridge_name
                dns_servers:
                  get_param: DnsServers
                members:
                - type: ovs_bond
                  name: bond1
                  ovs_options:
                    get_param: BondInterfaceOvsOptions
                  members:
                  - type: interface
                    name: nic2
                    primary: true # set the bond MAC address to the MAC of this interface
                    use_dhcp: false
                  - type: interface
                    name: nic3
                    use_dhcp: false
                  - type: interface
                    name: nic4
                    use_dhcp: false
                - type: vlan
                  device: bond1
                  vlan_id:
                    get_param: ExternalNetworkVlanID
                  addresses:
                  - ip_netmask:
                      get_param: ExternalIpSubnet
                  routes:
                  - default: true
                    next_hop:
                      get_param: ExternalInterfaceDefaultRoute

Compute:

              network_config:
              - type: interface
                name: nic1
                use_dhcp: false
                dns_servers:
                  get_param: DnsServers
                addresses:
                - ip_netmask:
                    list_join:
                    - /
                    - - get_param: ControlPlaneIp
                      - get_param: ControlPlaneSubnetCidr
                routes:
                - ip_netmask: 169.254.169.254/32
                  next_hop:
                    get_param: EC2MetadataIp
                - default: true
                  next_hop:
                    get_param: ControlPlaneDefaultRoute
              - type: ovs_bridge
                name: bridge_name
                dns_servers:
                  get_param: DnsServers
                members:
                - type: ovs_bond
                  name: bond1
                  ovs_options:
                    get_param: BondInterfaceOvsOptions
                  members:
                  - type: interface
                    name: nic2
                    primary: true
                    use_dhcp: false
                ...


Dan Sneddon (dsneddon) wrote :

The above examples use a static External IP, which is only applicable to the network isolation model. If network isolation is not used, the External network would be configured for DHCP.
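
For the non-isolated case, one way to read this is that the bridge itself requests an address over DHCP, and the External VLAN entry with its static address is dropped. The following is only a sketch of that reading, reusing nic2 from the examples above:

```yaml
- type: ovs_bridge
  name: bridge_name
  use_dhcp: true     # External address obtained via DHCP instead of ExternalIpSubnet
  members:
  - type: interface
    name: nic2
    use_dhcp: false
    # force the MAC address of the bridge to this interface
    primary: true
```

Any NICs that remain unused would still be listed separately with use_dhcp: false.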

Sai Sindhur Malleni (smalleni) wrote :

Agreed Dan. Thanks for the explanation.

However, what I was trying to say is that I hit a fairly uncommon scenario. I was deploying computes, some of which had 4 NICs and some 2. To be able to use all 4 NICs when available, we used QinQ-based VLAN separation on the switch side. So all the NICs of all the machines were in the same QinQ VLAN, but within the QinQ VLAN I was free to use whatever VLAN numbers I wanted for network isolation. So the idea was (for example):
For a compute with 4 NICs:

Storage network on VLAN 1 on bridge br-storage on em1
InternalAPI on VLAN 2 on bridge br-api on em4
Tenant network on VLAN 3 on bridge br-ten on em3
ControlPlane on native on em2

For a compute with 2 NICs:

Storage, InternalAPI, and ControlPlane on bridge br-api on em1
ControlPlane on native on em2

That summarizes why we needed to use QinQ.

However, to test the deployment initially, I used only one NIC on all node types and deployed with the single-nic-vlans templates (the templates made no reference to the other NICs). Hence, there were unused NICs in the overcloud with their boot protocol set to DHCP. Later, to scale the overcloud, I introspected a few more nodes, and that's when these unused NICs (although not technically on the provisioning network) were able to send DHCP requests and get replies from the undercloud. As a result, the nodes waiting on an IP for introspection never got one, because the undercloud had instead handed those IPs out to unused NICs on overcloud nodes.

Sai Sindhur Malleni (smalleni) wrote :

Based on what you said, if I had disabled the unused NICs I wouldn't have run into this:
- type: interface
  name: nic4
  use_dhcp: false # This effectively disables NIC4

Changed in tripleo:
milestone: none → pike-1
Changed in tripleo:
milestone: pike-1 → pike-2
Changed in tripleo:
milestone: pike-2 → pike-3
Emilien Macchi (emilienm) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in tripleo:
assignee: Dan Sneddon (dsneddon) → nobody
Changed in tripleo:
milestone: pike-3 → pike-rc1
Changed in tripleo:
milestone: pike-rc1 → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Emilien Macchi (emilienm) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which lead to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (FUTURE, PIKE, QUEENS, ROCKY, STEIN).
  Valid example: CONFIRMED FOR: FUTURE

Changed in tripleo:
importance: Low → Undecided
status: Triaged → Expired