Unable to scale cluster beyond 49 nodes

Bug #1641212 reported by Chris Nipper
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
Magnum    Expired   Undecided    Unassigned

Bug Description

I've got a cluster with enough resources to theoretically deploy 100 nodes. Every time I try to scale past 49, however, my cluster goes into UPDATE_FAILED status. The error in magnum-conductor.log is the same each time: 'ValueError: Field `node_addresses[48]' cannot be None'.

Limits modified in OpenStack:
  Heat maximum resources set to -1
  Open file limit set to 99999 on all nodes/containers
  Nova volume quota increased to 200
  Nova instance quota increased to 150
  Nova RAM quota increased to 5TB
  Nova cores quota increased to 500
  Neutron floating ip quota increased to 150
  Neutron port quota increased to 200
  >200 IP addresses available in networks used
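As a sanity check on the limits above, here is a minimal sketch that compares the reported quotas against a target node count. The per-node consumption figures in `PER_NODE` are assumptions for illustration only (they are not stated in this report), and master nodes are not counted.

```python
# Quotas as reported in this bug. A negative value would mean "unlimited".
QUOTAS = {
    "instances": 150,
    "cores": 500,
    "volumes": 200,
    "floating_ips": 150,
    "ports": 200,
}

# Hypothetical per-node consumption; adjust to the actual cluster template.
PER_NODE = {
    "instances": 1,
    "cores": 2,
    "volumes": 1,
    "floating_ips": 1,
    "ports": 1,
}


def quota_shortfalls(node_count, quotas=QUOTAS, per_node=PER_NODE):
    """Return {resource: missing_amount} for quotas node_count would exceed."""
    shortfalls = {}
    for resource, need_each in per_node.items():
        limit = quotas[resource]
        if limit < 0:  # unlimited quota, nothing to check
            continue
        needed = need_each * node_count
        if needed > limit:
            shortfalls[resource] = needed - limit
    return shortfalls
```

Under these assumed figures, `quota_shortfalls(49)` and `quota_shortfalls(100)` both come back empty, which would be consistent with the failure at 49 nodes not being a simple quota problem.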

Cluster deploys successfully and all update operations are successful until attempting to scale from 48 to 49 nodes. Strangely, even though the cluster goes into UPDATE_FAILED status, the 49th VM does get created, and it does get assigned an IP address from the network pool.

The error appears to be a database error, but I'm not sure what to make of the error message. I've attached the outputs of the openstack stack resource commands that pinpoint the failure for review.
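The shape of the message (a field name with a zero-based index) looks like a per-element check on a list whose items may not be None, so index 48 would be the 49th node address. The sketch below only mimics that kind of check in plain Python; the function name and the sample addresses are hypothetical, and this is not Magnum's actual code.

```python
def coerce_string_list(attr, values):
    """Validate a list whose elements must be non-None strings.

    Mimics (does not reproduce) a per-element check that yields messages
    like "Field `node_addresses[48]' cannot be None".
    """
    coerced = []
    for index, value in enumerate(values):
        if value is None:
            raise ValueError("Field `%s[%d]' cannot be None" % (attr, index))
        coerced.append(str(value))
    return coerced


# Hypothetical data: 48 nodes report an address, the 49th reports none.
addresses = ["10.0.0.%d" % (i + 10) for i in range(48)] + [None]

try:
    coerce_string_list("node_addresses", addresses)
    message = None
except ValueError as exc:
    message = str(exc)
```

If this reading is right, the failure would occur when the list of node addresses is stored on the cluster object with a missing entry, even though the VM itself exists.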

Chris Nipper (cnipp) wrote :
Spyros Trigazis (strigazi) wrote :

The heat engine log might be more helpful. Do you also have enough floating ips?

Changed in magnum:
status: New → Incomplete
Spyros Trigazis (strigazi) wrote :

Sorry, I saw you have 150.

Chris Nipper (cnipp) wrote :

Thanks for your reply. I've attached the heat-engine.log file for review.

Spyros Trigazis (strigazi) wrote :

This looks more like a Heat bug to me. We should share it with the Heat team.

FYI, the error is at 2016-11-11 11:14:44.624, or line 35221 of the attached log.
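For anyone following along in the attached log, a small sketch for locating entries by timestamp prefix is shown here. The helper name is hypothetical, and it assumes each log line begins with its timestamp.

```python
def find_by_timestamp(lines, timestamp):
    """Return (line_number, text) pairs for lines starting with timestamp.

    Line numbers are 1-based, matching what an editor or `grep -n` shows.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if line.startswith(timestamp)
    ]


# Usage with a file on disk (path is an assumption):
# with open("heat-engine.log") as handle:
#     hits = find_by_timestamp(handle, "2016-11-11 11:14:44.624")
```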

Thomas Herve (therve) wrote :

It looks like a duplicate of bug #1626256. It just got fixed in the newton branch.

Thomas Herve (therve) wrote :

Sorry, it just got fixed in master, not in newton yet.

Launchpad Janitor (janitor) wrote :

[Expired for Magnum because there has been no activity for 60 days.]

Changed in magnum:
status: Incomplete → Expired