manual provider: network-get --primary-address returns a hostname

Bug #1721368 reported by Florian Haas
This bug affects 2 people
Affects                       Status        Importance  Assigned to        Milestone
Canonical Juju                Fix Released  High        Joseph Phillips
  2.2                         Won't Fix     Undecided   Unassigned
  2.3                         Fix Released  High        Eric Claude Jones
OpenStack Nova Compute Charm  Triaged       Low         Unassigned

Bug Description

Seen on nova-compute rev. 273 deploying Ocata (15.0.6).

The charm creates a nova.conf that uses a hostname for my_ip:

my_ip = bob.example.com

This looks innocent enough, but nova.compute.manager then logs this:

2017-10-04 14:51:54.232 26850 ERROR nova.compute.manager [req-824df060-0d74-41d1-a107-c0df38932cb4 - - - - -] No compute node record for host bob
2017-10-04 14:51:54.301 26850 WARNING nova.compute.resource_tracker [req-824df060-0d74-41d1-a107-c0df38932cb4 - - - - -] No compute node record for bob:bob.example.com
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager [req-824df060-0d74-41d1-a107-c0df38932cb4 - - - - -] Error updating resources for node bob.example.com.
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager Traceback (most recent call last):
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6574, in update_available_resource_for_node
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager rt.update_available_resource(context, nodename)
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 551, in update_available_resource
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager self._update_available_resource(context, resources)
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager return f(*args, **kwargs)
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 575, in _update_available_resource
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager self._init_compute_node(context, resources)
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 456, in _init_compute_node
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager self._copy_resources(cn, resources)
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 491, in _copy_resources
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager compute_node.update_from_virt_driver(resources)
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/objects/compute_node.py", line 338, in update_from_virt_driver
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager setattr(self, key, resources[key])
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 72, in setter
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager field_value = field.coerce(self, name, value)
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 195, in coerce
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager return self._type.coerce(obj, attr, value)
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 497, in coerce
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager raise ValueError(six.text_type(e))
2017-10-04 14:51:54.302 26850 ERROR nova.compute.manager ValueError: failed to detect a valid IP address from 'bob.example.com'

Since the compute host fails to do this update, it never actually becomes available to the scheduler, and thus can't receive newly booted instances. This persists until my_ip is manually fixed (to contain an actual IP address), followed by a restart of nova-compute on that node.

The reasons for the error quoted above actually rather escape me, because bob.example.com is of course perfectly resolvable via an /etc/hosts entry on that node, but that's evidently an upstream bug.
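
One plausible reading of the traceback: the check that raises the ValueError is a purely syntactic parse of the string, not a name lookup, so resolvability via /etc/hosts never comes into play. A minimal Python sketch of that distinction follows. It uses the standard library's ipaddress module as a stand-in (an assumption; this is not the code Nova runs), and is_ip_literal is a hypothetical helper name.

```python
import ipaddress


def is_ip_literal(value: str) -> bool:
    """Return True only if value parses as an IPv4/IPv6 literal.

    No DNS or /etc/hosts lookup happens here, so a resolvable hostname
    is still rejected.
    """
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


for candidate in ("192.168.122.10", "fe80::1", "bob.example.com"):
    print(candidate, is_ip_literal(candidate))
```

Under that reading, any value written to my_ip has to be an address literal already; name resolution would have to happen before the value reaches nova.conf.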

What makes me file a bug against the charm is that the template that generates this (https://github.com/openstack/charm-nova-compute/blob/master/templates/ocata/nova.conf#L22) contains:

my_ip = {{ host_ip }}

So my question is, why and under what circumstances would host_ip contain a hostname rather than an actual IP address? And how can this be fixed?

Tags: cpe-onsite
Frode Nordahl (fnordahl) wrote :

Could you provide the following output:
juju status # make sure it includes juju version information associated with model at the top
juju config nova-compute

# replace unit-name/number with the node you see this on
juju run --unit nova-compute/0 -- network-get --primary-address cloud-compute
juju run --unit nova-compute/0 -- unit-get private-address

Changed in charm-nova-compute:
status: New → Incomplete
Florian Haas (fghaas) wrote :

I've attached juju status and juju config, as requested. The juju run command is acting up in that it just hangs. I've added the --debug flag, and this is what I get:

$ juju --debug run --unit nova-compute/0 -- network-get --primary-address cloud-compute
08:18:57 INFO juju.cmd supercommand.go:63 running juju [2.2.4 gc go1.8]
08:18:57 DEBUG juju.cmd supercommand.go:64 args: []string{"juju", "--debug", "run", "--unit", "nova-compute/0", "--", "network-get", "--primary-address", "cloud-compute"}
08:18:57 INFO juju.juju api.go:67 connecting to API addresses: [192.168.122.100:17070]
08:18:57 DEBUG juju.api apiclient.go:863 successfully dialed "wss://192.168.122.100:17070/model/9319e009-4e35-43e0-80b0-8d6e309ece5a/api"
08:18:57 INFO juju.api apiclient.go:617 connection established to "wss://192.168.122.100:17070/model/9319e009-4e35-43e0-80b0-8d6e309ece5a/api"

$ juju --debug run --unit nova-compute/0 -- unit-get private-address
08:18:13 INFO juju.cmd supercommand.go:63 running juju [2.2.4 gc go1.8]
08:18:13 DEBUG juju.cmd supercommand.go:64 args: []string{"juju", "--debug", "run", "--unit", "nova-compute/0", "--", "unit-get", "private-address"}
08:18:13 INFO juju.juju api.go:67 connecting to API addresses: [192.168.122.100:17070]
08:18:13 DEBUG juju.api apiclient.go:863 successfully dialed "wss://192.168.122.100:17070/model/9319e009-4e35-43e0-80b0-8d6e309ece5a/api"
08:18:13 INFO juju.api apiclient.go:617 connection established to "wss://192.168.122.100:17070/model/9319e009-4e35-43e0-80b0-8d6e309ece5a/api"

It doesn't seem to progress any further than that, both times. Could this be the cause of host_ip being populated with a name, not an IP address?

James Page (james-page) wrote :

Looking at the status output attached, the agent for nova-compute is listed as missing/lost - which indicates that its connection to the controller is down, and as a result the juju run ops won't ever complete (as the required comms is not in place).

Use of the manual provider may be important in understanding what's going on here.

Florian Haas (fghaas) wrote :

Eek, sorry. Stupid of me not to have checked on that. I ran those juju status commands right after my test stack had been resumed from suspend.

Here's a new "juju status". "juju run" is still timing out on me, so that problem is unchanged.

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack nova-compute charm because there has been no activity for 60 days.]

Changed in charm-nova-compute:
status: Incomplete → Expired
Florian Haas (fghaas) wrote :

@james-page, any chance you could check on the comment from early October?

Changed in charm-nova-compute:
status: Expired → New
James Page (james-page) wrote :

Grokking the code:

class HostIPContext(context.OSContextGenerator):
    def __call__(self):
        ctxt = {}
        # Use the address used in the cloud-compute relation in templates for
        # this host
        host_ip = get_relation_ip('cloud-compute',
                                  cidr_network=config('os-internal-network'))

        if host_ip:
            # NOTE: do not format this even for ipv6 (see bug 1499656)
            ctxt['host_ip'] = host_ip

        return ctxt

This template item should always resolve to an IP address. If the provider does not support network binding (the 'cloud-compute' relation), or the os-internal-network configuration option (passed as cidr_network) is not set, the function should simply be doing:

   address = get_host_ip(unit_get('private-address'))
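
A rough sketch of the order described above, in plain Python rather than charm code. The name resolve_host_ip and its parameters are hypothetical, and the step that resolves a non-IP binding address is an illustrative addition, not something the charm is known to do. socket.gethostbyname is IPv4-only, and the resolver is injectable so the logic can be exercised without DNS.

```python
import ipaddress
import socket
from typing import Callable, Optional


def resolve_host_ip(
    binding_address: Optional[str],
    private_address: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Prefer the bound address, fall back to the private address,
    and always hand back an IP literal."""
    candidate = binding_address or private_address
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        # Not an IP literal: treat it as a hostname and resolve it.
        return resolver(candidate)
    return candidate


# Usage with a stub resolver standing in for /etc/hosts or DNS.
def fake_resolver(name: str) -> str:
    return "192.168.122.10"


print(resolve_host_ip("bob.example.com", "10.0.0.5", resolver=fake_resolver))
print(resolve_host_ip(None, "10.0.0.5"))
```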

James Page (james-page) wrote :

Hmm OK so it would appear that the Juju >= 2.0 network space support can actually return a hostname instead of an IP address for:

  network-get --primary-address cloud-compute

which is somewhat unhelpful. The unit_get('private-address') fallback won't be used in this deployment.

summary: - Sets my_ip to hostname, not IP address
+ manual provider: network-get --primary-address returns a hostname
Changed in charm-nova-compute:
status: New → Triaged
importance: Undecided → Low
James Page (james-page) wrote :

Raising a Juju task for feedback on this issue from the Juju developers.

John A Meinel (jameinel) wrote :

network-get --primary-address should always be returning an IP address (by how we define the return value of the command, because we had confusion around this in the past). Returning a hostname would be a bug in our implementation. It may be that we're doing something weird for manual provider.

John A Meinel (jameinel) wrote :

(The intent was to have a separate field like 'primary-hostname' that would provide the hostname when it was available, and maintain separation of what could be in what field.)

Anastasia (anastasia-macmood) wrote :

It may be that different Juju versions exhibit different behaviors...

@James Page (james-page),
could you please clarify which Juju version is returning a hostname?

Changed in juju:
status: New → Incomplete
James Page (james-page) wrote :

2.2.4 according to the attached information

Changed in juju:
status: Incomplete → New
John A Meinel (jameinel) wrote :

Offhand, I would guess the issue is that the machines were added with "juju add-machine ssh:hostname", and a workaround would be to do "juju add-machine ssh:IP".
I haven't done anything to confirm that is the case, so I could be wildly off base. But I'm guessing that the bug is in how Juju is recording what that machine is.
I'm guessing this is elevated priority because of OpenStack on s390, but I'm also guessing there might be a quick workaround for people.

I'll try to task someone with at least working through how this could be happening (AFAIK, nobody has explicitly tried to reproduce this bug and get a feel for the scope of it yet, to know how hard it will be to fix.) I also don't know that it has been escalated internally for us to prioritize it, but we should be able to take a little bit of time and see how hard it would be to fix.

Changed in juju:
status: New → Triaged
importance: Undecided → High
Florian Haas (fghaas) wrote :

Taking the liberty to follow up on this one: have there been any new findings lately? Also, if the issue is indeed with my_ip being populated from the machine name, can a machine be renamed from its hostname to its IP address after the fact?

It does seem very strange if things were actually that way. For one thing, my_ip would be expected to honor os-internal-network, which would probably not work if it were blindly filled with the machine name, or am I missing something here?

Changed in juju:
assignee: nobody → Eric Claude Jones (ecjones)
Changed in juju:
status: Triaged → In Progress
Eric Claude Jones (ecjones) wrote :
Changed in juju:
status: In Progress → Fix Committed
Calvin Hartwell (calvinh) wrote :

This bug also appears to be affecting Kubernetes deployments (CDK): https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/573

tags: added: cpe-onsite
Dmitrii Shcherbakov (dmitriis) wrote :

This problem is not fully solved as of 2.4-rc3:

1) network-get --ingress-address still returns a hostname
2) network-get with --primary-address returns an address

See below:

https://pastebin.ubuntu.com/p/8zYkhqHbZY

juju --version
2.4-rc3-xenial-amd64

juju status -m controller
Model       Controller  Cloud/Region  Version  SLA          Timestamp
controller  manual      manual        2.4-rc3  unsupported  11:52:24+02:00

Machine  State    DNS          Inst id  Series  AZ  Message
0        started  prdjujuctl1  manual:  xenial      Manually provisioned machine

juju run --unit kubernetes-master/0 'network-get kube-control'
bind-addresses:
- macaddress: ""
  interfacename: ""
  addresses:
  - hostname: mastl1
    address: 203.0.113.11
    cidr: ""
egress-subnets:
- 203.0.113.11/32
ingress-addresses:
- mastl1

$ juju run --unit kubernetes-master/0 'network-get --primary-address kube-control'
203.0.113.11
$ juju run --unit kubernetes-master/0 'network-get --ingress-address kube-control'
mastl1
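
The mismatch above can be checked mechanically. The following Python sketch assumes the network-get output has already been parsed into a dict (the parsing step is left out on purpose), and non_ip_ingress is a hypothetical helper name. It lists every ingress-addresses entry that is not an IP literal.

```python
import ipaddress
from typing import Any, Dict, List


def non_ip_ingress(info: Dict[str, Any]) -> List[str]:
    """Return the ingress-addresses entries that are not IP literals."""
    bad = []
    for entry in info.get("ingress-addresses", []):
        try:
            ipaddress.ip_address(entry)
        except ValueError:
            bad.append(entry)
    return bad


# Data copied from the network-get output above, already parsed into a dict.
sample = {
    "bind-addresses": [
        {
            "macaddress": "",
            "interfacename": "",
            "addresses": [
                {"hostname": "mastl1", "address": "203.0.113.11", "cidr": ""}
            ],
        }
    ],
    "egress-subnets": ["203.0.113.11/32"],
    "ingress-addresses": ["mastl1"],
}

print(non_ip_ingress(sample))  # ['mastl1']
```

An empty list would mean every ingress address is an IP literal, which is what the --primary-address path already returns in this deployment.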

Changed in juju:
status: Fix Committed → Confirmed
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.4.1
Changed in juju:
assignee: Eric Claude Jones (ecjones) → Joseph Phillips (manadart)
status: Confirmed → In Progress
Joseph Phillips (manadart) wrote :

I have proposed https://github.com/juju/juju/pull/8885 against the develop branch to resolve this.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Dmitrii Shcherbakov (dmitriis) wrote :

I believe https://bugs.launchpad.net/juju/+bug/1785232 is a spin-off of this bug.
