grizzly uses FQDN for services

Bug #1151012 reported by Sina Sadeghi
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Critical
Assigned to: Michael Still

Bug Description

Grizzly has started using the FQDN for nova services instead of the short hostname. This means that anyone upgrading from Folsom will end up with every service listed twice, e.g.:

root@api01:~# nova-manage service list
Binary           Host                 Zone     Status  State Updated_At
nova-consoleauth api01                internal enabled XXX   2013-03-07 02:04:42
nova-scheduler   api01                internal enabled XXX   2013-03-07 02:04:42
nova-scheduler   api01.sy3.aptira.com internal enabled :-)   2013-03-07 02:09:29
nova-consoleauth api01.sy3.aptira.com internal enabled :-)   2013-03-07 02:09:29

If the user is running a large installation, this will result in a massive list of duplicate services.
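To illustrate how the duplicates described above could be spotted, here is a minimal sketch that groups service rows by binary and short hostname. The function name find_duplicate_services and the (binary, host) tuple layout are assumptions made for this example; they are not part of nova.

```python
from collections import defaultdict


def find_duplicate_services(rows):
    """Return services registered under more than one name for the same host.

    `rows` is an iterable of (binary, host) pairs, as read from the output
    of `nova-manage service list`. Hypothetical helper, not a nova API.
    """
    groups = defaultdict(set)
    for binary, host in rows:
        # Treat "api01" and "api01.sy3.aptira.com" as the same machine.
        short = host.split(".", 1)[0]
        groups[(binary, short)].add(host)
    return {key: sorted(hosts) for key, hosts in groups.items() if len(hosts) > 1}


# Rows taken from the listing in the bug description.
rows = [
    ("nova-consoleauth", "api01"),
    ("nova-scheduler", "api01"),
    ("nova-scheduler", "api01.sy3.aptira.com"),
    ("nova-consoleauth", "api01.sy3.aptira.com"),
]
duplicates = find_duplicate_services(rows)
for (binary, short), hosts in sorted(duplicates.items()):
    print(binary, short, hosts)
```

Matching on the short name alone is a simplification: two machines with the same short name in different domains would be grouped together.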

Please consider updating nova-manage db sync so that it either removes all existing services from the nova.services table or updates the entries in the table to match the FQDN.

Revision history for this message
Russell Bryant (russellb) wrote :

This is the related change:

commit 5dd1553cca7f7e62eebce75e1d936fc211b239ec
Author: Luis Fernandez Alvarez <email address hidden>
Date: Tue Sep 25 17:33:59 2012 +0200

    Replaced default hostname function from gethostname to getfqdn

    Fixes bug 1055503

    The standard behaviour of the 'gethostname' function in Python differs from
    Linux to Windows. A common Linux configuration returns the FQDN, while a
    Windows one returns only the host name.

    To resolve inconsistent node naming in deployments that mix Windows and
    Linux, it is proposed to use 'getfqdn' as default function instead of
    'gethostname'. This function is more predictable in all cases.

    Change-Id: I3164d9a36df2b8484bbf9a57879c31fa0e342503
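The difference between the two functions named in the commit message can be checked on any machine with a short sketch like the one below. Both socket.gethostname() and socket.getfqdn() are standard-library calls; the short_name helper is an assumption added for the comparison. What each call returns depends on the local resolver and hosts configuration, so no particular output is implied.

```python
import socket


def short_name(name):
    """Strip the domain part, if any (hypothetical helper for comparison)."""
    return name.split(".", 1)[0]


hostname = socket.gethostname()
fqdn = socket.getfqdn()

print("gethostname():", hostname)
print("getfqdn():    ", fqdn)
# If the two differ, a service registered under one name will not match
# database rows recorded under the other.
print("identical:    ", hostname == fqdn)
```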

Changed in nova:
status: New → Confirmed
importance: Undecided → High
milestone: none → grizzly-rc1
Revision history for this message
Russell Bryant (russellb) wrote :
Changed in nova:
importance: High → Medium
Revision history for this message
Russell Bryant (russellb) wrote :

So, this change will break live upgrades ... but that doesn't matter *too* much since live upgrades don't really work fully anyway, for other reasons.

I do see that leaving a mess in the database is not ideal, so it's worth looking into if anything can be done there. Everything *should* still work though, at least ...

Revision history for this message
Sina Sadeghi (sina-sa) wrote :

Hi Russell,

Isn't it as simple as adding a

DELETE FROM nova.services;

to the code for nova-manage db sync from Folsom db version to Grizzly db version?

Michael Still (mikal)
Changed in nova:
importance: Medium → Critical
Revision history for this message
Russell Bryant (russellb) wrote :

After talking to Michael Still about this, we may need to just completely revert the original patch. It looks like the situation is much worse than just extra entries in the services table. It's actually quite problematic for upgrades as far as we can tell. It changes the value of host, which is stored in the instances table to track what host an instance is running on. Changing that could totally break a deployment.

Revision history for this message
Michael Still (mikal) wrote :

On reflection, I think this is a really big deal. I'm pretty sure the hosts column in the instances table will now be wrong. I'm going to propose a revert of the original change, and start a conversation with CERN about how to fix their problem some other way.
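The concern about the instances table can be sketched with an in-memory SQLite database. The two tables below are heavily simplified stand-ins for nova's real schema, and the host names and instance ID are made up for the example; the point is only that a row recorded under the short name no longer joins to a service registered under the FQDN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: only the columns needed for the illustration.
conn.executescript(
    """
    CREATE TABLE services ("binary" TEXT, host TEXT);
    CREATE TABLE instances (uuid TEXT, host TEXT);
    -- After the upgrade the compute service registers under its FQDN ...
    INSERT INTO services VALUES ('nova-compute', 'compute01.example.com');
    -- ... but the instance row still carries the short name from Folsom.
    INSERT INTO instances VALUES ('inst-1', 'compute01');
    """
)

# Instances whose host no longer matches any registered compute service.
orphaned = conn.execute(
    """
    SELECT i.uuid, i.host
    FROM instances AS i
    LEFT JOIN services AS s
        ON s.host = i.host AND s."binary" = 'nova-compute'
    WHERE s.host IS NULL
    """
).fetchall()
print(orphaned)
```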

Revision history for this message
Russell Bryant (russellb) wrote :

I think the solution to the original issue is just to set the option yourself instead of relying on the default.
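As a rough illustration of "set the option yourself", the sketch below reads a host value from the [DEFAULT] section of a config file and only falls back to the system hostname when the option is absent. It uses Python's configparser purely for demonstration; nova reads its configuration through its own machinery, and resolve_host is a hypothetical name.

```python
import configparser
import socket


def resolve_host(conf_text):
    """Prefer an explicit `host` option; fall back to the short hostname."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    explicit = parser.defaults().get("host")
    return explicit if explicit else socket.gethostname()


# An operator who wants FQDN naming pins it explicitly.
pinned = resolve_host("[DEFAULT]\nhost = api01.sy3.aptira.com\n")
# Without the option, the name depends on the platform default.
fallback = resolve_host("[DEFAULT]\n")

print(pinned)
print(fallback)
```

Pinning the value keeps the service name stable across upgrades regardless of which default function a release uses.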

Michael Still (mikal)
Changed in nova:
assignee: nobody → Michael Still (mikalstill)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/24080

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/24080
Committed: http://github.com/openstack/nova/commit/c8b240d74e9d050eb8770095e22174d1653be8f3
Submitter: Jenkins
Branch: master

commit c8b240d74e9d050eb8770095e22174d1653be8f3
Author: Michael Still <email address hidden>
Date: Tue Mar 12 03:13:02 2013 +1100

    Revert changing to FQDN for hostnames.

    5dd1553cca7f7e62eebce75e1d936fc211b239ec moved to using a FQDN for
    host names. This caused bug 1151012, but Russell and I also believe
    the change will break the host column on the instances table. I can't
    do a plain old git revert for this change because a lot of the code
    has moved after this change.

    Revert to using just the hostname (no domain), and we'll revisit the
    intent of 5dd1553cca7f7e62eebce75e1d936fc211b239ec separately.

    Resolves bug 1151012.

    Change-Id: I944f46d6eb2a6944a12833ec8de7afa2b18e66e7

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
Matthew Macdonald-Wallace (proffalken) wrote :

Unfortunately this "fix" breaks any systems that have been upgraded from Folsom to Grizzly since September last year (i.e. our entire development and test platform at present!).

We have found a workaround for the time being; however, perhaps I could ask that in future database migrations be provided to allow changes like this to take place in a more managed way?

FWIW, I would respectfully suggest that getfqdn() is the "correct" way of managing data related to hostnames so that we can support multiple platforms going forward - it is far less ambiguous than gethostname() as demonstrated in the original patch that caused the above issue and the recent code reversion.

Revision history for this message
Sina Sadeghi (sina-sa) wrote :

Is it possible that we keep getfqdn() and simply handle the nova.instances.hosts (and any other appropriate columns) in the nova-manage db sync from Folsom version?
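A migration of the kind suggested here might look roughly like the following sketch, shown against an in-memory SQLite database. It is not nova's migration code: the rename_hosts helper, the simplified tables, and the short-name-to-FQDN mapping are all assumptions, and a real deployment may have further columns that store the host name.

```python
import sqlite3


def rename_hosts(conn, mapping, tables=("services", "instances")):
    """Rewrite the host column in each table from old name to new name.

    Hypothetical helper. Table names come from a fixed tuple, not user
    input, so formatting them into the statement is safe here.
    """
    for table in tables:
        for old, new in mapping.items():
            conn.execute(
                f"UPDATE {table} SET host = ? WHERE host = ?", (new, old)
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE services ("binary" TEXT, host TEXT);
    CREATE TABLE instances (uuid TEXT, host TEXT);
    INSERT INTO services VALUES ('nova-scheduler', 'api01');
    INSERT INTO instances VALUES ('inst-1', 'api01');
    """
)

rename_hosts(conn, {"api01": "api01.sy3.aptira.com"})
service_hosts = [row[0] for row in conn.execute("SELECT host FROM services")]
instance_hosts = [row[0] for row in conn.execute("SELECT host FROM instances")]
print(service_hosts, instance_hosts)
```

The mapping has to be supplied by the operator, since the database alone cannot tell which FQDN each short name should become.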

Revision history for this message
Russell Bryant (russellb) wrote :

I'm afraid that putting the change back in would break far more people than those who have already upgraded to a development version. An easy enough workaround is to just set the host option to whatever you want it to be.

Since we're so close to the Grizzly release, I think we need to leave this alone, with the same default value as Folsom. We can revisit this in Havana when there is more time to ensure that upgrade issues are fully considered.

Revision history for this message
Matthew Macdonald-Wallace (proffalken) wrote :

OK, thanks both. I'll set the config flag for now and work around it this way.

Would this need a blueprint or similar to discuss the way forward for Havana?

Revision history for this message
Michael Still (mikal) wrote :

A blueprint is probably a good idea, mostly because it might get some more eyes onto the problem.

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-rc1 → 2013.1