Bootstrapped slave is stored in nailgun with None ip address

Bug #1398048 reported by Andrey Sledzinskiy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Committed | High | Ihor Kalnytskyi |

Bug Description

{
    "build_id": "2014-11-30_11-15-26",
    "ostf_sha": "dc66fd39d4d035bb972e4c0225591290593c459d",
    "build_number": "24",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "58e5f47457a0e832c005ce350e01b75a0c01b90a",
    "production": "docker",
    "fuelmain_sha": "f324b592399c544eace2f64cb499564da01ab38c",
    "astute_sha": "1da516b88d1a8d0014d78ab0d796e5b08379a59b",
    "feature_groups": [
        "mirantis"
    ],
    "release": "6.0",
    "release_versions": {
        "2014.2-6.0": {
            "VERSION": {
                "build_id": "2014-11-30_11-15-26",
                "ostf_sha": "dc66fd39d4d035bb972e4c0225591290593c459d",
                "build_number": "24",
                "api": "1.0",
                "nailgun_sha": "58e5f47457a0e832c005ce350e01b75a0c01b90a",
                "production": "docker",
                "fuelmain_sha": "f324b592399c544eace2f64cb499564da01ab38c",
                "astute_sha": "1da516b88d1a8d0014d78ab0d796e5b08379a59b",
                "feature_groups": [
                    "mirantis"
                ],
                "release": "6.0",
                "fuellib_sha": "bbf26b499bf47ca41302ba6f62c3ebc5a493013d"
            }
        }
    },
    "fuellib_sha": "bbf26b499bf47ca41302ba6f62c3ebc5a493013d"
}

Steps:
1. Bootstrap 5 slaves
2. Create cluster - HA, flat nova network, 1 controller
3. Deploy cluster
4. Add 2 controllers
5. Re-deploy cluster
6. After that, one node (node-3) goes offline because Nailgun shows that it has no IP address:
[root@nailgun ~]# fuel nodes
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|----------|---------------------|---------|-------------|-------------------|------------|---------------|--------|---------
1 | ready | slave-01_controller | 1 | 10.108.95.3 | 64:09:9b:7f:ec:60 | controller | | True | 1
2 | ready | slave-03_controller | 1 | 10.108.95.5 | 64:4b:81:a2:78:4c | controller | | True | 1
4 | ready | slave-02_controller | 1 | 10.108.95.4 | 64:84:c3:b0:d0:05 | controller | | True | 1
5 | discover | Untitled (0e:37) | None | 10.108.95.7 | 64:82:c3:cc:0e:37 | | | True | None
3 | discover | Untitled (3b:74) | None | None | 64:c6:c4:9b:eb:a1 | | | False | None

Connecting to the node over VNC showed that it does have an IP address and is online.

Logs are attached.

Andrey Sledzinskiy (asledzinskiy) wrote :
Dima Shulyak (dshulyak) wrote :

Please also attach a screenshot from VNC showing the output of "ip a".

Dima Shulyak (dshulyak) wrote :

The node's IP in the Nailgun DB was updated to None, and after that every PUT fails with:

  File "/usr/lib/python2.6/site-packages/nailgun/api/v1/handlers/node.py", line 140, in PUT
    ip = IPAddress(node.ip)
  File "/usr/lib/python2.6/site-packages/netaddr/ip/__init__.py", line 307, in __init__
    'address from %r' % addr)
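The failure mode can be reproduced in isolation. The sketch below uses Python's standard-library ipaddress module as a stand-in for netaddr's IPAddress (an assumption: the traceback above comes from netaddr, not ipaddress), and parse_node_ip is a hypothetical helper. Like netaddr, it refuses to build an address from None, so once None is stored, every handler that parses node.ip raises before it can do anything else.

```python
import ipaddress


def parse_node_ip(raw_ip):
    """Hypothetical helper mirroring IPAddress(node.ip) in the PUT handler.

    ipaddress.ip_address is used here as a stdlib stand-in for netaddr.IPAddress.
    """
    return ipaddress.ip_address(raw_ip)


# A well-formed address parses fine.
print(parse_node_ip("10.108.95.7"))  # prints 10.108.95.7

# Once the DB holds None, every later parse fails the same way.
try:
    parse_node_ip(None)
except ValueError as exc:
    print("rejected:", exc)
```

Because the bad value is already persisted, the error repeats on every subsequent request from that node's agent rather than only once.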

Changed in fuel:
importance: Medium → High
status: New → Triaged
Dima Shulyak (dshulyak) wrote :

I think we need to make ip a required field in the JSON schema: the absence of an IP at deployment time can cause issues with network data serialization.
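A minimal sketch of what "ip required in the JSON schema" could look like. NODE_SCHEMA and validate_node_payload are hypothetical and are not Nailgun's actual schema or validator; to stay dependency-free, the check is hand-written instead of using the jsonschema library, and it only enforces the "required" list, the string type, and an address parse.

```python
import ipaddress

# Hypothetical schema fragment: "ip" is listed as required, as proposed above.
NODE_SCHEMA = {
    "type": "object",
    "properties": {
        "mac": {"type": "string"},
        "ip": {"type": "string"},
    },
    "required": ["mac", "ip"],
}


def validate_node_payload(payload, schema=NODE_SCHEMA):
    """Return a list of error messages; an empty list means the payload is valid."""
    errors = []
    # Simplification: an explicit null is treated like a missing key here;
    # real JSON Schema would report a null "ip" as a type error instead.
    for key in schema["required"]:
        if payload.get(key) is None:
            errors.append("'%s' is a required property" % key)
    for key, rules in schema["properties"].items():
        value = payload.get(key)
        if value is not None and rules["type"] == "string" and not isinstance(value, str):
            errors.append("'%s' must be a string" % key)
    # Beyond plain JSON Schema: make sure the IP actually parses.
    ip = payload.get("ip")
    if isinstance(ip, str):
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            errors.append("'ip' is not a valid address: %r" % ip)
    return errors


print(validate_node_payload({"mac": "64:82:c3:cc:0e:37", "ip": "10.108.95.7"}))  # []
print(validate_node_payload({"mac": "64:c6:c4:9b:eb:a1", "ip": None}))
```

With such a check in place, the second payload is rejected with an error list instead of being written to the DB.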

Andrey Sledzinskiy (asledzinskiy) wrote :
Dima Shulyak (dshulyak) wrote :

Thank you, I thought there was no connectivity.
The first time, the IP wasn't reported (a lag in ohai or something similar), and then the DB state became corrupted, which resulted in 500 errors on subsequent API requests from the agent on node-3.

Ihor Kalnytskyi (ikalnytskyi) wrote :

I think the IP address must be a mandatory field, and its presence should be checked by the JSON Schema. However, we have to protect ourselves from losing nodes, so I propose to always update the nodes' timestamp first and only then validate against the JSON Schema.
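The proposal above (record the agent's heartbeat unconditionally, then validate the payload) can be sketched as follows. Everything here is hypothetical: the Node record, the dict of nodes keyed by MAC, handle_agent_put, and the returned status codes are illustrative assumptions, not Nailgun's handler code. An invalid report is rejected with 400 without overwriting stored fields, yet it still refreshes the timestamp, so the node is not marked offline.

```python
import ipaddress
import time
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a Nailgun node row."""
    mac: str
    ip: str
    timestamp: float = field(default_factory=time.time)


def is_valid_report(payload):
    """Reject reports whose 'ip' is missing, null, or unparsable."""
    ip = payload.get("ip")
    if not isinstance(ip, str):
        return False
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        return False
    return True


def handle_agent_put(nodes, payload, now=None):
    """Refresh the heartbeat first, then validate; return an HTTP-like status code."""
    node = nodes.get(payload.get("mac"))
    if node is None:
        return 404
    # Step 1: always update the timestamp, so a bad report does not make the node look offline.
    node.timestamp = time.time() if now is None else now
    # Step 2: validate before touching any other stored field.
    if not is_valid_report(payload):
        return 400
    node.ip = payload["ip"]
    return 200


nodes = {"aa:bb:cc:dd:ee:ff": Node(mac="aa:bb:cc:dd:ee:ff", ip="192.0.2.10", timestamp=0.0)}
print(handle_agent_put(nodes, {"mac": "aa:bb:cc:dd:ee:ff", "ip": None}, now=100.0))  # prints 400
print(nodes["aa:bb:cc:dd:ee:ff"].ip)  # prints 192.0.2.10
```

The MAC and 192.0.2.x addresses are documentation placeholders, not values from this bug.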

BTW, ohai is too laggy; what about replacing it with a hand-written Python script?

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Igor Kalnitsky (ikalnitsky)
Mike Scherbakov (mihgen) wrote :

> BTW, ohai is too laggy
What do you mean by that?

ohai might not get an IP if the node has no IP at that moment, for instance during network reconfiguration. So adding a simple check on the Nailgun side should solve the issue.
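One possible reading of "a simple check on the Nailgun side" is to never let a missing IP overwrite a known one. The merge_agent_report helper below is hypothetical and operates on plain dicts rather than Nailgun's models: it copies reported fields onto the stored record but skips "ip" when the report carries no usable value, so a transient gap (such as during network reconfiguration) leaves the last known IP in place.

```python
def merge_agent_report(stored, report):
    """Return a new node record with the report applied, never storing an empty IP."""
    merged = dict(stored)  # copy, so the caller's record is untouched
    for key, value in report.items():
        if key == "ip" and not value:
            # ohai reported no IP (None or ""): keep the last known address.
            continue
        merged[key] = value
    return merged


stored = {"mac": "aa:bb:cc:dd:ee:ff", "ip": "192.0.2.10", "status": "discover"}
print(merge_agent_report(stored, {"ip": None, "status": "discover"})["ip"])  # prints 192.0.2.10
```

The trade-off is that the stored IP can be briefly stale; rejecting the report outright is the stricter alternative.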

How do we want to solve this issue for now?

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/138720

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/138720
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=7e124c6b58fa73e2e9facc4a5a3470b572408e30
Submitter: Jenkins
Branch: master

commit 7e124c6b58fa73e2e9facc4a5a3470b572408e30
Author: Igor Kalnitsky <email address hidden>
Date: Wed Dec 3 13:59:42 2014 +0200

    Validate HTTP PUT requests of node handler

    Currently, validation of HTTP PUT requests for node handlers is
    turned off. This leads to situations where broken data is saved in
    the database. This is now fixed.

    Change-Id: I844e7c48537d32791fc4f952a8b8e7b24bb419e1
    Closes-Bug: #1398048

Changed in fuel:
status: In Progress → Fix Committed