[nailgun] Cannot update interfaces: Cannot find interface with ip which includes to admin subnet

Bug #1304469 reported by Anastasia Palkina
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Invalid
Low
Fuel Python (Deprecated)

Bug Description

"build_id": "2014-04-08_04-17-26",
"mirantis": "yes",
"build_number": "271",
"nailgun_sha": "7a05e365240ab27c492b20585ef8ac8557102cc0",
"ostf_sha": "de0222fed646525d248dc6892eeceab139d5c469",
"fuelmain_sha": "5714f2a3cf56ebdc124e9555dfff0fe274e84bdd",
"astute_sha": "55df06b2e84fa5d71a1cc0e78dbccab5db29d968",
"release": "4.1",
"fuellib_sha": "52e7f57695f33bafa5d84d524d77f1bc3a2289b2"

1. Create new environment (CentOS, HA mode)
2. Add 3 controllers
3. Uncheck Cinder LVM on Settings tab
4. Start deployment. It was successful
5. There are many warnings in Web backend logs:

2014-04-08 12:58:28 WARNING

[7fee0cca0700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (D6:A2) (id=9, mac=16:fe:49:dd:0b:4e)"

2014-04-08 12:58:21 WARNING

[7fee0eaa3700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (2C:50) (id=10, mac=9a:92:10:1f:51:42)"

2014-04-08 12:58:18 WARNING

[7fedfe1fc700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (CA:40) (id=8, mac=5e:c5:c5:38:04:4f)"

Revision history for this message
Anastasia Palkina (apalkina) wrote :
Revision history for this message
Mike Scherbakov (mihgen) wrote :

Affects 4.1.A as well, as you can see from sha / build ID

Changed in fuel:
status: New → Confirmed
milestone: 4.1.1 → 5.0
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Set Medium priority. Let's raise it if it affects more than one deployment case.

Changed in fuel:
importance: Undecided → Medium
Revision history for this message
Miroslav Anashkin (manashkin) wrote :

We've got customer who encountered this bug as well.

Changed in fuel:
importance: Medium → High
Mike Scherbakov (mihgen)
tags: added: customer-found
Artem Roma (aroma-x)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Artem Roma (aroma-x)
Revision history for this message
Artem Roma (aroma-x) wrote :

It seems that in some cases nailgun agent sends incorrect meta to server by means that this data lacks ip address information for all node interfaces. This then leads to failure of checking interface inclusion into admin network https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/network/manager.py#L753-L768

Revision history for this message
ADMiNZ (nikolay-zadoya) wrote :

And you need to do to fix this problem?

Revision history for this message
ADMiNZ (nikolay-zadoya) wrote :

[7f1cfa1fc700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (00:3F) (id=32, mac=ee:ce:3e:02:a9:49)"

[7f1cfabfd700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (00:7E) (id=33, mac=e6:0d:a4:19:b1:41)"

[7f1d08d87700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (00:AF) (id=31, mac=b2:04:8a:6a:7b:48)"

After deploy not open OpenStack dashboard (Horizon)

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

It is just a warning in logs and does not affect anything. Decreasing level and moving to 5.1

Changed in fuel:
milestone: 5.0 → 5.1
importance: High → Low
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Provided solution is not suitable. We can remove this warning message safely.

Revision history for this message
Mike Scherbakov (mihgen) wrote :

Increasing to High. At least two external users hit it. Bug is still not triaged, and we really need to understand real reasons of this bug and affected areas.

Changed in fuel:
importance: Low → High
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 5.1 → 5.0
assignee: Artem Roma (aroma-x) → nobody
assignee: nobody → Fuel Python Team (fuel-python)
status: In Progress → Triaged
Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/92321
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=96fcd884e44541c7f675be3c5339012220a4092b
Submitter: Jenkins
Branch: master

commit 96fcd884e44541c7f675be3c5339012220a4092b
Author: Evgeniy L <email address hidden>
Date: Tue May 6 13:12:51 2014 +0400

    Decrease logging level for interfaces update

    This warning occured only after node is deployed,
    logging level was decreased because we don't
    want to confuse user.

    Change-Id: I3b80bbca36a3604584588f112c5d51058199af41
    Related-Bug: #1304469

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Evgeniy L (rustyrobot)
status: Triaged → Fix Committed
Revision history for this message
Joseph Mori (jmori) wrote : Re: Cannot update interfaces: Cannot find interface with ip which includes to admin subnet

I have the same issue and it will not let the nodes deploy. Just a few of the log entries.

2014-05-12 12:48:05 WARNING

[7f480a1fc700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (7D:BC) (id=2, mac=92:c4:4a:71:1b:4d)"

2014-05-12 12:48:04 WARNING

[7f48097fb700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (4D:BC) (id=5, mac=8e:12:d4:32:5a:40)"

2014-05-12 12:48:04 WARNING

[7f4823314700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (4F:8C) (id=1, mac=ae:a4:be:08:d3:46)"

2014-05-12 12:48:04 WARNING

[7f480abfd700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (0C:0C) (id=4, mac=1a:68:86:c5:30:4f)"

2014-05-12 12:47:26 WARNING

[7f4823314700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (0C:0C) (id=4, mac=1a:68:86:c5:30:4f)"

2014-05-12 12:47:22 WARNING

[7f48097fb700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (7D:BC) (id=2, mac=92:c4:4a:71:1b:4d)"

2014-05-12 12:47:22 WARNING

[7f4820b10700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (08:CC) (id=3, mac=96:85:20:84:a3:4a)"

2014-05-12 12:47:12 WARNING

[7f4821f12700] (manager) Cannot update interfaces: Cannot find interface with ip which includes to admin subnet "Untitled (4D:BC) (id=5, mac=8e:12:d4:32:5a:40)"

2014-05-12 12:47:04 WARNING

Revision history for this message
Andrey Danin (gcon-monolake) wrote :

These log messages mean that the agent sends a bad info about NICs. It's because a state of network interfaces is changing during a provisioning/deploing procedure. These messages may indicate two things:
1) There are no actual errors, just few more messages occur in a log. A deployment will finish fine and a cluster will work.
2) The agent sent broken data previously but Nailgun took it and placed it to DB because the data was correct from its point of view. And now Nailgun has an inconsistent data in DB and cannot apply a good data from the agent. A deployment may not finish successfully or a cluster may not become operational.

ADMiNZ, Joseph Mori, if you get the last one case I wrote above, you need to attach a diagnostic snapshot to the ticket. With out it we cannot determine what wrong with your set up and what changes should be done to fix it. Please, see this doc about an info we need to do things better https://wiki.openstack.org/wiki/Fuel/Where_to_contribute#Test_and_report_bugs

Revision history for this message
Joseph Mori (jmori) wrote :

Here is the config. The problem I have is the 2nd option where the cluster does not deploy.

{"build_id": "2014-03-05_07-31-01", "mirantis": "yes", "build_number": "235", "nailgun_sha": "f58aad317829112913f364347b14f1f0518ad371", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "73313007c0914e602246ea41fa5e8ca2dfead9f8"}

Revision history for this message
Joseph Mori (jmori) wrote :

I have 6 nodes. 1-Fuel 2,3-Controllers 4-Controller/Storage 5,6-Computer
All Dual Xeon with 20GB RAM and 2 nics. 1 nics is on the switch (managed) with a vlan. The 2nd nic is on a separate unmanaged switch. I have one more server her I will be adding to the group as an additional storage node but not on the rack as yet.

Revision history for this message
Joseph Mori (jmori) wrote :

Diagnostic Snapshot enclosed

Revision history for this message
Joseph Mori (jmori) wrote :

Tried disconnecting the 2nd NIC and redeploy. Still will not deploy.

I also get this error:
2014-05-14 18:34:37 WARNING

[7f4821511700] (deployment_serializers) Invalid vlan_splinters value: None

Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #213

"build_id": "2014-05-20_01-10-31",
"mirantis": "yes",
"build_number": "213",
"ostf_sha": "353f918197ec53a00127fd28b9151f248a2a2d30",
"nailgun_sha": "ab7f7dfddadfe0e08a39693c6d33aa0250f20142",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "68c62519bc788fd8ff27e4576a6cdf7e7fac14c0",
"astute_sha": "a3432e6e31ffd6f1c56386b2eb54afeacb74750b",
"release": "5.0",
"fuellib_sha": "3d92142a5643af82596f0450e39282550a45e5db"

2014-05-20 11:04:35.533 WARNING [7fed0cdfa700] (node) Failed to update interfaces for node 'Untitled (fc:bd)' - invalid info in meta: Cannot find interface with ip which includes to admin subnet "Untitled (fc:bd) (id=6, mac=62:40:35:31:ad:46)"
2014-05-20 11:04:35.545 WARNING [7fed0cdfa700] (node) Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/nailgun/objects/node.py", line 226, in update_interfaces
    network_manager.check_interfaces_correctness(instance)
  File "/usr/lib/python2.6/site-packages/nailgun/network/manager.py", line 767, in check_interfaces_correctness
    'includes to admin subnet "%s"' % node.full_name)
InvalidInterfacesInfo: Cannot find interface with ip which includes to admin subnet "Untitled (fc:bd) (id=6, mac=62:40:35:31:ad:46)"

Revision history for this message
Anastasia Palkina (apalkina) wrote :
Changed in fuel:
status: Fix Committed → Confirmed
Changed in fuel:
assignee: Evgeniy L (rustyrobot) → Aleksey Kasatkin (alekseyk-ru)
status: Confirmed → In Progress
Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :

This thing doesn't lead to deployment problems as I can see from logs presented here. We can lower the importance of the message to INFO to not confuse users. It's not high-priority for 5.0.

Changed in fuel:
milestone: 5.0 → 5.1
status: In Progress → Confirmed
Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :

Problems with deployment observed in logs provided by Joseph Mori is caused by another reason. There is the same error everywhere in that logs that lead to fail of controller deployment:

2014-05-09T18:57:20 debug: [8859] 402da9e3-6a73-4014-a3d7-80ce8bab9c08: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"1", :statuscode=>0, :statusmsg=>"OK", :data=>{:status=>"stopped", :runtime=>8, :changes=>{"total"=>818}, :err_msg=>"Process not running but not empty lockfile is present. Trying to remove lockfile...ok.", :enabled=>1, :events=>{"failure"=>8, "total"=>826, "success"=>818}, :version=>{"config"=>1399660419, "puppet"=>"2.7.23"}, :idling=>0, :output=>"Currently stopped; last completed run 8 seconds ago", :stopped=>1, :time=>{"nova_paste_api_ini"=>0.060911, "cinder_config"=>0.063597, "glance_registry_config"=>0.04679, "sysctl"=>0.007922, "ring_container_device"=>5.166411, "service"=>119.635331, "cs_property"=>19.247938, "anchor"=>0.008439, "rabbitmq_user"=>2.576989, "file"=>1.283383, "file_line"=>0.001503, "rabbitmq_user_permissions"=>0.377321, "l3_if_downup"=>0.001907, "config_retrieval"=>37.280536, "package"=>633.97545, "l2_ovs_bridge"=>4.464839, "nova_config"=>0.147581, "cron"=>0.100862, "cs_shadow"=>10.925087, "l2_ovs_patch"=>1.433786, "last_run"=>1399661832, "ceilometer_config"=>0.30722, "glance_api_config"=>0.055452, "ring_object_device"=>3.242823, "firewall"=>16.83787, "neutron_dhcp_agent_config"=>0.054695, "rabbitmq_vhost"=>0.17639, "ring_devices"=>0.000317, "database"=>1.691244, "install_ssh_keys"=>0.011396, "cinder_api_paste_ini"=>0.027768, "l2_ovs_port"=>1.324343, "total"=>1008.071975, "cs_commit"=>8.578894, "database_grant"=>23.171662, "database_user"=>7.358681, "ring_account_device"=>3.249833, "cs_resource"=>18.846675, "heat_config"=>0.744471, "glance_cache_config"=>0.02279, "group"=>0.325504, "user"=>1.610422, "host"=>0.0134, "filebucket"=>0.000166, "exec"=>83.583636, "a2mod"=>0.000576, "notify"=>0.011234, "ceph_conf"=>0.01793}, :running=>0, :resources=>{"changed"=>795, "total"=>1465, "skipped"=>549, "scheduled"=>0, "failed_to_restart"=>1, "restarted"=>61, "failed"=>8, "out_of_sync"=>803}, :lastrun=>1399661832}}

err_msg=>"Process not running but not empty lockfile is present. Trying to remove lockfile...ok."

After 3rd try deployment is stopped.
This situation is repeated for about 10 times in that logs.

Changed in fuel:
assignee: Aleksey Kasatkin (alekseyk-ru) → Fuel Python Team (fuel-python)
Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :

New ticked is reported on "Process not running but not empty lockfile is present" problem:
https://bugs.launchpad.net/fuel/+bug/1321833

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
importance: High → Medium
Revision history for this message
Dima Shulyak (dshulyak) wrote :
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Igor Kalnitsky (ikalnitsky)
status: Confirmed → In Progress
Revision history for this message
Mike Scherbakov (mihgen) wrote :

Many users hit this bug, and it's always unclear how severe it is to see such warnings in logs, so I'm raising this to High. There is also email about it in ML: https://lists.launchpad.net/fuel-dev/msg00955.html

Changed in fuel:
importance: Medium → High
Changed in fuel:
assignee: Igor Kalnitsky (ikalnitsky) → nobody
assignee: nobody → Fuel Python Team (fuel-python)
Changed in fuel:
status: In Progress → Confirmed
Dmitry Ilyin (idv1985)
summary: - Cannot update interfaces: Cannot find interface with ip which includes
- to admin subnet
+ [nailgun] Cannot update interfaces: Cannot find interface with ip which
+ includes to admin subnet
Revision history for this message
Dima Shulyak (dshulyak) wrote :

Reproduced it on latest iso, will provide update soon

Changed in fuel:
status: Confirmed → Triaged
assignee: Fuel Python Team (fuel-python) → Dima Shulyak (dshulyak)
Revision history for this message
Dima Shulyak (dshulyak) wrote :

2014-07-16 10:53:15.746 DEBUG [7faf4535a740] (logger) Request PUT /api/nodes/agent/ from 10.108.25.1:56371 {"manufacturer":"QEMU","os_platform":"centos","mac":"56:00:01:11:28:48","is_agent":true,"agent_checksum":"527b47116db325d4f0e58dcc805b14b76555508a","platform_name":"Standard PC (i440FX + PIIX, 1996)","meta":{"disks":[{"model":null,"disk":"disk/by-path/pci-0000:00:0a.0-virtio-pci-virtio7","removable":"0","size":53687091200,"extra":[],"name":"vdc"},{"model":null,"disk":"disk/by-path/pci-0000:00:09.0-virtio-pci-virtio6","removable":"0","size":53687091200,"extra":[],"name":"vdb"},{"model":null,"disk":"disk/by-path/pci-0000:00:08.0-virtio-pci-virtio5","removable":"0","size":53687091200,"extra":[],"name":"vda"}],"memory":{"devices":[{"type":"RAM","size":1610612736}],"maximum_capacity":1610612736,"slots":1,"total":1610612736},"cpu":{"total":1,"real":0,"spec":[{"model":"Intel Xeon E312xx (Sandy Bridge)","frequency":3500}]},"system":{"manufacturer":"QEMU","version":"pc-i440fx-trusty","fqdn":"node-1.test.domain.local","product":"Standard PC (i440FX + PIIX, 1996)"},"interfaces":[{"mac":"64:B2:85:E2:1F:1A","state":"up","current_speed":null,"name":"eth4"},{"mac":"64:1B:B5:93:51:BB","state":"up","current_speed":null,"name":"eth3"},{"mac":"64:8E:1D:61:B8:F9","state":"up","current_speed":null,"name":"eth2"},{"mac":"64:C7:3A:D0:67:0D","state":"up","current_speed":null,"name":"eth1"},{"mac":"64:15:A2:19:53:06","state":"up","current_speed":null,"name":"eth0"}]},"ip":"10.108.26.3"}
2014-07-16 10:53:15.790 DEBUG [7faf4535a740] (logger) Response code '200 OK' for PUT /api/nodes/agent/ from 10.108.25.1:56371

Revision history for this message
Dima Shulyak (dshulyak) wrote :

Interesting part is that source of the traffic is 10.108.25.1, which was default gateway before br-ex was configured.
And mac is 56:00:01:11:28:48 - br-ex mac address.

Revision history for this message
Dima Shulyak (dshulyak) wrote :

Looks like this is happening because of some lag on br-fw-admin bridge configuration.
To protect nailgun against such issues we shouldn't update interfaces data when "ip":"10.108.26.3" dont belong to admin subnet.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/107412

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/107412
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=8cfff5799dd1c5e476808f1e6af768cfbc484574
Submitter: Jenkins
Branch: master

commit 8cfff5799dd1c5e476808f1e6af768cfbc484574
Author: Dima Shulyak <email address hidden>
Date: Wed Jul 16 17:43:59 2014 +0300

    Do not update node info when network data is corrupted

    Deployment process introduces a lot of disruptive changes
    in system configuration which can lead to undesirable state in
    nailgun db

    As solution - update only timestamp when network data is incorrect

    This fix will protect nailgun against wrong network data stored in db
    that maybe generated during process of network configuration on slave

    See latest comment on bug for such case description

    Change-Id: I2d67f851cf0e99f04b50611787e4872193212540
    Closes-Bug: #1304469

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #11

"build_id": "2014-09-17_21-40-34",
"ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
"build_number": "11",
"auth_required": true,
"api": "1.0",
"nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d",
"production": "docker",
"fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd",
"astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
"feature_groups": ["mirantis"],
"release": "5.1",
"release_versions": {"2014.1.1-5.1": {"VERSION": {"build_id": "2014-09-17_21-40-34", "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346", "build_number": "11", "api": "1.0", "nailgun_sha": "eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d", "production": "docker", "fuelmain_sha": "8ef433e939425eabd1034c0b70e90bdf888b69fd", "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"}}}, "fuellib_sha": "d9b16846e54f76c8ebe7764d2b5b8231d6b25079"

Warnings http://paste.openstack.org/show/114090/

Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Dima Shulyak (dshulyak) wrote :

This is not a bug, such verification was added to prevent information in database to be updated if it is in wrong format, what is possible during cluster reconfiguration. Warning log is misleading and lets consider changing it to debug in future.

Changed in fuel:
importance: High → Low
milestone: 5.1 → 6.0
assignee: Dima Shulyak (dshulyak) → Fuel Python Team (fuel-python)
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Log level is debug now

Changed in fuel:
status: Confirmed → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.