Broken cluster after plugin installation with invalid data

Bug #1484181 reported by Egor Kotko on 2015-08-12
This bug affects 2 people
Affects             Importance  Assigned to
Fuel for OpenStack  High        Ihor Kalnytskyi
7.0.x               High        Ihor Kalnytskyi

Bug Description

Steps to reproduce:
1. Install plugin builder (http://paste.openstack.org/show/394727/)
2. Create plugin: fpb --create vip_reservation_plugin
3. Copy configs (metadata.yaml, network_roles.yaml, tasks.yaml) into build dir.
    -metadata.yaml
      http://paste.openstack.org/show/412722/
    -network_roles.yaml
      http://paste.openstack.org/show/412720/ -- all possible networks in the cluster were added here
    -tasks.yaml
      http://paste.openstack.org/show/412721/
4. Build plugin: fpb --build <path_to_plugin>
5. Install plugin: fuel plugins --install <path_to_plugin>
6. Create new cluster, go on tab "Settings", find section with the name of your plugin, switch it on
7. Deploy cluster

Expected result:
Cluster works ok

Actual:
Cluster becomes unavailable via Web

With this variant of network_roles.yaml it works fine:
http://paste.openstack.org/show/412728/

Egor Kotko (ykotko) wrote :
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Liubov Efremova (lefremova)
summary: - Broken cluster after plugin installation with non valid data
+ Broken cluster after plugin installation with invalid data
Changed in fuel:
status: New → Confirmed
Egor Kotko (ykotko) wrote :

Also reproduced when the number of reserved VIPs in network_roles.yaml (http://paste.openstack.org/show/414041/)
is bigger than the number of available public addresses in the pool.

{"build_id": "2015-08-12_17-24-26", "build_number": "165", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "2015-08-12_17-24-26", "build_number": "165", "api": "1.0", "fuel-library_sha": "1176b634eeafb8465a88ff357fdcf40005fba610", "nailgun_sha": "68642a8207d6f12543f244bab0c130e2510536ee", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327", "production": "docker", "python-fuelclient_sha": "26fc025e0fc5791b62e5ed8561a6016bf8a406bc", "astute_sha": "e1d3a435e5df5b40cbfb1a3acf80b4176d15a2dc", "fuel-ostf_sha": "58220583f10fa47f12291488ef77854809c68310", "release": "7.0", "fuelmain_sha": "67e5214c0dc5d4ba6da4ae651cef9934800459a9"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "1176b634eeafb8465a88ff357fdcf40005fba610", "nailgun_sha": "68642a8207d6f12543f244bab0c130e2510536ee", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327", "production": "docker", "python-fuelclient_sha": "26fc025e0fc5791b62e5ed8561a6016bf8a406bc", "astute_sha": "e1d3a435e5df5b40cbfb1a3acf80b4176d15a2dc", "fuel-ostf_sha": "58220583f10fa47f12291488ef77854809c68310", "release": "7.0", "fuelmain_sha": "67e5214c0dc5d4ba6da4ae651cef9934800459a9"}

Changed in fuel:
status: Confirmed → In Progress
Egor Kotko (ykotko) wrote :

Also reproduced when several "-id" & "-name" values are the same: http://paste.openstack.org/show/414294/

{"build_id": "2015-08-12_17-24-26", "build_number": "165", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "2015-08-12_17-24-26", "build_number": "165", "api": "1.0", "fuel-library_sha": "1176b634eeafb8465a88ff357fdcf40005fba610", "nailgun_sha": "68642a8207d6f12543f244bab0c130e2510536ee", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327", "production": "docker", "python-fuelclient_sha": "26fc025e0fc5791b62e5ed8561a6016bf8a406bc", "astute_sha": "e1d3a435e5df5b40cbfb1a3acf80b4176d15a2dc", "fuel-ostf_sha": "58220583f10fa47f12291488ef77854809c68310", "release": "7.0", "fuelmain_sha": "67e5214c0dc5d4ba6da4ae651cef9934800459a9"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "1176b634eeafb8465a88ff357fdcf40005fba610", "nailgun_sha": "68642a8207d6f12543f244bab0c130e2510536ee", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327", "production": "docker", "python-fuelclient_sha": "26fc025e0fc5791b62e5ed8561a6016bf8a406bc", "astute_sha": "e1d3a435e5df5b40cbfb1a3acf80b4176d15a2dc", "fuel-ostf_sha": "58220583f10fa47f12291488ef77854809c68310", "release": "7.0", "fuelmain_sha": "67e5214c0dc5d4ba6da4ae651cef9934800459a9"}

Egor Kotko (ykotko) wrote :
tags: added: non-release
tags: removed: non-release
Andrew Maksimov (maximov) wrote :

Liubov, the status "In Progress" should be set only when you submit a patch for review. It may look strange, but this is our current process.

Changed in fuel:
status: In Progress → Confirmed
Aleksey Kasatkin (alekseyk-ru) wrote :

So, we need to check:
1. Networks have sufficient address space for VIPs (and hosts).
2. Names of VIPs are unique.
This should be checked before IP (VIP) allocation.
VIPs are allocated when the network configuration is requested,
so the check is required there as well (not only before deployment).
AFAIC, it can be included in the networking settings validator or the CheckNetworks class.
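
A minimal sketch of checks 1 and 2, assuming VIP names arrive as a plain list and the network pool as inclusive (first, last) IP ranges. The function names and input shapes are hypothetical and are not Nailgun's actual API; the sketch only illustrates the kind of validation discussed here.

```python
import ipaddress
from collections import Counter


def find_duplicate_vip_names(vip_names):
    """Return the VIP names that occur more than once, sorted."""
    counts = Counter(vip_names)
    return sorted(name for name, n in counts.items() if n > 1)


def count_addresses(ip_ranges):
    """Count the addresses in a list of inclusive (first, last) ranges."""
    total = 0
    for first, last in ip_ranges:
        start = int(ipaddress.ip_address(first))
        end = int(ipaddress.ip_address(last))
        total += end - start + 1
    return total


def check_vips(ip_ranges, vip_names, host_count):
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    duplicates = find_duplicate_vip_names(vip_names)
    if duplicates:
        errors.append("Duplicate VIP names: " + ", ".join(duplicates))
    available = count_addresses(ip_ranges)
    # Every requested VIP and every host needs its own address.
    required = len(vip_names) + host_count
    if required > available:
        errors.append(
            "Need {} addresses (VIPs and hosts) but only {} are available"
            .format(required, available)
        )
    return errors
```

Returning a list of messages instead of raising on the first problem lets the caller report all invalid plugin data at once.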

Liubov Efremova (lefremova) wrote :

Also we should check that:
3. Network roles have unique ids (for 7.0 it should be so, but for 8.0 this limitation will be removed)
4. Network roles have the correct values of default mapping (any default network except the private network)
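
Checks 3 and 4 could be sketched as follows. This assumes each network role is a dict with "id" and "default_mapping" keys, as in the plugin's network_roles.yaml, and that the caller supplies the set of networks a role may be mapped to. The function name and the allowed set are hypothetical.

```python
from collections import Counter


def check_network_roles(roles, allowed_networks):
    """Return a list of validation errors for a list of network roles."""
    errors = []
    counts = Counter(role["id"] for role in roles)
    for role_id in sorted(rid for rid, n in counts.items() if n > 1):
        errors.append("Duplicate network role id: {}".format(role_id))
    for role in roles:
        mapping = role.get("default_mapping")
        # Which networks are allowed is decided by the caller (assumption).
        if mapping not in allowed_networks:
            errors.append(
                "Role {} maps to unknown network {}".format(role["id"], mapping)
            )
    return errors
```

Passing the allowed networks in as a parameter keeps the policy (for example, whether the private network is acceptable) out of the check itself.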

Now, if we have invalid data in any of these four cases, we get a Server Error during deployment (during serialization in the execute() method of the ApplyChangesTaskManager class).

So, maybe we should add some validation before the call to the execute() method?

And what is the expected result? Should we get cluster in error state (as after fail in check_before_deployment_task) or not?

If yes, I can offer a quick fix that simply handles exceptions during serialization and returns the task in the error state with the corresponding failure message.
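
A rough sketch of that quick fix, with a stand-in Task class and a serialize callable passed in as a parameter. Both are hypothetical and only illustrate the control flow; they are not the real ApplyChangesTaskManager code.

```python
class Task:
    """Stand-in for a deployment task record (hypothetical)."""

    def __init__(self):
        self.status = "pending"
        self.message = ""
        self.payload = None


def execute_with_error_state(serialize, cluster):
    """Serialize the cluster; on failure return the task in error state."""
    task = Task()
    try:
        task.payload = serialize(cluster)
    except Exception as exc:
        # Turn the failure into a task error instead of a Server Error.
        task.status = "error"
        task.message = "Serialization failed: {}".format(exc)
    else:
        task.status = "running"
    return task
```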

If not, I suppose that validation should be provided in the BaseDefferedTaskValidator class.
Unfortunately, in this case I have some ideas and attempts, but I don't have enough experience and cannot promise that I can implement this in a few days. So if any of you feels you could implement this validation faster, please reassign the bug. I can share what I know about this question.

Aleksey Kasatkin (alekseyk-ru) wrote :

Liubov, what about checking 3 and 4 in check_before_deployment_task?
AFAIC, it could be good enough, if it does not lead to any problems.

I'm just not clear why you write "(any default network except the private network)" in 4. Some network roles are mapped to the private network as well.

Liubov Efremova (lefremova) wrote :

As I understood, the private network has no IP addresses (in 4).

check_before_deployment_task is not the right place, because the error occurs earlier (during serialization), before this task runs.

Aleksey Kasatkin (alekseyk-ru) wrote :

The private network has no IPs in VLAN mode; it has IPs in GRE/TUN mode. But network roles can still be mapped to it. VIPs cannot be created if no IPs are allowed, though.

Aleksey Kasatkin (alekseyk-ru) wrote :

And VIPs cannot be created for admin network as it uses DHCP and is shared between clusters.

tags: added: module-nailgun
removed: feature-plugins
tags: added: tricky
Dmitry Pyzhov (dpyzhov) wrote :

Tricky bug about validation of plugin data. Does not affect usual workflow. Moving to 8.0

Changed in fuel:
milestone: 7.0 → 8.0
tags: added: feature-validation
Changed in fuel:
assignee: Liubov Efremova (lefremova) → Fuel Python Team (fuel-python)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Igor Kalnitsky (ikalnitsky)

Fix proposed to branch: master
Review: https://review.openstack.org/226844

Changed in fuel:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/226844
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=cdf7c774d5f8cab7e6640a613e8ffecbd151a982
Submitter: Jenkins
Branch: master

commit cdf7c774d5f8cab7e6640a613e8ffecbd151a982
Author: Igor Kalnitsky <email address hidden>
Date: Wed Sep 23 15:06:04 2015 +0300

    Fix assign_vip for default admin network case

    Default fuelweb_admin doesn't belong to any node group, so we can't use
    the same SQL query to this network. So let's fall back to default admin
    network in case there's no admin network in controller's node group.

    Closes-Bug: #1484181

    Change-Id: I93034d9555b0851cf9c3d525cd7118de5a089dc3
    Signed-off-by: Igor Kalnitsky <email address hidden>
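
The fallback described in the commit message can be illustrated with a small sketch. It assumes networks are plain dicts with "name" and "group_id" keys and that the default admin network has a group_id of None. This illustrates the idea only and is not the committed code, which works with an SQL query.

```python
def find_admin_network(networks, group_id):
    """Return the admin network for a node group, or the default one."""
    admin = [net for net in networks if net["name"] == "fuelweb_admin"]
    # First try the admin network that belongs to the given node group.
    for net in admin:
        if net["group_id"] == group_id:
            return net
    # Fall back to the default admin network, which has no node group.
    for net in admin:
        if net["group_id"] is None:
            return net
    return None
```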

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Vitaly Sedelnik (vsedelnik) wrote :

Igor, please review the fix and backport to 7.0 if applicable. If not, please update the status accordingly (Won't Fix or Invalid).

Can't reproduce the bug on 8.0 build 108.

Steps to reproduce:
  ...
  6. Create new cluster, go on tab "Settings", find section with the name of your plugin, switch it on:

No section "vip_reservation_plugin" on tab page "OpenStack Settings"

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "108"
  build_id: "108"

[root@nailgun ~]# fuel plugins
id | name                   | version | package_version
---|------------------------|---------|----------------
1  | vip_reservation_plugin | 3.0.0   | 3.0.0

Confirmed on fuel-kilo-8.0-140-2015-10-09_22-23-00.iso:

Steps to reproduce:
  ...
  3a. Change version '7.0' to '8.0' for fuel-8.0:
  sed -i 's/7\.0/8.0/' vip_reservation_plugin/metadata.yaml
  ...
  6. Create new cluster, go on tab "Settings", find section with the name of your plugin, switch it on
  6a. Select tab "Nodes", check "All nodes", click [Configure interfaces]:
  Note: Configure interfaces dialog doesn't open
  6b. Select any other tab page (e.g. "Networks"):

Results:
Cluster unavailable via Web

---------------
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "140"

Reviewed: https://review.openstack.org/232555
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=caf5d0e45417a6f4206921178a8947664438fc2b
Submitter: Jenkins
Branch: stable/7.0

commit caf5d0e45417a6f4206921178a8947664438fc2b
Author: Igor Kalnitsky <email address hidden>
Date: Wed Sep 23 15:06:04 2015 +0300

    Fix assign_vip for default admin network case

    Default fuelweb_admin doesn't belong to any node group, so we can't use
    the same SQL query to this network. So let's fall back to default admin
    network in case there's no admin network in controller's node group.

    Closes-Bug: #1484181

    Change-Id: I93034d9555b0851cf9c3d525cd7118de5a089dc3
    Signed-off-by: Igor Kalnitsky <email address hidden>

tags: removed: on-verification
tags: added: on-verification
Dmitry Pyzhov (dpyzhov) on 2015-10-22
tags: added: area-python

Verified on
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "171"
  build_id: "171"

Fix Released

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification

verified on 7.0-301 with MU

tags: added: 7mu1-verified