LCM cannot be operated once a server failure occurs

Bug #1924917 reported by Toshiaki Takahashi
This bug affects 1 person
Affects | Status       | Importance | Assigned to    | Milestone
tacker  | Fix Released | High       | Hiromu Asahina |

Bug Description

If the Tacker servers stop while an LCM operation is running, the operation state stays stuck at "PROCESSING" even after the servers restart. Once in this situation, no error-handling operation (e.g., retry, rollback, or fail) can be performed on that LCM operation.

[operation log]
stack@instance-2:~/devstack$ openstack vnflcm instantiate fbe894ad-acb6-4ff2-bb48-c61ddfd7e643 ~/work/package/param.json
Instantiate request for VNF Instance fbe894ad-acb6-4ff2-bb48-c61ddfd7e643 has been accepted.

stack@instance-2:~/work/package/sample$ pkill -SIGINT -f tacker # assuming system error etc.
stack@instance-2:~/work/package/sample$ ps aux | grep tacker
stack 655957 0.0 0.0 5192 2424 pts/3 S+ 14:06 0:00 grep --color=auto tacker

stack@instance-2:~/work/package/sample$ sudo systemctl restart devstack@tacker-conductor devstack@tacker
stack@instance-2:~/work/package/sample$ ps aux | grep tacker
stack 659040 92.5 0.3 167840 117736 ? Rs 14:07 0:01 /usr/bin/python3.8 /usr/local/bin/tacker-conductor --config-file /etc/tacker/tacker.conf
stack 659042 92.5 0.3 166484 115784 ? Rs 14:07 0:01 /usr/bin/python3.8 /usr/local/bin/tacker-server --config-file /etc/tacker/tacker.conf
stack 659287 0.0 0.0 5192 2536 pts/3 S+ 14:07 0:00 grep --color=auto tacker

stack@instance-2:~/devstack$ date; openstack vnflcm op list
Sun Apr 18 14:07:40 UTC 2021
+--------------------------------------+-----------------+--------------------------------------+-------------+
| ID                                   | Operation State | VNF Instance ID                      | Operation   |
+--------------------------------------+-----------------+--------------------------------------+-------------+
| fbdb864f-5dbe-435f-92ea-29b52de5b731 | PROCESSING      | fbe894ad-acb6-4ff2-bb48-c61ddfd7e643 | INSTANTIATE |
+--------------------------------------+-----------------+--------------------------------------+-------------+

stack@instance-2:~/devstack$ date; openstack vnflcm op list
Sun Apr 18 14:37:50 UTC 2021
+--------------------------------------+-----------------+--------------------------------------+-------------+
| ID                                   | Operation State | VNF Instance ID                      | Operation   |
+--------------------------------------+-----------------+--------------------------------------+-------------+
| fbdb864f-5dbe-435f-92ea-29b52de5b731 | PROCESSING      | fbe894ad-acb6-4ff2-bb48-c61ddfd7e643 | INSTANTIATE |
+--------------------------------------+-----------------+--------------------------------------+-------------+
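The symptom in the log is that the same operation occurrence is still PROCESSING thirty minutes after the restart. As a rough diagnostic, a sketch like the following compares two captures of the `openstack vnflcm op list` table and reports occurrences that were PROCESSING in both. The helpers `parse_op_list` and `stuck_operations` are hypothetical (not part of Tacker or python-tackerclient) and assume the table layout shown above. A legitimately long-running operation would also be flagged, so treat the result as a hint rather than proof.

```python
from typing import Dict, List, Set


def parse_op_list(table: str) -> List[Dict[str, str]]:
    """Parse the ASCII table printed by `openstack vnflcm op list` into row dicts."""
    # Keep only content rows ("| ... |"); border rows start with "+".
    rows = [line.strip() for line in table.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []
    # Drop the outer bars, split on the inner ones and trim the cell padding.
    cells = [[cell.strip() for cell in row.strip("|").split("|")] for row in rows]
    header, body = cells[0], cells[1:]
    return [dict(zip(header, row)) for row in body]


def stuck_operations(before: str, after: str) -> List[str]:
    """IDs that are PROCESSING in both snapshots (a heuristic, not proof)."""

    def processing_ids(table: str) -> Set[str]:
        return {
            row["ID"]
            for row in parse_op_list(table)
            if row.get("Operation State") == "PROCESSING"
        }

    return sorted(processing_ids(before) & processing_ids(after))
```

Feeding it the two captures above (14:07:40 and 14:37:50) returns `["fbdb864f-5dbe-435f-92ea-29b52de5b731"]`.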

Yasufumi Ogawa (yasufum)
Changed in tacker:
importance: Undecided → High
Yasufumi Ogawa (yasufum)
Changed in tacker:
assignee: nobody → Yasufumi Ogawa (yasufum)
Changed in tacker:
assignee: Yasufumi Ogawa (yasufum) → Hiromu Asahina (h-asahina)
Changed in tacker:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix proposed to tacker (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tacker/+/819275

OpenStack Infra (hudson-openstack) wrote: Change abandoned on tacker (master)

Change abandoned by "Hiromu Asahina <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tacker/+/819275
Reason: duplicated

renu rani (renur) wrote:

Does this bug affect all of these LCM operations?
1- VNF instantiate
2- VNF heal
3- VNF terminate
4- VNF delete

Hiromu Asahina (h-asahina) wrote:

Yes.
Any operation that transitions to the PROCESSING state [1] can face this bug.

[1]: https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.05.01_60/gs_nfv-sol003v030501p.pdf#page=162
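The answer above points at the SOL003 LCM operation state model. As we read that model (simplified below; the names `OpState`, `TRANSITIONS` and `can_handle_error` are ours, not Tacker's), the error-handling tasks retry, rollback and fail can only start from FAILED_TEMP, while PROCESSING is normally left only when the worker executing the operation finishes or reports an error. If that worker died with the server, nothing moves the occurrence out of PROCESSING, which is why every operation passing through that state is exposed.

```python
from enum import Enum
from typing import Dict, FrozenSet


class OpState(Enum):
    """LCM operation occurrence states named in ETSI NFV SOL003."""
    STARTING = "STARTING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED_TEMP = "FAILED_TEMP"
    FAILED = "FAILED"
    ROLLING_BACK = "ROLLING_BACK"
    ROLLED_BACK = "ROLLED_BACK"


# Simplified reading of the SOL003 state diagram (illustrative, not Tacker code).
TRANSITIONS: Dict[OpState, FrozenSet[OpState]] = {
    OpState.STARTING: frozenset({OpState.PROCESSING, OpState.ROLLED_BACK}),
    # Without a cancel request, only the running worker moves PROCESSING
    # forward: to COMPLETED on success or to FAILED_TEMP on error.
    OpState.PROCESSING: frozenset({OpState.COMPLETED, OpState.FAILED_TEMP}),
    # FAILED_TEMP is where retry (-> PROCESSING), rollback and fail start.
    OpState.FAILED_TEMP: frozenset(
        {OpState.PROCESSING, OpState.ROLLING_BACK, OpState.FAILED}
    ),
    OpState.ROLLING_BACK: frozenset({OpState.ROLLED_BACK, OpState.FAILED_TEMP}),
    OpState.COMPLETED: frozenset(),
    OpState.FAILED: frozenset(),
    OpState.ROLLED_BACK: frozenset(),
}

# The error-handling tasks all require FAILED_TEMP as the starting state.
ERROR_HANDLING_TASKS = ("retry", "rollback", "fail")


def can_handle_error(task: str, state: OpState) -> bool:
    """True if the error-handling `task` may be invoked in `state`."""
    return task in ERROR_HANDLING_TASKS and state is OpState.FAILED_TEMP
```

With this model, `can_handle_error("retry", OpState.PROCESSING)` is `False`, which matches the dead end described in the bug report.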

OpenStack Infra (hudson-openstack) wrote: Fix merged to tacker (master)

Reviewed: https://review.opendev.org/c/openstack/tacker/+/815416
Committed: https://opendev.org/openstack/tacker/commit/d40de6c71ecf9d5d31d31ddc94b4ae8ad40f2ca8
Submitter: "Zuul (22348)"
Branch: master

commit d40de6c71ecf9d5d31d31ddc94b4ae8ad40f2ca8
Author: Hiromu Asahina <email address hidden>
Date: Fri Oct 29 23:02:03 2021 +0900

    Fix LCM failure

    In the current implementation, if the tacker servers go down during an
    LCM operation, the operation state remains stuck at `PROCESSING` even
    after the servers restart. Once this problem happens, users cannot change
    the operation state through the APIs, which means they have to change the
    state by updating the DB directly.

    This patch fixes this problem by adding a `/vnf_lcm_op_occs/{id}/cancel`
    endpoint according to ETSI NFV SOL003 [1]. Users can change the state of
    LCM operations in `PROCESSING` to `FAILED_TEMP` with this API.

    Note that the cancel API currently supports only the transition
    PROCESSING -> FAILED_TEMP (i.e., transitions from ROLLING_BACK and
    STARTING are not supported).

    In addition, as the current ``_get_affected_resources`` doesn't work
    correctly when there are no updated resources, this patch modifies
    ``utils.py`` [2] to fix it and updates the tests [3] that depend on
    ``utils.py``.

    [1] https://www.etsi.org/deliver/etsi_gs/NFV-SOL/001_099/003/03.05.01_60/gs_NFV-SOL003v030501p.pdf
    [2] tacker/vnflcm/utils.py
    [3] tacker/tests/unit/conductor/test_conductor_server.py

    Change-Id: I54aa967be7903c064433f01adf0f99074577a8da
    Closes-bug: #1924917
    Signed-off-by: Hiromu Asahina <email address hidden>
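The cancel endpoint added by this patch can be exercised with a plain HTTP client. The sketch below only builds the request with Python's standard `urllib.request` and does not send it. The endpoint `http://localhost:9890`, the `/vnflcm/v1` path prefix, the `X-Auth-Token` header and the helper name `build_cancel_request` are assumptions for illustration; the `cancelMode` body (GRACEFUL or FORCEFUL) follows the SOL003 CancelMode data type. The local state check mirrors the restriction stated in the commit message: only PROCESSING occurrences can be cancelled.

```python
import json
import urllib.request

# Assumed endpoint; the real Tacker API address depends on the deployment.
TACKER_ENDPOINT = "http://localhost:9890"


def build_cancel_request(op_occ_id: str, current_state: str, token: str,
                         mode: str = "GRACEFUL",
                         endpoint: str = TACKER_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a cancel request for a stuck LCM op occurrence."""
    # The patch only supports PROCESSING -> FAILED_TEMP, so refuse other states early.
    if current_state != "PROCESSING":
        raise ValueError(f"cancel is only supported from PROCESSING, not {current_state}")
    # SOL003 CancelModeType values.
    if mode not in ("GRACEFUL", "FORCEFUL"):
        raise ValueError(f"unknown cancelMode: {mode}")
    # Hypothetical path layout: <endpoint>/vnflcm/v1/vnf_lcm_op_occs/{id}/cancel
    url = f"{endpoint}/vnflcm/v1/vnf_lcm_op_occs/{op_occ_id}/cancel"
    body = json.dumps({"cancelMode": mode}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` needs a valid Keystone token. Once the occurrence has moved to FAILED_TEMP, the usual retry, rollback, or fail operations become available again.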

Changed in tacker:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/tacker 7.0.0.0rc1

This issue was fixed in the openstack/tacker 7.0.0.0rc1 release candidate.
