cinder service is down when re-spawning child process

Bug #1811344 reported by Yikun Jiang
Affects: Cinder
Status: Fix Released
Importance: Medium
Assigned to: Yikun Jiang

Bug Description

Currently, oslo.service provides a mechanism by which the main process re-spawns child processes as necessary.

But if we kill the child processes, all of the services except the most recently created one stay down after the re-spawn.

Steps to reproduce:

1. First, with two volume backends configured (lvmdriver-1 and lvmdriver-2), cinder-volume starts three processes:
# ps -ef | grep cinder-v
stack 24951 1 1 01:45 ? 00:01:02 cinder-volume --config-file /etc/cinder/cinder.conf
stack 25379 24951 2 01:47 ? 00:01:44 cinder-volume --config-file /etc/cinder/cinder.conf
stack 25380 24951 2 01:47 ? 00:01:44 cinder-volume --config-file /etc/cinder/cinder.conf

Here 24951 is the main process, and 25379 and 25380 are the child processes.

The relevant cinder.conf configuration is:

enabled_backends = lvmdriver-1,lvmdriver-2

[lvmdriver-1]
image_volume_cache_enabled = True
volume_clear = zero
lvm_type = auto
target_helper = tgtadm
volume_group = stack-volumes-lvmdriver-1
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = lvmdriver-1

[lvmdriver-2]
image_volume_cache_enabled = True
volume_clear = zero
lvm_type = auto
target_helper = tgtadm
volume_group = stack-volumes-lvmdriver-2
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = lvmdriver-2

# cinder service-list
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup | ubuntubase | nova | enabled | up | 2019-01-11T08:16:26.000000 | - |
| cinder-scheduler | ubuntubase | nova | enabled | up | 2019-01-11T08:16:24.000000 | - |
| cinder-volume | ubuntubase@lvmdriver-1 | nova | enabled | up | 2019-01-11T08:16:25.000000 | - |
| cinder-volume | ubuntubase@lvmdriver-2 | nova | enabled | up | 2019-01-11T08:16:25.000000 | - |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+

2. Kill the 25379 and 25380 child processes.
After killing 25379 and 25380, the main process re-spawns two new child processes.

# kill -9 25379;kill -9 25380
# ps -ef | grep cinder-v
stack 24951 1 1 01:45 ? 00:01:07 cinder-volume --config-file /etc/cinder/cinder.conf
stack 32433 24951 5 03:17 ? 00:00:00 cinder-volume --config-file /etc/cinder/cinder.conf
stack 32434 24951 5 03:17 ? 00:00:00 cinder-volume --config-file /etc/cinder/cinder.conf

We can see the processes are re-started as expected.

3. The lvmdriver-1 service stays down.
root@ubuntubase:/opt/stack/cinder# cinder service-list
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup | ubuntubase | nova | enabled | up | 2019-01-11T08:18:16.000000 | - |
| cinder-scheduler | ubuntubase | nova | enabled | up | 2019-01-11T08:18:14.000000 | - |
| cinder-volume | ubuntubase@lvmdriver-1 | nova | enabled | down | 2019-01-11T08:17:25.000000 | - |
| cinder-volume | ubuntubase@lvmdriver-2 | nova | enabled | up | 2019-01-11T08:18:23.000000 | - |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+

The problem is that after killing 25379 and 25380, only the most recently created service (ubuntubase@lvmdriver-2) comes back up; ubuntubase@lvmdriver-1 stays down.

[1] https://github.com/openstack/oslo.service/blob/d987a4a/oslo_service/service.py#L661
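
The failure mode can be illustrated with a toy example. Below is a minimal, hedged sketch (FakeService and the fake ids are hypothetical, not actual Cinder or oslo.service code): each instance created in the parent overwrites a class-level service id, every forked child inherits the last value written, and a re-spawn path that only calls start() never corrects it, so every backend except the last one reports heartbeats against the wrong service record.

import os

class FakeService(object):
    # Class attribute shared by every instance, mimicking the
    # cinder Service.service_id class attribute.
    service_id = None

    def __init__(self, backend):
        self.backend = backend
        # Pretend we just created a DB row for this backend and record its
        # id on the *class*, overwriting whatever the previous backend set.
        FakeService.service_id = id(self)

    def start(self):
        # On re-spawn only start() runs, so the class attribute still
        # holds the id of the *last* service created in the parent.
        print('%s heartbeats for service_id=%s'
              % (self.backend, FakeService.service_id))

services = [FakeService('lvmdriver-1'), FakeService('lvmdriver-2')]
for svc in services:
    if os.fork() == 0:   # child process, roughly what ProcessLauncher does
        svc.start()      # lvmdriver-1 wrongly reports lvmdriver-2's id
        os._exit(0)
for _ in services:
    os.wait()            # parent waits for both children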

Yikun Jiang (yikunkero)
Changed in cinder:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/630068

Changed in cinder:
assignee: nobody → Yikun Jiang (yikunkero)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/630068
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=40127d95a9e83e97970dc388c18c99eea7715edc
Submitter: Zuul
Branch: master

commit 40127d95a9e83e97970dc388c18c99eea7715edc
Author: Yikun Jiang <email address hidden>
Date: Fri Jan 11 16:50:24 2019 +0800

    Refresh the Service.service_id after re-spawning children

    Currently, oslo.service provides a mechanism by which the main process
    re-spawns child processes as necessary. But if we kill the child
    processes, all of the services except the most recently created one
    stay down after the re-spawn.

    The reason for this problem is that Service.service_id is inherited
    from the parent process [1] (it is a class attribute on Service) and
    records the last created service [2][3] by the time the latest child
    process is started. When a child process is re-spawned, only the
    start() method is called [1], so Service.service_id is not refreshed
    as expected.

    In order to refresh the Service class attribute service_id, we store
    the service_id in the instance attribute origin_service_id and set the
    class attribute back from the instance attribute in the start() method.

    [1] https://github.com/openstack/oslo.service/blob/d987a4a/oslo_service/service.py#L648
    [2] https://github.com/openstack/cinder/blob/099b141/cinder/service.py#L193
    [3] https://github.com/openstack/cinder/blob/099b141/cinder/service.py#L344

    Change-Id: Ibefda81215c5081634876a2064b15638388ae921
    Closes-bug: #1811344
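
The fix, in outline, records the id assigned to this instance in an instance attribute when the Service object is constructed and writes it back to the class attribute in start(), so a re-spawned child (which only re-runs start()) reports against its own service record. A hedged sketch of the idea, continuing the hypothetical FakeService example above rather than quoting the actual patch:

class FakeService(object):
    service_id = None

    def __init__(self, backend):
        self.backend = backend
        FakeService.service_id = id(self)
        # Fix: remember *this* instance's id so start() can restore it.
        self.origin_service_id = FakeService.service_id

    def start(self):
        # Fix: reset the class attribute from the instance attribute, so a
        # re-spawned child reports with its own id instead of the value
        # inherited from the parent (the last service created).
        FakeService.service_id = self.origin_service_id
        print('%s heartbeats for service_id=%s'
              % (self.backend, FakeService.service_id))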

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 15.0.0.0rc1

This issue was fixed in the openstack/cinder 15.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/687145

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/stein)

Reviewed: https://review.opendev.org/687145
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=c0c272b3b874e00fce7e41f03779edf9b8fd0c15
Submitter: Zuul
Branch: stable/stein

commit c0c272b3b874e00fce7e41f03779edf9b8fd0c15
Author: Yikun Jiang <email address hidden>
Date: Fri Jan 11 16:50:24 2019 +0800

    Refresh the Service.service_id after re-spawning children

    Currently, oslo.service provides a mechanism by which the main process
    re-spawns child processes as necessary. But if we kill the child
    processes, all of the services except the most recently created one
    stay down after the re-spawn.

    The reason for this problem is that Service.service_id is inherited
    from the parent process [1] (it is a class attribute on Service) and
    records the last created service [2][3] by the time the latest child
    process is started. When a child process is re-spawned, only the
    start() method is called [1], so Service.service_id is not refreshed
    as expected.

    In order to refresh the Service class attribute service_id, we store
    the service_id in the instance attribute origin_service_id and set the
    class attribute back from the instance attribute in the start() method.

    [1] https://github.com/openstack/oslo.service/blob/d987a4a/oslo_service/service.py#L648
    [2] https://github.com/openstack/cinder/blob/099b141/cinder/service.py#L193
    [3] https://github.com/openstack/cinder/blob/099b141/cinder/service.py#L344

    Change-Id: Ibefda81215c5081634876a2064b15638388ae921
    Closes-bug: #1811344
    (cherry picked from commit 40127d95a9e83e97970dc388c18c99eea7715edc)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 14.0.3

This issue was fixed in the openstack/cinder 14.0.3 release.
