[quickstart] if a virtualbmc process crashes ironic cannot recover

Bug #1722037 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Low
Assigned to: Michele Baldessari
Milestone: (not shown)

Bug Description

In the current oooq configuration of virtualbmc processes we do the following:
- name: Create the VirtualBMC systemd service
  when: release not in ['liberty', 'mitaka', 'newton']
  copy:
    mode: 0664
    dest: "/usr/lib/systemd/system/virtualbmc.service"
    content: |
      [Unit]
      Description=VirtualBMC service
      After=network.target

      [Service]
      Type=oneshot
      ExecStart=/bin/bash -c 'for bmc in $(ls /root/.vbmc/); do vbmc start $bmc; done'
      ExecStop=/bin/bash -c 'for bmc in $(ls /root/.vbmc/); do vbmc stop $bmc; done'
      RemainAfterExit=yes

      [Install]
      WantedBy=multi-user.target

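While the above works when the undercloud reboots, it does not help if a virtualbmc process crashes: the oneshot unit runs its start loop once and never supervises the individual vbmc processes. We can fix this with systemd's instantiated services, running one service instance per virtualbmc so that systemd can restart each one independently.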
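
A rough sketch of what such a per-instance template could look like follows. It is reconstructed from the `systemctl status` output quoted in the fix commit, not copied from the merged change. The unit path, description, `ExecStart`/`ExecStop` commands and PID file location appear in that output; `Type=forking` and `Restart=always` are assumptions inferred from the observed behaviour (the start command exits, a separate main PID is tracked, and the service comes back after a plain `kill`).

```ini
# /usr/lib/systemd/system/virtualbmc@.service
# %i expands to the instance name, i.e. the part after "@" (for example ceph_2).
[Unit]
Description=VirtualBMC %i service
After=network.target

[Service]
# Assumption: vbmc daemonizes, so systemd follows the PID file it writes.
Type=forking
PIDFile=/root/.vbmc/%i/pid
ExecStart=/bin/vbmc start %i
ExecStop=/bin/vbmc stop %i
# Assumption: restart unconditionally, so that a crashed or killed vbmc
# process is brought back without manual intervention.
Restart=always

[Install]
WantedBy=multi-user.target
```

With a template like this, each BMC is managed on its own, e.g. `systemctl start virtualbmc@ceph_2`, where `ceph_2` replaces `%i`.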

Tags: quickstart
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.openstack.org/510331

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: queens-1 → queens-2
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/510331
Reason: Clearing the gate now, see context on http://lists.openstack.org/pipermail/openstack-dev/2017-October/123979.html

I'll restore the patch once we're green. Apologies in advance, and don't worry about your patch; it will merge ASAP.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/510331
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=2d1e0cc9b572c5adeef4b24e51303cd598bcbd1b
Submitter: Zuul
Branch: master

commit 2d1e0cc9b572c5adeef4b24e51303cd598bcbd1b
Author: Michele Baldessari <email address hidden>
Date: Sun Oct 8 09:24:51 2017 +0200

    Switch vbmc to Systemd's instantiated services and open up corresponding firewall

    We currently use the following single systemd service to start all vbmc
    instances:

      Type=oneshot
      ExecStart=/bin/bash -c 'for bmc in $(ls /root/.vbmc/); do vbmc start $bmc; done'
      ExecStop=/bin/bash -c 'for bmc in $(ls /root/.vbmc/); do vbmc stop $bmc; done'

    While this works okay for deployment and undercloud reboot, it does
    not restart a crashing vbmc process. Let's switch to starting a vbmc process
    via systemd's instantiated service, which will cover for all existing
    cases and can also take care of restarting a crashing vbmc process:

    $ systemctl status virtualbmc@ceph_2
      virtualbmc@ceph_2.service - VirtualBMC ceph_2 service
       Loaded: loaded (/usr/lib/systemd/system/virtualbmc@.service; disabled; vendor preset: disabled)
       Active: active (running) since Sun 2017-10-08 07:09:36 UTC; 23s ago
      Process: 18184 ExecStart=/bin/vbmc start %i (code=exited, status=0/SUCCESS)
     Main PID: 18194 (vbmc)
       CGroup: /system.slice/system-virtualbmc.slice/virtualbmc@ceph_2.service
               18194 /usr/bin/python2 /bin/vbmc start ceph_2

    Oct 08 07:09:35 undercloud systemd[1]: Starting VirtualBMC ceph_2 service...
    Oct 08 07:09:36 undercloud systemd[1]: PID file /root/.vbmc/ceph_2/pid not readable (yet?) after start.
    Oct 08 07:09:36 undercloud systemd[1]: Started VirtualBMC ceph_2 service.

    $ kill 18194
    $ systemctl status virtualbmc@ceph_2
      virtualbmc@ceph_2.service - VirtualBMC ceph_2 service
       Loaded: loaded (/usr/lib/systemd/system/virtualbmc@.service; disabled; vendor preset: disabled)
       Active: active (running) since Sun 2017-10-08 07:10:22 UTC; 816ms ago
      Process: 18311 ExecStop=/bin/vbmc stop %i (code=exited, status=0/SUCCESS)
      Process: 18319 ExecStart=/bin/vbmc start %i (code=exited, status=0/SUCCESS)
     Main PID: 18328 (vbmc)
       CGroup: /system.slice/system-virtualbmc.slice/virtualbmc@ceph_2.service
               18328 /usr/bin/python2 /bin/vbmc start ceph_2

    Oct 08 07:10:22 undercloud systemd[1]: virtualbmc@ceph_2.service holdoff time over, scheduling restart.
    Oct 08 07:10:22 undercloud systemd[1]: Starting VirtualBMC ceph_2 service...
    Oct 08 07:10:22 undercloud systemd[1]: PID file /root/.vbmc/ceph_2/pid not readable (yet?) after start.
    Oct 08 07:10:22 undercloud systemd[1]: Started VirtualBMC ceph_2 service.

    Additionally let's open the IPMI ports used by vbmc on the undercloud.
    This way ironic on the overcloud and cluster stonith can actually work
    correctly.

    Change-Id: I2d2f8554a58196e88996aee5089899529b5af831
    Closes-Bug: #1722037

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart-extras 2.1.1

This issue was fixed in the openstack/tripleo-quickstart-extras 2.1.1 release.
