Observed with tests/bundles/bionic-ussuri.yaml in https://review.opendev.org/#/c/743212/
There is an issue with the manila-share service post-startup: it suddenly receives a SIGTERM after a few minutes of operation. Until then it behaves normally: it reads its config, connects to the Ceph MONs (CephFS), and runs periodic share status tasks.
The behavior matches what systemd does by default when stopping services:
https://www.freedesktop.org/software/systemd/man/systemd.kill.html#KillMode=
"Processes will first be terminated via SIGTERM (unless the signal to send is changed via KillSignal= or RestartKillSignal=)."
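The kill settings actually in effect can be read on the affected node with `systemctl show manila-share.service -p KillMode -p KillSignal -p TimeoutStopSec`. A minimal offline sketch of filtering such output (property values assumed here, matching systemd defaults):

```shell
# Sample `systemctl show` output (values assumed; systemd defaults).
# On the affected node this would come from:
#   systemctl show manila-share.service -p KillMode -p KillSignal -p TimeoutStopSec
cat <<'EOF' > /tmp/unit-props.txt
KillMode=control-group
KillSignal=15
TimeoutStopSec=1min 30s
EOF
# KillSignal=15 is SIGTERM, matching what manila-share caught.
awk -F= '$1 == "KillSignal" { print $2 }' /tmp/unit-props.txt
```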
Based on the log entries, something stops manila-share via systemd:
18:17:31.274360 <--- journal log entry about stopping the manila-share service
18:17:31.278 <--- caught SIGTERM
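The two timestamps are only a few milliseconds apart, consistent with systemd being the sender of the SIGTERM; the delta can be computed from the sub-second parts of the entries above:

```shell
# Delta between the journal "Stopping" entry (18:17:31.274360) and the
# manila-share "Caught SIGTERM" entry (18:17:31.278).
awk 'BEGIN { printf "%.1f ms\n", (31.278 - 31.274360) * 1000 }'
```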
Meanwhile, judging by its log, Pacemaker was not doing anything at that time:
grep -iP 'stop' /var/log/pacemaker/pacemaker.log | grep 18:17 ; echo $?
1
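For clarity, `grep` exits with status 1 when nothing matches, so the `1` above means Pacemaker logged no stop operations around 18:17. A self-contained illustration of the same pattern (sample log line is made up):

```shell
# grep exits 0 on a match and 1 on no match; `echo $?` prints that status.
printf 'info: monitor op for res_manila_share\n' > /tmp/pcmk-sample.log
grep -i 'stop' /tmp/pcmk-sample.log ; echo $?
```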
crm resource status
res_manila_share_manila_share (systemd:manila-share): Started
res_nfs_ganesha_nfs_ganesha (systemd:nfs-ganesha): Started
Resource Group: grp_ganesha_vips
res_ganesha_a3b980a_vip (ocf::heartbeat:IPaddr2): Started
The resources are listed as started.
More information below:
1) the NFS share resource is in the error state:
ubuntu@dmitriis-bastion:~$ openstack share list
+--------------------------------------+---------------+------+-------------+--------+-----------+-----------------+------+-------------------+
| ID | Name | Size | Share Proto | Status | Is Public | Share Type Name | Host | Availability Zone |
+--------------------------------------+---------------+------+-------------+--------+-----------+-----------------+------+-------------------+
| ec095c71-f8f2-4aa0-bdd1-ed1f47bd3cce | cephnfsshare1 | 1 | NFS | error | False | cephfsnfstype | | None |
+--------------------------------------+---------------+------+-------------+--------+-----------+-----------------+------+-------------------+
2) manila-share services are DOWN:
ubuntu@dmitriis-bastion:~$ manila service-list
+----+------------------+--------------------------------------------+------+---------+-------+----------------------------+
| Id | Binary | Host | Zone | Status | State | Updated_at |
+----+------------------+--------------------------------------------+------+---------+-------+----------------------------+
| 1 | manila-scheduler | juju-111c97-zaza-1335ec68e4d0-13 | nova | enabled | up | 2020-08-04T14:57:38.000000 |
| 2 | manila-data | juju-111c97-zaza-1335ec68e4d0-13 | nova | enabled | up | 2020-08-04T14:57:43.000000 |
| 3 | manila-share | juju-111c97-zaza-1335ec68e4d0-5@cephfsnfs1 | nova | enabled | down | 2020-08-04T13:36:25.000000 |
| 4 | manila-share | juju-111c97-zaza-1335ec68e4d0-4@cephfsnfs1 | nova | enabled | down | 2020-08-04T13:40:57.000000 |
| 5 | manila-share | juju-111c97-zaza-1335ec68e4d0-3@cephfsnfs1 | nova | enabled | down | 2020-08-04T13:56:55.000000 |
+----+------------------+--------------------------------------------+------+---------+-------+----------------------------+
3) /var/log/manila/manila-share.log at manila-ganesha units contains the following:
2020-08-04 18:13:43.190 214098 INFO oslo_service.service [req-19fc1843-0bdf-4c8a-8b66-d190a192e4ee - - - - -] Starting 1 workers
...
2020-08-04 18:13:43.875 214156 DEBUG manila.service [req-e89377c0-b153-462f-91db-1b1912a38223 - - - - -] Creating RPC server for service manila-share. start /usr/lib/python3/dist-packages/manila/service.py:121
2020-08-04 18:13:43.918 214156 DEBUG manila.share.manager [req-e89377c0-b153-462f-91db-1b1912a38223 - - - - -] Start initialization of driver: 'CephFSDriver@juju-111c97-zaza-1335ec68e4d0-4@cephfsnfs1' _driver_setup /usr/lib/python3/dist-packages/manila/share/manager.py:309
2020-08-04 18:13:43.919 214156 INFO manila.share.drivers.cephfs.driver [req-e89377c0-b153-462f-91db-1b1912a38223 - - - - -] [CEPHFSNFS1}] Ceph client found, connecting...
...
2020-08-04 18:17:30.935 214156 INFO manila.share.manager [req-b8b9a3fe-59f0-4666-94ad-3ffd6f43bb32 - - - - -] Updating share status
2020-08-04 18:17:30.941 214156 DEBUG manila.share.driver [req-b8b9a3fe-59f0-4666-94ad-3ffd6f43bb32 - - - - -] Updating share stats. _update_share_stats /usr/lib/python3/dist-packages/manila/share/driver.py:1271
2020-08-04 18:17:30.941 214156 DEBUG oslo_service.periodic_task [req-b8b9a3fe-59f0-4666-94ad-3ffd6f43bb32 - - - - -] Running periodic task ShareManager.periodic_share_status_update run_periodic_tasks /usr/lib/python3/dist-packages/oslo_service/periodic_task.py:216
2020-08-04 18:17:30.942 214156 DEBUG manila.share.manager [req-b8b9a3fe-59f0-4666-94ad-3ffd6f43bb32 - - - - -] Updating status of share instances. periodic_share_status_update /usr/lib/python3/dist-packages/manila/share/manager.py:4515
2020-08-04 18:17:31.278 214098 INFO oslo_service.service [req-19fc1843-0bdf-4c8a-8b66-d190a192e4ee - - - - -] Caught SIGTERM, stopping children
2020-08-04 18:17:31.279 214098 DEBUG oslo_concurrency.lockutils [req-19fc1843-0bdf-4c8a-8b66-d190a192e4ee - - - - -] Acquired lock "singleton_lock" lock /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:265
2020-08-04 18:17:31.279 214098 DEBUG oslo_concurrency.lockutils [req-19fc1843-0bdf-4c8a-8b66-d190a192e4ee - - - - -] Releasing lock "singleton_lock" lock /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:281
2020-08-04 18:17:31.280 214098 DEBUG oslo_service.service [req-19fc1843-0bdf-4c8a-8b66-d190a192e4ee - - - - -] Stop services. stop /usr/lib/python3/dist-packages/oslo_service/service.py:699
2020-08-04 18:17:31.280 214098 INFO manila.coordination [req-19fc1843-0bdf-4c8a-8b66-d190a192e4ee - - - - -] Stopped Coordinator (Agent ID: 73b71760-242a-40e4-a54f-108702fb28ef, prefix: manila-)
2020-08-04 18:17:31.280 214098 DEBUG oslo_service.service [req-19fc1843-0bdf-4c8a-8b66-d190a192e4ee - - - - -] Killing children. stop /usr/lib/python3/dist-packages/oslo_service/service.py:704
2020-08-04 18:17:31.286 214098 INFO oslo_service.service [req-19fc1843-0bdf-4c8a-8b66-d190a192e4ee - - - - -] Waiting on 1 children to exit
2020-08-04 18:17:31.294 214098 INFO oslo_service.service [req-19fc1843-0bdf-4c8a-8b66-d190a192e4ee - - - - -] Child 214156 killed by signal 15
journalctl -o short-precise -u manila-share.service
Aug 04 18:13:41.971356 juju-111c97-zaza-1335ec68e4d0-4 systemd[1]: Started OpenStack Manila Share.
Aug 04 18:13:44.588087 juju-111c97-zaza-1335ec68e4d0-4 manila-share[214188]: /usr/lib/python3/dist-packages/pymysql/cursors.py:170: Warning: (3719, "'utf8' is curren>
Aug 04 18:13:44.588087 juju-111c97-zaza-1335ec68e4d0-4 manila-share[214188]: result = self._query(query)
Aug 04 18:13:44.588087 juju-111c97-zaza-1335ec68e4d0-4 manila-share[214188]: Exception ignored in: <function _after_fork at 0x7f35f2a6c670>
Aug 04 18:13:44.588087 juju-111c97-zaza-1335ec68e4d0-4 manila-share[214188]: Traceback (most recent call last):
Aug 04 18:13:44.588087 juju-111c97-zaza-1335ec68e4d0-4 manila-share[214188]: File "/usr/lib/python3.8/threading.py", line 1454, in _after_fork
Aug 04 18:13:44.588087 juju-111c97-zaza-1335ec68e4d0-4 manila-share[214188]: assert len(_active) == 1
Aug 04 18:13:44.588087 juju-111c97-zaza-1335ec68e4d0-4 manila-share[214188]: AssertionError:
Aug 04 18:13:44.604497 juju-111c97-zaza-1335ec68e4d0-4 sudo[214188]: manila : TTY=unknown ; PWD=/var/lib/manila ; USER=root ; COMMAND=/usr/bin/manila-rootwrap /etc>
Aug 04 18:13:44.605673 juju-111c97-zaza-1335ec68e4d0-4 sudo[214188]: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 04 18:13:44.816512 juju-111c97-zaza-1335ec68e4d0-4 sudo[214188]: pam_unix(sudo:session): session closed for user root
Aug 04 18:17:31.274360 juju-111c97-zaza-1335ec68e4d0-4 systemd[1]: Stopping OpenStack Manila Share...
Aug 04 18:17:31.331146 juju-111c97-zaza-1335ec68e4d0-4 systemd[1]: manila-share.service: Succeeded.
Aug 04 18:17:31.331676 juju-111c97-zaza-1335ec68e4d0-4 systemd[1]: Stopped OpenStack Manila Share.
----
Earlier errors in the Pacemaker log indicate that stop and start operations time out:
Aug 04 17:42:09 juju-111c97-zaza-1335ec68e4d0-4 pacemaker-controld [30769] (process_lrm_event) error: Result of start operation for res_nfs_ganesha_nfs_ganesha on juju-111c97-zaza-1335ec68e4d0-4: Timed Out | call=211 key=res_nfs_ganesha_nfs_ganesha_start_0 timeout=20000ms
Aug 04 17:42:16 juju-111c97-zaza-1335ec68e4d0-4 pacemaker-controld [30769] (process_lrm_event) error: Result of start operation for res_manila_share_manila_share on juju-111c97-zaza-1335ec68e4d0-4: Timed Out | call=214 key=res_manila_share_manila_share_start_0 timeout=20000ms
Aug 04 17:42:19 juju-111c97-zaza-1335ec68e4d0-4 pacemaker-controld [30769] (process_lrm_event) error: Result of stop operation for res_manila_share_manila_share on juju-111c97-zaza-1335ec68e4d0-4: Timed Out | call=216 key=res_manila_share_manila_share_stop_0 timeout=20000ms
Aug 04 17:42:19 juju-111c97-zaza-1335ec68e4d0-4 pacemaker-controld [30769] (process_lrm_event) error: Result of stop operation for res_nfs_ganesha_nfs_ganesha on juju-111c97-zaza-1335ec68e4d0-4: Timed Out | call=217 key=res_nfs_ganesha_nfs_ganesha_stop_0 timeout=20000ms
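The `timeout=20000ms` in these entries is the per-operation timeout Pacemaker applied (20 s); it can be pulled out of the log mechanically. A sketch over an abbreviated sample line (the `grep -o` pattern is illustrative):

```shell
# Abbreviated pacemaker-controld line from the excerpt above.
cat <<'EOF' > /tmp/pcmk-timeouts.log
error: Result of start operation ... Timed Out | call=214 key=res_manila_share_manila_share_start_0 timeout=20000ms
EOF
# Extract the configured operation timeout.
grep -o 'timeout=[0-9]*ms' /tmp/pcmk-timeouts.log
```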
As a sanity test:
* systemctl stop pacemaker
* systemctl start manila-share
This results in the same behavior: the manila-share service gets stopped in ~3 minutes.
So it's definitely not Pacemaker that stops manila-share.
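The observed window also matches the journal excerpt: the unit started at 18:13:41 and was stopped at 18:17:31, roughly 3 minutes 50 seconds later (GNU date assumed):

```shell
# Runtime derived from the journal timestamps above.
start="18:13:41"; stop="18:17:31"
echo "$(( $(date -d "$stop" +%s) - $(date -d "$start" +%s) ))s"
```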