Trove GuestAgent Stop Also Stops Some Datastores

Bug #1295313 reported by Auston McReynolds
This bug affects 1 person.

Affects: OpenStack DBaaS (Trove)
Status: Fix Released
Importance: High
Assigned to: Auston McReynolds

Bug Description

Executing "sudo service trove-guest stop" in a Cassandra guest VM shuts
down not only the guest agent but also Cassandra. After some
investigation, it became clear that the Cassandra process created by
the guest agent belongs to the same process group as trove-guest.
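The process-group behavior can be reproduced in isolation. The following minimal Python sketch (POSIX only) starts a child with subprocess.Popen's defaults and compares its process group ID with the parent's. A sleeping Python process stands in for /usr/sbin/cassandra, and the helper name child_pgid is hypothetical, not Trove code. Assuming the service manager stops trove-guest by signalling its whole process group, a child that inherited that group receives the same signal.

```python
import os
import subprocess
import sys


def child_pgid() -> int:
    """Spawn a short-lived child with default settings and return its PGID."""
    # The sleeping interpreter is a stand-in for a datastore binary such as
    # /usr/sbin/cassandra launched directly by the guest agent.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])
    try:
        # With no setsid()/setpgid() call, the child inherits the parent's group.
        return os.getpgid(child.pid)
    finally:
        child.terminate()
        child.wait()


if __name__ == "__main__":
    print("agent pgid:", os.getpgrp())
    print("child pgid:", child_pgid())  # same value as the agent's
```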

Relevant code points of interest:

* http://git.io/Q03cKw
* http://git.io/Q90dLQ
* http://git.io/Pvn8rA

A couple of options:

* Change 'sudo /usr/sbin/cassandra' to 'sudo -b /usr/sbin/cassandra'.
  This runs the command in the background, which keeps a shutdown of
  trove-guest from affecting it.

* Change 'sudo /usr/sbin/cassandra' to 'sudo service cassandra start'.
  It's unclear why this approach was not used to begin with, but that
  aside, it behaves like the '-b' switch above because the init script
  uses start-stop-daemon
  (see https://gist.github.com/amcrn/d2c7c9a7d8838b303117).
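The report attributes the problem to the shared process group. As a rough analogy for what the options above aim for (not a description of how sudo -b or start-stop-daemon are implemented), Python's subprocess.Popen accepts start_new_session=True, which calls setsid() in the child before exec, so the child leads its own session and process group. The helper spawn_detached is hypothetical.

```python
import os
import subprocess
import sys
from typing import List


def spawn_detached(argv: List[str]) -> subprocess.Popen:
    """Start argv in a new session, and therefore a new process group."""
    # start_new_session=True runs setsid() in the child before exec, so a
    # signal sent to the parent's process group does not reach this child.
    return subprocess.Popen(argv, start_new_session=True)


if __name__ == "__main__":
    # A sleeping interpreter stands in for the datastore daemon.
    child = spawn_detached([sys.executable, "-c", "import time; time.sleep(5)"])
    try:
        print("agent pgid:", os.getpgrp())
        print("child pgid:", os.getpgid(child.pid))  # equals child.pid
    finally:
        child.terminate()
        child.wait()
```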

It's worth noting that the final command produced by the init script
differs from running /usr/sbin/cassandra directly in that it:

* adds "-Dcassandra-pidfile=/var/run/cassandra/cassandra.pid"
* adds to path: "/usr/share/java/jna.jar"
* adds "-XX:HeapDumpPath=/var/lib/cassandra/java_1395267419.hprof"
* adds "-XX:ErrorFile=/var/lib/cassandra/hs_err_1395267419.log"

In short, using 'sudo service cassandra start' looks like the correct
fix.

Thus far Cassandra has been the focal point of this write-up, but
let's step back and consider more holistically how datastores should be
started.

Proposed guidelines:

* If an init script exists, it should be used
  (e.g. 'sudo service <datastore> start').

* If an init script exists but doesn't properly daemonize (i.e. the
  datastore is stopped along with trove-guest), the '-b' switch should
  be added
  (e.g. 'sudo -b service <datastore> start').

* If an init script does not exist, or it does not survive a
  trove-guest stop with or without the '-b' switch, running the binary
  directly is acceptable.
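The three guidelines amount to a small decision procedure. A minimal Python sketch follows; DatastoreStartInfo, its fields, and start_command are hypothetical names for illustration, and the two "survives" flags assume someone has already checked whether the datastore outlives a trove-guest stop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatastoreStartInfo:
    """Hypothetical description of how a datastore can be started."""
    service_name: str             # name passed to 'service', e.g. "cassandra"
    binary_path: str              # fallback binary, e.g. "/usr/sbin/cassandra"
    has_init_script: bool
    init_script_daemonizes: bool  # survives a trove-guest stop without '-b'
    survives_with_b: bool         # survives a trove-guest stop with '-b'


def start_command(info: DatastoreStartInfo) -> List[str]:
    """Pick a start command following the proposed guidelines, in order."""
    if info.has_init_script and info.init_script_daemonizes:
        # Guideline 1: an init script exists and daemonizes properly.
        return ["sudo", "service", info.service_name, "start"]
    if info.has_init_script and info.survives_with_b:
        # Guideline 2: the init script needs sudo's '-b' (background) switch.
        return ["sudo", "-b", "service", info.service_name, "start"]
    # Guideline 3: no usable init script, so run the binary directly.
    return ["sudo", info.binary_path]
```

For Cassandra, whose init script already daemonizes via start-stop-daemon, start_command(DatastoreStartInfo("cassandra", "/usr/sbin/cassandra", True, True, True)) returns ['sudo', 'service', 'cassandra', 'start'], matching the fix described above.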

As an example, MySQL 5.6 suggests starting the MySQL daemon with
mysqld_safe (https://gist.github.com/amcrn/861ed3d016e3d772e6d0).
However, mysqld_safe does not use start-stop-daemon or an equivalent
strategy, which means MySQL will be shut down if trove-guest is
stopped. Following the guidelines above, the approach would be
'sudo -b service mysql.server start'.

A slight variation of this approach would be to always use the '-b'
switch when starting a datastore, i.e. to modify:

* http://git.io/25H4Dw
* http://git.io/IXjGag
* http://git.io/GFgj3w
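The "always use -b" variation could be applied in one place rather than per datastore. A minimal Python sketch, assuming start commands are kept as strings beginning with 'sudo'; with_background_switch is a hypothetical helper, not a function in the Trove files linked above, and it only looks for '-b' immediately after 'sudo'.

```python
import shlex
from typing import List


def with_background_switch(command: str) -> List[str]:
    """Return the argv for command with sudo's '-b' switch inserted."""
    argv = shlex.split(command)
    if not argv or argv[0] != "sudo":
        raise ValueError("expected a command starting with 'sudo': %r" % command)
    if argv[1:2] == ["-b"]:
        # Already backgrounded; only the position right after 'sudo' is checked.
        return argv
    # Insert '-b' so sudo runs the rest of the command in the background.
    return [argv[0], "-b"] + argv[1:]
```

For example, with_background_switch("sudo service mysql.server start") returns ['sudo', '-b', 'service', 'mysql.server', 'start'].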

One thing that remains unclear, however: should mysqld_safe be used
in favor of mysqld for MySQL 5.5 on Ubuntu/Fedora? The wording in the
MySQL documentation is somewhat ambiguous:

* http://dev.mysql.com/doc/refman/5.5/en/automatic-start.html
* http://dev.mysql.com/doc/refman/5.5/en/mysqld-safe.html

OpenStack Infra (hudson-openstack) wrote: Fix proposed to trove (master)

Fix proposed to branch: master
Review: https://review.openstack.org/81914

Changed in trove:
assignee: nobody → Auston McReynolds (amcrn)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to trove (master)

Reviewed: https://review.openstack.org/81914
Committed: https://git.openstack.org/cgit/openstack/trove/commit/?id=bb2d3179013bef289b40c048ca5bc235eff3ac01
Submitter: Jenkins
Branch: master

commit bb2d3179013bef289b40c048ca5bc235eff3ac01
Author: amcrn <email address hidden>
Date: Thu Mar 20 13:33:42 2014 -0700

    Change Cassandra to Service Start vs Bin

    starts cassandra with 'sudo service cassandra start' vs.
    'sudo /usr/sbin/cassandra' to avoid having cassandra stopped if the
    guestagent is stopped. see the bug details for more information.

    Change-Id: Ifab557aaff5023f4b0fb4b91193785a0deb5ee2f
    Closes-Bug: #1295313

Changed in trove:
status: In Progress → Fix Committed
Changed in trove:
milestone: none → juno-1
importance: Undecided → Medium
importance: Medium → High
Thierry Carrez (ttx)
Changed in trove:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in trove:
milestone: juno-1 → 2014.2