Datastore Install In Prepare (vs. Prebaked) Fails

Bug #1276863 reported by Auston McReynolds
This bug affects 1 person
Affects: OpenStack DBaaS (Trove)
Status: Fix Released
Importance: Undecided
Assigned to: Viswa Vutharkar
Milestone: none

Bug Description

note that in https://review.openstack.org/#/c/53378/12/scripts/files/elements/ubuntu-mongodb/install.d/10-mongodb, "rm -rf /var/lib/mongodb/" is present. this is used to clear out the prealloc files created upon installation.

if the "rm -rf" above is not present, then the guest fails to initialize due to an issue with the rsync of the data-dir never returning successfully (because the guest is convinced it needs to migrate existing data).

2014-01-30 07:04:41.031 DEBUG trove.openstack.common.processutils [req-4df30578-48f8-46b1-af35-7dfc47f65d1a 1608984302624e548e8617d1c5983a19 7eec58dd037e488689a920d8ec07cbeb] Running cmd (subprocess): sudo rsync --safe-links --perms --recursive --owner --group --xattrs --sparse /var/lib/mongodb/ /mnt/volume from (pid=1167) execute /home/stack/trove/trove/openstack/common/processutils.py:137 2014-01-30 07:04:49

^^ will spin forever.

the issue is that when the package is installed in prepare() rather than pre-baked into the image, the same "rm -rf" needs to happen to avoid the problem described above, but no such command is ever issued.

either the rsync logic needs to be fixed, the same "rm -rf" needs to be done in the guest, or some intelligence needs to be added to the guest to understand the difference between an empty/dummy data-dir vs. a legitimate one that needs to be migrated.
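As an illustration of the third option, here is a minimal Python sketch of how a guest could tell an empty/dummy data-dir from one holding real data. The function name `is_dummy_data_dir` and the file-name patterns in `DUMMY_PATTERNS` are assumptions made for this example; they are not taken from Trove, and the pattern list would need to be checked against what each datastore actually writes on a fresh install.

```python
import os
import re

# Hypothetical patterns for files a fresh mongod install may leave behind.
# These names are assumptions for illustration, not taken from Trove.
DUMMY_PATTERNS = [
    re.compile(r"^journal$"),
    re.compile(r"^prealloc\.\d+$"),
    re.compile(r"^j\._\d+$"),
    re.compile(r"^local\.(ns|\d+)$"),
    re.compile(r"^mongod\.lock$"),
]


def is_dummy_data_dir(path):
    """Return True if path is missing, empty, or holds only dummy files."""
    if not os.path.isdir(path):
        return True
    for _root, dirs, files in os.walk(path):
        for name in dirs + files:
            # Any entry that matches no known dummy pattern counts as real data.
            if not any(p.match(name) for p in DUMMY_PATTERNS):
                return False
    return True
```

With a check like this, the guest could skip the rsync when the function returns True and only migrate when it finds something it does not recognise.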

tl;dr: unless mongodb is pre-baked into the image, mongodb will not work. the same issue exists for cassandra, and likely for others.

summary: - Datastore Install On Prepare Fails for Mongo/Cassandra
+ Datastore Install In Prepare (vs. Prebaked) Fails
Changed in trove:
milestone: none → icehouse-3
Changed in trove:
assignee: nobody → Viswa Vutharkar (vvutharkar)
Viswa Vutharkar (vvutharkar) wrote :

I have done some reading on diskimage-builder: https://github.com/openstack/diskimage-builder
install.d elements run in a chroot on the host machine where the image is being built (the redstack machine). Deleting this dir at that stage makes no sense, because I have observed one of two things happen, depending on the version of mongo:
mongo 2.0.4
------------------
By default, the apt-get install step above installs mongo version 2.0.4.
The install process also configures upstart/init.d scripts to start the mongod process automatically at boot time.
Given the above, when the image is provisioned and the VM comes up, the mongodb process starts at boot time and, not finding the /var/lib/mongodb/ dir, creates it again and populates it with sparse files (about 3GB total size).
The trove guest agent install (rsync etc.) comes later, as part of the execution of the first-boot.d element. So when the manager comes up, it detects the presence of the /var/lib/mongodb/ dir and triggers the 'migration' process anyway (mount the volume, move the files via rsync, remount /var/lib/mongodb on the volume, etc.).
So I am not sure what exactly is accomplished by removing the directory at the install.d element stage.
mongo 2.4.9
------------------
I forced the installation of mongo 2.4.9 by adding a couple more steps before the apt-get install and including the mongodb.org debian repos in etc/apt/sources.list.d/.
This install (of version 2.4.9) via apt-get also configures the upstart/init.d scripts to start the mongod process automatically at boot time.
Given the above, when the image is provisioned and the VM comes up, the mongodb process starts at boot time. But, possibly due to non-resilient upstart scripts bundled with this version, mongo 2.4.9 fails to start if the /var/lib/mongodb/ dir is not present. So I changed the rm statement to "rm -rf /var/lib/mongodb/*" instead. That doesn't help either, for the reason described in the 2.0.4 case: mongodb starts up and creates files in the dir (3.5GB worth of data), and first-boot.d only runs later, when the guest agent starts, detects the dir and triggers migration.
I also noticed that the rsync triggered to migrate the data into the mounted volume took about 10 mins, but this might be environmental (cinder network speed) at our company.
Either way, I am not sure what exactly is accomplished by removing the directory at the install.d element stage.
Proper solution
----------------------
The proper solution is to handle the removal of the /var/lib/mongodb/* contents in the guestagent manager::prepare() method. Even there, why do the rsync at all if the data is discardable (which is why you would consider rm -rf in the first place)? Just delete the content, skip the rsync, and mount /var/lib/mongodb on the empty formatted volume.
But there could be a sensitive datastore that puts some critical data there, without which it cannot start up. In that case you may want to migrate the data rather than delete it. That is one more reason to handle this in a datastore-specific manager, in the one-time prepare() phase.
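To make the two paths concrete, here is a minimal Python sketch that only builds the command lists for each case instead of running them. The function name `plan_volume_setup`, its parameters, and the device and mount-point values in the usage note are hypothetical; only the rsync flags are copied from the log line in the bug description. It also assumes mongod is not running while the plan executes.

```python
# rsync flags as they appear in the guest agent log in the bug description.
RSYNC_FLAGS = ["--safe-links", "--perms", "--recursive", "--owner",
               "--group", "--xattrs", "--sparse"]


def plan_volume_setup(device, data_dir, staging_mount, discardable):
    """Return the commands (as argv lists) for a datastore's one-time prepare.

    Hypothetical helper: a datastore-specific manager would decide whether
    its data-dir contents are discardable.
    """
    data_dir = data_dir.rstrip("/")
    if discardable:
        # Throw the contents away and mount the empty volume directly.
        return [
            ["sudo", "find", data_dir, "-mindepth", "1", "-delete"],
            ["sudo", "mount", device, data_dir],
        ]
    # Otherwise migrate: stage the volume, copy the data, then remount it.
    return [
        ["sudo", "mount", device, staging_mount],
        ["sudo", "rsync"] + RSYNC_FLAGS + [data_dir + "/", staging_mount],
        ["sudo", "umount", staging_mount],
        ["sudo", "mount", device, data_dir],
    ]
```

For example, `plan_volume_setup("/dev/vdb", "/var/lib/mongodb", "/mnt/volume", discardable=True)` yields two commands and no rsync, which is the behaviour proposed above for mongo.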
My assumption is that the prepare() ...


Viswa Vutharkar (vvutharkar) wrote :

There is one implication of deleting the content of the /var/lib/mongodb/ dir (rm -rf /var/lib/mongodb/*), whether it is done in the install.d/10-mongodb redstack element or in the guest agent prepare() method just before mounting /var/lib/mongodb on the attached volume device. Either way, I noticed that when the mongod daemon starts up, it tries to create the prealloc files if they are not present and starts zeroing them out. During this stage, mongod is not reachable from the mongo client or mongostat. Hence the status is reported as shutdown for a while, until the preallocation finishes.
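One way to cope with that window is to poll until the daemon answers instead of reporting shutdown right away. The sketch below is a generic Python polling helper, not Trove code: `port_is_open` and `wait_until` are hypothetical names, and it assumes that an accepted TCP connection on mongod's default port 27017 is a good enough proxy for "reachable".

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check, timeout, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns True.

    Returns True on success, or False once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller could use `wait_until(lambda: port_is_open("127.0.0.1", 27017), timeout=900)`; the 900-second budget is an arbitrary value chosen to exceed the roughly 10 minutes of preallocation reported in this thread.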

In both mongo versions 2.0.4 and 2.4.9, I noticed that the size of these files is about 3.9G total. At least in the redstack environment (which is on a VM) in the openstack cloud at my company, the DB instance is created as a nested VM, which is presented with a volume backed by a loop device on the devstack host. This is probably slowing IOPS, or there might be another reason, but in any case this prealloc takes about 10 min! Maybe the performance is much better on a DB guest instance VM that is provisioned on a bare metal hypervisor compute node rather than on top of a devstack node.

Viswa Vutharkar (vvutharkar) wrote :

Also note that the rsync that happens at volume migration (moving the /var/lib/mongodb contents to the newly formatted volume and mounting that volume on /var/lib/mongodb) may take a long time and appear to spin forever. But in my case it did finish after about 12 minutes. And this happens even if your install.d/10-mongodb element had an "rm -rf /var/lib/mongodb/", because mongod starts up during first boot and, before the rsync step at volume migration time happens, it will have recreated the large sparse prealloc files.
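To tell a slow rsync from one that is truly stuck, the copy could be run with an explicit time budget. This is a minimal sketch using Python 3's standard `subprocess.run` with its `timeout` argument; the helper name `run_with_timeout` is hypothetical and this is not how the guest agent's processutils wrapper is implemented.

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its exit code, or None if it exceeded the timeout.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

A return value of None would then mean "gave up after the budget", which can be logged and handled differently from a non-zero rsync exit code.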

OpenStack Infra (hudson-openstack) wrote : Fix proposed to trove (master)

Fix proposed to branch: master
Review: https://review.openstack.org/76066

Changed in trove:
status: New → In Progress
Changed in trove:
milestone: icehouse-3 → icehouse-rc1
Changed in trove:
milestone: icehouse-rc1 → none
status: In Progress → Fix Released