git usage can trigger diskspace/memory issues for charms with blobs

Bug #1232304 reported by JuanJo Ciarlante
This bug affects 8 people
Affects: juju-core
Status: Fix Released
Importance: Critical
Assigned to: William Reade

Bug Description

juju-core's internal git usage can trigger disk-space/memory issues for charms that include (big) binary blobs.

We have a use case where we use upgrade-charm to push not only the charm code itself, but also 'application' code and tar-gzipped website content blobs [1] (FYI, on the order of ~600MB total: an arguable approach, with its own tradeoffs).

Because of the VCS'ing that juju-core uses to land and merge the upgraded charm, the /var/lib/juju/agents/unit-FOO-*/state/deployer/update-* trees were growing roughly in proportion to charm size * upgrade rate, given these big 'binary' blobs.

While trying to work around this issue by manually doing shallow git clones, git got OOM-killed (on the order of ~2GB RSS, on units without swap); it succeeded after tuning with [2]:
        git config --global core.bigFileThreshold 10m   # store files above 10MB whole, skipping delta compression
        git config --global pack.windowMemory 256m      # cap the memory each pack thread may use

I suggest:
[1] Documenting this git workflow as a CAVEAT regarding "big" binary blobs.
[2] Having juju-core tune git to keep it from potentially interfering with running services.
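
For [2], a minimal sketch of what that tuning could look like, assuming juju-core applied it per-repository in the deployer tree (paths hypothetical):

cd /var/lib/juju/agents/unit-foo-0/state/deployer/current
git config core.bigFileThreshold 10m   # hypothetical per-repo setting: store big files whole
git config pack.windowMemory 256m      # hypothetical per-repo setting: bound pack-time memory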


Revision history for this message
JuanJo Ciarlante (jjo) wrote :

FYI/FTR, the steps I took to successfully reclaim disk space (granted, just buying time ;): http://paste.ubuntu.com/6164905/

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → Low
tags: added: docs feature
Curtis Hovey (sinzui)
tags: added: doc
removed: docs
Curtis Hovey (sinzui)
tags: added: canonical-webops
Curtis Hovey (sinzui)
tags: added: cpc
tags: added: pes
removed: cpc
Revision history for this message
Curtis Hovey (sinzui) wrote :

Hi John, Tim, William, Kapil, et al.

Several stakeholders want to escalate this bug. I think we are being asked to fix this issue soon.

I think this bug as currently described is low importance because it advocates a solution without defining the root problem that needs solving. Canonical webops create fat-charms (embedded dependencies and content) to deploy services. I think this is a hack that has reached its evolutionary dead end. The mojo project may dictate that fat-charms are the only way to deploy to production.

I believe the real issue here is that we do not have a best-practice strategy to recommend for placing dependencies in an environment for all charms/services to share. As for content, surely it could be handled as other DB and site charms do, by mounting/switching the storage.

If we do want to address this bug as it is described, then maybe this is a performance bug.

Revision history for this message
Nate Finch (natefinch) wrote :

Some background about git:

When you store a binary blob in git and then submit a change that modifies the binary, git doesn't store the diff of the two versions (like it does for text files); it just stores both binaries. So, if you have a 200MB zip file and you add something to it and re-commit, you now have two 200MB zip files in the repo.

Whenever you get code from git, you get the WHOLE REPO. There's no fetching just a single branch to reduce the amount you have to download, so the increased size of the repo from the binaries becomes a headache really quickly.

I'm not really up to speed on upgrade-charm, so I'm not sure whose "fault" this is, but *any* process that relies on committing binary blobs to git is never going to be viable.
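
To make the growth concrete, here's an illustration (hypothetical file name and sizes; random data is used because it defeats compression):

git init blobdemo && cd blobdemo
dd if=/dev/urandom of=content.tar.gz bs=1M count=200   # a 200MB "binary blob"
git add content.tar.gz && git commit -m 'v1'
dd if=/dev/urandom of=content.tar.gz bs=1M count=200   # "modify" the blob
git add content.tar.gz && git commit -m 'v2'
du -sh .git   # ~400MB: both versions are stored essentially in full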

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

I'd suggest dropping git: write a manifest file on extraction of the charm, then compare against it when extracting the next version, removing files that are in the old manifest but no longer present in the new one. The main goals are keeping new files introduced into the charm during the unit lifecycle, while removing files no longer in the new charm when upgrading.
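
A rough sketch of that comparison in shell (archive names and paths are hypothetical; the real logic would live inside juju-core):

cd /var/lib/juju/agents/unit-foo-0/charm
zipinfo -1 /tmp/charm-v1.zip | sort > /tmp/old.manifest   # what the old charm shipped
zipinfo -1 /tmp/charm-v2.zip | sort > /tmp/new.manifest   # what the new charm ships
# delete files present in the old charm but not the new one;
# files created by hooks appear in neither manifest, so they survive
comm -23 /tmp/old.manifest /tmp/new.manifest | grep -v '/$' | xargs -r rm -f --
unzip -o /tmp/charm-v2.zip   # extract the new charm over the tree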

Curtis Hovey (sinzui)
tags: added: cts-cloud-review
removed: pes
Curtis Hovey (sinzui)
tags: added: docs
removed: doc
Revision history for this message
David Murphy (schwuk) wrote :

Why was the pes tag removed? Until WebOps give us a different way to deploy, we are being bitten hard by this bug and currently cannot deploy to prodstack.

Revision history for this message
Haw Loeung (hloeung) wrote :

This worked for me to reclaim space:

rm -rf .git                        # discard all history
git init                           # start a fresh repository in place
git config --global core.bigFileThreshold 10m
git config --global pack.windowMemory 256m
git add .
git commit -m '[hloeung] stripped to save diskspace'

At least, I think it did. Tested against a staging service :)

Revision history for this message
James Troup (elmo) wrote :

Can we please bump the priority of this bug? It's causing outages on production services by OOMing instances.

Revision history for this message
Jacek Nykis (jacekn) wrote :

I just hit this bug in production. On a box with 2G of RAM, top showed:
29668 root 20 0 1690m 1.4g 113m D 11.6 70.3 0:44.87 git

This took the server over oom threshold and some of my apps were killed. After using Haw's workaround everything came back to normal.

Revision history for this message
Mark Ramm (mark-ramm) wrote :

This is creating issues for a number of users today, bumping the priority.

In particular, if it is just a matter of setting the config correctly to handle large binary files, this should be done as soon as we can.

Changed in juju-core:
importance: Low → High
milestone: none → 2.0
milestone: 2.0 → 1.17.1
Revision history for this message
Ryan Finnie (fo0bar) wrote :

Haw Loeung (hloeung) wrote on 2013-12-04:
> This worked for me to reclaim space:

It's a little more complicated than that. Juju appears to work by git-pulling from the newly deployed deployer dir (which is a branch of the previously deployed deployer dir) into the charm dir. Therefore, the charm dir must be a git ancestor of the last deployed deployer dir for upgrades to work; otherwise upgrade-charm falls on its face.

This procedure seems to be the minimum you can clean down to after an upgrade. It results in two .gits without history:

UNIT=foo-0
git config --global core.bigFileThreshold 10m
git config --global pack.windowMemory 256m
cd /var/lib/juju/agents/unit-${UNIT}
# drop the accumulated history from the charm dir and pending update dirs
rm -rf charm/.git state/deployer/update-*/.git
(
  cd state/deployer/current/
  git init
  git add .
  git commit -m '[rfinnie] clean slate'
)
# give the charm dir the same fresh history so future git pulls still work
rsync -a state/deployer/current/.git/ charm/.git/

Also, if you're looking for stuff to clean in a fat-charm environment, keep in mind that /var/lib/juju/agents/unit-${UNIT}/state/bundles holds previous zips as well.
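
If those are safe to prune (an assumption; verify the newest bundle is the one in use before deleting anything), something like this would trim them:

cd /var/lib/juju/agents/unit-${UNIT}/state/bundles
ls -t | tail -n +2 | xargs -r rm -f --   # keep only the most recently modified bundle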

William Reade (fwereade)
Changed in juju-core:
assignee: nobody → William Reade (fwereade)
Martin Packman (gz)
Changed in juju-core:
milestone: 1.17.1 → 1.18.0
Revision history for this message
Mark Ramm (mark-ramm) wrote :

I may be remembering incorrectly, but I also think we are calling git commit after every hook, causing unbounded growth patterns even on non-"fat" charms. One quick and dirty solution would be to check

    git diff

and only do commits if there is anything in the result.
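
A sketch of that guard (note that untracked files don't show up in git diff, so git status's porcelain output is a safer check; the commit message is hypothetical):

# commit only when the working tree actually changed
if [ -n "$(git status --porcelain)" ]; then
    git add -A
    git commit -m 'post-hook snapshot'
fi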

Changed in juju-core:
importance: High → Critical
Curtis Hovey (sinzui)
tags: added: upgrade-charm
Jacek Nykis (jacekn)
tags: added: canonical-is
Revision history for this message
John A Meinel (jameinel) wrote :

I believe William implemented the "don't commit after every hook call" change, so that should keep things leaner on average.

William Reade (fwereade)
Changed in juju-core:
milestone: 1.20.0 → 2.0
status: Triaged → In Progress
William Reade (fwereade)
Changed in juju-core:
status: In Progress → Fix Committed
milestone: 2.0 → 1.19.1
Curtis Hovey (sinzui)
summary: - consider tuning git setup for juju-core, and document caveats
+ git usage can trigger diskspace/memory issues for charms with blobs
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released