apt cache operations have potential for filesystem races

Bug #1346489 reported by Andreas Hasenack
This bug affects 3 people
Affects | Importance | Assigned to
Charm Helpers | High | James Page
ceph (Juju Charms Collection) | High | James Page
ceph-osd (Juju Charms Collection) | High | Liam Young
ceph-radosgw (Juju Charms Collection) | High | Björn Tillenius
landscape-client (Juju Charms Collection) | High | Chris Glass
mysql (Juju Charms Collection) | High | Chris Glass
ntpmaster (Juju Charms Collection) | High | Chris Glass
rabbitmq-server (Juju Charms Collection) | High | Chris Glass

Bug Description

Got this backtrace while deploying cs:trusty/ceph-radosgw:
2014-07-21 18:37:41 INFO identity-service-relation-changed /usr/bin/radosgw is running.
2014-07-21 18:45:11 INFO mon-relation-changed Reading package lists... [repeated progress updates from 0% to 99% elided] Reading package lists... Error!
2014-07-21 18:45:11 INFO mon-relation-changed Traceback (most recent call last):
2014-07-21 18:45:11 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/mon-relation-changed", line 234, in <module>
2014-07-21 18:45:11 INFO mon-relation-changed hooks.execute(sys.argv)
2014-07-21 18:45:11 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/charmhelpers/core/hookenv.py", line 478, in execute
2014-07-21 18:45:11 INFO mon-relation-changed self._hooks[hook_name]()
2014-07-21 18:45:11 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/mon-relation-changed", line 181, in mon_relation
2014-07-21 18:45:11 INFO mon-relation-changed emit_cephconf()
2014-07-21 18:45:11 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/mon-relation-changed", line 76, in emit_cephconf
2014-07-21 18:45:11 INFO mon-relation-changed 'version': ceph.get_ceph_version('radosgw'),
2014-07-21 18:45:11 INFO mon-relation-changed File "/var/lib/juju/agents/unit-ceph-radosgw-0/charm/hooks/ceph.py", line 227, in get_ceph_version
2014-07-21 18:45:11 INFO mon-relation-changed cache = apt.Cache()
2014-07-21 18:45:11 INFO mon-relation-changed SystemError: E:Problem renaming the file /var/cache/apt/pkgcache.bin.IUy28v to /var/cache/apt/pkgcache.bin - rename (2: No such file or directory), W:You may want to run apt-get update to correct these problems
2014-07-21 18:45:11 ERROR juju.worker.uniter uniter.go:486 hook failed: exit status 1

The apt.Cache() call should perhaps be placed in a try/except block with a few retries.
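
A minimal sketch of the retry idea suggested above, not code from any of the charms. The helper name open_cache_with_retries and its parameters are hypothetical; it takes any zero-argument callable so it can wrap apt.Cache, and it retries only on SystemError because that is the exception shown in the traceback.

```python
import time


def open_cache_with_retries(open_cache, attempts=3, delay=1.0, sleep=time.sleep):
    """Call open_cache() until it succeeds or the attempts run out.

    open_cache is any zero-argument callable, for example apt.Cache.
    Only SystemError is retried, since that is what the traceback shows
    for the failed pkgcache.bin rename.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return open_cache()
        except SystemError as exc:
            last_error = exc
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

In a hook this might be called as open_cache_with_retries(apt.Cache). Retrying only narrows the window for the race; it does not remove the underlying contention on the cache file.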

This happened inside a container where ceph-radosgw is the principal and there are two subordinates: landscape-client and ntp.

James Page (james-page) wrote :

I think we can make that run in memory, avoiding the need for filesystem locking.

Björn Tillenius (bjornt) wrote :

Agreed. I've put up a branch for review that changes it to run in memory, which deployed fine for me.
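
For anyone reading later, a sketch of what "run in memory" could look like with python-apt. This is an illustration under assumptions, not the contents of the branch under review: the function name in_memory_apt_cache and the optional apt_pkg_module parameter (there only so the function can be exercised without python-apt installed) are hypothetical. The idea is that setting the Dir::Cache::pkgcache and Dir::Cache::srcpkgcache options to empty strings stops apt from writing cache files under /var/cache/apt, so there is no temporary file to rename.

```python
def in_memory_apt_cache(apt_pkg_module=None):
    """Build an apt package cache without writing pkgcache.bin to disk."""
    if apt_pkg_module is None:
        # Imported lazily; apt_pkg is provided by python-apt.
        import apt_pkg as apt_pkg_module
    apt_pkg_module.init()
    # Empty paths ask apt to keep the binary caches in memory only,
    # so no pkgcache.bin.XXXXXX temp file is created and renamed.
    apt_pkg_module.config.set("Dir::Cache::pkgcache", "")
    apt_pkg_module.config.set("Dir::Cache::srcpkgcache", "")
    return apt_pkg_module.Cache()
```

The trade-off is that every caller rebuilds the cache from the package lists instead of reusing the file on disk, which costs some time per call but avoids sharing a file with other processes on the unit.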

Changed in ceph-radosgw (Juju Charms Collection):
status: New → In Progress
assignee: nobody → Björn Tillenius (bjornt)
James Page (james-page) wrote :

Looking at the codebases for ceph and ceph-osd, the same potential race condition exists there, so I'm raising tasks for those charms as well.

Changed in ceph-radosgw (Juju Charms Collection):
importance: Undecided → High
Changed in ceph-osd (Juju Charms Collection):
importance: Undecided → High
Changed in ceph (Juju Charms Collection):
importance: Undecided → High
James Page (james-page) wrote :

Hmm - and the same issue appears in charmhelpers/contrib/host

James Page (james-page) wrote :

Thinking about this a bit harder: the code in the ceph charms can quite happily use the charmhelper function, so let's a) fix the charmhelper and b) remove the code from the ceph.py codebase.

summary: - apt-get update error should be better handled
+ apt cache operations have potential for filesystem races
Changed in charm-helpers:
assignee: nobody → James Page (james-page)
status: New → In Progress
importance: Undecided → High
James Page (james-page)
Changed in ceph (Juju Charms Collection):
status: New → In Progress
assignee: nobody → James Page (james-page)
James Page (james-page)
Changed in ceph-osd (Juju Charms Collection):
status: New → In Progress
Changed in charm-helpers:
status: In Progress → Fix Released
Changed in ceph-osd (Juju Charms Collection):
status: In Progress → Fix Released
James Page (james-page)
Changed in ceph-radosgw (Juju Charms Collection):
status: In Progress → Fix Released
Changed in ceph (Juju Charms Collection):
status: In Progress → Fix Released
David Britton (dpb)
Changed in ntpmaster (Ubuntu):
status: New → Fix Released
assignee: nobody → David Britton (davidpbritton)
no longer affects: ntpmaster (Ubuntu)
Changed in ntpmaster (Juju Charms Collection):
assignee: nobody → Chris Glass (tribaal)
David Britton (dpb)
Changed in ntpmaster (Juju Charms Collection):
importance: Undecided → High
status: New → Fix Released
Changed in mysql (Juju Charms Collection):
assignee: nobody → Chris Glass (tribaal)
importance: Undecided → High
status: New → Fix Released
David Britton (dpb)
Changed in rabbitmq-server (Juju Charms Collection):
status: New → Fix Released
importance: Undecided → High
assignee: nobody → Chris Glass (tribaal)
David Britton (dpb)
Changed in ceph-osd (Juju Charms Collection):
assignee: nobody → Liam Young (gnuoy)
David Britton (dpb)
Changed in landscape-client (Juju Charms Collection):
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → Chris Glass (tribaal)
Trent Lloyd (lathiat) wrote :

Quick research note (I know this bug is old and fixed, but just wanted to note for anyone looking into this later)

This issue is caused by apt-get clean removing the temporary file before it's renamed.
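
The mechanism can be simulated in a few lines. This is only an illustration of the ordering problem, not apt's code: the function name rename_after_cleanup is hypothetical, and the "cleaner" step is a plain os.remove standing in for apt-get clean.

```python
import errno
import os
import tempfile


def rename_after_cleanup():
    """Simulate a cleaner deleting a temp file before the writer renames it.

    Returns the errno of the failed rename, or None if it succeeded.
    """
    with tempfile.TemporaryDirectory() as cache_dir:
        final_path = os.path.join(cache_dir, "pkgcache.bin")
        # Writer: build the new cache under a temporary name.
        fd, tmp_path = tempfile.mkstemp(prefix="pkgcache.bin.", dir=cache_dir)
        os.close(fd)
        # Cleaner (stand-in for "apt-get clean"): remove the temp file.
        os.remove(tmp_path)
        # Writer: try to move the temp file into place; the source is gone.
        try:
            os.rename(tmp_path, final_path)
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    print(rename_after_cleanup() == errno.ENOENT)
```

The rename fails with ENOENT, which corresponds to the "rename (2: No such file or directory)" part of the error in the original report.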

Related Debian bug (still unsolved):
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782501
