object-server and object-replicator not robust to zero-byte hashes.pkl files

Bug #1089140 reported by Darrell Bishop
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Undecided
Darrell Bishop

Bug Description

This is with swift 1.7.6-dev at commit 13937ad696893898022517846ad17233fb57d61e running on Ubuntu 11.10 Oneiric with Python version 2.7.2-7ubuntu2 (2.7.2+).

 Dec 11 14:21:39 object-server ERROR __call__ error with REPLICATE /d25/206982 :
 Traceback (most recent call last):
   File "/usr/lib/pymodules/python2.7/swift/obj/server.py", line 893, in __call__
     res = method(req)
   File "/usr/lib/pymodules/python2.7/swift/common/utils.py", line 1458, in wrapped
     return func(*a, **kw)
   File "/usr/lib/pymodules/python2.7/swift/common/utils.py", line 477, in _timing_stats
     resp = func(ctrl, *args, **kwargs)
   File "/usr/lib/pymodules/python2.7/swift/obj/server.py", line 873, in REPLICATE
     _junk, hashes = tpool_reraise(get_hashes, path, recalculate=suffixes)
   File "/usr/lib/pymodules/python2.7/swift/obj/replicator.py", line 228, in tpool_reraise
     raise resp
 RuntimeError: maximum recursion depth exceeded

 root@swift-test-04:/var/log/swift# ll /srv/node/d25/objects/206982 -al
 total 1812
 drwxr-xr-x 2 swift swift 35 2012-12-04 17:23 ./
 drwxr-xr-x 49316 swift swift 1241088 2012-12-04 20:00 ../
 -rw------- 1 swift swift 0 2012-11-01 19:31 hashes.pkl
 -rwxr-xr-x 1 swift swift 0 2012-09-13 00:00 .lock*

 176 hashed = 0
 177 hashes_file = join(partition_dir, HASH_FILE)
 178 modified = False
 179 hashes = {}
 180 mtime = -1
 181 try:
 182 with open(hashes_file, 'rb') as fp:
 183 hashes = pickle.load(fp)
 184 mtime = os.path.getmtime(hashes_file)
 185 except Exception:
 186 do_listdir = True
 187 if do_listdir:
 188 for suff in os.listdir(partition_dir):
 189 if len(suff) == 3:
 190 hashes.setdefault(suff, None)
 191 modified = True
 192 hashes.update((hash_, None) for hash_ in recalculate)
 193 for suffix, hash_ in hashes.items():
 194 if not hash_:
 195 suffix_dir = join(partition_dir, suffix)
 196 try:
 197 hashes[suffix] = hash_suffix(suffix_dir, reclaim_age)
 198 hashed += 1
 199 except PathNotDir:
 200 del hashes[suffix]
 201 except OSError:
 202 logging.exception(_('Error hashing suffix'))
 203 modified = True
 204 if modified:
 205 with lock_path(partition_dir):
 206 if not os.path.exists(hashes_file) or \
 207 os.path.getmtime(hashes_file) == mtime:
 208 write_pickle(
 209 hashes, hashes_file, partition_dir, PICKLE_PROTOCOL)
 210 return hashed, hashes
 211 return get_hashes(partition_dir, recalculate, do_listdir,
 212 reclaim_age)
 213 else:
 214 return hashed, hashes

With a zero-byte "hashes_file", line 183 raises an exception; mtime remains -1, and do_listdir is set to True.
"modified" will be set to True.
Line 206 will get evaluated, but the conditional will be False (the file *does* exist and its mtime will not be == -1).
Then, line 211 recurses causing infinite recursion and raising the above RuntimeError.

You could solve this two ways:
 1) assign to mtime before trying to load the pickle data (move line 184 above line 182).
 2) have the conditional on line 206 include a test for os.path.getsize() == 0.

Revision history for this message
Darrell Bishop (darrellb) wrote :

The code in question is formatted better here: http://paste.openstack.org/raw/6iSCAkxpM0Uerp0Jg8J5/

Also, the file in question is swift/obj/replicator.py.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/17889

Changed in swift:
assignee: nobody → Darrell Bishop (darrellb)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/17889
Committed: http://github.com/openstack/swift/commit/ea95d0092ae56182decaf8bf18cb533838c01c10
Submitter: Jenkins
Branch: master

commit ea95d0092ae56182decaf8bf18cb533838c01c10
Author: Darrell Bishop <email address hidden>
Date: Tue Dec 11 15:32:09 2012 -0800

    Avoid infinite recursion in swift.obj.replicator.get_hashes.

    Fixes bug 1089140.

    Turns out that if an exception bails out of the pickle loading (eg.
    zero-byte hahes_file), the if clause to determine whether or not to
    write out a fresh hashes_file can evaluate to false, leading to an
    infinite loop.

    This patch fixes this infinite loop generally, by ensuring that if any
    exception is thrown, a new hashes_file is written.

    Change-Id: I344c5f8e261ce7c667bdafe1687263a4150b21dc

Changed in swift:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in swift:
milestone: none → 1.7.6
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.