account-server errors when using swift_hash_path_prefix introduced in 1.8.0

Bug #1172358 reported by Sergio Rubio
This bug affects 3 people

Affects: OpenStack Object Storage (swift)
Status: Invalid
Importance: Undecided
Assigned to: David Hadas

Bug Description

David Hadas (the author of the swift_hash_path_prefix patch, I believe) kindly asked me to open a bug report, so here we go.

We've been able to reproduce account-server errors trying to locate a DB when using Swift 1.8.0 in combination with the recently introduced setting swift_hash_path_prefix (see https://bugs.launchpad.net/swift/+bug/1157454).

The issue goes away if swift_hash_path_prefix is NOT present (i.e. the way it was before 1.8.0). We tested quite a few times, both enabling and disabling swift_hash_path_prefix, and can consistently reproduce the issue.
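For background on why a prefix mismatch relocates data, here is a simplified sketch of Swift's hash_path() (swift/common/utils.py); the prefix/suffix values are the ones from the swift.conf quoted later in this report, and the simplifications (no argument validation, hex output only) are mine:

```python
from hashlib import md5

# Simplified sketch of Swift's hash_path() as of 1.8.0; argument
# validation is omitted. The prefix/suffix values are the ones from
# the swift.conf quoted later in this report -- any values would
# illustrate the point.
HASH_PATH_PREFIX = b'64543f1f5108d509'
HASH_PATH_SUFFIX = b'0d85cf6346c7086d'

def hash_path(account, container=None, obj=None, prefix=HASH_PATH_PREFIX):
    """Return the hex hash that names the on-disk DB directory."""
    paths = [account]
    if container:
        paths.append(container)
    if obj:
        paths.append(obj)
    path = b'/'.join(p.encode('utf-8') for p in paths)
    return md5(prefix + b'/' + path + HASH_PATH_SUFFIX).hexdigest()

# An empty prefix reproduces the pre-1.8 behaviour, so any node that
# disagrees about swift_hash_path_prefix computes a different hash (and
# therefore a different partition and on-disk path) for the same account.
print(hash_path('AUTH_test') != hash_path('AUTH_test', prefix=b''))  # True
```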

The error log:

Apr 24 18:35:47 swift-002 account-server ERROR __call__ error with PUT /7e0cfcfc-7c6e-41a4-adc8-e4173147bd2a/464510/AUTH_aeedb7bdcb8846599e2b6b87bb8a947f/dispersion_e888be22b1a24f218cc77cdcda1762be : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 333, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 1558, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 520, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 112, in PUT#012 req.headers['x-bytes-used'])#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1431, in put_container#012 raise DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError: DB connection error (/srv/node/7e0cfcfc-7c6e-41a4-adc8-e4173147bd2a/accounts/464510/adc/e2cfc6be58be71d5ad6111364fff0adc/e2cfc6be58be71d5ad6111364fff0adc.db, 0):#012DB doesn't exist

We get this kind of error periodically (perhaps triggered by the replicators/updaters?).

The way we reproduce it:

1. Start with an empty cluster
2. Use swift-dispersion-populate to populate the cluster

root@swift-proxy-01:~# swift-dispersion-populate
Created 5242 containers for dispersion reporting, 1m, 0 retries
Created 5242 objects for dispersion reporting, 41s, 0 retries

# dispersion config, redacted to remove sensitive info
[dispersion]
auth_url = http://test-host:5000/v2.0/
auth_version = 2.0
auth_user = tenant:user
auth_key = secret
swift_dir = /etc/swift
dispersion_coverage = 1
retries = 5
concurrency = 25
dump_json = no

We sometimes get errors on the storage nodes while running populate, perhaps concurrency related.

3. Check the error log on the storage nodes; we start getting messages like these:

Apr 24 19:08:04 swift-001 account-server ERROR __call__ error with PUT /d3829495-9f10-4075-a558-f99fb665cfe2/464510/AUTH_aeedb7bdcb8846599e2b6b87bb8a947f/dispersion_e39093c2f161418a93dbf981ef839325 : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 333, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 1558, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 520, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 112, in PUT#012 req.headers['x-bytes-used'])#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1431, in put_container#012 raise DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError: DB connection error (/srv/node/d3829495-9f10-4075-a558-f99fb665cfe2/accounts/464510/adc/e2cfc6be58be71d5ad6111364fff0adc/e2cfc6be58be71d5ad6111364fff0adc.db, 0):#012DB doesn't exist
Apr 24 19:08:04 swift-001 account-server ERROR __call__ error with PUT /d3829495-9f10-4075-a558-f99fb665cfe2/464510/AUTH_aeedb7bdcb8846599e2b6b87bb8a947f/dispersion_4a651c8ea0384cbc8e0b206acc89c5a3 : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 333, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 1558, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 520, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 112, in PUT#012 req.headers['x-bytes-used'])#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1431, in put_container#012 raise DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError: DB connection error (/srv/node/d3829495-9f10-4075-a558-f99fb665cfe2/accounts/464510/adc/e2cfc6be58be71d5ad6111364fff0adc/e2cfc6be58be71d5ad6111364fff0adc.db, 0):#012DB doesn't exist

We waited for the replicators/updaters long enough to let them do their job, but the issue is always there, with the account server periodically logging that.

Searching for the database file the account server cannot find reveals that it actually lives under a different partition:

root@swift-001:/srv/node# find|grep e2cfc6be58be71d5ad6111364fff0adc.db
./07cc80bf-b033-4b31-87c5-49ae9aced24c/accounts/523880/adc/e2cfc6be58be71d5ad6111364fff0adc/e2cfc6be58be71d5ad6111364fff0adc.db
./07cc80bf-b033-4b31-87c5-49ae9aced24c/accounts/523880/adc/e2cfc6be58be71d5ad6111364fff0adc/e2cfc6be58be71d5ad6111364fff0adc.db.pending

That's pretty much all we do to consistently reproduce the issue. Removing swift_hash_path_prefix works around the issue for us. YMMV.
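The relocation above follows from how the ring maps a path to a partition. A minimal sketch (cf. swift.common.ring.Ring.get_nodes), where the part power of 20 and the prefix/suffix values are assumptions for illustration, not read from this cluster:

```python
import struct
from hashlib import md5

# The partition is the top part_power bits of the same md5 digest that
# hash_path() produces. PART_POWER = 20 and these prefix/suffix values
# are illustrative assumptions.
PART_POWER = 20
HASH_PATH_PREFIX = b'64543f1f5108d509'
HASH_PATH_SUFFIX = b'0d85cf6346c7086d'

def account_partition(account, prefix):
    digest = md5(prefix + b'/' + account.encode('utf-8')
                 + HASH_PATH_SUFFIX).digest()
    # top PART_POWER bits of the first 32-bit big-endian word of the digest
    return struct.unpack_from('>I', digest)[0] >> (32 - PART_POWER)

# With and without a prefix the digests differ, so the same account maps
# to two different partitions -- the DB lands under one (523880 in the
# find output above) while a prefix-mismatched server looks in another
# (464510 in the traceback).
print(account_partition('AUTH_test', HASH_PATH_PREFIX))
print(account_partition('AUTH_test', b''))
```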

Test cluster related information:

# swift.conf
[swift-hash]
swift_hash_path_suffix = 0d85cf6346c7086d
swift_hash_path_prefix = 64543f1f5108d509

OpenStack Swift 1.8.0 from Ubuntu Cloud Archive (Grizzly)
root@swift-002:/srv/node# apt-cache policy python-swift
python-swift:
  Installed: 1.8.0-0ubuntu1~cloud0
  Candidate: 1.8.0-0ubuntu1~cloud0
  Version table:
 *** 1.8.0-0ubuntu1~cloud0 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/grizzly/main amd64 Packages
        100 /var/lib/dpkg/status

Ubuntu 12.04.2 amd64
Test cluster with 2 storage nodes (more than 20GB RAM each, multiple cores, 10+ SATA disks each, running obj/cont/acct servers), 2 proxy nodes (virtualized, big enough) and one load balancer.

Feel free to ask me for any other info you may require to debug the issue.

Revision history for this message
Samuel Merritt (torgomatic) wrote :

Alright, silly questions time:

Are you running swift-dispersion-(populate|report) from 1.8.0 as well?

From which machine are you running swift-dispersion-report? One of the proxies?

Are the rings synchronized across all the machines?

Are both the prefix and the suffix in /etc/swift/swift.conf on the machine running dispersion report?

Sorry if these questions seem sort of basic; I'm just trying to make sure the setup is sane before digging into the code.

Changed in swift:
assignee: nobody → David Hadas (david-hadas)
Revision history for this message
Sergio Rubio (rubiojr) wrote :

Thank you Samuel, comments inline:

> Alright, silly questions time:
> Are you running swift-dispersion-(populate|report) from 1.8.0 as well?

As a matter of fact, the proxy node (swift-proxy-01) used to run swift-dispersion-populate was running 1.8.1+git201303190331~precise-0ubuntu1 packages while the storage nodes were running released 1.8.0 packages.

I replaced 1.8.1+git201303190331 with 1.8.0, retested, and apparently the issue is GONE (sigh).

> From which machine are you running swift-dispersion-report? One of the proxies?

Yeah, one of the proxies.

> Are the rings synchronized across all the machines?

Yes.

> Are both the prefix and the suffix in /etc/swift/swift.conf on the machine running dispersion report?

Yup.

> Sorry if these questions seem sort of basic; I'm just trying to make sure the setup is sane before digging into the code.

You see, there's always the risk of a monkey typing on a keyboard and creating useless bug reports :). Oh well.

I'm sorry for the mess. I will spend some more time re-testing and mark the bug as invalid if that's the case.

Thanks!

Revision history for this message
Sergio Rubio (rubiojr) wrote :

The commit from David Hadas merging swift_hash_path_prefix support is from "Wed Mar 20 01:35:41 2013 +0200", and I was running 1.8.1+git201303190331 on the proxy nodes, so that may explain why removing swift_hash_path_prefix fixed the issue.

Changed in swift:
status: New → Invalid
Revision history for this message
Anil Bhargava (anilbhargava777) wrote :

Hello all,

I am also facing the same problem. I just added a new storage node as a new zone and did a rebalance. The same error is appearing in my syslog as follows:

Jul 4 14:28:57 swift-node account-server ERROR __call__ error with PUT /sdb1/91380/AUTH_365bc339-7f05-45f3-9854-0fb804cb50ed/don_container1_segments : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 317, in __call__#012 res = getattr(self, req.method)(req)#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 105, in PUT#012 req.headers['x-bytes-used'])#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1446, in put_container#012 raise DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError: DB connection error (/srv/1/node/sdb1/accounts/91380/a31/1ff74362319d350c0921907021125a31/1ff74362319d350c0921907021125a31.db, 0):#012DB doesn't exist
Jul 4 14:28:57 swift-node account-server ERROR __call__ error with PUT /sdb1/91380/AUTH_365bc339-7f05-45f3-9854-0fb804cb50ed/Saluja_container1 : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 317, in __call__#012 res = getattr(self, req.method)(req)#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 105, in PUT#012 req.headers['x-bytes-used'])#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1446, in put_container#012 raise DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError: DB connection error (/srv/1/node/sdb1/accounts/91380/a31/1ff74362319d350c0921907021125a31/1ff74362319d350c0921907021125a31.db, 0):#012DB doesn't exist
Jul 4 14:28:57 swift-node account-server ERROR __call__ error with PUT /sdb1/27481/AUTH_.auth/.token_6 : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 317, in __call__#012 res = getattr(self, req.method)(req)#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 105, in PUT#012 req.headers['x-bytes-used'])#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 1446, in put_container#012 raise DatabaseConnectionError(self.db_file, "DB doesn't exist")#012DatabaseConnectionError: DB connection error (/srv/1/node/sdb1/accounts/27481/528/5bbc8ec47ce5a77183c1f6f33666d528/5bbc8ec47ce5a77183c1f6f33666d528.db, 0):#012DB doesn't exist

How can I recover my data?

Revision history for this message
Andre' Hazelwood (ahazelwood) wrote :

We're seeing similar problems when adding a 3rd node to an existing 2-node cluster. All versions of swift are the same on the other servers. On the new server we are getting:

Aug 23 14:44:15 sdev3 account-server ERROR __call__ error with REPLICATE /sdb/210790/cdd9a7cf77f32ca0a1c0e7cafb96be55 : #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 333, in __call__#012 res = method(req)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 1558, in wrapped#012 return func(*a, **kw)#012 File "/usr/lib/python2.7/dist-packages/swift/common/utils.py", line 520, in _timing_stats#012 resp = func(ctrl, *args, **kwargs)#012 File "/usr/lib/python2.7/dist-packages/swift/account/server.py", line 285, in REPLICATE#012 ret = self.replicator_rpc.dispatch(post_args, args)#012 File "/usr/lib/python2.7/dist-packages/swift/common/db_replicator.py", line 573, in dispatch#012 return self.complete_rsync(drive, db_file, args)#012 File "/usr/lib/python2.7/dist-packages/swift/common/db_replicator.py", line 647, in complete_rsync#012 broker.newid(args[0])#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 371, in newid#012 conn.commit()#012 File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__#012 self.gen.throw(type, value, traceback)#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 321, in get#012 self.possibly_quarantine(*sys.exc_info())#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 313, in get#012 yield conn#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 361, in newid#012 ''' % self.db_type, (str(uuid4()),))#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 85, in execute#012 return self._timeout(lambda: sqlite3.Connection.execute(#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 78, in _timeout#012 return call()#012 File "/usr/lib/python2.7/dist-packages/swift/common/db.py", line 86, in <lambda>#012 self, *args, **kwargs))#012OperationalError: no such table: account_stat

We also tried to comment out the hash prefix with no effect. All files under /etc/swift/ have been copied to all 3 servers where account, container and object servers are running.

Any additional thoughts?

Thanks.

On new server:
root@sdev3:/srv/node/sdb# apt-cache policy python-swift
python-swift:
  Installed: 1.8.0-0ubuntu1.2~cloud0
  Candidate: 1.8.0-0ubuntu1.2~cloud0
  Version table:
 *** 1.8.0-0ubuntu1.2~cloud0 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/grizzly/main amd64 Packages
        100 /var/lib/dpkg/status
     1.4.8-0ubuntu2.2 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu/ precise-security/main amd64 Packages
     1.4.8-0ubuntu2 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise/main amd64 Packages
     1.4.6-0ubuntu0~ppa1~precise1 0
        500 http://ppa.launchpad.net/swift-core/release/ubuntu/ precise/main amd64 Packages

On existing servers:
root@sdev1:/srv/node/sdc# apt-cache policy...


Revision history for this message
clayg (clay-gerrard) wrote :

So the "OperationalError: no such table: account_stat" is definitely different than the "DB doesn't exist" error.

In both of the two previous reports on this invalid bug the operator was suffering from a change in the hash function of swift that happened between 1.7 and 1.8 which allowed new clusters to optionally configure a hash_prefix. The DB doesn't exist errors came from either running new code with a hash_prefix on data that was originally PUT without having a hash_prefix defined, or running old code against data PUT with a hash_prefix defined.

If you PUT data (with old or new code) without a hash_prefix, you must not add a hash prefix to swift.conf.
If you PUT data with new code and a hash_prefix defined, then you may not run old code to access that data.

Your error does not seem related to either of these scenarios.

Can you do some more investigation (not in *this* invalid bug report)? Please open a question on Launchpad or Ask OpenStack, send an email to the mailing list, or jump on Freenode #openstack-swift.

Can you identify the account database that does not have an account_stat table? It seems to have happened during a 'complete_rsync' when the new temporary copy is being re-id'd - maybe it was a transmission failure, or maybe it's a bug in db_replicator. Do you see any other errors? Was it only the "/sdb/210790/cdd9a7cf77f32ca0a1c0e7cafb96be55" database? Any errors on the old nodes?
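One way to answer the "which DB is missing account_stat" question is a quick sqlite scan of the account DBs. A minimal sketch, assuming the DBs live under a /srv/node-style tree and that an unreadable DB is also worth flagging:

```python
import os
import sqlite3

def dbs_missing_table(root, table='account_stat'):
    """Walk `root` and return paths of .db files lacking `table`.

    Hypothetical helper for triage only; run it against the drive root
    that holds the accounts tree (e.g. /srv/node).
    """
    bad = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith('.db'):
                continue
            path = os.path.join(dirpath, name)
            try:
                conn = sqlite3.connect(path)
                row = conn.execute(
                    "SELECT name FROM sqlite_master "
                    "WHERE type='table' AND name=?", (table,)).fetchone()
                conn.close()
                if row is None:
                    bad.append(path)
            except sqlite3.DatabaseError:
                # a corrupt/unreadable DB is also suspect
                bad.append(path)
    return bad
```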

Revision history for this message
Andre' Hazelwood (ahazelwood) wrote :

Actually, it turned out that my ring definition was wrong (wrong TCP ports defined). It might be helpful to have a check on all of the servers to verify that they are talking to the right type of server. Sorry for the confusion.
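A minimal reachability check along those lines could look like this sketch. It only confirms that something accepts TCP connections at each ring device's ip:port (it cannot distinguish an account server from an object server); the device-dict shape here is an assumption matching what Ring('/etc/swift/account.ring.gz').devs returns:

```python
import socket

def unreachable_devices(devices, timeout=2.0):
    """Return (ip, port) pairs from `devices` that refuse TCP connections.

    `devices` is a list of dicts with at least 'ip' and 'port' keys,
    e.g. the .devs list of a loaded ring. This checks reachability only,
    not that the right *type* of Swift server is listening.
    """
    bad = []
    for dev in devices:
        try:
            with socket.create_connection((dev['ip'], dev['port']), timeout):
                pass
        except OSError:
            bad.append((dev['ip'], dev['port']))
    return bad
```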
