concurrent db accesses are not handled

Bug #724893 reported by Vincent Ladeuil on 2011-02-25
This bug affects 1 person
Affects: Ubuntu Distributed Development
Importance: High
Assigned to: Unassigned

Bug Description

Now that the importer/driver can propagate exceptions across threads, such concurrent accesses also produce tracebacks which should help diagnose and fix them.

Some relevant urls worth reading:

 http://www.sqlite.org/lockingv3.html
 http://www.sqlite.org/wal.html

I'll post more tracebacks when I'm done analysing the logs.

Vincent Ladeuil (vila) wrote :

2011-02-24 21:07:13,905 - __main__ - WARNING - Importing pulseaudio failed:
Traceback (most recent call last):
  File "/srv/package-import.canonical.com/new/scripts/import_package.py", line 1095, in <module>
    persistent_download_cache=options.persistent_download_cache))
  File "/srv/package-import.canonical.com/new/scripts/import_package.py", line 983, in main
    revid_db.discard_last_run()
  File "/srv/package-import.canonical.com/new/scripts/icommon.py", line 1131, in discard_last_run
    c.execute(self.DELETE_WORKING, (self.package,))
sqlite3.OperationalError: database is locked

Vincent Ladeuil (vila) wrote :

2011-02-24 21:21:23,965 - __main__ - INFO - Driver failed with exception:
Traceback (most recent call last):
  File "/srv/package-import.canonical.com/new/scripts/cethread.py", line 129, in run
    super(CatchingExceptionThread, self).run()
  File "/usr/lib/python2.5/threading.py", line 446, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/srv/package-import.canonical.com/new/scripts/icommon.py", line 2070, in drive
    self.do_one_step()
  File "/srv/package-import.canonical.com/new/scripts/icommon.py", line 2091, in do_one_step
    self.collect_terminated_threads()
  File "/srv/package-import.canonical.com/new/scripts/mass_import.py", line 229, in collect_terminated_threads
    super(ImportDriver, self).collect_terminated_threads()
  File "/srv/package-import.canonical.com/new/scripts/icommon.py", line 2119, in collect_terminated_threads
    t.collect()
  File "/srv/package-import.canonical.com/new/scripts/mass_import.py", line 153, in collect
    unicode_output.encode("utf-8", "replace"))
  File "/srv/package-import.canonical.com/new/scripts/icommon.py", line 543, in finish_job
    (row[1], 0, row[3], row[4], row[5], now, row[0]))
OperationalError: database is locked

Vincent Ladeuil (vila) wrote :

No more tracebacks of this kind so far.

See also bug #724898, which is about catching (but still reporting) this kind of exception.

John A Meinel (jameinel) wrote :

On 2/25/2011 5:11 AM, Vincent Ladeuil wrote:
> Public bug reported:
>
> Now that the importer/driver can propagate exceptions across threads,
> such concurrent accesses also produce tracebacks which should help
> diagnose and fix them.
>
> Some relevant urls worth reading:
>
> http://www.sqlite.org/lockingv3.html
> http://www.sqlite.org/wal.html
>

Wal is only available for sqlite 3.7 and later. I'm pretty sure the
standard sqlite that is available on Lucid (and probably Maverick and
maybe even Natty) is 3.6.

John
=:->


Vincent Ladeuil (vila) wrote :

>>>>> John A Meinel <email address hidden> writes:

    > On 2/25/2011 5:11 AM, Vincent Ladeuil wrote:
    >> Public bug reported:
    >>
    >> Now that the importer/driver can propagate exceptions across threads,
    >> such concurrent accesses also produce tracebacks which should help
    >> diagnose and fix them.
    >>
    >> Some relevant urls worth reading:
    >>
    >> http://www.sqlite.org/lockingv3.html
    >> http://www.sqlite.org/wal.html
    >>

    > Wal is only available for sqlite 3.7 and later. I'm pretty sure the
    > standard sqlite that is available on Lucid (and probably Maverick and
    > maybe even Natty) is 3.6.

maverick provides 3.7.2 but lucid indeed provides 3.6.22 :-/
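
To make the version constraint concrete, here is a minimal sketch of how a script could check for WAL support before relying on it. The helper names `supports_wal` and `enable_wal_if_possible` are hypothetical and not part of the importer; the only facts assumed are that WAL first appeared in SQLite 3.7.0 and that Python's `sqlite3` module exposes the library version as `sqlite3.sqlite_version_info`.

```python
import sqlite3

# WAL journaling needs the SQLite library itself (not the Python module)
# to be at least 3.7.0.
WAL_MIN_VERSION = (3, 7, 0)


def supports_wal(version_info=sqlite3.sqlite_version_info):
    """Return True if the given SQLite library version can use WAL.

    Hypothetical helper, for illustration only.
    """
    return tuple(version_info) >= WAL_MIN_VERSION


def enable_wal_if_possible(conn):
    """Try to switch conn to WAL and return the journal mode in effect.

    Returns None when the linked SQLite library is too old. The pragma
    reports the resulting mode, which may not be 'wal' (for example, an
    in-memory database stays in 'memory' mode).
    """
    if not supports_wal():
        return None
    return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
```

With the versions quoted above, `supports_wal((3, 7, 2))` is true for maverick and `supports_wal((3, 6, 22))` is false for lucid, so on lucid the importer would have to fall back to something else.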

Vincent Ladeuil (vila) wrote :

If we can't use an sqlite solution we should resort to a file-based one as we do for most of the scripts.
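
A file-based approach could look roughly like the sketch below. It is an assumption, not how the existing scripts do it (the thread does not show their locking code): it serializes access with an advisory `fcntl.flock` lock on a separate lock file, which only works on Unix-like systems. The name `file_lock` and the lock file path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path while the block runs.

    Hypothetical helper: every process or thread that touches the database
    would have to wrap its accesses in this context manager, because
    advisory locks do not stop code that ignores them.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)
```

A caller would then write `with file_lock("/path/to/db.lock"):` around each short transaction, so that only one writer talks to sqlite at a time.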

Since the transactions are short we should retry a small number of times after a small period.

Finding the right values for "small" above is left as an exercise for the dev fixing the bug, educated guesses welcome ;)
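
As a starting point for that exercise, here is a minimal retry sketch. The function name `execute_with_retry` and the default values (5 attempts, 0.1 seconds apart) are placeholders chosen for illustration, not measured or recommended settings, and the code is written for a current Python rather than the Python 2.5 seen in the tracebacks.

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), attempts=5, delay=0.1):
    """Run one short statement, retrying while the database is locked.

    Hypothetical helper: 'attempts' and 'delay' are guesses to be tuned.
    """
    for attempt in range(attempts):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as e:
            # Retry only on lock contention. Anything else, or a lock that
            # outlasts the final attempt, is re-raised so the failure is
            # still reported.
            if "locked" not in str(e) or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

Note that `sqlite3.connect()` also accepts a `timeout` argument (5 seconds by default) that makes sqlite itself wait for a lock before raising, so an explicit retry loop like this one mostly helps when that wait is not enough or when the retries should be logged.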

Changed in udd:
assignee: nobody → canonical-bazaar (canonical-bazaar)
importance: Undecided → Critical
status: New → Confirmed

John A Meinel (jameinel) wrote :

Is this really meant to be critical? It seems we have been running like this for a while, and it doesn't seem to be causing critical-esque failures.

Vincent Ladeuil (vila) wrote :

Try raising the concurrency and this will blow up quite quickly.

That's one way to trigger it, but there surely are others.

John A Meinel (jameinel) wrote :

On 5/11/2011 9:17 AM, Vincent Ladeuil wrote:
> Try raising the concurrency and this will blow up quite quickly.
>
> That's one way to trigger it but there surely are others,
>

I'm not saying it isn't *High* priority, but it certainly doesn't seem
like something that jumps the queue...

John
=:->


Vincent Ladeuil (vila) wrote :

Well, it didn't jump, it started as Critical as soon as I diagnosed it.

It can very well start triggering without notice just because of some other apparently unrelated code change, not to mention hardware or workflow policy or number of packages or another debian release :)

And if it starts triggering, it will have to be bumped to critical, so better avoid the panic and fix it first.

John A Meinel (jameinel) wrote :

Critical => a bug we should do before we work on any high bugs. Hence "jumping the Queue" and placing this as more important than any other work we are doing.
Same thing for the "package importer uses james_w credentials". If neither one is *actually* jumping the Queue and getting done before all the other work, then we shouldn't lie about it and say that it should.

Martin Pool (mbp) wrote :

This is worth fixing, but as far as I know:
 * when it fails, it fails safely (no corruption, just the import fails and can potentially be retried)
 * it's not actually triggering often
 * it's not utterly trivial to fix (needs some consideration of what db we want to use, or whether we want to work around sqlite limits)
 * we can deal with it when we hit it

So, downgrading.

Changed in udd:
importance: Critical → Medium

Vincent Ladeuil (vila) on 2012-06-15
Changed in udd:
assignee: canonical-bazaar (canonical-bazaar) → nobody
importance: Medium → High