librarian files still available after being deleted through the web ui

Bug #397188 reported by Diogo Matsubara
268
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Abel Deuring

Bug Description

Today Brian Murray asked about some files that were deleted through the web ui but were still available through the librarian URL. <https://pastebin.canonical.com/19505/>

As a workaround I've asked herb to mark those files as deleted in the LibraryFileContent table using this query <https://pastebin.canonical.com/19511/>

Investigation is needed to find out why those files being deleted through the web ui weren't marked as deleted, so the librarian-gc.py could do its job.

This could be related to bug 88862 but Francis doesn't think so.

Revision history for this message
Brian Murray (brian-murray) wrote :

I was looking at the Ubuntu bugs mailing list trying to find out how many bugs are affected by bug 373351 when I discovered this. Looking at the mbox for July the earliest bug I could find with an attachment still available was bug 394690. The attachment was removed on July 2nd but I was still able to download it today. I checked the mbox for June but didn't find any attachments available for download.

I was under the impression that attachments were cleaned up every 24 hours. If that isn't the case this is rather problematic for apport crashes.

Revision history for this message
Brian Murray (brian-murray) wrote :

The list I posted on pastebin was not comprehensive and I imagine there are more that are available when they should not be.

Deryck Hodge (deryck)
Changed in malone:
status: New → Triaged
Revision history for this message
Kees Cook (kees) wrote :

Is there any ETA on this issue? I'd like to get it on the work-list for launchpad, if there is room.

Deryck Hodge (deryck)
Changed in malone:
milestone: none → 3.1.10
Kees Cook (kees)
tags: added: platform-blocker
Abel Deuring (adeuring)
Changed in malone:
assignee: nobody → Abel Deuring (adeuring)
Revision history for this message
Stuart Bishop (stub) wrote :

This is an original design feature of the Librarian. If there are any unexpired URLs referencing the same file on disk, all URLs continue to work. Also, even if all the URLs do expire and the physical file removed from disk, reuploading the file again will allow all the expired URLs to start working again.

If we need to remove security sensitive information, we need to actually remove it rather than just flagging data as expired. This currently involves removing the LibraryFileAlias record if we need to stop a single URL referencing data to stop working, or the LibraryFileContent and all LibraryFileAliases (and deal with all the fallout) if we need to remove all traces from our system for legal reasons.

We certainly don't want removal of the LibraryFileContent being the default.

Revision history for this message
Abel Deuring (adeuring) wrote :

This design feature obviously leads to problems for security related bug attachments ;)

Seems that we should

- describe more clearly that simply removing a bug attachment does not automatically delete the Librarian file

- add an option to reliably delete the librarian file of a bug attachment. Such an option should probably only available to people with sufficient privileges, like a project owner or a driver.

A form allowing the latter should display something like:

  This file is also referenced as
  * a bug attachment of bug 1234
  * a logo for projext X
  * ....
  are you sure that you want to completely delete this file?

(OK, having a librarian file both as a bug attachment and as a logo is not very likely, but Launchpad references LFAs from 43 different tables; I can't come up quickly with a better example...)

Revision history for this message
Stuart Bishop (stub) wrote : Re: [Bug 397188] Re: librarian files still available after being deleted through the web ui

On Wed, Nov 4, 2009 at 12:31 AM, Abel Deuring
<email address hidden> wrote:
> This design feature obviously leads to problems for security related bug
> attachments ;)

Yes. We can also reconsider these features. The file resurrection
feature can go IMO - it isn't useful except to surprise users. The
ability to continue to serve expired files if there are other active
links in the database is also of dubious value. Dropping the
LibraryFileContent.deleted and making LibraryFileAlias.content
nullable might be a better approach and would not have major impact on
the codebase. We would lose history of the hashes and filesize of an
expired or deleted file, but I don't think anyone would care. I guess
Bjorn is the person to discuss this with.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Deryck Hodge (deryck)
Changed in malone:
milestone: 3.1.10 → 3.1.11
Revision history for this message
Christian Reis (kiko) wrote :

So I don't quite understand -- if the attachment is marked as deleted, then what is the remaining link to the LFA (or LFC?) that would cause the attachment not to be garbage collected? Who else refers to it?

Revision history for this message
Abel Deuring (adeuring) wrote :

I see two reasons why LFAs are not deleted:

1. We have two LFAs, LFA_1, and LFA_2, referencing the same LFC. Both belonged to (different) BugAttachments, and one of the BugAttachment has been deleted. The two LFAs may have been created by a user going inadvertently two times trough filing a buf report with Apport data.

2. sorry, I noticed this only a few minutes ago, though the data is lying around since yesterday (https://pastebin.canonical.com/24784/plain/ ): We have a table LibrayFileDownloadCount which references LFA... Note that there are probably many more supposedly deleted LFAs referenced by libraryfiledownloadcount than the four results listed in the paste. These are just four LFAs having the filename CoreDump.gz which happen to also match the former case. We have probably many more "privacy affecting" LFAs which are supposed to be deleted but stay around because libraryfiledownloadcount.libraryfilealias references them :( Time for a CP for the Librarian GC, I guess...)

Revision history for this message
Abel Deuring (adeuring) wrote :

Two more remarks:

1. I'm currently working on lp:~adeuring/launchpad/bug387188-db-schema. The idea is that we should be able to explicitly mark LFAs as "deleted"; if somebody tries to access such a "deleted" lbrarian file, the server will return a 404.

2. bigjools reminded us on the Canonical LP mailing list that we have a restricted librarian. We should indeed store the data uploaded via Apport there, instead of the public Librarian... And we should perhaps add an option for uploading a bug attachment via the web UI allowing people can say "this should go into the restricted librarian".

Revision history for this message
Abel Deuring (adeuring) wrote :

We'll land a fix for this bug during the next roll out on Wednesday (Dec 3).

One issue remains: We'd like to remove any remaining privacy related files that should be deleted but aren't before we merge the related branch, i.e., we'll remove files named "CoreDump.gz" that are not referenced by any tables of the LP database.

Brian, are there any other filenames that indicate privacy related content and that we should include in this procedure?

Revision history for this message
Brian Murray (brian-murray) wrote :

Additionally, the filenames "Stacktrace.txt" and "ThreadStacktrace.txt" should be included.

Revision history for this message
Diogo Matsubara (matsubara) wrote : Bug fixed by a commit
Changed in malone:
status: Triaged → Fix Committed
Abel Deuring (adeuring)
Changed in malone:
status: Fix Committed → Fix Released
William Grant (wgrant)
visibility: private → public
To post a comment you must log in.
This report contains Public Security information  
Everyone can see this security related information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.