Newlines in Sources.bz2 indices

Bug #436182 reported by Loïc Minier
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Michael Nelson

Bug Description

Hi

This looks similar to bug #435316, but I'm not sure it's the same.

In http://ppa.launchpad.net/moblin/ppa/ubuntu/dists/karmic/main/source/Sources.bz2 I get:
[...]
Package: openoffice.org
Binary: openoffice.org, broffice.org, openoffice.org-l10n-za, openoffice.org-l10n-in, openoffice.org-core, openoffice.org-common, openoffice.org-java-common, openoffice.org-writer, openoffice.org-calc, openoffice.org-impress, openoffice.org-draw, openoffice.org-math, openoffice.org-base-core, openoffice.org-base, openoffice.org-style-crystal, openoffice.org-style-oxygen, openoffice.org-style-industrial, openoffice.org-style-tango, openoffice.org-style-human, openoffice.org-style-hicontrast, openoffice.org-style-galaxy, openoffice.org-style-andromeda, openoffice.org-gtk, openoffice.org-gnome, openoffice.org-evolution, openoffice.org-emailmerge, python-uno, openoffice.org-officebean, openoffice.org-filter-binfilter, openoffice.org-filter-mobiledev, libmythes-dev, openoffice.org-dtd-officedocument1.0, uno-libs3, uno-libs3-dbg, ure, ure-dbg, openoffice.org-gcj, cli-uno-bridge, libuno-cli-basetypes1.0-cil, libuno-cli-uretypes1.0-cil, libuno-cli-oootypes1.0-cil, libuno-cli-cppuhelper1.0-cil,
libuno-cli-ure1.0-cil, mozilla-openoffice.org, openoffice.org-ogltrans, openoffice.org-wiki-publisher, openoffice.org-report-builder, openoffice.org-report-builder-bin, openoffice.org-presentation-minimizer, openoffice.org-presenter-console, openoffice.org-pdfimport, ttf-opensymbol, openoffice.org-dev, openoffice.org-dev-doc, openoffice.org-kde, openoffice.org-kab, openoffice.org-sdbc-postgresql

Version: 1:3.1.1-2ubuntu1moblin2
[...]

there's a superfluous newline after libuno-cli-cppuhelper1.0-cil and after openoffice.org-sdbc-postgresql.

[ This is breaking apt_pkg's parsing of this file which breaks germinate for the Ubuntu Moblin Remix image so it's quite high on my radar right now. ;-) ]

Related branches

Revision history for this message
Loïc Minier (lool) wrote :

Downloading http://ppa.launchpad.net/moblin/ppa/ubuntu/dists/karmic/main/source/Sources.bz2 file ...
Decompressing http://ppa.launchpad.net/moblin/ppa/ubuntu/dists/karmic/main/source/Sources.bz2 file ...
Traceback (most recent call last):
  File "/usr/bin/germinate-update-metapackage", line 273, in <module>
    germinator, [dist], components, architecture, cleanup=True)
  File "/usr/lib/germinate/Germinate/Archive/tagfile.py", line 121, in feed
    "source/Sources"))
  File "/usr/lib/germinate/Germinate/germinator.py", line 323, in parseSources
    ver = p.Section["Version"]
KeyError: 'Version'

Revision history for this message
Celso Providelo (cprov) wrote :

It seems we have accepted some DSC with this format mistake:

{{{
launchpad_prod=> SELECT spn.name as name, spr.version as version, o.name as owner, a.name as archive from sourcepackagerelease spr, sourcepackagename spn, archive a, person o where spr.upload_archive=a.id AND a.owner=o.id AND spr.sourcepackagename=spn.id AND dsc_binaries LIKE '%
%';
      name | version | owner | archive
----------------+-------------------------+----------------+---------
 openoffice.org | 1:3.1.1-2ubuntu1 | ubuntu-drivers | primary
 linux | 2.6.31-10.36~undervolt1 | raaa | testing
 mono | 2.4.2.3+dfsg-2 | ubuntu-drivers | primary
 linux | 2.6.31-10.36~undervolt2 | raaa | testing
 openoffice.org | 1:3.1.1-2ubuntu1moblin1 | moblin | ppa
 linux | 2.6.31-10.35~awe1 | awe | ppa
 linux | 2.6.31-10.36~eee1 | yofel | ppa
 linux | 2.6.31-10.36~undervolt3 | raaa | testing
 linux | 2.6.31-10.36~eee2 | yofel | ppa
 linux | 2.6.31-10.36~eee3 | yofel | ppa
 linux | 2.6.31-10.36~undervolt4 | raaa | testing
 openoffice.org | 1:3.1.1-2ubuntu1moblin2 | moblin | ppa
(12 rows)

Revision history for this message
Celso Providelo (cprov) wrote :

This is somehow related with bug #435316, we have to find the right way to identify this format problem (I'm assuming it is a problem) and reject those uploads.

Since there are not *many* yet, the migration of the existing that will be easy.

Changed in soyuz:
importance: Undecided → High
milestone: none → 3.1.10
status: New → Triaged
Revision history for this message
Colin Watson (cjwatson) wrote :

From the Debian (and Ubuntu) Policy Manual:

5.6.19. `Binary'
----------------

     This field is a list of binary packages. Its syntax and meaning
     varies depending on the control file in which it appears.

     When it appears in the `.dsc' file, it lists binary packages which a
     source package can produce, separated by commas[1]. It may span
     multiple lines.

tags: added: soyuz-upload
Revision history for this message
Joey Stanford (joey) wrote :

This is preventing OEM Services from running germinate for some Intel work we're doing. Very important this gets fixes ASAP.

tags: added: oem-services
Revision history for this message
Michael Nelson (michael.nelson) wrote :

Just trying to reproduce this locally - with a pristine sources.txt in the same directory in a python console:

{{{
f = open('sources.txt')
import apt_pkg
p = apt_pkg.ParseTagFile(f)
while p.Step() == 1:
   ver = p.Section['Package']
   src = p.Section['Version']

f.close()
}}}

Add *two* consecutive new-lines within the Binary: section as you like and save, then:

{{{
f = open('sources.txt')
p = apt_pkg.ParseTagFile(f)
while p.Step() == 1:
    src = p.Section['Package']
    ver = p.Section['Version']

}}}

The two consecutive new-lines results in:
Traceback (most recent call last):
  File "<console>", line 3, in ?
KeyError: 'Version'

But using just one new-line (as described in the bug) seems to work fine. This is with apt_pkg.Version == '0.7.20.2ubuntu6'.

I'll try to find out which version of apt-pkg is being used and verify the problem.

Revision history for this message
Michael Nelson (michael.nelson) wrote :

OK, exactly the same (two new-lines required) to replicate this on dogfood which has:

$ dpkg-query -W apt python-apt
apt 0.7.9ubuntu17.5~0.CAT.8.04
python-apt 0.7.4ubuntu7.5

which, after checking with cjwatson is the same as the production publisher.

Revision history for this message
Michael Nelson (michael.nelson) wrote :

So it seems when the original dsc contains a single \n, we're publishing a Sources.bz2 with a *blank* line, which is incorrect and apt_pkg correctly fails to parse it.

OK, now for the fix...

Changed in soyuz:
assignee: nobody → Michael Nelson (michael.nelson)
Revision history for this message
Michael Nelson (michael.nelson) wrote :

The actual issue is in lp.archiveuploader.parse_tagfile_lines - it has introduce a limited number of corrupted values for SourcePackageRelease.dsc_binaries with a trailing '\n'.

I've created and tested an intermediate work-around that rstrips the dsc_binaries when creating the index file, which will give us a bit more time to solve the original issue properly.

Revision history for this message
Diogo Matsubara (matsubara) wrote : Bug fixed by a commit
Changed in soyuz:
status: Triaged → Fix Committed
Changed in soyuz:
status: Fix Committed → Triaged
Revision history for this message
Michael Nelson (michael.nelson) wrote :
Download full text (4.8 KiB)

I'm setting the status back to Triaged, as what we've committed is a work-around - not a fix. It's a work-around that we can cherry-pick if necessary, and it will ensure that there are no trailing '\n' at end of fields in the Sources index file, so that the file can be correctly parsed by apt_pkg.ParseTagFile().

The work-around does not ensure that the stray trailing '\n' will not get into the SPR.dsc_binaries field in the first place, nor does it fix the corrupt data. Hence leaving this bug open.

I've not verified it yet, but we're 90% certain that when there are valid single '\n' chars within a field's values (as in the snippet in the bug description above), the lp.archiveuploader.parse_tagfile_lines is appending a trailing '\n' to the end of the value (also shown in the snippet above) - and it is this trailing '\n' that is invalid and causing the problem (If/when I have read access, we can simply run a slightly modified version of Celso's above query with the condition '%\n' to verify that these SPRs all have an invalid trailing '\n'.

I can't see any reason why we are not simply using apt_pkg.ParseTagFile() instead of our own parse_tagfile_lines() method, so I'd be keen to replace it if there is no reason not to. Celso's thought was that, as we are seeing more and more debian packages with the valid '\n' coming through, this issue could snow-ball, meaning more corrupt data for which we are relying on the work-around.

As another example, currently the Sources in the following ppa (identified by Celso's query above) displays the problem also:
http://ppa.launchpad.net/awe/ppa/ubuntu/dists/karmic/main/source/Sources
{{{
Package: linux
Binary: linux-source-2.6.31, linux-doc, linux-headers-2.6.31-10, linux-libc-dev, linux-image-2.6.31-10-386, linux-headers-2.6.31-10-386, linux-image-debug-2.6.31-10-386, linux-image-2.6.31-10-generic, linux-headers-2.6.31-10-generic, linux-image-debug-2.6.31-10-generic, linux-image-2.6.31-10-generic-pae, linux-headers-2.6.31-10-generic-pae, linux-image-debug-2.6.31-10-generic-pae, linux-image-2.6.31-10-ia64, linux-headers-2.6.31-10-ia64, linux-image-debug-2.6.31-10-ia64, linux-image-2.6.31-10-lpia, linux-headers-2.6.31-10-lpia, linux-image-debug-2.6.31-10-lpia, linux-image-2.6.31-10-powerpc, linux-headers-2.6.31-10-powerpc, linux-image-debug-2.6.31-10-powerpc, linux-image-2.6.31-10-powerpc64-smp, linux-headers-2.6.31-10-powerpc64-smp, linux-image-debug-2.6.31-10-powerpc64-smp, linux-image-2.6.31-10-powerpc-smp, linux-headers-2.6.31-10-powerpc-smp, linux-image-debug-2.6.31-10-powerpc-smp, linux-image-2.6.31-10-server, linux-headers-2.6.31-10-server, linux-image-debug-2.6.31-10-server,
linux-image-2.6.31-10-sparc64, linux-headers-2.6.31-10-sparc64, linux-image-debug-2.6.31-10-sparc64, linux-image-2.6.31-10-sparc64-smp, linux-headers-2.6.31-10-sparc64-smp, linux-image-debug-2.6.31-10-sparc64-smp, linux-image-2.6.31-10-virtual, kernel-image-2.6.31-10-generic-di, nic-modules-2.6.31-10-generic-di, nic-shared-modules-2.6.31-10-generic-di, serial-modules-2.6.31-10-generic-di, ppp-modules-2.6.31-10-generic-di, firewire-core-modules-2.6.31-10-generic-di, scsi-modules-2.6.31-10-generic-di, plip-modules-2.6.31-10-generic...

Read more...

Changed in soyuz:
status: Triaged → In Progress
Curtis Hovey (sinzui)
tags: added: tech-debt
Revision history for this message
Diogo Matsubara (matsubara) wrote :
Changed in soyuz:
status: In Progress → Fix Committed
Changed in soyuz:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.