Newlines in Sources.bz2 indices

Bug #436182 reported by Loïc Minier on 2009-09-24
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Michael Nelson

Bug Description


This looks similar to bug #435316, but I'm not sure it's the same.

In I get:
Binary:,,,,,,,,,,,,,,,,,,,,,,,,,, python-uno,,,, libmythes-dev,, uno-libs3, uno-libs3-dbg, ure, ure-dbg,, cli-uno-bridge, libuno-cli-basetypes1.0-cil, libuno-cli-uretypes1.0-cil, libuno-cli-oootypes1.0-cil, libuno-cli-cppuhelper1.0-cil,
libuno-cli-ure1.0-cil,,,,,,,,, ttf-opensymbol,,,,,

Version: 1:3.1.1-2ubuntu1moblin2

there's a superfluous newline after libuno-cli-cppuhelper1.0-cil and after

[ This is breaking apt_pkg's parsing of this file which breaks germinate for the Ubuntu Moblin Remix image so it's quite high on my radar right now. ;-) ]

Related branches

Loïc Minier (lool) wrote :

Downloading file ...
Decompressing file ...
Traceback (most recent call last):
  File "/usr/bin/germinate-update-metapackage", line 273, in <module>
    germinator, [dist], components, architecture, cleanup=True)
  File "/usr/lib/germinate/Germinate/Archive/", line 121, in feed
  File "/usr/lib/germinate/Germinate/", line 323, in parseSources
    ver = p.Section["Version"]
KeyError: 'Version'

Celso Providelo (cprov) wrote :

It seems we have accepted some DSC with this format mistake:

launchpad_prod=> SELECT as name, spr.version as version, as owner, as archive from sourcepackagerelease spr, sourcepackagename spn, archive a, person o where AND AND AND dsc_binaries LIKE '%
      name | version | owner | archive
----------------+-------------------------+----------------+--------- | 1:3.1.1-2ubuntu1 | ubuntu-drivers | primary
 linux | 2.6.31-10.36~undervolt1 | raaa | testing
 mono | | ubuntu-drivers | primary
 linux | 2.6.31-10.36~undervolt2 | raaa | testing | 1:3.1.1-2ubuntu1moblin1 | moblin | ppa
 linux | 2.6.31-10.35~awe1 | awe | ppa
 linux | 2.6.31-10.36~eee1 | yofel | ppa
 linux | 2.6.31-10.36~undervolt3 | raaa | testing
 linux | 2.6.31-10.36~eee2 | yofel | ppa
 linux | 2.6.31-10.36~eee3 | yofel | ppa
 linux | 2.6.31-10.36~undervolt4 | raaa | testing | 1:3.1.1-2ubuntu1moblin2 | moblin | ppa
(12 rows)

Celso Providelo (cprov) wrote :

This is somehow related with bug #435316, we have to find the right way to identify this format problem (I'm assuming it is a problem) and reject those uploads.

Since there are not *many* yet, the migration of the existing that will be easy.

Changed in soyuz:
importance: Undecided → High
milestone: none → 3.1.10
status: New → Triaged
Colin Watson (cjwatson) wrote :

From the Debian (and Ubuntu) Policy Manual:

5.6.19. `Binary'

     This field is a list of binary packages. Its syntax and meaning
     varies depending on the control file in which it appears.

     When it appears in the `.dsc' file, it lists binary packages which a
     source package can produce, separated by commas[1]. It may span
     multiple lines.

tags: added: soyuz-upload
Joey Stanford (joey) wrote :

This is preventing OEM Services from running germinate for some Intel work we're doing. Very important this gets fixes ASAP.

tags: added: oem-services
Michael Nelson (michael.nelson) wrote :

Just trying to reproduce this locally - with a pristine sources.txt in the same directory in a python console:

f = open('sources.txt')
import apt_pkg
p = apt_pkg.ParseTagFile(f)
while p.Step() == 1:
   ver = p.Section['Package']
   src = p.Section['Version']


Add *two* consecutive new-lines within the Binary: section as you like and save, then:

f = open('sources.txt')
p = apt_pkg.ParseTagFile(f)
while p.Step() == 1:
    src = p.Section['Package']
    ver = p.Section['Version']


The two consecutive new-lines results in:
Traceback (most recent call last):
  File "<console>", line 3, in ?
KeyError: 'Version'

But using just one new-line (as described in the bug) seems to work fine. This is with apt_pkg.Version == ''.

I'll try to find out which version of apt-pkg is being used and verify the problem.

Michael Nelson (michael.nelson) wrote :

OK, exactly the same (two new-lines required) to replicate this on dogfood which has:

$ dpkg-query -W apt python-apt
apt 0.7.9ubuntu17.5~0.CAT.8.04
python-apt 0.7.4ubuntu7.5

which, after checking with cjwatson is the same as the production publisher.

Michael Nelson (michael.nelson) wrote :

So it seems when the original dsc contains a single \n, we're publishing a Sources.bz2 with a *blank* line, which is incorrect and apt_pkg correctly fails to parse it.

OK, now for the fix...

Changed in soyuz:
assignee: nobody → Michael Nelson (michael.nelson)
Michael Nelson (michael.nelson) wrote :

The actual issue is in lp.archiveuploader.parse_tagfile_lines - it has introduce a limited number of corrupted values for SourcePackageRelease.dsc_binaries with a trailing '\n'.

I've created and tested an intermediate work-around that rstrips the dsc_binaries when creating the index file, which will give us a bit more time to solve the original issue properly.

Changed in soyuz:
status: Triaged → Fix Committed
Changed in soyuz:
status: Fix Committed → Triaged
Download full text (4.8 KiB)

I'm setting the status back to Triaged, as what we've committed is a work-around - not a fix. It's a work-around that we can cherry-pick if necessary, and it will ensure that there are no trailing '\n' at end of fields in the Sources index file, so that the file can be correctly parsed by apt_pkg.ParseTagFile().

The work-around does not ensure that the stray trailing '\n' will not get into the SPR.dsc_binaries field in the first place, nor does it fix the corrupt data. Hence leaving this bug open.

I've not verified it yet, but we're 90% certain that when there are valid single '\n' chars within a field's values (as in the snippet in the bug description above), the lp.archiveuploader.parse_tagfile_lines is appending a trailing '\n' to the end of the value (also shown in the snippet above) - and it is this trailing '\n' that is invalid and causing the problem (If/when I have read access, we can simply run a slightly modified version of Celso's above query with the condition '%\n' to verify that these SPRs all have an invalid trailing '\n'.

I can't see any reason why we are not simply using apt_pkg.ParseTagFile() instead of our own parse_tagfile_lines() method, so I'd be keen to replace it if there is no reason not to. Celso's thought was that, as we are seeing more and more debian packages with the valid '\n' coming through, this issue could snow-ball, meaning more corrupt data for which we are relying on the work-around.

As another example, currently the Sources in the following ppa (identified by Celso's query above) displays the problem also:
Package: linux
Binary: linux-source-2.6.31, linux-doc, linux-headers-2.6.31-10, linux-libc-dev, linux-image-2.6.31-10-386, linux-headers-2.6.31-10-386, linux-image-debug-2.6.31-10-386, linux-image-2.6.31-10-generic, linux-headers-2.6.31-10-generic, linux-image-debug-2.6.31-10-generic, linux-image-2.6.31-10-generic-pae, linux-headers-2.6.31-10-generic-pae, linux-image-debug-2.6.31-10-generic-pae, linux-image-2.6.31-10-ia64, linux-headers-2.6.31-10-ia64, linux-image-debug-2.6.31-10-ia64, linux-image-2.6.31-10-lpia, linux-headers-2.6.31-10-lpia, linux-image-debug-2.6.31-10-lpia, linux-image-2.6.31-10-powerpc, linux-headers-2.6.31-10-powerpc, linux-image-debug-2.6.31-10-powerpc, linux-image-2.6.31-10-powerpc64-smp, linux-headers-2.6.31-10-powerpc64-smp, linux-image-debug-2.6.31-10-powerpc64-smp, linux-image-2.6.31-10-powerpc-smp, linux-headers-2.6.31-10-powerpc-smp, linux-image-debug-2.6.31-10-powerpc-smp, linux-image-2.6.31-10-server, linux-headers-2.6.31-10-server, linux-image-debug-2.6.31-10-server,
linux-image-2.6.31-10-sparc64, linux-headers-2.6.31-10-sparc64, linux-image-debug-2.6.31-10-sparc64, linux-image-2.6.31-10-sparc64-smp, linux-headers-2.6.31-10-sparc64-smp, linux-image-debug-2.6.31-10-sparc64-smp, linux-image-2.6.31-10-virtual, kernel-image-2.6.31-10-generic-di, nic-modules-2.6.31-10-generic-di, nic-shared-modules-2.6.31-10-generic-di, serial-modules-2.6.31-10-generic-di, ppp-modules-2.6.31-10-generic-di, firewire-core-modules-2.6.31-10-generic-di, scsi-modules-2.6.31-10-generic-di, plip-modules-2.6.31-10-generic...


Changed in soyuz:
status: Triaged → In Progress
Curtis Hovey (sinzui) on 2009-10-09
tags: added: tech-debt
Diogo Matsubara (matsubara) wrote :
Changed in soyuz:
status: In Progress → Fix Committed
Changed in soyuz:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers