Comment 7 for bug 753000

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 753000] Re: NotOneError caused by duplicate stuctural subscriptions

I see a couple of idiosyncracies here.

Firstly,

    "sourcepackagename_requires_distribution" CHECK (sourcepackagename
IS NULL OR distribution IS NOT NULL)

in bug target constraints we permit distroseries + sourcepackagename:

    "bugtask_assignment_checks" CHECK (
CASE
    WHEN product IS NOT NULL THEN productseries IS NULL AND
distribution IS NULL AND distroseries IS NULL AND sourcepackagename IS
NULL
    WHEN productseries IS NOT NULL THEN distribution IS NULL AND
distroseries IS NULL AND sourcepackagename IS NULL
    WHEN distribution IS NOT NULL THEN distroseries IS NULL
    WHEN distroseries IS NOT NULL THEN true
    ELSE false
END)

So, this suggests a mismatch between structure and how folk may want
to subscribe. (Specifically, why do we let someone say 'natty only
please' or 'ubuntu python-subunit only please' but not 'natty
python-subunit ony please'.

I don't think this affects your issue, but perhaps a separate bug
should be filed?

Similarly:

    "one_target" CHECK (null_count(ARRAY[product, productseries,
project, distroseries, distribution, milestone]) = 5)

assumes that a subscription to a milestone is interesting *across all
of LP*, which is only true because milestone ids are not milestone
names: this may be a problem in the future [subscribe to the milestone
'foo' of the launchpad-project group, for instance]. Again, I think
this is separate bug worth a note in the bug tracker. [can't subscribe
to a milestone of a project-group].

---------------
SELECT StructuralSubscription.project, StructuralSubscription.subscriber
FROM StructuralSubscription
WHERE StructuralSubscription.project IS NOT NULL
GROUP BY StructuralSubscription.project, StructuralSubscription.subscriber
HAVING Count(*)>1;
 project | subscriber
---------+------------
      82 | 2
(1 row)
CORRECT (ProjectGroup)

Hah - this is me ;).
More data shows an interesting story:

 22643 | | | 82 | |
| | | 2 | 2 |
2010-07-09 05:59:22.012246 | 2010-07-09 05:59:22.012246
 22644 | | | 82 | |
| | | 2 | 2 |
2010-07-09 05:59:23.001104 | 2010-07-09 05:59:23.001104

Thats double-form submission or something, 1 second apart [under our
replication lag possibly].

I modified this query;
SELECT StructuralSubscription.product,
StructuralSubscription.subscriber, count(*)
FROM StructuralSubscription
WHERE StructuralSubscription.product IS NOT NULL
GROUP BY StructuralSubscription.product, StructuralSubscription.subscriber
HAVING Count(*)>1;
to see the frequency - also 2.

 SELECT *
FROM StructuralSubscription
WHERE product in (9868, 2461, 7534, 24395, 7533, 8269) and subscriber
in (3388985, 2212151, 1814750, 3391740, 1814750, 242763) order by
product, id;

shows a couple of dupes with multi day gaps, but others with subminute
gaps. One in particular caught my eye:

 25131 | 9868 | | | |
| | | 3388985 | 3388985 |
2010-09-16 12:37:32.187706 | 2010-09-16 12:37:32.187706
 25132 | 9868 | | | |
| | | 3388985 | 3388985 |
2010-09-16 12:37:30.823026 | 2010-09-16 12:37:30.823026
Note that the earlier-by-sequence number row has a created date two
seconds later than the row after it.

So I concur that:
 - there are duplicate rows
 - we should clean them up
 - there is a missing unique index - needs to be built across target +
sourcepackagename + subscriber

And probably redundantly, I'll note that
distro X, package N, subscriber Y
distro X, package M, subscriber Y
does not count as a dupe.
That is - the only dupes we have are on project and product subscriptions.