Entry ID in Publish ping function

Bug #1017904 reported by Dominique Guardiola
Affects: subhub | Status: New | Importance: Undecided | Assigned to: Unassigned | Milestone: (none)

Bug Description

Hi Ivan

This is not a bug report but rather a question.
We're thinking of using subhub for several Django sites that need to synchronize their data in "real time".
It's not only articles but also other objects, passed as a JSON-LD payload in the <summary> XML tag.
Your concept of a "private" hub fits our project well too.

We previously tested a hosted hub (SuperFeeder). When pinging the hub to tell it there was new content, we only passed the feed URL (the "topic"), and the hub was then in charge of notifying subscribers, probably based on a diff against the previous version of the feed.

If I understand your code correctly:
- One DistributionTask is created for each entry and for each subscriber
- When processed, the notifications are grouped to be delivered together to the subscriber's callback URL

Does this mean there's no use in the subscriber having a different callback URL for each subscription, since all their subscriptions will be delivered together?

We currently use the djpubsubhubbub module for the subscriber part, and there's a unique callback URL per subscription:
https://bitbucket.org/petersanchez/djpubsubhubbub/src/953e9209d65a/djpubsubhubbub/views.py#cl-30
(and at first, we found it strange)

Ivan Sagalaev (isagalaev) wrote :

Hi Dominique,

Distribution tasks are grouped not by a subscriber callback URL alone but by a pair of (callback, topic). The purpose of it is to group together several new entries of one topic intended for a single subscriber. This is merely a network optimization and it doesn't influence the design of a subscriber. It can either have a separate callback URL for different topics or have a single URL and determine the topic from <link rel="self"> in the AtomPub payload (or infer it from any other metadata in the system). Anyway, SubHub won't send entries from different topics in one payload.
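For a subscriber that uses a single callback URL, reading the topic from the payload could look roughly like the sketch below. It assumes the hub posts a standard Atom feed document whose feed-level `<link rel="self">` carries the topic URL; the function name `topic_from_payload` and the sample feed are made up for illustration and are not part of SubHub or djpubsubhubbub.

```python
import xml.etree.ElementTree as ET

# Namespace of Atom elements, in ElementTree's "{uri}tag" notation.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def topic_from_payload(payload):
    """Return the href of the feed-level <link rel="self">, or None.

    Hypothetical helper: only direct children of <feed> are inspected,
    so <link> elements inside individual entries are ignored.
    """
    feed = ET.fromstring(payload)
    for link in feed.findall(ATOM_NS + "link"):
        if link.get("rel") == "self":
            return link.get("href")
    return None


# Made-up payload for illustration only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" href="http://example.com/feeds/articles/"/>
  <entry>
    <id>tag:example.com,2012:1</id>
    <title>First</title>
    <link rel="alternate" href="http://example.com/articles/1/"/>
  </entry>
</feed>"""

topic = topic_from_payload(SAMPLE)
```

A subscriber with one callback URL per subscription does not need this step, since the URL itself already identifies the topic.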

As for the new entries: yes, SubHub doesn't keep track of the latest published entry itself. Instead, the user is supposed to call `subhub.publish()` for every new entry, usually right when it's created. This call is not cheap, though, as it immediately tries to process all the distribution tasks in a blocking fashion. If you don't want to block for distribution, then instead of calling `subhub.publish()` you can just add a new distribution task:

    DistributionTask.objects.add(topic, entry_id)

… and then send a signal to some separate process that would call `manage.py subhub_maintenance --distribute` from the shell or `DistributionTask.objects.process()` directly if it's a Python environment.
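The split between a cheap "add a task" step and an expensive "process tasks" step can be illustrated with a small in-memory model. This is only a sketch of the idea described in this thread (one task per entry per subscriber, delivery grouped by the (callback, topic) pair), not SubHub's actual implementation; `InMemoryTaskQueue`, its methods, and the `deliver` callable are hypothetical names.

```python
from collections import defaultdict


class InMemoryTaskQueue:
    """Hypothetical stand-in for a distribution task table (not SubHub code)."""

    def __init__(self):
        self._tasks = []  # list of (callback, topic, entry_id) tuples

    def add(self, subscribers, topic, entry_id):
        # Cheap step: record one task per entry per subscriber callback.
        for callback in subscribers:
            self._tasks.append((callback, topic, entry_id))

    def process(self, deliver):
        # Expensive step, meant to run outside the request that published.
        # Tasks are grouped by (callback, topic) so that several new entries
        # of one topic reach a given subscriber in a single delivery.
        groups = defaultdict(list)
        for callback, topic, entry_id in self._tasks:
            groups[(callback, topic)].append(entry_id)
        self._tasks = []
        for (callback, topic), entry_ids in groups.items():
            deliver(callback, topic, entry_ids)
        return len(groups)  # number of deliveries made


queue = InMemoryTaskQueue()
queue.add(["http://a.example/cb"], "http://example.com/feed/", "entry-1")
queue.add(["http://a.example/cb"], "http://example.com/feed/", "entry-2")

sent = []  # records deliveries instead of making HTTP requests
deliveries = queue.process(lambda cb, topic, ids: sent.append((cb, topic, ids)))
```

In this model the two `add` calls return immediately, and both entries go out in one delivery when `process` runs later.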

Dominique Guardiola (dguardiola) wrote :

Thanks for clarifying this. We can continue to have different callbacks for subscribers.

We were thinking about moving the cron tasks to a lightweight task queue such as python-rq
(with the help of django-rq and rq-scheduler)

Is this something you'd like to integrate?

Ivan Sagalaev (isagalaev) wrote :

People use all sorts of solutions to off-load blocking tasks, and I specifically designed the library to be independent of them so everyone could use it with whatever infrastructure they have in place.

BTW, I just updated the trunk code with a small change that allows calling `subhub.publish()` with a parameter telling it not to force processing of distribution tasks:

    subhub.publish(topics, entry_id, False)

This is basically equivalent to manually adding a distribution task, but more "official" and "API-ish" :-)
