data items on pages should be identified

Bug #575557 reported by Dan Trevino
This bug report is a duplicate of: Bug #585307: export data to json.
This bug affects 1 person
Affects: LoCo Team Portal
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

For many reasons, data items on pages should be identified. For instance:

<ul class="teamlist"><li class="team"><a class="teamlink">

<ul class="eventlist"><li class="event"><a class="eventlink">

Thomas Bechtold (toabctl) wrote :

For what reasons exactly?

Michael Hall (mhall119) wrote :

Dan Trevino is writing an Android app for accessing loco-directory data. Until we can provide a webservice API, he is scraping our screens. Adding class="" attributes to certain data points lets him do this in a safer manner.
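To illustrate why the class attributes help, here is a minimal sketch of a scraper that keys on them instead of on page layout. It uses only Python's standard library. The page fragment, the `href` values, and the names `ClassTextExtractor` and `extract_links` are made up for this example; only the `teamlist`, `team`, and `teamlink` class names come from this report.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every <a> whose class attribute contains a given name."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.items = []       # finished link texts
        self._buffer = None   # text pieces of the link being read, or None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.wanted_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.items.append("".join(self._buffer).strip())
            self._buffer = None


# Hypothetical page fragment using the class names proposed in this report.
SAMPLE_HTML = """
<div id="sidebar"><a href="/about">About</a></div>
<ul class="teamlist">
  <li class="team"><a class="teamlink" href="/teams/ubuntu-alaska">Alaska Ubuntu LoCo Team</a></li>
  <li class="team"><a class="teamlink" href="/teams/loco-philippine-team">Ubuntu Team Philippines</a></li>
</ul>
"""


def extract_links(html, wanted_class):
    """Return the link texts of all <a> elements carrying wanted_class."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.items


team_names = extract_links(SAMPLE_HTML, "teamlink")
print(team_names)
```

Because the lookup depends only on the class name, moving the list elsewhere on the page or wrapping it in extra markup would not break the extraction, though renaming or dropping the class still would.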

David Rubin (drubin) wrote :

Or this creates a legacy dependency that will break should we ever want to update or change the CSS layout of the web application.

Personally I think this is a bad idea.

Not so much adding the classes (this is always a good idea) but the reasoning behind it. Now every time we make a trivial HTML change, do we have to verify that it doesn't break the Android client? Then the bigger question: to what extent do we support this Android client?

Daniel Holbach (dholbach) wrote :

What about http://loco.ubuntu.com/data/xml? We could also export JSON easily.

Dan Trevino (dantrevino) wrote :

Oooo... that XML is fugly. How can I hope to parse this:

<object>
                <p>loco-philippine-teamUbuntu Team PhilippinesPhilippineshttp://wiki.ubuntu.com/PhilippineTeamhttp://ubuntu-ph.org/https://lists.ubuntu.com/mailman/listinfo/ubuntu-phhttp://<email address hidden>#ubuntu-phTrueTrue2010-06-22zakamehttps://edge.launchpad.net/api/beta/~loco-philippine-team/mugshot <object/>
                    <object/>
                    <object/>
                    <object/>
                    <object/>
                </p>
      </object>

What's worse, it changes based on team details.

Dan Trevino (dantrevino) wrote :

David, I can't argue with HTML being a bad API. I'd certainly prefer REST/JSON.

My plan was to not "release" until there is a real API to access. In the meantime, I'm using the scraping for proof of concept and feature feedback, which I hope will foster something better down the road. As it stands today, the farther I get from just replicating the site, the bigger the hit on performance, but at least I can have data to play with.

Daniel Holbach (dholbach) wrote :

Dan: that's because you looked at it in the browser.

daniel@miyazaki:~$ head -c 500 xml
<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0"><object pk="1" model="teams.team"><field type="SlugField" name="lp_name">ubuntu-alaska</field><field type="CharField" name="name">Alaska Ubuntu LoCo Team</field><field type="CharField" name="country"><None></None></field><field type="CharField" name="spr"><None></None></field><field type="CharField" name="city"><None></None></field><field type="CharField" name="wiki_url"><None></None></field><field type="CharField" name="web_ur
daniel@miyazaki:~$
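For reference, output in this shape can be read with Python's standard `xml.etree.ElementTree`. The following is a minimal sketch rather than a tested client: the sample document is a shortened, hand-closed version of the truncated output above, and `parse_teams` is a hypothetical helper name. It treats an empty `<None></None>` child as a null value, as the output above suggests.

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on the output above; the real export has more fields.
SAMPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0">
<object pk="1" model="teams.team">
<field type="SlugField" name="lp_name">ubuntu-alaska</field>
<field type="CharField" name="name">Alaska Ubuntu LoCo Team</field>
<field type="CharField" name="country"><None></None></field>
</object>
</django-objects>"""


def parse_teams(xml_bytes):
    """Turn each top-level <object> into a dict keyed by field name."""
    root = ET.fromstring(xml_bytes)
    records = []
    for obj in root.findall("object"):
        record = {"pk": obj.get("pk"), "model": obj.get("model")}
        for field in obj.findall("field"):
            # A <None></None> child marks a null value in this export.
            is_null = field.find("None") is not None
            record[field.get("name")] = None if is_null else (field.text or "")
        records.append(record)
    return records


teams = parse_teams(SAMPLE_XML.encode("utf-8"))
print(teams)
```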

Daniel Holbach (dholbach) wrote :

The XML we have is certainly better than screen scraping because it's only going to change slightly, which you can never expect from HTML.

Shall we make this bug about exporting JSON instead?
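As a rough idea of what a client could do with such an export, here is a sketch that assumes the JSON would follow the layout of Django's stock JSON serializer (a list of entries with `pk`, `model`, and `fields` keys). No JSON export exists at the time of this report, so the sample data and the helper name `team_names_by_lp_name` are assumptions made for illustration.

```python
import json

# Hypothetical export, shaped like Django's default JSON serializer output.
SAMPLE_JSON = """
[
  {"pk": 1, "model": "teams.team",
   "fields": {"lp_name": "ubuntu-alaska",
              "name": "Alaska Ubuntu LoCo Team",
              "country": null}}
]
"""


def team_names_by_lp_name(json_text):
    """Map each team's Launchpad name to its display name."""
    return {
        entry["fields"]["lp_name"]: entry["fields"]["name"]
        for entry in json.loads(json_text)
        if entry["model"] == "teams.team"
    }


names = team_names_by_lp_name(SAMPLE_JSON)
print(names)
```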

Dan Trevino (dantrevino) wrote :

That'd be a nice first step. I put a blueprint up a few weeks back.

https://blueprints.launchpad.net/loco-directory/+spec/json-interface
