nrss can't parse non-UTF-8 encoded feed that contains non-ASCII characters

Bug #319994 reported by Sergey Romanov
Affects        Status         Importance   Assigned to   Milestone
nrss (Debian)  Fix Released   Unknown
nrss (Ubuntu)  Triaged        Low          Unassigned

Bug Description

Binary package hint: nrss

nrss 0.3.9-1 gets an error parsing a feed encoded in ISO 8859-1 that contains non-ASCII characters. Sometimes only the first item gets displayed, if it contains no accented characters.

I've tried this feed:
http://rss.golem.de/rss.php?feed=RSS2.0

But the Atom feed from the same site works flawlessly (it is encoded in UTF-8):
http://rss.golem.de/rss.php?feed=ATOM1.0
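
One plausible reading of the symptom is that a parser forced to UTF-8 accepts the ASCII-only part of the feed and trips on the first ISO 8859-1 byte above 0x7F. The following is a minimal sketch of that idea, not code from nrss or Expat. The function name `first_invalid_utf8` is hypothetical, and the check is deliberately simplified: it looks at lead and continuation byte structure only, not at overlong forms or surrogates.

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first byte that does not
 * fit a well-formed UTF-8 sequence, or len if the whole buffer passes.
 * Simplified structural check only (no overlong/surrogate detection). */
size_t first_invalid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra; /* number of continuation bytes expected after c */

        if (c < 0x80)
            extra = 0;                 /* plain ASCII */
        else if ((c & 0xE0) == 0xC0)
            extra = 1;                 /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0)
            extra = 2;                 /* 3-byte sequence */
        else if ((c & 0xF8) == 0xF0)
            extra = 3;                 /* 4-byte sequence */
        else
            return i;                  /* stray continuation or invalid lead */

        if (extra > len - i - 1)
            return i;                  /* sequence is cut off by end of buffer */

        for (size_t k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return i;              /* expected a 10xxxxxx continuation */

        i += extra + 1;
    }
    return len;
}
```

For the ISO 8859-1 bytes of "für" (`'f', 0xFC, 'r'`) the function reports offset 1, while pure ASCII text and the UTF-8 encoding of the same word (`'f', 0xC3, 0xBC, 'r'`) pass in full.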

The problem seems to be that XML_ParserCreate is called in parse.c with the encoding set to "UTF-8". When called without an explicitly set encoding, Expat honors the document's encoding declaration.
I've tested it with XML_ParserCreate(NULL) and that works. Patch attached.

ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 9.04
Package: nrss 0.3.9-1
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: nrss
Uname: Linux 2.6.28-4-generic i586

Sergey Romanov (sml-uni) wrote :
Joseph Smidt (jsmidt)
Changed in nrss:
status: New → In Progress
Joseph Smidt (jsmidt) wrote :

This debdiff should close the bug for jaunty. Please sponsor the upload.

Iain Lane (laney) wrote :

Hi, I have forwarded this patch to upstream and Debian for consideration. If it is applied at either of these places we can pull it in here. Thanks for your contribution.

Joseph, there is no patch in your debdiff so there's nothing to upload anyway.

Changed in nrss:
importance: Undecided → Low
status: In Progress → Triaged
Changed in nrss:
status: Unknown → New
Sergey Romanov (sml-uni) wrote :

So, here is where things stand now. NRSS is abandoned upstream in favor of Canto, so it's unlikely to get fixed there.
It hasn't got any updates since July:
http://codezen.org/cgi-bin/gitweb.cgi?p=nrss.git

Michael, who currently maintains NRSS in Debian, is probably still in the process of becoming a DD. His last packaged source for NRSS (0.3.9-2) is on mentors.debian.net. I've taken the liberty of subscribing him to this bug, so it's up to him. Does he want to carry around patches that are unlikely ever to be applied upstream? Or should I really follow the upstream recommendation and move on to using Canto? ;) It's now in Jaunty too.

Jack Miller (jack-codezen) wrote :

Sergey: I believe that it would be smart to just abandon nrss support. I'm not taking patches for it anymore, and I'm positive that there are a lot of other bugs that just haven't been reported/filed. Canto is actively maintained and should be many times more stable than nrss ever was.

The July '08 date on the gitweb is misleading; that was just when I synced the code in there for archival purposes (and bugs, since I was planning on maintaining it at the time).

... Not to mention the fact that nrss has a tiny subset of the functionality in canto with an even smaller amount of real testing. I'm going to file a bug against nrss in debian stating that it's dead.

Sergey Romanov (sml-uni) wrote :

@Jack

Well, canto as currently packaged in Jaunty doesn't work out of the box. :( See LP bug #342504

And thank you very much for your work.

Jack Miller (jack-codezen) wrote :

Sergey: That disappoints me. Of course the canto version in Jaunty is also 8 releases (1 major, 7 minor) away from current, so it's kind of a lost cause anyway.

tags: added: patch-forwarded-debian
Changed in nrss (Debian):
status: New → Fix Released