Regular expression fails with "_" as separator

Bug #1882906 reported by Alexander Schwiegel
This bug affects 1 person
Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I want to use a regular expression when adding books to extract the title, author, etc. from the filename. It does not work with filenames where I use "_" (underscore) as the separator. Regexp checkers like regex101.com or pythex.org deliver the expected result. If I use another separator such as "-" or "ABC", everything works fine.

Example:
filename = Report on XY_John Doe.pdf
regexp = (?P<title>[^_]+)_(?P<author>[^\.]+)

result calibre => Title: "Report on XY John Doe", Authors: "Unknown"
result regex101 => title: "Report on XY", author: "John Doe"
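
For reference, the pattern can also be checked with Python's standard `re` module, outside calibre. This is only a sketch of what the online checkers do; it says nothing about how calibre preprocesses the filename before matching.

```python
import re

# Pattern and filename copied from the report above.
pattern = re.compile(r"(?P<title>[^_]+)_(?P<author>[^\.]+)")
match = pattern.search("Report on XY_John Doe.pdf")

# The named groups split the raw filename at the underscore;
# the author group stops before the "." of the extension.
title = match.group("title")
author = match.group("author")
print(title, "|", author)
```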

I am using the latest calibre version 4.18.0 on macOS Catalina 10.15.5.

Kovid Goyal (kovid) wrote : Re: calibre bug 1882906

IIRC the filename processing code automatically replaces _ with space.
So use spaces in your pattern. This is done because underscores are used
to represent spaces very commonly in filenames. If your filenames
actually have spaces and use underscores as a separator, you should use
some bulk rename utility to convert the underscores to something else
like a hyphen or an =.
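
A minimal sketch of the effect described above, assuming the underscore is replaced by a space before the pattern is applied. The `parse` helper and its "Unknown" fallback are hypothetical stand-ins that mirror the reported result; they are not calibre's actual code.

```python
import re

def parse(stem, pattern, replace_underscores=True):
    """Hypothetical stand-in for the import step (not calibre's real code)."""
    if replace_underscores:
        # Assumed preprocessing: underscores become spaces before matching.
        stem = stem.replace("_", " ")
    match = re.search(pattern, stem)
    if match is None:
        # No match: whole name becomes the title, author stays unknown.
        return stem, "Unknown"
    return match.group("title"), match.group("author")

stem = "Report on XY_John Doe"  # filename without its extension

# With "_" as separator, the pattern never sees an underscore.
underscore_result = parse(stem, r"(?P<title>[^_]+)_(?P<author>[^\.]+)")

# After renaming the separator to "=", the adjusted pattern matches.
renamed_result = parse(stem.replace("_", "="), r"(?P<title>[^=]+)=(?P<author>[^\.]+)")

print(underscore_result)
print(renamed_result)
```

Under that assumption, the first call reproduces the result from the report (full name as title, author "Unknown"), while the renamed variant splits title and author as intended.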

 status fixreleased

Changed in calibre:
status: New → Fix Released