regex does not properly select match

Bug #2075970 reported by Benjamin Eeckhout
This bug affects 1 person
Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

The text selection when searching in an HTML file does not seem to be correct.

This is a minimal reproducible example:

```
𝒎𝒎𝒎𝒎𝒎𝒎
<p> test</p>
```
with the regex:
```
test(?=<\/p>)
```
I verified on `https://regex101.com/` that the regex selects `test` as expected, which differs from the matches calibre produces.
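
The same check can be repeated locally. This is a minimal sketch using Python's standard `re` module on the sample text from this report; it only confirms what the pattern should match and is not calibre's own search code.

```python
import re

# Sample from the report: six non-BMP characters (U+1D48E), then the paragraph.
text = "\U0001D48E" * 6 + "\n<p> test</p>"
pattern = re.compile(r"test(?=<\/p>)")

match = pattern.search(text)
matched_text = match.group(0)
span = match.span()  # offsets counted in Unicode code points

print(matched_text, span)
```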

I really don't know _what_ it is doing exactly: the selection depends on where the search starts, and it also differs depending on whether `up` or `down` is set as the search direction.
Sometimes it _does_ give the correct match which makes it even more confusing.
It does **not** exhibit this behaviour when using "normal" characters (eg: "a", "b", ...) and breaks with "weird" characters (eg: "𝒖", "𝒎", ...).

Version: 7.16
OS:
Edition: Windows 10 Home
Version: 22H2
Installed on: 28/12/2023
OS build: 19045.4651
Experience: Windows Feature Experience Pack 1000.19060.1000.0

Tags: regex
Benjamin Eeckhout (benjamineeckh) wrote :

It looks like this happens whenever the regex search passes over any multi-byte character while traversing the file.

So with search direction as "down" and `|` being the cursor position:
```
𝒎𝒎𝒎𝒎𝒎𝒎 |
<p> test</p>
```
with the same search will correctly match `test`, while:
```
|𝒎𝒎
<p> test</p>
```
will match "> te" (probably because of the 2 extra bytes).
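
One way to reproduce a shift of exactly this size is sketched below. It assumes, without confirmation from this report, that the match offsets are counted in Unicode code points while the selection is applied to text indexed in UTF-16 code units, where each `𝒎` occupies two units. Under that assumption the "2 extra" would be code units rather than bytes. The helper names `utf16_slice` and `to_utf16_offset` are hypothetical and exist only for this illustration.

```python
import re

M = "\U0001D48E"  # one code point, but two UTF-16 code units
text = M * 2 + "\n<p> test</p>"

match = re.search(r"test(?=<\/p>)", text)
start, end = match.span()  # code point offsets

utf16 = text.encode("utf-16-le")  # 2 bytes per UTF-16 code unit


def utf16_slice(data, start, end):
    """Select code units [start, end) from UTF-16-LE encoded data."""
    return data[2 * start:2 * end].decode("utf-16-le")


def to_utf16_offset(s, index):
    """Convert a code point index in s to a UTF-16 code unit index."""
    return len(s[:index].encode("utf-16-le")) // 2


# Using code point offsets directly as UTF-16 offsets lands two units early.
wrong = utf16_slice(utf16, start, end)

# Converting the offsets first selects the intended text.
right = utf16_slice(
    utf16, to_utf16_offset(text, start), to_utf16_offset(text, end)
)

print(repr(wrong), repr(right))
```

With only "normal" characters before the match, both offset schemes agree, which would be consistent with the bug appearing only after characters such as `𝒎`.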

Also note:
The search was in "regex" mode and not case sensitive; wrap was on and "dot all" was off.

Kovid Goyal (kovid) wrote :

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

Changed in calibre:
status: New → Fix Released