[Upstream] Regular Expression Search for circumflex by itself does not find beginning of a paragraph

Bug #465309 reported by jimav on 2009-10-30
This bug affects 1 person
Affects Status Importance Assigned to Milestone
LibreOffice
Confirmed
Wishlist
OpenOffice
New
Unknown
libreoffice (Ubuntu)
Medium
Unassigned
openoffice.org (Ubuntu)
Undecided
Unassigned

Bug Description

Binary package hint: openoffice.org

1) lsb_release -rd
Description: Ubuntu 12.04 LTS
Release: 12.04

2) apt-cache policy libreoffice-calc
libreoffice-calc:
  Installed: 1:3.5.3-0ubuntu1
  Candidate: 1:3.5.3-0ubuntu1
  Version table:
 *** 1:3.5.3-0ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise-updates/main i386 Packages
        100 /var/lib/dpkg/status
     1:3.5.2-2ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise/main i386 Packages

3) Expected behavior: in Writer, Calc, or the Macro Editor, open the Find & Replace window, check the Regular Expression checkbox, and enter a circumflex (^) in the Search for drop-down. The beginning of every paragraph should then be found. The LO Wiki and the built-in LO help imply that a circumflex by itself in the search field matches the beginning of a paragraph:
http://help.libreoffice.org/Common/List_of_Regular_Expressions

4) What happens instead is that nothing is found.

WORKAROUND: Notepad++ 6.1.2:
http://notepad-plus-plus.org/download/v6.1.2.html

via WINE.

apt-cache policy wine1.5
wine1.5:
  Installed: 1.5.4-0ubuntu1~ppa1~precise1+pulse17
  Candidate: 1.5.4-0ubuntu1~ppa1~precise1+pulse17
  Version table:
 *** 1.5.4-0ubuntu1~ppa1~precise1+pulse17 0
        500 http://ppa.launchpad.net/ubuntu-wine/ppa/ubuntu/ precise/main i386 Packages
        100 /var/lib/dpkg/status

ProblemType: Bug
Architecture: amd64
Date: Fri Oct 30 11:36:03 2009
DistroRelease: Ubuntu 9.10
Package: openoffice.org-core 1:3.1.1-4ubuntu2 [modified: var/lib/openoffice/basis3.1/program/services.rdb]
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-12.41-generic
SourcePackage: openoffice.org
Uname: Linux 2.6.31-12-generic x86_64

jimav (james-avera) wrote :
WeatherGod (ben-v-root) wrote :

I can confirm that '^' does not work in OO 3.1.1 on Fedora 11, but '$' does. This should probably be directed upstream as well.

Changed in openoffice.org (Ubuntu):
status: New → Confirmed
WeatherGod (ben-v-root) wrote :

Finally upstreamed it.

Changed in openoffice:
importance: Undecided → Unknown
status: New → Unknown
Changed in openoffice:
status: Unknown → New
Chris Cheney (ccheney) on 2010-05-13
tags: added: karmic
jimav (james-avera) on 2010-06-16
summary: - regular expression ^ by itself does not work
+ regular expression ^ by itself does not work in Find & Replace of Basic
+ code
description: updated
Jack Leigh (leighman) on 2011-02-10
Changed in libreoffice (Ubuntu):
status: New → Confirmed
status: Confirmed → Triaged
Changed in openoffice.org (Ubuntu):
status: Confirmed → Triaged
Changed in openoffice.org (Ubuntu):
status: Triaged → Won't Fix

[This is an automated message.]
There are no new official OpenOffice.org releases in Ubuntu packaging anymore => Won't Fix

If the problem persists, please mark this bug as "also affects project Libreoffice" or "also affects distribution Libreoffice (Ubuntu)" if that has not happened already.

Please leave references to upstream OpenOffice.org bugs in place to allow cross pollination.

summary: - regular expression ^ by itself does not work in Find & Replace of Basic
- code
+ Regular Expression Search for cirumflex by itself does find beginning of
+ a paragraph

jimav, thank you for taking the time to report this bug and helping to make Ubuntu better. The issue you are reporting is an upstream one and it would be nice if somebody having it could send the bug to the developers of the software by following the instructions at http://wiki.documentfoundation.org/BugReport . If you have done so, please tell us the number of the upstream bug (or the link), so we can add a bugwatch that will inform us about the status. Thanks in advance.

description: updated
tags: added: i386 precise
summary: - Regular Expression Search for cirumflex by itself does find beginning of
- a paragraph
+ Regular Expression Search for cirumflex by itself does not find
+ beginning of a paragraph
Changed in df-libreoffice:
status: New → Incomplete
Changed in libreoffice (Ubuntu):
importance: Undecided → Medium
Bob Bib (bobbib) on 2012-06-02
summary: - Regular Expression Search for cirumflex by itself does not find
+ Regular Expression Search for circumflex by itself does not find
beginning of a paragraph

Expected behavior: in Writer, Calc, or the Macro Editor, open the Find & Replace window, check the Regular Expression checkbox, and enter a circumflex (^) in the Search for drop-down. The beginning of every paragraph should then be found. The LO Wiki and the built-in LO help imply that a circumflex by itself in the search field matches the beginning of a paragraph:
http://help.libreoffice.org/Common/List_of_Regular_Expressions

What happens instead is that nothing is found.

NOTE: A dollar sign ($) by itself *does* work as expected, i.e., it matches the end of each line.

Changed in df-libreoffice:
importance: Undecided → Unknown
status: Incomplete → Unknown
summary: - Regular Expression Search for circumflex by itself does not find
- beginning of a paragraph
+ [Upstream] Regular Expression Search for circumflex by itself does not
+ find beginning of a paragraph
Changed in df-libreoffice:
importance: Unknown → Medium
status: Unknown → Confirmed

Hi Jim,

Please use "^." (without the quotes) to find the first character of a paragraph.
I think ^ is only used in combinations.
See some examples and explanations in the help.

Regards,
Cor

No. ^. is not equivalent. ^. means to match the first character on the line, and if doing a replace then the first character would be deleted. ^ by itself matches the start of the line (not including any characters), and replacing it with something effectively inserts the "replacement" text at the start of the line. You could use something ugly like replacing ^(.) with ${1}PREFIX to avoid deleting the first character, but that would fail on blank lines which don't have any characters in them.

In any case, ^ (by itself) is a standard, well-defined regular expression syntax used everywhere else (Perl, Python, vim etc. etc.) and LibreOffice should not do something incompatible.

If you are unsure how regular expression syntax should work (in industry-wide practice), there are many books and online references, for example

http://en.wikipedia.org/wiki/Regular_expression#POSIX_Basic_Regular_Expressions
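
For comparison, here is a minimal sketch of the distinction described above, using Python's re module. This only illustrates how ^ behaves in one of the engines mentioned; it says nothing about how LibreOffice implements its search. Lines in a string stand in for paragraphs, and the "> " prefix is an arbitrary placeholder.

```python
import re

# Five "paragraphs": a, b, two empty ones, c (lines stand in for paragraphs).
text = "a\nb\n\n\nc"

# '^' alone is a zero-width match at the start of every line,
# so replacing it inserts the prefix and deletes nothing.
prefixed = re.sub(r"^", "> ", text, flags=re.MULTILINE)

# '^.' consumes the first character, so a plain replace deletes it.
first_char_lost = re.sub(r"^.", "> ", text, flags=re.MULTILINE)

# '^(.)' with a back-reference keeps the first character,
# but '.' cannot match on an empty line, so empty lines are skipped.
first_char_kept = re.sub(r"^(.)", r"> \1", text, flags=re.MULTILINE)

print(repr(prefixed))
print(repr(first_char_lost))
print(repr(first_char_kept))
```

Only the first substitution touches all five lines, including the two empty ones.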

Hi Jim,

OK, sorry & thanks for the explanation. (In the meantime I understood that the same applies to $, which cannot be used on its own to find the end of a paragraph.)
Did it ever work as expected, or is it something that has yet to be implemented?
In that case, this would be an enhancement...

Changed in df-libreoffice:
importance: Medium → Wishlist

AFAIK ^ has never worked correctly. I doubt anyone intentionally made Open Office regular expressions incompatible with industry practice, so I think this is a bug, not a missing feature.

-Jim

Incidentally $ does match the end of paragraphs (as documented), but seems to match the paragraph break (not just the -position- at the end of the paragraph), so paragraphs are merged forming a single new paragraph. The exception is that only one of a group of successive empty paragraphs is matched.

Matching the para-break itself seems odd to me (as usually unhelpful), but might be intentional. However the fact that only some empty paragraphs are matched is almost certainly a bug.

EXAMPLE: In the following 1-line paragraphs, there are two empty paras between b and c (<P> indicates the paragraph symbol which is shown when displaying non-printing characters):
a<P>
b<P>
<P>
<P>
c<P>
Find-and-replace of $ with X replaces the 5 paragraphs with 2 paragraphs:
aXbX<P>
Xc<P>
As you can see, the 5 paragraphs were collapsed into two paragraphs, except the "paragraph break" was not removed for one of the empty paragraphs.
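
As a point of comparison only, here is a small Python sketch of the same five-paragraph example in an engine where $ is a zero-width position rather than the break itself. Newlines stand in for the <P> marks, and the final paragraph mark is omitted from the string for simplicity; both are assumptions of this sketch, not part of the report.

```python
import re

# a, b, two empty paragraphs, c (newlines stand in for <P>).
text = "a\nb\n\n\nc"

# With MULTILINE, '$' matches the position just before each newline
# and at the end of the string, without consuming the newline.
marked = re.sub(r"$", "X", text, flags=re.MULTILINE)

print(repr(marked))
# No line breaks were removed, so there are still five lines.
print(len(marked.split("\n")))
```

Here every paragraph, including both empty ones, receives an X and none are merged.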

Any thoughts about fixing this? It's still a problem in 4.3-alpha1

Note that searching for ^. is not a work-around because it will not match the start of empty paragraphs (the "." does not match). So if you want to prepend something to every paragraph in a selection which includes empty paragraphs, then ^ alone is necessary.

Isn't your case just covered by using
 & in search and
 \nFOO in replace?

For me that works in Writer

> Isn't your case just covered by using
> & in search and
> \nFOO in replace?

Maybe that was a typo. The above does not work (does nothing--not matched).
Can you suggest a work-around which inserts some text at the start of every line in Calc's Basic macro editor (including empty lines)? That's the problem this bug was originally about and which *should* be easy by replacing ^ with the desired text. That is standard regex behavior everywhere else in computerdom.

^ on its own should work (just like $ on its own does).

(In reply to comment #9)

> Can you suggest a work-around which inserts some text at the start of every
> line in Calc's Basic macro editor (including empty lines)?

The component of this issue is Writer .. ?

Not sure where the regex code is. It manifests in Writer and in the Basic macro editor in Calc.

Still a problem in 4.4.0alpha1
 > New

Maybe Component should be changed to Spreadsheet, because the problem is more simply visible when editing Basic macro code. It is common to want to insert spaces at the start of every line in a range (e.g. to "indent" the code one level), and replacing ^ with spaces does not work.
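
Until ^ works on its own, one possible stopgap is to indent the code outside the editor. The following is a rough sketch in Python using the standard textwrap module; the sample Basic code and the four-space indent are made up for illustration, and copying the text out of and back into the macro editor is left to the user.

```python
import textwrap

# Hypothetical Basic macro text, including an empty line.
code = "Sub Demo\n\n    x = 1\nEnd Sub\n"

# textwrap.indent skips whitespace-only lines by default; a predicate that
# always returns True makes it prefix every line, empty ones included,
# which is what replacing '^' with spaces would do.
indented = textwrap.indent(code, "    ", predicate=lambda line: True)

print(indented)
```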
