--- cl-regex-1.orig/README
+++ cl-regex-1/README
@@ -0,0 +1,118 @@
+(documentation from: )
+
+REGEX package
+
+The regex engine is a pretty full-featured matcher, and thus is useful by
+itself. It was originally written as a prototype for a C++ matcher, though it
+has since diverged greatly.
+
+The regex compiler supports the following pattern syntax:
+
+    * ^ matches the start of a string.
+    * $ matches the end of a string.
+    * [...] denotes a character class.
+    * [^...] denotes a negated character class.
+    * [:...:] denotes a special character class.
+          o [:alpha:] == [A-Za-z]
+          o [:upper:] == [A-Z]
+          o [:lower:] == [a-z]
+          o [:digit:] == [0-9]
+          o [:alnum:] == [A-Za-z0-9]
+          o [:xdigit:] == [A-Fa-f0-9]
+          o [:space:] == whitespace
+          o [:punct:] == punctuation marks
+          o [:graph:] == printable characters other than space
+          o [:cntrl:] == control characters
+          o [:word:] == wordlike characters
+          o [^:...:] denotes a negated special character class.
+    * . matches any character.
+    * (...) delimits a regex subexpression. Also denotes a register pattern.
+    * (?...) denotes a regex subexpression that will not be captured in a register.
+    * (?=...) denotes a regex subexpression that will be used as a forward
+      lookahead. If the subexpression matches, then the rest of the match will
+      continue as if the lookahead match had not occurred (i.e. it does not consume
+      the candidate string). It will not be captured in a register, though it can
+      contain subexpressions that may be captured.
+    * (?!...) denotes a regex subexpression that will be used as a negative
+      forward lookahead (the match will continue only if the lookahead failed to
+      match). It will not be captured in a register, though it can contain
+      subexpressions that may be captured.
+    * * denotes the Kleene closure of the previous regex subexpression.
+    * + denotes the positive closure of the previous regex subexpression.
+    * *? denotes the non-greedy Kleene closure of the previous regex subexpression.
+    * +? denotes the non-greedy positive closure of the previous regex subexpression.
+    * ? denotes the greedy match of 0 or 1 occurrences of the previous regex subexpression.
+    * ?? denotes the non-greedy match of 0 or 1 occurrences of the previous
+      regex subexpression.
+    * \nn denotes a back-match against the contents of a previously-matched register.
+    * {nn,mm} denotes a bounded repetition.
+    * {nn,mm}? denotes a non-greedy bounded repetition.
+    * \n, \t, \r have their normal meanings.
+    * \d matches any decimal character, \D matches any nondecimal character.
+    * \w matches any wordlike character, \W matches any nonwordlike character.
+    * \s matches any whitespace character, \S matches any nonspace character.
+    * \< matches at the start of a word. \> matches at the end of a word.
+    * \ that character (escapes an otherwise special meaning).
+    * Special characters lose their specialness when escaped. There is a flag
+      to control this.
+    * All other characters are matched literally.
+
+There are a variety of functions in the REGEX package that allow the programmer
+to adjust the allowable regular expression syntax:
+
+    * The function ESCAPE-SPECIAL-CHARS allows you to change whether the
+      meta-characters have their magic meaning when escaped or unescaped. The default
+      behavior (per AWK syntax) is that special chars are unescaped.
+    * The function ALLOW-BACKMATCH allows you to change whether or not the \nn
+      syntax is allowed. By default it is allowed.
+    * The function ALLOW-RANGEMATCH allows you to change whether or not the
+      {nn,mm} bounded repetition syntax is allowed. By default it is allowed.
+    * The function ALLOW-NONGREEDY-QUANTIFIERS allows you to change whether or
+      not the *?, +?, ??, and {nn,mm}? quantifiers are recognized. By default they
+      are allowed.
+    * The function ALLOW-NONREGISTER-GROUPS allows you to change whether or not
+      the (?...) syntax is recognized. By default it is allowed.
+    * The function DOT-MATCHES-NEWLINE allows you to change whether '.' in a
+      pattern matches the newline character. This is false by default.
+
+Parenthesized expressions within the pattern are considered register patterns,
+and will be recorded for use after the match. There is an implicit set of
+parentheses around the entire expression, so the bounds of the matched text
+itself will always occupy register 0.
+
+Extensions that will be coming soon include:
+
+    1. I am working on a second backend for the regex compiler that generates an
+       even faster matcher (~4-20x faster on Symbolics, ~2x faster on LWW). The
+       compilation process itself is substantially slower. I've got some more work to
+       do to get the speed up even further on Lispworks, although the current system
+       is already much, much faster than GNU Regex.
+    2. Optionally allowing a negated regex pattern using the '^'
+       syntax. This also subsumes the negated character class in that [^...] ===
+       [...]^.
+    3. Faster scans by using a possible-prefix set. This isn't real high
+       priority at the moment since matching is plenty fast already :-)
+    4. Prefix and postfix context patterns à la LEX.
+
+Regex has been recently enhanced. Everything from the parser back has been
+completely rewritten. The regex system now includes a bunch of functions for
+manipulating regex parse trees directly, a multipass optimizer and code
+generator, and a new matching engine.
+
+The new regex system does a better job of optimizing a wider range of patterns.
+It also supports an extension that allows you to provide an "accept" function
+to the match-str function. This acceptfn takes the start and end position as
+parameters, and can find the string itself in the special variable *STR* and
+the registers in the special variable *regs*. It returns either nil to force
+the matcher to backtrack, or a non-nil value which will be returned as the
+success code for the match.
+
+An additional change is that register patterns within quantified patterns now
+return the leftmost occurrence in the source string. There is a flag to force
+the more usual rightmost match, but this will reduce the applicability of many
+critical optimizations.
+
+The latest version of regex supports the Perl \d, \D, \w, \W, \s, and \S
+metasequences, as well as the egrep \< start-of-word and \> end-of-word
+metasequences.
+
--- cl-regex-1.orig/README.Debian
+++ cl-regex-1/README.Debian
@@ -0,0 +1,10 @@
+CL-REGEX for Debian
+-------------------
+
+You can run the tests in the example directory by evaluating:
+
+(require :regex)
+(load "retest")
+(load "regexp-test-suite")
+(regex-test::run-tests)
+
--- cl-regex-1.orig/debian/changelog
+++ cl-regex-1/debian/changelog
@@ -0,0 +1,36 @@
+cl-regex (1-4.1) unstable; urgency=medium
+
+  * Non-maintainer upload
+  * Drop Depends on common-lisp-controller, and postinst and prerm scripts
+    (Closes: #915507)
+
+ -- Sébastien Villemot  Mon, 17 Dec 2018 11:38:28 +0100
+
+cl-regex (1-4) unstable; urgency=low
+
+  * Updating debhelper level (Closes: #817398)
+
+ -- Matthew Danish  Fri, 11 Mar 2016 13:58:29 +0000
+
+cl-regex (1-3) unstable; urgency=low
+
+  * Re-adopting. Closes: #377922
+  * Documentation from homepage included. Closes: #254015
+  * Specify debhelper compatibility level in debian/compat now.
+
+ -- Matthew Danish  Sat, 02 Jun 2007 19:39:54 -0400
+
+cl-regex (1-2) unstable; urgency=low
+
+  * QA Upload
+  * Set Maintainer to QA Group, Orphaned: #377922
+  * Move debhelper from B-D-I to B-D
+  * Conforms with latest standards version
+
+ -- Michael Ablassmeier  Mon, 31 Jul 2006 15:26:19 +0200
+
+cl-regex (1-1) unstable; urgency=low
+
+  * Initial release. (Closes: #168303)
+
+ -- Matthew Danish  Fri, 8 Nov 2002 10:30:01 -0500
--- cl-regex-1.orig/debian/compat
+++ cl-regex-1/debian/compat
@@ -0,0 +1 @@
+9
--- cl-regex-1.orig/debian/control
+++ cl-regex-1/debian/control
@@ -0,0 +1,14 @@
+Source: cl-regex
+Section: devel
+Priority: optional
+Maintainer: Matthew Danish
+Build-Depends: debhelper (>> 9.0.0)
+Standards-Version: 3.9.7
+
+Package: cl-regex
+Architecture: all
+Depends: ${misc:Depends}
+Description: Common Lisp regular expression compiler/matcher
+ A fully-featured regular expression compiler and matching engine for
+ Common Lisp that claims to be roughly 5x-20x faster than the GNU
+ regex matcher written in C.
--- cl-regex-1.orig/debian/copyright
+++ cl-regex-1/debian/copyright
@@ -0,0 +1,34 @@
+Debian Copyright Section
+========================
+
+Upstream Source URL: http://www.geocities.com/mparker762/clawk.html
+Upstream Author: Kenneth Michael Parker
+Debian Maintainer: Matthew Danish
+
+Upstream Copyright Statement
+============================
+
+Copyright (c) 2000,2001,2002 Kenneth Michael Parker
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+3. The name of the author may not be used to endorse or promote products
+   derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--- cl-regex-1.orig/debian/docs
+++ cl-regex-1/debian/docs
@@ -0,0 +1 @@
+README
--- cl-regex-1.orig/debian/rules
+++ cl-regex-1/debian/rules
@@ -0,0 +1,92 @@
+#!/usr/bin/make -f
+
+
+pkg := regex
+debpkg := cl-regex
+
+files := packages.lisp gen.lisp optimize.lisp macs.lisp regex.lisp \
+	closure.lisp parser.lisp
+examples := regexp-test-suite.lisp retest.lisp
+docs := README.Debian
+
+clc-source := usr/share/common-lisp/source
+clc-systems := usr/share/common-lisp/systems
+clc-pkg := $(clc-source)/$(pkg)
+doc-dir := usr/share/doc/$(debpkg)
+
+
+configure: configure-stamp
+configure-stamp:
+	dh_testdir
+	# Add here commands to configure the package.
+
+	touch configure-stamp
+
+
+build-arch: build
+build-indep: build
+
+build: build-stamp
+
+build-stamp: configure-stamp
+	dh_testdir
+
+	# Add here commands to compile the package.
+	touch build-stamp
+
+clean:
+	dh_testdir
+	dh_testroot
+	rm -f build-stamp configure-stamp
+	# Add here commands to clean up after the build process.
+	rm -f debian/$(debpkg).postinst.* debian/$(debpkg).prerm.*
+	dh_clean
+
+install: build
+	dh_testdir
+	dh_testroot
+	dh_prep
+	# Add here commands to install the package into debian/$(pkg).
+	dh_installdirs $(clc-systems) $(clc-pkg) $(doc-dir)
+	chmod 644 $(files) $(pkg).asd $(examples)
+	dh_install $(pkg).asd $(files) $(clc-pkg)
+	dh_install $(docs) $(doc-dir)
+	dh_link $(clc-pkg)/$(pkg).asd $(clc-systems)/$(pkg).asd
+
+
+# Build architecture-dependent files here.
+binary-arch: build install
+
+# Build architecture-independent files here.
+binary-indep: build install
+	dh_testdir
+	dh_testroot
+#	dh_installdebconf
+	dh_installdocs
+	dh_installexamples $(examples)
+	dh_link
+#	dh_installmenu
+#	dh_installlogrotate
+#	dh_installemacsen
+#	dh_installpam
+#	dh_installmime
+#	dh_installinit
+#	dh_installcron
+#	dh_installman
+#	dh_installinfo
+#	dh_undocumented
+	dh_installchangelogs
+	dh_strip
+	dh_compress
+	dh_fixperms
+#	dh_makeshlibs
+	dh_installdeb
+#	dh_perl
+	dh_shlibdeps
+	dh_gencontrol
+	dh_md5sums
+	dh_builddeb
+
+binary: binary-indep binary-arch
+.PHONY: build clean binary-indep binary-arch binary install configure
+
--- cl-regex-1.orig/debian/source/format
+++ cl-regex-1/debian/source/format
@@ -0,0 +1 @@
+1.0
--- cl-regex-1.orig/macs.lisp
+++ cl-regex-1/macs.lisp
@@ -517,7 +517,7 @@
      nil)
 
 
-(defconstant +special-class-names+
+(defparameter +special-class-names+
   '((":alpha:" alpha) (":upper:" upper) (":lower:" lower) (":digit:" digit)
     (":alnum:" alnum) (":xdigit:" xdigit) (":odigit:" odigit) (":punct:" punct)
     (":space:" space) (":word:" wordchar)))
--- cl-regex-1.orig/regex.asd
+++ cl-regex-1/regex.asd
@@ -0,0 +1,19 @@
+;;; -*- Mode: Lisp; Syntax: ANSI-Common-lisp; Package: CL-USER; Base: 10 -*-
+
+(in-package "CL-USER")
+
+
+(asdf:defsystem regex
+  :components ((:file "packages")
+               (:file "macs" :depends-on ("packages"))
+               (:file "parser" :depends-on ("packages" "macs"))
+               (:file "optimize" :depends-on ("packages" "macs"))
+               (:file "gen" :depends-on ("packages" "macs"))
+               (:file "closure" :depends-on ("packages" "macs"))
+               (:file "regex" :depends-on ("packages"
+                                           "macs"
+                                           "parser"
+                                           "optimize"
+                                           "gen"
+                                           "closure"))))
+