1 files changed, 601 insertions, 0 deletions
diff --git a/info/regex b/info/regex
new file mode 100644
index 00000000000..a262ac6aacf
--- /dev/null
+++ b/info/regex
@@ -0,0 +1,601 @@
+Info file regex, produced by Makeinfo, -*- Text -*- from input file
+regex.texinfo.
+
+
+
+File: regex,  Node: top,  Next: syntax,  Up: (dir)
+
+"regex" regular expression matching library.
+********************************************
+
+Overview
+========
+
+Regular expression matching allows you to test whether a string fits
+into a specific syntactic shape.  You can also search a string for a
+substring that fits a pattern.
+
+A regular expression describes a set of strings.  The simplest case
+is one that describes a particular string; for example, the string
+`foo' when regarded as a regular expression matches `foo' and nothing
+else.  Nontrivial regular expressions use certain special constructs
+so that they can match more than one string.  For example, the
+regular expression `foo\|bar' matches either the string `foo' or the
+string `bar'; the regular expression `c[ad]*r' matches any of the
+strings `cr', `car', `cdr', `caar', `cadddar' and all other such
+strings with any number of `a''s and `d''s.
+
+The first step in matching a regular expression is to compile it. 
+You must supply the pattern string and also a pattern buffer to hold
+the compiled result.  That result contains the pattern in an internal
+format that is easier to use in matching.
+
+Having compiled a pattern, you can match it against strings.  You can
+match the compiled pattern any number of times against different
+strings.
+
+* Menu:
+
+* syntax::	Syntax of regular expressions
+* directives::	Meaning of characters as regex string directives.
+* emacs::	Additional character directives available
+		  only for use within Emacs.
+* programming:: Using the regex library from C programs
+* unix::	Unix-compatible entry-points to regex library
+
+
+
+File: regex,  Node: syntax,  Next: directives,  Prev: top,  Up: top
+
+Syntax of Regular Expressions
+=============================
+
+Regular expressions have a syntax in which a few characters are
+special constructs and the rest are "ordinary".  An ordinary
+character is a simple regular expression which matches that character
+and nothing else.  The special characters are `$', `^', `.', `*',
+`+', `?', `[', `]' and `\'.  Any other character appearing in a
+regular expression is ordinary, unless a `\' precedes it.
+
+For example, `f' is not a special character, so it is ordinary, and
+therefore `f' is a regular expression that matches the string `f' and
+no other string.  (It does *not* match the string `ff'.)  Likewise,
+`o' is a regular expression that matches only `o'.
+
+Any two regular expressions A and B can be concatenated.  The result
+is a regular expression which matches a string if A matches some
+amount of the beginning of that string and B matches the rest of the
+string.
+
+As a simple example, we can concatenate the regular expressions `f'
+and `o' to get the regular expression `fo', which matches only the
+string `fo'.  Still trivial.
+
+Note: for Unix compatibility, special characters are treated as
+ordinary ones if they are in contexts where their special meanings
+make no sense.  For example, `*foo' treats `*' as ordinary since
+there is no preceding expression on which the `*' can act.  It is
+poor practice to depend on this behavior; better to quote the special
+character anyway, regardless of where is appears.
+
+
+
+File: regex,  Node: directives,  Next: emacs,  Prev: syntax,  Up: top
+
+The following are the characters and character sequences which have
+special meaning within regular expressions.  Any character not
+mentioned here is not special; it stands for exactly itself for the
+purposes of searching and matching.  *Note syntax::.
+
+`.'
+     is a special character that matches anything except a newline. 
+     Using concatenation, we can make regular expressions like `a.b'
+     which matches any three-character string which begins with `a'
+     and ends with `b'.
+
+`*'
+     is not a construct by itself; it is a suffix, which means the
+     preceding regular expression is to be repeated as many times as
+     possible.  In `fo*', the `*' applies to the `o', so `fo*'
+     matches `f' followed by any number of `o''s.
+
+     The case of zero `o''s is allowed: `fo*' does match `f'.
+
+     `*' always applies to the *smallest* possible preceding
+     expression.  Thus, `fo*' has a repeating `o', not a repeating
+     `fo'.
+
+     The matcher processes a `*' construct by matching, immediately,
+     as many repetitions as can be found.  Then it continues with the
+     rest of the pattern.  If that fails, backtracking occurs,
+     discarding some of the matches of the `*''d construct in case
+     that makes it possible to match the rest of the pattern.  For
+     example, matching `c[ad]*ar' against the string `caddaar', the
+     `[ad]*' first matches `addaa', but this does not allow the next
+     `a' in the pattern to match.  So the last of the matches of
+     `[ad]' is undone and the following `a' is tried again.  Now it
+     succeeds.
+
+`+'
+     `+' is like `*' except that at least one match for the preceding
+     pattern is required for `+'.  Thus, `c[ad]+r' does not match
+     `cr' but does match anything else that `c[ad]*r' would match.
+
+`?'
+     `?' is like `*' except that it allows either zero or one match
+     for the preceding pattern.  Thus, `c[ad]?r' matches `cr' or
+     `car' or `cdr', and nothing else.
+
+`[ ... ]'
+     `[' begins a "character set", which is terminated by a `]'.  In
+     the simplest case, the characters between the two form the set. 
+     Thus, `[ad]' matches either `a' or `d', and `[ad]*' matches any
+     string of `a''s and `d''s (including the empty string), from
+     which it follows that `c[ad]*r' matches `car', etc.
+
+     Character ranges can also be included in a character set, by
+     writing two characters with a `-' between them.  Thus, `[a-z]'
+     matches any lower-case letter.  Ranges may be intermixed freely
+     with individual characters, as in `[a-z$%.]', which matches any
+     lower case letter or `$', `%' or period.
+
+     Note that the usual special characters are not special any more
+     inside a character set.  A completely different set of special
+     characters exists inside character sets: `]', `-' and `^'.
+
+     To include a `]' in a character set, you must make it the first
+     character.  For example, `[]a]' matches `]' or `a'.  To include
+     a `-', you must use it in a context where it cannot possibly
+     indicate a range: that is, as the first character, or
+     immediately after a range.
+
+`[^ ... ]'
+     `[^' begins a "complement character set", which matches any
+     character except the ones specified.  Thus, `[^a-z0-9A-Z]'
+     matches all characters *except* letters and digits.
+
+     `^' is not special in a character set unless it is the first
+     character.  The character following the `^' is treated as if it
+     were first (it may be a `-' or a `]').
+
+`^'
+     is a special character that matches the empty string -- but only
+     if at the beginning of a line in the text being matched. 
+     Otherwise it fails to match anything.  Thus, `^foo' matches a
+     `foo' which occurs at the beginning of a line.
+
+`$'
+     is similar to `^' but matches only at the end of a line.  Thus,
+     `xx*$' matches a string of one or more `x''s at the end of a line.
+
+`\'
+     has two functions: it quotes the above special characters
+     (including `\'), and it introduces additional special constructs.
+
+     Because `\' quotes special characters, `\$' is a regular
+     expression which matches only `$', and `\[' is a regular
+     expression which matches only `[', and so on.
+
+     For the most part, `\' followed by any character matches only
+     that character.  However, there are several exceptions:
+     characters which, when preceded by `\', are special constructs. 
+     Such characters are always ordinary when encountered on their own.
+
+     No new special characters will ever be defined.  All extensions
+     to the regular expression syntax are made by defining new
+     two-character constructs that begin with `\'.
+
+`\|'
+     specifies an alternative.  Two regular expressions A and B with
+     `\|' in between form an expression that matches anything that
+     either A or B will match.
+
+     Thus, `foo\|bar' matches either `foo' or `bar' but no other
+     string.
+
+     `\|' applies to the largest possible surrounding expressions. 
+     Only a surrounding `\( ... \)' grouping can limit the grouping
+     power of `\|'.
+
+     Full backtracking capability exists when multiple `\|''s are used.
+
+`\( ... \)'
+     is a grouping construct that serves three purposes:
+
+       1. To enclose a set of `\|' alternatives for other operations.
+          Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.
+
+       2. To enclose a complicated expression for the postfix `*' to
+          operate on.  Thus, `ba\(na\)*' matches `bananana', etc.,
+          with any (zero or more) number of `na''s.
+
+       3. To mark a matched substring for future reference.
+
+     This last application is not a consequence of the idea of a
+     parenthetical grouping; it is a separate feature which happens
+     to be assigned as a second meaning to the same `\( ... \)'
+     construct because there is no conflict in practice between the
+     two meanings.  Here is an explanation of this feature:
+
+`\DIGIT'
+     After the end of a `\( ... \)' construct, the matcher remembers
+     the beginning and end of the text matched by that construct. 
+     Then, later on in the regular expression, you can use `\'
+     followed by DIGIT to mean "match the same text matched the
+     DIGIT'th time by the `\( ... \)' construct."  The `\( ... \)'
+     constructs are numbered in order of commencement in the regexp.
+
+     The strings matching the first nine `\( ... \)' constructs
+     appearing in a regular expression are assigned numbers 1 through
+     9 in order of their beginnings.  `\1' through `\9' may be used
+     to refer to the text matched by the corresponding `\( ... \)'
+     construct.
+
+     For example, `\(.*\)\1' matches any string that is composed of
+     two identical halves.  The `\(.*\)' matches the first half,
+     which may be anything, but the `\1' that follows must match the
+     same exact text.
+
+`\b'
+     matches the empty string, but only if it is at the beginning or
+     end of a word.  Thus, `\bfoo\b' matches any occurrence of `foo'
+     as a separate word.  `\bball\(s\|\)\b' matches `ball' or `balls'
+     as a separate word.
+
+`\B'
+     matches the empty string, provided it is *not* at the beginning
+     or end of a word.
+
+`\<'
+     matches the empty string, but only if it is at the beginning of
+     a word.
+
+`\>'
+     matches the empty string, but only if it is at the end of a word.
+
+`\w'
+     matches any word-constituent character.
+
+`\W'
+     matches any character that is not a word-constituent.
+
+There are a number of additional `\' regexp directives available for
+use within Emacs only.
+
+(*note emacs::.).
+
+
+
+File: regex,  Node: emacs,  Next: programming,  Prev: directives,  Up: top
+
+Constructs Available in Emacs Only
+----------------------------------
+
+`\`'
+     matches the empty string, but only if it is at the beginning of
+     the buffer.
+
+`\''
+     matches the empty string, but only if it is at the end of the
+     buffer.
+
+`\sCODE'
+     matches any character whose syntax is CODE.  CODE is a letter
+     which represents a syntax code: thus, `w' for word constituent,
+     `-' for whitespace, `(' for open-parenthesis, etc.  See the
+     documentation for the Emacs function `modify-syntax-entry' for
+     further details.
+
+     Thus, `\s(' matches any character with open-parenthesis syntax.
+
+`\SCODE'
+     matches any character whose syntax is not CODE.
+
+
+
+File: regex,  Node: programming,  Next: compiling,  Prev: emacs,  Up: top
+
+Programming using the `regex' library
+=====================================
+
+The subnodes accessible from this menu give information on entry
+points and data structures which C programs need to interface to the
+`regex' library.
+
+* Menu:
+
+* compiling::	How to compile regular expressions
+* matching::	Matching compiled regular expressions
+* searching::	Searching for compiled regular expressions
+* translation::	Translating characters into other characters
+		  (for both compilation and matching)
+* registers::	determining what was matched
+* split::	matching data which is split into two pieces
+* unix::	Unix-compatible entry-points to regex library
+
+
+
+File: regex,  Node: compiling,  Next: matching,  Prev: programming,  Up: programming
+
+Compiling a Regular Expression
+------------------------------
+
+To compile a regular expression, you must supply a pattern buffer. 
+This is a structure defined, in the include file `regex.h', as
+follows:
+
+     struct re_pattern_buffer
+       {
+         char *buffer   /* Space holding the compiled pattern commands. */
+         int allocated  /* Size of space that  buffer  points to */
+         int used       /* Length of portion of buffer actually occupied */
+         char *fastmap; /* Pointer to fastmap, if any, or zero if none. */
+                        /* re_search uses the fastmap, if there is one,
+                           to skip quickly over totally implausible
+                           characters */
+         char *translate;
+                        /* Translate table to apply to characters before
+                           comparing, or zero for no translation.
+                           The translation is applied to a pattern when
+                           it is compiled and to data when it is matched. */
+         char fastmap_accurate;
+                        /* Set to zero when a new pattern is stored,
+                           set to one when the fastmap is updated from it. */
+       };
+
+Before compiling a pattern, you must initialize the `buffer' field to
+point to a block of memory obtained with `malloc', and the
+`allocated' field to the size of that block, in bytes.  The pattern
+compiler will replace this block with a larger one if necessary.
+
+You must also initialize the `translate' field to point to the
+translate table that you will use when you match the compiled
+pattern, or to zero if you will use no translate table when you
+match.  *Note translation::.
+
+Then call `re_compile_pattern' to compile a regular expression into
+the buffer:
+
+     re_compile_pattern (REGEX, REGEX_SIZE, BUF)
+
+REGEX is the address of the regular expression (`char *'), REGEX_SIZE
+is its length (`int'), BUF is the address of the buffer (`struct
+re_pattern_buffer *').
+
+`re_compile_pattern' returns zero if it succeeds in compiling the
+regular expression.  In that case, `*buf' now contains the results. 
+Otherwise, `re_compile_pattern' returns a string which serves as an
+error message.
+
+After compiling, if you wish to search for the pattern, you must
+initialize the `fastmap' component of the pattern buffer.  *Note
+searching::.
+
+
+
+File: regex,  Node: matching,  Next: searching,  Prev: compiling,  Up: programming
+
+Matching a Compiled Pattern
+---------------------------
+
+Once a regular expression has been compiled into a pattern buffer,
+you can match the pattern buffer against a string with `re_match'.
+
+     re_match (BUF, STRING, SIZE, POS, REGS)
+
+BUF is, once again, the address of the buffer (`struct
+re_pattern_buffer *').  STRING is the string to be matched (`char *').
+sIZE is the length of that string (`int').  POS is the position
+within the string at which to begin matching (`int').  The beginning
+of the string is position 0.  REGS is described below.  Normally it
+is zero.  *Note registers::.
+
+`re_match' returns `-1' if the pattern does not match; otherwise, it
+returns the length of the portion of `string' which was matched.
+
+For example, suppose that BUF points to a buffer containing the
+result of compiling `x*', STRING points to `xxxxxy', and SIZE is `6'.
+Suppose that POS is `2'.  Then the last three `x''s will be matched,
+so `re_match' will return `3'.  If POS is zero, the value will be `5'.
+If POS is `5' or `6', the value will be zero, meaning that the null
+string was successfully matched.  Note that since `x*' matches the
+empty string, it will never entirely fail.
+
+It is up to the caller to avoid passing a value of POS that results
+in matching outside the specified string.  POS must not be negative
+and must not be greater than SIZE.
+
+
+
+File: regex,  Node: searching,  Next: translation,  Prev: matching,  Up: programming
+
+Searching for a Match
+---------------------
+
+Searching means trying successive starting positions for a match
+until a match is found.  To search, you supply a compiled pattern
+buffer.  Before searching you must initialize the `fastmap' field of
+the pattern buffer (see below).
+
+     re_search (BUF, STRING, SIZE, STARTPOS, RANGE, REGS)
+
+is called like `re_match' except that the POS argument is replaced by
+two arguments STARTPOS and RANGE.  `re_search' tests for a match
+starting at index STARTPOS, then at `STARTPOS + 1', and so on.  It
+tries RANGE consecutive positions before giving up and returning
+`-1'.  If a match is found, `re_search' returns the index at which
+the match was found.
+
+If RANGE is negative, RE_SEARCH tries starting positions STARTPOS,
+`STARTPOS - 1', ... in that order.  `|RANGE|' is the number of tries
+made.
+
+It is up to the caller to avoid passing value of STARTPOS and RANGE
+that result in matching outside the specified string.  STARTPOS must
+be between zero and SIZE, inclusive, and so must `STARTPOS + RANGE -
+1' (if RANGE is positive) or `STARTPOS + RANGE + 1' (if RANGE is
+negative).
+
+If you may be searching over a long distance (that is, trying many
+different match starting points) with a compiled pattern, you should
+use a "fastmap" in it.  This is a block of 256 bytes, whose address
+is placed in the `fastmap' component of the pattern buffer.  The
+first time you search for a particular compiled pattern, the fastmap
+is set so that `FASTMAP[CH]' is nonzero if the character CH might
+possibly start a match for this pattern.  `re_search' checks each
+character against the fastmap so that it can skip more quickly over
+non-matches.
+
+If you do not want a fastmap, store zero in the `fastmap' component
+of the pattern buffer before calling `re_search'.
+
+In either case, you must initialize this component in a pattern
+buffer before you can use that buffer in a search; but you can choose
+as an initial value either zero or the address of a suitable block of
+memory.
+
+If you compile a new pattern in an existing pattern buffer, it is not
+necessary to reinitialize the `fastmap' component (unless you wish to
+override your previous choice).
+
+
+
+File: regex,  Node: translation,  Next: registers,  Prev: searching,  Up: programming
+
+Translate Tables
+----------------
+
+With a translate table, you can apply a transformation to all
+characters before they are compared.  For example, a table that maps
+lower case letters into upper case (or vice versa) causes differences
+in case to be ignored by matching.
+
+A translate table is a block of 256 bytes.  Each character of raw
+data is used as an index in the translate table.  The value found
+there is used instead of the original character.  Each character in a
+regular expression, except for the syntactic constructs, is
+translated when the expression is compiled.  Each character of a
+string being matched is translated whenever it is compared or tested.
+
+A suitable translate table to ignore differences in case maps all
+characters into themselves, except for lower case letters, which are
+mapped into the corresponding upper case letters.  It could be
+initialized by:
+
+     for (i = 0; i < 0400; i++)
+       table[i] = i;
+     for (i = 'a'; i <= 'z'; i++)
+       table[i] = i - 040;
+
+You specify the use of a translate table by putting its address in
+the TRANSLATE component of the compiled pattern buffer.  If this
+component is zero, no translation is done.  Since both compilation
+and matching use the translate table, you must use the same table
+contents for both operations or confusing things will happen.
+
+
+
+File: regex,  Node: registers,  Next: split,  Prev: translation,  Up: programming
+
+Registers: or "What Did the `\( ... \)' Groupings Actually Match?"
+------------------------------------------------------------------
+
+If you want to find out, after the match, what each of the first nine
+`\( ... \)' groupings actually matched, you can pass the REGS
+argument to the match or search function.  Pass the address of a
+structure of this type:
+
+     struct re_registers
+       {
+         int start[RE_NREGS];
+         int end[RE_NREGS];
+       };
+
+  `re_match' and `re_search' will store into this structure the data
+you want.  `REGS->start[REG]' will be the index in STRING of the
+beginning of the data matched by the REG'th `\( ... \)' grouping, and
+`REGS->end[REG]' will be the index of the end of that data (the index
+of the first character beyond those matched).  The values in the
+start and end arrays at indexes greater than the number of `\( ...
+\)' groupings present in the regular expression will be set to the
+value -1.  Register numbers start at 1 and run to `RE_NREGS - 1'
+(normally `9').  `REGS->start[0]' and `REGS->end[0]' are similar but
+describe the extent of the substring matched by the entire pattern.
+
+  Both `struct re_registers' and `RE_NREGS' are defined in `regex.h'.
+
+
+
+File: regex,  Node: split,  Next: unix,  Prev: registers,  Up: programming
+
+Matching against Split Data
+---------------------------
+
+The functions `re_match_2' and `re_search_2' allow one to match in or
+search data which is divided into two strings.
+
+`re_match_2' works like `re_match' except that two data strings and
+sizes must be given.
+
+     re_match_2 (BUF, STRING1, SIZE1, STRING2, SIZE2, POS, REGS)
+
+The matcher regards the contents of STRING1 as effectively followed
+by the contents of STRING2, and matches the combined string against
+the pattern in BUF.
+
+`re_search_2' is likewise similar to `re_search':
+
+     re_search_2 (BUF, STRING1, SIZE1, STRING2, SIZE2, STARTPOS, RANGE, REGS)
+
+The value returned by RE_SEARCH_2 is an index into the combined data
+made up of STRING1 and STRING2.  It never exceeds `SIZE1 + SIZE2'. 
+The values returned in the REGS structure (if there is one) are
+likewise indices in the combined data.
+
+
+
+File: regex,  Node: unix,  Prev: split,  Up: programming
+
+Unix-Compatible Entry Points
+----------------------------
+
+The standard Berkeley Unix way to compile a regular expression is to
+call `re_comp'.  This function takes a single argument, the address
+of the regular expression, which is assumed to be terminated by a
+null character.
+
+`re_comp' does not ask you to specify a pattern buffer because it has
+its own pattern buffer -- just one.  Using `re_comp', one may match
+only the most recently compiled regular expression.
+
+The value of `re_comp' is zero for success or else an error message
+string, as for `re_compile_pattern'.
+
+Calling `re_comp' with the null string as argument it has no effect;
+the contents of the buffer remain unchanged.
+
+The standard Berkeley Unix way to match the last regular expression
+compiled is to call `re_exec'.  This takes a single argument, the
+address of the string to be matched.  This string is assumed to be
+terminated by a null character.  Matching is tried starting at each
+position in the string.  `re_exec' returns `1' for success or `0' for
+failure.  One cannot find out how long a substring was matched, nor
+what the `\( ... \)' groupings matched.
+
+
+
+Tag Table:
+Node: top85
+Node: syntax1706
+Node: directives3241
+Node: emacs10891
+Node: programming11653
+Node: compiling12381
+Node: matching14827
+Node: searching16269
+Node: translation18534
+Node: registers19952
+Node: split21245
+Node: unix22183
+
+End Tag Table