4.12 Regular expressions
Regular expressions can be used in cfengine in connection with
editfiles and processes to search for lines matching certain
expressions. A regular expression is a generalized wildcard. In
cfengine wildcards, '*' matches any number of characters (including
none) and '?' matches any single character. Regular expressions are
more complicated than wildcards, but far more flexible.
NOTE: the special characters * and ? used in wildcards do not have the
same meanings in regular expressions!
Some regular expressions match only a single string. For example, every
string which contains no special characters is a regular expression
which matches only a string identical to itself. Thus the regular
expression cfengine would match only the string "cfengine", not
"Cfengine" or "cfengin" etc. Other regular expressions could match more
general strings. For instance, the regular expression c* matches
any number of c's (including none). Thus, when the whole string must
match, this expression would match the empty string, "c", "cccc" and
"ccccccccc", but not "cccx".
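For quick experiments outside cfengine, the sketch below uses Python's re module as a stand-in. It is not the GNU regex library that cfengine uses, but it behaves the same way for these simple patterns. The helpers matches_part and matches_whole are hypothetical names that mirror the difference between matching part of a string and matching all of it.

```python
import re

def matches_part(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere inside text."""
    return re.search(pattern, text) is not None

def matches_whole(pattern: str, text: str) -> bool:
    """True only if the pattern matches all of text."""
    return re.fullmatch(pattern, text) is not None

# An expression with no special characters matches only itself.
print(matches_whole("cfengine", "cfengine"))  # True
print(matches_whole("cfengine", "Cfengine"))  # False: case matters
print(matches_whole("cfengine", "cfengin"))   # False

# c* matches any run of c's, including the empty string.
for s in ["", "c", "cccc", "ccccccccc", "cccx"]:
    print(repr(s), matches_whole("c*", s))    # only "cccx" gives False

# A partial match of c* still succeeds on "cccx" (it matches the leading "ccc").
print(matches_part("c*", "cccx"))             # True
```

The last line shows why it matters whether a command wants a partial or a whole-string match: c* can match the empty string, so as a partial match it succeeds on any input at all.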
Here is a list of regular expression special characters and operators.
- \
- The backslash character normally has a special purpose: either to
introduce a special operator (such as \b below), or to tell the
expression interpreter that the next character is not to be treated as
a special character. The backslash character stands for itself only
when protected by square brackets, as in [\], or when quoted with
another backslash, as in \\.
- \b
- Match a word boundary.
- \B
- Match a position that is not a word boundary, e.g. within a word.
- \<
- Match beginning of word.
- \>
- Match end of word.
- \w
- Match a character which can be part of a word (a letter, digit or
underscore).
- \W
- Match a character which cannot be part of a word.
- any other character
- Matches itself.
- .
- Matches any single character.
- *
- Match zero or more instances of the previous object. e.g. c*.
If no object precedes it, it represents a literal asterisk.
- +
- Match one or more instances of the preceding object.
- ?
- Match zero or one instance of the preceding object.
- { }
- Number of matches operator. {5} would match exactly 5
instances of the previous object. {6,} would match at least
6 instances of the previous object. {7,12} would match at least
7, but no more than 12, instances of the preceding object.
The first number must not be greater than the second to make a valid
search expression.
- |
- The logical OR operator: matches if either the regular expression on
its left or the regular expression on its right matches.
- [list]
- Defines a list of characters which are to be considered as a single
object (ORed). e.g. [a-z] matches any character in the range a to
z, and [abcd] matches either a, b, c or d. Most characters are
ordinary inside a list, but there are some exceptions: ] ends the
list unless it is the first item, \ quotes the next character,
[: and :] define a character class operator (see below),
and - represents a range of characters unless it is the first
or last character in the list.
- [^list]
- Defines a list of characters which are NOT to be matched, i.e.
matches any single character except those in the list.
- [:class:]
- Defines a class of characters, using the ctype library. A class is
written inside a list, e.g. [[:digit:]] matches a single digit.
alnum
- An alphanumeric character
alpha
- An alphabetic character
blank
- A space or a TAB
cntrl
- A control character.
digit
- 0-9
graph
- same as print, without space
lower
- a lower case letter
print
- printable characters (non control characters)
punct
- printable characters that are neither space nor alphanumeric
space
- space, tab, carriage return, line-feed, vertical tab and form-feed.
upper
- upper case letter
xdigit
- a hexadecimal digit 0-9, a-f or A-F
- ( )
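- Groups together any number of operators into a single object, so
that a following operator such as * applies to the whole group, and a
| inside the group only chooses between alternatives within it.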
- \digit
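- Back-reference operator: \1 matches the same text that was matched
by the first parenthesized group, \2 the second, and so on (refer to
the GNU regex documentation).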
- ^
- Match start of a line.
- $
- Match the end of a line.
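The following sketch tries out the backslash and word operators using Python's re module as a stand-in for the GNU regex library cfengine uses. Two differences matter here: Python has no \< or \> operators, so the lookaround patterns START_OF_WORD and END_OF_WORD (hypothetical names) emulate them, and Python treats a backslash inside square brackets as an escape, so [\] has to be written [\\] there. The r"..." raw strings stop Python itself from consuming backslashes before the regex engine sees them.

```python
import re

# Unescaped, "." is special and matches any character ...
print(re.fullmatch(r"a.c", "abc") is not None)   # True
# ... but "\." matches only a literal dot.
print(re.fullmatch(r"a\.c", "abc") is not None)  # False
print(re.fullmatch(r"a\.c", "a.c") is not None)  # True
# "\\" matches one literal backslash ("C:\\temp" below is the text C:\temp).
print(re.fullmatch(r"C:\\temp", "C:\\temp") is not None)  # True

text = "cat concatenate cat."
print(re.findall(r"\bcat\b", text))  # ['cat', 'cat']: whole words only
print(re.findall(r"\Bcat\B", text))  # ['cat']: the one inside "concatenate"

# \w: letters, digits and underscore; \W: everything else.
print(re.findall(r"\w+", "user_id=42, ok"))  # ['user_id', '42', 'ok']
print(re.findall(r"\W", "a-b c"))            # ['-', ' ']

# Python has no \< or \>; these lookaround forms stand in for them.
START_OF_WORD = r"\b(?=\w)"  # a boundary followed by a word character
END_OF_WORD = r"\b(?<=\w)"   # a boundary preceded by a word character
print(re.sub(START_OF_WORD, "<", "hello world"))  # <hello <world
print(re.sub(END_OF_WORD, ">", "hello world"))    # hello> world>
```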
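A sketch of ".", the repetition operators and |, again using Python's re module in place of cfengine's GNU regex. The helper whole() is hypothetical and requires the entire string to match; the process names are only examples. One difference to keep in mind: Python rejects a * with nothing before it as an error instead of treating it as a literal asterisk, so write \* for a literal asterisk there. Python also rejects a {n,m} range whose first number is greater than the second.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """True only if pattern matches all of text."""
    return re.fullmatch(pattern, text) is not None

samples = ["ac", "abc", "abbbc"]
print([whole(r"ab*c", s) for s in samples])  # [True, True, True]
print([whole(r"ab+c", s) for s in samples])  # [False, True, True]
print([whole(r"ab?c", s) for s in samples])  # [True, True, False]

# {n}: exactly n; {n,}: at least n; {n,m}: between n and m inclusive.
print(whole(r"x{5}", "x" * 5), whole(r"x{5}", "x" * 4))          # True False
print(whole(r"x{6,}", "x" * 6), whole(r"x{6,}", "x" * 20))       # True True
print(whole(r"x{7,12}", "x" * 12), whole(r"x{7,12}", "x" * 13))  # True False

# "." matches any single character, so ".*" matches any string.
print(whole(r"a.c", "a-c"), whole(r".*", "anything"))            # True True

# | chooses between whole expressions, e.g. two process names.
print([whole(r"sshd|httpd", s) for s in ["sshd", "httpd", "crond"]])  # [True, True, False]
# | binds loosest: "^ssh|d$" means "^ssh" OR "d$", not "^(ssh|d)$".
print(re.search(r"^ssh|d$", "crond") is not None)                # True, via "d$"
```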
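A sketch of lists and character classes, using Python's re module as a stand-in for GNU regex. Python does not support the [:class:] syntax, so the hypothetical POSIX_CLASSES table and translate() helper spell each class out as ASCII ranges. This assumes the C locale; other locales may put more characters in a class.

```python
import re

# [list] matches one character from the list; [^list] one character not in it.
print(re.findall(r"[abcd]", "abcdef"))       # ['a', 'b', 'c', 'd']
print(re.findall(r"[a-z]+", "Host42 name"))  # ['ost', 'name']
print(re.findall(r"[^0-9]", "a1b2"))         # ['a', 'b']
# "]" is literal when it is first in the list; "-" is literal when last.
print(re.findall(r"[]x]", "a]x"))            # [']', 'x']
print(re.findall(r"[+-]", "1+2-3"))          # ['+', '-']

# Hypothetical ASCII (C locale) spellings of the POSIX classes, because
# Python's re has no [:class:] syntax of its own.
POSIX_CLASSES = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": " \\t",
    "cntrl": "\\x00-\\x1f\\x7f",
    "digit": "0-9",
    "graph": "\\x21-\\x7e",
    "lower": "a-z",
    "print": "\\x20-\\x7e",
    "punct": "!-/:-@\\[-`{-~",
    "space": " \\t\\n\\r\\f\\v",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def translate(pattern: str) -> str:
    """Hypothetical helper: replace each [:name:] inside a list with ASCII ranges.

    Raises KeyError for a class name not in POSIX_CLASSES.
    """
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

print(translate("[[:digit:]]+"))                                        # [0-9]+
print(re.findall(translate("[[:upper:]][[:lower:]]+"), "Hello World"))  # ['Hello', 'World']
print(re.findall(translate("[[:punct:]]"), "a,b!c d"))                  # [',', '!']
```

The table doubles as a summary of what each class contains in the C locale; note that xdigit accepts upper-case A-F as well as a-f.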
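A sketch of grouping, back-references and anchors, using Python's re module, which should behave like GNU regex for the simple patterns shown. The sample lines are made up; testing each line separately with re.search mirrors searching a file line by line.

```python
import re

# Without parentheses * applies only to "b"; with them, to the whole "ab".
print(re.fullmatch(r"ab*", "ababab") is not None)    # False
print(re.fullmatch(r"(ab)*", "ababab") is not None)  # True
# Parentheses also limit how far | reaches.
print(re.fullmatch(r"file(\.bak|\.old)", "file.old") is not None)  # True
print(re.fullmatch(r"file\.bak|\.old", "file.old") is not None)    # False

# \1 must repeat exactly the text captured by the first group.
doubled = re.compile(r"\b(\w+) \1\b")
print(doubled.search("this is is a typo").group(0))  # is is
print(doubled.search("no repeats here"))             # None

# ^ and $ pin a match to the start and end of the line.
lines = ["cfengine", "  cfengine", "cfengine.conf", ""]
print([re.search(r"^cfengine", line) is not None for line in lines])   # [True, False, True, False]
print([re.search(r"cfengine$", line) is not None for line in lines])   # [True, True, False, False]
print([re.search(r"^cfengine$", line) is not None for line in lines])  # [True, False, False, False]
print([re.search(r"^$", line) is not None for line in lines])          # [False, False, False, True]
```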
Here are a few examples. Remember that some commands look for
a regular expression match of part of a string, while others
require a match of the entire string (see the Reference manual).
^# match a string beginning with the # symbol
^[^#] match a non-empty string not beginning with the # symbol
^[A-Z].+ match a string beginning with an uppercase letter
followed by at least one other character
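To see how the three examples behave, the sketch below applies them to a few made-up lines using Python's re.search, which looks for a match anywhere in the line (the ^ then anchors it to the start). This is a stand-in for cfengine's own matching, not a description of it.

```python
import re

lines = ["# a comment", "Host example", "  # indented", "x", ""]

# Partial (search-style) matching, applied one line at a time.
comments = [line for line in lines if re.search(r"^#", line)]
not_comments = [line for line in lines if re.search(r"^[^#]", line)]
capitalised = [line for line in lines if re.search(r"^[A-Z].+", line)]

print(comments)      # ['# a comment']
print(not_comments)  # ['Host example', '  # indented', 'x']
print(capitalised)   # ['Host example']
```

Two details stand out: the indented comment counts as a line not beginning with #, and the empty line matches neither of the first two patterns, because [^#] needs a character to match.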