Unlike commonly used regexp libraries, Felix does not represent regular expressions as strings: instead, a first-class syntax is used to define them.
Felix allows you to name regular expressions with the syntax:

    regexp <name> = <regexp> ;

The name is an identifier. A string used in a regexp stands for a match of each character of the string in sequence. The following symbols are special, and are listed in order from weakest to strongest binding:

symbol | syntax | meaning |
---|---|---|
`\|` | infix | alternatives |
`*` | postfix | 0 or more occurrences |
`+` | postfix | 1 or more occurrences |
`?` | postfix | 0 or 1 occurrences |
`<juxtaposition>` | infix | concatenation |
`<name>` | atomic | regexp denoted by the name in a `regexp` definition |
`<string>` | atomic | sequence of chars of the string |
`[<charset>]` | atomic | any char of the charset |
`[^<charset>]` | atomic | any char not in the charset |
`.` | atomic | any char other than end of line |
`_` | atomic | any char |
`eof` | atomic | end marker |
`(<regexp>)` | atomic | grouping brackets |


A `<charset>` is built from the following forms:

symbol | meaning |
---|---|
`<string>` | any character in the string |
`<char>-<char>` | any character between the two chars, inclusive |

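One way to picture the two charset forms is as operations that fill in a set of characters: a string adds each of its characters, and a range adds every character from the first to the last, inclusive. The C sketch below illustrates that reading only. The `charset` type and the `charset_*` function names are hypothetical and are not part of Felix or its generated code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a charset: one membership flag per byte value. */
typedef struct { bool member[256]; } charset;

/* Start from the empty set. */
void charset_clear(charset *cs) {
    memset(cs->member, 0, sizeof cs->member);
}

/* <string> form: every character of the string is in the set. */
void charset_add_string(charset *cs, const char *s) {
    for (; *s; ++s) cs->member[(unsigned char)*s] = true;
}

/* <char>-<char> form: both end points are included. */
void charset_add_range(charset *cs, unsigned char lo, unsigned char hi) {
    for (unsigned c = lo; c <= hi; ++c) cs->member[c] = true;
}

/* [^<charset>] form: any character not in the set. */
void charset_negate(charset *cs) {
    for (int c = 0; c < 256; ++c) cs->member[c] = !cs->member[c];
}

bool charset_contains(const charset *cs, char c) {
    return cs->member[(unsigned char)c];
}
```

Under this model, adding the range from `a` to `z` yields the same set as adding the string `"abcdefghijklmnopqrstuvwxyz"`, assuming an ASCII-compatible character encoding.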

    include "std";

    // character classes, and identifiers built from them
    regexp lower = ["abcdefghijklmnopqrstuvwxyz"];
    regexp upper = ["ABCDEFGHIJKLMNOPQRSTUVWXYZ"];
    regexp digit = ["0123456789"];
    regexp alpha = lower | upper | "_";
    regexp id = alpha (alpha | digit) *;

    print
      regmatch "identifier" with
      | digit+ => "Number"
      | id => "Identifier"
      endmatch
    ;
    endl;

    print
      regmatch "9999" with
      | digit+ => "Number"
      | id => "Identifier"
      endmatch
    ;
    endl;

    // the last case is a catch-all for input that fits neither pattern
    print
      regmatch "999xxx" with
      | digit+ => "Number"
      | id => "Identifier"
      | _* => "Neither"
      endmatch
    ;
    endl;

Note: the generated code is *extremely* fast, within one or two memory fetches of the fastest possible code. Here is the generated code for the inner loop of a regmatch:

    while(state && start != end) state = matrix[*start++][state];
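
To show how a loop of this shape can drive a whole match, here is a minimal C sketch of a table-driven matcher for the `digit+` and `id` patterns used earlier. Everything around the loop is an assumption made for illustration: the state numbering (with 0 as the dead state that stops the loop), the hand-built table, and the `build_matrix` and `classify` names are hypothetical, and this is not the code Felix actually emits. The character tests also assume an ASCII-compatible encoding.

```c
#include <string.h>

/* Hypothetical state numbering; state 0 must be the dead state
   because the loop stops as soon as state becomes 0. */
enum { DEAD = 0, START = 1, NUMBER = 2, IDENT = 3, NSTATES = 4 };

/* Transition table indexed as matrix[input byte][current state]. */
static unsigned char matrix[256][NSTATES];

static int is_digit(int c) { return c >= '0' && c <= '9'; }

static int is_alpha(int c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

/* Fill the table for: digit+ (NUMBER) and alpha (alpha | digit)* (IDENT). */
void build_matrix(void) {
    for (int c = 0; c < 256; ++c) {
        matrix[c][DEAD]   = DEAD;
        matrix[c][START]  = is_digit(c) ? NUMBER : is_alpha(c) ? IDENT : DEAD;
        matrix[c][NUMBER] = is_digit(c) ? NUMBER : DEAD;
        matrix[c][IDENT]  = (is_alpha(c) || is_digit(c)) ? IDENT : DEAD;
    }
}

/* Run the automaton over the whole string and report which
   pattern, if any, the final state accepts. */
const char *classify(const char *s) {
    const unsigned char *start = (const unsigned char *)s;
    const unsigned char *end = start + strlen(s);
    int state = START;

    /* Same shape as the inner loop quoted above. */
    while (state && start != end) state = matrix[*start++][state];

    switch (state) {
    case NUMBER: return "Number";
    case IDENT:  return "Identifier";
    default:     return "Neither"; /* dead state, or input ended in START */
    }
}
```

After a single call to `build_matrix()`, `classify("identifier")` returns `"Identifier"`, `classify("9999")` returns `"Number"`, and `classify("999xxx")` returns `"Neither"`. Each input character costs one table lookup and one pointer increment, which is the property the note above is pointing at.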