Regular expression matching with a DFA-based engine
This module provides a deterministic finite automaton (DFA) based regular expression engine with support for Unicode, capture groups, and anchors. The implementation uses Thompson's NFA construction followed by subset construction to build an efficient DFA.
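The two-stage pipeline named above can be illustrated with a small sketch. It is written in Python purely for illustration and is not this module's code: the NFA for `a(b|c)*` is written out by hand in the shape Thompson's construction would produce, and the names `NFA`, `epsilon_closure`, `subset_construction`, and `full_match` are hypothetical. Each DFA state is the set of NFA states reachable after reading some input, so matching needs one table lookup per character.

```python
from collections import deque

EPS = None  # label used for epsilon (empty) transitions

# Hand-written Thompson-style NFA for a(b|c)*  (illustrative only).
# Maps state -> list of (label, target); state 8 is the accepting state.
NFA = {
    0: [("a", 1)],
    1: [(EPS, 2), (EPS, 8)],   # enter the loop body, or skip it
    2: [(EPS, 3), (EPS, 5)],   # alternation split: b | c
    3: [("b", 4)],
    5: [("c", 6)],
    4: [(EPS, 7)],
    6: [(EPS, 7)],
    7: [(EPS, 2), (EPS, 8)],   # repeat the body, or leave the loop
}
NFA_START, NFA_ACCEPT = 0, 8


def epsilon_closure(states):
    """All NFA states reachable from `states` using only epsilon edges."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for label, target in NFA.get(state, []):
            if label is EPS and target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def subset_construction(alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start = epsilon_closure({NFA_START})
    table = {}                      # DFA state -> {symbol: DFA state}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in table:
            continue                # already expanded
        table[current] = {}
        for symbol in alphabet:
            moved = {t for s in current
                     for label, t in NFA.get(s, []) if label == symbol}
            if moved:
                nxt = epsilon_closure(moved)
                table[current][symbol] = nxt
                queue.append(nxt)
    accepting = {s for s in table if NFA_ACCEPT in s}
    return start, table, accepting


DFA_START, DFA_TABLE, DFA_ACCEPTING = subset_construction("abc")


def full_match(text):
    """Run the DFA over the whole input: one transition per character."""
    state = DFA_START
    for ch in text:
        state = DFA_TABLE[state].get(ch)
        if state is None:           # no transition: dead state
            return False
    return state in DFA_ACCEPTING
```

In this sketch the nine NFA states collapse into four DFA states. A production engine would typically also handle character ranges, minimization, and capture tracking, none of which are shown here.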
Supported Features
- Literals: abc, Unicode characters
- Character classes: [a-z], [^0-9], [α-ω]
- Predefined classes: \d (digits), \w (word), \s (whitespace), . (any except newline)
- Quantifiers: * (zero or more), + (one or more), ? (zero or one)
- Bounded repetition: {n}, {n,}, {n,m}
- Alternation: foo|bar
- Grouping: (foo) capturing, (?:foo) non-capturing
- Anchors: ^ (start of string), $ (end of string)
- Escape sequences: \t, \n, \r, \xHH, \uHHHH
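Bounded repetition adds no expressive power beyond the basic operators, which is why it fits a regular-language engine. The sketch below shows one common way to rewrite `{n}`, `{n,}`, and `{n,m}` using only concatenation, `?`, and `*`. Whether this module lowers bounded repetition this way is not stated; the helper `expand_bounded` is hypothetical, and Python's standard `re` module is used only to check that the rewritten pattern accepts the same strings.

```python
import re


def expand_bounded(atom, lo, hi):
    """Rewrite atom{lo,hi} without counted repetition.

    hi=None stands for the open-ended form {lo,}; hi == lo stands for {lo}.
    (Hypothetical helper for illustration; not part of the documented module.)
    """
    if hi is not None and hi < lo:
        raise ValueError("upper bound must not be smaller than lower bound")
    group = f"(?:{atom})"           # wrap so the operators apply to the whole atom
    required = group * lo           # lo mandatory copies
    if hi is None:
        return required + group + "*"           # then any number more
    return required + (group + "?") * (hi - lo)  # then up to hi - lo optional copies


# a{2,4} becomes two mandatory copies followed by two optional ones.
pattern = expand_bounded("a", 2, 4)
print(pattern)
print(bool(re.fullmatch(pattern, "aaa")))
```

The expanded pattern grows linearly with the bounds, so large counts such as `{1000}` can produce large automata under this approach.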
Limitations
- No backreferences (this is a regular language engine)
- No lookahead/lookbehind assertions
- No word boundaries (\b, \B)
- No Unicode property classes
Examples
Basic matching
use std::regex::Regex;
let re = Regex::compile("hello").unwrap();
defer re.free();
assert!(re.is_match("hello world"));
assert!(!re.is_match("goodbye"));
Character classes and quantifiers
use std::regex::Regex;
let re = Regex::compile("[0-9]+").unwrap();
defer re.free();
assert!(re.is_match("123"));
assert!(!re.is_match("abc"));
Capture groups
use std::regex::Regex;
let re = Regex::compile("(\\w+)@(\\w+\\.\\w+)").unwrap();
defer re.free();
let caps = re.captures("user@example.com");
assert!(caps.is_some());
let caps = caps.unwrap();
defer caps.free();
assert_eq!(caps.get(1).unwrap(), "user");
assert_eq!(caps.get(2).unwrap(), "example.com");
Anchors
use std::regex::Regex;
let re = Regex::compile("^start").unwrap();
defer re.free();
assert!(re.is_match("start of line"));
assert!(!re.is_match("not at start"));