#
## Character Classes
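This table also lists assertions such as `\A`, `\b`, and `\z`. Assertions and `\R` cannot be used inside a bracketed character class `[]`; in PCRE the class shorthands (`\d`, `\h`, `\s`, `\v`, `\w`, and their negations) can, and `\b` inside a class means backspace.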
| PCRE | POSIX | Standard | Info |
| ---- | ------------- | --------------- | -------------------------------------------- |
| `\A` | | | Start of Subject |
| `\b` | | | Word Boundary |
| `\B` | | | Not Word Boundary |
| `\d` | `[[:digit:]]` | `[0-9]` | Digit |
| `\D` | | `[^0-9]` | Not a Digit |
| `\h` | | `[ \t]` | [[#Horizontal Space Characters]] [^1] |
| `\H` | | `[^ \t]` | Not [[#Horizontal Space Characters]] [^1] |
| `\R` | | - | [[#Newline Sequences]] |
| `\s` | `[[:space:]]` | `[ \t\n\r]` | Whitespace [^1] |
| `\S` | `[^[:space:]]` | `[^ \t\n\r]` | Not Whitespace [^1] |
| `\v` | | `[\n\r]` | [[#Vertical Space Characters]] [^1] |
| `\V` | | `[^\n\r]` | Not [[#Vertical Space Characters]] [^1] |
| `\w` | `[[:word:]]` | `[A-Za-z0-9_]` | Word Character |
| `\W` | | `[^A-Za-z0-9_]` | Not Word Character |
| `\z` | | | End of Subject |
| `\Z` | | | End of Subject,<br>or before a final newline |
[^1]: Some implementations also match [[Unicode Standard|Unicode]] whitespace, not just the basic ASCII whitespace characters.
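
The footnote can be checked in any engine that offers both Unicode and ASCII modes. The sketch below uses Python's `re` module as a stand-in (an assumption; it is not PCRE and lacks `\h`, `\R`, and `\z` in most versions). The sample string and variable names are illustrative only.

```python
import re

# ASCII digit, Arabic-Indic digit three, ASCII space, no-break space.
SAMPLE = "a1\u0663 \u00a0"

# Default (Unicode) mode: \d and \s also match non-ASCII digits and whitespace.
unicode_digits = re.findall(r"\d", SAMPLE)
unicode_spaces = re.findall(r"\s", SAMPLE)

# re.ASCII restricts the shorthands to the ASCII ranges, like [0-9] in the table.
ascii_digits = re.findall(r"\d", SAMPLE, re.ASCII)
ascii_spaces = re.findall(r"\s", SAMPLE, re.ASCII)

print(len(unicode_digits), len(ascii_digits))
print(len(unicode_spaces), len(ascii_spaces))
```

When a pattern has to behave like the explicit classes in the "Standard" column, writing the class out or forcing ASCII mode avoids the difference.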
### Newline Sequences
```regex
(?>\r\n|\n|\x0b|\f|\r|\x85)
```
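For engines without `\R`, the same alternation can be written out by hand. A minimal sketch in Python, assuming a plain (non-atomic) alternation is acceptable; `NEWLINE` and `split_lines` are hypothetical names, not part of any library.

```python
import re

# \r\n is listed first so a CRLF pair is consumed as one line break, not two.
NEWLINE = re.compile(r"\r\n|\n|\x0b|\f|\r|\x85")

def split_lines(text: str) -> list[str]:
    """Split text on any of the newline sequences listed above."""
    return NEWLINE.split(text)

print(split_lines("a\r\nb\nc\x85d\re"))
```

The atomic group in the original pattern stops the engine from backtracking into a CRLF pair; for a simple split like this one the ordering of the alternatives is enough.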
### Horizontal Space Characters
```
U+0009 Horizontal tab (HT)
U+0020 Space
U+00A0 Non-break space
U+1680 Ogham space mark
U+180E Mongolian vowel separator
U+2000 En quad
U+2001 Em quad
U+2002 En space
U+2003 Em space
U+2004 Three-per-em space
U+2005 Four-per-em space
U+2006 Six-per-em space
U+2007 Figure space
U+2008 Punctuation space
U+2009 Thin space
U+200A Hair space
U+202F Narrow no-break space
U+205F Medium mathematical space
U+3000 Ideographic space
```
### Vertical Space Characters
```
U+000A Linefeed (LF)
U+000B Vertical tab (VT)
U+000C Form feed (FF)
U+000D Carriage return (CR)
U+0085 Next line (NEL)
U+2028 Line separator
U+2029 Paragraph separator
```
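Where an engine lacks `\h` and `\v`, the two lists above can be turned into explicit character classes. The following is a rough sketch in Python; the constant and function names are made up for illustration.

```python
import re

# Code points copied from the horizontal and vertical lists above.
# Consecutive code points are written as ranges inside the class.
HORIZONTAL = "\u0009\u0020\u00a0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000"
VERTICAL = "\u000a-\u000d\u0085\u2028\u2029"

H_SPACE = re.compile(f"[{HORIZONTAL}]")
V_SPACE = re.compile(f"[{VERTICAL}]")

def collapse_horizontal(text: str) -> str:
    """Replace each run of horizontal space with a single ASCII space."""
    return re.sub(f"[{HORIZONTAL}]+", " ", text)

print(repr(collapse_horizontal("a\u00a0\t b\nc")))
```

Line breaks are left untouched because none of the vertical characters appear in the horizontal class.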
## Character Escapes
If a backslash precedes a character that has no defined escape, most regex implementations ignore the backslash and match the character literally.
See also: [[ASCII#Tables]]
| Escape | Name | Hex | | Info |
| ------ | ---- | ---- | --- | --------------- |
| `\0` | NUL | 0x00 | | Null |
| `\a` | BEL | 0x07 | | Bell |
| `\b` | BS | 0x08 | | Backspace (only inside `[]`) |
| `\e` | ESC | 0x1B | | Escape |
| `\f` | FF | 0x0C | | Form Feed |
| `\n` | LF | 0x0A | | Line Feed |
| `\r` | CR | 0x0D | | Carriage Return |
| `\t` | HT | 0x09 | | Horizontal Tab |
| `\v` | VT | 0x0B | | Vertical Tab |
| `\cZ` | SUB | 0x1A | | Substitute |
| | EOT | 0x04 | | End of Transmission |
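Escape handling differs between engines, so it is worth testing rather than assuming. The sketch below checks a few of the escapes above in Python's `re` module, chosen only as a convenient test bed; it is not PCRE and, for example, does not define `\e`. All variable names are illustrative.

```python
import re

# Escapes that Python's `re` shares with the table, and their code points.
ESCAPES = {r"\0": 0x00, r"\a": 0x07, r"\f": 0x0C, r"\n": 0x0A,
           r"\r": 0x0D, r"\t": 0x09, r"\v": 0x0B}
all_match = all(re.fullmatch(p, chr(cp)) for p, cp in ESCAPES.items())

# \b is a word boundary outside a class and a backspace inside one.
boundary = re.search(r"\bcat\b", "a cat sat")
backspace = re.fullmatch(r"[\b]", "\x08")

# A backslash before punctuation is dropped and the character is literal.
literal_dash = re.fullmatch(r"\-", "-")

# This engine rejects an undefined letter escape instead of ignoring it.
try:
    re.compile(r"\e")
    unknown_letter_rejected = False
except re.error:
    unknown_letter_rejected = True

print(all_match, unknown_letter_rejected)
```

The last check shows an exception to the "ignore the backslash" rule: some engines treat an unknown letter escape as an error so that new escapes can be added later.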
# Tips
## IPv4 Addresses
```regex
^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$
```
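A quick way to sanity-check an IPv4 pattern is to run it against a handful of valid and invalid strings. This sketch uses an anchored form with `\b` after the optional dot, so that four bare digits such as `1234` are not accepted; `is_ipv4` is a hypothetical helper, and `re.ASCII` is an added assumption that keeps `\d` to `0-9`.

```python
import re

IPV4 = re.compile(r"^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$", re.ASCII)

def is_ipv4(text: str) -> bool:
    """Return True if text is a dotted-quad IPv4 address with octets 0-255."""
    # fullmatch also rejects a trailing newline, which `$` alone would allow.
    return IPV4.fullmatch(text) is not None

for candidate in ["192.168.1.1", "255.255.255.255", "256.1.1.1", "1234", "1.2.3.4."]:
    print(candidate, is_ipv4(candidate))
```

For production validation a dedicated parser (for example Python's `ipaddress` module) is usually a safer choice than a regex.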
## Match First Line of Document
```regex
^(?<!\n)
^(?<![\w\W])
^(?<![\s\S\r])
^(?<![\s\S\r])//\n
```
The `^(?<![\s\S\r])//\n` regex means:
- `^(?<![\s\S\r])` - matches the start of the first line only: `^` matches the start of any line, and the negative lookbehind `(?<![\s\S\r])` fails the match if any character sits immediately to the left of the current position
- `//` - a literal `//` substring
- `\n` - a line break

To also capture the rest of the file, append `([\s\S\r]*)` - Group 1 (`$1`): zero or more of any character, as many as possible, up to the end of the file.
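The effect of the lookbehind can be seen by comparing it with a plain `^` in multiline mode. A minimal sketch in Python, assuming its `re.MULTILINE` flag behaves like the editor's line-based `^`; the sample text is invented.

```python
import re

# Two lines consist of "//"; only the first one starts the document.
TEXT = "//\ncode\n//\nmore\n"

# Plain ^ in multiline mode matches at the start of every line.
every_line = [m.start() for m in re.finditer(r"^//\n", TEXT, re.MULTILINE)]

# The negative lookbehind rejects any position that has a character before it.
first_only = [m.start() for m in re.finditer(r"^(?<![\s\S\r])//\n", TEXT, re.MULTILINE)]

print(every_line, first_only)
```

Engines that support `\A` can use it for the same purpose; the lookbehind form is mainly useful in tools where `\A` is unavailable.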
# Resources
## Analysis
- https://ihateregex.io
    - Breaks regexes down into flowcharts
    - Has a large library of example regexes
- https://regexr.com/
    - Explains regexes token by token
    - Has a solid collection of reference information
    - Supports multiple implementations
- https://regex101.com
    - Also has explanations, but less clear than Regexr
    - Breaks out each match and capture group separately
# References
- https://pcre.org/original/doc/html/pcrepattern.html
## Application Specific
- https://stackoverflow.com/questions/60090875/match-the-beginning-of-a-file-in-a-vscode-regex-search
## Performance
- https://reidy-p.github.io/regex-performance/
- https://www.regular-expressions.info/catastrophic.html
## Libraries
- https://devopedia.org/regex-engines
- https://github.com/google/re2
## Regexes
- https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp