Regular expressions (regex) are powerful tools for finding patterns in strings. They’re useful for validation (ensuring data is in a particular format) and for text processing. Compiler front ends use regular expressions to recognize tokens during lexical analysis, and web developers use them for everything from email validation to URL parsing.
What Are Regular Expressions?
At their core, regular expressions are a mini-language for describing text patterns. Instead of looking for exact matches, you describe the pattern you want to find, and the regex engine finds all strings that match that pattern.
Character Classes
Character classes let you match specific types of characters:
Predefined Character Classes
- \d - Matches any digit (0-9)
- \D - Matches any non-digit
- \w - Matches any word character (letters, digits, underscore)
- \W - Matches any non-word character
- \s - Matches any whitespace (spaces, tabs, newlines)
- \S - Matches any non-whitespace
Custom Character Classes
- [aeiou] - Matches any vowel
- [0-9] - Matches any digit (same as \d)
- [a-z] - Matches any lowercase letter
- [A-Z] - Matches any uppercase letter
- [0-35-9] - Matches digits 0-3 or 5-9 (excludes 4)
Negated Character Classes
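- [^4] - Matches any character except 4
- [^aeiou] - Matches any character that is not a lowercase vowel (consonants, but also digits, spaces, punctuation, and uppercase letters)
- [^0-9] - Matches any non-digit (same as \D)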
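As a quick illustration of the classes above, here is a minimal sketch using Python's standard re module. The sample strings are made up for the example.

```python
import re

text = "Room 42, Floor 7!"

# \d matches each digit individually.
digits = re.findall(r"\d", text)               # ['4', '2', '7']

# [aeiou] matches lowercase vowels only.
vowels = re.findall(r"[aeiou]", text)          # ['o', 'o', 'o', 'o']

# A negated class matches everything outside the set,
# including spaces, digits, and punctuation.
non_vowels = re.findall(r"[^aeiou]", "hi 5!")  # ['h', ' ', '5', '!']

print(digits, vowels, non_vowels)
```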
Quantifiers
Quantifiers specify how many times a pattern should match:
- * - Matches zero or more occurrences
- + - Matches one or more occurrences
- ? - Matches zero or one occurrence (optional)
- {n} - Matches exactly n occurrences
- {n,} - Matches at least n occurrences
- {n,m} - Matches between n and m occurrences (inclusive)
Quantifier Examples
- A* - Matches “”, “A”, “AA”, “AAA”, etc.
- A+ - Matches “A”, “AA”, “AAA”, etc. (but not empty string)
- A? - Matches “” or “A”
- A{3} - Matches exactly “AAA”
- A{2,4} - Matches “AA”, “AAA”, or “AAAA”
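The quantifier examples can be checked mechanically. The sketch below assumes Python's re.fullmatch, which succeeds only when the pattern covers the entire string. The helper name matches is hypothetical.

```python
import re

def matches(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["", "A", "AA", "AAA", "AAAA", "AAAAA"]

star = matches(r"A*", candidates)             # every candidate, including ""
plus = matches(r"A+", candidates)             # every candidate except ""
optional = matches(r"A?", candidates)         # "" and "A"
exactly3 = matches(r"A{3}", candidates)       # "AAA" only
two_to_four = matches(r"A{2,4}", candidates)  # "AA", "AAA", "AAAA"

print(star, plus, optional, exactly3, two_to_four)
```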
Special Characters
The Dot (.)
- . - Matches any single character except newline
- .* - Matches any number of characters (except newlines)
- .+ - Matches one or more of any character
Anchors
- ^ - Matches the beginning of a string
- $ - Matches the end of a string
- ^A - String must start with “A”
- Z$ - String must end with “Z”
- ^A.*Z$ - String starts with “A” and ends with “Z”
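To see the dot and the anchors in action, here is a small sketch in Python. The behavior shown is that of Python's re module with default flags; other engines may differ in details.

```python
import re

# By default the dot stops at a newline.
first_line = re.match(r".*", "AZ\nsecond").group()      # 'AZ'

# Without anchors, the pattern may match anywhere inside the string.
unanchored = bool(re.search(r"A.*Z", "xxAbcZxx"))       # True

# With anchors, the string itself must start with A and end with Z.
anchored_miss = bool(re.search(r"^A.*Z$", "xxAbcZxx"))  # False
anchored_hit = bool(re.search(r"^A.*Z$", "AbcZ"))       # True

print(first_line, unanchored, anchored_miss, anchored_hit)
```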
Practical Examples
Email Validation (Basic)
\w+@\w+\.\w+
Matches: user@domain.com
Phone Number (US Format)
\d{3}-\d{3}-\d{4}
Matches: 555-123-4567
Postal Code (Canadian)
[A-Z]\d[A-Z] \d[A-Z]\d
Matches: T2P 1J9, V6B 2W9
Postal Code (Canadian, Flexible)
[A-Z]\d[A-Z]\s?\d[A-Z]\d
Matches: T2P1J9 or T2P 1J9
Finding Words
\b\w+\b
Matches individual words (using word boundaries)
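A short sketch of the word pattern in Python. Note that an apostrophe is not a word character, so contractions are split into two pieces. The sample sentence is invented for the example.

```python
import re

sentence = "Regex isn't magic, it's practice."

# \w+ between word boundaries; punctuation and apostrophes act as separators.
words = re.findall(r"\b\w+\b", sentence)

print(words)  # ['Regex', 'isn', 't', 'magic', 'it', 's', 'practice']
```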
IP Address (Simple)
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Matches: 192.168.1.1
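The simple pattern checks only the shape, so it also accepts values such as 999.999.999.999. One possible way to tighten it, sketched in Python, is to keep the regex for the shape and check the numeric range in code. The function name is_ipv4 is hypothetical, and the re.ASCII flag is an added assumption that restricts \d to 0-9.

```python
import re

# re.ASCII keeps \d limited to the ASCII digits 0-9.
IPV4_SHAPE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", re.ASCII)

def is_ipv4(text: str) -> bool:
    """Check the dotted shape with the regex, then the 0-255 range in code."""
    if not IPV4_SHAPE.fullmatch(text):
        return False
    # Leading zeros such as "001" are accepted in this sketch.
    return all(0 <= int(part) <= 255 for part in text.split("."))

print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("999.999.999.999"))  # False
```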
Hexadecimal Colors
#[0-9A-Fa-f]{6}
Matches: #FF5733, #a1b2c3
Date Format (MM/DD/YYYY)
\d{2}/\d{2}/\d{4}
Matches: 12/25/2007
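This pattern also accepts impossible dates such as 99/99/9999, because it only checks the digit layout. A minimal sketch of one way to combine it with a real calendar check, assuming Python's datetime.strptime (is_valid_date is a hypothetical helper name):

```python
import re
from datetime import datetime

# re.ASCII keeps \d limited to the ASCII digits 0-9.
DATE_SHAPE = re.compile(r"\d{2}/\d{2}/\d{4}", re.ASCII)

def is_valid_date(text: str) -> bool:
    """Require the MM/DD/YYYY layout, then let datetime verify the calendar date."""
    if not DATE_SHAPE.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return False
    return True

print(is_valid_date("12/25/2007"))  # True
print(is_valid_date("02/30/2024"))  # False
```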
Username Validation
^[a-zA-Z0-9_]{3,16}$
Matches usernames 3-16 characters, letters/numbers/underscore only
Tips for Writing Better Regex
Start Simple
Begin with basic patterns and build complexity gradually:
- \d (any digit)
- \d+ (one or more digits)
- \d{3} (exactly three digits)
- \d{3}-\d{3}-\d{4} (phone number pattern)
Test Your Patterns
Always test regex patterns with various inputs, including:
- Valid examples that should match
- Invalid examples that shouldn’t match
- Edge cases (empty strings, very long strings)
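The checklist above can be turned into a tiny test harness. The following is a sketch in Python, using the phone number pattern from earlier in this guide. The names PHONE, should_match, should_not_match, and failures are all hypothetical.

```python
import re

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

# Valid examples, invalid examples, and edge cases (empty and very long input).
should_match = ["555-123-4567", "000-000-0000"]
should_not_match = ["", "5551234567", "555-123-456", "555-123-45678", "5" * 10_000]

def failures(pattern, valid, invalid):
    """Return the inputs whose result differs from the expectation."""
    wrong = [s for s in valid if not pattern.fullmatch(s)]
    wrong += [s for s in invalid if pattern.fullmatch(s)]
    return wrong

print(failures(PHONE, should_match, should_not_match))  # [] when all cases pass
```

Keeping the two lists next to the pattern makes it easy to add a new case whenever a bug report turns up an input nobody anticipated.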
Be Specific
- Use \d instead of [0-9] for digits
- Use ^ and $ anchors to match entire strings
- Consider word boundaries \b when matching whole words
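A brief sketch of what word boundaries change, using Python's re module and a made-up sample string:

```python
import re

text = "cat concat cats cat."

# Without boundaries, "cat" is also found inside "concat" and "cats".
loose = re.findall(r"cat", text)

# With \b on both sides, only the standalone word matches.
whole = re.findall(r"\bcat\b", text)

print(len(loose), len(whole))  # 4 2
```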
Common Pattern Building Blocks
- Start of string: ^pattern
- End of string: pattern$
- Entire string: ^pattern$
- Optional group: (pattern)?
- Either/or: (pattern1|pattern2)
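Here is a minimal sketch of these building blocks combined, written in Python. The five-digit code with an optional four-digit suffix and the yes/no answer are illustrative formats chosen for the example, not requirements from this guide.

```python
import re

# Entire string + optional group: five digits, optionally followed by "-1234".
zip_code = re.compile(r"^\d{5}(-\d{4})?$")

# Entire string + either/or: exactly one of two alternatives.
answer = re.compile(r"^(yes|no)$")

print(bool(zip_code.search("12345")))       # True
print(bool(zip_code.search("12345-6789")))  # True
print(bool(zip_code.search("12345-")))      # False
print(bool(answer.search("maybe")))         # False
```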
Common Pitfalls
Greedy vs. Non-Greedy
- .* is greedy (matches as much as possible)
- .*? is non-greedy (matches as little as possible)
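The difference is easiest to see on text with repeated delimiters. This is a small sketch in Python. The tag-like sample string is only for illustration, not a recommendation to parse HTML with regex.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: runs from the first "<" to the last ">".
greedy = re.findall(r"<.*>", html)

# Non-greedy: stops at the first ">" after each "<".
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```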
Escaping Special Characters
To match literal special characters, escape them:
- \. - Matches literal period
- \* - Matches literal asterisk
- \? - Matches literal question mark
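A short Python sketch of why escaping matters, plus re.escape, which escapes a whole literal string so it can be embedded in a pattern. The sample strings are invented.

```python
import re

# Unescaped, the dot matches any character, so "1x5" slips through.
loose = bool(re.fullmatch(r"1.5", "1x5"))    # True
strict = bool(re.fullmatch(r"1\.5", "1x5"))  # False

# re.escape backslash-escapes the special characters in a literal string.
literal = "What? (really)*"
pattern = re.escape(literal)
found = bool(re.search(pattern, "He said: What? (really)* ok"))  # True

print(loose, strict, found)
```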
Case Sensitivity
Most regex engines are case-sensitive by default:
- [a-z] - Only lowercase letters
- [A-Za-z] - Both upper and lowercase
- Use case-insensitive flags when available
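In Python the case-insensitive flag is re.IGNORECASE. Other languages spell it differently, often as an i modifier. A minimal sketch:

```python
import re

text = "Hello World"

sensitive = re.findall(r"[a-z]+", text)                     # ['ello', 'orld']
both = re.findall(r"[A-Za-z]+", text)                       # ['Hello', 'World']
flagged = re.findall(r"[a-z]+", text, flags=re.IGNORECASE)  # ['Hello', 'World']

print(sensitive, both, flagged)
```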
Security Considerations
ReDoS (Regular Expression Denial of Service)
Certain regex patterns can cause exponential backtracking, leading to performance issues or denial of service attacks.
Vulnerable Patterns
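These patterns can be exploited with malicious input, typically when something after the repeated group forces the overall match to fail:
(a+)+
(a|a)*
^(a+)+$
A pattern like (a|b)*a is less severe: its alternatives never match the same character, so backtracking grows at most quadratically with input length rather than exponentially.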
How ReDoS Works
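When given input like aaaaaaaaaaaaaaaaaaaaX, a backtracking regex engine running a pattern such as ^(a+)+$ tries every way of dividing the run of a’s between the inner and outer quantifiers before it can report failure. The number of combinations roughly doubles with each additional a, consuming excessive CPU time.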
Safe Alternatives
- Vulnerable: (a+)+
- Safe: a+
- Risky: (a|b)*a
- Safer: [ab]*a
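A rewrite is only useful if it accepts the same strings as the original. The sketch below, in Python, compares each pair exhaustively on short inputs. The inputs are kept deliberately short so that even the nested-quantifier pattern finishes quickly. The helper name agree is hypothetical.

```python
import re
from itertools import product

def agree(pattern_a: str, pattern_b: str, alphabet: str, max_len: int) -> bool:
    """True if both patterns accept exactly the same strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(pattern_a, s)) != bool(re.fullmatch(pattern_b, s)):
                return False
    return True

same_plus = agree(r"(a+)+", r"a+", "ab", 8)
same_alt = agree(r"(a|b)*a", r"[ab]*a", "ab", 8)

print(same_plus, same_alt)  # True True
```

An exhaustive check on short strings is not a proof, but it is a cheap regression test to run before swapping a pattern in production.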
Input Validation Bypass
Regex for validation can sometimes be bypassed with unexpected input.
Common Bypass Techniques
- Newline injection: Depending on the engine and flags, ^ and $ act as line boundaries, not string boundaries (always in Ruby, in multiline mode elsewhere), and in several engines $ also matches just before a trailing newline
- Case sensitivity: [a-z] doesn’t match uppercase letters
- Unicode issues: \w might not handle international characters as expected
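The newline case is easy to reproduce. The following sketch shows the behavior of Python's re module specifically. Other engines differ, so treat it as one example rather than a general rule.

```python
import re

payload = "12345\n"

# In Python, $ also matches just before a trailing newline.
dollar = bool(re.search(r"^\d+$", payload))    # True

# \A and \Z match only at the very start and very end of the string.
strict = bool(re.search(r"\A\d+\Z", payload))  # False

# fullmatch requires the pattern to cover the whole string.
full = bool(re.fullmatch(r"\d+", payload))     # False

# In multiline mode, ^ and $ match at every line boundary.
multi = bool(re.search(r"^\d+$", "junk\n12345\njunk", flags=re.MULTILINE))  # True

print(dollar, strict, full, multi)
```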
Safer Validation Practices
- Use \A and \z for true string start/end (language dependent; Python, for example, spells the end anchor \Z)
- Consider case-insensitive matching when appropriate
- Test with various character encodings and special characters
- Validate both format AND content length limits
Best Practices for Security
- Avoid complex nested quantifiers
- Test with long, malformed input
- Set timeouts for regex operations
- Use specific character classes instead of broad ones
- Validate input length before applying regex
- Consider using dedicated parsers for complex formats
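As one way to apply the length check, here is a minimal Python sketch built on the basic email pattern from this guide. Python's standard re module does not expose a timeout option, so the sketch relies on the length cap alone. MAX_LEN and looks_like_email are hypothetical names, and the limit of 254 characters is an assumption chosen for illustration.

```python
import re

MAX_LEN = 254  # assumed upper bound for this example
EMAIL_SHAPE = re.compile(r"\w+@\w+\.\w+")

def looks_like_email(text: str) -> bool:
    """Reject oversized input before the regex engine ever sees it."""
    if len(text) > MAX_LEN:
        return False
    return EMAIL_SHAPE.fullmatch(text) is not None

print(looks_like_email("user@domain.com"))                # True
print(looks_like_email("a" * 10_000 + "@domain.com"))     # False
```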
When NOT to Use Regex
Regular expressions aren’t always the best tool:
- Complex parsing (use proper parsers for HTML, XML, JSON)
- Simple string operations (use built-in string methods)
- Performance-critical code (regex can be slow on large inputs)
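For comparison, a few tasks that need no regex at all, sketched with Python's built-in string methods and the standard json module. The sample values are invented.

```python
import json

# Simple string operations: built-in methods are clearer than a pattern.
filename = "report.csv"
is_csv = filename.endswith(".csv")

key, _, value = "timeout=30".partition("=")

# Structured formats: use the dedicated parser.
config = json.loads('{"retries": 3, "hosts": ["a", "b"]}')

print(is_csv, key, value, config["retries"])
```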
Remember: “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.” Use regex when appropriate, but don’t force it where simpler solutions exist.
Resources for Learning and Testing
Online Regex Testing
Rubular (https://rubular.com/) is an excellent online regex tester that provides:
- Real-time pattern testing as you type
- Clear highlighting of matches in your test string
- Ruby-based regex engine (most basic patterns carry over to other languages, though anchors, flags, and some syntax differ)
- Instant feedback on pattern syntax errors
- Ability to save and share regex patterns
Using Rubular Effectively
- Start with simple test strings - Enter basic examples of what you want to match
- Build patterns incrementally - Add one piece at a time and watch the matches update
- Test edge cases - Add test strings that should NOT match to verify your pattern
- Use the quick reference - Rubular provides a handy cheat sheet on the right side
- Save useful patterns - Bookmark or save patterns you’ll use again
Other Testing Resources
- Online regex testers with different engines
- Language-specific regex documentation
- Practice with real-world examples
- Start with simple patterns and gradually increase complexity
Regular expressions are incredibly powerful once you understand the basics. Practicing with real examples in a tool like Rubular makes learning much easier. Don’t be afraid to start simple and build up to more complex patterns.