Regular Expressions Tutorial: A Beginner-Friendly Guide to Regex

March 31, 2026 · 12 min read

Table of Contents

What Are Regular Expressions and Why Learn Them?
Basic Syntax and Fundamentals
Character Classes and Ranges
Quantifiers: Controlling Match Repetition
Groups and Capturing
Alternation and Choice Operators
Lookahead and Lookbehind Assertions
Common Regex Patterns Reference Table
Using Regex Across Different Languages
Performance Optimization Tips
Testing and Debugging Regular Expressions
Frequently Asked Questions

What Are Regular Expressions and Why Learn Them?

Regular expressions (commonly abbreviated as regex or regexp) are powerful pattern-matching tools that allow you to search, validate, extract, and manipulate text using specialized syntax. Think of them as a sophisticated search language that goes far beyond simple "find and replace" operations.

Imagine you need to extract all email addresses from a file containing thousands of lines of log data, or validate that a user's phone number follows the correct format. Using traditional string manipulation methods would result in verbose, hard-to-maintain code. Regular expressions can accomplish these tasks with a single, concise pattern.

At their core, regular expressions define search patterns using a combination of literal characters and special metacharacters. These patterns can match simple strings like "cat" or complex structures like email addresses, URLs, or credit card numbers.

Why Should You Learn Regular Expressions?

Text Processing Efficiency: Regex can quickly process large volumes of text data and perform complex search and replace operations that would take dozens of lines of conventional code
Data Validation: Validating user input (emails, phone numbers, password strength) is a common requirement in web development, and regex provides elegant solutions
Data Extraction: Extract structured information from unstructured text, such as pulling links from web pages or error messages from logs
Cross-Platform Universality: Nearly every programming language and text editor supports regular expressions with similar syntax
Productivity Boost: Mastering regex can dramatically reduce the time spent writing repetitive code and performing manual text operations
Code Refactoring: Quickly find and modify patterns across entire codebases during refactoring projects

Real-World Applications

Regular expressions are used extensively across software development and data processing:

Web Form Validation: Ensuring emails, phone numbers, postal codes, and other user inputs match expected formats
Log Analysis: Parsing server logs to extract error messages, IP addresses, timestamps, and other relevant data
Text Editor Operations: Advanced search and replace in IDEs like VS Code, Sublime Text, or Vim
Web Scraping: Extracting specific data patterns from HTML content when building web crawlers
Configuration File Parsing: Reading and validating configuration files with specific syntax requirements
Data Cleaning: Standardizing inconsistent data formats in datasets before analysis
Security: Detecting malicious patterns in user input to prevent injection attacks

Pro tip: While regex is powerful, it's not always the best tool for every job. For parsing complex structured data like HTML or JSON, use dedicated parsers instead. Regex works best for pattern matching in plain text.

Basic Syntax and Fundamentals

Regular expressions consist of two types of characters: literal characters (which match themselves) and metacharacters (which have special meanings). Let's start with the fundamentals.

Literal Characters

The simplest regex is just plain text. The pattern cat will match the exact string "cat" in your text.

Text: "The cat sat on the mat"
Regex: cat
Matches: "cat" (inside "The cat sat on the mat")

Literal characters are case-sensitive by default, so cat won't match "Cat" or "CAT" unless you use a case-insensitive flag.
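
As a minimal sketch of the same idea in Python (the sample strings are only illustrative), the standard `re` module finds a literal pattern and reports where it occurred. Passing `re.IGNORECASE` is one way to get the case-insensitive behavior mentioned above.

```python
import re

text = "The cat sat on the mat"

# A literal pattern matches the same characters in the same order.
match = re.search(r"cat", text)
print(match.group(), match.span())  # cat (4, 7)

# Matching is case-sensitive unless a flag says otherwise.
sensitive = re.search(r"cat", "The Cat sat")
insensitive = re.search(r"cat", "The Cat sat", re.IGNORECASE)
print(sensitive)            # None
print(insensitive.group())  # Cat
```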

The Dot Metacharacter (.)

The dot . is a wildcard that matches any single character except newline characters.

Text: "cat", "cot", "cut", "c@t"
Regex: c.t
Matches: All four strings

To match a literal dot character, escape it with a backslash: \.

Text: "file.txt"
Regex: file\.txt
Matches: "file.txt" (not "fileAtxt")
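
The difference between the wildcard dot and an escaped dot can be checked quickly in Python. This is a small sketch with made-up sample strings; `re.fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

candidates = ["cat", "cot", "cut", "c@t", "ct", "c\nt"]

# The dot needs exactly one character, and by default that character
# cannot be a newline.
dot_matches = [s for s in candidates if re.fullmatch(r"c.t", s)]
print(dot_matches)  # ['cat', 'cot', 'cut', 'c@t']

# An escaped dot only matches a literal "."
escaped_ok = bool(re.fullmatch(r"file\.txt", "file.txt"))
escaped_bad = bool(re.fullmatch(r"file\.txt", "fileAtxt"))
unescaped = bool(re.fullmatch(r"file.txt", "fileAtxt"))
print(escaped_ok, escaped_bad, unescaped)  # True False True
```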

Anchors: Matching Positions

Anchors don't match characters—they match positions in the text.

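Caret (^) - Start of Line: The ^ anchor matches the beginning of the string. In multiline mode it also matches the beginning of each line.

Text: "cat\ndog\ncat"
Regex: ^cat
Matches: Only the first "cat" (with the multiline flag, the "cat" on the third line matches as well)

Dollar Sign ($) - End of Line: The $ anchor matches the end of the string. In multiline mode it also matches the end of each line.

Text: "cat\ndog\ncat"
Regex: cat$
Matches: Only the last "cat" (with the multiline flag, the "cat" on the first line matches as well)

Combining Anchors: Use both to match an entire string, or an entire line in multiline mode.

Regex: ^cat$
Matches: Only strings (or, in multiline mode, lines) containing exactly "cat" with nothing before or after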
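
Whether anchors refer to the whole string or to individual lines depends on the engine's multiline setting. The sketch below uses Python's `re.MULTILINE` flag to show both behaviors on the sample text from this section. Other languages spell the flag differently.

```python
import re

text = "cat\ndog\ncat"

# Default mode: ^ and $ refer to the start and end of the whole string.
starts_default = [m.start() for m in re.finditer(r"^cat", text)]
ends_default = [m.start() for m in re.finditer(r"cat$", text)]
print(starts_default)  # [0]
print(ends_default)    # [8]

# Multiline mode: ^ and $ also match at every line break.
starts_multiline = [m.start() for m in re.finditer(r"^cat", text, re.MULTILINE)]
print(starts_multiline)  # [0, 8]

# Both anchors together select whole lines.
whole_lines = re.findall(r"^cat$", "cat\ncategory\ncat", re.MULTILINE)
print(whole_lines)  # ['cat', 'cat']
```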

Word Boundaries (\b)

The \b anchor matches word boundaries—positions between word and non-word characters.

Text: "cat category caterpillar"
Regex: \bcat\b
Matches: Only the standalone word "cat"

This is incredibly useful for finding whole words without matching partial words.
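
Here is a short Python sketch of the effect of `\b`, using a slightly longer sample sentence than the one above (the extra word "concat" is added for illustration).

```python
import re

text = "cat category caterpillar concat"

# Without boundaries, "cat" is found inside longer words too.
unbounded_count = len(re.findall(r"cat", text))
print(unbounded_count)  # 4

# With a boundary on both sides, only the standalone word matches.
standalone = [m.span() for m in re.finditer(r"\bcat\b", text)]
print(standalone)  # [(0, 3)]

# A boundary on one side only finds words that start with "cat".
prefixed = re.findall(r"\bcat\w*", text)
print(prefixed)  # ['cat', 'category', 'caterpillar']
```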

Escape Sequences

Special characters in regex need to be escaped with a backslash to match them literally:

Special Characters	Escaped Form
. * + ? ^ $ { } [ ] ( ) | \	\. \* \+ \? \^ \$ \{ \} \[ \] \( \) \| \\

Example matching a price:

Regex: \$\d+\.\d{2}
Matches: "$19.99", "$5.00"
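
The price pattern can be tried out in Python as shown below. When the text to match comes from a variable rather than being typed by hand, Python's `re.escape` can add the backslashes for you. The sample sentences are invented for the example.

```python
import re

price_pattern = re.compile(r"\$\d+\.\d{2}")
prices = price_pattern.findall("Lunch was $19.99, coffee $5.00, tip 3.50")
print(prices)  # ['$19.99', '$5.00']

# re.escape turns arbitrary text into a pattern that matches it literally.
literal = "1+1=2 (really?)"
sentence = "He wrote 1+1=2 (really?) on the board"
unescaped_hit = re.search(literal, sentence)
escaped_hit = re.search(re.escape(literal), sentence)
print(unescaped_hit)        # None: +, ( ) and ? were read as metacharacters
print(escaped_hit.group())  # 1+1=2 (really?)
```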

Character Classes and Ranges

Character classes let you define a set of characters and match any one of them. They're enclosed in square brackets.

Basic Character Classes

Square brackets [] create a character set that matches any single character inside.

Text: "cat", "cot", "cut", "cit"
Regex: c[aou]t
Matches: "cat", "cot", "cut" (not "cit")

Character Ranges

Use hyphens to define ranges of characters:

[a-z] - Any lowercase letter
[A-Z] - Any uppercase letter
[0-9] - Any digit
[a-zA-Z] - Any letter (upper or lower)
[a-z0-9] - Any lowercase letter or digit

Text: "a1", "b2", "c3", "d4"
Regex: [a-c][1-3]
Matches: "a1", "b2", "c3" (not "d4")

Negated Character Classes

Use a caret ^ at the start of a character class to negate it—matching any character NOT in the set.

Regex: [^0-9]
Matches: Any character that is NOT a digit

Text: "abc123def"
Regex: [^a-z]+
Matches: "123" (the sequence of non-lowercase letters)
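
The character class examples above can be reproduced with a few lines of Python. This is only a sketch; `re.fullmatch` is used for the short strings so that each one must fit the pattern completely.

```python
import re

words = ["cat", "cot", "cut", "cit"]
set_matches = [w for w in words if re.fullmatch(r"c[aou]t", w)]
print(set_matches)  # ['cat', 'cot', 'cut']

codes = ["a1", "b2", "c3", "d4"]
range_matches = [c for c in codes if re.fullmatch(r"[a-c][1-3]", c)]
print(range_matches)  # ['a1', 'b2', 'c3']

# A leading ^ inside the brackets negates the set.
non_lowercase = re.findall(r"[^a-z]+", "abc123def")
non_digits = re.findall(r"[^0-9]+", "abc123def")
print(non_lowercase)  # ['123']
print(non_digits)     # ['abc', 'def']
```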

Predefined Character Classes

Regex provides shorthand for common character classes:

Shorthand	Equivalent	Description
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Any word character
`\W`	`[^a-zA-Z0-9_]`	Any non-word character
`\s`	`[ \t\n\r\f\v]`	Any whitespace character
`\S`	`[^ \t\n\r\f\v]`	Any non-whitespace character

Example matching a simple phone number:

Regex: \d{3}-\d{3}-\d{4}
Matches: "555-123-4567"

Quick tip: Uppercase versions of shorthand classes are always the negation of their lowercase counterparts. \d matches digits, \D matches non-digits.
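
A quick way to get a feel for the shorthand classes is to run them over a sample string, as in this Python sketch (the sample text is made up). One caveat worth knowing: in Python 3, `\d` and `\w` applied to ordinary strings also match non-ASCII digits and letters unless you pass `re.ASCII`, so the "Equivalent" column above describes ASCII-only behavior.

```python
import re

sample = "Order 42\tshipped to Unit_7 on 2026-03-31"

digits = re.findall(r"\d+", sample)
words = re.findall(r"\w+", sample)
print(digits)  # ['42', '7', '2026', '03', '31']
print(words)   # ['Order', '42', 'shipped', 'to', 'Unit_7', 'on', '2026', '03', '31']

# \s+ treats any run of spaces, tabs, or newlines as one separator.
pieces = re.split(r"\s+", "a  b\t\nc")
print(pieces)  # ['a', 'b', 'c']

phone_ok = bool(re.fullmatch(r"\d{3}-\d{3}-\d{4}", "555-123-4567"))
print(phone_ok)  # True

# "\u0663" is the Arabic-Indic digit three.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None
ascii_only = re.fullmatch(r"\d", "\u0663", re.ASCII) is not None
print(unicode_digit, ascii_only)  # True False
```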

Quantifiers: Controlling Match Repetition

Quantifiers specify how many times a character or group should be matched. They're placed after the element you want to repeat.

Basic Quantifiers

* - Zero or more times
+ - One or more times
? - Zero or one time (makes something optional)
{n} - Exactly n times
{n,} - At least n times
{n,m} - Between n and m times

Examples of Quantifiers in Action

Asterisk (*) - Zero or More:

Regex: ca*t
Matches: "ct", "cat", "caat", "caaat"

Plus (+) - One or More:

Regex: ca+t
Matches: "cat", "caat", "caaat" (not "ct")

Question Mark (?) - Optional:

Regex: colou?r
Matches: "color" and "colour"

Exact Count {n}:

Regex: \d{3}
Matches: Exactly three digits like "123"

Range {n,m}:

Regex: \d{2,4}
Matches: 2 to 4 digits like "12", "123", or "1234"
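
The quantifier examples can be verified with the Python sketch below. The last part illustrates a detail that is easy to miss: `\d{2,4}` on its own will happily match the first four digits of a longer number, so word boundaries are added when the whole number should have 2 to 4 digits.

```python
import re

samples = ["ct", "cat", "caat", "caaat"]
star = [s for s in samples if re.fullmatch(r"ca*t", s)]
plus = [s for s in samples if re.fullmatch(r"ca+t", s)]
print(star)  # ['ct', 'cat', 'caat', 'caaat']
print(plus)  # ['cat', 'caat', 'caaat']

spellings = re.findall(r"colou?r", "color colour colouur")
print(spellings)  # ['color', 'colour']

numbers = "1 12 123 1234 12345"
loose = re.findall(r"\d{2,4}", numbers)
strict = re.findall(r"\b\d{2,4}\b", numbers)
print(loose)   # ['12', '123', '1234', '1234']
print(strict)  # ['12', '123', '1234']
```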

Greedy vs. Lazy Quantifiers

By default, quantifiers are greedy—they match as much text as possible. Adding ? after a quantifier makes it lazy (matching as little as possible).

Text: "<div>content</div><div>more</div>"
Regex (greedy): <div>.*</div>
Matches: "<div>content</div><div>more</div>" (entire string)

Regex (lazy): <div>.*?</div>
Matches: "<div>content</div>" (first tag only)

Lazy quantifiers:

*? - Zero or more (lazy)
+? - One or more (lazy)
?? - Zero or one (lazy)
{n,m}? - Between n and m (lazy)

Pro tip: A greedy quantifier often matches more than you intended when the closing delimiter appears several times in the text. Use a lazy quantifier when you want the shortest possible match, and keep in mind that neither greedy nor lazy matching can correctly pair up nested structures.
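
The greedy and lazy behavior described in this section looks like this in Python. The third pattern is an alternative sketch that is not part of the text above: a negated character class (`[^<]*`) gives the same result here without relying on laziness.

```python
import re

html = "<div>content</div><div>more</div>"

greedy = re.findall(r"<div>.*</div>", html)
lazy = re.findall(r"<div>.*?</div>", html)
negated = re.findall(r"<div>[^<]*</div>", html)

print(greedy)   # ['<div>content</div><div>more</div>']
print(lazy)     # ['<div>content</div>', '<div>more</div>']
print(negated)  # ['<div>content</div>', '<div>more</div>']
```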

Practical Example: Matching HTML Tags

Regex: <([a-z]+)>.*?</\1>
Matches: Paired HTML tags like "<p>text</p>" or "<div>content</div>"

This pattern uses lazy matching to avoid capturing multiple tags at once, and backreferences (covered next) to ensure opening and closing tags match.

Groups and Capturing

Parentheses () create groups that serve multiple purposes: they group parts of a pattern together, capture matched text for later use, and enable backreferences.

Basic Grouping

Groups let you apply quantifiers to multiple characters:

Regex: (ha)+
Matches: "ha", "haha", "hahaha"

Without grouping, ha+ would match "ha", "haa", "haaa" (only the 'a' repeats).

Capturing Groups

Groups automatically capture the text they match, which you can reference later:

Text: "John Smith"
Regex: (\w+) (\w+)
Captures: Group 1 = "John", Group 2 = "Smith"

In most programming languages, you can access these captures:

// JavaScript example
const match = "John Smith".match(/(\w+) (\w+)/);
console.log(match[1]); // "John"
console.log(match[2]); // "Smith"

Backreferences

Backreferences let you match the same text that was captured by a group earlier in the pattern. Use \1, \2, etc.

Regex: (\w+) \1
Matches: Repeated words like "the the" or "is is"

Regex: <([a-z]+)>.*?</\1>
Matches: Matching HTML tags like "<div>...</div>"
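
Here is a Python sketch of both backreference patterns. The word boundaries around the repeated-word pattern are an addition for this example; without them, the "is" at the end of "This" followed by " is" would also count as a repeat.

```python
import re

text = "This is is a test of the the parser"

# \1 must match exactly the text captured by group 1.
repeated = re.findall(r"\b(\w+) \1\b", text)
print(repeated)  # ['is', 'the']

# The same backreference syntax works in the replacement string.
deduplicated = re.sub(r"\b(\w+) \1\b", r"\1", text)
print(deduplicated)  # This is a test of the parser

tag_pattern = re.compile(r"<([a-z]+)>.*?</\1>")
paired = tag_pattern.search("<p>text</p>")
mismatched = tag_pattern.search("<div>oops</span>")
print(paired.group(1))  # p
print(mismatched)       # None
```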

Non-Capturing Groups

Sometimes you need grouping without capturing. Use (?:...) for non-capturing groups:

Regex: (?:https?|ftp)://\S+
Matches: URLs starting with http, https, or ftp
(The protocol isn't captured as a group)

Non-capturing groups skip the bookkeeping of storing a capture and leave the numbering of your other groups unchanged, which helps when you don't need to reference the matched text.
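
In Python, the difference between the two kinds of group is easy to see with `re.findall`, which returns the group contents whenever the pattern contains a capturing group. A small sketch with invented URLs:

```python
import re

text = "See https://example.com and ftp://files.example.org/readme.txt"

# With a capturing group, findall returns only what the group captured.
capturing = re.findall(r"(https?|ftp)://\S+", text)
print(capturing)  # ['https', 'ftp']

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"(?:https?|ftp)://\S+", text)
print(non_capturing)  # ['https://example.com', 'ftp://files.example.org/readme.txt']
```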

Named Capturing Groups

Named groups make your regex more readable and maintainable:

Regex: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Matches: Dates like "2026-03-31"
Access (JavaScript): match.groups.year, match.groups.month, match.groups.day

Named groups are especially useful in complex patterns where numbered references become confusing.

Quick tip: Use named groups for complex patterns that you or others will need to maintain. The slight verbosity pays off in readability.
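
Named group syntax is one of the places where engines differ. As far as I know, Python's `re` module expects `(?P<name>...)` rather than the `(?<name>...)` form shown above, so the sketch below uses the Python spelling. The sample sentence is invented.

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = date_pattern.search("Published on 2026-03-31.")
print(m.group("year"))  # 2026
print(m.groupdict())    # {'year': '2026', 'month': '03', 'day': '31'}

# Named groups can be referenced in a replacement with \g<name>.
reordered = date_pattern.sub(r"\g<day>/\g<month>/\g<year>", "2026-03-31")
print(reordered)  # 31/03/2026
```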

Alternation and Choice Operators

The pipe symbol | acts as an OR operator, allowing you to match one pattern or another.

Basic Alternation

Regex: cat|dog
Matches: Either "cat" or "dog"

Regex: gray|grey
Matches: Both American and British spellings

Alternation with Groups

Combine alternation with groups for more complex patterns:

Regex: (Mr|Ms|Mrs|Dr)\. \w+
Matches: "Mr. Smith", "Dr. Jones", "Ms. Williams"

Regex: \.(jpg|jpeg|png|gif)$
Matches: Image file extensions at the end of filenames

Order Matters

The regex engine tries alternatives from left to right and stops at the first match:

Text: "category"
Regex: cat|category
Matches: "cat" (stops after first match)

Better: category|cat
Matches: "category" (longer match first)

Always put longer or more specific alternatives first to avoid premature matching.
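
The effect of ordering can be confirmed in Python. The third pattern is an extra sketch, not from the text above: anchoring the alternation with word boundaries forces the engine to keep trying alternatives until a whole word fits, which makes the order matter less.

```python
import re

short_first = re.search(r"cat|category", "category").group()
long_first = re.search(r"category|cat", "category").group()
bounded = re.search(r"\b(?:cat|category)\b", "category").group()

print(short_first)  # cat
print(long_first)   # category
print(bounded)      # category
```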

Practical Examples

Matching multiple date formats:

Regex: \d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4}
Matches: "2026-03-31" or "03/31/2026"

Matching different phone formats:

Regex: \d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}
Matches: "555-123-4567" or "(555) 123-4567"
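
Both alternation patterns can be exercised in Python as follows. The sample sentences are made up for the example.

```python
import re

date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4}")
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}")

dates = date_pattern.findall("Due 2026-03-31, filed 03/31/2026")
phones = phone_pattern.findall("Call 555-123-4567 or (555) 987-6543")

print(dates)   # ['2026-03-31', '03/31/2026']
print(phones)  # ['555-123-4567', '(555) 987-6543']
```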

Lookahead and Lookbehind Assertions

Lookaround assertions check if a pattern exists before or after the current position without including it in the match. They're zero-width assertions—they don't consume characters.

Positive Lookahead (?=...)

Matches a position where the pattern inside the lookahead follows:

Regex: \d+(?= dollars)
Text: "50 dollars and 30 euros"
Matches: "50" (only the number before "dollars")

The lookahead checks for " dollars" but doesn't include it in the match.

Negative Lookahead (?!...)

Matches a position where the pattern inside does NOT follow:

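Regex: \d+\b(?! dollars)
Text: "50 dollars and 30 euros"
Matches: "30" (the number NOT followed by "dollars"). The \b is needed: a bare \d+(?! dollars) backtracks and matches the "5" in "50", because "5" is followed by "0" rather than " dollars".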
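
Lookaheads interact with backtracking in a way that is worth testing. In the Python sketch below, a negative lookahead placed directly after `\d+` still matches part of "50", because the engine is allowed to give back a digit. Adding `\b` before the lookahead is one way to require the whole number.

```python
import re

text = "50 dollars and 30 euros"

positive = re.findall(r"\d+(?= dollars)", text)
print(positive)  # ['50']

# \d+ backtracks from "50" to "5", which is followed by "0", not " dollars".
naive_negative = re.findall(r"\d+(?! dollars)", text)
print(naive_negative)  # ['5', '30']

# \b forces the match to end at the end of the number.
whole_number_negative = re.findall(r"\d+\b(?! dollars)", text)
print(whole_number_negative)  # ['30']
```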

Positive Lookbehind (?<=...)

Matches a position where the pattern inside precedes:

Regex: (?<=\$)\d+
Text: "Price: $50 and €30"
Matches: "50" (only the number after "$")

Negative Lookbehind (?<!...)

Matches a position where the pattern inside does NOT precede:

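Regex: (?<![$\d])\d+
Text: "Price: $50 and €30"
Matches: "30" (the number NOT preceded by "$"). The \d in the lookbehind is needed: a bare (?<!\$)\d+ also matches the "0" in "$50", because "0" is preceded by "5" rather than "$".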
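
Lookbehinds have a similar subtlety: the match may start in the middle of a number. The Python sketch below shows the positive lookbehind, a negative lookbehind that starts matching one digit too late, and a variant that also refuses to start right after another digit.

```python
import re

prices = "Price: $50 and €30"

after_dollar = re.findall(r"(?<=\$)\d+", prices)
print(after_dollar)  # ['50']

# The "0" in "$50" is preceded by "5", not "$", so it slips through.
naive_negative = re.findall(r"(?<!\$)\d+", prices)
print(naive_negative)  # ['0', '30']

# Also rejecting a preceding digit keeps the match on whole numbers.
whole_number_negative = re.findall(r"(?<![$\d])\d+", prices)
print(whole_number_negative)  # ['30']
```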

Practical Applications

Password Validation: Ensure a password contains at least one uppercase letter, one lowercase letter, and one digit:

Regex: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$
Validates: Passwords with all requirements met
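
A minimal Python sketch of the password check follows. The function name `is_valid_password` and the sample passwords are hypothetical, chosen only for illustration. Each lookahead scans the whole string from the start for one required character type, and `.{8,}` then enforces the length.

```python
import re

password_pattern = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$")


def is_valid_password(candidate: str) -> bool:
    """Return True if the candidate has a lowercase letter, an uppercase
    letter, a digit, and at least 8 characters."""
    return password_pattern.fullmatch(candidate) is not None


print(is_valid_password("Passw0rd"))     # True
print(is_valid_password("password1"))    # False (no uppercase letter)
print(is_valid_password("ALLUPPER123"))  # False (no lowercase letter)
print(is_valid_password("Sh0rt"))        # False (too short)
```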

Extract Domain from Email:

Regex: (?<=@)[a-zA-Z0-9.-]+
Text: "user@example.com"
Matches: "example.com"

Find Numbers Not in Parentheses:

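Regex: (?<![(\d])\d+(?![)\d])
Text: "Call (555) 123-4567"
Matches: "123" and "4567" (not "555"). The \d inside each assertion stops the engine from matching only part of a number, such as the middle "5" of "555".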
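
The last two applications can be checked in Python. The email address below is a placeholder invented for the example. For the parentheses case, the sketch compares a pattern that only tests for the bracket characters with one that also refuses to start or stop in the middle of a number.

```python
import re

# Lookbehind for "@" keeps the "@" itself out of the match.
domain = re.search(r"(?<=@)[a-zA-Z0-9.-]+", "user@example.com").group()
print(domain)  # example.com

text = "Call (555) 123-4567"

# Testing only for "(" and ")" still lets the middle "5" of "555" match.
naive = re.findall(r"(?<!\()\d+(?!\))", text)
print(naive)  # ['5', '123', '4567']

# Adding \d to both assertions keeps matches on whole numbers.
whole_numbers = re.findall(r"(?<![(\d])\d+(?![)\d])", text)
print(whole_numbers)  # ['123', '4567']
```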

Pro tip: Lookaround assertions are powerful but can impact performance. Use them judiciously, especially in patterns that will process large amounts of text. Some regex engines have limited lookbehind support.

Common Regex Patterns Reference Table

Here's a comprehensive reference of frequently used regex patterns for common validation and extraction tasks. You can test these patterns using our Regex Tester Tool.

Pattern Type	Regex	Description
Email Address	`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`	Basic email validation
URL	`https?://[^\s]+`	Simple URL matching
Phone (US)	`\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}`	Various US phone formats
ZIP Code (US)	`\d{5}(-\d{4})?`	5-digit or ZIP+4
IP Address (IPv4)	`\b(?:\d{1,3}\.){3}\d{1,3}\b`	Basic IPv4 format
Date (YYYY-MM-DD)	`\d{4}-\d{2}-\d{2}`	ISO date format
Time (24-hour)	`(?:[01]\d|2[0-3]):[0-5]\d`	HH:MM format
Hex Color	`#[0-9A-Fa-f]{6}\b`	6-digit hex colors
Credit Card	`\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}`	16-digit card numbers
Username	`^[a-zA-Z0-9_]{3,16}$`	3-16 letters, digits, or underscores
Strong Password	`^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%?&])[A-Za-z\d@$!%?&]{8,}$`	Min 8 chars with requirements
HTML Tag	`<([a-z]+)([^<]+)*(?:>(.*?)<\/\1>|\s+\/>)`	Opening and closing tags
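
One way to use a reference table like this is to keep the patterns in a dictionary and validate whole strings with `re.fullmatch`. The sketch below is an assumption about how you might organize that: the `PATTERNS` dictionary and the `validate` helper are hypothetical names, and only a few rows are included. It also shows why several of these patterns are labeled "basic": they check the shape of the input, not whether the value makes sense.

```python
import re

# A few patterns from the table, keyed by hypothetical names.
PATTERNS = {
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "zip_us": r"\d{5}(-\d{4})?",
    "ipv4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "iso_date": r"\d{4}-\d{2}-\d{2}",
    "hex_color": r"#[0-9A-Fa-f]{6}",
}


def validate(kind: str, value: str) -> bool:
    """Return True if the entire value fits the pattern registered for kind."""
    return re.fullmatch(PATTERNS[kind], value) is not None


print(validate("email", "user@example.com"))  # True
print(validate("zip_us", "12345-6789"))       # True
print(validate("zip_us", "1234"))             # False
print(validate("hex_color", "#1A2b3C"))       # True

# Format-only checks accept values that are not meaningful.
print(validate("ipv4", "999.999.999.999"))    # True
print(validate("iso_date", "2026-13-45"))     # True
```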

Advanced Pattern Examples

Use Case	Regex	Example Match
Extract hashtags	`#\w+`	#regex #tutorial
Extract mentions	`@\w+`	@username
File extension	`\.\w+$`	.txt, .pdf, .jpg
Remove extra spaces	`\s+`	Multiple spaces
Markdown links	`\[([^\]]+)\]\(([^)]+)\)`	[text](url)
CSS class names	`class="([^"]*)"`	class="container"
Extract numbers	`-?\d+\.?\d*`	123, -45.67
Trim whitespace	`^\s+|\s+$`	Leading/trailing spaces
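
Several of the extraction and cleanup patterns from this table can be tried in Python with `re.findall` and `re.sub`. This is a sketch with invented sample strings.

```python
import re

hashtags = re.findall(r"#\w+", "Learning #regex with this #tutorial")
print(hashtags)  # ['#regex', '#tutorial']

# Two capturing groups: the link text and the URL.
links = re.findall(r"\[([^\]]+)\]\(([^)]+)\)", "See [docs](https://example.com/docs)")
print(links)  # [('docs', 'https://example.com/docs')]

numbers = re.findall(r"-?\d+\.?\d*", "123 and -45.67")
print(numbers)  # ['123', '-45.67']

# Cleanup: trim both ends, then collapse runs of whitespace.
trimmed = re.sub(r"^\s+|\s+$", "", "  padded  ")
collapsed = re.sub(r"\s+", " ", "too   many    spaces")
print(repr(trimmed))  # 'padded'
print(collapsed)      # too many spaces
```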

You can also use our String Formatter Tool to clean and format text data after extraction.

Using Regex Across Different Languages

While regex syntax is largely consistent across languages, implementation details vary. Here's how to use regex in popular programming languages.

JavaScript

// Literal notation
const regex = /\d{3}-\d{3}-\d{4}/;

// Constructor notation
const regex2 = new RegExp('\\d{3}-\\d{3}-\\d{4}');

// Test if pattern matches
regex.test('555-123-4567'); // true

// Extract matches
const match = '555-123-4567'.match(regex);

// Replace
const result = 'Call 555-123-4567'.replace(regex, 'XXX-XXX-XXXX');

// Global flag for multiple matches
const globalRegex = /\d+/g;
'a1b2c3'.match(globalRegex); // ['1', '2', '3']

Python

import re

# Compile pattern
pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

# Test if pattern matches
pattern.search('555-123-4567')  # Match object or None

# Find all matches
re.findall(r'\d+', 'a1b2c3')  # ['1', '2', '3']

# Replace
re.sub(r'\d{3}-\d{3}-\d{4}', 'XXX-XXX-XXXX', 'Call 555-123-4567')

# Split by pattern
re.split(r'\s+', 'split  by   spaces')  # ['split