RubyLearning

Helping Ruby Programmers become Awesome!

Ruby Regex: The Complete Guide to Regular Expressions in Ruby

By RubyLearning

Ruby regular expressions (regex) give you a concise, flexible way to search, match, and manipulate text. Whether you are validating user input, parsing log files, or extracting data from strings, mastering Ruby regex is one of the most practical skills you can develop. This guide covers everything from basic pattern matching to advanced techniques like lookahead assertions and named captures, with plenty of Ruby regex examples you can use immediately.

Ruby treats regular expressions as first-class objects of the Regexp class. This means you can store them in variables, pass them to methods, and build them dynamically at runtime. Combined with Ruby's expressive string methods, regex becomes an extremely powerful tool in your toolkit.

Creating Regular Expressions in Ruby

There are three ways to create a regex in Ruby: literal notation with forward slashes, the Regexp.new constructor, and the %r{} syntax.


# Literal notation (most common)
pattern = /hello/

# Regexp.new constructor (useful for dynamic patterns)
pattern = Regexp.new("hello")

# %r{} syntax (handy when the pattern contains slashes)
pattern = %r{/usr/local/bin}

# All three forms produce Regexp objects
/hello/.class   # => Regexp
      

Use Regexp.new when you need to build patterns from variables or user input. For static patterns, the literal /pattern/ syntax is preferred because it is compiled once at parse time.


# Building a pattern dynamically
search_term = "ruby"
pattern = Regexp.new(Regexp.escape(search_term), Regexp::IGNORECASE)
"Learn Ruby fast" =~ pattern  # => 6
      

Always use Regexp.escape when interpolating user input into a pattern. This prevents special regex characters in the input from being interpreted as regex operators.
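Here is a quick illustration of why escaping matters, using a made-up input string. Without escaping, the + in the input is read as a quantifier rather than a literal plus sign, so the pattern silently means something else:

```ruby
user_input = "1+1=2"
text = "Is 1+1=2?"

# Unescaped: "+" becomes a quantifier, so this means "one or more 1s, then 1=2"
unsafe = Regexp.new(user_input)

# Escaped: every special character is matched literally
safe = Regexp.new(Regexp.escape(user_input))

Regexp.escape(user_input)  # => "1\\+1=2"
unsafe.match?(text)        # => false
safe.match?(text)          # => true
```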

Basic Matching: =~, match, and match?

Ruby provides several ways to test whether a string matches a pattern. Each returns something different, so choosing the right one matters for both correctness and performance.


# =~ operator: returns the index of the first match, or nil
"Hello, Ruby!" =~ /Ruby/   # => 7
"Hello, Ruby!" =~ /Python/ # => nil

# match method: returns a MatchData object, or nil
m = /(\d+)-(\d+)/.match("Order 42-100")
m[0]   # => "42-100" (full match)
m[1]   # => "42"     (first capture)
m[2]   # => "100"    (second capture)

# match? method (Ruby 2.4+): returns true/false, no MatchData allocation
/\d+/.match?("abc123")  # => true
/\d+/.match?("abcdef")  # => false
      

Performance tip: Use match? when you only need a boolean result. It is significantly faster than =~ or match because it does not allocate a MatchData object or set the global match variables ($~, $1, etc.).
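The side-effect difference is easy to check. This sketch records what $~ holds after each kind of call. The method name last_match_demo is only for illustration:

```ruby
# Returns what $~ holds after each kind of check
def last_match_demo
  "abc123" =~ /\d+/          # =~ records the match in $~
  after_equal_tilde = $~[0]

  /xyz/.match?("xyz")        # returns true but leaves $~ alone
  after_match_p = $~[0]

  /xyz/.match("xyz")         # match records its MatchData in $~
  after_match = $~[0]

  [after_equal_tilde, after_match_p, after_match]
end

last_match_demo  # => ["123", "123", "xyz"]
```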

Character Classes, Quantifiers, and Anchors

These are the fundamental building blocks of any regex pattern. Think of this section as your Ruby regex cheat sheet for the essentials.

Character Classes


# Built-in character classes
/\d/   # Digit: [0-9]
/\D/   # Non-digit: [^0-9]
/\w/   # Word character: [a-zA-Z0-9_]
/\W/   # Non-word character: [^a-zA-Z0-9_]
/\s/   # Whitespace: [ \t\r\n\f\v]
/\S/   # Non-whitespace: [^ \t\r\n\f\v]
/\h/   # Hex digit: [0-9a-fA-F] (Ruby-specific)

# Custom character classes
/[aeiou]/      # Any vowel
/[^aeiou]/     # Any non-vowel
/[a-zA-Z]/     # Any letter
/[0-9a-fA-F]/  # Hex digit (explicit version)

# POSIX character classes (Ruby supports these)
/[[:alpha:]]/  # Alphabetic characters
/[[:digit:]]/  # Digits
/[[:space:]]/  # Whitespace
/[[:upper:]]/  # Uppercase letters
/[[:lower:]]/  # Lowercase letters
/[[:punct:]]/  # Punctuation
      
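Here are a few of these classes applied to one invented sample string. Note that [aeiou] is case-sensitive, so the capital O in "Order" is not counted:

```ruby
sample = "Order #42: 3 items, ready!"

sample.scan(/\d+/)           # => ["42", "3"]
sample.scan(/[[:alpha:]]+/)  # => ["Order", "items", "ready"]
sample.scan(/[[:punct:]]/)   # => ["#", ":", ",", "!"]
sample.scan(/[aeiou]/).size  # => 5

# \h is shorthand for [0-9a-fA-F]
"0x1F".match?(/\A0x\h+\z/)   # => true
"0xZZ".match?(/\A0x\h+\z/)   # => false
```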

Quantifiers


# Greedy quantifiers (match as much as possible)
/a*/     # Zero or more 'a'
/a+/     # One or more 'a'
/a?/     # Zero or one 'a'
/a{3}/   # Exactly 3 'a'
/a{2,4}/ # Between 2 and 4 'a'
/a{2,}/  # 2 or more 'a'

# Lazy (non-greedy) quantifiers: add ? after the quantifier
/a*?/    # Zero or more 'a', as few as possible
/a+?/    # One or more 'a', as few as possible
/a{2,4}?/ # Between 2 and 4, as few as possible

# Possessive quantifiers (no backtracking): add + after the quantifier
/a*+/    # Zero or more 'a', no backtracking
/a++/    # One or more 'a', no backtracking
      

The difference between greedy and lazy quantifiers matters when extracting content between delimiters. For example, when matching a single HTML tag:


html = "<b>bold</b> and <i>italic</i>"

# Greedy: matches too much
html.match(/<.+>/)[0]   # => "<b>bold</b> and <i>italic</i>"

# Lazy: matches the first tag
html.match(/<.+?>/)[0]  # => "<b>"
      
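To get the content inside the tags rather than the tags themselves, one option is a lazy group paired with a back-reference to the opening tag name. This is a sketch for small, well-formed snippets, not a substitute for an HTML parser. A negated character class is another option that avoids the greedy/lazy question altogether:

```ruby
html = "<b>bold</b> and <i>italic</i>"

# Lazy .*? stops at the first possible closing tag; \1 requires it to match the opening tag name
html.scan(%r{<(\w+)>(.*?)</\1>})
# => [["b", "bold"], ["i", "italic"]]

# [^>]+ can never run past a ">", so no lazy quantifier is needed
html.scan(/<[^>]+>/)
# => ["<b>", "</b>", "<i>", "</i>"]
```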

Anchors


# Position anchors
/^hello/     # Start of line
/world$/     # End of line
/\Ahello/    # Start of string only (never after a newline)
/world\z/    # End of string only (no trailing newline allowed)
/world\Z/    # End of string, before optional trailing newline
/\bhello\b/  # Word boundary
/\Bhello\B/  # Non-word boundary

# Examples
"hello world" =~ /^hello/  # => 0
"hello world" =~ /world$/  # => 6
"say hello" =~ /\bhello\b/ # => 4
"othello" =~ /\bhello\b/   # => nil (not at word boundary)
      
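The difference between ^ and \A matters most in validation, because ^ and $ in Ruby always work line by line. Here is an invented input that sneaks a second line past a ^...$ check:

```ruby
input = "harmless\n<script>alert(1)</script>"

# ^ and $ match at line boundaries, so only the first line has to be letters
input.match?(/^[a-z]+$/)    # => true
# \A and \z anchor to the whole string, so the second line makes it fail
input.match?(/\A[a-z]+\z/)  # => false

# \Z tolerates one trailing newline; \z does not
"done\n".match?(/done\Z/)   # => true
"done\n".match?(/done\z/)   # => false
```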

Capture Groups and Named Captures

Capture groups let you extract specific parts of a match. Named captures make your code far more readable than numbered groups.


# Numbered capture groups
m = /(\d{4})-(\d{2})-(\d{2})/.match("2026-03-29")
m[1]  # => "2026" (year)
m[2]  # => "03"   (month)
m[3]  # => "29"   (day)

# Named capture groups
m = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/.match("2026-03-29")
m[:year]   # => "2026"
m[:month]  # => "03"
m[:day]    # => "29"

# Named captures with =~ automatically create local variables
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/ =~ "2026-03-29"
year   # => "2026"
month  # => "03"
day    # => "29"

# Non-capturing groups (grouping without capturing)
/(?:https?|ftp):\/\/\S+/.match("Visit https://example.com")[0]  # => "https://example.com"
      

Important: The automatic local variable assignment only works when the regex literal is on the left side of the =~ operator and contains no #{} interpolation. If you reverse the order (string =~ /pattern/), the local variables are not created.
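Here is a small check of that rule. The method names are hypothetical. defined? returns "local-variable" when a local exists and nil otherwise, and on a failed match the locals are still created but set to nil:

```ruby
# Literal on the LEFT of =~: Ruby creates a local called yr
def local_after_literal_first
  /(?<yr>\d{4})/ =~ "2026-03"
  defined?(yr)
end

# Literal on the RIGHT: the match still succeeds, but no local appears
def local_after_literal_second
  "2026-03" =~ /(?<yr>\d{4})/
  defined?(yr)
end

# A practical use: the locals are nil when the match fails
def parse_year_month(date)
  if /(?<year>\d{4})-(?<month>\d{2})/ =~ date
    [year, month]
  end
end

local_after_literal_first       # => "local-variable"
local_after_literal_second      # => nil
parse_year_month("2026-03-29")  # => ["2026", "03"]
parse_year_month("no date")     # => nil
```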

The MatchData Object

When a match succeeds, Ruby returns a MatchData object containing detailed information about the match. Understanding this object is key to working effectively with Ruby regular expressions.


m = /(?<word>\w+)\s+(?<num>\d+)/.match("item 42 in stock")

m[0]           # => "item 42"       (full match)
m.string       # => "item 42 in stock" (original string)
m.regexp       # => /(?<word>\w+)\s+(?<num>\d+)/

m.pre_match    # => ""              (before the match)
m.post_match   # => " in stock"     (after the match)

m.begin(0)     # => 0               (start index of full match)
m.end(0)       # => 7               (end index of full match)

m.captures     # => ["item", "42"]  (all captures as array)
m.named_captures # => {"word"=>"item", "num"=>"42"} (hash)

m.names        # => ["word", "num"] (capture group names)
      

The named_captures method (returning a Hash) is particularly useful when you need to feed matched data into other methods or constructors.
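For example, once the string keys are turned into symbols, the hash can be splatted into a keyword-initialized Struct. Item is a hypothetical struct used only for this sketch:

```ruby
# Hypothetical value object for illustration
Item = Struct.new(:word, :num, keyword_init: true)

m = /(?<word>\w+)\s+(?<num>\d+)/.match("item 42 in stock")

# {"word"=>"item", "num"=>"42"} becomes word: "item", num: "42"
item = Item.new(**m.named_captures.transform_keys(&:to_sym))

item.word  # => "item"
item.num   # => "42"
```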

String Methods That Use Regex

Ruby's String class provides several methods that accept regex patterns, and Enumerable adds grep for filtering collections. These are among the most frequently used regex features in everyday Ruby code.

scan


# scan returns an array of all matches
"Call 555-1234 or 555-5678".scan(/\d{3}-\d{4}/)
# => ["555-1234", "555-5678"]

# With capture groups, scan returns an array of arrays
"John 30, Jane 25, Bob 40".scan(/(\w+)\s+(\d+)/)
# => [["John", "30"], ["Jane", "25"], ["Bob", "40"]]

# With a block
"abc123def456".scan(/\d+/) { |n| puts n }
# Prints 123 and then 456, each on its own line
      
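Inside a scan block, Regexp.last_match (the same object as $~) holds the MatchData for the current match. That lets you use named groups instead of positional block arguments. Here is a sketch that builds a list of hashes:

```ruby
people = []

"John 30, Jane 25, Bob 40".scan(/(?<name>\w+)\s+(?<age>\d+)/) do
  m = Regexp.last_match  # MatchData for the match currently being yielded
  people << { name: m[:name], age: m[:age].to_i }
end

people.map { |person| person[:name] }  # => ["John", "Jane", "Bob"]
people.sum { |person| person[:age] }   # => 95
```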

gsub and sub


# sub replaces the first match, gsub replaces all matches
"hello world".sub(/\w+/, "hi")    # => "hi world"
"hello world".gsub(/\w+/, "hi")   # => "hi hi"

# Using back-references in the replacement string
"John Smith".gsub(/(\w+)\s(\w+)/, '\2, \1')  # => "Smith, John"

# Using a block for complex replacements
"prices: $10 and $20".gsub(/\$(\d+)/) do |match|
  "$#{$1.to_i * 2}"
end
# => "prices: $20 and $40"

# Using a hash for substitution
"cat and dog".gsub(/cat|dog/, "cat" => "feline", "dog" => "canine")
# => "feline and canine"
      
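Named groups also work in replacement strings, via \k<name>. Quoting matters here. In a double-quoted Ruby string, "\1" is an octal escape for the byte 0x01, not a back-reference, so the backslash itself must be doubled:

```ruby
name = "John Smith"

# \k<name> refers to a named group inside a single-quoted replacement string
name.sub(/(?<first>\w+)\s(?<last>\w+)/, '\k<last>, \k<first>')  # => "Smith, John"

# Double-quoted replacement: escape the backslashes
name.sub(/(\w+)\s(\w+)/, "\\2, \\1")  # => "Smith, John"

# Forgetting to escape inserts control characters instead of the captures
name.sub(/(\w+)\s(\w+)/, "\2, \1")    # => "\u0002, \u0001"
```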

split


# Split on a regex pattern
"one,two,,three".split(/,/)     # => ["one", "two", "", "three"]
"one  two   three".split(/\s+/) # => ["one", "two", "three"]

# Split with a capture group keeps the delimiter
"one-two=three".split(/([-=])/) # => ["one", "-", "two", "=", "three"]
      
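split also takes an optional limit. By default, trailing empty strings are dropped. A negative limit keeps them, and a positive limit caps the number of pieces:

```ruby
"a,b,,".split(/,/)              # => ["a", "b"]
"a,b,,".split(/,/, -1)          # => ["a", "b", "", ""]
"key=value=more".split(/=/, 2)  # => ["key", "value=more"]
```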

grep


# grep filters an array using a regex
words = ["apple", "banana", "apricot", "cherry", "avocado"]
words.grep(/^a/)   # => ["apple", "apricot", "avocado"]

# grep with a block transforms matched elements
words.grep(/^a/) { |w| w.upcase }
# => ["APPLE", "APRICOT", "AVOCADO"]

# grep_v returns non-matching elements (Ruby 2.3+)
words.grep_v(/^a/) # => ["banana", "cherry"]
      
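grep works because it calls Regexp#=== on each element, which is the same method a case/when statement uses. That makes regexes handy as case branches. classify below is a hypothetical helper:

```ruby
words = ["apple", "banana", "apricot", "cherry", "avocado"]

# grep(/^a/) keeps the elements for which /^a/ === element is true
words.grep(/^a/) == words.select { |w| /^a/ === w }  # => true

def classify(token)
  case token
  when /\A\d+\z/      then :integer
  when /\A\d+\.\d+\z/ then :float
  when /\A\w+\z/      then :word
  else :other
  end
end

%w[42 3.14 hello !!].map { |t| classify(t) }  # => [:integer, :float, :word, :other]
```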

Lookahead and Lookbehind Assertions

Lookahead and lookbehind are zero-width assertions. They check whether a pattern exists ahead of or behind the current position without consuming any characters. This is useful when you need to match something based on its context.


# Positive lookahead: (?=pattern) - what follows must match
"100px 200em 300px".scan(/\d+(?=px)/)     # => ["100", "300"]

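# Negative lookahead: (?!pattern) - what follows must NOT match
# \d++ is possessive: with a plain \d+ the engine would back off to "10"
# and let \w+ match "0px", so "100px" and "300px" would match too
"100px 200em 300px".scan(/\d++(?!px)\w+/)  # => ["200em"]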

# Positive lookbehind: (?<=pattern) - what precedes must match
"$100 EUR200 $300".scan(/(?<=\$)\d+/)     # => ["100", "300"]

# Negative lookbehind: (?<!pattern) - what precedes must NOT match
# [\$\d] also rejects a preceding digit; (?<!\$) alone would match "00" inside "$100"
"$100 EUR200 $300".scan(/(?<![\$\d])\d+/)  # => ["200"]
      

A practical example: extracting prices that are in dollars but not in euros:


prices = "Items cost $29.99, EUR15.00, and $49.99"
dollar_amounts = prices.scan(/(?<=\$)\d+\.\d{2}/)
# => ["29.99", "49.99"]
      
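Because a lookahead consumes nothing, it also works well inside gsub. A common sketch inserts thousands separators into a string of digits. It handles plain integers only (no sign or decimal point), and with_commas is a hypothetical name:

```ruby
def with_commas(digits)
  # Put "," after any digit that is followed by one or more complete groups of
  # three digits running to the end of the string
  digits.gsub(/(\d)(?=(?:\d{3})+\z)/, '\1,')
end

with_commas("1234567")  # => "1,234,567"
with_commas("1234")     # => "1,234"
with_commas("100")      # => "100"
```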

Multiline and Extended Mode (/m and /x Flags)

Ruby regex flags modify how the pattern is interpreted. The two most important are /m (multiline) and /x (extended).


# /m flag: makes . match newlines (called "multiline mode")
text = "line one\nline two\nline three"
text.match(/one.+three/)       # => nil (. doesn't match \n by default)
text.match(/one.+three/m)[0]   # => "one\nline two\nline three"

# /i flag: case-insensitive matching
"Hello" =~ /hello/i  # => 0

# /x flag: extended mode, allows comments and whitespace
pattern = /
  \A                # Start of string
  (?<year>\d{4})    # Four-digit year
  -                  # Literal dash
  (?<month>\d{2})   # Two-digit month
  -                  # Literal dash
  (?<day>\d{2})     # Two-digit day
  \z                # End of string
/x

"2026-03-29".match(pattern)[:year]  # => "2026"

# Combine flags
/pattern/mix  # multiline + case-insensitive + extended
      

Note: Ruby's /m flag is equivalent to /s (single-line or DOTALL) in most other regex flavors. In Ruby, ^ and $ always match at line boundaries (no separate multiline flag is needed for that behavior). Use \A and \z when you specifically need start/end of the entire string.
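Flags can also be limited to part of a pattern with inline modifiers such as (?i:...). Regexp.new accepts the same flags as constants:

```ruby
# Only "ruby" is case-insensitive; " rocks" is still case-sensitive
scoped = /(?i:ruby) rocks/
scoped.match?("RUBY rocks")  # => true
scoped.match?("RUBY ROCKS")  # => false

# Equivalent of /start.+end/mi for a pattern built at runtime
dotall = Regexp.new("start.+end", Regexp::MULTILINE | Regexp::IGNORECASE)
dotall.match?("START\nmiddle\nEND")  # => true
```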

Common Regex Patterns

Here is a Ruby regex cheat sheet of patterns for everyday validation and extraction tasks. The patterns are deliberately simplified; see the caveat that follows for their limitations.


# Email (simplified, covers most common cases)
EMAIL = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

"user@example.com".match?(EMAIL)   # => true
"bad@@email.com".match?(EMAIL)     # => false

# URL
URL = /\Ahttps?:\/\/[\S]+\z/

"https://example.com/path?q=1".match?(URL)  # => true

# Phone number (US format, flexible)
PHONE = /\A\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\z/

"555-123-4567".match?(PHONE)   # => true
"(555) 123-4567".match?(PHONE) # => true

# IPv4 address
IPV4 = /\A(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\z/

"192.168.1.1".match?(IPV4)   # => true
"999.999.999.999".match?(IPV4) # => false

# Date (YYYY-MM-DD)
DATE = /\A\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\z/

"2026-03-29".match?(DATE) # => true
"2026-13-01".match?(DATE) # => false
      

Caveat: Regex alone cannot fully validate emails (the RFC spec is notoriously complex), dates (leap years, days per month), or URLs. For production validation, pair regex with dedicated libraries or built-in Ruby methods like URI.parse or Date.parse.
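One way to pair the two for dates is to let the regex check the shape and then let Date.valid_date? from Ruby's standard date library apply the calendar rules. valid_date? and DATE_SHAPE are hypothetical names for this sketch:

```ruby
require "date"

# The DATE pattern from above: checks the shape of the string only
DATE_SHAPE = /\A\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])\z/

def valid_date?(str)
  return false unless DATE_SHAPE.match?(str)     # cheap structural check first
  year, month, day = str.split("-").map(&:to_i)
  Date.valid_date?(year, month, day)             # month lengths and leap years
end

valid_date?("2026-02-28")  # => true
valid_date?("2026-02-30")  # => false (the regex alone accepts this)
valid_date?("2024-02-29")  # => true  (leap year)
valid_date?("2026-13-01")  # => false (rejected by the regex)
```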

Practical Examples: Parsing Logs, Extracting Data, and Validation

Parsing Log Files

Regex is the go-to tool for extracting structured data from log files. Here is an example that parses a common Apache/Nginx log format.


LOG_PATTERN = /
  (?<ip>\S+)\s+          # Client IP
  \S+\s+                   # Ident (usually -)
  \S+\s+                   # Auth user (usually -)
  \[(?<time>[^\]]+)\]\s+  # Timestamp in brackets
  "(?<method>\w+)\s+       # HTTP method
   (?<path>\S+)\s+         # Request path
   (?<protocol>[^"]+)"\s+ # Protocol
  (?<status>\d{3})\s+     # Status code
  (?<bytes>\d+|-)\s*       # Response size
/x

log_line = '192.168.1.1 - - [29/Mar/2026:10:15:32 +0000] "GET /api/users HTTP/1.1" 200 1234'

if m = LOG_PATTERN.match(log_line)
  puts m[:ip]      # prints 192.168.1.1
  puts m[:method]  # prints GET
  puts m[:path]    # prints /api/users
  puts m[:status]  # prints 200
end
      
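For a whole file, you typically match line by line and skip lines that do not fit. This sketch uses a compact single-line variant of LOG_PATTERN and a few invented log lines, one of them deliberately malformed. filter_map requires Ruby 2.7+:

```ruby
# Compact variant of LOG_PATTERN (same fields, literal single spaces)
LOG_LINE = /\A(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\w+) (?<path>\S+) [^"]+" (?<status>\d{3}) (?<bytes>\d+|-)/

# Sample lines invented for illustration
log = <<~LOG
  192.168.1.1 - - [29/Mar/2026:10:15:32 +0000] "GET /api/users HTTP/1.1" 200 1234
  10.0.0.7 - - [29/Mar/2026:10:15:40 +0000] "POST /api/users HTTP/1.1" 201 87
  not a log line
  10.0.0.7 - - [29/Mar/2026:10:16:02 +0000] "GET /missing HTTP/1.1" 404 -
LOG

# filter_map drops the lines where match returns nil
entries = log.each_line.filter_map { |line| LOG_LINE.match(line) }

entries.size                   # => 3
entries.map { |m| m[:status] } # => ["200", "201", "404"]
entries.map { |m| m[:bytes] }  # => ["1234", "87", "-"]
```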

Extracting Data from Structured Text


# Extract all key-value pairs from a config-style string
config = "host=localhost port=5432 dbname=myapp user=admin"
pairs = config.scan(/(?<key>\w+)=(?<value>\S+)/)
# => [["host", "localhost"], ["port", "5432"],
#     ["dbname", "myapp"], ["user", "admin"]]

hash = pairs.to_h
# => {"host"=>"localhost", "port"=>"5432",
#     "dbname"=>"myapp", "user"=>"admin"}

# Extract version numbers from a changelog
changelog = "v2.3.1 - Bug fixes\nv2.4.0 - New features\nv3.0.0 - Major release"
versions = changelog.scan(/v(\d+\.\d+\.\d+)/).flatten
# => ["2.3.1", "2.4.0", "3.0.0"]
      

Input Validation


# Username: 3-20 alphanumeric characters, underscores, hyphens
def valid_username?(name)
  /\A[a-zA-Z0-9_-]{3,20}\z/.match?(name)
end

valid_username?("ruby_dev")    # => true
valid_username?("ab")          # => false (too short)
valid_username?("hello world") # => false (contains space)

# Strong password: at least 8 chars, one upper, one lower,
# one digit, one special character
def strong_password?(pw)
  return false if pw.length < 8
  /[A-Z]/.match?(pw) &&
    /[a-z]/.match?(pw) &&
    /\d/.match?(pw) &&
    /[^A-Za-z0-9]/.match?(pw)
end

strong_password?("Passw0rd!")  # => true
strong_password?("password")   # => false

# Hex color code
HEX_COLOR = /\A#(?:[0-9a-fA-F]{3}){1,2}\z/
"#fff".match?(HEX_COLOR)    # => true
"#1a2b3c".match?(HEX_COLOR) # => true
"#xyz".match?(HEX_COLOR)    # => false
      
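The same password rules can also be written as one pattern, with one lookahead per rule. This works because each (?=...) tests from the start of the string without consuming anything. It is offered as an alternative sketch, not a replacement: the multi-check strong_password? above is easier to read and extend.

```ruby
# Each lookahead checks one rule; .{8,} then enforces the minimum length
STRONG_PASSWORD = /\A(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}\z/

STRONG_PASSWORD.match?("Passw0rd!")  # => true
STRONG_PASSWORD.match?("password")   # => false (no uppercase, digit, or symbol)
STRONG_PASSWORD.match?("Pa0!")       # => false (too short)
```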

Performance Tips: Avoiding Catastrophic Backtracking

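A poorly written regex can cause catastrophic backtracking, where the engine takes exponential time to determine that a string does not match. This can freeze your application or open it to ReDoS (Regular Expression Denial of Service) attacks. Ruby 3.2 added a memoization-based optimization that makes many such patterns match in linear time. However, it does not apply to every pattern (back-references and some look-around constructs fall outside it), and older Ruby versions lack it entirely, so the advice below still matters.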


# BAD: nested quantifiers cause catastrophic backtracking
# This can hang on strings like "aaaaaaaaaaaaaaaaaaaaaaX"
bad_pattern = /^(a+)+$/

# GOOD: flatten the quantifiers
good_pattern = /^a+$/

# BAD: overlapping alternatives
bad_pattern = /^(.*,)*$/

# GOOD: [^,\n] can never match a comma, so each repetition has only one way to match
good_pattern = /^([^,\n]*,)*$/
      

Follow these rules to keep your regex fast:

  • Avoid nested quantifiers like (a+)+ or (a*)*. Flatten them to a+.
  • Be specific with character classes. Use [^,]+ instead of .+ when you know the delimiter.
  • Use possessive quantifiers (a++) or atomic groups ((?>a+)) when backtracking is not needed.
  • Use match? instead of =~ or match when you only need a boolean.
  • Anchor your patterns with \A and \z to prevent unnecessary scanning.
  • Set a timeout in Ruby 3.2+ using Regexp.timeout= to protect against runaway patterns.

# Ruby 3.2+: set a global timeout for all regex operations
Regexp.timeout = 1.0  # seconds

# Or set a timeout on a specific pattern
pattern = Regexp.new("(a+)+$", timeout: 0.5)

# Possessive quantifier: prevents backtracking
/\A\w++\z/.match?("hello")  # => true, and fast even on long strings

# Atomic group: same effect
/\A(?>\w+)\z/.match?("hello")  # => true
      
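When a timeout fires, Ruby 3.2+ raises Regexp::TimeoutError, so calling code has to decide what that means. This sketch treats a timeout as "no match", which is a policy choice you may want to change, for example to log or re-raise. match_with_timeout? is a hypothetical helper, and the code assumes Ruby 3.2+:

```ruby
# Assumes Ruby 3.2+ (Regexp::TimeoutError and the timeout: keyword)
def match_with_timeout?(pattern, input)
  pattern.match?(input)
rescue Regexp::TimeoutError
  false # policy: a pattern that runs too long counts as "no match"
end

guarded = Regexp.new('\A[a-z]+\z', timeout: 0.5)
match_with_timeout?(guarded, "hello")  # => true
match_with_timeout?(guarded, "Hello")  # => false
```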

Quick Reference: Ruby Regex Cheat Sheet

Syntax              Meaning
/pattern/           Regex literal
=~                  Match operator (returns index or nil)
.match(str)         Returns MatchData or nil
.match?(str)        Returns true/false (fast, no MatchData allocated)
\d \w \s            Digit, word char, whitespace
[abc] [^abc]        Character class, negated class
* + ? {n,m}         Quantifiers (greedy)
*? +? ??            Lazy quantifiers
*+ ++ ?+            Possessive quantifiers
^ $ \A \z           Anchors (line/string start/end)
\b                  Word boundary
(pattern)           Capture group
(?<name>pat)        Named capture
(?:pattern)         Non-capturing group
(?=pat) (?!pat)     Positive/negative lookahead
(?<=pat) (?<!pat)   Positive/negative lookbehind
/i /m /x            Case-insensitive, dot-all, extended
(?>pattern)         Atomic group (no backtracking)

Conclusion

Ruby's regex support is among the best of any programming language. With the Regexp class, powerful string methods like scan, gsub, and split, and features like named captures and lookahead assertions, you can handle virtually any text processing task. Start with simple patterns using match? for validation, graduate to named captures for data extraction, and always keep performance in mind by avoiding nested quantifiers and using possessive quantifiers where appropriate.

For further reading, consult the official Ruby Regexp documentation and experiment with patterns in irb or pry to build your fluency with Ruby regular expressions.