mkaz.blog

Working with Python

Regular Expressions

Regular expressions in Python use the standard library module re.

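Use raw strings (r"...") for regex patterns so that backslashes reach the regex engine unchanged instead of being interpreted as Python string escapes. Python 3.12 and later issue a SyntaxWarning for an invalid escape sequence such as "\d" in a regular string literal; versions 3.6 to 3.11 emitted a DeprecationWarning instead.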
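A small sketch of what the raw-string prefix changes. In a raw string the backslash is kept as a literal character. The case that bites silently is "\b", which is a valid Python escape (backspace), so no warning is raised but the pattern no longer means "word boundary":

```python
import re

# In a raw string the backslash stays literal, so these two are identical.
raw = r"\d+"
escaped = "\\d+"
print(raw == escaped)  # True
print(len(raw))        # 3: backslash, "d", "+"

# "\b" in a normal string is the backspace character, not a word boundary,
# so this pattern looks for backspace characters around "cat" and fails.
print(re.search("\bcat\b", "a cat sat"))   # None

# With a raw string, \b reaches the regex engine as the word-boundary token.
print(re.search(r"\bcat\b", "a cat sat"))  # <re.Match object; span=(2, 5), match='cat'>
```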

Basic regular expression matching

Typically I want to use re.search instead of re.match. The difference is that re.search scans the entire string for a match anywhere, while re.match only matches at the beginning of the string.

import re
 
s = "There are 13 dogs outside."
m = re.match(r"(\d+)", s)
if m is None:
    print("No match")
 
m = re.search(r"(\d+)", s)
if m is not None:
    print(f"Match found: {m.group(1)} dogs")
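If you do want the pattern to match the whole string, Python has a separate function for that, re.fullmatch. A side-by-side sketch of all three on the same input:

```python
import re

s = "There are 13 dogs outside."

# re.match only tries at position 0, so it fails here: s starts with "T".
at_start = re.match(r"\d+", s)

# re.search scans left to right and returns the first match anywhere.
anywhere = re.search(r"\d+", s)

# re.fullmatch requires the pattern to cover the entire string.
whole = re.fullmatch(r"\d+", "13")
partial = re.fullmatch(r"\d+", "13 dogs")

print(at_start)          # None
print(anywhere.group())  # 13
print(whole.group())     # 13
print(partial)           # None, " dogs" is left over
```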

Both .match and .search return a Match object, or None if no match is found. On a Match object, group(0) is the entire portion of the string that matched, and group(1), group(2), and so on hold the text captured by each set of parentheses, in order.

import re
 
s = "There are 13 dogs outside."
# .*? lazily skips the leading text, since re.match is anchored at the start
m = re.match(r".*?(\d+)\s(\w+)", s)
if m is None:
    print("No match")
else:
    print(f"Matched String: {m.group(0)}")
    print(f"First Paren: {m.group(1)}")
    print(f"Second Paren: {m.group(2)}")
 
m = re.search(r"(\d+)\s(\w+)", s)
if m is not None:
    print(f"Matched String: {m.group(0)}")
    print(f"First Paren: {m.group(1)}")
    print(f"Second Paren: {m.group(2)}")
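Besides group(n), a Match object has groups(), which returns all the parenthesized captures as a tuple, and group() accepts several group numbers at once. A short sketch on the same string:

```python
import re

s = "There are 13 dogs outside."
m = re.search(r"(\d+)\s(\w+)", s)
if m is not None:
    # group() with no argument is the same as group(0): the whole match
    print(m.group())            # 13 dogs
    # groups() returns every capture group, in order, as a tuple
    print(m.groups())           # ('13', 'dogs')
    # several group numbers at once also return a tuple
    count, animal = m.group(1, 2)
    print(f"{count} {animal}")  # 13 dogs
    # start and end positions of the whole match within s
    print(m.span())             # (10, 17)
```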

Regular Expression Substitution

Use the re.sub() function to replace text based on a regular expression.

import re
 
s = "There are 13 dogs by 4 windows."
s = re.sub(r"\d+", "2", s)
print(s)
# There are 2 dogs by 2 windows.
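By default re.sub replaces every non-overlapping match. It also takes a count argument to limit the number of replacements, and the replacement can refer back to capture groups (\1) or be a function that receives each Match object. A sketch of all three:

```python
import re

s = "There are 13 dogs by 4 windows."

# count=1 stops after the first replacement
first_only = re.sub(r"\d+", "2", s, count=1)
print(first_only)  # There are 2 dogs by 4 windows.

# \1 in the (raw) replacement string refers to the first capture group
bracketed = re.sub(r"(\d+)", r"[\1]", s)
print(bracketed)   # There are [13] dogs by [4] windows.

# a function receives each Match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), s)
print(doubled)     # There are 26 dogs by 8 windows.
```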