
Python Regex

RegEx Module

Python has a built-in module called re, which can be used to work with regular expressions.

Import the re module:

import re

RegEx Functions

The re module offers a set of functions that allow us to search a string for a match:

Function   Description
findall    Returns a list containing all matches
split      Returns a list where the string has been split at each match
search     Returns a Match object if there is a match anywhere in the string
sub        Replaces one or many matches with a string

The findall() Function

The findall() function returns a list containing all matches.

Example

Print a list of all matches:

import re

txt = "We are the 'Vikings' from the north."
x = re.findall("the", txt)
print(x)
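
The example above covers the case where matches exist. As a minimal sketch of the other case (the variable names here are illustrative), findall() returns an empty list, not None, when nothing matches:

```python
import re

txt = "We are the 'Vikings' from the north."

# Two occurrences of "the" are found, in order of appearance.
matches = re.findall("the", txt)
print(matches)

# "south" never occurs, so the result is an empty list.
no_matches = re.findall("south", txt)
print(no_matches)
```

Because the result is always a list, it can be passed to len() or used in a for loop without a None check.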

The search() Function

The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:

Example

Search for the position of the first match in the string:

import re

txt = "We are the 'Vikings' from the north."
x = re.search("'V", txt)
print("The position of the search string is:", x.start())
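
When there is no match, search() returns None, so calling .start() on the result raises an AttributeError. A minimal sketch of guarding against that, using a hypothetical helper named find_position (not part of the re module):

```python
import re

txt = "We are the 'Vikings' from the north."

def find_position(pattern, text):
    """Hypothetical helper: start index of the first match, or None if there is none."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.start()

first_the = find_position("the", txt)   # first of the two occurrences
missing = find_position("south", txt)   # no match at all
print(first_the, missing)
```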

The split() Function

The split() function returns a list where the string has been split at each match:

Example

Split at each white-space character:

import re

txt = "We are the 'Vikings' from the north."
x = re.split(r"\s", txt)
print(x)
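
split() also accepts an optional maxsplit argument that limits how many splits are made. A short sketch, reusing the same sample text:

```python
import re

txt = "We are the 'Vikings' from the north."

# Split at the first two white-space characters only; the remainder stays in one piece.
parts = re.split(r"\s", txt, maxsplit=2)
print(parts)
```

The pattern is written as a raw string (r"\s") so that Python passes the backslash to the regex engine unchanged.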

The sub() Function

The sub() function replaces the matches with the text of your choice:

Example

Replace every white-space character with “_”:

import re

txt = "We are the 'Vikings' from the north."
x = re.sub(r"\s", "_", txt)
print(x)
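
sub() takes an optional count argument that caps the number of replacements. A brief sketch with the same sample text:

```python
import re

txt = "We are the 'Vikings' from the north."

# Replace only the first two white-space characters; later ones are left alone.
limited = re.sub(r"\s", "_", txt, count=2)
print(limited)
```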

Metacharacters

Metacharacters are characters with a special meaning:

Character  Description
[]         A set of characters
\          Signals a special sequence (can also be used to escape special characters)
.          Any character (except the newline character)
^          Starts with
$          Ends with
*          Zero or more occurrences
+          One or more occurrences
{}         Exactly the specified number of occurrences
|          Either or
()         Capture and group
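
To make the metacharacters listed above concrete, here is a small sketch that uses most of them on one sample string. The sample text and variable names are made up for illustration:

```python
import re

txt = "hello world"

# ^ anchors at the start, . matches any character, * repeats it, $ anchors at the end.
starts_and_ends = re.search(r"^hello.*world$", txt)

# [] defines a set and + means one or more: runs made up of the letters l and o.
runs = re.findall(r"[lo]+", txt)

# | means "either or": match whichever alternative appears.
either = re.findall(r"hello|world", txt)

# {} gives an exact count: exactly two consecutive "l" characters.
double_l = re.findall(r"l{2}", txt)

# () captures groups that can be read back from the Match object.
grouped = re.search(r"(\w+) (\w+)", txt)

print(runs, either, double_l, grouped.group(1), grouped.group(2))
```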

Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character  Description
\A         Returns a match if the specified characters are at the beginning of the string
\b         Returns a match where the specified characters are at the beginning or at the end of a word
\B         Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word
\d         Returns a match where the string contains digits (numbers from 0-9)
\D         Returns a match where the string contains a character that is NOT a digit
\s         Returns a match where the string contains a white-space character
\S         Returns a match where the string contains a character that is NOT white space
\w         Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)
\W         Returns a match where the string contains a character that is NOT a word character
\Z         Returns a match if the specified characters are at the end of the string
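
A short sketch of a few of these special sequences in use. The sample sentence is invented for illustration. The patterns are written as raw strings (r"..."), because in a normal Python string "\b" means a backspace character and not a word boundary:

```python
import re

txt = "Room 42 opens at 9am"

digits = re.findall(r"\d+", txt)           # runs of digits
words = re.findall(r"\w+", txt)            # runs of word characters
starts_with_o = re.findall(r"\bo\w*", txt) # words that begin with "o"
at_start = re.search(r"\ARoom", txt)       # matches only at the very beginning
at_end = re.search(r"am\Z", txt)           # matches only at the very end

print(digits, words, starts_with_o)
```

Note that the "o" characters inside "Room" are not matched by \bo\w*, since they do not sit at a word boundary.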

Sets

A set is a set of characters inside a pair of square brackets [] with a special meaning:

Set         Description
[arn]       Returns a match where one of the specified characters (a, r, or n) is present
[a-n]       Returns a match for any lower case character, alphabetically between a and n
[^arn]      Returns a match for any character EXCEPT a, r, and n
[a-zA-Z]    Returns a match for any character alphabetically between a and z, lower case OR upper case
[+]         In sets, +, *, ., |, (), $, and {} have no special meaning, so [+] means: return a match for any + character in the string
[0123]      Returns a match where any of the specified digits (0, 1, 2, or 3) is present
[0-9]       Returns a match for any digit between 0 and 9
[0-5][0-9]  Returns a match for any two-digit number from 00 to 59
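
The sets above can be tried out with findall(). This is a minimal sketch; the input strings are made up for illustration:

```python
import re

# [a-n]: every letter of "banana" falls between a and n.
a_to_n = re.findall(r"[a-n]", "banana")

# [^arn]: only characters other than a, r, and n are returned.
not_arn = re.findall(r"[^arn]", "banana")

# [0-5][0-9]: two-digit numbers from 00 to 59; "99" does not qualify.
minutes = re.findall(r"[0-5][0-9]", "08:45 and 99")

# [+]: inside a set, + is a literal plus sign.
plus_signs = re.findall(r"[+]", "1 + 2 + 3")

print(a_to_n, not_arn, minutes, plus_signs)
```
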
This post is licensed under CC BY 4.0 by the author.