10+ basic examples to learn Python RegEx from scratch

In this Python tutorial we will concentrate on regular expressions (regex). A regular expression is a pattern you specify, using special characters to represent combinations of characters, digits, and words.

I will cover the basics of regular expressions in the first part of this tutorial. If you are already familiar with the basics, you can jump directly to the Python regex section of this tutorial.

 

Introduction to regular expressions

A regular expression can be as simple as a series of characters that match a given word. For example, the following pattern matches the word “hat”; no surprise there.

hat

But what if you wanted to match a larger set of words? For example, let’s say you wanted to match the following combination of letters:

  • Match an “h” character.
  • Match any number of “a” characters, but at least one.
  • Match a “t” character.

Here’s the regular expression that implements these criteria:

ha+t

Here,

  • Literal characters, such as “h” and “t” in this example, must be matched exactly.
  • The plus sign (+) is a special character.
  • It does not cause the regular-expression processor to look for a plus sign. Instead, it forms a subexpression together with “a” that says, “Match one or more ‘a’ characters.”

The pattern ha+t therefore matches any of the following:

hat
haat
haaat
haaaat
..
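The list above can be checked with a few lines of Python. This is only a minimal sketch: the word list and variable names are illustrative assumptions, and re.fullmatch is used so that the whole word has to fit the pattern.

```python
import re

pattern = re.compile(r"ha+t")

# Illustrative words: the first three fit ha+t, the last two do not
candidates = ["hat", "haat", "haaaat", "ht", "hot"]

# fullmatch succeeds only when the entire word fits the pattern
results = {word: bool(pattern.fullmatch(word)) for word in candidates}
print(results)
# {'hat': True, 'haat': True, 'haaaat': True, 'ht': False, 'hot': False}
```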

This was just an overview of regular expressions. Next we will look at the different metacharacters that we can use with Python regex.

 

Different meta characters

These are tools for specifying either a specific character or one of a number of characters, such as “any digit” or “any alphanumeric character.” Each of these characters matches one character at a time.

Each entry below gives the character, its name, and a description:

  • . (dot or period): Matches any one character except a newline. If the DOTALL flag is enabled, it matches any character at all.
  • ^ (caret): Matches the beginning of the string. If the MULTILINE flag is enabled, it also matches at the beginning of each line (immediately after a newline).
  • $ (dollar): Matches the end of the string, or just before a newline at the very end of the string. If the MULTILINE flag is enabled, it also matches at the end of each line (just before a newline).
  • [] (square brackets): A set of characters you wish to match.
  • \ (backslash): Used to escape various characters. One of the functions of escape sequences is to turn a special character back into a literal character.
  • expr* (star): Modifies the meaning of expression expr so that it matches zero or more occurrences rather than one. For example, a* matches “a”, “aa”, and “aaa”, as well as an empty string.
  • expr+ (plus): Modifies the meaning of expression expr so that it matches one or more occurrences rather than only one. For example, a+ matches “a”, “aa”, and “aaa”.
  • expr{n} (curly braces): Modifies the expression so that it matches exactly n occurrences of expr. For example, a{3} matches “aaa”.
  • expr{m,n}: Matches a minimum of m occurrences of expr and a maximum of n. For example, x{2,4}y matches “xxy”, “xxxy”, and “xxxxy”.
  • expr{m,}: Matches a minimum of m occurrences of expr with no upper limit to how many can be matched. For example, x{3,} finds a match if it can match the pattern “xxx” anywhere, but it will match more than three if it can. Therefore zx{3,}y matches “zxxxxxy”.
  • expr{,n}: Matches a minimum of zero, and a maximum of n, instances of the expression expr. For example, ca{,2}t matches “ct”, “cat”, and “caat” but not “caaat”.
  • expr1|expr2 (alternation): Matches a single occurrence of expr1, or a single occurrence of expr2, but not both. For example, a|b matches “a” or “b”. Note that the precedence of this operator is very low, so cat|dog matches “cat” or “dog”.
  • () (parentheses): Used to capture and group sub-patterns.
  • expr? (question mark): Modifies the meaning of expression expr so that it matches zero or one occurrence of expr. For example, a? matches “a” or an empty string.

 

. - dot (period)

A dot . matches any one character except a newline character.

Pattern: c..f

What does the pattern mean?

  • c matches the character c literally (case sensitive)
  • . matches any character (except for line terminators)
  • . matches any character (except for line terminators)
  • f matches the character f literally (case sensitive)

Test strings:

  • cdef: YES. Exactly two characters between c and f.
  • c12f: YES. Exactly two characters between c and f.
  • abcdefgh: YES. The characters before c and after f do not matter. There are exactly two characters between c and f.
  • conef: NO. More than two characters between c and f.
  • c1f: NO. Fewer than two characters between c and f.
  • Cdef: NO. C is uppercase while the pattern contains lowercase c.
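The dot examples can be reproduced with re.search, which looks for the pattern anywhere in the string. This is a small sketch; the variable names are assumptions made for illustration.

```python
import re

pattern = r"c..f"
samples = ["cdef", "c12f", "abcdefgh", "conef", "c1f", "Cdef"]

# re.search returns a match object, or None when nothing matches
dot_results = {s: bool(re.search(pattern, s)) for s in samples}
for text, matched in dot_results.items():
    print(text, "YES" if matched else "NO")
```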

 

^ - caret

A caret sign ^ matches the beginning of the string.

Pattern: ^a

What does the pattern mean?

  • ^ asserts position at the start of the string
  • a matches the character a literally (case sensitive)

Test strings:

  • apple: YES. The first character a of apple matches the pattern.
  • abcd: YES. The first character a of abcd matches the pattern.
  • Abcd: NO. The first character of Abcd is uppercase A while the pattern has lowercase a.
  • Cat: NO. The first character of Cat is C, which does not match the pattern.

Pattern: ^en

What does the pattern mean?

  • ^ asserts position at the start of the string
  • en matches the characters en literally (case sensitive)

Test strings:

  • end: YES. The first two characters of end match the pattern en.
  • east: NO. The first two characters of east do not match the pattern en.
  • bend: NO. The first two characters of bend do not match the pattern en.
  • End: NO. The pattern expects the first two characters to be en in lowercase while the string has uppercase E.

 

$ - dollar

The dollar symbol $ is used to check if a string ends with provided expression.

Pattern: e$

What does the pattern mean?

  • e matches the character e literally (case sensitive)
  • $ asserts position at the end of the string

Test strings:

  • apple: YES. The last character of apple matches the pattern.
  • ABCDE: NO. The last character of ABCDE is uppercase while the pattern expects lowercase e at the end.
  • e: YES. The string contains the single character e, which is both the first and the last character, so it is a match.

Pattern: ee$

What does the pattern mean?

  • ee matches the characters ee literally (case sensitive)
  • $ asserts position at the end of the string

Test strings:

  • tree: YES. The last two characters of tree match the pattern ee.
  • eye: NO. The pattern expects ee at the end of the string, but eye ends with a single e, so the match fails.
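Both anchors can be tried out with re.search. The sketch below reuses the test strings from the caret and dollar sections; the dictionary names are assumptions for illustration.

```python
import re

# ^en has to match at the start of the string
caret_results = {w: bool(re.search(r"^en", w)) for w in ["end", "east", "bend", "End"]}

# ee$ has to match at the end of the string
dollar_results = {w: bool(re.search(r"ee$", w)) for w in ["tree", "eye"]}

print(caret_results)   # {'end': True, 'east': False, 'bend': False, 'End': False}
print(dollar_results)  # {'tree': True, 'eye': False}
```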

 

[] - square brackets

You can add a set of characters inside square brackets which you wish to match.

Pattern: BL460C_G[789]_DISK

What does the pattern mean?

  • BL460C_G matches the characters BL460C_G literally (case sensitive)
  • [789] matches a single character present in the list 789
  • _DISK matches the characters _DISK literally (case sensitive)

Test strings:

  • BL460C_G9_DISK: YES. 9 is a matching character in the list [789], and the other characters match literally.
  • BL460C_G8_DISK: YES. 8 is a matching character in the list [789], and the other characters match literally.
  • BL460C_G5_DISK: NO. 5 is not part of the list [789], so there is no match even though the rest of the characters match.
  • BL460C_G78_DISK: NO. Both 7 and 8 are in the list [789], but the set matches only a single character and here there are two, so it is not a match.
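A short sketch of the same character-set test in Python follows. The list name disk_names and the result dictionary are illustrative assumptions.

```python
import re

pattern = r"BL460C_G[789]_DISK"
disk_names = ["BL460C_G9_DISK", "BL460C_G8_DISK", "BL460C_G5_DISK", "BL460C_G78_DISK"]

# [789] consumes exactly one character, so G78 cannot match
set_results = {name: bool(re.search(pattern, name)) for name in disk_names}
for name, matched in set_results.items():
    print(name, "YES" if matched else "NO")
```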

 

\ - backslash

The backslash can be used to “escape” special characters, making them into literal characters. The backslash can also add special meaning to certain ordinary characters, for example causing \d to mean “any digit” rather than a “d”. We will learn about these special sequences later in this tutorial.

Pattern: p@$$w0rd

What does the pattern mean?

  • p@ matches the characters p@ literally (case sensitive)
  • $ asserts position at the end of the string
  • $ asserts position at the end of the string
  • w0rd matches the characters w0rd literally (case sensitive)

Test strings:

  • p@$$w0rd: NO. In the pattern, $ means end of the string, so it cannot match the literal $$ in p@$$w0rd.

Pattern: p@\$\$w0rd

What does the pattern mean?

  • p@ matches the characters p@ literally (case sensitive)
  • \$ matches the character $ literally
  • \$ matches the character $ literally
  • w0rd matches the characters w0rd literally (case sensitive)

Test strings:

  • p@$$w0rd: YES. Because each $ is escaped with a backslash, it is treated as a literal character instead of a metacharacter.
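The effect of escaping can be seen directly in Python. This is a minimal sketch with assumed variable names; it also shows re.escape, a standard-library helper that adds the backslashes for you when the text to match is not known in advance.

```python
import re

password = "p@$$w0rd"

# Unescaped: each $ is an end-of-string anchor, so nothing matches
unescaped = re.search(r"p@$$w0rd", password)

# Escaped by hand: each \$ matches a literal dollar sign
escaped = re.search(r"p@\$\$w0rd", password)

# re.escape builds an escaped pattern from arbitrary text
auto_pattern = re.escape(password)
auto_match = re.search(auto_pattern, password)

print(unescaped)           # None
print(escaped.group())     # p@$$w0rd
print(auto_match.group())  # p@$$w0rd
```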

* - Wild character (Star)

The asterisk (*) modifies the meaning of the expression immediately preceding it, so the a, together with the *, matches zero or more “a” characters.

Pattern: ca*t

What does the pattern mean?

  • c matches the character c literally (case sensitive)
  • a* matches the character a literally (case sensitive)
  • * is a quantifier that matches between zero and unlimited times, as many times as possible, giving back as needed
  • t matches the character t literally (case sensitive)

Test strings:

  • cat: YES. a is present once (zero or more times is allowed) and is followed by t.
  • ct: YES. a can be present zero or more times, so this is also a match.
  • caat: YES. a can be present zero or more times; here a is present twice followed by t, so this is a match.
  • caaats: YES. a can be present zero or more times; here a is present three times followed by t, so this is a match.
  • castle: NO. a is present, but it is not followed by t, so this is not a match.
  • cart: NO. a is present, but it is not followed by t, so this is not a match.

 

+ - plus

The plus (+) sign matches one or more occurrences of the preceding expression.

Pattern: ca+t

What does the pattern mean?

  • c matches the character c literally (case sensitive)
  • a+ matches the character a literally (case sensitive)
  • + is a quantifier that matches between one and unlimited times, as many times as possible, giving back as needed
  • t matches the character t literally (case sensitive)

Test strings:

  • cat: YES. a is present once (one or more times is required) and is followed by t.
  • ct: NO. a must be present one or more times in the string, so this is not a match.
  • caat: YES. a can be present one or more times; here a is present twice followed by t, so this is a match.
  • caaats: YES. a can be present one or more times; here a is present three times followed by t, so this is a match.
  • castle: NO. a is present one or more times, but it is not followed by t, so this is not a match.
  • cart: NO. a is present one or more times, but it is not followed by t, so this is not a match.

 

? - question mark

The question mark means that the preceding expression can be present zero or one times only. This is helpful when a certain character may or may not be present in a string.

Pattern: cas?t

What does the pattern mean?

  • ca matches the characters ca literally (case sensitive)
  • s? matches the character s literally (case sensitive)
  • ? is a quantifier that matches between zero and one times
  • t matches the character t literally (case sensitive)

Test strings:

  • cat: YES. s is present zero times, so it is a match.
  • cast: YES. s is present one time, so it is a match.
  • casst: NO. s is present more than one time, so it is not a match.
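The three quantifiers *, + and ? can be compared side by side. The sketch below filters the test strings from the last three sections with re.search; the list names are assumptions for illustration.

```python
import re

words = ["ct", "cat", "caat", "caaats", "castle", "cart"]

# * allows zero or more "a" characters between c and t
star_matches = [w for w in words if re.search(r"ca*t", w)]

# + requires at least one "a" between c and t
plus_matches = [w for w in words if re.search(r"ca+t", w)]

# ? allows zero or one "s" between ca and t
optional_matches = [w for w in ["cat", "cast", "casst"] if re.search(r"cas?t", w)]

print(star_matches)      # ['ct', 'cat', 'caat', 'caaats']
print(plus_matches)      # ['cat', 'caat', 'caaats']
print(optional_matches)  # ['cat', 'cast']
```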

 

{} - curly braces

Curly braces specify how many times the preceding expression must occur. {n} matches exactly n occurrences, while {n,m} matches a minimum of n and a maximum of m occurrences.

Pattern: al{2}

What does the pattern mean?

  • a matches the character a literally (case sensitive)
  • l{2} matches the character l literally (case sensitive)
  • {2} is a quantifier that matches exactly 2 times

Test strings:

  • call: YES. The a character is followed by l two times, as expected by the pattern.
  • tale: NO. The a character is followed by l, but l is present only once while {2} requires two consecutive l characters.
  • falls: YES. The a character is followed by l, and l is present 2 times, so it is a match.
  • troll: NO. l is present two times as expected by {2}, but there is no a character before the l, so it is not a match.
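A brief sketch of the counted quantifiers follows. The first part reuses the al{2} test strings; the second part checks the ca{,2}t example from the metacharacter overview with re.fullmatch. All variable names are illustrative assumptions.

```python
import re

words = ["call", "tale", "falls", "troll"]

# al{2}: an "a" followed by exactly two "l" characters
double_l = [w for w in words if re.search(r"al{2}", w)]
print(double_l)  # ['call', 'falls']

# ca{,2}t: between zero and two "a" characters; fullmatch checks the whole word
up_to_two_a = {w: bool(re.fullmatch(r"ca{,2}t", w)) for w in ["ct", "cat", "caat", "caaat"]}
print(up_to_two_a)  # {'ct': True, 'cat': True, 'caat': True, 'caaat': False}
```

Note that the quantifier is written without spaces: a space inside the braces changes how the pattern is interpreted.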

 

| - alternation

The alternation operator matches a single occurrence of expr1, or a single occurrence of expr2, but not both.

Pattern: cat|dog

What does the pattern mean?

  • 1st alternative: cat matches the characters cat literally (case sensitive)
  • 2nd alternative: dog matches the characters dog literally (case sensitive)

Test strings:

  • cattle: YES. The pattern matches cat in the string cattle.
  • boggy: NO. As there is no cat or dog in the string boggy, there is no match.
  • doggy: YES. The pattern matches dog in the string doggy.
  • battle: NO. As there is no cat or dog in the string battle, there is no match.

 

() - parentheses (group)

Parentheses cause the regular-expression evaluator to look at all of expr as a single group. There are two major purposes for doing so. First, a quantifier applies to the expression immediately preceding it; but if that expression is a group, the entire group is referred to. For example, (ab)+ matches “ab”, “abab”, “ababab”, and so on. Second, the text matched by the group is captured, so it can be retrieved or referred to later.

Pattern: (ac)t

What does the pattern mean?

  • ac matches the characters ac literally (case sensitive)
  • t matches the character t literally (case sensitive)

Test strings:

  • fact: YES. The group ac is present followed by t, hence this is a match.
  • cat: NO. The characters of the group ac must appear in that order, so this is not a match.

Pattern: (a|c)t

What does the pattern mean?

  • 1st alternative: a matches the character a literally (case sensitive)
  • 2nd alternative: c matches the character c literally (case sensitive)
  • t matches the character t literally (case sensitive)

Test strings:

  • fact: YES. As we are using alternation inside the group, either a or c followed by t must be present, so it is a match.
  • cat: YES. Here too either a or c is followed by t, so this is also a match.
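Alternation and grouping can be explored together. This sketch reuses the test strings from the two sections above; the variable names are assumptions, and group(1) is used to show what the parentheses captured.

```python
import re

# Alternation: either "cat" or "dog" anywhere in the word
alt_matches = [w for w in ["cattle", "boggy", "doggy", "battle"] if re.search(r"cat|dog", w)]
print(alt_matches)  # ['cattle', 'doggy']

# Grouping: (ac) must appear as a unit before t
grouped = re.search(r"(ac)t", "fact")
print(grouped.group(), grouped.group(1))  # act ac
print(re.search(r"(ac)t", "cat"))         # None

# Alternation inside a group: a or c, followed by t
in_fact = re.search(r"(a|c)t", "fact")
in_cat = re.search(r"(a|c)t", "cat")
print(in_fact.group(), in_cat.group())    # ct at

# A quantifier after a group repeats the whole group
repeated = re.fullmatch(r"(ab)+", "ababab")
print(bool(repeated))                     # True
```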

 

Special Sequences

These are a set of predefined special sequences that can be used to capture different types of patterns in a string.

NOTE:

In all the examples, the "r" prefix makes sure that the string is treated as a "raw string", so that Python itself does not translate any of the characters; for example, it does not translate \n into a newline.

Each entry below gives the special sequence and its description:

  • \A: Matches only at the beginning of the string.
  • \b: Word boundary, which returns a match where the specified characters are at the beginning or at the end of a word. For example, r'at\b' matches 'cat' and 'at' but not 'cats'.
  • \B: Nonword boundary, which returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word. For example, r'at\B' matches 'bats' and 'atlanta' but not 'cat'.
  • \d: Any digit character. This includes the digit characters 0 through 9.
  • \D: Any character that is not a digit.
  • \s: Any whitespace character; may be a blank space or any of the following: \t, \n, \r, \f, or \v.
  • \S: Any character that is not whitespace, as defined just above.
  • \w: Matches any alphanumeric character (letter or digit) or an underscore (_).
  • \W: Matches any character that is not alphanumeric and not an underscore.
  • \Z: Matches only at the end of the string.

 

Examples

Here I have consolidated these special sequences with different examples to give you an overview of each one:

\A with pattern r'\Ais'

  • 'is this good?': YES. Since is is at the start of the string, it is a match.
  • 'I hope this is good': NO. Since is is not at the start of the string, there is no match.

\b with pattern r'\bPython'

  • 'Python is easy': YES. Python is at the beginning of a word, so it is a match.
  • 'How easy is Python': YES. Python is again at the beginning of a word. A word boundary applies to words and not to sentences, so it doesn't matter that Python is not at the start of the sentence.
  • 'Python2 is easy': YES. Python is at the beginning of the word Python2, so it is a match.
  • 'Download iPython': NO. Python is not at the beginning of the word iPython, so there is no match.

\b with pattern r'Python\b'

  • 'Python is easy': YES. Python is a complete word, so it is a match.
  • 'How easy is Python': YES. Again Python is a complete word in the sentence; its position in the sentence doesn't matter, so this is a match.
  • 'Python2 is easy': NO. Python must be at the end of a word, but here it is followed by 2, so this is not a match.
  • 'Download iPython': YES. The word iPython ends with Python, so this is a match.

\B with pattern r'\BPython' (\B asserts a position where \b does not match)

  • 'Python is easy': NO.
  • 'Python2 is easy': NO.
  • 'Download iPython': YES.

\B with pattern r'Python\B' (\B asserts a position where \b does not match)

  • 'Python is easy': NO.
  • 'Python2 is easy': YES.
  • 'Download iPython': NO.

\d with pattern r'\d'

  • 'Passw 0rd': YES. The string contains a digit between 0 and 9.
  • 'Password': NO. There is no digit in the provided string.

\D with pattern r'\D'

  • 'Passw0rd': YES. \D matches any character that's not a digit (equal to [^0-9]).
  • '12345': NO.

\s with pattern r'\s'

  • 'Hello World': YES. \s matches any whitespace character (equal to [\r\n\t\f\v ]).
  • 'HelloWorld': NO. There is no whitespace character.

\S with pattern r'\S'

  • 'Hello World': YES. \S is the opposite of \s.
  • '   ' (only spaces): NO.

\w with pattern r'\w'

  • 'Passw0rd_123': YES. \w matches any word character (equal to [a-zA-Z0-9_] for ASCII text).
  • '[{(<>)}]': NO.

\W with pattern r'\W'

  • 'Passw0rd_123': NO. \W is the opposite of \w.
  • '[{(<>)}]': YES.

\Z with pattern r'Python\Z'

  • 'I like Python': YES. In Python, \Z asserts position at the very end of the string.
  • 'Python is easy': NO.
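The word-boundary rows above are easy to mix up, so here is a small sketch that evaluates all four boundary patterns against the same sentences. The names sentences and boundary_table are assumptions for illustration. The last two lines compare \Z with $ on a string that ends in a newline.

```python
import re

sentences = ["Python is easy", "How easy is Python", "Python2 is easy", "Download iPython"]
patterns = [r"\bPython", r"Python\b", r"\BPython", r"Python\B"]

# One row of True/False values per pattern, in the order of the sentences
boundary_table = {p: [bool(re.search(p, s)) for s in sentences] for p in patterns}
for pattern, row in boundary_table.items():
    print(pattern, row)

# \Z matches only at the very end; $ also matches just before a final newline
text_with_newline = "I like Python\n"
z_match = re.search(r"Python\Z", text_with_newline)
dollar_match = re.search(r"Python$", text_with_newline)
print(z_match is None, dollar_match is not None)  # True True
```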

 

Python regex

  • Python's re module supports the metacharacters and special sequences covered in the earlier sections.
  • It also provides a function that corresponds to each method of a regular expression object (findall, match, search, split, sub, and subn) each with an additional first argument, a pattern string that the function implicitly compiles into a regular expression object.
  • It’s generally preferable to compile pattern strings into regular expression objects explicitly and call the regular expression object’s methods, but sometimes, for a one-off use of a regular expression pattern, calling functions of module re can be slightly handier.

 

Iterative searching with re.findall

One of the most common search tasks is to find all substrings matching a particular pattern. The syntax to use findall would be:

matches = re.findall(pattern, target_string, flags=0)

In this syntax, pattern is a regular-expression string or precompiled object, target_string is the string to be searched, and flags is optional. The return value of re.findall is a list of strings, each string containing one of the substrings found. These are returned in the order found. If the pattern contains capture groups, the list holds the text of the groups instead of the whole match.

 

Example-1: Find all the digits in a string

In this example we will search for all the digits in the provided string.
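The original script for this example is not reproduced in the text, so the following is only a sketch of what it might look like. The input string is a hypothetical one, chosen so that the result agrees with the output shown for this example.

```python
import re

# Hypothetical input text (an assumption, not the article's original string)
text = "test12 sample123 value78 id456"

# \d+ matches each run of one or more consecutive digits
digits = re.findall(r"\d+", text)
print(digits)  # ['12', '123', '78', '456']
```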


Here we are using the \d special sequence with re.findall. The pattern '\d+' matches each run of one or more consecutive digits in the provided string. Output from this script:

~]# python3 regex-eg-1.py
['12', '123', '78', '456']

 

Example-2: Find words with 6 or more characters

In this example we will write sample code that uses re.findall to find all the words with 6 or more characters in the provided string.
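The original script is not reproduced in the text, so this is a sketch under assumptions: the input string is hypothetical and was chosen so that the result agrees with the output shown for this example.

```python
import re

# Hypothetical input text (an assumption, not the article's original string)
text = "12 testing45 test37 abc testing1456 hi"

# \w{6,} matches runs of at least six word characters
long_words = re.findall(r"\w{6,}", text)
print(long_words)  # ['testing45', 'test37', 'testing1456']
```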


Here we are using re.findall with \w, which matches an alphanumeric character or an underscore, in combination with {6,} to list the words that have at least 6 characters. Output from this script:

~]# python3 regex-eg-2.py
['testing45', 'test37', 'testing1456']

 

Example-3: Split all characters in the string

We have a string that contains numbers and mathematical operators. We want every number and every operator to be treated as a string, and the text to be broken down into a list of these strings.
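The original script is not reproduced in the text. The sketch below is one possible way to do this; both the input string and the pattern are assumptions, chosen so that the result agrees with the output shown for this example.

```python
import re

# Hypothetical input text (an assumption, not the article's original string)
expression = "12 15 + 3 100 - *"

# Either a run of digits, or a single arithmetic operator
tokens = re.findall(r"\d+|[+*/-]", expression)
print(tokens)  # ['12', '15', '+', '3', '100', '-', '*']
```

Inside the square brackets the hyphen is placed last so that it is read as a literal character rather than as a range.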


Here we are using re.findall to get a list of strings, one for each number or operator in the provided text. Output from this script:

~]# python3 regex-eg-3.py
['12', '15', '+', '3', '100', '-', '*']

 

Example-4: Find all the vowels from the string

In this example we will identify all the vowels from the provided string:
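The original script is not reproduced in the text, so here is a minimal sketch with a hypothetical input string. The character set [aeiou] matches lowercase vowels only.

```python
import re

# Hypothetical input text (an assumption, not the article's original string)
text = "This is how we SAVE text"

# [aeiou] matches a single lowercase vowel; uppercase A and E are skipped
vowels = re.findall(r"[aeiou]", text)
print(vowels)  # ['i', 'i', 'o', 'e', 'e']
```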



 

Example-5: Find vowels case-insensitive

In the last example we listed the vowels from a string, but that match was case sensitive: any UPPERCASE vowels in the text would not be matched. To perform a case-insensitive match, we add the IGNORECASE flag using flags=re.I or flags=re.IGNORECASE.
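The original script is not reproduced in the text. The sketch below uses a hypothetical input string, chosen so that the result agrees with the output shown for this example.

```python
import re

# Hypothetical input text (an assumption, not the article's original string)
text = "This is how we SAVE text"

# With re.I the set [aeiou] also matches the uppercase vowels A and E
all_vowels = re.findall(r"[aeiou]", text, flags=re.I)
print(all_vowels)  # ['i', 'i', 'o', 'e', 'A', 'E', 'e']
```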


Output from this script:

 ~]# python3 regex-eg-5.py
['i', 'i', 'o', 'e', 'A', 'E', 'e']

 

The re.split function

Another way to invoke regular expressions to help analyze text into tokens is to use the re.split function. The general syntax to use re.split would be:

tokens = re.split(pattern, string, maxsplit=0, flags=0)

In this syntax,

  • pattern is a regular-expression pattern supporting all the grammar shown until now; however, it doesn’t specify a pattern to find but to skip over. All the text in between is considered a token. So the pattern is really representative of token separators, and not the tokens themselves.
  • The string, as usual, is the target string to split into tokens.
  • The maxsplit argument specifies the maximum number of splits to perform, so the resulting list has at most maxsplit + 1 elements. If this argument is set to 0, the default, then there is no maximum number.

 

Example-1: Split using whitespace

In this example we have a string that we will split on whitespace.
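The original script is not reproduced in the text, so this is a sketch under assumptions: the input line is reconstructed from the output shown for this example, and the maxsplit call is an extra illustration.

```python
import re

# Input line assumed from the output shown for this example
line = 'NAME="/dev/sda" PARTLABEL="" TYPE="disk"'

# Split on runs of whitespace
fields = re.split(r"\s+", line)
print(fields)  # ['NAME="/dev/sda"', 'PARTLABEL=""', 'TYPE="disk"']

# maxsplit=1 performs a single split, leaving the rest of the line intact
first_and_rest = re.split(r"\s+", line, maxsplit=1)
print(first_and_rest)  # ['NAME="/dev/sda"', 'PARTLABEL="" TYPE="disk"']
```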


Output from this script:

~]# python3 regex-eg-1.py
['NAME="/dev/sda"', 'PARTLABEL=""', 'TYPE="disk"']

 

Example-2: Split on whitespace from a file

In this example we will create a list of elements using whitespace as the splitting pattern. We will save the output of the who command into a file, who.txt:

~]# who > who.txt

Now using our Python script we will split each line into a list of elements.

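#!/usr/bin/env python3

import re

# Split each line on two or more whitespace characters, or on a tab.
# The with statement closes the file automatically.
with open('who.txt', 'r') as f:
    for eachline in f:
        print(re.split(r'\s\s+|\t', eachline))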

We have defined \s\s+, which means at least two whitespace characters, and combined it through the alternation pipe with \t to match a tab. Output from this script:

~]# python3 regex-eg-2.py
['root', 'pts/0', '2020-11-02 12:07 (10.0.2.2)\n']
['root', 'pts/1', '2020-11-02 19:14 (10.0.2.2)\n']

Now at the end of each line we are getting a newline character. To strip it we can use rstrip('\n'), so the updated code would be:
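The updated script is not reproduced in the text. The sketch below shows the idea on two sample lines instead of the who.txt file, so that it can run on its own; the sample lines and their spacing are assumptions based on the output shown above.

```python
import re

# Sample lines standing in for the contents of who.txt (spacing is assumed)
who_lines = [
    "root     pts/0        2020-11-02 12:07 (10.0.2.2)\n",
    "root     pts/1        2020-11-02 19:14 (10.0.2.2)\n",
]

parsed = []
for eachline in who_lines:
    # Remove the trailing newline before splitting
    parsed.append(re.split(r"\s\s+|\t", eachline.rstrip("\n")))

for row in parsed:
    print(row)
```

With a real file, the same loop body can be used while iterating over the open file object.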


Output from this script:

~]# python3 regex-eg-3.py
['root', 'pts/0', '2020-11-02 12:07 (10.0.2.2)']
['root', 'pts/1', '2020-11-02 19:14 (10.0.2.2)']

 

Replace text using re.sub()

Another tool is the ability to replace text, that is, text substitution. We might want to replace all occurrences of a pattern with some other text. This often involves group tagging with parentheses, described earlier in this tutorial.

The re.sub function performs text substitution.

re.sub(find_pattern, repl, target_str, count=0, flags=0)

In this syntax, find_pattern is the pattern to look for, repl is the regular-expression replacement string, and target_str is the string to be searched. The last two arguments are both optional.

The return value is the new string, which consists of the target string after the requested replacements have been made.

 

Example-1: Replace multiple spaces with single space

In this example I have a string with runs of multiple whitespace characters, and we will use re.sub() to replace each run with a single space.
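The original script is not reproduced in the text, so this is a minimal sketch. The input string is hypothetical and was chosen so that the result agrees with the output shown for this example.

```python
import re

# Hypothetical input text (an assumption, not the article's original string)
text = "abc   def    ghi  ktm"

# Replace every run of whitespace with one space
cleaned = re.sub(r"\s+", " ", text)
print(cleaned)  # abc def ghi ktm
```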


Output from this script:

~]# python3 regex-eg-1.py
abc def ghi ktm

 

Example-2: Replace duplicates

In this example I have a string with multiple duplicate words which I wish to replace with single occurrence of each duplicate word.
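The original script is not reproduced in the text. The sketch below is one way to write it; the input string and the exact pattern are assumptions, chosen so that the result agrees with the output shown for this example.

```python
import re

# Hypothetical input text (an assumption, not the article's original string)
text = "This this is a a Python python tutorial"

# (\w+) tags a word; \s+\1 requires the same word to follow it.
# re.I lets "This" and "this" count as the same word.
deduplicated = re.sub(r"\b(\w+)\s+\1\b", r"\1", text, flags=re.I)
print(deduplicated)  # This is a Python tutorial
```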


Two points are worth noting here. First, the replacement string contains only a reference to the first half of the pattern. That half is a tagged group, so this directs the regular-expression evaluator to note the text matched by the group and use it as the replacement string.

r'\1'

Second, the repeated-word test on “This this” will fail unless the flags argument is set to re.I (or re.IGNORECASE).

Output from this script:

 ~]# python3 regex-eg-2.py
This is a Python tutorial

 

Searching a string for patterns using re.search

In this section we will learn how to find the first substring that matches a pattern. The re.search function performs this task using following syntax:

match_obj = re.search(pattern, target_string, flags=0)

In this syntax, pattern is either a string containing a regular-expression pattern or a precompiled regular-expression object; target_string is the string to be searched. The flags argument is optional and has a default value of 0.

The function produces a match object if successful and None otherwise. This function is close to re.match in the way that it works, except it does not require the match to happen at the beginning of the string.

By default re.search scans the complete string and returns only the first match. Let us verify this concept. Here I have a text that contains 'python' two times, and we will use re.search to find the word 'python' in the sentence.

#!/usr/bin/env python3

import re

line = "This is python regex tutorial. We are using python3"
pat = r'\bpython'
print(re.search(pat, line))

Output from this script:

~]# python3 regex-eg-1.py
<_sre.SRE_Match object; span=(8, 14), match='python'>

As you can see, the re.search function stopped searching after the first match, 'python', even though 'python3' would also match our pattern.

 

Match Object

If you look at the output from re.search, we get a match object that carries several pieces of information. To extract only the information we need, we can use the methods of the match object, such as group(), start(), and end():

#!/usr/bin/env python3

import re

line = "This is python regex tutorial. We are using python3"
pat = r'\bpython'
match_ob = re.search(pat, line)
print('matched from the pattern: ', match_ob.group())
print('starting index: ', match_ob.start())
print('ending index: ', match_ob.end()-1)
print('Length: ', match_ob.end() - match_ob.start())

Here I am printing the matched text, its start and end index positions, and its length, all taken from the match object returned by re.search. Output from this script:

 ~]# python3 regex-eg-2.py
matched from the pattern:  python
starting index:  8
ending index:  13
Length:  6

 

Example-1: Search for a pattern in a log file

Let us take a practical example where we will go through a log file and print all the lines with text CRITICAL
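The original script is not reproduced in the text, so here is a sketch that runs on its own. Instead of opening a log file it searches a list of sample lines: the CRITICAL lines are taken from the output shown for this example, while the INFO and WARNING lines are invented for illustration.

```python
import re

# Sample log lines; the INFO and WARNING entries are hypothetical
log_lines = [
    "2019-11-13T13:02:59Z INFO Hypothetical informational message",
    "2019-11-13T13:03:03Z CRITICAL Error: Unable to find a match",
    "2019-11-13T13:03:14Z CRITICAL Importing GPG key 0x8483C65D:",
    "2019-11-13T13:05:40Z WARNING Hypothetical warning message",
    "2019-11-13T13:11:06Z CRITICAL Error: No Matches found",
]

# Keep only the lines in which re.search finds the text CRITICAL
critical_lines = [line for line in log_lines if re.search(r"CRITICAL", line)]
for line in critical_lines:
    print(line)
```

To process a real log file, the list can be replaced by iterating over the open file object and stripping the trailing newline from each line.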


Output from this script:

~]# python3 regex-eg-1.py
2019-11-13T13:03:03Z CRITICAL Error: Unable to find a match
2019-11-13T13:03:14Z CRITICAL Importing GPG key 0x8483C65D:
2019-11-13T13:11:06Z CRITICAL Error: No Matches found

 

Refining matches with re.match

The re.match function returns either a match object, if it succeeds, or the special object None, if it fails. The syntax to use re.match would be:

match_obj = re.match(pattern, target_string, flags=0)

It returns a match object when the pattern matches at the beginning of target_string. Otherwise, it returns None. The match method of a compiled regular expression object additionally accepts optional pos and endpos arguments, which limit the part of the string that is examined.

Let us reuse the example from the previous section, where I have added a Python if/else block:

#!/usr/bin/env python3

import re

line = "This is python regex tutorial. We are using python3"
pat = r'\bpython'
match_ob = re.match(pat, line)
if match_ob:
    print(match_ob)
    print('matched from the pattern: ', match_ob.group())
    print('starting index: ', match_ob.start())
    print('ending index: ', match_ob.end()-1)
    print('Length: ', match_ob.end() - match_ob.start())
else:
    print('No match found')

Here now instead of re.search we will use re.match to find our pattern in the provided string. Output from this script:

~]# python3 regex-eg-1.py
No match found

Even though we have python in our string, the script prints "No match found". This is because re.match only tries to match at the beginning of the string (index position 0). So to get a match we will rephrase the text in the script:

#!/usr/bin/env python3

import re

line = "python regex tutorial. We are using python3"
pat = r'\bpython'
match_ob = re.match(pat, line)
if match_ob:
    print(match_ob)
    print('matched from the pattern: ', match_ob.group())
    print('starting index: ', match_ob.start())
    print('ending index: ', match_ob.end()-1)
    print('Length: ', match_ob.end() - match_ob.start())
else:
    print('No match found')

Output from this script:

 ~]# python3 regex-eg-1.py
<_sre.SRE_Match object; span=(0, 6), match='python'>
matched from the pattern:  python
starting index:  0
ending index:  5
Length:  6

This time re.match was able to match the pattern because it was present at index position 0. So the basic difference between re.search and re.match is that re.match looks for the pattern only at the beginning of the string, while re.search looks for it anywhere in the string.

 

Example-1: Match for a telephone number

In this example we will collect a telephone number from the user and use re.match to confirm whether the format of the input number is correct. In the US, a telephone number is commonly written as:

xxx-xxx-xxxx

Sample script:
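The original script is not reproduced in the text, so the following is only a sketch. It uses the pattern that is explained piece by piece in this section, but wraps the check in a hypothetical helper function, check_phone, instead of reading the number with input(), so that it can run without user interaction.

```python
import re

# Pattern from this section: 3 digits, hyphen, 3 digits, hyphen, 3 or 4 digits
PHONE_PATTERN = re.compile(r"\d{3}[-]\d{3}[-]\d{3,4}")


def check_phone(number):
    """Return 'Match' if the start of number fits the pattern, else 'No Match'."""
    return "Match" if PHONE_PATTERN.match(number) else "No Match"


for candidate in ["123-456-1111", "1234-123-111"]:
    print(candidate, "->", check_phone(candidate))
```

Because match only anchors the pattern at the start of the string, extra characters after the number are not rejected. If the whole input has to fit the format, fullmatch would be the stricter choice.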


Here,

  • \d{3} matches a digit (equal to [0-9]), and the quantifier {3} repeats it exactly 3 times
  • [-] matches a single character from the set, which here contains only a literal hyphen
  • \d{3} matches a digit (equal to [0-9]), and the quantifier {3} repeats it exactly 3 times
  • [-] matches a single character from the set, which here contains only a literal hyphen
  • \d{3,4} matches a digit (equal to [0-9]), and the quantifier {3,4} repeats it between 3 and 4 times, as many times as possible, giving back as needed (greedy)

Output from this script for different inputs:

 ~]# python3 regex-eg-1.py
Match

 ~]# python3 regex-eg-1.py
Enter telephone number: 123-456-1111
Match

 ~]# python3 regex-eg-1.py
Enter telephone number: 1234-123-111
No Match

 

Using re.compile

If you’re going to use the same regular-expression pattern multiple times, it’s a good idea to compile that pattern into a regular-expression object and then use that object repeatedly. The re module provides a function for this purpose called compile, with the following syntax:

regex_object_name = re.compile(pattern)

You will understand this better with an example. Here I have used some of the Python regex functions that we learned in this tutorial. In the first half of the script we have to pass the same pattern to every function. To avoid this, we can create a regex pattern object and then use this object to perform the searches.

#!/usr/bin/env python3

import re

line = "This is python regex tutorial using python3"
pat = r'\bpython\d'
print('using re.search: ', re.search(pat, line))
print('using re.findall: ', re.findall(pat, line))
print('using re.match: ', re.match(pat, line))

pat_ob = re.compile(pat)
print('Pattern Object: ', pat_ob)
print('using re.compile with re.search: ', pat_ob.search(line))
print('using re.compile with re.findall: ', pat_ob.findall(line))
print('using re.compile with re.match: ', pat_ob.match(line))

As you can see, the first section without re.compile and the second section with re.compile produce the same results:

 ~]# python3 regex-eg-2.py
using re.search:  <_sre.SRE_Match object; span=(36, 43), match='python3'>
using re.findall:  ['python3']
using re.match:  None
Pattern Object:  re.compile('\\bpython\\d')
using re.compile with re.search:  <_sre.SRE_Match object; span=(36, 43), match='python3'>
using re.compile with re.findall:  ['python3']
using re.compile with re.match:  None

 

Conclusion

Python regex is a vast topic, but I have tried to cover the areas that are used most often in code. re.search and re.match can be confusing, but remember that re.match looks for the pattern only at the beginning of the string, while re.search looks for it anywhere in the string.

We mostly end up using re.compile, although you could perform these tasks without precompiling a regular-expression object. However, compiling can save execution time if you’re going to use the same pattern more than once. Otherwise, Python may have to rebuild a state machine multiple times when it could have been built only once.
