Java RegEx Explained [In-Depth Tutorial]


Bashir Alam


Regular expressions are a powerful tool for matching and manipulating text. A regular expression is a sequence of characters that defines a search pattern, which can be used to search, replace, and validate data. Many programming languages support regular expressions, including Java. In this article, we will discuss how to use regular expressions in Java through the java.util.regex package.

 

Introduction to Java RegEx (Regular Expressions)

Regular expressions are a way to search and manipulate text data. In Java, regular expressions are supported through the java.util.regex package, which provides two main classes for working with them: Pattern and Matcher. A pattern can be used to search for specific pieces of text within a larger body of text or to validate that input data matches a specific format.

For example, if you wanted to find all instances of the word "apple" in a piece of text, you could use a regular expression to search for the pattern "apple". Similarly, if you wanted to validate that a user's input for a phone number matches a specific format, you could use a regular expression to ensure that the input contains the correct number of digits and formatting characters. Java regular expressions use a syntax that is similar to other programming languages and tools but with some specific differences. Understanding the basics of regular expression syntax is essential to effectively using regular expressions in Java.

 

Java APIs for Regular Expressions

Java provides two main classes for working with regular expressions: Pattern and Matcher.

 

Pattern Class:

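The Pattern class represents a compiled regular expression. It has no public constructor; you obtain an instance through a static factory method. Its two most important methods are:

  • compile(String regex): A static method that compiles the specified regular expression and returns a Pattern object. It throws a PatternSyntaxException if the expression is not valid.
  • matcher(CharSequence input): An instance method that creates a Matcher object for matching the given input against the pattern. Returns the Matcher object.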

For example, to create a Pattern object that matches the word "apple", you would use the following code:

Pattern pattern = Pattern.compile("apple");

Once you have a Pattern object, you can use it to create a Matcher object, which can be used to search for the pattern within a piece of text.

 

Matcher Class:

The Matcher class is used to match a regular expression against a string. It has the following methods:

  • matches(): Returns true only if the entire input sequence matches the pattern.
  • find(): Attempts to find the next subsequence of the input sequence that matches the pattern. Returns true if a match is found.
  • group(): Returns the subsequence matched by the previous successful match. It throws an IllegalStateException if no match has been attempted yet or the last attempt failed.
  • start(): Returns the index of the first character of the matched subsequence.
  • end(): Returns the index just past the last character of the matched subsequence, so end() - start() is the length of the match.
  • replaceAll(String replacement): Replaces every occurrence of the pattern in the input sequence with the specified replacement string. Returns the result as a new String; the original input is not modified.

In addition to these methods, the Matcher class also provides methods for specifying the region of the input sequence to search in, setting and querying the state of the match, and accessing the pattern used for the match.

For example, to create a Matcher object to search for the word "apple" within the string "I like to eat apples", you would use the following code:

Pattern pattern = Pattern.compile("apple");
Matcher matcher = pattern.matcher("I like to eat apples");

Once you have a Matcher object, you can use its various methods to search for and manipulate the text that matches the pattern.

Some of the most commonly used methods of the Matcher class include:

  • find(): Searches for the next occurrence of the pattern within the input string.
  • matches(): Tests whether the entire input string matches the pattern.
  • group(): Returns the portion of the input string that matches the pattern.

These are just a few examples of the methods available on the Matcher class. The Java documentation provides a full list of methods and their descriptions.
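The methods listed above can be tried side by side. The following is a minimal sketch, not a definitive implementation; the class name MatcherMethodsDemo and its helper methods are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Compiled once and reused; Pattern objects are immutable and thread-safe.
    private static final Pattern APPLE = Pattern.compile("apple");

    // matches() succeeds only when the whole input is "apple".
    static boolean matchesWhole(String text) {
        return APPLE.matcher(text).matches();
    }

    // find() locates the first occurrence; group(), start() and end() describe it.
    static String describeFirstMatch(String text) {
        Matcher matcher = APPLE.matcher(text);
        if (!matcher.find()) {
            return "no match";
        }
        return matcher.group() + " [" + matcher.start() + ", " + matcher.end() + ")";
    }

    // replaceAll() returns a new string and leaves the input untouched.
    static String replaceWithPear(String text) {
        return APPLE.matcher(text).replaceAll("pear");
    }

    public static void main(String[] args) {
        String text = "I like to eat apples";
        System.out.println(matchesWhole(text));        // false: extra text around the match
        System.out.println(describeFirstMatch(text));  // apple [14, 19)
        System.out.println(replaceWithPear(text));     // I like to eat pears
    }
}
```

Each helper creates its own Matcher, because a Matcher carries state from one call to the next, while the Pattern itself can be shared freely.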

In the next section, we'll cover the basic syntax of regular expressions in Java, which is used to create patterns that can be matched by a Matcher object.

 

Basic Regular Expression Syntax in Java

Java regular expressions use a syntax that is similar to other programming languages and tools, but with some specific differences. The basic syntax for regular expressions in Java includes special characters, character classes, quantifiers, and grouping constructs.

 

Special Characters:

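Special characters are used to define patterns that match specific characters or groups of characters within a larger piece of text. Because the backslash is also the escape character in Java string literals, a regular expression such as \d must be written as "\\d" in Java source code. Some of the most commonly used special characters in Java regular expressions include: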

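  • "." : Matches any single character except a line terminator (unless the DOTALL flag is set).
  • "^" : Matches the beginning of the input, or the beginning of each line when the MULTILINE flag is set.
  • "$" : Matches the end of the input, or the end of each line when the MULTILINE flag is set.
  • "|" : Matches either the pattern on the left or the pattern on the right.
  • "[abc]" : Matches any single character listed within the brackets.
  • "[^abc]" : Matches any single character not listed within the brackets.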
  • "\s" : Matches any whitespace character.
  • "\S" : Matches any non-whitespace character.
  • "\d" : Matches any digit character.
  • "\D" : Matches any non-digit character.
  • "\w" : Matches any word character (letters, digits, or underscores).
  • "\W" : Matches any non-word character.
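A few of the special characters listed above can be sketched in Java as follows. Note how each regex backslash is doubled inside the Java string literal. The class and method names are hypothetical and only serve the illustration.

```java
import java.util.regex.Pattern;

class PredefinedClassesDemo {
    // The regex \d+ is written "\\d+" in Java source.
    static boolean isDigitsOnly(String s) {
        return Pattern.matches("\\d+", s);
    }

    // Replaces every run of whitespace (\s+) with a single space.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // "." stands for exactly one character between 'c' and 't'.
    static boolean isCAnyT(String s) {
        return Pattern.matches("c.t", s);
    }

    // "|" accepts either alternative.
    static boolean isCatOrDog(String s) {
        return Pattern.matches("cat|dog", s);
    }

    public static void main(String[] args) {
        System.out.println(isDigitsOnly("2024"));                // true
        System.out.println(isDigitsOnly("20a4"));                // false
        System.out.println(collapseWhitespace("  a \t b\n c ")); // a b c
        System.out.println(isCAnyT("cut"));                      // true
        System.out.println(isCAnyT("ct"));                       // false
        System.out.println(isCatOrDog("dog"));                   // true
    }
}
```

Pattern.matches(regex, input) is a convenience method that compiles the pattern and checks the entire input in one call; it is handy for one-off checks like these.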

 

Character Classes:

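Character classes are used to define patterns that match one character out of a specific set. For example, the character class "[aeiou]" matches any single lowercase vowel, "[a-z]" uses a range to match any lowercase letter, and "[^0-9]" matches any character that is not a digit.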
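As a small illustration of character classes, the sketch below counts vowels with "[aeiou]" and strips everything outside a set with a negated class. The class CharacterClassDemo and its methods are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Counts how many times a single lowercase vowel occurs in the text.
    static int countVowels(String text) {
        Matcher matcher = Pattern.compile("[aeiou]").matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Removes every character that is not an ASCII letter or digit.
    static String keepAlphanumeric(String text) {
        return text.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(countVowels("regular expression")); // 7
        System.out.println(keepAlphanumeric("a-b_c 1!"));      // abc1
    }
}
```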

 

Quantifiers:

Quantifiers are used to specify how many times a particular pattern should be matched. Some of the most commonly used quantifiers in Java regular expressions include:

  • "*" : Matches zero or more occurrences of the preceding pattern.
  • "+" : Matches one or more occurrences of the preceding pattern.
  • "?" : Matches zero or one occurrence of the preceding pattern.
  • "{n}" : Matches exactly n occurrences of the preceding pattern.
  • "{n,}" : Matches n or more occurrences of the preceding pattern.
  • "{n,m}" : Matches at least n and at most m occurrences of the preceding pattern.
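The quantifiers above can be compared with a handful of whole-input checks. This is only a sketch; the helper fullMatch is a hypothetical wrapper around Pattern.matches, and the sample patterns were picked for illustration.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only if the entire input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab*", "a"));           // true: zero 'b' characters allowed
        System.out.println(fullMatch("ab+", "a"));           // false: at least one 'b' required
        System.out.println(fullMatch("colou?r", "color"));   // true: the 'u' is optional
        System.out.println(fullMatch("colou?r", "colour"));  // true
        System.out.println(fullMatch("\\d{3}", "1234"));     // false: exactly three digits
        System.out.println(fullMatch("\\d{2,}", "1234"));    // true: two or more digits
        System.out.println(fullMatch("\\d{2,4}", "12345"));  // false: at most four digits
    }
}
```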

 

Grouping Constructs:

Grouping constructs are used to group patterns together so that they can be treated as a single unit. Grouping constructs can also be used to apply quantifiers to multiple patterns at once. Some of the most commonly used grouping constructs in Java regular expressions include:

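  • "(pattern)" : Groups the specified pattern and captures the text it matches, so that it can be retrieved later with group(n).
  • "(?idmsux-idmsux:pattern)" : A non-capturing group that turns the given flags on or off for the enclosed pattern only, for example "(?i:apple)".
  • "(?=pattern)" : Positive lookahead. Succeeds only if the specified pattern matches at the current position, without consuming any characters.
  • "(?!pattern)" : Negative lookahead. Succeeds only if the specified pattern does not match at the current position, without consuming any characters.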

These are just a few examples of the basic syntax for regular expressions in Java. The Java documentation provides a full list of special characters, character classes, quantifiers, and grouping constructs, along with their descriptions.
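To make the grouping constructs more concrete, here is a short sketch that exercises a quantified group, an inline-flag group, and both kinds of lookahead. The helper firstMatch and the sample inputs are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupingDemo {
    // Returns the first substring matched by the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // The quantifier applies to the whole group "ab".
        System.out.println(firstMatch("(ab)+", "xababy"));                       // abab
        // Only "hello" is case-insensitive; " world" must match exactly.
        System.out.println(firstMatch("(?i:hello) world", "HeLLo world"));       // HeLLo world
        System.out.println(firstMatch("(?i:hello) world", "hello WORLD"));       // null
        // The lookahead checks for " USD" but does not include it in the match.
        System.out.println(firstMatch("\\d+(?= USD)", "Price: 100 USD or 90 EUR")); // 100
        // A 'q' that is not followed by 'u'.
        System.out.println(firstMatch("q(?!u)", "quit"));                        // null
        System.out.println(firstMatch("q(?!u)", "Iraq"));                        // q
    }
}
```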

 

Regular expression flags

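Java regular expressions also support flags that can be used to modify the behavior of a regular expression. Flags are constants of the Pattern class and are passed as the second argument to Pattern.compile(String regex, int flags); several flags can be combined with the bitwise OR operator (|). Here are some common flags:

  • CASE_INSENSITIVE: Enables case-insensitive matching. By default this applies to US-ASCII characters only.
  • DOTALL: Enables the "." character to match line terminators as well.
  • MULTILINE: Enables multiline mode, in which "^" and "$" match at the beginning and end of each line rather than only at the beginning and end of the entire input.
  • UNICODE_CASE: When combined with CASE_INSENSITIVE, enables Unicode-aware case folding.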
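The effect of these flags can be sketched as follows. The class FlagsDemo and its methods are hypothetical; Pattern.quote is used so that the caller's text is treated literally rather than as a regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    // CASE_INSENSITIVE: "apple" also finds "APPLE" or "Apple".
    static boolean containsIgnoreCase(String text, String word) {
        return Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE)
                .matcher(text)
                .find();
    }

    // MULTILINE: "^" matches at the start of every line, not only the start of the input.
    static int countLinesStartingWith(String text, String prefix) {
        Matcher matcher = Pattern.compile("^" + Pattern.quote(prefix), Pattern.MULTILINE)
                .matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // DOTALL: "." may cross line terminators when the flag is set.
    static boolean startToEnd(String text, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile("start.*end", flags).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("I like APPLES", "apple"));            // true
        System.out.println(countLinesStartingWith("ERROR a\nINFO b\nERROR c", "ERROR")); // 2
        System.out.println(startToEnd("start\nend", false));                          // false
        System.out.println(startToEnd("start\nend", true));                           // true
    }
}
```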

 

Some Practical Examples

Example-1: Matching a specific word

Suppose we want to find all occurrences of the word "apple" in a string. Here's how we can do that using regular expressions:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        String text = "I have an apple and a banana.";
        Pattern pattern = Pattern.compile("apple");
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Match found at index " + matcher.start());
        }
    }
}

Output:

Match found at index 10

In this example, we first create a Pattern object by passing the regular expression "apple" to the Pattern.compile() method. Then, we create a Matcher object by passing the text we want to search to the matcher() method. Calling find() in a loop visits every occurrence of "apple" in the text, and the start() method of the Matcher object returns the index at which the current match begins. Here the text contains a single occurrence, starting at index 10.

 

Example-2: Matching a Pattern with Wildcards

Suppose we want to find every run of word characters in a string that starts with the letter "a" and ends with the letter "e". Here's how we can do that using regular expressions:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        String text = "I have an apple and an orange.";
        Pattern pattern = Pattern.compile("a\\w+e");
        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            System.out.println("Match found at index " + matcher.start());
        }
    }
}

Output:

Match found at index 3
Match found at index 10
Match found at index 25

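In this example, we create a Pattern object using the Pattern.compile() method and pass the regular expression "a\\w+e" as an argument. The "\\w+" part of the regular expression matches any word character (i.e., any letter, digit, or underscore) one or more times. The "a" and "e" parts of the regular expression match the letters "a" and "e", respectively. We then create a Matcher object and use the find() method in a loop to find all matches of the pattern in the text. Note that the pattern is not tied to word boundaries: the matches at index 3 and index 25 are the substrings "ave" (inside "have") and "ange" (inside "orange"), and only the match at index 10 is the whole word "apple". To match whole words only, surround the pattern with the word-boundary anchor \b.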
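If only whole words are wanted, one possible variant adds the word-boundary anchor \b on both sides of the pattern. The sketch below is an illustration under that assumption; the class WholeWordDemo and the method wholeWords are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordDemo {
    // \b asserts a position between a word character and a non-word character.
    private static final Pattern A_TO_E = Pattern.compile("\\ba\\w+e\\b");

    // Collects every whole word that starts with 'a' and ends with 'e'.
    static List<String> wholeWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = A_TO_E.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // "have" and "orange" no longer contribute partial matches.
        System.out.println(wholeWords("I have an apple and an orange.")); // [apple]
        System.out.println(wholeWords("an axe ate the able ape"));        // [axe, ate, able, ape]
    }
}
```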

 

Example-3: Extracting Groups from a Match

Suppose we have a string that contains a person's name in the format "last name, first name". We want to extract the last name and first name into separate variables. Here's how we can do that using regular expressions:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        String text = "Smith, John";
        Pattern pattern = Pattern.compile("(\\w+), (\\w+)");
        Matcher matcher = pattern.matcher(text);

        if (matcher.find()) {
            String lastName = matcher.group(1);
            String firstName = matcher.group(2);
            System.out.println("Last Name: " + lastName);
            System.out.println("First Name: " + firstName);
        }
    }
}

Output:

Last Name: Smith
First Name: John

In this example, we create a Pattern object using the Pattern.compile() method and pass the regular expression "(\\w+), (\\w+)" as an argument. The regular expression contains two capturing groups, separated by a comma and a space. The first group matches one or more word characters (i.e., letters, digits, or underscores) and represents the last name. The second group matches one or more word characters and represents the first name. After a successful find(), group(1) and group(2) return the text captured by the first and second group; group(0), or group() without arguments, returns the entire match.
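As an alternative to numbered groups, Java also supports named capturing groups with the syntax (?<name>pattern), read back with group("name"). The following sketch applies that to the same "last name, first name" format; the class name, the group names "last" and "first", and the method toFirstLast are chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Same structure as "(\\w+), (\\w+)", but each group carries a name.
    private static final Pattern NAME =
            Pattern.compile("(?<last>\\w+), (?<first>\\w+)");

    // Converts "Last, First" to "First Last"; returns null if the format does not match.
    static String toFirstLast(String text) {
        Matcher matcher = NAME.matcher(text);
        if (!matcher.matches()) {
            return null;
        }
        return matcher.group("first") + " " + matcher.group("last");
    }

    public static void main(String[] args) {
        System.out.println(toFirstLast("Smith, John"));  // John Smith
        System.out.println(toFirstLast("no comma here")); // null
    }
}
```

Named groups keep the code readable when a pattern grows, because adding a group in the middle does not shift the meaning of the other references.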

 

Example-4: Replacing with regular expressions

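The String.replaceAll(String regex, String replacement) method replaces every part of a string that matches a regular expression with a specified replacement string. It is a shortcut for compiling the pattern and calling replaceAll() on the resulting Matcher, and it returns a new string because Java strings are immutable.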

For example, the following code replaces all occurrences of the word "dog" with "cat" in the string "I have a dog and my dog is cute":

String str = "I have a dog and my dog is cute";
str = str.replaceAll("dog", "cat");
System.out.println(str);

Output:

I have a cat and my cat is cute
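The replacement string can do more than insert fixed text. The sketch below shows three variations under stated assumptions: restricting the match to whole words with \b, referring back to captured groups with $1 and $2, and using Matcher.quoteReplacement when the replacement itself contains a literal "$". The class and method names are hypothetical.

```java
import java.util.regex.Matcher;

class ReplaceDemo {
    // \b keeps "hotdogs" intact while replacing the standalone word "dog".
    static String replaceWholeWordDog(String text) {
        return text.replaceAll("\\bdog\\b", "cat");
    }

    // $1 and $2 refer to the first and second capturing group of the match.
    static String swapNames(String text) {
        return text.replaceAll("(\\w+), (\\w+)", "$2 $1");
    }

    // quoteReplacement escapes "$" and "\" so they are inserted literally.
    static String fillInPrice(String text) {
        return text.replaceAll("PRICE", Matcher.quoteReplacement("$5"));
    }

    public static void main(String[] args) {
        System.out.println(replaceWholeWordDog("My dog likes hotdogs")); // My cat likes hotdogs
        System.out.println(swapNames("Smith, John"));                    // John Smith
        System.out.println(fillInPrice("Cost: PRICE"));                  // Cost: $5
    }
}
```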

 

Example-5: Validating email addresses

Validating email addresses is a common use case for regular expressions in Java. An email address consists of two parts: the local part (the part before the "@") and the domain part (the part after the "@"). The pattern below is a simplified check that covers common addresses rather than the full email address grammar. Here is an example of how to use regular expressions to validate an email address in Java:

String email = "example@example.com";
boolean isValid = email.matches("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");
System.out.println(isValid);

In this example, we use the String.matches() method, which returns true only if the entire string matches the regular expression. The regular expression consists of two parts separated by the "@" symbol.

The first part [a-zA-Z0-9._%+-]+ matches one or more characters that can be letters, digits, dots, underscores, percent signs, plus signs, or hyphens. This matches the local part of the email address.

The second part [a-zA-Z0-9.-]+\\.[a-zA-Z]{2,} matches one or more characters that can be letters, digits, dots, or hyphens, followed by a dot, and then two or more letters. This matches the domain part of the email address.

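The backslash before the dot escapes it, because an unescaped dot has a special meaning in regular expressions (it matches any character). The backslash is doubled ("\\.") because it must itself be escaped inside a Java string literal.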

If the email address matches the regular expression, the isValid variable will be set to true, and the code will print "true". If the email address does not match the regular expression, the isValid variable will be set to false, and the code will print "false".
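When many addresses are checked, String.matches() recompiles the regular expression on every call. One possible arrangement, sketched below, compiles the same pattern once and reuses it. The class EmailValidator and the method isValid are hypothetical names, and the pattern is the simplified one from this example.

```java
import java.util.regex.Pattern;

class EmailValidator {
    // Same simplified pattern as above, compiled once and shared.
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");

    // Returns true only if the whole string has the shape local@domain.tld.
    static boolean isValid(String email) {
        return email != null && EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("example@example.com")); // true
        System.out.println(isValid("example@"));            // false: no domain
        System.out.println(isValid("no-at-sign.com"));      // false: no "@"
        System.out.println(isValid("a@b.c"));               // false: top-level part needs two letters
    }
}
```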

 

Summary

In Java, regular expressions are a powerful tool for working with text data. Regular expressions are patterns used to match, search, and manipulate strings of text. Java provides a comprehensive set of APIs in the java.util.regex package for working with regular expressions.

The Pattern class represents a compiled regular expression, and the Matcher class is used to match the regular expression against a string. The Matcher class also provides methods for accessing the matched subsequence, replacing the matched subsequence with a replacement string, and setting and querying the state of the match.

Java regular expressions support a wide range of syntax elements, including character classes, quantifiers, anchors, and grouping constructs. Java regular expressions also support flags that can be used to modify the behavior of a regular expression.

Some common use cases for regular expressions in Java include validating email addresses, extracting phone numbers or other data from text, and searching for patterns in log files or other large text datasets.

When working with regular expressions in Java, it is important to follow best practices, such as using anchors to ensure that the pattern matches the entire input sequence, and avoiding excessive backtracking that can lead to poor performance.
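The point about anchors can be illustrated with find(), which by itself is satisfied by a match anywhere in the input. This is a minimal sketch with hypothetical class and method names, not a complete validation routine.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    private static final Pattern FOUR_DIGITS = Pattern.compile("\\d{4}");
    private static final Pattern ONLY_FOUR_DIGITS = Pattern.compile("^\\d{4}$");

    // Unanchored: true if four consecutive digits occur anywhere in the input.
    static boolean containsFourDigits(String s) {
        return FOUR_DIGITS.matcher(s).find();
    }

    // Anchored: "^" and "$" tie the match to the start and end of the input.
    static boolean isFourDigits(String s) {
        return ONLY_FOUR_DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsFourDigits("12345")); // true: "1234" is found inside
        System.out.println(isFourDigits("12345"));       // false: one digit too many
        System.out.println(isFourDigits("1234"));        // true
        System.out.println(isFourDigits("id 1234"));     // false: leading text
    }
}
```

Calling matches() instead of find() has a similar effect without explicit anchors, since matches() always requires the entire input to match.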

 

Further Reading

Regular expressions in java

 


