Java Modules for Regular Expressions

Pattern / Matcher / MatchResult

Java supports regular expressions through the package java.util.regex. It defines two major classes, Pattern and Matcher, along with the interface MatchResult. We can use these in our code to work with regular expressions. The package also has an exception class, PatternSyntaxException, which is thrown if the syntax of the regex pattern is invalid. The typical way to proceed is as follows:
  1. Identify the regular expression for the requirement
  2. Compile the string holding this regular expression into a Pattern
  3. Get a Matcher for the Pattern and the target string, then check whether the target string matches the pattern.
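
The three steps above can be sketched as follows. This is a minimal illustration, not the only way to structure the code: the class name RegexSteps and the helper names containsNumberThenSpace and isValidRegex are made up for this example, and the regex is the "digits followed by white space" pattern used later in this article. The second helper shows how PatternSyntaxException surfaces when the regex string is malformed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexSteps {

    // Step 1: the regular expression for "one or more digits followed by white space".
    public static final String REGEX = "\\d+\\s";

    // Steps 2 and 3: compile the regex, get a Matcher for the target, and test it.
    // (Hypothetical helper name, used only for this illustration.)
    public static boolean containsNumberThenSpace(String target) {
        Pattern pattern = Pattern.compile(REGEX);   // step 2
        Matcher matcher = pattern.matcher(target);  // step 3
        return matcher.find();
    }

    // Returns true if the regex compiles, false if its syntax is invalid.
    public static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsNumberThenSpace("42 apples")); // true
        System.out.println(containsNumberThenSpace("apples"));    // false
        System.out.println(isValidRegex("\\d+\\s"));              // true
        System.out.println(isValidRegex("(\\d+"));                // false: unclosed group
    }
}
```

PatternSyntaxException is an unchecked exception, so catching it is optional. It is mostly worth doing when the regex comes from user input or configuration.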

Pattern

The Pattern class has the following important methods:

| Method | Description |
| --- | --- |
| static Pattern compile(String regex) | The factory method for this class. It compiles the given regex and returns a Pattern instance. |
| Matcher matcher(CharSequence input) | Creates a Matcher object that matches the given input against this Pattern. |
| static boolean matches(String regex, CharSequence input) | Combines compile and matcher in one call. It compiles the regular expression and checks whether the entire input matches it. |
| String[] split(CharSequence input) | Splits the given input around matches of this pattern. |
| String pattern() | Returns the regex string from which this pattern was compiled. |
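
A short sketch of these Pattern methods in use. The class name PatternMethodsDemo, the helper names and the sample inputs are chosen purely for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsDemo {

    // split(): break the input around every match of the pattern
    // (here: a comma with optional white space on either side).
    public static String[] splitOnCommas(String input) {
        Pattern commas = Pattern.compile("\\s*,\\s*");
        return commas.split(input);
    }

    // Pattern.matches(): compile the regex and test the whole input in one call.
    public static boolean isAllDigits(String input) {
        return Pattern.matches("\\d+", input);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d+");
        System.out.println(p.pattern());                               // \d+
        System.out.println(Arrays.toString(splitOnCommas("a, b ,c"))); // [a, b, c]
        System.out.println(isAllDigits("2024"));                       // true
        System.out.println(isAllDigits("2024 AD"));                    // false
    }
}
```

Pattern.matches() compiles the regex on every call, so when the same expression is applied many times it is usually better to compile it once and reuse the Pattern.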

Matcher

The Matcher class implements the MatchResult interface.

| Method | Description |
| --- | --- |
| boolean matches() | Tests for an exact match. Returns true only if the entire input matches the pattern, end to end. |
| boolean find() | Checks for a kind of "contains" match and moves the cursor to the match found. There can be multiple matches within the string; each successful call to find() moves the cursor to the next match. |
| boolean find(int start) | Like find(), but it first resets the matcher and then searches from the zero-based index start, skipping the characters before it. Because of the reset, every call to find(4) starts afresh at index 4. |
| String group() | Returns the subsequence matched by the previous match. |
| int start() | Meaningful only after a successful match, for example by find() or find(start); otherwise it throws IllegalStateException. Returns the start index of the matched subsequence. |
| int end() | Returns the index just past the last character of the matched subsequence. |
| int groupCount() | Returns the number of capturing groups in the pattern, not the number of matches found. |
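
The sketch below contrasts a few of these Matcher methods: matches() against find(), the number of matches against groupCount(), and find(int start). The class name MatcherMethodsDemo, the helper names and the sample strings are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {

    // Counts the occurrences of the pattern in the input by calling find() repeatedly.
    public static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Number of capturing groups declared in the pattern; it does not depend on the input.
    public static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns the first match found at or after index 'start', or null if there is none.
    public static String firstMatchFrom(String regex, String input, int start) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(start) ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = "(\\d+)-(\\d+)";
        String input = "10-20 and 30-40, 50-60";

        // matches() needs the entire input to match; find() looks inside the input.
        System.out.println(Pattern.compile(regex).matcher(input).matches());   // false
        System.out.println(Pattern.compile(regex).matcher("10-20").matches()); // true

        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            // start() is inclusive, end() is exclusive.
            System.out.println(m.group() + " " + m.start() + " " + m.end());   // 10-20 0 5
        }

        System.out.println(countMatches(regex, input)); // 3 matches in this input
        System.out.println(capturingGroups(regex));     // 2 capturing groups in the pattern
        System.out.println(firstMatchFrom("\\d+\\s+", "123 234 123", 5).trim()); // 34
    }
}
```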

Example

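Suppose we want to write code that identifies whether a string starts with a number followed by white space. We start by developing a regular expression. A number followed by white space is described by the regular expression "\d+\s". The code below uses "\d+\s+" with find() to list every such occurrence in a string, and "\d+\s.*" with matches(), which must match the whole string, to test whether the string starts with one.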
package com.solegaonkar.learnjava;

import java.util.regex.*;

public class RegexExample1{
 public static void main(String args[]){
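  //1st way: compile the pattern, then walk through the matches with find()
  Pattern p = Pattern.compile("\\d+\\s+");// one or more digits followed by one or more white space characters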
  Matcher m = p.matcher("123 234 ");
  while (m.find()) {
   System.out.printf("%s: %d: %d\n", m.group(), m.start(), m.end());
  }
  boolean b = m.matches(); // false: the whole input is not one single match

  m = p.matcher("123 234 123 456 677 666");
  if (m.find(5)) {
    System.out.printf("%s: %d: %d\n", m.group(), m.start(), m.end());
  }
  if (m.find(4)) {
    System.out.printf("%s: %d: %d\n", m.group(), m.start(), m.end());
  }
  boolean b1 = m.matches();

  //2nd way
  boolean b2=Pattern.compile("\\d+\\s.*").matcher("12 ds").matches();

  //3rd way
  boolean b3 = Pattern.matches("\\d+\\s.*", "123 fds");

  System.out.println(b+" "+b2+" "+b3);
 }
}

Output:

123 : 0: 4
234 : 4: 8
34 : 5: 8
234 : 4: 8
false true true

Note that we use \\ instead of \ in the regex string. This is because \ is the escape character in Java string literals, so a literal backslash has to be written as \\.
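
A small sketch of this escaping rule. The class name EscapeDemo and the helper containsBackslash are made up for the illustration. It shows that the doubled backslashes exist only in the source code, and that matching a literal backslash therefore takes four of them in the Java literal.

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // The Java literal "\\d+\\s" holds the five characters \d+\s
    public static final String DIGITS_THEN_SPACE = "\\d+\\s";

    // To match one literal backslash, the regex itself must be \\ ,
    // which is written as "\\\\" in Java source.
    public static boolean containsBackslash(String input) {
        return Pattern.compile("\\\\").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(DIGITS_THEN_SPACE);             // \d+\s
        System.out.println(DIGITS_THEN_SPACE.length());    // 5
        System.out.println(containsBackslash("C:\\temp")); // true
        System.out.println(containsBackslash("C:/temp"));  // false
    }
}
```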