Class TokenizerMatch

java.lang.Object
eu.svjatoslav.commons.string.tokenizer.TokenizerMatch

public class TokenizerMatch extends Object
Represents a matched token from the tokenizer.

TokenizerMatch contains all information about a token that was extracted from the source string:

  • token - The actual text content of the token
  • terminator - The Terminator that identified this token
  • matcher - The regex Matcher used for matching

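Key methods:

  • isGroup(String) - Checks whether the token was produced by a terminator of the given group
  • getRegExpGroups() - Returns the regex capture groups of the match
  • getTokenizer() - Returns the Tokenizer that produced this match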

Example usage:


 TokenizerMatch match = tokenizer.getNextToken();

 System.out.println("Token: " + match.token);

 if (match.isGroup("number")) {
     int value = Integer.parseInt(match.token);
 }

 if (match.isGroup("string")) {
     String[] groups = match.getRegExpGroups();
     // groups[0] might be the string content without quotes
 }
 

For tokens that were accumulated text (not matched by a terminator), the terminator and matcher fields will be null.
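The null convention for the three fields can be shown with a reduced stand-in. This is a minimal sketch, not the library's class: `MatchHolder`, `TerminatorStub` and `isAccumulatedText()` are hypothetical names, the Tokenizer reference is left out, and a terminator is modeled only by its group name. The point it illustrates is that calling code should check for a null terminator (or matcher) before reading group or capture data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for Terminator; only the group name is modeled.
final class TerminatorStub {
    final String group; // may itself be null

    TerminatorStub(String group) {
        this.group = group;
    }
}

// Hypothetical stand-in for TokenizerMatch, reduced to its three public fields.
final class MatchHolder {
    final String token;
    final TerminatorStub terminator; // null for accumulated text
    final Matcher matcher;           // null for accumulated text

    MatchHolder(String token, TerminatorStub terminator, Matcher matcher) {
        this.token = token;
        this.terminator = terminator;
        this.matcher = matcher;
    }

    // Accumulated text is recognizable by the missing terminator.
    boolean isAccumulatedText() {
        return terminator == null;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d+").matcher("42");
        m.matches(); // put the matcher into a matched state, as a tokenizer would
        MatchHolder number = new MatchHolder("42", new TerminatorStub("number"), m);
        MatchHolder text = new MatchHolder("hello ", null, null);

        System.out.println(number.isAccumulatedText()); // false
        System.out.println(text.isAccumulatedText());   // true
    }
}
```

Checking `terminator` rather than `matcher` is an arbitrary choice in this sketch; per the description above, both are null together for accumulated text.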

  • Field Details

    • token

      public final String token
      The text content of the matched token.

      This is the actual substring from the source that was identified as a token. For accumulated text (no terminator match), this contains all characters accumulated before a terminator was found.

    • terminator

      public final Terminator terminator
      The Terminator that identified this token.

      May be null if this token was accumulated text rather than matched by a terminator. When not null, you can check the terminator's group to categorize the token.

    • matcher

      public final Matcher matcher
      The regex Matcher used to identify this token.

      May be null if this token was accumulated text. When not null, you can use this to extract capture groups from the match.

  • Constructor Details

    • TokenizerMatch

      public TokenizerMatch(String token, Terminator terminator, Matcher matcher, Tokenizer tokenizer)
      Creates a new TokenizerMatch with all components.
      Parameters:
      token - the matched text. May be empty but should not be null.
      terminator - the Terminator that matched this token. May be null for accumulated text tokens.
      matcher - the regex Matcher used for matching. May be null for accumulated text tokens.
      tokenizer - the Tokenizer that produced this match.

  • Method Details

    • isGroup

      public boolean isGroup(String group)
      Checks if this token belongs to the specified group.

      This compares the group name of the terminator against the provided group name. Useful for categorizing tokens by type.

      Special cases:

      • If terminator is null, returns true only if group is also null
      • If terminator.group is null, returns true only if group is null

      Example:

      
       tokenizer.addTerminator(PRESERVE, "\\d+", "number");
       tokenizer.addTerminator(PRESERVE, "\\w+", "word");
      
       TokenizerMatch match = tokenizer.getNextToken();
       if (match.isGroup("number")) {
           // Token is a number
       } else if (match.isGroup("word")) {
           // Token is a word
       }
       
      Parameters:
      group - the group name to check against. May be null.
      Returns:
      true if this token belongs to the specified group, false otherwise.
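The special cases of `isGroup` amount to a null-safe comparison between the terminator's group name and the argument. The sketch below expresses that with `java.util.Objects.equals`. It assumes case-sensitive `String` equality (the documentation does not say how names are compared), and `GroupedTerminator` and `GroupCheck` are hypothetical names, not part of the library.

```java
import java.util.Objects;

// Hypothetical stand-in for Terminator: only the group name matters here.
final class GroupedTerminator {
    final String group; // may be null

    GroupedTerminator(String group) {
        this.group = group;
    }
}

final class GroupCheck {
    // Follows the documented rules of isGroup(String):
    //   terminator == null        -> true only if group == null
    //   terminator.group == null  -> true only if group == null
    //   otherwise                 -> equality of the two group names
    static boolean isGroup(GroupedTerminator terminator, String group) {
        String actual = (terminator == null) ? null : terminator.group;
        return Objects.equals(actual, group);
    }

    public static void main(String[] args) {
        System.out.println(isGroup(new GroupedTerminator("number"), "number")); // true
        System.out.println(isGroup(new GroupedTerminator("number"), "word"));   // false
        System.out.println(isGroup(null, null));                                // true
        System.out.println(isGroup(null, "number"));                            // false
        System.out.println(isGroup(new GroupedTerminator(null), null));         // true
    }
}
```

`Objects.equals` returns true when both arguments are null and false when only one is, which covers both special cases without separate branches.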

    • getRegExpGroups

      public String[] getRegExpGroups()
      Extracts regex capture groups from this match.

      Returns the captured groups from the regex pattern that matched this token. Capture groups 1 through n are returned in order, without group 0 (the full match), so index 0 of the array holds group 1.

      Example:

      
       tokenizer.addTerminator(PRESERVE, "(\\d+):(\\d+)", "time");
       // Matches "12:30"
      
       TokenizerMatch match = tokenizer.getNextToken();
       String[] groups = match.getRegExpGroups();
       // groups[0] = "12" (hours)
       // groups[1] = "30" (minutes)
       
      Returns:
      an array of captured group strings. Empty array if matcher is null or no capture groups exist in the pattern.
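Copying groups 1 through n out of a `java.util.regex.Matcher` is enough to reproduce the documented behavior, including the empty array for a null matcher or a pattern without capture groups. This is a sketch of one way to do it, not the library's source. `RegExpGroups.extract` is a hypothetical name, and the code assumes the matcher has already matched successfully, because `Matcher.group(int)` throws `IllegalStateException` otherwise.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegExpGroups {
    // Copies capture groups 1..groupCount() into an array, so index 0 holds group 1.
    // Returns an empty array for a null matcher; a pattern without capture groups
    // has groupCount() == 0 and therefore also yields an empty array.
    static String[] extract(Matcher matcher) {
        if (matcher == null) {
            return new String[0];
        }
        String[] groups = new String[matcher.groupCount()];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = matcher.group(i + 1); // skip group 0, the full match
        }
        return groups;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(\\d+):(\\d+)").matcher("12:30");
        if (m.matches()) {
            String[] groups = extract(m);
            System.out.println(groups[0] + " " + groups[1]); // 12 30
        }
        System.out.println(extract(null).length); // 0
    }
}
```

For a group that did not take part in the match (for example an unmatched optional group), `Matcher.group(int)` returns null, so in this sketch the corresponding array element is null.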

    • toString

      public String toString()
      Returns a detailed string representation for debugging.

      Includes the token text, terminator details, and any regex groups.

      Overrides:
      toString in class Object
      Returns:
      a multi-line descriptive string.

    • getTokenizer

      public Tokenizer getTokenizer()
      Returns the tokenizer that produced this match.

      This allows continuing tokenization or accessing tokenizer state from within token handling code.

      Returns:
      the Tokenizer instance that created this match.