Class Terminator

java.lang.Object
eu.svjatoslav.commons.string.tokenizer.Terminator

public class Terminator extends Object
Defines a token boundary using a regular expression pattern.

A Terminator specifies how to identify and handle token boundaries in the tokenizer. Each terminator has:

  • A regex pattern to match at the current position
  • A termination strategy (PRESERVE or DROP)
  • An optional group name for categorizing matched tokens
  • An active flag that can disable the terminator temporarily

Termination strategies:

  • PRESERVE - The matched text is returned as a token. Use this for tokens you want to process (keywords, operators, etc.)
  • DROP - The matched text is discarded silently. Use this for separators you don't need (whitespace, comments, etc.)

Example:


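 // Assumes PRESERVE and DROP are statically imported from Terminator.TerminationStrategy
 // and that "tokenizer" is an existing tokenizer instance.

 // Drop whitespace (don't return it as a token)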
 Terminator whitespace = tokenizer.addTerminator(DROP, "\\s+");

 // Preserve quoted strings (return them as tokens)
 Terminator strings = tokenizer.addTerminator(PRESERVE, "\".*?\"", "string");

 // Preserve numbers with a group name
 Terminator numbers = tokenizer.addTerminator(PRESERVE, "\\d+", "number");

 // Temporarily disable a terminator
 whitespace.active = false;  // Now whitespace will be returned as tokens
 whitespace.active = true;   // Back to dropping whitespace
 

Patterns are anchored to match only at the current position (the "^" anchor is prepended automatically).
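
The interaction between the two strategies and the "^" anchor can be illustrated with a minimal sketch built only on java.util.regex. The class TerminatorSketch, its Rule type and the tokenize method are hypothetical stand-ins for illustration and are not part of this library. The handling of unmatched characters (emitted one at a time) is a simplifying assumption, not a description of the real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the library's Terminator or tokenizer.
class TerminatorSketch {

    enum Strategy { PRESERVE, DROP }

    // A strategy plus a pattern that is anchored by prepending "^".
    static final class Rule {
        final Strategy strategy;
        final Pattern pattern;

        Rule(Strategy strategy, String regexp) {
            this.strategy = strategy;
            this.pattern = Pattern.compile("^" + regexp);
        }
    }

    // Walks the source and tries each rule at the current position only.
    static List<String> tokenize(String source, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int index = 0;
        while (index < source.length()) {
            boolean matched = false;
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(source);
                // With the default anchoring bounds, "^" matches at the region start.
                m.region(index, source.length());
                if (m.find() && m.end() > index) {
                    if (rule.strategy == Strategy.PRESERVE) {
                        tokens.add(m.group()); // PRESERVE: return the matched text
                    }                          // DROP: discard it silently
                    index = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // Assumption: an unmatched character becomes its own token.
                tokens.add(String.valueOf(source.charAt(index)));
                index++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(Strategy.DROP, "\\s+"),
                new Rule(Strategy.PRESERVE, "\\d+"));
        System.out.println(tokenize("12  345 6", rules)); // prints [12, 345, 6]
    }
}
```

In this sketch the whitespace runs advance the position without producing tokens, while the digit runs are returned.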

  • Field Details

    • termination

      public final Terminator.TerminationStrategy termination
      The strategy for handling matched tokens.

      Determines whether matched text is returned as a token (PRESERVE) or silently discarded (DROP).

    • group

      public final String group
      An optional group name for categorizing this token type.

      When set, matched tokens can be identified by this group name using TokenizerMatch.isGroup(String).

      May be null for uncategorized tokens.

    • active

      public boolean active
      Flag indicating whether this terminator is active.

      When false, this terminator is skipped during matching. Set this to false to temporarily disable a terminator without removing it, and back to true to re-enable it.

      Default value is true.
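
      As a rough illustration of the skipping behavior described above, the following sketch uses a hypothetical Rule class with its own active flag. The names ActiveFlagSketch, Rule and firstMatch are assumptions made for this example and do not come from the library.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the library's Terminator class.
class ActiveFlagSketch {

    static final class Rule {
        final Pattern pattern;
        boolean active = true; // mirrors the documented default

        Rule(String regexp) {
            this.pattern = Pattern.compile("^" + regexp);
        }
    }

    // Returns the first active rule that matches at index, or null if none does.
    static Rule firstMatch(String source, int index, List<Rule> rules) {
        for (Rule rule : rules) {
            if (!rule.active) {
                continue; // inactive rules are skipped, not removed
            }
            Matcher m = rule.pattern.matcher(source);
            m.region(index, source.length());
            if (m.find()) {
                return rule;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Rule whitespace = new Rule("\\s+");
        List<Rule> rules = List.of(whitespace);

        System.out.println(firstMatch(" x", 0, rules) == whitespace); // prints true
        whitespace.active = false;
        System.out.println(firstMatch(" x", 0, rules) == null);       // prints true
        whitespace.active = true;
        System.out.println(firstMatch(" x", 0, rules) == whitespace); // prints true
    }
}
```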

    • pattern

      public final Pattern pattern
      The compiled regex pattern used for matching.

      This is the regexp passed to the constructor with "^" prepended, so that it matches only at the current position. The pattern is compiled once for efficiency.

  • Constructor Details

    • Terminator

      public Terminator(Terminator.TerminationStrategy termination, String regexp, String group)
      Creates a new terminator with the specified configuration.

      The regex pattern is anchored with "^" to match only at the current position in the source string.

      Parameters:
      termination - how to handle matched tokens (PRESERVE or DROP).
      regexp - the regex pattern to match tokens. Must not be null.
      group - the group name for categorizing this token type. May be null.
  • Method Details

    • match

      public Matcher match(String source, int index)
      Creates a matcher for this terminator at the specified position.

      The matcher is restricted to the part of the source string from the given index to the end. Because the pattern is anchored with "^", a match can only begin exactly at index.

      Parameters:
      source - the source string to match against. Must not be null.
      index - the starting position for matching.
      Returns:
      a Matcher configured to search from index to the end.
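
      One way to obtain this behavior with the standard java.util.regex API is to set the matcher's region, since "^" matches at the region start while anchoring bounds are enabled (the default). The sketch below assumes that approach; the helper matchAt and the class AnchoredMatchSketch are hypothetical and may differ from the actual implementation of match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the library's implementation.
class AnchoredMatchSketch {

    // Assumed equivalent of match(source, index): restrict the matcher to [index, end).
    static Matcher matchAt(Pattern anchored, String source, int index) {
        Matcher matcher = anchored.matcher(source);
        matcher.region(index, source.length());
        return matcher;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("^" + "\\d+");
        String source = "ab123cd";

        // No digits start at index 0, and "^" prevents matching further along.
        System.out.println(matchAt(digits, source, 0).find()); // prints false

        Matcher m = matchAt(digits, source, 2);
        System.out.println(m.find());  // prints true
        System.out.println(m.group()); // prints 123
        System.out.println(m.end());   // prints 5
    }
}
```

      The value of end() is an index into the whole source string, so it can serve directly as the next position to match from.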
    • toString

      public String toString()
      Returns a string representation of this terminator.

      Includes the regexp, termination strategy, group name, and active status.

      Overrides:
      toString in class Object
      Returns:
      a descriptive string for debugging.