2011-07-23

Tips on Regex

Moved to the page http://3rdstage.wikia.com/wiki/On_Regex in my wiki as of 20th Jul. 2012.

Readings : References, Tutorials and Articles

Special Characters of ERE (Extended Regular Expression)

From Regular Expressions in the Single UNIX Specification:

An ERE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash, such a character is an ERE that matches the special character itself. The extended regular expression special characters and the contexts in which they have their special meaning are:

. \ [ (
The period, left-bracket, backslash and left-parenthesis are special except when used in a bracket expression. Outside a bracket expression, a left-parenthesis immediately followed by a right-parenthesis produces undefined results.
)
The right-parenthesis is special when matched with a preceding left-parenthesis, both outside a bracket expression.
* + ? {

The asterisk, plus-sign, question-mark and left-brace are special except when used in a bracket expression (see RE Bracket Expression). Any of the following uses produce undefined results:

  • if these characters appear first in an ERE, or immediately following a vertical-line, circumflex or left-parenthesis.
  • if a left-brace is not part of a valid interval expression.
|

The vertical-line is special except when used in a bracket expression. A vertical-line appearing first or last in an ERE, or immediately following a vertical-line or a left-parenthesis, or immediately preceding a right-parenthesis, produces undefined results.

^

The circumflex is special when used:

  • as an anchor
  • as the first character of a bracket expression
$

The dollar sign is special when used as an anchor.
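The rules quoted above can be tried out with a small Java sketch. Note the assumption: java.util.regex is not a POSIX ERE engine, so this only illustrates the cases where the two behave alike (a period outside a bracket expression, an escaped period, and special characters inside a bracket expression). The class name SpecialCharDemo and its matches helper are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern.matches comes from the JDK.
class SpecialCharDemo {

    // True only when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Outside a bracket expression the period is special: it matches any character.
        System.out.println(matches("a.c", "abc"));    // true

        // Preceded by a backslash, the period matches only itself.
        System.out.println(matches("a\\.c", "abc"));  // false
        System.out.println(matches("a\\.c", "a.c"));  // true

        // Inside a bracket expression the period loses its special meaning.
        System.out.println(matches("a[.]c", "abc"));  // false
        System.out.println(matches("a[.]c", "a.c"));  // true

        // The same holds for the asterisk, plus-sign, question-mark and left-brace.
        System.out.println(matches("[*+?{]", "+"));   // true
    }
}
```

Patterns that the specification marks as undefined (for example a leading asterisk) are deliberately left out, since each engine is free to treat them differently.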

Regex with Java

The most authoritative information on using regex with Java is the API documentation of the java.util.regex.Pattern class.

Formal rules for bracket expression

Bracket expressions such as [0-9a-zA-Z], [^0-9a-zA-Z], or [0-9a-zA-Z.?*+-] behave differently from normal expressions. One of the most important differences is which characters count as metacharacters (special characters). A more formal and detailed description of bracket expressions, including this difference, can be found in the following
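To illustrate the metacharacter difference described above, here is a minimal Java sketch that uses the three bracket expressions from this section. It assumes java.util.regex semantics, where a hyphen placed last inside the brackets is taken literally; the class name BracketDemo and its matches helper are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern.matches comes from the JDK.
class BracketDemo {

    // True only when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Plain alphanumeric class: a period is not a member.
        System.out.println(matches("[0-9a-zA-Z]", "x"));   // true
        System.out.println(matches("[0-9a-zA-Z]", "."));   // false

        // A leading circumflex negates the class.
        System.out.println(matches("[^0-9a-zA-Z]", "_"));  // true
        System.out.println(matches("[^0-9a-zA-Z]", "x"));  // false

        // Inside the brackets . ? * + are ordinary characters,
        // and the trailing hyphen stands for itself.
        System.out.println(matches("[0-9a-zA-Z.?*+-]+", "a.b*c+d-9?"));  // true
        System.out.println(matches("[0-9a-zA-Z.?*+-]+", "a b"));         // false (space is not in the class)
    }
}
```

The quantifier after the closing bracket in the last two patterns is outside the bracket expression, so there the plus-sign keeps its special meaning.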

Capturing, Grouping and Backreferences

NOT operator in Regex

Nested pairs search

Lookaround : lookahead and lookbehind

Greedy, Reluctant, or Possessive Quantifiers
