add PIRegularExpression

2025-08-11 14:23:29 +03:00
parent 91955d44fa
commit 654c0847b2
481 changed files with 434858 additions and 0 deletions
--- a/3rd/pcre2/doc/pcre2syntax.3
+++ b/3rd/pcre2/doc/pcre2syntax.3
@@ -0,0 +1,736 @@
+.TH PCRE2SYNTAX 3 "27 November 2024" "PCRE2 10.45"
+.SH NAME
+PCRE2 - Perl-compatible regular expressions (revised API)
+.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
+.rs
+.sp
+The full syntax and semantics of the regular expression patterns that are
+supported by PCRE2 are described in the
+.\" HREF
+\fBpcre2pattern\fP
+.\"
+documentation. This document contains a quick-reference summary of the pattern
+syntax followed by the syntax of replacement strings in substitution function.
+The full description of the latter is in the
+.\" HREF
+\fBpcre2api\fP
+.\"
+documentation.
+.
+.SH "QUOTING"
+.rs
+.sp
+  \ex         where x is non-alphanumeric is a literal x
+  \eQ...\eE    treat enclosed characters as literal
+.sp
+Note that white space inside \eQ...\eE is always treated as literal, even if
+PCRE2_EXTENDED is set, causing most other white space to be ignored. Note also
+that PCRE2's handling of \eQ...\eE has some differences from Perl's. See the
+.\" HREF
+\fBpcre2pattern\fP
+.\"
+documentation for details.
+.
+.
+.SH "BRACED ITEMS"
+.rs
+.sp
+With one exception, wherever brace characters { and } are required to enclose
+data for constructions such as \eg{2} or \ek{name}, space and/or horizontal tab
+characters that follow { or precede } are allowed and are ignored. In the case
+of quantifiers, they may also appear before or after the comma. The exception
+is \eu{...} which is not Perl-compatible and is recognized only when
+PCRE2_EXTRA_ALT_BSUX is set. This is an ECMAScript compatibility feature, and
+follows ECMAScript's behaviour.
+.
+.
+.SH "ESCAPED CHARACTERS"
+.rs
+.sp
+This table applies to ASCII and Unicode environments. An unrecognized escape
+sequence causes an error.
+.sp
+  \ea         alarm, that is, the BEL character (hex 07)
+  \ecx        "control-x", where x is a non-control ASCII character
+  \ee         escape (hex 1B)
+  \ef         form feed (hex 0C)
+  \en         newline (hex 0A)
+  \er         carriage return (hex 0D)
+  \et         tab (hex 09)
+  \e0dd       character with octal code 0dd
+  \eddd       character with octal code ddd, or backreference
+  \eo{ddd..}  character with octal code ddd..
+  \eN{U+hh..} character with Unicode code point hh.. (Unicode mode only)
+  \exhh       character with hex code hh
+  \ex{hh..}   character with hex code hh..
+.sp
+\eN{U+hh..} is synonymous with \ex{hh..} but is not supported in environments
+that use EBCDIC code (mainly IBM mainframes). Note that \eN not followed by an
+opening curly bracket has a different meaning (see below).
+.P
+If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
+following are also recognized:
+.sp
+  \eU         the character "U"
+  \euhhhh     character with hex code hhhh
+  \eu{hh..}   character with hex code hh.. but only for EXTRA_ALT_BSUX
+.sp
+When \ex is not followed by {, one or two hexadecimal digits are read,
+but in ALT_BSUX mode \ex must be followed by two hexadecimal digits to be
+recognized as a hexadecimal escape; otherwise it matches a literal "x".
+Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits
+or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
+matches a literal "u".
+.P
+Note that \e0dd is always an octal code. The treatment of backslash followed by
+a non-zero digit is complicated; for details see the section
+.\" HTML <a href="pcre2pattern.html#digitsafterbackslash">
+.\" </a>
+"Non-printing characters"
+.\"
+in the
+.\" HREF
+\fBpcre2pattern\fP
+.\"
+documentation, where details of escape processing in EBCDIC environments are
+also given.
+.
+.
+.SH "CHARACTER TYPES"
+.rs
+.sp
+  .          any character except newline;
+               in dotall mode, any character whatsoever
+  \eC         one code unit, even in UTF mode (best avoided)
+  \ed         a decimal digit
+  \eD         a character that is not a decimal digit
+  \eh         a horizontal white space character
+  \eH         a character that is not a horizontal white space character
+  \eN         a character that is not a newline
+  \ep{\fIxx\fP}     a character with the \fIxx\fP property
+  \eP{\fIxx\fP}     a character without the \fIxx\fP property
+  \eR         a newline sequence
+  \es         a white space character
+  \eS         a character that is not a white space character
+  \ev         a vertical white space character
+  \eV         a character that is not a vertical white space character
+  \ew         a "word" character
+  \eW         a "non-word" character
+  \eX         a Unicode extended grapheme cluster
+.sp
+\eC is dangerous because it may leave the current matching point in the middle
+of a UTF-8 or UTF-16 character. The application can lock out the use of \eC by
+setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2
+with the use of \eC permanently disabled.
+.P
+By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode
+or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
+happening, \es and \ew may also match characters with code points in the range
+128-255. If the PCRE2_UCP option is set, the behaviour of these escape
+sequences is changed to use Unicode properties and they match many more
+characters, but there are some option settings that can restrict individual
+sequences to matching only ASCII characters.
+.P
+Property descriptions in \ep and \eP are matched caselessly; hyphens,
+underscores, and ASCII white space characters are ignored, in accordance with
+Unicode's "loose matching" rules. For example, \ep{Bidi_Class=al} is the same
+as \ep{ bidi class = AL }.
+.
+.
+.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
+.rs
+.sp
+  C          Other
+  Cc         Control
+  Cf         Format
+  Cn         Unassigned
+  Co         Private use
+  Cs         Surrogate
+.sp
+  L          Letter
+  Lc         Cased letter, the union of Ll, Lu, and Lt
+  L&         Synonym of Lc
+  Ll         Lower case letter
+  Lm         Modifier letter
+  Lo         Other letter
+  Lt         Title case letter
+  Lu         Upper case letter
+.sp
+  M          Mark
+  Mc         Spacing mark
+  Me         Enclosing mark
+  Mn         Non-spacing mark
+.sp
+  N          Number
+  Nd         Decimal number
+  Nl         Letter number
+  No         Other number
+.sp
+  P          Punctuation
+  Pc         Connector punctuation
+  Pd         Dash punctuation
+  Pe         Close punctuation
+  Pf         Final punctuation
+  Pi         Initial punctuation
+  Po         Other punctuation
+  Ps         Open punctuation
+.sp
+  S          Symbol
+  Sc         Currency symbol
+  Sk         Modifier symbol
+  Sm         Mathematical symbol
+  So         Other symbol
+.sp
+  Z          Separator
+  Zl         Line separator
+  Zp         Paragraph separator
+  Zs         Space separator
+.sp
+From release 10.45, when caseless matching is set, Ll, Lu, and Lt are all
+equivalent to Lc.
+.
+.
+.SH "PCRE2 SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
+.rs
+.sp
+  Xan        Alphanumeric: union of properties L and N
+  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
+  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
+  Xuc        Universally-named character: one that can be
+               represented by a Universal Character Name
+  Xwd        Perl word: property Xan or underscore
+.sp
+Perl and POSIX space are now the same. Perl added VT to its space character set
+at release 5.18.
+.
+.
+.SH "BINARY PROPERTIES FOR \ep AND \eP"
+.rs
+.sp
+Unicode defines a number of binary properties, that is, properties whose only
+values are true or false. You can obtain a list of those that are recognized by
+\ep and \eP, along with their abbreviations, by running this command:
+.sp
+  pcre2test -LP
+.
+.
+.
+.SH "SCRIPT MATCHING WITH \ep AND \eP"
+.rs
+.sp
+Many script names and their 4-letter abbreviations are recognized in
+\ep{sc:...} or \ep{scx:...} items, or on their own with \ep (and also \eP of
+course). You can obtain a list of these scripts by running this command:
+.sp
+  pcre2test -LS
+.
+.
+.
+.SH "THE BIDI_CLASS PROPERTY FOR \ep AND \eP"
+.rs
+.sp
+  \ep{Bidi_Class:<class>}   matches a character with the given class
+  \ep{BC:<class>}           matches a character with the given class
+.sp
+The recognized classes are:
+.sp
+  AL          Arabic letter
+  AN          Arabic number
+  B           paragraph separator
+  BN          boundary neutral
+  CS          common separator
+  EN          European number
+  ES          European separator
+  ET          European terminator
+  FSI         first strong isolate
+  L           left-to-right
+  LRE         left-to-right embedding
+  LRI         left-to-right isolate
+  LRO         left-to-right override
+  NSM         non-spacing mark
+  ON          other neutral
+  PDF         pop directional format
+  PDI         pop directional isolate
+  R           right-to-left
+  RLE         right-to-left embedding
+  RLI         right-to-left isolate
+  RLO         right-to-left override
+  S           segment separator
+  WS          white space
+.
+.
+.SH "CHARACTER CLASSES"
+.rs
+.sp
+  [...]       positive character class
+  [^...]      negative character class
+  [x-y]       range (can be used for hex characters)
+  [[:xxx:]]   positive POSIX named set
+  [[:^xxx:]]  negative POSIX named set
+.sp
+  alnum       alphanumeric
+  alpha       alphabetic
+  ascii       0-127
+  blank       space or tab
+  cntrl       control character
+  digit       decimal digit
+  graph       printing, excluding space
+  lower       lower case letter
+  print       printing, including space
+  punct       printing, excluding alphanumeric
+  space       white space
+  upper       upper case letter
+  word        same as \ew
+  xdigit      hexadecimal digit
+.sp
+In PCRE2, POSIX character set names recognize only ASCII characters by default,
+but some of them use Unicode properties if PCRE2_UCP is set. You can use
+\eQ...\eE inside a character class.
+.P
+When PCRE2_ALT_EXTENDED_CLASS is set, UTS#18 extended character classes may be
+used, allowing nested character classes, combined using set operators.
+.sp
+  [x&&[^y]]   UTS#18 extended character class
+.sp
+  x||y        set union (OR)
+  x&&y        set intersection (AND)
+  x--y        set difference (AND NOT)
+  x~~y        set symmetric difference (XOR)
+.sp
+.
+.
+.SH "PERL EXTENDED CHARACTER CLASSES"
+.rs
+.sp
+  (?[...])                Perl extended character class
+  (?[\ep{Thai} & \ep{Nd}])  operators; whitespace ignored
+  (?[(x - y) & z])        parentheses for grouping
+.sp
+  (?[ [^3] & \ep{Nd} ])    [...] is a nested ordinary class
+  (?[ [:alpha:] - [z] ])  POSIX set is allowed outside [...]
+  (?[ \ed - [3] ])         backslash-escaped set is allowed outside [...]
+  (?[ !\en & [:ascii:] ])  backslash-escaped character is allowed outside [...]
+                      all other characters or ranges must be enclosed in [...]
+.sp
+  x|y, x+y                set union (OR)
+  x&y                     set intersection (AND)
+  x-y                     set difference (AND NOT)
+  x^y                     set symmetric difference (XOR)
+  !x                      set complement (NOT)
+.sp
+Inside a Perl extended character class, [...] switches mode to be interpreted
+as an ordinary character class. Outside of a nested [...], the only items
+permitted are backslash-escapes, POSIX sets, operators, and parentheses. Inside
+a nested ordinary class, ^ has its usual meaning (inverts the class when used
+as the first character); outside of a nested class, ^ is the XOR operator.
+.
+.
+.SH "QUANTIFIERS"
+.rs
+.sp
+  ?           0 or 1, greedy
+  ?+          0 or 1, possessive
+  ??          0 or 1, lazy
+  *           0 or more, greedy
+  *+          0 or more, possessive
+  *?          0 or more, lazy
+  +           1 or more, greedy
+  ++          1 or more, possessive
+  +?          1 or more, lazy
+  {n}         exactly n
+  {n,m}       at least n, no more than m, greedy
+  {n,m}+      at least n, no more than m, possessive
+  {n,m}?      at least n, no more than m, lazy
+  {n,}        n or more, greedy
+  {n,}+       n or more, possessive
+  {n,}?       n or more, lazy
+  {,m}        zero up to m, greedy
+  {,m}+       zero up to m, possessive
+  {,m}?       zero up to m, lazy
+.
+.
+.SH "ANCHORS AND SIMPLE ASSERTIONS"
+.rs
+.sp
+  \eb          word boundary
+  \eB          not a word boundary
+  ^           start of subject
+                also after an internal newline in multiline mode
+                (after any newline if PCRE2_ALT_CIRCUMFLEX is set)
+  \eA          start of subject
+  $           end of subject
+                also before newline at end of subject
+                also before internal newline in multiline mode
+  \eZ          end of subject
+                also before newline at end of subject
+  \ez          end of subject
+  \eG          first matching position in subject
+.
+.
+.SH "REPORTED MATCH POINT SETTING"
+.rs
+.sp
+  \eK          set reported start of match
+.sp
+From release 10.38 \eK is not permitted by default in lookaround assertions,
+for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
+option is set, the previous behaviour is re-enabled. When this option is set,
+\eK is honoured in positive assertions, but ignored in negative ones.
+.
+.
+.SH "ALTERNATION"
+.rs
+.sp
+  expr|expr|expr...
+.
+.
+.SH "CAPTURING"
+.rs
+.sp
+  (...)           capture group
+  (?<name>...)    named capture group (Perl)
+  (?'name'...)    named capture group (Perl)
+  (?P<name>...)   named capture group (Python)
+  (?:...)         non-capture group
+  (?|...)         non-capture group; reset group numbers for
+                   capture groups in each alternative
+.sp
+In non-UTF modes, names may contain underscores and ASCII letters and digits;
+in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
+both cases, a name must not start with a digit.
+.
+.
+.SH "ATOMIC GROUPS"
+.rs
+.sp
+  (?>...)         atomic non-capture group
+  (*atomic:...)   atomic non-capture group
+.
+.
+.SH "COMMENT"
+.rs
+.sp
+  (?#....)        comment (not nestable)
+.
+.
+.SH "OPTION SETTING"
+.rs
+Changes of these options within a group are automatically cancelled at the end
+of the group.
+.sp
+  (?a)            all ASCII options
+  (?aD)           restrict \ed to ASCII in UCP mode
+  (?aS)           restrict \es to ASCII in UCP mode
+  (?aW)           restrict \ew to ASCII in UCP mode
+  (?aP)           restrict all POSIX classes to ASCII in UCP mode
+  (?aT)           restrict POSIX digit classes to ASCII in UCP mode
+  (?i)            caseless
+  (?J)            allow duplicate named groups
+  (?m)            multiline
+  (?n)            no auto capture
+  (?r)            restrict caseless to either ASCII or non-ASCII
+  (?s)            single line (dotall)
+  (?U)            default ungreedy (lazy)
+  (?x)            ignore white space except in classes or \eQ...\eE
+  (?xx)           as (?x) but also ignore space and tab in classes
+  (?-...)         unset the given option(s)
+  (?^)            unset imnrsx options
+.sp
+(?aP) implies (?aT) as well, though this has no additional effect. However, it
+means that (?-aP) also implies (?-aT) and disables all ASCII restrictions for
+POSIX classes.
+.P
+Unsetting x or xx unsets both. Several options may be set at once, and a
+mixture of setting and unsetting such as (?i-x) is allowed, but there may be
+only one hyphen. Setting (but no unsetting) is allowed after (?^ for example
+(?^in). An option setting may appear at the start of a non-capture group, for
+example (?i:...).
+.P
+The following are recognized only at the very start of a pattern or after one
+of the newline or \eR sequences or options with similar syntax. More than one
+of them may appear. For the first three, d is a decimal number.
+.sp
+  (*LIMIT_DEPTH=d)     set the backtracking limit to d
+  (*LIMIT_HEAP=d)      set the heap size limit to d * 1024 bytes
+  (*LIMIT_MATCH=d)     set the match limit to d
+  (*CASELESS_RESTRICT) set PCRE2_EXTRA_CASELESS_RESTRICT when matching
+  (*NOTEMPTY)          set PCRE2_NOTEMPTY when matching
+  (*NOTEMPTY_ATSTART)  set PCRE2_NOTEMPTY_ATSTART when matching
+  (*NO_AUTO_POSSESS)   no auto-possessification (PCRE2_NO_AUTO_POSSESS)
+  (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR)
+  (*NO_JIT)            disable JIT optimization
+  (*NO_START_OPT)      no start-match optimization (PCRE2_NO_START_OPTIMIZE)
+  (*TURKISH_CASING)    set PCRE2_EXTRA_TURKISH_CASING when matching
+  (*UTF)               set appropriate UTF mode for the library in use
+  (*UCP)               set PCRE2_UCP (use Unicode properties for \ed etc)
+.sp
+Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the value of
+the limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP,
+not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
+application can lock out the use of (*UTF) and (*UCP) by setting the
+PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
+.
+.
+.SH "NEWLINE CONVENTION"
+.rs
+.sp
+These are recognized only at the very start of the pattern or after option
+settings with a similar syntax.
+.sp
+  (*CR)           carriage return only
+  (*LF)           linefeed only
+  (*CRLF)         carriage return followed by linefeed
+  (*ANYCRLF)      all three of the above
+  (*ANY)          any Unicode newline sequence
+  (*NUL)          the NUL character (binary zero)
+.
+.
+.SH "WHAT \eR MATCHES"
+.rs
+.sp
+These are recognized only at the very start of the pattern or after option
+setting with a similar syntax.
+.sp
+  (*BSR_ANYCRLF)  CR, LF, or CRLF
+  (*BSR_UNICODE)  any Unicode newline sequence
+.
+.
+.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
+.rs
+.sp
+  (?=...)                     )
+  (*pla:...)                  ) positive lookahead
+  (*positive_lookahead:...)   )
+.sp
+  (?!...)                     )
+  (*nla:...)                  ) negative lookahead
+  (*negative_lookahead:...)   )
+.sp
+  (?<=...)                    )
+  (*plb:...)                  ) positive lookbehind
+  (*positive_lookbehind:...)  )
+.sp
+  (?<!...)                    )
+  (*nlb:...)                  ) negative lookbehind
+  (*negative_lookbehind:...)  )
+.sp
+Each top-level branch of a lookbehind must have a limit for the number of
+characters it matches. If any branch can match a variable number of characters,
+the maximum for each branch is limited to a value set by the caller of
+\fBpcre2_compile()\fP or defaulted. The default is set when PCRE2 is built
+(ultimate default 255). If every branch matches a fixed number of characters,
+the limit for each branch is 65535 characters.
+.
+.
+.SH "NON-ATOMIC LOOKAROUND ASSERTIONS"
+.rs
+.sp
+These assertions are specific to PCRE2 and are not Perl-compatible.
+.sp
+  (?*...)                                )
+  (*napla:...)                           ) synonyms
+  (*non_atomic_positive_lookahead:...)   )
+.sp
+  (?<*...)                               )
+  (*naplb:...)                           ) synonyms
+  (*non_atomic_positive_lookbehind:...)  )
+.
+.
+.SH "SUBSTRING SCAN ASSERTION"
+.rs
+This feature is not Perl-compatible.
+.sp
+  (*scan_substring:(grouplist)...)  scan captured substring
+  (*scs:(grouplist)...)             scan captured substring
+.sp
+The comma-separated list may identify groups in any of the following ways:
+.sp
+  n       absolute reference
+  +n      relative reference
+  -n      relative reference
+  <name>  name
+  'name'  name
+.sp
+.
+.
+.SH "SCRIPT RUNS"
+.rs
+.sp
+  (*script_run:...)           ) script run, can be backtracked into
+  (*sr:...)                   )
+.sp
+  (*atomic_script_run:...)    ) atomic script run
+  (*asr:...)                  )
+.
+.
+.SH "BACKREFERENCES"
+.rs
+.sp
+  \en              reference by number (can be ambiguous)
+  \egn             reference by number
+  \eg{n}           reference by number
+  \eg+n            relative reference by number (PCRE2 extension)
+  \eg-n            relative reference by number
+  \eg{+n}          relative reference by number (PCRE2 extension)
+  \eg{-n}          relative reference by number
+  \ek<name>        reference by name (Perl)
+  \ek'name'        reference by name (Perl)
+  \eg{name}        reference by name (Perl)
+  \ek{name}        reference by name (.NET)
+  (?P=name)       reference by name (Python)
+.
+.
+.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
+.rs
+.sp
+  (?R)            recurse whole pattern
+  (?n)            call subroutine by absolute number
+  (?+n)           call subroutine by relative number
+  (?-n)           call subroutine by relative number
+  (?&name)        call subroutine by name (Perl)
+  (?P>name)       call subroutine by name (Python)
+  \eg<name>        call subroutine by name (Oniguruma)
+  \eg'name'        call subroutine by name (Oniguruma)
+  \eg<n>           call subroutine by absolute number (Oniguruma)
+  \eg'n'           call subroutine by absolute number (Oniguruma)
+  \eg<+n>          call subroutine by relative number (PCRE2 extension)
+  \eg'+n'          call subroutine by relative number (PCRE2 extension)
+  \eg<-n>          call subroutine by relative number (PCRE2 extension)
+  \eg'-n'          call subroutine by relative number (PCRE2 extension)
+.
+.
+.SH "CONDITIONAL PATTERNS"
+.rs
+.sp
+  (?(condition)yes-pattern)
+  (?(condition)yes-pattern|no-pattern)
+.sp
+  (?(n)               absolute reference condition
+  (?(+n)              relative reference condition (PCRE2 extension)
+  (?(-n)              relative reference condition (PCRE2 extension)
+  (?(<name>)          named reference condition (Perl)
+  (?('name')          named reference condition (Perl)
+  (?(name)            named reference condition (PCRE2, deprecated)
+  (?(R)               overall recursion condition
+  (?(Rn)              specific numbered group recursion condition
+  (?(R&name)          specific named group recursion condition
+  (?(DEFINE)          define groups for reference
+  (?(VERSION[>]=n.m)  test PCRE2 version
+  (?(assert)          assertion condition
+.sp
+Note the ambiguity of (?(R) and (?(Rn) which might be named reference
+conditions or recursion tests. Such a condition is interpreted as a reference
+condition if the relevant named group exists.
+.
+.
+.SH "BACKTRACKING CONTROL"
+.rs
+.sp
+All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
+name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
+if :NAME is present. The others just set a name for passing back to the caller,
+but this is not a name that (*SKIP) can see. The following act immediately they
+are reached:
+.sp
+  (*ACCEPT)       force successful match
+  (*FAIL)         force backtrack; synonym (*F)
+  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
+.sp
+The following act only when a subsequent match failure causes a backtrack to
+reach them. They all force a match failure, but they differ in what happens
+afterwards. Those that advance the start-of-match point do so only if the
+pattern is not anchored.
+.sp
+  (*COMMIT)       overall failure, no advance of starting point
+  (*PRUNE)        advance to next starting character
+  (*SKIP)         advance to current matching position
+  (*SKIP:NAME)    advance to position corresponding to an earlier
+                  (*MARK:NAME); if not found, the (*SKIP) is ignored
+  (*THEN)         local failure, backtrack to next alternation
+.sp
+The effect of one of these verbs in a group called as a subroutine is confined
+to the subroutine call.
+.
+.
+.SH "CALLOUTS"
+.rs
+.sp
+  (?C)            callout (assumed number 0)
+  (?Cn)           callout with numerical data n
+  (?C"text")      callout with string data
+.sp
+The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
+start and the end), and the starting delimiter { matched with the ending
+delimiter }. To encode the ending delimiter within the string, double it.
+.
+.
+.SH "REPLACEMENT STRINGS"
+.rs
+.sp
+If the PCRE2_SUBSTITUTE_LITERAL option is set, a replacement string for
+\fBpcre2_substitute()\fP is not interpreted. Otherwise, by default, the only
+special character is the dollar character in one of the following forms:
+.sp
+  $$                  insert a dollar character
+  $n or ${n}          insert the contents of group \fIn\fP
+  $<name>             insert the contents of named group
+  $0 or $&            insert the entire matched substring
+  $`                  insert the substring that precedes the match
+  $'                  insert the substring that follows the match
+  $_                  insert the entire input string
+  $*MARK or ${*MARK}  insert a control verb name
+.sp
+For ${n}, n can be a name or a number. If PCRE2_SUBSTITUTE_EXTENDED is set,
+there is additional interpretation:
+.P
+1. Backslash is an escape character, and the forms described in "ESCAPED
+CHARACTERS" above are recognized. Also:
+.sp
+  \eQ...\eE   can be used to suppress interpretation
+  \el        force the next character to lower case
+  \eu        force the next character to upper case
+  \eL        force subsequent characters to lower case
+  \eU        force subsequent characters to upper case
+  \eu\eL      force next character to upper case, then all lower
+  \el\eU      force next character to lower case, then all upper
+  \eE        end \eL or \eU case forcing
+  \eb        backspace character (note: as in character class in pattern)
+  \ev        vertical tab character (note: not the same as in a pattern)
+.sp
+2. The Python form \eg<n>, where the angle brackets are part of the syntax and
+\fIn\fP is either a group name or a number, is recognized as an alternative way
+of inserting the contents of a group, for example \eg<3>.
+.P
+3. Capture substitution supports the following additional forms:
+.sp
+  ${n:-string}             default for unset group
+  ${n:+string1:string2}    values for set/unset group
+.sp
+The substitution strings themselves are expanded. Backslash can be used to
+escape colons and closing curly brackets.
+.
+.
+.SH "SEE ALSO"
+.rs
+.sp
+\fBpcre2pattern\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3),
+\fBpcre2matching\fP(3), \fBpcre2\fP(3).
+.
+.
+.SH AUTHOR
+.rs
+.sp
+.nf
+Philip Hazel
+Retired from University Computing Service
+Cambridge, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 27 November 2024
+Copyright (c) 1997-2024 University of Cambridge.
+.fi