Name : perl-Unicode-Regex-Set
Version : 0.04
Vendor : obs://build_opensuse_org/devel:languages:perl
Release : 7.65
Date : 2024-08-05 17:21:52
Group : Development/Libraries/Perl
Source RPM : perl-Unicode-Regex-Set-0.04-7.65.src.rpm
Size : 0.01 MB
Packager : (none)
Summary : Subtraction and Intersection of Character Sets
Description :
Perl 5.8.0 lacks subtraction and intersection of character sets, which are described in Unicode Regular Expressions (UTS #18). This module provides a syntax that mimics character classes, including subtraction and intersection, by taking advantage of look-ahead assertions.
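The look-ahead idea mentioned above can be illustrated in any regex engine that supports look-ahead assertions. The following is a minimal sketch in Python's `re` module, not the module's own implementation; the names `intersection`, `subtraction`, and `members` are hypothetical, and how Unicode::Regex::Set actually composes its output is not shown in this description.

```python
import re

# Intersection of [A-Z] and [PERL]: a positive look-ahead requires the next
# character to be in the first set, then the second class consumes it.
intersection = re.compile(r"(?=[A-Z])[PERL]")

# Subtraction of [Z] from [A-Z]: a negative look-ahead rejects the removed
# characters before the base class consumes one character.
subtraction = re.compile(r"(?![Z])[A-Z]")

def members(pattern, candidates):
    """Return the characters in `candidates` that the pattern matches in full."""
    return "".join(ch for ch in candidates if pattern.fullmatch(ch))

print(members(intersection, "APXZel"))  # characters present in both sets
print(members(subtraction, "AYZz"))     # upper-case letters other than Z
```

Because a look-ahead consumes no input, the class that follows it still matches exactly one character, which is what lets the combined pattern behave like a single character set.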
The syntax provided by this module differs considerably from standard Perl regex syntax.
Any whitespace character (one that matches '/\s/') is allowed between any tokens. Square brackets ('[' and ']') are used for grouping. A literal whitespace character or square bracket must be escaped with a backslash ('\'). You cannot put a literal ']' at the start of a group.
A POSIX-style character class like '[:alpha:]' is allowed since its '[' is not a literal.
Separators ('&' for intersection, '|' for union, and '-' for subtraction) should be surrounded by one or more whitespace characters. E.g. '[A&Z]' is a list of 'A', '&', 'Z'; '[A-Z]' is a character range from 'A' to 'Z'; '[A-Z - Z]' is the set obtained by removing '[Z]' from '[A-Z]'.
The union operator '|' may be omitted. E.g. '[A-Z | a-z]' is equivalent to '[A-Z a-z]', and also to '[A-Za-z]'.
The intersection operator '&' has high precedence, so '[\p{A} \p{B} & \p{C} \p{D}]' is equivalent to '[\p{A} | [\p{B} & \p{C}] | \p{D}]'.
The subtraction operator '-' has low precedence, so '[\p{A} \p{B} - \p{C} \p{D}]' is equivalent to '[[\p{A} | \p{B}] - [\p{C} | \p{D}]]'.
'[\p{A} - \p{B} - \p{C}]' is the set obtained by removing '\p{B}' and '\p{C}' from '\p{A}'. It is equivalent to '[\p{A} - [\p{B} \p{C}]]' and '[\p{A} - \p{B} \p{C}]'.
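The precedence rules above can be checked with ordinary set arithmetic. This is only a sketch: the four small sets are hypothetical stand-ins for '\p{A}' through '\p{D}', and the parentheses are written out explicitly because Python's own set operators follow a different precedence from the module's syntax.

```python
# Hypothetical stand-ins for \p{A}, \p{B}, \p{C}, \p{D}.
A, B, C, D = set("abcd"), set("bcx"), set("cdy"), set("ez")

# '[\p{A} \p{B} & \p{C} \p{D}]': '&' binds tighter than the implicit union.
high_and = A | (B & C) | D

# '[\p{A} \p{B} - \p{C} \p{D}]': '-' binds more loosely than the implicit union.
low_minus = (A | B) - (C | D)

# '[\p{A} - \p{B} - \p{C}]': repeated subtraction removes both sets.
chained = (A - B) - C

print(sorted(high_and))
print(sorted(low_minus))
print(chained == A - (B | C))
```

Reading the first expression as '(A | B) & (C | D)' instead would give a different set for these stand-ins, which is why the grouping matters.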
Negation: when '^' comes just after a group-opening '[', i.e. when they are combined as '[^', all the tokens following are negated. E.g. '[^A-Z a-z]' matches anything that is in neither '[A-Z]' nor '[a-z]'. You can say this more clearly with grouping as '[^ [A-Z a-z]]'.
If a '^' that is not next to '[' is prefixed to a sequence of literal characters, character ranges, and/or metacharacters, that '^' negates only that sequence; e.g. '[A-Z ^\p{Latin}]' matches 'A-Z' or a non-Latin character. But '[A-Z [^\p{Latin}]]' (or '[A-Z \P{Latin}]', since this is a simple case) is recommended for clarity.
If you want to remove anything other than 'PERL' from '[A-Z]', you can use '[A-Z & PERL]' as well as '[A-Z - [^PERL]]'. Similarly, if you want to intersect '[A-Z]' with anything that is not 'JUNK', you can use '[A-Z - JUNK]' as well as '[A-Z & [^JUNK]]'.
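The two pairs of equivalent forms above follow from the fact that intersecting with a set is the same as subtracting its complement. A minimal sketch of that duality, again using Python's `re` look-aheads as an assumed stand-in for whatever the module generates (the pattern names and the `matched` helper are hypothetical):

```python
import re
import string

# [A-Z & PERL] written with a positive look-ahead ...
keep_perl_and = re.compile(r"(?=[PERL])[A-Z]")
# ... and [A-Z - [^PERL]] written with a negative look-ahead on the complement.
keep_perl_minus = re.compile(r"(?![^PERL])[A-Z]")

# [A-Z - JUNK] and [A-Z & [^JUNK]] in the same style.
drop_junk_minus = re.compile(r"(?![JUNK])[A-Z]")
drop_junk_and = re.compile(r"(?=[^JUNK])[A-Z]")

def matched(pattern):
    """Collect the printable ASCII characters the pattern matches in full."""
    return {ch for ch in string.printable if pattern.fullmatch(ch)}

print(matched(keep_perl_and) == matched(keep_perl_minus))
print(matched(drop_junk_minus) == matched(drop_junk_and))
```

The check covers printable ASCII only, which is enough here because every set involved is a subset of '[A-Z]'.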
For further examples, please see tests.
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl:/CPAN-U/openSUSE_Tumbleweed/noarch