Name : perl-Text-Scraper
Version : 0.02
Vendor : obs://build.opensuse.org/devel:languages:perl
Release : lp155.6.1
Date : 2023-07-20 18:34:15
Group : Development/Libraries/Perl
Source RPM : perl-Text-Scraper-0.02-lp155.6.1.src.rpm
Size : 0.03 MB
Packager : https://www.suse.com/
Summary : Structured data from (un)structured text
Description :
Template Tags are classed as _Leaves_ or _Branches_. Like XML, Branches must have an associated closing tag, Leaves must not. By default, Leaf nodes return SCALARs and Branch nodes return ARRAYs of HASHes - each array element mapping to a matched sub-sequence. Blessing or filtering this data is left as an exercise for subclasses.
The default syntax is based on the XML processing-instruction syntax:
<?tmpl TYPE NAME [ATTRIBUTES] ?>
and for Branches:
<?tmpl TYPE NAME [ATTRIBUTES] ?> ... <?tmpl end NAME ?>
By default, Tags _must_ be named and any closing tag _must_ include the name of the opening tag it is closing. Attributes have the same syntax as XML attributes - but (similar to Perl regular expressions) can use any non-bracket punctuation character as quotation delimiters:
<?tmpl var foo bar="baz" blah=/But dont "quote" me on that!/ ?>
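The delimiter rule can be illustrated with a small sketch. This is not the module's own parser: it is a hypothetical Python approximation (Python's `re` standing in for Perl regexes), and the names `ATTRIBUTE` and `parse_attributes` are invented for illustration. It assumes a delimiter is a single punctuation character that is neither a bracket nor a word character, and that the closing delimiter is the same character as the opening one.

```python
import re

# Hypothetical attribute pattern: NAME=<delim>VALUE<delim>, where <delim> is any
# single character that is not a word character, whitespace or a bracket.
# The backreference \2 requires the closing delimiter to match the opening one.
ATTRIBUTE = re.compile(r"(\w+)=([^\w\s\[\](){}<>])(.*?)\2")

def parse_attributes(tag_body):
    """Return a dict of attribute names to their unquoted values."""
    return {m.group(1): m.group(3) for m in ATTRIBUTE.finditer(tag_body)}

# The body of the example tag above, without the surrounding <?tmpl ... ?>.
attrs = parse_attributes('var foo bar="baz" blah=/But dont "quote" me on that!/')
print(attrs)
```

Because the second attribute is delimited by slashes, the double quotes inside it are treated as ordinary characters.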
The only attribute acted on by the default tag classes is 'regex' - used to refine how the Tag is translated into a regular-expression capture group:
<?tmpl var naiveEmailAddress regex="([\w\d\.]+\@[\w\d\.]+)" ?>
This can be used to further filter the parsed data - similar to using grep:
<?tmpl var onlyFoocomEmailAddresses regex="([\w\d\.]+@(?:foo\.com))" ?>
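To see how the narrower pattern acts as a filter, the two expressions can be tried outside the module. The sketch below uses Python's `re` as a stand-in for Perl regexes and assumes the patterns are meant to match a literal "@"; the sample text and variable names are hypothetical.

```python
import re

# Patterns mirroring the two 'regex' attributes above (assumes a literal "@").
NAIVE_EMAIL = re.compile(r"([\w\d\.]+@[\w\d\.]+)")
FOO_COM_ONLY = re.compile(r"([\w\d\.]+@(?:foo\.com))")

# Hypothetical sample text, not taken from the module's documentation.
text = "Contact alice@foo.com, bob@example.org or carol.smith@foo.com today."

all_addresses = NAIVE_EMAIL.findall(text)   # every address-like token
foo_addresses = FOO_COM_ONLY.findall(text)  # only addresses at foo.com

print(all_addresses)
print(foo_addresses)
```

The second pattern simply fails to match addresses on other domains, so they never reach the captured data.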
Each tag should create _only one_ capture group - but it is fine to make the outer group non-capturing:
<?tmpl var dateJustMonth regex="(?:\d+ (\S+) \d+)" ?>
_The above would capture only the month field in dates formatted as_ '02 July 1979'.
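The single-capture-group rule can be checked directly on the date pattern. This is a minimal sketch using Python's `re` in place of Perl; the sample sentence and variable names are hypothetical.

```python
import re

# The outer group is non-capturing, so the pattern has exactly one capture group.
DATE_MONTH = re.compile(r"(?:\d+ (\S+) \d+)")

# Hypothetical input containing a date in the '02 July 1979' format.
match = DATE_MONTH.search("Born on 02 July 1979 in Leeds.")

whole_date = match.group(0)  # the full text matched by the pattern
month = match.group(1)       # the only captured field

print(DATE_MONTH.groups, whole_date, month)
```

The whole date has to match for the tag to apply, but only the inner group contributes to the captured value.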
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl:/CPAN-T/15.5/noarch