Name : perl-Regexp-Grammars
Version : 1.58.0
Vendor : obs://build_opensuse_org/devel:languages:perl
Release : 1.1
Date : 2022-09-16 05:08:00
Group : Unspecified
Source RPM : perl-Regexp-Grammars-1.58.0-1.1.src.rpm
Size : 0.29 MB
Packager : (none)
Summary : Add grammatical parsing features to Perl 5.10 regexes
Description :
This module adds a small number of new regex constructs that can be used within Perl 5.10 patterns to implement complete recursive-descent parsing.
Perl 5.10 already supports recursive-descent _matching_, via the new '(?<name>...)' and '(?&name)' constructs. For example, here is a simple matcher for a subset of the LaTeX markup language:
    $matcher = qr{
        (?&File)
        (?(DEFINE)
            (?<File>     (?&Element)* )
            (?<Element>  \s* (?&Command) | \s* (?&Literal) )
            (?<Command>  \\ \s* (?&Literal) \s* (?&Options)? \s* (?&Args)? )
            (?<Options>  \[ \s* (?:(?&Option) (?:\s*,\s* (?&Option) )*)? \s* \])
            (?<Args>     \{ \s* (?&Element)* \s* \} )
            (?<Option>   \s* [^][\$&%#_{}~^\s,]+ )
            (?<Literal>  \s* [^][\$&%#_{}~^\s]+ )
        )
    }xms
This technique makes it possible to use regexes to recognize complex, hierarchical--and even recursive--textual structures. The problem is that Perl 5.10 doesn't provide any support for extracting that hierarchical data into nested data structures. In other words, using Perl 5.10 you can _match_ complex data, but not _parse_ it into an internally useful form.
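The limitation can be seen by running the matcher above with core Perl alone. The following is a minimal sketch, not part of the module; the sample text and the variable names `$text`, `$ok`, and `$span` are illustrative assumptions. It shows that a successful match reports only whether, and how much of, the input matched.

```perl
use strict;
use warnings;

# The LaTeX-subset matcher from the description (core Perl 5.10+ syntax only).
my $matcher = qr{
    (?&File)
    (?(DEFINE)
        (?<File>     (?&Element)* )
        (?<Element>  \s* (?&Command) | \s* (?&Literal) )
        (?<Command>  \\ \s* (?&Literal) \s* (?&Options)? \s* (?&Args)? )
        (?<Options>  \[ \s* (?:(?&Option) (?:\s*,\s* (?&Option) )*)? \s* \])
        (?<Args>     \{ \s* (?&Element)* \s* \} )
        (?<Option>   \s* [^][\$&%#_{}~^\s,]+ )
        (?<Literal>  \s* [^][\$&%#_{}~^\s]+ )
    )
}xms;

# Illustrative input (single-quoted so the backslashes stay literal).
my $text = '\documentclass[a4paper,11pt]{article} \author{D. Conway}';

# The match yields a true/false result and the extent of the matched text.
my $ok   = $text =~ $matcher;
my $span = $ok ? $+[0] - $-[0] : 0;

printf "matched: %s, characters consumed: %d of %d\n",
    ($ok ? 'yes' : 'no'), $span, length $text;
```

Nothing in this result says which part of the text was a command, an option, or an argument; recovering that structure would require additional hand-written code.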
An additional problem when using Perl 5.10 regexes to match complex data formats is that you have to make sure you remember to insert whitespace-matching constructs (such as '\s*') at every possible position where the data might contain ignorable whitespace. This reduces the readability of such patterns, and increases the chance of errors (typically caused by overlooking a location where whitespace might appear).
The Regexp::Grammars module solves both those problems.
If you import the module into a particular lexical scope, it preprocesses any regex in that scope, so as to implement a number of extensions to the standard Perl 5.10 regex syntax. These extensions simplify the task of defining and calling subrules within a grammar, and allow those subrule calls to capture and retain the components they match in a proper hierarchical manner.
For example, the above LaTeX matcher could be converted to a full LaTeX parser (and considerably tidied up at the same time), like so:
    use Regexp::Grammars;
    $parser = qr{
        <File>

        <rule: File>       <[Element]>*
        <rule: Element>    <Command> | <Literal>
        <rule: Command>    \\  <Literal>  <Options>?  <Args>?
        <rule: Options>    \[  <[Option]>+ % (,)  \]
        <rule: Args>       \{  <[Element]>*  \}
        <rule: Option>     [^][\$&%#_{}~^\s,]+
        <rule: Literal>    [^][\$&%#_{}~^\s]+
    }xms
Note that there is no need to explicitly place '\s*' subpatterns throughout the rules; that is taken care of automatically.
If the Regexp::Grammars version of this regex were successfully matched against some appropriate LaTeX document, each rule would call the subrules specified within it, and then return a hash containing whatever result each of those subrules returned, with each result indexed by the subrule's name.
That is, if the rule named 'Command' were invoked, it would first try to match a backslash, then it would call the three subrules '<Literal>', '<Options>', and '<Args>' (in that sequence). If they all matched successfully, the 'Command' rule would then return a hash with three keys: 'Literal', 'Options', and 'Args'. The value for each of those hash entries would be whatever result-hash the subrules themselves had returned when matched.
In this way, each level of the hierarchical regex can generate hashes recording everything its own subrules matched, so when the entire pattern matches, it produces a tree of nested hashes that represent the structured data the pattern matched.
For example, if the previous regex grammar were matched against a string containing:
    \documentclass[a4paper,11pt]{article}
    \author{D. Conway}
it would automatically extract a data structure equivalent to the following (but with several extra "empty" keys, which are described in Subrule results):
    {
        'file' => {
            'element' => [
                {
                    'command' => {
                        'literal' => 'documentclass',
                        'options' => {
                            'option' => [ 'a4paper', '11pt' ],
                        },
                        'args' => {
                            'element' => [ 'article' ],
                        }
                    }
                },
                {
                    'command' => {
                        'literal' => 'author',
                        'args' => {
                            'element' => [
                                { 'literal' => 'D.', },
                                { 'literal' => 'Conway', }
                            ]
                        }
                    }
                }
            ]
        }
    }
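Once the data is in nested hashes and arrays, ordinary Perl code can walk it. The sketch below uses only core Perl; the `%tree` literal is copied from the structure shown above (with the keys exactly as printed there), and the helper `command_names` is a hypothetical name introduced for illustration, not something the module provides.

```perl
use strict;
use warnings;

# Sample tree, copied from the structure shown in the description.
my %tree = (
    file => {
        element => [
            { command => {
                literal => 'documentclass',
                options => { option => [ 'a4paper', '11pt' ] },
                args    => { element => [ 'article' ] },
            } },
            { command => {
                literal => 'author',
                args    => { element => [ { literal => 'D.' }, { literal => 'Conway' } ] },
            } },
        ],
    },
);

# Hypothetical helper: recursively collect the name of every command node.
sub command_names {
    my ($node) = @_;
    return () unless ref $node;                                   # plain strings are leaves
    return map { command_names($_) } @$node if ref $node eq 'ARRAY';

    my @names;
    push @names, $node->{command}{literal} if exists $node->{command};
    push @names, map { command_names($node->{$_}) } sort keys %$node;
    return @names;
}

my @commands = command_names(\%tree);
print join(', ', @commands), "\n";    # prints "documentclass, author"
```

The same traversal would need no regex knowledge at all, which is the practical benefit of parsing into a tree rather than merely matching.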
The data structure that Regexp::Grammars produces from a regex match is available to the surrounding program in the magic variable '%/'.
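A minimal sketch of reading that variable follows. It assumes the Regexp::Grammars module (the module this package provides) is installed; the grammar is the one from the description, while the sample text and the names `$text`, `$matched`, and `%tree` are illustrative. Copying '%/' straight after the match is a cautious habit here, on the assumption that a later grammar match would replace its contents.

```perl
use strict;
use warnings;
use Regexp::Grammars;    # must be installed; preprocesses regexes in this scope
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump

# The LaTeX-subset grammar from the description.
my $parser = qr{
    <File>

    <rule: File>       <[Element]>*
    <rule: Element>    <Command> | <Literal>
    <rule: Command>    \\  <Literal>  <Options>?  <Args>?
    <rule: Options>    \[  <[Option]>+ % (,)  \]
    <rule: Args>       \{  <[Element]>*  \}
    <rule: Option>     [^][\$&%#_{}~^\s,]+
    <rule: Literal>    [^][\$&%#_{}~^\s]+
}xms;

# Illustrative input (single-quoted so the backslashes stay literal).
my $text = '\documentclass[a4paper,11pt]{article} \author{D. Conway}';

my %tree;
my $matched = $text =~ $parser;
%tree = %/ if $matched;         # take a copy of the result hash right away

print Dumper(\%tree);           # nested hashes, keyed by subrule name
```

Dumping the tree is a convenient way to see the extra "empty" keys mentioned above before writing code that depends on the exact shape.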
Regexp::Grammars provides many features that simplify the extraction of hierarchical data via a regex match, and also some features that can simplify the processing of that data once it has been extracted. The following sections explain each of those features, and some of the parsing techniques they support.
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl/openSUSE_Tumbleweed/noarch