perl-Regexp-Ignore RPM build for openSUSE.

Name : perl-Regexp-Ignore
Version : 0.03
Vendor : obs://build.opensuse.org/devel:languages:perl
Release : lp154.6.1
Date : 2023-01-27 17:19:55
Group : Development/Libraries/Perl
Source RPM : perl-Regexp-Ignore-0.03-lp154.6.1.src.rpm
Size : 0.06 MB
Packager : https://www.suse.com/
Summary : Let us ignore unwanted parts, while parsing text.
Description :
Markup languages, like HTML, are difficult to parse. The reason is that you
can have a line like:

<font size=+1>H</font>ello <font size=+1>W</font>orld

How can we find the string "Hello World" in the above line and replace it
with "Hello Universe" (which is a lot deeper)? Or how can we run a spell
checker on the text and replace the mistakes with suggestions for the
correct spelling?

This module helps you do exactly that.

The module lets you first split the text into the parts you are
interested in and the unwanted parts. For example, all the HTML tags can be
treated as unwanted parts.

Then it lets you parse the parts you are interested in (while totally
ignoring the unwanted parts).

In the end it lets you merge the unwanted parts back with the possibly
changed parts you were interested in.
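
The split and merge steps described above can be sketched in a few lines.
This is an illustration in Python, not the module's Perl code: the names
split_tokens, cleaned and merge are hypothetical, and the sketch assumes
that anything between "<" and ">" is an unwanted HTML tag.

```python
import re

# Assumption: every "<...>" run is an unwanted HTML tag.
TAG = re.compile(r"<[^>]*>")

def split_tokens(text):
    """Return (chunk, is_wanted) pairs that together cover the whole text."""
    tokens = []
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            tokens.append((text[pos:m.start()], True))   # text between tags
        tokens.append((m.group(), False))                # the tag itself
        pos = m.end()
    if pos < len(text):
        tokens.append((text[pos:], True))                # trailing text
    return tokens

def cleaned(tokens):
    """Join only the wanted chunks."""
    return "".join(chunk for chunk, wanted in tokens if wanted)

def merge(tokens):
    """Join every chunk in order, which restores the original text."""
    return "".join(chunk for chunk, _ in tokens)

html = "<font size=+1>H</font>ello <font size=+1>W</font>orld"
tokens = split_tokens(html)
print(cleaned(tokens))        # Hello World
print(merge(tokens) == html)  # True
```

With no changes made to the wanted chunks, merging simply reproduces the
input. The interesting case is merging after the cleaned text has changed.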

There is just one catch. The module assumes that when you replace the
above "Hello World" with "Hello Universe", all the unwanted parts between
the start of the match and the end of the match are pushed after the text
that replaces the match. This is easier to understand with an example:

The text:

<font size=+1>H</font>ello <font size=+1>W</font>orld

will be first split and we will get the "cleaned" text:

Hello World

Then we can parse it using something like:

s/Hello World/Hello Universe/;

This will give us the changed "cleaned" text:

Hello Universe

When we merge it with the unwanted parts, we get:

<font size=+1>Hello Universe</font><font size=+1></font>

So, the unwanted parts inside the match were pushed after the replacer.
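
The behaviour in this example can be reproduced with a small Python sketch.
It is not the module's implementation; split_text, replace_first and
merge_text are hypothetical names. The sketch assumes that unwanted parts
are "<...>" tags, that each tag is remembered by its offset in the cleaned
text, and that the replacement is a literal string (no backreferences).

```python
import re

TAG = re.compile(r"<[^>]*>")  # assumption: tags are the unwanted parts

def split_text(text):
    """Return (cleaned, unwanted); unwanted holds (offset_in_cleaned, tag)."""
    pieces, unwanted, pos, length = [], [], 0, 0
    for m in TAG.finditer(text):
        piece = text[pos:m.start()]
        pieces.append(piece)
        length += len(piece)
        unwanted.append((length, m.group()))
        pos = m.end()
    pieces.append(text[pos:])
    return "".join(pieces), unwanted

def replace_first(cleaned, unwanted, pattern, replacement):
    """Replace the first match; push tags inside the match after the replacer."""
    m = re.search(pattern, cleaned)
    if m is None:
        return cleaned, unwanted
    start, end = m.span()
    new_end = start + len(replacement)
    moved = []
    for offset, tag in unwanted:
        if offset <= start:
            moved.append((offset, tag))                  # before the match
        elif offset < end:
            moved.append((new_end, tag))                 # inside: push after
        else:
            moved.append((offset - end + new_end, tag))  # after: shift
    return cleaned[:start] + replacement + cleaned[end:], moved

def merge_text(cleaned, unwanted):
    """Re-insert each tag at its recorded offset in the cleaned text."""
    out, pos = [], 0
    for offset, tag in unwanted:
        out.append(cleaned[pos:offset])
        out.append(tag)
        pos = offset
    out.append(cleaned[pos:])
    return "".join(out)

clean, tags = split_text("<font size=+1>H</font>ello <font size=+1>W</font>orld")
clean, tags = replace_first(clean, tags, r"Hello World", "Hello Universe")
print(merge_text(clean, tags))
# <font size=+1>Hello Universe</font><font size=+1></font>
```

The first tag sits exactly at the start of the match, so it stays in front
of the replacer; the other three lie inside the match or at its end and are
pushed after it.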

Why this assumption?

Because I could not find a better one. I cannot guess what the unwanted
parts in a match will be, and the replacer of the match might be longer or
shorter than the match itself. So, in fact, we have three reasonable
possibilities:

1. Push the unwanted parts before the replacer.
2. Push the unwanted parts after the replacer.
3. Spread the unwanted parts in the replacer in the same proportions that
   they are spread in the match.

I chose the second option. It is very similar to the first, and far
simpler (to implement and to use) than the third.

As the example above shows, this usually does not break the markup. It
might, though, give some surprises: in the example above, all of
"Hello Universe" ends up in the bigger font.

All in all, I believe that it is a big help when parsing formatted text.

Now that we know what the module can do for us, let's see how to use it.

The class Regexp::Ignore is an abstract class: it has one method,
*get_tokens*, that is not implemented. So the user of this class must
inherit from it and implement the *get_tokens* method. The *get_tokens*
method splits the text into tokens and marks them "wanted" or "unwanted".
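
The abstract-class pattern described above can be illustrated with a Python
analogue. This is only a sketch of the design, not the Perl class: the
names IgnoreBase, IgnoreFootnoteMarks and cleaned_text are hypothetical, as
is the choice of footnote marks such as "[1]" as the unwanted parts.

```python
import re
from abc import ABC, abstractmethod

class IgnoreBase(ABC):
    """Hypothetical base class: subclasses decide what counts as unwanted."""

    def __init__(self, text):
        self.text = text

    @abstractmethod
    def get_tokens(self):
        """Return (chunk, is_wanted) pairs that cover self.text in order."""

    def cleaned_text(self):
        """Join the wanted chunks, using whatever tokenizer the subclass gives."""
        return "".join(chunk for chunk, wanted in self.get_tokens() if wanted)

class IgnoreFootnoteMarks(IgnoreBase):
    """Example subclass: marks like "[1]" are unwanted, the rest is wanted."""

    MARK = re.compile(r"\[\d+\]")

    def get_tokens(self):
        tokens, pos = [], 0
        for m in self.MARK.finditer(self.text):
            if m.start() > pos:
                tokens.append((self.text[pos:m.start()], True))
            tokens.append((m.group(), False))
            pos = m.end()
        if pos < len(self.text):
            tokens.append((self.text[pos:], True))
        return tokens

print(IgnoreFootnoteMarks("Hello[1] World[2]").cleaned_text())  # Hello World
```

The base class never needs to know what "unwanted" means; only the
tokenizer in the subclass changes from one text format to another.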

Don't panic - it might sound very difficult, but it is not. Moreover, the
module comes with some classes that already inherit from Regexp::Ignore,
and you can use them. For more details about implementing the *get_tokens*
method and an implementation example, see the Regexp::Ignore manual page.

After we have the inherited class that implements the *get_tokens* method,
and we call *split* to split the text, we can go on with our parsing as
shown in the module's SYNOPSIS. We can use the method *s*, which parallels
Perl's s/// operator, and if we need more complex text manipulation, we can
replace text directly using the *replace* method.

When we have finished changing the text, we can call the *merge* method,
which builds the resulting text from the changed "cleaned" text and the
unwanted parts.

RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl:/CPAN-R/15.4/noarch


Download
ftp.icm.edu.pl  perl-Regexp-Ignore-0.03-lp154.6.1.noarch.rpm
     

Provides :
perl(Regexp::Ignore)
perl(Regexp::IgnoreHTML)
perl(Regexp::IgnoreTextCharacteristicsHTML)
perl-Regexp-Ignore

Requires :
perl(:MODULE_COMPAT_5.26.1)
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(PayloadIsXz) <= 5.2-1


Content of RPM :
/usr/lib/perl5/vendor_perl/5.26.1/Regexp
/usr/lib/perl5/vendor_perl/5.26.1/Regexp/Ignore.pm
/usr/lib/perl5/vendor_perl/5.26.1/Regexp/IgnoreHTML.pm
/usr/lib/perl5/vendor_perl/5.26.1/Regexp/IgnoreTextCharacteristicsHTML.pm
/usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi
/usr/share/doc/packages/perl-Regexp-Ignore
/usr/share/doc/packages/perl-Regexp-Ignore/Changes
/usr/share/doc/packages/perl-Regexp-Ignore/README
/usr/share/doc/packages/perl-Regexp-Ignore/examples
/usr/share/doc/packages/perl-Regexp-Ignore/examples/speller.pl
/usr/share/man/man3/Regexp::Ignore.3pm.gz
/usr/share/man/man3/Regexp::IgnoreHTML.3pm.gz
/usr/share/man/man3/Regexp::IgnoreTextCharacteristicsHTML.3pm.gz
