perl-Text-Ngram RPM build for openSUSE.

Name : perl-Text-Ngram
Version : 0.13
Vendor : obs://build_opensuse_org/devel:languages:perl
Release : 4.6
Date : 2014-07-17 03:23:03
Group : Development/Libraries/Perl
Source RPM : perl-Text-Ngram-0.13-4.6.src.rpm
Size : 0.03 MB
Packager : (none)
Summary : Ngram analysis of text
Description :
n-Gram analysis is a field in textual analysis which uses sliding window
character sequences in order to aid topic analysis, language determination
and so on. The n-gram spectrum of a document can be used to compare and
filter documents in multiple languages, prepare word prediction networks,
and perform spelling correction.

The neat thing about n-grams, though, is that they're really easy to
determine. For n=3, for instance, we compute the n-gram counts like so:

the cat sat on the mat
---                     $counts{"the"}++;
 ---                    $counts{"he "}++;
  ---                   $counts{"e c"}++;
...

This module provides an efficient XS-based implementation of n-gram
spectrum analysis.
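
The sliding window described above can be sketched in a few lines of plain Perl. This is only an illustration of the counting idea, not the module's XS implementation; the helper name sliding_ngram_counts is hypothetical, and the sketch does none of the lowercasing or punctuation handling that the module's options control.

```perl
use strict;
use warnings;

# Plain-Perl sketch of sliding-window n-gram counting.
# sliding_ngram_counts is a hypothetical helper, not part of Text::Ngram.
sub sliding_ngram_counts {
    my ($text, $n) = @_;
    my %counts;
    # Slide an $n-character window over the text one position at a time.
    # If the text is shorter than $n, the range is empty and nothing is counted.
    for my $i (0 .. length($text) - $n) {
        $counts{ substr($text, $i, $n) }++;
    }
    return \%counts;
}

my $counts = sliding_ngram_counts("the cat sat on the mat", 3);
printf "%-5s %d\n", "'$_'", $counts->{$_} for sort keys %$counts;
```

A text of length L yields L - n + 1 windows, so the counts in the histogram always sum to that number.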

There are two functions which can be imported:

ngram_counts
This first function returns a hash reference with the n-gram histogram
of the text for the given window size. The default window size is 5.

$href = ngram_counts(\%config, $text, $window_size);

The only necessary parameter is $text.

The possible values for \%config are:

flankbreaks
If set to 1 (default), breaks are flanked by spaces; if set to 0,
they're not. Breaks are punctuation and other non-alphabetic
characters, which, unless you use 'punctuation => 1' in your
configuration, do not make it into the returned hash.

Here's an example, supposing you're using the default value for
punctuation (0):

my $text = "Hello, world";
my $hash = ngram_counts($text, 5);

That produces the following ngrams:

{
    'Hello' => 1,
    'ello ' => 1,
    ' worl' => 1,
    'world' => 1,
}

On the other hand, this:

my $text = "Hello, world";
my $hash = ngram_counts({flankbreaks => 0}, $text, 5);

produces the following ngrams:

{
    'Hello' => 1,
    ' worl' => 1,
    'world' => 1,
}
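
One way to read the two listings above is that a run of punctuation acts as a break that no window may cross, and that flanking adds a space on each side of the break. The sketch below follows that reading. It is inferred from the documented output only, not from the module's source; ngrams_with_breaks is a hypothetical helper, and case is left untouched to match the listings.

```perl
use strict;
use warnings;

# Sketch of break handling, inferred only from the documented output.
# ngrams_with_breaks is a hypothetical helper, not Text::Ngram's XS code.
sub ngrams_with_breaks {
    my ($text, $n, $flankbreaks) = @_;
    # Assumption: a break is a run of characters that are neither
    # letters nor whitespace, and windows never span a break.
    my @segments = split /[^\p{L}\s]+/, $text;
    my %counts;
    for my $i (0 .. $#segments) {
        my $seg = $segments[$i];
        if ($flankbreaks) {
            # Add a space on each side of the segment that touches a break,
            # then collapse runs of whitespace into a single space.
            $seg = " $seg" if $i > 0;
            $seg = "$seg " if $i < $#segments;
            $seg =~ s/\s+/ /g;
        }
        for my $j (0 .. length($seg) - $n) {
            $counts{ substr($seg, $j, $n) }++;
        }
    }
    return \%counts;
}

for my $flank (1, 0) {
    my $counts = ngrams_with_breaks("Hello, world", 5, $flank);
    print "flankbreaks => $flank: ",
        join(", ", map { "'$_'" } sort keys %$counts), "\n";
}
```

With flanking on, the segment before the comma gains a trailing space, which is where the extra 'ello ' entry comes from.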

lowercase
If set to 0, casing is preserved. If set to 1, all letters are
lowercased before counting ngrams. Default is 1.


$href_p = ngram_counts( {lowercase => 0}, $text, 4 );

punctuation
If set to 0 (default), punctuation is removed before calculating
the ngrams. Set to 1 to preserve it.


$href_p = ngram_counts( {punctuation => 1}, $text, 2 );

spaces
If set to 0 (default is 1), no ngrams containing spaces will be
returned.


$href = ngram_counts( {spaces => 0}, $text, 3);

If you're going to request both types of ngrams, then the best way
to avoid calculating the same thing twice is probably this:

$href_with_spaces = ngram_counts($text, $window);   # $window is optional
$href_no_spaces = { %$href_with_spaces };           # copy the hash, not just the reference
for (keys %$href_no_spaces) { delete $href_no_spaces->{$_} if / / }
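
A non-destructive alternative is to build the space-free histogram with grep instead of deleting keys. The sketch below uses a hand-written sample hash in place of a real result, so the keys and counts are made up for illustration and the variable names are assumptions.

```perl
use strict;
use warnings;

# Sample histogram standing in for an n-gram count result;
# the keys and counts are invented for this illustration.
my $with_spaces = { 'the' => 2, 'he ' => 2, 'e c' => 1, 'cat' => 1 };

# Copy only the key/value pairs whose key contains no space.
# The original hash is left untouched because pairs are copied
# rather than the reference being assigned.
my %no_spaces = map  { ($_ => $with_spaces->{$_}) }
                grep { !/ / } keys %$with_spaces;

print join(", ", map { "'$_' => $no_spaces{$_}" } sort keys %no_spaces), "\n";
```

Copying pairs keeps both histograms usable afterwards, whereas assigning one hash reference to another would leave two names for the same hash.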

add_to_counts
This incrementally adds to the supplied hash; if '$window' is zero or
undefined, then the window size is computed from the hash keys.

add_to_counts($more_text, $window, $href);
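
The incremental behaviour, including the fallback to the key length when no window is given, might look roughly like this in plain Perl. The helper add_ngrams is a hypothetical stand-in and not the module's add_to_counts; it assumes every key in the hash has the same length and refuses to guess when the hash is empty.

```perl
use strict;
use warnings;

# Sketch of incremental counting into an existing histogram.
# add_ngrams is a hypothetical stand-in, not Text::Ngram's add_to_counts.
sub add_ngrams {
    my ($text, $window, $href) = @_;
    if (!$window) {
        # Window is zero or undefined: infer it from an existing key.
        # Assumption: every key in the hash has the same length.
        my ($any_key) = keys %$href;
        die "cannot infer window size from an empty hash\n"
            unless defined $any_key;
        $window = length $any_key;
    }
    # Count each window of the new text into the same hash.
    $href->{ substr($text, $_, $window) }++
        for 0 .. length($text) - $window;
    return $href;
}

my $counts = {};
add_ngrams("abcab", 3, $counts);      # explicit window size
add_ngrams("bcabc", undef, $counts);  # window size inferred from the keys
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```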

RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl/SLE_11_SP3/i586


Hmm ... It's impossible ;-) This RPM doesn't exist on any FTP server

Provides :
Ngram.so
perl(Text::Ngram)
perl-Text-Ngram

Requires :
rpmlib(CompressedFileNames) <= 3.0.4-1
perl = 5.10.0
rpmlib(VersionedDependencies) <= 3.0.3-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(PayloadIsLzma) <= 4.4.6-1
libc.so.6
libc.so.6(GLIBC_2.1.3)


Content of RPM :
/usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/Text
/usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/Text/Ngram.pm
/usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/Text
/usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/Text/Ngram
/usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/Text/Ngram/Ngram.bs
/usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/Text/Ngram/Ngram.so
/usr/share/doc/packages/perl-Text-Ngram
/usr/share/doc/packages/perl-Text-Ngram/CREDITS
/usr/share/doc/packages/perl-Text-Ngram/Changes
/usr/share/doc/packages/perl-Text-Ngram/README
/usr/share/man/man3/Text::Ngram.3pm.gz
