perl-String-Trigram RPM build for openSUSE Tumbleweed.

Name : perl-String-Trigram
Version : 0.12
Vendor : obs://build_opensuse_org/devel:languages:perl
Release : 1.71
Date : 2024-08-05 18:24:46
Group : Development/Libraries/Perl
Source RPM : perl-String-Trigram-0.12-1.71.src.rpm
Size : 0.04 MB
Packager : (none)
Summary : Find similar strings by trigram (or 1, 2, 4, etc.-gram) method
Description :
This module computes the similarity of two strings based on the trigram
method. This consists of splitting some string into triples of characters
and comparing those to the trigrams of some other string. For example, the
string kangaroo has the trigrams '{kan ang nga gar aro roo}'. A wrongly
typed kanagaroo has the trigrams '{kan ana nag aga gar aro roo}'. To
compute the similarity we divide the number of matching trigrams (tokens
not types) by the number of all trigrams (types not tokens). For our
example this means dividing 4 / 9, resulting in 0.44.
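
The arithmetic above can be reproduced with a short Python sketch. This is
an illustration only, not the module's Perl code: the function names
trigrams and similarity are made up here, and the sketch counts every
distinct trigram once, which matches the description for strings without
repeated trigrams such as the example.

```python
def trigrams(s, n=3):
    """Return the set of distinct n-character substrings of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similarity(a, b, n=3):
    """Matching trigrams divided by all distinct trigrams of both strings.

    Simplification (an assumption of this sketch): repeated trigrams
    within one string are counted only once.
    """
    grams_a, grams_b = trigrams(a, n), trigrams(b, n)
    all_grams = grams_a | grams_b
    if not all_grams:
        return 0.0  # both strings shorter than n characters
    return len(grams_a & grams_b) / len(all_grams)


print(sorted(trigrams("kangaroo")))
print(round(similarity("kangaroo", "kanagaroo"), 2))  # 4 / 9 -> 0.44
```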

To somewhat balance the disadvantage of the outer characters (each of
which occurs in only one trigram, while the second and the penultimate
occur in two and the rest of the characters in three trigrams each), we pad
the string with blanks on either side. With a padding of one blank this
yields two more trigrams, ' ka' and 'oo '. Thus we arrive at 6 matching
trigrams and 11 trigrams all in all, resulting in a similarity value of
0.55.
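
The effect of padding can be checked the same way. The sketch below is a
Python illustration under the same simplification (distinct trigrams only);
padded_trigrams, padded_similarity and the pad parameter are names chosen
for this example and are not the module's interface.

```python
def padded_trigrams(s, pad=1, n=3):
    """Pad s with `pad` blanks on each side, then collect its distinct n-grams."""
    padded = " " * pad + s + " " * pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def padded_similarity(a, b, pad=1, n=3):
    """Matching trigrams divided by all distinct trigrams, after padding."""
    grams_a = padded_trigrams(a, pad, n)
    grams_b = padded_trigrams(b, pad, n)
    all_grams = grams_a | grams_b
    if not all_grams:
        return 0.0
    return len(grams_a & grams_b) / len(all_grams)


# With one blank of padding the outer characters gain a trigram each.
print(round(padded_similarity("kangaroo", "kanagaroo", pad=1), 2))  # 6 / 11 -> 0.55
```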

When using the trigram method there is one thing that might appear as a
problem: two short strings with one (or two or three ...) different
trigrams tend to produce a lower similarity than two long ones. To
counteract this effect, you can set the module's 'warp' property. If you
set it to something greater than 1.0 (try something between 1.0 and 3.0,
flying at warp 9 won't get you anywhere here), this will lift the
similarity of short strings more and the similarity of long strings less,
resulting in the '%%' curve in the (schematic) diagram below.

[Diagram: similarity value (vertical axis, 0.0 to 1.0) against length of
string (horizontal axis), with three curves: '***' no warp (i.e. warp ==
1.0), '%%' warp > 1, '###' warp < 1]

Dependency of similarity value on length of string and warp

Don't hesitate to use this feature; it sometimes really helps generate
useful results where you otherwise wouldn't have got any.

Please be aware that a 'warp' less than 1.0 will have the inverse effect,
pulling down the similarity of short strings a lot and the similarity of
long ones less, resulting in the '###' curve. I have no idea what this can
be good for, but it's just a side effect of the method. How is all this
done? Take a look at the code.
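
The description does not state the warp formula. The Python sketch below
uses a hypothetical formula, 1 - (different / all) ** warp, chosen only
because it reproduces the qualitative behaviour described above; the
module's actual computation may differ, and warped_similarity is a name
invented for this example.

```python
def warped_similarity(same, total, warp=1.0):
    """Hypothetical warp formula (an assumption, not taken from the module).

    same  -- number of matching trigrams
    total -- number of all trigrams
    warp == 1.0 gives the plain ratio same / total.
    """
    if total == 0:
        return 0.0
    different = total - same
    return 1.0 - (different / total) ** warp


# Short pair: 1 of 4 trigrams differs. Long pair: 1 of 20 differs.
for warp in (0.5, 1.0, 2.0):
    short = warped_similarity(3, 4, warp)
    long_ = warped_similarity(19, 20, warp)
    print(f"warp={warp}: short={short:.3f} long={long_:.3f}")
```

Under this assumed formula a warp above 1.0 raises the short pair's value
by more than the long pair's, and a warp below 1.0 lowers it by more.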

Splitting strings into trigrams is a time-consuming affair, and if you want
to compare a set of n strings to another set of m strings on a
member-to-member basis you will have to do n * m splittings. To avoid
this, the module takes one set of strings as the base of comparison and
generates an index of every trigram occurring in any of the members of the
set (including how often the trigram occurs in a given member string). Then
you can feed it the members of the other set one by one. This results in a
total of n + m splittings plus the overhead of generating the index. This
way we save a lot of time at the expense of memory, so - if you operate on
a large number of strings - this might turn out to be somewhat of a
problem. But there you are. There's no such thing as a free lunch.
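
The indexing idea can be sketched in Python as follows. This is a minimal
illustration, not the module's implementation: build_index and similar_to
are hypothetical names, padding and warp are left out, and the per-member
occurrence counts mentioned above are omitted, so every distinct trigram
counts once.

```python
from collections import defaultdict


def trigrams(s, n=3):
    """Return the set of distinct n-character substrings of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_index(base):
    """Split every base string once; map each trigram to the strings containing it."""
    index = defaultdict(set)
    sizes = {}
    for member in base:
        grams = trigrams(member)
        sizes[member] = len(grams)
        for gram in grams:
            index[gram].add(member)
    return index, sizes


def similar_to(query, index, sizes):
    """Split the query once and score every base string sharing a trigram with it."""
    grams = trigrams(query)
    shared = defaultdict(int)
    for gram in grams:
        for member in index.get(gram, ()):
            shared[member] += 1
    # all distinct trigrams = base trigrams + query trigrams - shared ones
    return {member: same / (sizes[member] + len(grams) - same)
            for member, same in shared.items()}


index, sizes = build_index(["kangaroo", "koala"])
print(similar_to("kanagaroo", index, sizes))
```

Each base string is split once when the index is built and each query once
when it is looked up, which is the n + m splittings described above.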

Anyway - the module is optimized for comparisons of sets of strings, which
makes single comparisons slower than they might be. If you use the
'compare()' function, which compares single strings through a functional
interface, a String::Trigram object is instantiated internally and an index
of the trigrams of one of the strings is generated. This gives 'compare()'
the full functionality of the module and avoids programming the same things
twice. In practice, however, this shouldn't be a big disadvantage, since a
single comparison or just a few won't need too much (absolute) time.

RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl:/CPAN-S/openSUSE_Tumbleweed/noarch


Download
ftp.icm.edu.pl  perl-String-Trigram-0.12-1.71.noarch.rpm
     

Provides :
perl(String::Trigram)
perl-String-Trigram

Requires :
perl(:MODULE_COMPAT_5.40.0)
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(PayloadIsZstd) <= 5.4.18-1


Content of RPM :
/usr/lib/perl5/vendor_perl/5.40.0/String
/usr/lib/perl5/vendor_perl/5.40.0/String/Trigram.pm
/usr/share/doc/packages/perl-String-Trigram
/usr/share/doc/packages/perl-String-Trigram/Changes
/usr/share/doc/packages/perl-String-Trigram/README
/usr/share/man/man3/String::Trigram.3pm.gz

 