Name : libtextcat
Version : 2.2
Vendor : openSUSE Build Service
Release : 4.1
Date : 2007-10-27 22:23:01
Group : Development/Languages/C and C++
Source RPM : libtextcat-2.2-4.1.src.rpm
Size : 0.67 MB
Packager : (none)
Summary : Library for text classification
Description :
Libtextcat is a library with functions that implement the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization" [1]. It was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy.
The central idea of the Cavnar & Trenkle technique is to calculate a "fingerprint" of a document with an unknown category, and compare this with the fingerprints of a number of documents of which the categories are known. The categories of the closest matches are output as the classification. A fingerprint is a list of the most frequent n-grams occurring in a document, ordered by frequency. Fingerprints are compared with a simple out-of-place metric. See the article for more details.
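The fingerprint and out-of-place comparison described above can be illustrated with a minimal Python sketch. This is not libtextcat's code or API: the function names (fingerprint, out_of_place, classify), the underscore padding around words, the n-gram lengths of 1 to 5, the cutoff of 400 n-grams, and the fixed penalty for n-grams missing from a category fingerprint are all assumptions chosen for illustration.

```python
from collections import Counter


def fingerprint(text, max_n=5, top_k=400):
    """Return the top_k most frequent character n-grams, most frequent first.

    Assumption: words are lowercased, split on whitespace and padded with
    underscores; max_n and top_k are illustrative defaults.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_fp, cat_fp):
    """Sum, over the document's n-grams, of how far each rank is displaced.

    Assumption: an n-gram absent from cat_fp costs a fixed penalty equal to
    the length of cat_fp.
    """
    cat_rank = {gram: rank for rank, gram in enumerate(cat_fp)}
    penalty = len(cat_fp)
    return sum(
        abs(rank - cat_rank[gram]) if gram in cat_rank else penalty
        for rank, gram in enumerate(doc_fp)
    )


def classify(text, category_fps):
    """Return the category name whose fingerprint is closest to the text's."""
    doc_fp = fingerprint(text)
    return min(category_fps, key=lambda name: out_of_place(doc_fp, category_fps[name]))
```

To use the sketch, build one fingerprint per known category from sample text, for example {"english": fingerprint(english_sample), "dutch": fingerprint(dutch_sample)}, and pass that dictionary to classify together with the unknown text. A smaller distance means the two rank orderings agree more closely, so the category with the minimum distance is reported.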
Considerable effort went into making this implementation fast and efficient. The language guesser processes over 100 documents/second on a simple PC, which makes it practical for many uses. It was developed for use in our webcrawler and search engine software, in which it handles millions of documents a day.
Authors: Frank Scheelen
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/server:/search/SLE_10/x86_64
This RPM is not available on any FTP server.
Provides :
libtextcat.so.0()(64bit)
libtextcat
Requires :