Name : tokenizer
Version : 5.3.5
Release : 1mdk
Vendor : MandrakeSoft
Date : 2004-08-05 12:04:10
Group : Sciences/Computer science
Source RPM : tokenizer-5.3.5-1mdk.src.rpm
Size : 0.07 MB
Packager : Guillaume Rousse <guillomovitch_mandrake_org>
Summary : Text segmenter
Description :
Tokenizer segments a text into tokens, then into word-forms. Tokens match regular expressions, and word-forms match lexical entries compiled with lexed. For a compound name, a word-form is a concatenation of several tokens. Ambiguity between simple and compound words is represented as a directed acyclic graph (DAG).
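
The two-stage segmentation described above can be illustrated with a minimal Python sketch. It is not the package's own code or interface: the token pattern, the in-memory lexicon (standing in for entries compiled with lexed), and the names `tokenize`, `build_dag`, and `max_len` are all assumptions made for illustration. Nodes of the graph are token boundaries, and each edge carries one candidate word-form, so a compound and its component simple words appear as alternative paths.

```python
import re

# Hypothetical token pattern: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens matching the (assumed) regular expression."""
    return TOKEN_RE.findall(text)


def build_dag(tokens, lexicon, max_len=3):
    """Return edges (start, end, word_form) over token boundaries 0..len(tokens).

    Every single token yields a simple word-form edge. A run of 2..max_len
    tokens yields a compound edge when its space-joined form is in `lexicon`,
    a plain set standing in for a compiled lexicon.
    """
    edges = []
    for i in range(len(tokens)):
        # Simple word-form: one token.
        edges.append((i, i + 1, tokens[i]))
        # Compound word-forms: concatenations of several tokens.
        for j in range(i + 2, min(i + max_len, len(tokens)) + 1):
            form = " ".join(tokens[i:j])
            if form in lexicon:
                edges.append((i, j, form))
    return edges


if __name__ == "__main__":
    tokens = tokenize("a hot dog.")
    for edge in build_dag(tokens, {"hot dog"}):
        print(edge)
```

In this sketch, both the path "hot" then "dog" and the single edge "hot dog" connect boundary 1 to boundary 3, which is how the ambiguity between simple and compound readings is kept rather than resolved.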
RPM found in directory: /vol/rzm6/linux-mandriva/official/10.1/i586/media/contrib