Name : tokenizer
Version : 5.4.1
Vendor : MandrakeSoft
Release : 1mdk
Date : 2004-11-25 15:44:23
Group : Sciences/Computer science
Source RPM : tokenizer-5.4.1-1mdk.src.rpm
Size : 0.07 MB
Packager : Guillaume Rousse <guillomovitch_mandrake_org>
Summary : Text segmenter
Description :
Tokenizer segments a text into tokens, then into word-forms. Tokens match regular expressions, and word-forms match lexical entries compiled with lexed. A word-form for a compound name is a concatenation of several tokens. Ambiguity between simple and compound words is represented as a directed acyclic graph (DAG).
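The two-stage segmentation described above can be illustrated with a minimal Python sketch. It is not tokenizer's actual implementation or interface. The token pattern TOKEN_RE, the toy lexicon LEXICON (standing in for entries compiled with lexed), and the functions tokenize, build_dag and segmentations are all hypothetical. DAG nodes are positions between tokens, and each edge is a word-form covering one or more tokens, so every path from the first node to the last is one possible reading.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical token pattern (assumption): runs of word characters, or a
# single punctuation character. The real tool uses its own regex definitions.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Hypothetical toy lexicon (assumption): word-forms as tuples of tokens.
# A multi-token entry models a compound word.
LEXICON: Set[Tuple[str, ...]] = {
    ("pomme",), ("de",), ("terre",), ("pomme", "de", "terre"),
}

Dag = Dict[int, List[Tuple[int, str]]]


def tokenize(text: str) -> List[str]:
    """Stage 1: split the text into tokens matching TOKEN_RE."""
    return TOKEN_RE.findall(text)


def build_dag(tokens: List[str], lexicon: Set[Tuple[str, ...]], max_len: int = 4) -> Dag:
    """Stage 2: an edge i -> j labeled w means tokens[i:j] form word-form w.

    Single tokens are always kept as edges (assumption) so that every
    text has at least one path, even with unknown words.
    """
    dag: Dag = {i: [] for i in range(len(tokens) + 1)}
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            candidate = tuple(tokens[i:j])
            if candidate in lexicon or j == i + 1:
                dag[i].append((j, " ".join(candidate)))
    return dag


def segmentations(dag: Dag, start: int, end: int) -> List[List[str]]:
    """Enumerate every path from start to end; each path is one reading."""
    if start == end:
        return [[]]
    readings: List[List[str]] = []
    for nxt, word in dag[start]:
        for rest in segmentations(dag, nxt, end):
            readings.append([word] + rest)
    return readings


if __name__ == "__main__":
    toks = tokenize("pomme de terre")
    graph = build_dag(toks, LEXICON)
    for reading in segmentations(graph, 0, len(toks)):
        print(reading)
```

For "pomme de terre", the DAG has two paths: the three simple words, or the single compound "pomme de terre". Enumerating all paths can grow exponentially with text length, which is one reason to keep the ambiguity as a DAG, as the description does, and leave the choice between readings to later processing.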
RPM found in directory: /vol/rzm6/linux-mandriva/official/10.2/i586/media/contrib