Name : R-tokenizers
Version : 0.3.0
Release : 3.5
Group : Development/Libraries/Other
Size : 0.75 MB
Packager : (none)
Vendor : obs://build_opensuse_org/devel:languages:R
Date : 2024-08-28 23:53:45
Source RPM : R-tokenizers-0.3.0-3.5.src.rpm
Summary : Fast, Consistent Tokenization of Natural Language Text
Description :
Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/R:/autoCRAN/openSUSE_Tumbleweed/x86_64
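
To make the terms in the description concrete, here is a minimal sketch of word tokenization, shingled n-grams, and fixed-size chunking. It is written in Python for illustration only and is not the package's implementation or API: the function names `tokenize_words`, `shingled_ngrams`, and `chunk_words` and their parameters are hypothetical, and the simple regular expression stands in for the Unicode-aware boundary rules that a 'stringi'-based tokenizer would apply.

```python
import re


def tokenize_words(text, lowercase=True):
    """Split text into word tokens (hypothetical helper, regex-based)."""
    if lowercase:
        text = text.lower()
    # A word is a run of word characters, optionally followed by an
    # apostrophe and more word characters (e.g. "don't").
    return re.findall(r"\w+(?:'\w+)?", text)


def shingled_ngrams(tokens, n=3, n_min=1):
    """Return every contiguous n-gram of length n_min..n.

    "Shingled" means the n-grams overlap: each starting position
    contributes one n-gram per allowed length.
    """
    out = []
    for start in range(len(tokens)):
        for size in range(n_min, n + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def chunk_words(tokens, chunk_size):
    """Split a token list into consecutive chunks of chunk_size words.

    Assumption of this sketch: the final chunk may be shorter when the
    token count is not a multiple of chunk_size.
    """
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


if __name__ == "__main__":
    words = tokenize_words("The quick brown fox.")
    print(words)
    print(shingled_ngrams(words, n=2, n_min=1))
    print(chunk_words(words, 3))
```

Every function takes plain input and returns a plain list, which mirrors the "consistent interface" idea in the description. How the real package orders its n-grams or handles a trailing short chunk may differ from this sketch.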