Name       : R-tokenizers
Version    : 0.3.0
Vendor     : obs://build.opensuse.org/devel:languages:R
Release    : lp155.3.2
Date       : 2024-07-18 15:00:00
Group      : Development/Libraries/Other
Source RPM : R-tokenizers-0.3.0-lp155.3.2.src.rpm
Size       : 0.77 MB
Packager   : https://www.suse.com/
Summary    : Fast, Consistent Tokenization of Natural Language Text
Description :
Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
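To make two of the terms in the description concrete, here is a minimal Python sketch of shingled n-grams and of splitting a text into documents with the same number of words. It only illustrates the concepts: it is not the package's R API or implementation, and the function names (words_of, shingled_ngrams, chunk_words) are hypothetical. It also assumes a naive whitespace split, whereas the package relies on 'stringi' for proper word boundaries.

```python
def words_of(text):
    # Assumption: lowercase and split on whitespace only.
    # A real tokenizer also handles punctuation and Unicode word boundaries.
    return text.lower().split()


def shingled_ngrams(words, n_min, n):
    """Return every contiguous run of n_min..n words, ordered by start position."""
    grams = []
    for start in range(len(words)):
        for size in range(n_min, n + 1):
            if start + size <= len(words):
                grams.append(" ".join(words[start:start + size]))
    return grams


def chunk_words(words, size):
    """Split a word list into consecutive chunks of `size` words.

    The final chunk may be shorter when the text does not divide evenly.
    """
    return [words[i:i + size] for i in range(0, len(words), size)]


if __name__ == "__main__":
    tokens = words_of("The quick brown fox")
    print(shingled_ngrams(tokens, 1, 2))
    print(chunk_words(tokens, 3))
```

"Shingled" here means the n-grams overlap: each one starts one word after the previous start position, so the same word appears in several neighbouring n-grams.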
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/R:/autoCRAN/15.5/x86_64