Name : snowball
Version : 2.2.0
Vendor : Fedora Project
Release : 10.fc40
Date : 2024-02-12 18:56:35
Group : Unspecified
Source RPM : snowball-2.2.0-10.fc40.src.rpm
Size : 0.28 MB
Packager : Fedora Project
Summary : Snowball compiler and stemming algorithms
Description :
Snowball is a small string processing language for creating stemming algorithms for use in Information Retrieval, plus a collection of stemming algorithms implemented using it.
Snowball was originally designed and built by Martin Porter. Martin retired from development in 2014 and Snowball is now maintained as a community project. Martin originally chose the name Snowball as a tribute to SNOBOL, the excellent string handling language from the 1960s. It now also serves as a metaphor for how the project grows by gathering contributions over time.
The Snowball compiler translates a Snowball program into source code in another language - currently Ada, ISO C, C#, Go, Java, Javascript, Object Pascal, Python and Rust are supported.
What is Stemming?
Stemming maps different forms of the same word to a common "stem" - for example, the English stemmer maps connection, connections, connective, connected, and connecting to connect. So a search for connected would also find documents which only have the other forms.
This stem form is often a word itself, but not always, because text search systems (the intended field of use) do not require it to be. We also aim to conflate words with the same meaning, rather than all words with a common linguistic root (so awe and awful don't have the same stem). Over-stemming is more problematic than under-stemming, so we tend not to stem in cases that are hard to resolve. If you want to always reduce words to a root form and/or get a root form which is itself a word, then Snowball's stemming algorithms likely aren't the right answer.
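The conflation described under "What is Stemming?" can be sketched in a few lines of Python. This is a toy suffix-stripper written only for illustration: it is not the Snowball English algorithm and does not use any Snowball-generated code. The names `SUFFIXES`, `toy_stem`, `build_index`, and `search` are hypothetical, and the suffix list is an assumption chosen to cover just the example words above.

```python
# Toy illustration only: NOT the Snowball English stemmer.
# SUFFIXES is a hypothetical rule list, ordered so longer suffixes are tried first.
SUFFIXES = ("ions", "ion", "ive", "ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids containing a word with that stem."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(doc_id)
    return index


def search(index, query):
    """Look up documents by the stem of the query word."""
    return index.get(toy_stem(query), set())


docs = {
    "a": "the connection dropped",
    "b": "connecting cables",
    "c": "unrelated text",
}
index = build_index(docs)
matches = search(index, "connected")  # matches documents "a" and "b"
```

Because both the indexed words and the query are reduced to the same stem, a search for connected finds documents that contain only connection or connecting. Note that the toy stem for dropped is dropp, which is not a word; a real stemmer applies far more careful rules, but the stem still need not be a word.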
RPM found in directory: /vol/rzm3/linux-fedora-buffet/linux/development/40/Everything/x86_64/os/Packages/s