Name : perl-SWISH-Filter
Version : 0.191
Vendor : obs://build_opensuse_org/devel:languages:perl
Release : 1.71
Date : 2024-08-05 19:47:59
Group : Development/Libraries/Perl
Source RPM : perl-SWISH-Filter-0.191-1.71.src.rpm
Size : 0.12 MB
Packager : (none)
Summary : filter documents for indexing with Swish-e
Description :
SWISH::Filter provides a unified way to convert documents into a type that Swish-e can index. Individual filters are installed as separate subclasses (modules). For example, there might be a filter that converts from PDF format to HTML format.
SWISH::Filter is a framework that relies on other packages to do the heavy lifting of converting non-text documents to text. *Additional helper programs or Perl modules may need to be installed to use SWISH::Filter to filter documents.* For example, to filter PDF documents you must install the 'Xpdf' package.
The filters are loaded automatically when 'SWISH::Filter->new()' is called. Each filter defines a type and a priority, which together determine the order in which the filters are tried. Filters are tried in this sort order until one accepts the document for filtering. A filter uses the document's content type to decide whether it should handle the current document. If the calling program does not supply a content type, it is determined from the file's suffix.
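The selection logic described above can be pictured with a small sketch. It is written in Python purely for illustration and is not SWISH::Filter's code: the `Filter` class, `pick_filter`, the suffix table, the filter names, and the type and priority values are all hypothetical, and the assumption that lower numbers are tried first is made only for this example.

```python
import os
from dataclasses import dataclass

# Hypothetical suffix table; the real module keeps its own mapping.
SUFFIX_TYPES = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".html": "text/html",
}


@dataclass(frozen=True)
class Filter:
    name: str
    type: int                # assumed: lower type is tried first
    priority: int            # assumed: lower priority is tried first
    content_types: frozenset

    def accepts(self, content_type):
        """A filter decides from the content type alone."""
        return content_type in self.content_types


def resolve_content_type(filename, content_type=None):
    """Use the caller's content type if given, else fall back to the suffix."""
    if content_type is not None:
        return content_type
    suffix = os.path.splitext(filename)[1].lower()
    return SUFFIX_TYPES.get(suffix)


def pick_filter(filters, filename, content_type=None):
    """Return the first filter, in (type, priority) order, that accepts."""
    ctype = resolve_content_type(filename, content_type)
    for f in sorted(filters, key=lambda f: (f.type, f.priority)):
        if f.accepts(ctype):
            return f
    return None


# Hypothetical filters, listed out of order on purpose.
FILTERS = [
    Filter("pdf_fallback", 2, 90, frozenset({"application/pdf"})),
    Filter("doc2txt", 2, 50, frozenset({"application/msword"})),
    Filter("pdf2html", 2, 50, frozenset({"application/pdf"})),
]
```

With these made-up values, `pick_filter(FILTERS, "report.pdf")` selects `pdf2html` rather than `pdf_fallback`, because both accept PDF and `pdf2html` sorts first. A caller-supplied content type overrides the suffix lookup.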
The individual filters are not designed to be used as separate modules. All access to the filters is through this SWISH::Filter module.
Normally, once a document is filtered, processing stops. A filter can, however, filter the document and then set a flag saying that filtering should continue (for example, a filter that uncompresses an MS Word document before passing it on to the filter that converts from MS Word to text). In this way filters can be pipelined, and all of this should be transparent to the end user.
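The "continue" flag can be illustrated with a minimal sketch, again in Python and again not taken from SWISH::Filter itself. The functions `gunzip`, `word_to_text`, and `run_filters`, the tuple they return, and the content-type strings are assumptions chosen for the example. The Word converter is a stand-in that passes the bytes through unchanged.

```python
import gzip


def gunzip(data, ctype):
    """Decompress gzip data and ask for filtering to continue."""
    if ctype != "application/gzip":
        return None  # decline: not our content type
    # Assumption for the sketch: the compressed payload is an MS Word file.
    return gzip.decompress(data), "application/msword", True


def word_to_text(data, ctype):
    """Stand-in converter: pretends the bytes are already plain text."""
    if ctype != "application/msword":
        return None
    return data, "text/plain", False  # final result, stop filtering


def run_filters(filters, data, ctype):
    """Apply filters until one finishes without setting the continue flag."""
    keep_going = True
    while keep_going:
        for f in filters:
            result = f(data, ctype)
            if result is not None:
                data, ctype, keep_going = result
                break
        else:
            # No filter accepted the document, so there is nothing more to do.
            break
    return data, ctype
```

Calling `run_filters([gunzip, word_to_text], gzip.compress(b"hello"), "application/gzip")` runs both filters in turn and ends with content type `text/plain`. A document that no filter accepts comes back unchanged.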
The idea of SWISH::Filter is that new filters can be created, then downloaded and installed to provide new filtering capabilities. For example, if you needed to index MS Excel documents, you might be able to download a filter from the Swish-e site, and the next time you ran indexing, MS Excel documents would be indexed.
The SWISH::Filter setup can be used with -S prog or -S http. It works best with the -S prog method because the filter modules only need to be loaded and compiled once. The -S prog program _spider.pl_ will automatically use SWISH::Filter when spidering with default settings (using "default" as the first parameter to spider.pl).
The -S http indexing method uses a Perl helper script called _swishspider_. _swishspider_ has been updated to work with SWISH::Filter, but (unlike spider.pl) it does not contain a "use lib" line pointing to the location of SWISH::Filter. This means that by default _swishspider_ will *not* use SWISH::Filter for filtering. The reason is that _swishspider_ runs once for every URL fetched, and loading the filters for each document can be slow. The recommended way of spidering is -S prog with spider.pl, but if -S http is desired, SWISH::Filter can be enabled by setting PERL5LIB before running swish so that _swishspider_ can locate the SWISH::Filter module. Here's one way to set PERL5LIB with the bash shell:
$ export PERL5LIB=`swish-filter-test -path`
RPM found in directory: /packages/linux-pbone/ftp5.gwdg.de/pub/opensuse/repositories/devel:/languages:/perl:/CPAN-S/openSUSE_Tumbleweed/noarch