Changelog for
mnogosearch-pgsql-3.4.1-7.1.i586.rpm :
Sun Jan 24 13:00:00 2016 kieltuxAATTgmail.com
- update to version 3.4.1
* Indexing changes
- Multi-threaded indexing is now possible, which significantly
reduces the time needed to create the search index (indexer --index)
on a multi-CPU machine. indexer --index now honors the -N option to
specify the number of threads to start.
- A new command IndexerThreads was added to specify the default
number of threads started for indexing.
- A new command IndexCacheSize was added to specify the amount of RAM
used for the search index cache when running indexer --index.
* Crawler changes
- A new command IPRequestPerMinLimit was added for more polite
crawling.
- A new command FollowLinks was added to fine-tune which kinds of
links should be followed.
- The CollectLinks command was enhanced to fine-tune which kinds of
links should be stored in the database.
- A new command AjaxLinks was added, to crawl AJAX-run Web sites.
- The DNSCacheTimeOut command was added.
- The Proxy command was changed to understand the full URL
notation including the authorization part (previously it required
the host:port format). The ProxyAuthBasic command was removed. Also,
Proxy can now accept a list of proxy addresses, which makes indexer
randomly choose one of the addresses to download every document.
- A special purpose section ResponseTime was added, to store
document download time in the database.
- Robots exclusion protocol improvements were made. indexer now
understands the * and $ patterns in robots.txt, and respects the
X-Robots-Tag HTTP header.
- The Robots command now understands more possible values
(robotstxt, xrobotstag, meta and rel) to fine tune which robot
directives should be respected. Previously only yes (respect all
robot directives) and no (ignore all robot directives) values were
understood.
- When crawling with multiple threads (indexer -Nxxx) on Windows,
threads do not lock each other any more when resolving host names.
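The * and $ robots.txt patterns mentioned above can be illustrated with a small Python sketch. It is not the indexer's own code. It assumes the commonly documented semantics: * matches any run of characters, a trailing $ anchors the end of the URL path, and everything else is a literal prefix. The function names are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Compile a robots.txt path pattern into a regular expression.

    Assumed semantics: '*' matches any run of characters and a trailing
    '$' anchors the end of the path; all other characters are literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """Return True if the URL path is covered by the robots.txt pattern."""
    # re.match anchors at the start, so a plain pattern acts as a prefix.
    return robots_pattern_to_regex(pattern).match(path) is not None
```

For example, under these assumptions the pattern /private/*.pdf$ covers /private/a/b.pdf but not /private/a/b.pdf?x=1, while a plain /tmp covers every path that starts with /tmp.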
* Search result presentation changes
- Search templates now use a C/C++-style language instead of the
old language with cumbersome tag based operators.
- Extended word statistics variables are now available in the
search template.
- New document properties UniqueWordHitVector and SectionHitVector
are now available at search time, presenting the information about
word hit distribution inside the entire document and its individual
sections.
- When searching in the m=any (find any of the words) mode,
search.cgi now displays the query words that are missing in the
document.
- The default template was rewritten to use CSS styles instead of
the inline HTML formatting. This makes it easier to customize the
search template according to the user's site design. Various other
minor template improvements were made.
- The quality of the document body snippet was improved for HTML
documents.
- Fixed that OS environment variables were not available in search
templates on Windows.
* Cached copies
- Cached copies are now stored in a separate table cachedcopy
without wrapping to base64.
- A new command CachedCopyEncoding was added to control cached
copy compression.
- The table bdicti was removed. In earlier versions it stored
pre-parsed documents and was used at indexing time. Now indexing is
performed directly from the cached document copies. This change
reduces the disk space used by the mnoGoSearch database.
- indexer --index experienced a significant performance
degradation when the table bdicti size grew bigger than the OS disk
cache size. The new format fixes this problem. Indexing performance
now grows linearly with the document collection size.
- The third parameter in the command Section (responsible for the
section length) is now optional. The default indexer.conf file now
does not specify the length values for the sections, so indexer does
not store any information into the table urlinfo by default. At
search time, the variables containing various document parts (the
body snippet, the title, content type, etc) can now be created from
the document cached copies stored in the table cachedcopy. The table
urlinfo is now empty in the default configuration, but it still can
be used to store user defined sections, or the standard sections
when needed.
- Changes in the document sections configuration in indexer.conf
do not require full re-crawling any longer. The changes immediately
take effect after the next indexer --index run.
* Links
- The link information storage format was changed for better
performance and for easier indexing of the link text, e.g.:
<a href="...">link text</a>
Also, the new link format makes mnoGoSearch convenient for SEO
purposes and for other kinds of site analysis. See the new structure
of the table links for details.
- A new section with the name ilinktext is now understood, meaning
the text from the incoming links of the referencing documents,
between the <a> and </a> tags.
- It's now possible to limit search to the incoming link text only
by specifying the wf=00010000 parameter to search.cgi (assuming the
default ID of the section ilinktext).
- The structure of the table links for MySQL and PostgreSQL now
uses partitioning for better indexing performance.
- A new table redirect was added to store simple redirect links,
e.g. the URL from the Location header of a 301 Moved Permanently
HTTP response. Redirects are stored separately from the hypertext
links for better performance of the popularity calculation.
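The wf=00010000 value mentioned above can be read as a string of hexadecimal weight digits, one per section ID. The Python sketch below decodes such a string. It is illustrative only and rests on one assumption: the rightmost digit belongs to section ID 1, so 00010000 selects section ID 5. Check that mapping against your own Section definitions before relying on it. The function name is hypothetical.

```python
from typing import Dict


def decode_wf(wf: str) -> Dict[int, int]:
    """Map section ID -> weight for every non-zero hex digit in wf.

    Assumption: the rightmost digit belongs to section ID 1, the next
    one to section ID 2, and so on.
    """
    weights = {}
    for section_id, digit in enumerate(reversed(wf), start=1):
        weight = int(digit, 16)
        if weight:
            weights[section_id] = weight
    return weights
```

Under that assumption, decode_wf("00010000") reports a single active section with ID 5 and weight 1; every other section gets weight 0 and is excluded from the search.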
* Sites
- The code performing grouping results by sites was rewritten.
When crawling, indexer does not populate the table server with
unique site names any more. This improves crawling performance. The
column url.site_id was removed.
* Popularity changes
- Popularity is now automatically calculated when running indexer
--index in the default configuration.
- A new command UsePopularity was added to change the default
behavior.
- A new command line option indexer --rewritepop is now understood
to calculate popularity without recreating the entire search index.
- A new command PopularityFactor was added to change how a
document's popularity affects its search result score.
- The -R command line parameter to indexer (which forced indexer
to calculate popularity after crawling) is no longer supported.
- The commands PopRankSkipSameSite, PopRankFeedBack, PopRankUseTracking,
PopRankUseShowCnt, PopRankShowCntRatio and PopRankShowCntWeight were
removed.
* Miscellaneous changes
- New commands Phrase2CountFactor and Phrase3CountFactor were
added.
- A few performance improvements in built-in parsers, character
set and Unicode routines, memory management routines were made.
- Fixed that the implementation of utf-8 did not detect malformed
byte sequences in some cases, which led to database errors (e.g.
invalid byte sequence for encoding "UTF8" in case of PostgreSQL).
- indexer running in the SQL interpreter mode (e.g. --create,
--drop, --sqlmon) now does not recognize an empty comment
immediately followed by a semicolon (e.g. /**/;) as the current SQL
query end. This change allows complex stored procedures to be used in
the structure. See the definition of the stored procedure
links_insert_trigger_func implementing link partitioning in
create/pgsql/create.txt as an example.
- Some columns in the SQL structure were renamed to avoid using
SQL reserved words (qinfo.value to qinfo.sval, qtrack.found to
qtrack.nfound).
- The column intag in the table dict was renamed to coord. The
column intag in the tables bdict and dict00..dictFF was renamed to
coords.
- CMake experimental support was added. Currently CMake is used to
build mnoGoSearch on Windows only. On UNIX-like platforms it's
still recommended to use the configure script generated by the GNU
autotools.
- mnoGoSearch is now compiled using the Filesystem Hierarchy
Standard (FHS) layout by default, which is slightly different from
the traditional mnoGoSearch layout. Use ./configure
--disable-fhs-layout to compile with the traditional mnoGoSearch
layout.
- The code was made more modular: a large number of enums were
introduced in place of untyped integer constants, and database and
thread handlers were added. These and other code quality and
extensibility improvements make the code easier to maintain and
prepare it for a plugin infrastructure in the future. The API was
slightly changed (e.g. the structure of the UDM_RESULT data type).
- XML documents that start with <urlset> or <sitemapindex> are now
automatically considered to be sitemap protocol files. The built-in
XML parser collects links from such files, but does not put their
words into the search index.
- The built-in HTML parser now understands
as a synonym to
and
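The malformed utf-8 fix listed under the miscellaneous changes above concerns byte sequences that are not valid UTF-8 reaching the database. As a rough illustration only (mnoGoSearch is written in C and its checks are not shown here), a strict decode in Python detects such input before it is sent to a server such as PostgreSQL. The helper names are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def sanitize_utf8(data: bytes) -> str:
    """Decode data, replacing each malformed sequence with U+FFFD."""
    return data.decode("utf-8", errors="replace")
```

A lone continuation byte such as 0xbf is rejected by is_valid_utf8, which is the kind of input that otherwise triggers the "invalid byte sequence" error on insert.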
* Functionality removed
- The old file based search result cache and the Cache command
were removed. Use the new search result cache (introduced in 3.3.8)
instead.
- Support for categories was removed. The table categories was
removed. The column server.category was removed. Use user defined
limits instead.
- Database structure files for Access, mSQL, Solid, SapDB were
removed.
- The Crosswords command and the crossdict table were removed.
- The ResultContentType command was removed. Now search.htm
explicitly outputs the desired content type.
Tue Dec 10 13:00:00 2013 munix9AATTgooglemail.com
- update to version 3.3.15
* Improvements to the default search template were made. Query words are
now highlighted differently when displaying a list of found documents
(using bold font) and when displaying a cached copy of a document
(using yellow background).
* A section about installation of the mnoGoSearch PHP module was added
into the docbook manual.
* The EREGCUT template operator was added, to remove sub-strings matching
a regular expression pattern from a string.
* Bug#4820 "mirror files exceed platform limit for file name length"
was fixed.
* A few potential vulnerabilities found by the Veracode static analyzer
were fixed (Bug#4826).
* A few warnings reported by the clang compiler were fixed.
* Fixed that the words having non-ASCII letters were not highlighted when
displaying cached copy in cases when the document character set differs
from LocalCharset (a bug since 3.3.13).
* Fixed that the Microsoft SQL Server driver always used quotes in a USE
"dbname"; query when connecting to the server, assuming that QUOTED_IDENTIFIERS
is set to ON, which is not necessarily always the case (a bug since 3.3.12).
Now quotes are used only for database names starting with a digit.
* Fixed that popularity rank calculation did not work with Microsoft
SQL Server.
* Fixed a bug in the Microsoft SQL Server driver which reported one extra
byte (for the trailing 0x00) when fetching character data from the server.
This bug made indexer and search.cgi behave unexpectedly in rare cases.
* Bug#4825 "Redirect: Bad URL: redirected locations not indexed" was fixed.
* Fixed that make bin-dist did not work in some cases.
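The effect of the EREGCUT operator listed in the 3.3.15 entry above (removing every sub-string that matches a regular expression) has a close analog in Python. The sketch below shows the operation only. It does not reproduce the template syntax or argument order, which this changelog does not describe, and Python's re dialect may differ from the regex flavor mnoGoSearch uses. The function name is a hypothetical stand-in.

```python
import re


def eregcut(text: str, pattern: str) -> str:
    """Remove every sub-string of text that matches pattern."""
    return re.sub(pattern, "", text)
```

For instance, eregcut("mnoGoSearch 3.3.15 released", r"[0-9.]+ ") drops the version number together with its trailing space.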
Sat Jul 13 14:00:00 2013 munix9AATTgooglemail.com
- spec file changes to consider the version of the distribution,
fixed suse-alternative-link-missing warnings
Mon Apr 8 14:00:00 2013 munix9AATTgooglemail.com
- update to version 3.3.14
* DOCX and RTF built-in parsers were added.
* It's now possible to use the $(ConfDir), $(ShareDir), $(VarDir), $(TmpDir)
template variables in search.htm, e.g.:
Include $(ConfDir)/common.inc
DBAddr sqlite3:///$(VarDir)/mnogosearch.sqlite3/
Previously these variables were understood only in indexer.conf.
* A minor fix in installation layout was made: the --docdir parameter to
configure is now respected, and the HTML documentation is now installed
to PREFIX/share/doc/mnogosearch/ by default. Previously --docdir was
ignored, and the documentation was installed to PREFIX/doc/.
* Files to build rpm and deb binary packages were added.
* A few minor problems discovered by the code static analysis tools were
fixed.
* Fixed that unassigned euc-jp characters were converted to U+0000 instead
of the question mark when converting to other character sets.
* Fixed that context snippets did not work well if the CachedCopy section
name was written in lower case in indexer.conf.
* Fixed that static linking against MySQL-5.5 client library failed
because of the missing -ldl linker flag.
* Fixed a crash in search.cgi when compiled in extra debug mode with
--enable-trace on a 64-bit machine.
* Fixed that indexer failed with the error "Integer does not fit into column"
on 64-bit machines when running with the OpenLink Virtuoso backend.
Tue Mar 5 13:00:00 2013 munix9AATTgooglemail.com
- reconfiguration of spec file to allow selection of multiple
database backends (idea taken from Sisyphus repository by the ALT
Linux Team)
- activated tests for "mysql", "postgresql" and "sqlite3" database backends
Fri Mar 1 13:00:00 2013 munix9AATTgooglemail.com
- update to version 3.3.13
* Bug#4818 "Arbitrary Files Reading in mnoGoSearch" was fixed. This is a
security bug. All users of the earlier 3.3.x releases are highly advised
to upgrade.
* Bug#4819 "Variables Overwriting in mnoGoSearch" was fixed. search.cgi
was vulnerable to Cross-Site Scripting in cases when values of some empty
pre-defined internal variables were replaced in the HTTP query string
(e.g. search.cgi?q=test&stored=%3Cscript%3E). Now all variables coming
from the query string are automatically HTML-escaped in the $(var)
template format.
* The meaning of $&(var) has not changed; it still applies HTML escaping to
all variables, both coming from the query string and those generated
internally.
* Support for "Content-Type: message/rfc822" was added (*.eml and *.mht
files), including multi-part messages and messages with attachments,
with Content-Transfer-Encoding of types 7bit, 8bit, base64 and
quoted-printable. When processing attachments, indexer can use external parsers.
For example, if indexer is configured to use catdoc for the documents
of the type application/msword, then indexer also executes catdoc for
the attachments of this type.
* Bug#4803 "buffer overflow detected with search.cgi" was fixed.
* Fixed that search.cgi did not use HTML entities (&lt;, &gt; and &amp;)
to escape special characters when displaying cached copy for a document
of type text/plain.
* Fixed that the "dm" search parameter did not work in some cases.
* Fixed that the "su" search parameter (user defined order) was not taken
into account by search query cache, thus wrong cache hits were returned
in some cases.
* Bugs in synonym processing were fixed: a word form generated from synonyms
could be searched twice in the database; bad memory access when using
"ComplexSynonyms yes".
* A memory leak bug was fixed in the code producing word forms from an Ispell
dictionary.
* Improved compatibility with the latest versions of PostgreSQL. Escaping of
the SQL character literals for PostgreSQL >= 90000 was changed from the
C-like style (using backslash) to the standard SQL style.
* Data type for the column "url.url" for Firebird was changed from varchar(127)
to varchar(247), which is the longest indexable varchar in a Firebird database
with page_size=1024 (bug#2125).
* Fixed that mnoGoSearch did not work with Mecab (Japanese segmenter)
dictionaries encoded in utf-8.
* Fixed that indexer silently ignored the -u values (URL limits) longer than
64 characters (bug#4800, bug#4689).
* Fixed a bug in the code handling excerpts. It did not work well in cases
when context following a highlighted word has no space characters and
ExcerptPadding ends in the middle of the next highlighted word.
The entire excerpt was erroneously highlighted in such cases.
* Fixed that the PHP extension module did not compile with PHP-5.4 (bug#4808).
* Fixed a crash in indexer that happened when processing a message/http response
in combination with an external parser returning an empty response.
* Bug#4806 "Command Proxy without argument" was fixed.
* Bug#4359 "buffer overflow when doing using -Ewordstat option" was fixed.
* A few compilation warnings on 64-bit platforms were fixed.
* Bug#4722 "Error messages display directly on web page" was fixed.
* Bug#4718 "sqlite3 driver: (1) cannot start a transaction within a transaction"
was fixed.
* Fixed that in case of LiveUpdates=yes search.cgi erroneously printed the
error "word index not found" when the query produced no results.
* A few dead links in the Section called External parsers for the most common
file types in Chapter 5 were fixed.
* autoconf warnings were fixed. MySQL client library detection was improved
for OS X Lion.
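The Bug#4819 entry above describes HTML-escaping every variable that arrives in the query string before it is substituted into a template. The Python sketch below illustrates that idea with the standard library. It is not search.cgi's implementation, and the function name is hypothetical.

```python
import html
from urllib.parse import parse_qs


def escaped_query_vars(query_string: str) -> dict:
    """Parse a query string and HTML-escape every value.

    If a name is repeated, the last value wins (an arbitrary choice made
    for this sketch).
    """
    raw = parse_qs(query_string, keep_blank_values=True)
    return {name: html.escape(values[-1], quote=True)
            for name, values in raw.items()}
```

With the query string from the bug report, q=test&stored=%3Cscript%3E, the stored value comes back as &lt;script&gt;, so it is rendered as text rather than interpreted as markup.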
Fri May 25 14:00:00 2012 munix9AATTgooglemail.com
- added patch for a buffer overflow in sql.c
http://www.mnogosearch.org/bugs/bugs.php?id=19133
Fri Dec 16 13:00:00 2011 munix9AATTgooglemail.com
- update to version 3.3.12
* An SQL injection that happened because of weak control of valid characters
in host names in hypertext links was fixed. The injection was possible with
the databases supporting multiple statements in a single SQL query: with MySQL
(when ClientMultiStatement=yes option is enabled in DBAddr) as well as with
PostgreSQL.
* A new search query syntax for range search was added.
* A new search.htm command UseRangeOperators was added to activate range search
operators (which are disabled by default).
* A new option decimal was added to the Section command. Words of the sections
marked with this option are treated as decimal numbers. This, for example,
allows numeric range search for the given section.
* --help command line indexer option was added as a synonym for -h.
* A description of how to use the pdftohtml converter was added into indexer.conf-dist
and into the manual.
* Fixed that indexer allowed malformed URLs containing non-ASCII characters in
host names, which led to SQL errors on attempt to insert a malformed URL into
the database, for example: PQexec: ERROR: invalid byte sequence for encoding
"UTF8": 0xbf.
* A bug in udm-config was fixed. Due to this bug, linking of the mnoGoSearch PHP
module failed with the error cannot find -lmnogosearch.
* Fixed that the Firebird (Interbase) API did not work with SQL_LONG data type
correctly on x86_64 platforms.
* UDM_MAXTIMESTRSIZE constant was changed from 35 to 64, as strftime() can return
a result longer than 35 characters on some operating systems (e.g. AIX).
Too small constant value led to a wrong or a zero value in the $(Last-Modified)
template variable on AIX.
* Fixed that Microsoft SQL Server did not work with database names consisting
only of digit characters.
* Compilation problems when building using --without-pthreads were fixed.
* A compilation problem that happened on AIX5/AIX6 because of wrong thread compiler
flags was fixed.
* A compilation problem with Sybase client library on 64-bit Linux platforms
was fixed.
* "Bug#4704 Indexing various binaries as XML" was fixed.
* "Bug#8299 Wrong score when UserScore gets 0 and UserScoreFactor is set" was
fixed.
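The 3.3.12 entry above attributes the SQL injection to weak control of valid characters in host names, and a related fix rejects non-ASCII characters in host names. One way to picture such a check is an allow-list applied to each host name label, sketched in Python below. This is an assumption-laden illustration, not the check mnoGoSearch performs. It accepts only ASCII letters, digits and inner hyphens, and the names are hypothetical.

```python
import re

# One DNS label: 1-63 characters, ASCII letters/digits, hyphens only inside.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_safe_hostname(host: str) -> bool:
    """Return True if every dot-separated label uses only allowed characters."""
    if not host or len(host) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in host.split("."))
```

A host name carrying a quote, a semicolon, a space or a raw 0xbf byte fails this check and would be rejected before any SQL statement is built from it. Parameterized queries remain the more robust defense where the database driver supports them.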
Fri Dec 2 13:00:00 2011 munix9AATTgooglemail.com
- added libtool to BuildRequires, needed for openSUSE_Factory repo
Wed Sep 7 14:00:00 2011 munix9AATTgooglemail.com
- update to version 3.3.11
* Bug#4346 "QCache does not differentiate on Sections" was fixed. ${sl.*} was
added into the default QueryCacheID format.
* The search template section is now executed earlier, so tmplt,
Locale and StdoutBufferSize can be set dynamically.
* Bug#4256 "Install include headers fails due to duplicated entry of udm_http.h"
was fixed.
* GNU-style long options are now understood. For example, indexer --rewritelimits
is a synonym for the old command indexer -Erewritelimits. The new long options
are intended to replace the old -Exxx options. See indexer -? for the list of
the new options.
* A few performance improvements in handling information in the server and srvinfo
tables were made.
* SQL scripts to create tables for MySQL now have the ENGINE=MyISAM option, to
address the default storage engine change in MySQL version 5.5.
* Fixed a Valgrind warning when a template variable didn't end with a right
parenthesis, e.g. $(name.
* Non-standard RSS tags inside the <item> tag can now be parsed when defined
using a Section command. Cluster XML search results can also transfer the
non-default user section values to the front-end point.
* Fixed that the default PHP frontend (php/index.php) did not highlight non-Latin
searched words when displaying cached copies.
* The --enable-fhs-layout option to configure is now available, to build and
install with a layout which suits the Filesystem Hierarchy Standard better.
When --enable-fhs-layout, mnoGoSearch installs.
* Correct path to MySQL client library is now detected by configure on 64-bit
Linux platforms.
* Oracle's 11g client include and library layout is now detected by configure
on 64-bit Linux platforms.
* An error message is now displayed when indexer cannot find a create or drop
SQL script. Earlier indexer exited silently.
* Fixed that a few files from the msearch-test directory were not included into
the distribution, so make check did not work outside mnoGoSearch CVS tree.