Changelog for lucene-sandbox-8.11.2-117.188.noarch.rpm :
* Wed Feb 21 2024 gus.kenionAATTsuse.com
- Use %patch -P N instead of deprecated %patchN.
* Tue Sep 19 2023 fstrbaAATTsuse.com
- Upgrade to version 8.11.2
  * API Changes
    + LUCENE-9265: SimpleFSDirectory is deprecated in favor of NIOFSDirectory.
    + LUCENE-9304: Removed ability to set DocumentsWriterPerThreadPool on IndexWriterConfig. The DocumentsWriterPerThreadPool is a package-protected final class which made it impossible to customize.
    + LUCENE-9339: MergeScheduler#merge no longer accepts a parameter indicating whether a new merge was found.
    + LUCENE-9330: SortFields are now responsible for writing themselves into index headers if they are used as index sorts.
    + LUCENE-9340: Deprecate SimpleBindings#add(SortField).
    + LUCENE-9345: MergeScheduler is now decoupled from IndexWriter. Instead it accepts a MergeSource interface that offers the basic methods to acquire pending merges, run the merge and do accounting around it.
    + LUCENE-9349: QueryVisitor.consumeTermsMatching() now takes a Supplier to enable queries that build large automata to provide them lazily. TermsInSetQuery switches to using this method to report matching terms.
    + LUCENE-9366: DocValues.emptySortedNumeric() no longer takes a maxDoc parameter
    + LUCENE-7822: CodecUtil#checkFooter(IndexInput, Throwable) now throws a CorruptIndexException if checksums mismatch or if checksums can't be verified.
    + LUCENE-7020: TieredMergePolicy#setMaxMergeAtOnceExplicit is deprecated and the number of segments that get merged via explicit merges is unlimited by default.
    + LUCENE-9437: Lucene's facet module's DocValuesOrdinalsReader.decode method is now public, making it easier for applications to decode facet ordinals into their corresponding labels
    + LUCENE-9449: Field comparators for numeric fields and _doc were moved to their own package.
      TopFieldCollector sets TotalHits.relation to GREATER_THAN_OR_EQUAL_TO as soon as the requested total hits threshold is reached, even though in some cases no skipping optimization is applied and all hits are collected.
    + LUCENE-9515: IndexingChain now accepts individual primitives rather than a DocumentsWriterPerThread instance in order to create a new DocConsumer.
    + LUCENE-9680: Removed deprecation warning from IndexWriter#getFieldNames().
    + LUCENE-9902: Change the getValue method from IntTaxonomyFacets to be protected instead of private. Users can now access the count of an ordinal directly without constructing an extra FacetLabel. Also use variable length arguments for the getOrdinal call in TaxonomyReader.
    + LUCENE-9962: DrillSideways allows sub-classes to provide "drill down" FacetsCollectors. They may provide a null collector if they choose to bypass "drill down" facet collection.
    + LUCENE-10027: Add a new Directory reader open API from indexCommit and a custom comparator for sorting leaf readers
    + LUCENE-10036: Replaced the ScoreCachingWrappingScorer ctor with a static factory method that ensures unnecessary wrapping doesn't occur.
  * New Features
    + LUCENE-7889: Grouping by range based on values from DoubleValuesSource and LongValuesSource
    + LUCENE-8962: Add IndexWriter merge-on-commit feature to selectively merge small segments on commit, subject to a configurable timeout, to improve search performance by reducing the number of small segments for searching
    + LUCENE-8962: Add IndexWriter merge-on-refresh feature to selectively merge small segments on getReader, subject to a configurable timeout, to improve search performance by reducing the number of small segments for searching.
    + LUCENE-9378: Doc values now allow configuring how to trade compression for retrieval speed.
    + LUCENE-9385: Add FacetsConfig option to control which drill-down terms are indexed for a FacetLabel
    + LUCENE-9386: RegExpQuery added case insensitive matching option.
    + LUCENE-9413: Add CJKWidthCharFilter and its factory
    + LUCENE-9444: Add utility class to retrieve facet labels from the taxonomy index for a facet field so such fields do not also have to be redundantly stored
    + LUCENE-9484: Allow sorting an index after it was created. With SortingCodecReader, existing unsorted segments can be wrapped and merged into a fresh index using IndexWriter#addIndices API.
    + LUCENE-9507: Custom order for leaves in IndexReader and IndexWriter
    + LUCENE-9537: Added smoothingScore method and default implementation to Scorable abstract class. The smoothing score allows scorers to calculate a score for a document where the search term or subquery is not present. The smoothing score acts like an idf so that documents that do not have terms or subqueries that are more frequent in the index are not penalized as much as documents that do not have less frequent terms or subqueries and prevents scores which are the product of terms or subqueries from going to zero. Added the implementation of the Indri AND and the IndriDirichletSimilarity from the academic Indri search engine: http://www.lemurproject.org/indri.php.
    + LUCENE-9552: New LatLonPoint query that accepts an array of LatLonGeometries.
    + LUCENE-9553: New XYPoint query that accepts an array of XYGeometries.
    + LUCENE-9572: TypeAsSynonymFilter has been enhanced to support ignoring some types, and to allow the generated synonyms to copy some or all flags from the original token
    + LUCENE-9574: A token filter to drop tokens that match all specified flags.
    + LUCENE-9575: PatternTypingFilter has been added to allow setting a type attribute on tokens based on a configured set of regular expressions
    + LUCENE-9594: FeatureField supports newLinearQuery that for scoring uses raw indexed values of features without any transformation.
    + LUCENE-9641: LatLonPoint query support for spatial relationships.
    + LUCENE-9694: New tool for creating a deterministic index to enable benchmarking changes on a consistent multi-segment index even when they require re-indexing.
    + LUCENE-9950: New facet counting implementation for general string doc value fields (SortedSetDocValues / SortedDocValues) not created through FacetsConfig
    + LUCENE-10035: The SimpleText codec now writes skip lists.
    + LUCENE-10083: Analyzer and stemmer for Telugu language
  * Improvements
    + LUCENE-9276: Use same code-path for updateDocuments and updateDocument in IndexWriter and DocumentsWriter.
    + LUCENE-9279: Update dictionary version for Ukrainian analyzer to 4.9.1
    + LUCENE-8050: PerFieldDocValuesFormat should not get the DocValuesFormat on a field that has no doc values.
    + LUCENE-9304: Removed ThreadState abstraction from DocumentsWriter which allows pooling of DWPT directly and improves the approachability of the IndexWriter code.
    + LUCENE-9324: Add an ID to SegmentCommitInfo in order to compare commits for equality and make snapshots incremental on generational files.
    + LUCENE-9342: TotalHits' relation will be EQUAL_TO when the number of hits is lower than TopDocsCollector's numHits
    + LUCENE-9353: Metadata of the terms dictionary moved to its own file, with the '.tmd' extension. This allows checksums of metadata to be verified when opening indices and helps save seeks when opening an index.
    + LUCENE-9359: SegmentInfos#readCommit now always returns a CorruptIndexException if the content of the file is invalid.
    + LUCENE-9393: Make FunctionScoreQuery use ScoreMode.COMPLETE for creating the inner query weight when ScoreMode.TOP_DOCS is requested.
    + LUCENE-9392: Make FacetsConfig.DELIM_CHAR publicly accessible
    + LUCENE-9397: UniformSplit supports encodable fields metadata.
    + LUCENE-9396: Improved truncation detection for points.
    + LUCENE-9402: Let MultiCollector handle minCompetitiveScore
    + LUCENE-8574: Add a new ExpressionValueSource which will enforce only one value per name per hit in dependencies, ExpressionFunctionValues will no longer recompute already computed values
    + LUCENE-9416: Fix CheckIndex to print an invalid non-zero norm as unsigned long when detecting corruption.
    + LUCENE-9440: FieldInfo#checkConsistency called twice from Lucene50(60)FieldInfosFormat#read; Removed the (redundant?) assert and do these checks for real.
    + LUCENE-9446: In BooleanQuery rewrite, always remove MatchAllDocsQuery filter clauses when possible.
    + LUCENE-9501: Improve coverage for Asserting * test classes: make sure to handle singleton doc values, and sometimes exercise Weight#scorer instead of Weight#bulkScorer for top-level queries.
    + LUCENE-9511: Include StoredFieldsWriter in DWPT accounting to ensure that its heap consumption is taken into account when IndexWriter stalls or should flush DWPTs.
    + LUCENE-9514: Include TermVectorsWriter in DWPT accounting to ensure that its heap consumption is taken into account when IndexWriter stalls or should flush DWPTs.
    + LUCENE-9523: In query shapes over shape fields, skip points while traversing the BKD tree when the relationship with the document is already known.
    + LUCENE-9539: Use more compact data structures to represent sorted doc-values in memory when sorting a segment before flush and in SortingCodecReader.
    + LUCENE-9458: WordDelimiterGraphFilter should order tokens at the same position by endOffset to emit longer tokens first. The same graph is produced.
    + LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts.
    + LUCENE-9023: GlobalOrdinalsWithScore should not compute occurrences when the provided min is 1.
    + LUCENE-9177: ICUNormalizer2CharFilter no longer requires normalization-inert characters as boundaries for incremental processing, vastly improving worst-case performance.
    + LUCENE-9455: ExitableTermsEnum should sample timeout and interruption check before calling next().
    + LUCENE-9662: Make CheckIndex concurrent by parallelizing index check across segments.
    + LUCENE-9663: Add compression to terms dict from SortedSet/Sorted DocValues.
    + LUCENE-9675: Binary doc values fields now expose their configured compression mode in the attributes of the field info.
    + LUCENE-9725: BM25FQuery was extended to handle similarities beyond BM25Similarity. It was renamed to CombinedFieldQuery to reflect its more general scope.
    + LUCENE-9877: Reduce index size by increasing allowable exceptions in PForUtil from 3 to 7.
    + LUCENE-9687: Hunspell support improvements: add API for spell-checking and suggestions, support compound words, fix various behavior differences between Java and C++ implementations, improve performance
    + LUCENE-9917: The BEST_SPEED compression mode now trades more compression ratio in exchange for faster reads.
    + LUCENE-9935: Enable bulk merge for stored fields with index sort.
    + LUCENE-9944: Allow DrillSideways users to provide their own CollectorManager without also requiring them to provide an ExecutorService.
    + LUCENE-9945: Extend DrillSideways to support exposing FacetCollectors directly.
    + LUCENE-9946: Support for multi-value fields in LongRangeFacetCounts and DoubleRangeFacetCounts.
    + LUCENE-9965: Added QueryProfilerIndexSearcher and ProfilerCollector to support debugging query execution strategy and timing.
    + LUCENE-9981: Operations.getCommonSuffix/Prefix(Automaton) is now much more efficient, from a worst case exponential down to quadratic cost in the number of states + transitions in the Automaton.
      These methods no longer use the costly determinize method, removing the risk of TooComplexToDeterminizeException
    + LUCENE-9981: Operations.determinize now throws TooComplexToDeterminizeException based on too much "effort" spent determinizing rather than a precise state count on the resulting returned automaton, to better handle adversarial cases like det(rev(regexp("(.*a){2000}"))) that spend lots of effort but result in smallish eventual returned automata.
    + LUCENE-9983: Stop sorting determinize powersets unnecessarily.
    + LUCENE-10030: Lazily evaluate score in DrillSidewaysScorer.doQueryFirstScoring
    + LUCENE-10043: Decrease default for LRUQueryCache's skipCacheFactor to 10. This prevents caching a query clause when it is much more expensive than running the top-level query.
    + LUCENE-10103: Make QueryCache respect Accountable queries
  * Optimizations
    + LUCENE-9254: UniformSplit keeps FST off-heap.
    + LUCENE-8103: DoubleValuesSource and QueryValueSource now use a TwoPhaseIterator if one is provided by the Query.
    + LUCENE-9287: UsageTrackingQueryCachingPolicy no longer caches DocValuesFieldExistsQuery.
    + LUCENE-9286: FST.Arc.BitTable reads directly FST bytes. Arc is lightweight again and FSTEnum traversal faster.
    + LUCENE-7788: fail precommit on unparameterised log messages and examine for wasted work/objects
    + LUCENE-9273: Speed up geometry queries by specialising Component2D spatial operations. Instead of using a generic relate method for all relations, we use specialized methods for each one. In addition, the type of triangle is computed at deserialization time, therefore we can be more selective when decoding points of a triangle.
    + LUCENE-9087: Always build trees with full leaves and lower the default value for maxPointsPerLeafNode to 512.
    + LUCENE-9148: Points now write their index in a separate file.
    + LUCENE-9280: Add an ability for field comparators to skip non-competitive documents.
      Creating a TopFieldCollector with totalHitsThreshold less than Integer.MAX_VALUE instructs Lucene to skip non-competitive documents whenever possible. For numeric sort fields the skipping functionality works when the same field is indexed both with doc values and points. To indicate that the same data is stored in these points and doc values, the SortField#setCanUsePoints method should be used.
    + LUCENE-9395: ConstantValuesSource now shares a single DoubleValues instance across all segments
    + LUCENE-9447, LUCENE-9486: Stored fields now get higher compression ratios on highly compressible data.
    + LUCENE-9373: FunctionMatchQuery now accepts a "matchCost" optimization hint.
    + LUCENE-9510: Indexing with an index sort is now faster by not compressing temporary representations of the data.
    + LUCENE-9449: Enhance DocComparator to provide an iterator over competitive documents when searching with "after". This iterator can quickly position on the desired "after" document skipping all documents and segments before "after".
    + LUCENE-9021: QueryParser: re-use the LookaheadSuccess exception.
    + LUCENE-9346: WANDScorer now supports queries that have a 'minimumNumberShouldMatch' configured.
    + LUCENE-9536: Reduced memory usage for OrdinalMap when a segment has all values.
    + LUCENE-9636: Faster decoding of postings for some numbers of bits per value.
    + LUCENE-9673: Substantially improve RAM efficiency of how MemoryIndex stores postings in memory, and reduced a bit of RAM overhead in IndexWriter's internal postings book-keeping
    + LUCENE-9827: Speed up merging of stored fields and term vectors for smaller segments.
    + LUCENE-9932: Performance improvement for BKD index building
    + LUCENE-9996: Improved memory efficiency of IndexWriter's RAM buffer, in particular in the case of many fields and many indexing threads.
    + LUCENE-10014: Lucene90DocValuesFormat was using too many bits per value when compressing via gcd, unnecessarily wasting index storage.
    + LUCENE-10022: Rewrite empty DisjunctionMaxQuery to MatchNoDocsQuery.
    + LUCENE-10031: Slightly faster segment merging for sorted indices.
    + LUCENE-10196: Improve IntroSorter with 3-ways partitioning
    + LUCENE-10481: FacetsCollector will not request scores if it does not use them
  * Bug Fixes
    + LUCENE-9300: Fix corruption of the new gen field infos when doc values updates are applied on a segment created externally and added to the index with IndexWriter#addIndexes(Directory).
    + LUCENE-9350: Partial reversion of LUCENE-9068; holding Levenshtein automata on FuzzyQuery can end up blowing up query caches which use query objects as cache keys, so building the automata is now delayed to search time again.
    + LUCENE-9259: Fix wrong NGramFilterFactory argument name for preserveOriginal option
    + LUCENE-8849: DocValuesRewriteMethod.visit wasn't visiting its embedded query
    + LUCENE-9258: DocTermsIndexDocValues assumed it was operating on a SortedDocValues (single valued) field when it could be multi-valued used with a SortedSetSelector
    + LUCENE-9164: Ensure IW processes all internal events before it closes itself on a rollback.
    + LUCENE-8908: Return default value from objectVal when doc doesn't match the query in QueryValueSource
    + LUCENE-9133: Fix for potential NPE in TermFilteredPresearcher for empty fields
    + LUCENE-9309: Wait for #addIndexes merges when aborting merges.
    + LUCENE-9337: Ensure CMS updates its thread accounting data structures consistently. CMS today releases its lock after finishing a merge before it re-acquires it to update the thread accounting data structures. This causes threading issues where concurrently finishing threads fail to pick up pending merges causing potential thread starvation on forceMerge calls
    + LUCENE-9314: Single-document monitor runs were using the less efficient MultiDocumentBatch implementation.
    + LUCENE-9362: Fix equality check in ExpressionValueSource#rewrite. This fixes rewriting of inner value sources.
    + LUCENE-9405: IndexWriter incorrectly calls closeMergeReaders twice when the merged segment is 100% deleted.
    + LUCENE-9400: Tessellator might build illegal polygons when several holes share the same vertex.
    + LUCENE-9417: Tessellator might build illegal polygons when several holes are connected to the same vertex.
    + LUCENE-9418: Fix ordered intervals over interleaved terms
    + LUCENE-9443: The UnifiedHighlighter was closing the underlying reader when there were multiple term-vector fields. This was a regression in 8.6.0.
    + LUCENE-9478: Prevent DWPTDeleteQueue from referencing itself and leaking memory. The queue passed an implicit this reference to the next queue instance on flush which leaked about 500 bytes of memory on each full flush, commit or getReader call.
    + LUCENE-9427: Fix a regression where the unified highlighter didn't produce highlights on fuzzy queries that correspond to exact matches.
    + LUCENE-9467: Fix NRTCachingDirectory to use Directory#fileLength to check if a file already exists instead of opening an IndexInput on the file which might throw an AccessDeniedException in some Directory implementations.
    + LUCENE-9501: Fix a bug in IndexSortSortedNumericDocValuesRangeQuery where it could violate the DocIdSetIterator contract.
    + LUCENE-9401: Include field in ComplexPhraseQuery's toString()
    + LUCENE-9578: Fix TermRangeQuery when there is no upper bound and the lower bound is the empty string excluded. This would previously match no strings at all while it should match all non-empty strings.
    + LUCENE-9524: Fix NPE in SpanWeight#explain when no scoring is required and SpanWeight has null Similarity.SimScorer.
    + LUCENE-9508: DocumentsWriter was only stalling threads for 1 second, allowing documents to be indexed even if the DocumentsWriter wasn't able to keep up with flushing. Unless IW can't make progress due to an ill-behaving DWPT this issue was barely noticeable.
    + LUCENE-9581: Japanese tokenizer should discard the compound token instead of disabling the decomposition of long tokens when discardCompoundToken is activated.
    + LUCENE-9595: Make Component2D#withinPoint implementations consistent with ShapeQuery logic.
    + LUCENE-9606: Wrap boolean queries generated by shape fields with a Constant score query.
    + LUCENE-9617: Fix per-field memory leak in IndexWriter.deleteAll(). Reset next available internal field number to 0 on FieldInfos.clear(), to avoid wasting FieldInfo references.
    + LUCENE-9635: BM25FQuery - Mask encoded norm long value in array lookup.
    + LUCENE-9642: When encoding triangles in ShapeField, make sure generated triangles are CCW by rotating triangle points before checking triangle orientation.
    + LUCENE-9661: Fix deadlock in TermsEnum.EMPTY that occurs when trying to initialize TermsEnum and BaseTermsEnum at the same time
    + LUCENE-9744: NPE on a degenerate query in MinimumShouldMatchIntervalsSource$MinimumMatchesIterator.getSubMatches().
    + LUCENE-9762: DoubleValuesSource.fromQuery (also used by FunctionScoreQuery.boostByQuery) could throw an exception when the query implements TwoPhaseIterator and when the score is requested repeatedly.
    + LUCENE-9791: BytesRefHash.equals/find is now thread safe, fixing a Luwak/Monitor bug causing registered queries to sometimes fail to match.
    + LUCENE-9870: Fix Circle2D intersectsLine t-value (distance) range clamp
    + LUCENE-9887: Fixed parameter use in RadixSelector.
    + LUCENE-9953: LongValueFacetCounts should count each document at most once when determining the total count for a dimension. Prior to this fix, multi-value docs could contribute a > 1 count to the dimension count.
    + LUCENE-9958: Fixed performance regression for boolean queries that configure a minimum number of matching clauses.
    + LUCENE-9963: FlattenGraphFilter is now more robust when handling incoming holes in the input token graph
    + LUCENE-9964: Duplicate long values in a document field should only be counted once when using SortedNumericDocValuesFields
    + LUCENE-9967: Do not throw NullPointerException while trying to handle another exception in ReplicaNode.start
    + LUCENE-9988: Fix DrillSideways correctness bug introduced in LUCENE-9944
    + LUCENE-9991: Fix edge case failure in TestStringValueFacetCounts
    + LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing some fields.
    + LUCENE-10008: Respect ignoreCase in CommonGramsFilterFactory
    + LUCENE-10020: DocComparator should not skip docs with the same docID on multiple sorts with search after
    + LUCENE-10026: Fix CombinedFieldQuery equals and hashCode, which ensures query rewrites don't drop CombinedFieldQuery clauses.
    + LUCENE-10039: Correct CombinedFieldQuery scoring when there is a single field.
    + LUCENE-10046: Counting bug fixed in StringValueFacetCounts.
    + LUCENE-10060: Ensure DrillSidewaysQuery instances never get cached.
    + LUCENE-10070: Skip deleted docs when accumulating facet counts for all docs
    + LUCENE-10081: KoreanTokenizer should check the max backtrace gap on whitespaces.
    + LUCENE-10106: Sort optimization can wrongly skip the first document of each segment
    + LUCENE-10110: MultiCollector now handles single leaf collector that wants to skip low-scoring hits but the combined score mode doesn't allow it
    + LUCENE-10111: Missing calculating the bytes used of DocsWithFieldSet in NormValuesWriter
    + LUCENE-10116: Missing calculating the bytes used of DocsWithFieldSet and currentValues in SortedSetDocValuesWriter
    + LUCENE-10119: Sort optimization with search_after can wrongly skip documents whose values are equal to the last value of the previous page
    + LUCENE-10126: Sort optimization with a chunked bulk scorer can wrongly skip documents
    + LUCENE-10134: ConcurrentSortedSetDocValuesFacetCounts shouldn't share liveDocs Bits across threads
    + LUCENE-10154: NumericLeafComparator to define getPointValues
    + LUCENE-10208: Ensure that the minimum competitive score does not decrease in concurrent search
    + LUCENE-10477: Highlighter: WeightedSpanTermExtractor.extractWeightedSpanTerms to Query#rewrite multiple times if necessary
    + LUCENE-10564: Make sure SparseFixedBitSet#or updates ramBytesUsed
  * Documentation
    + LUCENE-9424: Add a performance warning to AttributeSource.captureState javadocs
  * Changes in runtime behaviour
    + LUCENE-9539: SortingCodecReader now doesn't cache doc values fields anymore. Previously, SortingCodecReader used to cache all doc values fields after they were loaded into memory. This reader should only be used to sort segments after the fact using IndexWriter#addIndices.
  * Other
    + LUCENE-9257: Always keep FST off-heap. FSTLoadMode, Reader attributes and openedFromWriter removed.
    + LUCENE-9272: Checksums of the terms index are now verified when LeafReader#checkIntegrity is called rather than when opening the index.
    + LUCENE-9270: Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder.
    + LUCENE-9275: Make TestLatLonMultiPolygonShapeQueries more resilient for CONTAINS queries.
    + LUCENE-9244: Adjust TestLucene60PointsFormat#testEstimatePointCount2Dims so it does not fail when a point is shared by multiple leaves.
    + LUCENE-9271: ByteBufferIndexInput was refactored to work on top of the ByteBuffer API.
    + LUCENE-9191: Make LineFileDocs's random seeking more efficient, making tests using LineFileDocs faster
    + LUCENE-9338: Refactors SimpleBindings to improve type safety and cycle detection
    + LUCENE-9358: Change the way the multi-dimensional BKD tree builder generates the intermediate tree representation to be equal to the one dimensional case to avoid unnecessary tree and leaves rotation.
    + LUCENE-9288: poll_mirrors.py release script can handle HTTPS mirrors.
    + LUCENE-9232: Fix or suppress 13 resource leak precommit warnings in lucene/replicator
    + LUCENE-9398: Always keep BKD index off-heap. BKD reader does not implement Accountable any more.
    + LUCENE-9292: Refactor BKD point configuration into its own class.
    + LUCENE-9470: Make TestXYMultiPolygonShapeQueries more resilient for CONTAINS queries.
    + LUCENE-9512: Move LockFactory stress test to be a unit/integration test.
    + LUCENE-9637: Removes some unused code and replaces the Point implementation on ShapeField/ShapeQuery random tests.
    + LUCENE-9836: Removed the pure Maven build. It is no longer possible to build artifacts using Maven (this feature was no longer working correctly). Due to migration to Gradle for Lucene/Solr 9.0, the maintenance of the Maven build was no longer reasonable. POM files are generated for deployment to Maven Central only. Please use "ant generate-maven-artifacts" to produce and deploy artifacts to any repository.
    + LUCENE-9836: Migrate Maven tasks to use "maven-resolver-ant-tasks" instead of the no longer maintained "maven-ant-tasks".
    + LUCENE-9985: Upgrade jetty to 9.4.41
    + LUCENE-9976: Fix WANDScorer assertion error.
    + LUCENE-10098: Add docs/links to GermanAnalyzer describing how to decompound nouns
    + SOLR-14995: Update Jetty to 9.4.34
  * Build
    + Upgrade forbiddenapis to version 3.0.1.
    + LUCENE-9376: Fix or suppress 20 resource leak precommit warnings in lucene/search
    + LUCENE-9380: Fix auxiliary class warnings in Lucene
    + LUCENE-9389: Enhance gradle logging calls validation: eliminate getMessage()
    + Upgrade forbiddenapis to version 3.1.
    + LUCENE-10104, SOLR-15631: Upgrade forbiddenapis to version 3.2
- Removed patch:
  * lucene-java8compat.patch
    + not needed in this version, since the compatibility is handled by --release option for javac versions that support it
- Added patch:
  * lucene-timestamps.patch
    + use SOURCE_DATE_EPOCH for timestamps and for pseudo-random seeds
    + improves reproducibility of builds using lucene for indexing
- Modified patches:
  * lucene-missing-dependencies.patch
  * lucene-nodoclint.patch
  * lucene-osgi-manifests.patch
    + rediff to changed context
* Mon Aug 21 2023 fstrbaAATTsuse.com
- Avoid xerces-j2 on classpath
  * fixes build after apache-ivy upgrade to 2.5.2
* Mon Jul 24 2023 fstrbaAATTsuse.com
- Do not depend on jtidy, since it is not used during build
* Sun Mar 20 2022 fstrbaAATTsuse.com
- Added patch:
  * lucene-nodoclint.patch
    + Do not abort compilation on html5 errors with javadoc 17
* Thu Apr 09 2020 fstrbaAATTsuse.com
- Upgrade to version 8.5.0
  * API Changes:
    + LUCENE-9093: Change in behavior of the UnifiedHighlighter's LengthGoalBreakIterator that will yield Passages sized a little differently due to the fact that the sizing pivot is now the center of the first match and not its left edge.
    + LUCENE-9116: PostingsWriterBase and PostingsReaderBase no longer support setting a field's metadata via a 'long[]'.
    + LUCENE-9116: The FSTOrd postings format has been removed.
    + LUCENE-8369: Remove obsolete spatial module.
    + LUCENE-8621: Refactor LatLonShape, XYShape, and all query and utility classes to core.
    + LUCENE-9218: XY geometries API works in float space.
    + LUCENE-9212: Intervals.multiterm() takes CompiledAutomaton rather than plain Automaton
    + LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d.
    + LUCENE-9171: QueryBuilder.newTermQuery() and .newSynonymQuery() now take boost parameters.
    + LUCENE-9029: Deprecate SloppyMath toRadians/toDegrees in favor of Java Math.
    + LUCENE-8620: Add CONTAINS support for LatLonShape and XYShape.
    + LUCENE-9050: MultiTermIntervalsSource.visit() was not calling back to its visitor.
    + LUCENE-8909: IndexWriter#getFieldNames() method is used to get fields present in index. After LUCENE-8316, this method is no longer required. Hence, deprecate IndexWriter#getFieldNames() method.
    + LUCENE-8755: SpatialPrefixTreeFactory now consumes the "version" parsed with Lucene's Version class. The quad and packed quad prefix trees are sensitive to this. It's recommended to pass the version, as you should likewise do for analysis components for tokenized text, or else changes to the encoding in future versions may be incompatible with older indexes.
    + LUCENE-8956: QueryRescorer now only sorts the first topN hits instead of all initial hits.
    + LUCENE-8921: IndexSearcher.termStatistics() no longer takes a TermStates; it takes the docFreq and totalTermFreq. And don't call if docFreq <= 0. The previous implementation survives as deprecated and final. It's removed in 9.0.
    + LUCENE-8990: PointValues#estimateDocCount(visitor) estimates the number of documents that would be matched by the given IntersectVisitor. The method is used to compute the cost() of ScorerSuppliers instead of PointValues#estimatePointCount(visitor).
    + LUCENE-8865: IndexSearcher now uses Executor instead of ExecutorService. This change is fully backwards compatible since ExecutorService directly implements Executor.
    + LUCENE-8856: Intervals queries have moved from the sandbox to the queries module.
    + LUCENE-8893: Intervals.wildcard() and Intervals.prefix() methods now take BytesRef rather than String.
    + LUCENE-3041: A query introspection API has been added. Queries should implement a visit() method, taking a QueryVisitor, and either pass the visitor down to any child queries, or call a visitX() or consumeX() method on it. All locations in the code that called Weight.extractTerms() have been changed to use this API, and the extractTerms() method has been deprecated.
    + LUCENE-8735: Directory.getPendingDeletions is now abstract to ensure subclasses override it. FilterDirectory now delegates the call, ensuring correct default behaviour for subclasses.
    + LUCENE-8662: TermsEnum.seekExact(BytesRef) to abstract and delegate seekExact(BytesRef) in FilterLeafReader.FilterTermsEnum.
    + LUCENE-8469: Deprecated StringHelper.compare has been removed.
    + LUCENE-8039: Introduce a "delta distance" method set to GeoDistance. This allows distance calculations, especially for paths, to take into account an "excursion" to include the specified point.
    + LUCENE-8007: Index statistics Terms.getSumDocFreq(), Terms.getDocCount() are now required to be stored by codecs. Additionally, TermsEnum.totalTermFreq() and Terms.getSumTotalTermFreq() are now required: if frequencies are not stored they are equal to TermsEnum.docFreq() and Terms.getSumDocFreq(), respectively, because all freq() values equal 1.
    + LUCENE-8038: Deprecated PayloadScoreQuery constructors have been removed
    + LUCENE-8014: Similarity.computeSlopFactor() and Similarity.computePayloadFactor() have been removed
    + LUCENE-7996: Queries are now required to produce positive scores.
    + LUCENE-8099: CustomScoreQuery, BoostedQuery and BoostingQuery have been removed
    + LUCENE-8012: Explanation now takes Number rather than float
    + LUCENE-8116: SimScorer now only takes a frequency and a norm as per-document scoring factors.
    + LUCENE-8113: TermContext has been renamed to TermStates, and can now be constructed lazily if term statistics are not required
    + LUCENE-8242: Deprecated method IndexSearcher#createNormalizedWeight() has been removed
    + LUCENE-8267: Memory codecs removed from the codebase (MemoryPostings, MemoryDocValues).
    + LUCENE-8144: Moved QueryCachingPolicy.ALWAYS_CACHE to the test framework.
    + LUCENE-8356: StandardFilter and StandardFilterFactory have been removed
    + LUCENE-8373: StandardAnalyzer.ENGLISH_STOP_WORD_SET has been removed
    + LUCENE-8388: Unused PostingsEnum#attributes() method has been removed
    + LUCENE-8405: TopDocs.maxScore is removed. IndexSearcher and TopFieldCollector no longer have an option to compute the maximum score when sorting by field.
    + LUCENE-8411: TopFieldCollector no longer takes a fillFields option, it now always fills fields.
    + LUCENE-8412: TopFieldCollector no longer takes a trackDocScores option. Scores need to be set on top hits via TopFieldCollector#populateScores instead.
    + LUCENE-6228: A new Scorable abstract class has been added, containing only those methods from Scorer that should be called from Collectors. LeafCollector.setScorer() now takes a Scorable rather than a Scorer.
    + LUCENE-8475: Deprecated constants have been removed from RamUsageEstimator.
    + LUCENE-8483: Scorers may no longer take null as a Weight
    + LUCENE-8352: TokenStreamComponents is now final, and can take a Consumer in its constructor
    + LUCENE-8498: LowerCaseTokenizer has been removed, and CharTokenizer no longer takes a normalizer function.
    + LUCENE-7875: Moved MultiFields static methods out of the class. getLiveDocs is now in MultiBits which is now public. getMergedFieldInfos and getIndexedFields are now in FieldInfos. getTerms is now in MultiTerms. getTermPositionsEnum and getTermDocsEnum were collapsed and renamed to just getTermPostingsEnum and moved to MultiTerms.
    + LUCENE-8513: MultiFields.getFields is now removed.
Please avoid this class, and Fields in general, when possible. + LUCENE-8497: MultiTermAwareComponent has been removed, and in its place TokenFilterFactory and CharFilterFactory now expose type-safe normalize() methods. This decouples normalization from tokenization entirely. + LUCENE-8597: IntervalIterator now exposes a gaps() method that reports the number of gaps between its component sub-intervals. This can be used in a new filter available via Intervals.maxgaps(). + LUCENE-8609: Remove IndexWriter#numDocs() and IndexWriter#maxDoc() in favor of IndexWriter#getDocStats(). * Changes in Runtime Behavior + LUCENE-8671: Load FST off-heap also for ID-like fields if reader is not opened from an IndexWriter. + LUCENE-8730: WordDelimiterGraphFilter always emits its original token first. This brings its behaviour into line with the deprecated WordDelimiterFilter, so that the only difference in output between the two is in the position length attribute. + LUCENE-7386: Disjunctions nested in disjunctions are now flattened. This might trigger changes in the produced scores due to changes to the order in which scores of sub clauses are summed up. + LUCENE-8756: MoreLikeThisQuery now respects custom term frequencies (TermFrequencyAttribute) at search time + LUCENE-8333: Switch MoreLikeThis.setMaxDocFreqPct to use maxDoc instead of numDocs. + LUCENE-7837: Indices that were created before the previous major version will now fail to open even if they have been merged with the previous major version. + LUCENE-8020: Similarities are no longer passed terms that don\'t exist by queries such as SpanOrQuery, so scoring formulas no longer require divide-by-zero hacks. IndexSearcher.termStatistics/collectionStatistics return null instead of returning bogus values for a non-existent term or field. + LUCENE-7996: FunctionQuery and FunctionScoreQuery now return a score of 0 when the function produces a negative value. 
+ LUCENE-8116: Similarities now score fields that omit norms as if the norm was 1. This might change score values on fields that omit norms. + LUCENE-8134: Index options are no longer automatically downgraded. + LUCENE-8031: Length normalization correctly reflects omission of term frequencies. + LUCENE-7444: StandardAnalyzer no longer defaults to removing English stopwords + LUCENE-8060: IndexSearcher\'s search and searchAfter methods now only compute total hit counts accurately up to 1,000 in order to enable top-hits optimizations such as block-max WAND (LUCENE-8135). + LUCENE-8505: IndexWriter#addIndices will now fail if the target index is sorted but the candidate is not. + LUCENE-8535: Highlighter and FVH doesn\'t support ToParent and ToChildBlockJoinQuery out of the box anymore. In order to highlight on Block-Join Queries a custom WeightedSpanTermExtractor / FieldQuery should be used. + LUCENE-8563: BM25 scores don\'t include the (k1+1) factor in their numerator anymore. This doesn\'t affect ordering as this is a constant factor which is the same for every document. + LUCENE-8509: WordDelimiterGraphFilter will no longer set the offsets of internal tokens by default, preventing a number of bugs when the filter is chained with tokenfilters that change the length of their tokens + LUCENE-8633: IntervalQuery scores do not use term weighting any more, the score is instead calculated as a function of the sloppy frequency of the matching intervals. + LUCENE-8635: FSTs can now remain off-heap, accessed via IndexInput, and the default codec\'s term dictionary (BlockTreeTermsReader) will now leave the FST for the terms index off-heap for non-primary-key fields using MMapDirectory, reducing heap usage for such fields. * New Features: + LUCENE-8903: Add LatLonShape and XYShape point query. + LUCENE-8707: Add LatLonShape and XYShape distance query. + LUCENE-9238: New XYPointField field and Queries for indexing, searching and sorting cartesian points. 
+ LUCENE-8936: Add SpanishMinimalStemFilter + LUCENE-8764 LUCENE-8945: Add "export all terms and doc freqs" feature to Luke with delimiters. + LUCENE-8747: Composite Matches from multiple subqueries now allow access to their submatches, and a new NamedMatches API allows marking of subqueries and a simple way to find which subqueries have matched on a given document + LUCENE-8769: Introduce Range Query For Multiple Connected Ranges + LUCENE-8960: Introduce LatLonDocValuesPointInPolygonQuery for LatLonDocValuesField + LUCENE-8753: New UniformSplitPostingsFormat (name "UniformSplit") primarily benefiting in simplicity and extensibility. New STUniformSplitPostingsFormat (name "SharedTermsUniformSplit") that shares a single internal term dictionary across fields. + LUCENE-8632: New XYShape Field and Queries for indexing and searching general cartesian geometries. + LUCENE-8891: Snowball stemmer/analyzer for the Estonian language. + LUCENE-8815: Provide a DoubleValues implementation for retrieving the value of features without requiring a separate numeric field. Note that as feature values are stored with only 8 bits of mantissa the values returned may have a delta from the original values indexed. + LUCENE-8803: Provide a FeatureSortfield to allow sorting search hits by descending value of a feature. This is exposed via the factory method FeatureField#newFeatureSort. + LUCENE-8784: The KoreanTokenizer now preserves punctuation if discardPunctuation is set to false (defaults to true). + LUCENE-8812: Add new KoreanNumberFilter that can change Hangul character to number and process decimal point. It is similar to the JapaneseNumberFilter. + LUCENE-8362: Add doc-value support to range fields. + LUCENE-8766: Add monitor subproject (previously Luwak monitoring library). This allows a stream of documents to be matched against a set of registered queries in an efficient manner, for use as a monitoring or classification tool.
+ LUCENE-7714: Add a numeric range query in sandbox that takes advantage of index sorting. + LUCENE-8859: The completion suggester\'s postings format now have an option to load its internal FST off-heap. + LUCENE-2562: The well-known graphical user interface for inspecting Lucene indexes \"Luke\" was added as a Lucene module. It can be started from the binary distribution by calling the shell scripts in the module folder or from the source checkout by using \'ant -f lucene/luke/build.xml run\'. Luke provides a Swing-based user interface and can be used to open Lucene or Solr (or Elasticsearch) indexes, inspect documents, check index commits and segments, or test (custom) analyzers. It also has maintenance functions to check index structures and force merge indexes for archival. + LUCENE-8340: LongPoint#newDistanceFeatureQuery may be used to boost scores based on how close a value of a long field is from a configurable origin. This is typically useful to boost by recency. + LUCENE-8482: LatLonPoint#newDistanceFeatureQuery may be used to boost scores based on the haversine distance of a LatLonPoint field to a provided point. This is typically useful to boost by distance. + LUCENE-8216: Added a new BM25FQuery in sandbox to blend statistics across several fields using the BM25F formula. + LUCENE-8564: GraphTokenFilter is an abstract class useful for token filters that need to read-ahead in the token stream and take into account graph structures. This also changes FixedShingleFilter to extend GraphTokenFilter + LUCENE-8612: Intervals.extend() treats an interval as if it covered a wider span than it actually does, allowing users to force minimum gaps between intervals in a phrase. + LUCENE-8629: New interval functions: Intervals.before(), Intervals.after(), Intervals.within() and Intervals.overlapping(). + LUCENE-8622: Adds a minimum-should-match interval function that produces intervals spanning a subset of a set of sources. 
+ LUCENE-8645: Intervals.fixField() allows you to report intervals from one field as if they came from another. + LUCENE-8646: New interval functions: Intervals.prefix() and Intervals.wildcard() + LUCENE-8655: Add a getter in FunctionScoreQuery class in order to access the underlying DoubleValuesSource. + LUCENE-8697: GraphTokenStreamFiniteStrings correctly handles side paths containing gaps + LUCENE-8702: Simplify intervals returned from vararg Intervals factory methods * Improvements: + LUCENE-9149: Increase data dimension limit in BKD. + LUCENE-9102: Add maxQueryLength option to DirectSpellchecker. + LUCENE-9091: UnifiedHighlighter HTML escaping should only escape essentials + LUCENE-9105: UniformSplit postings format detects corrupted index and better handles IO exceptions. + LUCENE-9106: UniformSplit postings format allows extension of block/line serializers. + LUCENE-9093: UnifiedHighlighter's LengthGoalBreakIterator has a new fragmentAlignment option to better center the first match in the passage. Also the sizing point now pivots at the center of the first match term and not its left edge. This yields Passages that won't be identical to the previous behavior. + LUCENE-9153: Allow WhitespaceAnalyzer to set a maxTokenLength other than the default of 255 + LUCENE-9152: Improve line intersections with polygons when they are touching from the outside. + LUCENE-9123: Add new JapaneseTokenizer constructors with discardCompoundToken option that controls whether the tokenizer emits original (compound) tokens when the mode is not NORMAL. + LUCENE-9253: KoreanTokenizer now supports custom dictionaries (system, unknown). + LUCENE-9171: QueryBuilder can now use BoostAttributes on input token streams to selectively boost particular terms or synonyms in parsed queries. + LUCENE-9002: Skip costly caching clause in LRUQueryCache if it makes the query many times slower.
+ LUCENE-9006: WordDelimiterGraphFilter's catenateAll token is now ordered before any token parts, like WDF did. + LUCENE-9028: introducing Intervals.multiterm() + LUCENE-9018: ConcatenateGraphFilter now has a configurable separator. + LUCENE-9036: ExitableDirectoryReader may interrupt scanning over DocValues + LUCENE-9062: QueryVisitor now has a consumeTermsMatching() method, allowing queries that match a class of terms to pass a ByteRunAutomaton matching that class back to the visitor. + LUCENE-9073: IntervalQuery to respond field on toString() and explain() + LUCENE-8874: Show SPI names instead of class names in Luke Analysis tab. + LUCENE-8894: Add APIs to find SPI names for Tokenizer/CharFilter/TokenFilter factory classes. + LUCENE-8914: move the logic for discarding inner modes in FloatPointNearestNeighbor to the IntersectVisitor so we take advantage of the change introduced in LUCENE-7862. + LUCENE-8955: move the logic for discarding inner modes in LatLonPoint NearestNeighbor to the IntersectVisitor so we take advantage of the change introduced in LUCENE-7862. + LUCENE-8918: PhraseQuery throws exceptions at construction time if it is passed null arguments. + LUCENE-8916: GraphTokenStreamFiniteStrings preserves all Token attributes through its finite strings TokenStreams + LUCENE-8933: Check kuromoji user dictionary beforehand to avoid unexpected runtime exceptions. (Tomoko Uchida) + LUCENE-8906: Expose Lucene50PostingsFormat.IntBlockTermState as public so that other postings formats can re-use it. + LUCENE-8942: Remove redundant parameters and improve visibility strictness in LRUQueryCache + SOLR-13663: Introduce into XML Query Parser + LUCENE-8952: Use a sort key instead of true distance in NearestNeighbor + LUCENE-8620: Tessellator labels the edges of the generated triangles whether they belong to the original polygon. This information is added to the triangle encoding.
+ LUCENE-8964: Fix geojson shape parsing on string arrays in properties + LUCENE-8976: Use exact distance between point and bounding rectangle in FloatPointNearestNeighbor. + LUCENE-8966: The Korean analyzer now splits tokens on boundaries between digits and alphabetic characters. + LUCENE-8984: MoreLikeThis MLT is biased for uncommon fields + LUCENE-7840: Non-scoring BooleanQuery now removes SHOULD clauses before building the scorer supplier as opposed to eliminating them during scoring construction. + LUCENE-8770: BlockMaxConjunctionScorer now leverages two-phase iterators in order to avoid executing the second phase when scorers don't intersect. + LUCENE-8781: FST lookup performance has been improved in many cases by encoding Arcs using full-sized arrays with gaps. The new encoding is enabled for postings in the default codec and for suggesters. + LUCENE-8818: Fix smokeTestRelease.py encoding bug + LUCENE-8845: Allow Intervals.prefix() and Intervals.wildcard() to specify their maximum allowed expansions + LUCENE-8875: Introduce a Collector optimized for use cases when a large number of hits are requested + LUCENE-8848 LUCENE-7757 LUCENE-8492: The UnifiedHighlighter now detects that parts of the query are not understood by it, and thus it should not make optimizations that result in no highlights or slow highlighting. This generally works best for WEIGHT_MATCHES mode. Consequently queries produced by ComplexPhraseQueryParser and the surround QueryParser will now highlight correctly. + LUCENE-8793: Luke enhanced UI for CustomAnalyzer: show detailed analysis steps. + LUCENE-8855: Add Accountable to some Query implementations + LUCENE-8673: Use radix partitioning when merging dimensional points instead of sorting all dimensions beforehand. + LUCENE-8687: Optimise radix partitioning for points on heap. + LUCENE-8699: Change HeapPointWriter to use a single byte array instead of a list of byte arrays.
In addition a new interface PointValue is added to abstract out the different formats between offline and on-heap writers. + LUCENE-8703: Build point writers in the BKD tree only when they are needed. + LUCENE-8652: SynonymQuery can now deboost the document frequency of each term when blending the score of the synonym. + LUCENE-8631: The Korean\'s user dictionary now picks the longest-matching word and discards the other matches. + LUCENE-8732: ConstantScoreQuery can now early terminate the query if the minimum score is greater than the constant score and total hits are not requested. + LUCENE-8750: Implements setMissingValue() on sort fields produced from DoubleValuesSource and LongValuesSource + LUCENE-8701: ToParentBlockJoinQuery now creates a child scorer that disallows skipping over non-competitive documents if the score of a parent depends on the score of multiple children (avg, max, min). Additionally the score mode \'none\' that assigns a constant score to each parent can early terminate top scores\'s collection. + LUCENE-8751: Weight#matches now use the ScorerSupplier to build scorers with a lead cost of 1 (single document). + LUCENE-8752: Japanese new era name \'令和\' (Reiwa) is added to the dictionary used in JapaneseTokenizer so that the analyzer handles the era name correctly. Reiwa is set to replace the Heisei Era on May 1, 2019. + LUCENE-8671: Introduced reader attributes allows a per IndexReader configuration of codec internals. This enables a per reader configuration if FSTs are on- or off-heap on a per field basis + LUCENE-8787: spatial-extras DateRangePrefixTree used to only parse ISO-8601 timestamps with 0 or 3 digits of milliseconds precision but now parses other lengths (although > 3 not used). + LUCENE-7997: Add BaseSimilarityTestCase to sanity check similarities. SimilarityBase switches to 64-bit doubles internally to help avoid common numeric issues. Add missing range checks for similarity parameters. 
Improve BM25 and ClassicSimilarity's explanations. + LUCENE-8011: Improved similarity explanations. + LUCENE-4198: Codecs now have the ability to index score impacts. + LUCENE-8135: Boolean queries now implement the block-max WAND algorithm in order to speed up selection of top scored documents. + LUCENE-8279: CheckIndex now cross-checks terms with norms. + LUCENE-8660: TopDocsCollectors now return an accurate count (instead of a lower bound) if the total hit count is equal to the provided threshold. * Optimizations + LUCENE-9211: Add compression for Binary doc value fields. + LUCENE-4702: Better compression of terms dictionaries. + LUCENE-9228: Sort dvUpdates in the term order before applying if they all update a single field to the same value. This optimization can reduce the flush time by around 20% for the docValues update use cases. + LUCENE-9245: Reduce AutomatonTermsEnum memory usage. + LUCENE-9237: Faster UniformSplit intersect TermsEnum. + LUCENE-9068: FuzzyQuery builds its Automaton up-front + LUCENE-9113: Faster merging of SORTED/SORTED_SET doc values. + LUCENE-9125: Optimize Automaton.step() with binary search and introduce Automaton.next(). + LUCENE-9147: The index of stored fields and term vectors is now off-heap. + LUCENE-8928: When building a kd-tree for dimensions n > 2, compute exact bounds for an inner node every N splits to improve the quality of the tree. N is defined by SPLITS_BEFORE_EXACT_BOUNDS which is set to 4. + BaseDirectoryReader no longer sums up the 'LeafReader#numDocs' of its leaves eagerly. This especially helps when creating views of readers that hide documents, since computing the number of live documents is an expensive operation. + LUCENE-8992: TopFieldCollector and TopScoreDocCollector can now share minimum scores across leaves concurrently. + LUCENE-8932: BKDReader's index is now stored off-heap when the IndexInput is an instance of ByteBufferIndexInput.
+ LUCENE-9024: IntroSelector now falls back to the median of medians algorithm instead of sorting when the maximum recursion level is exceeded, providing better worst-case runtime. + LUCENE-8920: The denser arcs of FST now index labels with a bitset in order to provide near constant time access. + LUCENE-9027: Use SIMD instructions to decode postings. + LUCENE-9049: Remove FST cached root arcs now redundant with labels indexed by bitset. This frees some on-heap FST space. + LUCENE-9045: Do not use TreeMap/TreeSet in BlockTree and PerFieldPostingsFormat. + LUCENE-8922: DisjunctionMaxQuery more efficiently leverages impacts to skip non-competitive hits. + LUCENE-8935: BooleanQuery with no scoring clause can now early terminate the query when the total hits is not requested. + LUCENE-8941: Matches on wildcard queries will defer building their full disjunction until a MatchesIterator is pulled + LUCENE-8755: spatial-extras quad and packed quad prefix trees now index points faster. + LUCENE-8860: add additional leaf node level optimizations in LatLonShapeBoundingBoxQuery. + LUCENE-8968: Improve performance of WITHIN and DISJOINT queries for Shape queries by doing just one pass whenever possible. + LUCENE-8939: Introduce shared count based early termination across multiple slices + LUCENE-8980: Blocktree\'s seekExact now short-circuits false if the term isn\'t in the min-max range of the segment. Large perf gain for ID/time like data when populated sequentially. + LUCENE-8796: Use exponential search instead of binary search in IntArrayDocIdSet#advance method + LUCENE-8865: Use incoming thread for execution if IndexSearcher has an executor. Now caller threads execute at least one search on an index even if there is an executor provided to minimize thread context switching. + LUCENE-8868: New storing strategy for BKD tree leaves with low cardinality. It stores the distinct values once with the cardinality value reducing the storage cost. 
+ LUCENE-8885: Optimise BKD reader by exploiting cardinality information stored on leaves. + LUCENE-8896: Override default implementation of IntersectVisitor#visit(DocIDSetBuilder, byte[]) for several queries. + LUCENE-8901: Load frequencies lazily only when needed in BlockDocsEnum and BlockImpactsEverythingEnum + LUCENE-8888: Optimize distribution of points with data dimensions in BKD tree leaves. + LUCENE-8311: Phrase queries now leverage impacts. + LUCENE-8040: Optimize IndexSearcher.collectionStatistics, avoiding MultiFields/MultiTerms + LUCENE-4100: Disjunctions now support faster collection of top hits when the total hit count is not required. + LUCENE-7993: Phrase queries are now faster if total hit counts are not required. + LUCENE-8109: Boolean queries propagate information about the minimum competitive score in order to make collection faster if there are disjunctions or phrase queries as sub queries, which know how to leverage this information to run faster. + LUCENE-8439: Disjunction max queries can skip blocks to select the top documents if the total hit count is not required. + LUCENE-8204: Boolean queries with a mix of required and optional clauses are now faster if the total hit count is not required. + LUCENE-8448: Boolean queries now propagate the minimum score to their sub-scorers. + LUCENE-8511: MultiFields.getIndexedFields is now optimized; does not call getMergedFieldInfos + LUCENE-8507: TopFieldCollector can now update the minimum competitive score if the primary sort is by relevancy and the total hit count is not required. + LUCENE-8464: ConstantScoreScorer now implements setMinCompetitiveScore in order to early terminate the iterator if the minimum score is greater than the constant score. + LUCENE-8607: MatchAllDocsQuery can shortcut when total hit count is not required + LUCENE-8585: Index-time jump-tables for DocValues, for O(1) advance when retrieving doc values.
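The LUCENE-8796 entry in the Optimizations list above mentions replacing binary search with exponential search in IntArrayDocIdSet#advance. The sketch below illustrates the general technique only; it is not Lucene's actual code. The class name ExponentialAdvance and the method advance are hypothetical, and the input is assumed to be a sorted int array of doc IDs. The idea is that when the target is usually close to the current position, doubling the probe distance finds a small window quickly, and a binary search then runs only inside that window.

```java
public class ExponentialAdvance {

    /**
     * Hypothetical helper (not a Lucene API): returns the first index >= from
     * whose value is >= target, or docs.length if there is no such index.
     * Assumes docs is sorted in ascending order.
     */
    static int advance(int[] docs, int from, int target) {
        int n = docs.length;
        if (from >= n) {
            return n;
        }
        if (docs[from] >= target) {
            return from;
        }
        // Invariant: docs[from + bound / 2] < target.
        // Double the probe distance until we overshoot the target or the array.
        int bound = 1;
        while (from + bound < n && docs[from + bound] < target) {
            bound *= 2;
        }
        // The answer now lies in [from + bound/2 + 1, min(from + bound, n)].
        int lo = from + bound / 2 + 1;
        int hi = Math.min(from + bound, n);
        // Plain lower-bound binary search inside the small window.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 8, 13, 21, 34, 55};
        int idx = advance(docs, 0, 13);
        System.out.println("first index with doc >= 13: " + idx);
    }
}
```

The cost is logarithmic in the distance between the starting position and the answer rather than in the size of the whole array, which is why this shape suits iterators that mostly advance by short hops.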
* Bug Fixes + LUCENE-9084: Fix potential deadlock due to circular synchronization in AnalyzingInfixSuggester + LUCENE-9115: NRTCachingDirectory no longer caches files of unknown size. + LUCENE-9144: Fix error message on OneDimensionBKDWriter when too many points are added to the writer. + LUCENE-9135: Make UniformSplit FieldMetadata counters long. + LUCENE-9200: Fix TieredMergePolicy to use double (not float) math to make its merging decisions, fixing a corner-case bug uncovered by fun randomized tests + LUCENE-9099: Unordered and Ordered interval queries now correctly handle repeated subterms - ordered intervals could supply an 'extra' minimized interval, resulting in odd matches when combined with e.g. CONTAINS queries; and unordered intervals would match duplicate subterms on the same position, so a query for UNORDERED(foo, foo) would match a document containing 'foo' only once. + LUCENE-9250: Add support for Circle2d#intersectsLine around the dateline. + LUCENE-9243: Add fudge factor when creating a bounding box of a XYCircle. + LUCENE-9239: Circle2D#WithinTriangle detects properly if a triangle is Within distance. + LUCENE-9251: Fix bug in the polygon tessellator where edges with different value on #isEdgeFromPolygon were not filtered out properly. + LUCENE-9263: Fix wrong transformation of distance in meters to radians in Geo3DPoint. + LUCENE-9001: Fix race condition in SetOnce. + LUCENE-9030: Fix WordnetSynonymParser behaviour so it behaves similar to SolrSynonymParser. + LUCENE-9054: Fix reproduceJenkinsFailures.py to not overwrite junit XML files when retrying + LUCENE-9031: UnsupportedOperationException on MatchesIterator.getQuery() + LUCENE-8996: maxScore was sometimes missing from distributed grouped responses. + LUCENE-9055: Fix the detection of lines crossing triangles through edge points. + LUCENE-9103: Disjunctions can miss some hits in some rare conditions.
+ LUCENE-8755: spatial-extras quad and packed quad prefix trees could throw a NullPointerException for certain cell edge coordinates + LUCENE-9005: BooleanQuery.visit() would pull subVisitors from its parent visitor, rather than from a visitor for its own specific query. This could cause problems when BQ was nested under another BQ. Instead, we now pull a MUST subvisitor, pass it to any MUST subclauses, and then pull SHOULD, MUST_NOT and FILTER visitors from it rather than from the parent. + LUCENE-8831: Fixed LatLonShapeBoundingBoxQuery .hashCode methods. + LUCENE-8775: Improve tessellator to handle better cases where a hole share a vertex with the polygon. + LUCENE-8785: Ensure new threadstates are locked before retrieving the number of active threadstates. This causes assertion errors and potentially broken field attributes in the IndexWriter when IndexWriter#deleteAll is called while actively indexing. + LUCENE-8804: Forbid calls to putAttribute on frozen FieldType instances. + LUCENE-8828: Removes the buggy \'disallow overlaps\' boolean from Intervals.unordered(), and replaces it with a new Intervals.unorderedNoOverlaps() method + LUCENE-8843: Don\'t ignore exceptions that are thrown when trying to open a file in IOUtils#fsync. + LUCENE-8835: FileSwitchDirectory now respects the file extension when listing directory contents to ensure we don\'t expose pending deletes if both directory point to the same underlying filesystem directory. + LUCENE-8853: FileSwitchDirectory now applies best effort to place tmp files in the same directory as the target files. + LUCENE-8892: Add missing closing parentheses in MultiBoolFunction\'s description() + LUCENE-8736: LatLonShapePolygonQuery returns incorrect WITHIN results with shared boundaries. Point in Polygon now correctly includes boundary points. Box and Polygon relations with triangles have also been improved to correctly include boundary points. + LUCENE-8712: Polygon2D does not detect crossings through segment edges. 
+ LUCENE-8720: NameIntCacheLRU (in the facets module) had an int overflow bug that disabled cleaning of the cache + LUCENE-8726: ValueSource.asDoubleValuesSource() could leak a reference to IndexSearcher + LUCENE-8719: FixedShingleFilter can miss shingles at the end of a token stream if there are multiple paths with different lengths. + LUCENE-8688: TieredMergePolicy#findForcedMerges now tries to create the cheapest merges that allow the index to go down to \'maxSegmentCount\' segments or less. + LUCENE-8477: Interval disjunctions could miss valid hits if some of the clauses of the disjunction are minimized away. We now rewrite intervals if a source contains a disjunction and the internal gaps matter for matching. This behaviour can be disabled if users are more interested in speed rather than accuracy of matching. + LUCENE-8741: ValueSource.fromDoubleValuesSource() was casting to Scorer instead of Scorable, leading to ClassCastExceptions + LUCENE-8754: Fix ConcurrentModificationException in SegmentInfo if attributes are accessed in MergePolicy while the merge is running + LUCENE-8765: Fixed validation of the number of added points in KD trees. * Other + LUCENE-9109: Backport some changes from master (except StackWalker) to improve TestSecurityManager + LUCENE-9110: Backport refactored stack analysis in tests to use generalized LuceneTestCase methods + LUCENE-9141: Simplify LatLonShapeXQuery API by adding a new abstract class called LatLonGeometry. Queries are executed with input objects that extend such interface. + LUCENE-9194: Simplify XYShapeXQuery API by adding a new abstract class called XYGeometry. Queries are executed with input objects that extend such interface. + LUCENE-9096: Simplification of CompressingTermVectorsWriter#flushOffsets. + LUCENE-9225: Rectangle extends LatLonGeometry so it can be used in a geometry collection. + LUCENE-8979: Code Cleanup: Use entryset for map iteration wherever possible. 
- Part 2 + LUCENE-8746: Refactor EdgeTree - Introduce a Component tree that represents the tree of components (e.g. polygons). Edge tree is now just a tree of edges. + LUCENE-8994: Code Cleanup - Pass values to list constructor instead of empty constructor followed by addAll(). + LUCENE-9046: Fix wrong example in Javadoc of TermInSetQuery + LUCENE-8983: Add sandbox PhraseWildcardQuery to control multi-terms expansions in a phrase. + LUCENE-9067: Polygon2D#contains() is now thread safe. + LUCENE-8778 LUCENE-8911 LUCENE-8957: Define analyzer SPI names as static final fields and document the names in Javadocs. + LUCENE-8758: QuadPrefixTree: removed levelS and levelN fields which weren't used. + LUCENE-8975: Code Cleanup: Use entryset for map iteration wherever possible. + LUCENE-8993, LUCENE-8807: Changed all repository and download references in build files to HTTPS. + LUCENE-8998: Fix OverviewImplTest.testIsOptimized reproducible failure. + LUCENE-8999: LuceneTestCase.expectThrows now propagates assert/assumption failures up to the test w/o wrapping in a new assertion failure unless the caller has explicitly expected them + LUCENE-8062: GlobalOrdinalsWithScoreQuery is no longer eligible for query caching. + LUCENE-8847: Code Cleanup: Remove StringBuilder.append with concatenated strings. + LUCENE-8861: Script to find open Github PRs that need attention + LUCENE-8852: ReleaseWizard tool for release managers + LUCENE-8838: Remove support for Steiner points on Tessellator. + LUCENE-8879: Improve BKDRadixSelector tests. + LUCENE-8886: Fix TestMutablePointsReaderUtils tests. + LUCENE-8680: Refactor EdgeTree#relateTriangle method. + LUCENE-8685: Refactor LatLonShape tests. + LUCENE-8713: Add Line2D tests. + LUCENE-8729: Workaround: Disable accessibility doclints (Java 13+), so compilation with recent JDK succeeds. + LUCENE-8725: Make TermsQuery.SeekingTermSetTermsEnum a top level class and public * Build + Upgrade forbiddenapis to version 2.7; upgrade Groovy to 2.4.17.
+ LUCENE-9041: Upgrade ecj to 3.19.0 to fix sporadic precommit javadoc issues * Test Framework + LUCENE-8825: CheckHits now display the shard index in case of mismatch between top hits.- Modified patches: * 0001-Disable-ivy-settings.patch * 0002-Dependency-generation.patch * lucene-java8compat.patch * lucene-osgi-manifests.patch + rediff to changed context- Added patch: * lucene-missing-dependencies.patch + patch out dependencies that are not needed for modules that we distribute + patch out dependencies on jars that we don\'t build + add target for the new monitor jars * Mon Mar 23 2020 fstrbaAATTsuse.com- Modified patch: * lucene-osgi-manifests.patch + add the OSGi manifest to queryparser module too * Fri Oct 11 2019 fstrbaAATTsuse.com- Modified patch: * lucene-osgi-manifests.patch + add the OSGi manifests also to modules that are currently not built due to missing dependencies * Tue Oct 01 2019 fstrbaAATTsuse.com- Remove a bogus log4j build dependency * Thu Sep 26 2019 fstrbaAATTsuse.com- Fix property Provides and Obsoletes in order to make upgrade smooth- Added patch: * lucene-osgi-manifests.patch + Patch the build to produce OSGi manifests needed by eclipse- Install the artifacts to \"lucene\" subdirectory and create compatibility symlinks- Install lucene-misc as archful artifact, since it contains JNI code * Thu Sep 26 2019 fstrbaAATTsuse.com- Upgrade to version 7.1.0- Added patches: * 0001-Disable-ivy-settings.patch * 0002-Dependency-generation.patch + Sync with Fedora\'s 7.1.0 * lucene-java8compat.patch + Avoid using java9+ only functions * Mon Jun 24 2019 fstrbaAATTsuse.com- Remove the parent references from the pom files, since we are not building lucene using maven.- Overhaul the packaging to distribute the artifacts and the corresponding metadata and pom files in the same package- Specify runtime dependencies of the different packages- Remove version information from the artifact names * Mon Jun 24 2019 idonmezAATTsuse.com- Remove the JPP prefix from 
pom filenames * Tue Feb 12 2019 fstrbaAATTsuse.com- Remove dependency on jline, because nothing in the build uses it * Sat Dec 22 2018 fstrbaAATTsuse.com- Require the different apache-commons- * packages instead of jakarta-commons- * * Thu Nov 01 2018 fstrbaAATTsuse.com- Do not require asm to build. Nothing depends on it * Fri Sep 29 2017 fstrbaAATTsuse.com- Minimum supported java is 1.8 * Mon Jul 10 2017 jengelhAATTinai.de- Remove unused "%package javadoc" declaration block.- Trim filler words from descriptions. Say a thing about features. * Thu Jun 29 2017 badshah400AATTgmail.com- Update to version 6.6.0: + See https://lucene.apache.org/core/6_6_0/changes/Changes.html for a full list of changes.- Drop patches that are no longer applicable or needed: + lucene-no-classpath-in-manifest.patch + lucene-no-get.patch + lucene-2.3.0-db-javadoc.patch- Add BuildRequires: antlr-java, apache-commons-codec, apache-ivy, asm, fdupes, git- Replace SOURCE0 by full source URL.- Update to changed list of non-core modules: + Update source URL's for corresponding pom files. + Update %%install section to reflect changed list + Each module corresponds to a subpackage, named according to its jar file (except lucene which corresponds to the main jar file lucene-core-%{version}.jar).- Adapt file list to changes. * Fri May 19 2017 dziolkowskiAATTsuse.com- New build dependency: javapackages-local * Wed Mar 18 2015 tchvatalAATTsuse.com- Fix build with new javapackages-tools * Fri Jun 27 2014 tchvatalAATTsuse.com- Remove java-javadoc to build on sle11 again as the javadoc is also pulled in regardless.
* Tue Sep 10 2013 mvyskocilAATTsuse.com- use add_maven_depmap from javapackages-tools * Mon Sep 09 2013 tchvatalAATTsuse.com- Move from jpackage-utils to javapackage-tools * Tue Jun 26 2012 mvyskocilAATTsuse.cz- build require java-javadoc >= 1.6.0 * Thu Dec 10 2009 mvyskocilAATTsuse.cz- refreshed patches * lucene-2.3.0-db-javadoc.patch * lucene-no-get.patch * Tue Sep 29 2009 mvyskocilAATTsuse.cz- fixed requires * Tue May 26 2009 mvyskocilAATTsuse.cz- fixed bnc#507014: removed all jars from source tarball * Tue May 12 2009 mvyskocilAATTsuse.cz- Initial SUSE packaging of lucene 2.4.1 (from jpp 5.0)
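As a footnote to the LUCENE-8563 entry in the 8.11.2 changes (BM25 scores no longer include the (k1+1) factor in their numerator), the sketch below shows why that change rescales scores without reordering hits. It uses the textbook BM25 term formula, not Lucene's BM25Similarity code. The class Bm25Ordering, its methods, the sample documents, and the idf value passed in are all hypothetical. The defaults k1 = 1.2 and b = 0.75 are the commonly cited ones.

```java
import java.util.Arrays;
import java.util.Comparator;

public class Bm25Ordering {
    static final double K1 = 1.2;
    static final double B = 0.75;

    /**
     * Textbook BM25 contribution of a single term (illustrative only).
     * When legacy is true the numerator carries the extra (k1 + 1) factor.
     */
    static double score(double idf, double tf, double docLen, double avgLen, boolean legacy) {
        double norm = K1 * (1 - B + B * docLen / avgLen);
        double s = idf * tf / (tf + norm);
        return legacy ? s * (K1 + 1) : s;
    }

    /** Returns document indexes sorted by descending score. Each row is {tf, docLen}. */
    static Integer[] rank(double[][] docs, double idf, double avgLen, boolean legacy) {
        Integer[] order = new Integer[docs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> -score(idf, docs[i][0], docs[i][1], avgLen, legacy)));
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical sample: {term frequency, document length}.
        double[][] docs = {{3, 100}, {1, 40}, {5, 400}, {2, 120}};
        double avgLen = 165;
        double idf = 1.7; // arbitrary positive value for illustration
        System.out.println("legacy ranking:  " + Arrays.toString(rank(docs, idf, avgLen, true)));
        System.out.println("current ranking: " + Arrays.toString(rank(docs, idf, avgLen, false)));
    }
}
```

Because (k1+1) is the same positive constant for every document, the two rankings agree. Absolute score values do change, so anything that compares scores against fixed thresholds may need to be revisited after the upgrade.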