Changelog for apache-parquet-devel-16.0.0-1.1.x86_64.rpm:

* Sun Apr 21 2024 Ben Greiner - Update to 16.0.0
## Bug Fixes
* [C++][ORC] Catch all ORC exceptions to avoid crash (#40697)
* [C++][S3] Handle conventional content-type for directories (#40147)
* [C++] Strengthen handling of duplicate slashes in S3, GCS (#40371)
* [C++] Avoid hash_mean overflow (#39349)
* [C++] Fix spelling (array) (#38963)
* [C++][Parquet] Fix crash in Modular Encryption (#39623)
* [C++][Dataset] Fix failures in dataset-scanner-benchmark (#39794)
* [C++][Device] Fix Importing nested and string types for DeviceArray (#39770)
* [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray (#39783)
* [C++] Improve error message for "chunker out of sync" condition (#39892)
* [C++] Use make -j1 to install bundled bzip2 (#39956)
* [C++] DatasetWriter avoid creating zero-sized batch when max_rows_per_file enabled (#39995)
* [C++][CI] Disable debug memory pool for ASAN and Valgrind (#39975)
* [C++][Gandiva] Make Gandiva's default cache size to be 5000 for object code cache (#40041)
* [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054)
* [C++][FS][Azure] Validate containers in AzureFileSystem::Impl::MovePaths() (#40086)
* [C++] Decimal types with different precisions and scales bind failed in resolve type when call arithmetic function (#40223)
* [C++][Docs] Correct the console emitter link (#40146)
* [C++][Python] Fix test_gdb failures on 32-bit (#40293)
* [Python][C++] Fix large file handling on 32-bit Python build (#40176)
* [C++] Support glog 0.7 build (#40230)
* [C++] Fix cast function bind failed after add an alias name through AddAlias (#40200)
* [C++] TakeCC: Concatenate only once and delegate to TakeAA instead of TakeCA (#40206)
* [C++] Fix an abort on asof_join_benchmark run for lost an arg (#40234)
* [C++] Fix a simple buffer-overflow case in decimal_benchmark (#40277)
* [C++] Reduce S3Client initialization time (#40299)
* [C++] Fix a wrong total_bytes to generate StringType's test data in vector_hash_benchmark (#40307)
* [C++][Gandiva] Add support for compute module's decimal promotion rules (#40434)
* [C++][Parquet] Add missing config.h include in key_management_test.cc (#40330)
* [C++][CMake] Add missing glog::glog dependency to arrow_util (#40332)
* [C++][Gandiva] Add missing OpenSSL dependency to encrypt_utils_test.cc (#40338)
* [C++] Remove const qualifier from Buffer::mutable_span_as (#40367)
* [C++] Avoid simplifying expressions which call impure functions (#40396)
* [C++] Expose protobuf dependency if opentelemetry or ORC are enabled (#40399)
* [C++][FlightRPC] Add missing expiration_time arguments (#40425)
* [C++] Move key_hash/key_map/light_array related files to internal to prevent use by users (#40484)
* [C++] Add missing Threads::Threads dependency to arrow_static (#40433)
* [C++] Fix static build on Windows (#40446)
* [C++] Ensure using bundled FlatBuffers (#40519)
* [C++][CI] Fix TSAN and ASAN/UBSAN crashes (#40559)
* [C++] Repair FileSystem merge error (#40564)
* [C++] Fix 3.12 Python support (#40322)
* [C++] Move mold linker flags to variables (#40603)
* [C++] Enlarge dest buffer according to dest offset for CopyBitmap benchmark (#40769)
* [C++][Gandiva] 'ilike' function does not work (#40728)
* [C++] Fix protobuf package name setting for builds with substrait (#40753)
* [C++][ORC] Fix std::filesystem related link error with ORC 2.0.0 or later (#41023)
* [C++] Fix TSAN link error for module library (#40864)
* [C++][FS][Azure] Don't run TestGetFileInfoGenerator() with Valgrind (#41163)
* [C++] Fix null count check in BooleanArray.true_count() (#41070)
* [C++] IO: fixing compiling in gcc 7.5.0 (#41025)
* [C++][Parquet] Bugfixes and more tests in boolean arrow decoding (#41037)
* [C++] formatting.h: Make sure space is allocated for the 'Z' when formatting timestamps (#41045)
* [C++] Ignore ARROW_USE_MOLD/ARROW_USE_LLD with clang < 12 (#41062)
* [C++] Fix: left anti join filter empty rows. (#41122)
* [CI][C++] Don't use CMake 3.29.1 with vcpkg (#41151)
* [CI][C++] Use newer LLVM on Ubuntu 24.04 (#41150)
* [CI][R][C++] test-r-linux-valgrind has started failing
* [C++][Python] Sporadic asof_join failures in PyArrow
* [C++] Fix Valgrind error in string-to-float16 conversion (#41155)
* [C++] Stop defining ARROW_TEST_MEMCHECK in config.h.cmake (#41177)
* [C++] Fix mistake in integration test. Explicitly cast std::string to avoid compiler interpreting char* -> bool (#41202)
## New Features and Improvements
* [C++] Filesystem implementation for Azure Blob Storage
* [C++] Implement cast to/from halffloat (#40067)
* [C++] Add residual filter support to swiss join (#39487)
* [C++] Add support for building with Emscripten (#37821)
* [C++][Python] Add missing methods to RecordBatch (#39506)
* [C++][Java][Flight RPC] Add Session management messages (#34817)
* [C++] build filesystems as separate modules (#39067)
* [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd (#40335)
* [C++] Add support for service-specific endpoint for S3 using AWS_ENDPOINT_URL_S3 (#39160)
* [C++][FS][Azure] Implement DeleteFile() (#39840)
* [C++] Implement Azure FileSystem Move() via Azure DataLake Storage Gen 2 API (#39904)
* [C++] Add ImportChunkedArray and ExportChunkedArray to/from ArrowArrayStream (#39455)
* [CI][C++][Go] Don't run jobs that use a self-hosted GitHub Actions Runner on fork (#39903)
* [C++][FS][Azure] Use the generic filesystem tests (#40567)
* [C++][Compute] Add binary_slice kernel for fixed size binary (#39245)
* [C++] Avoid creating memory manager instance for every buffer view/copy (#39271)
* [C++][Parquet] Minor: Style enhancement for parquet::FileMetaData (#39337)
* [C++] IO: Reuse same buffer in CompressedInputStream (#39807)
* [C++] Use more permissible return code for rename (#39481)
* [C++][Parquet] Use std::count in ColumnReader ReadLevels (#39397)
* [C++] Support cast kernel from large string, (large) binary to dictionary (#40017)
* [C++] Pass -jN to make in external projects (#39550)
* [C++][Parquet] Add integration test for BYTE_STREAM_SPLIT (#39570)
* [C++] Ensure top-level benchmarks present informative metrics (#40091)
* [C++] Ensure CSV and JSON benchmarks present a bytes/s or items/s metric (#39764)
* [C++] Ensure dataset benchmarks present a bytes/s or items/s metric (#39766)
* [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or items/s metric (#40435)
* [C++][Parquet] Benchmark levels decoding (#39705)
* [C++][FS][Azure] Remove StatusFromErrorResponse as it's not necessary (#39719)
* [C++][Parquet] Make BYTE_STREAM_SPLIT routines type-agnostic (#39748)
* [C++][Device] Generic CopyBatchTo/CopyArrayTo memory types (#39772)
* [C++] Document and micro-optimize ChunkResolver::Resolve() (#39817)
* [C++] Allow building cpp/src/arrow/**/*.cc without waiting bundled libraries (#39824)
* [C++][Parquet] Parquet binary length overflow exception should contain the length of binary (#39844)
* [C++][Parquet] Minor: avoid creating a new Reader object in Decoder::SetData (#39847)
* [C++] Thirdparty: Bump google benchmark to 1.8.3 (#39878)
* [C++] DataType::ToString support optionally show metadata (#39888)
* [C++][Gandiva] Accept LLVM 18 (#39934)
* [C++] Use Requires instead of Libs for system RE2 in arrow.pc (#39932)
* [C++] Small CSV reader refactoring (#39963)
* [C++][Parquet] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 (#40094)
* [C++][FS][Azure] Add support for reading user defined metadata (#40671)
* [C++][FS][Azure] Add AzureFileSystem support to FileSystemFromUri() (#40325)
* [C++][FS][Azure] Make attempted reads and writes against directories fail fast (#40119)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor (#40064)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for different data types (#40359)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add option to cast NULL to NaN (#40803)
* [C++][FS][Azure] Implement DeleteFile() for flat-namespace storage accounts (#40075)
* [CI][C++] Add a job on ARM64 macOS (#40456)
* [C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT encoding (#40127)
* [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length (#40132)
* [C++] Make S3 narrative test more flexible (#40144)
* [C++] Remove redundant invocation of BatchesFromTable (#40173)
* [C++][CMake] Use "RapidJSON" CMake target for RapidJSON (#40210)
* [C++][CMake] Use arrow/util/config.h.cmake instead of add_definitions() (#40222)
* [C++] Fix: improve the backpressure handling in the dataset writer (#40722)
* [C++][CMake] Improve description why we need to initialize AWS C++ SDK in arrow-s3fs-test (#40229)
* [C++] Add support for system glog 0.7 (#40275)
* [C++] Specialize ResolvedChunk::Value on value-specific types instead of entire class (#40281)
* [C++][Docs] Add documentation of array factories (#40373)
* [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed (#40329)
* [FlightRPC][C++][Java][Go] Add URI scheme to reuse connection (#40084)
* [C++] Add benchmark for ToTensor conversions (#40358)
* [C++] Define ARROW_FORCE_INLINE for non-MSVC builds (#40372)
* [C++] Add support for mold (#40397)
* [C++] Add support for LLD (#40927)
* [C++] Produce better error message when Move is attempted on flat-namespace accounts (#40406)
* [C++][ORC] Upgrade ORC to 2.0.0 (#40508)
* [CI][C++] Don't install FlatBuffers (#40541)
* [C++] Ensure pkg-config flags include -ldl for static builds (#40578)
* [Dev][C++][Python][R] Use pre-commit for clang-format (#40587)
* [C++] Rename Function::is_impure() to is_pure() (#40608)
* [C++] Add missing util/config.h in arrow/io/compressed_test.cc (#40625)
* [Python][C++] Support conversion of pyarrow.RunEndEncodedArray to numpy/pandas (#40661)
* [C++] Expand Substrait type support (#40696)
* [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import (#40699)
* [C++][Parquet] Minor enhancement code of encryption (#40732)
* [C++][Parquet] Simplify PageWriter and ColumnWriter creation (#40768)
* [C++] Re-order loads and stores in MemoryPoolStats update (#40647)
* [C++] Revert changes from PR #40857 (#40980)
* [C++] Correctly report asimd/neon in GetRuntimeInfo (#40857)
* [C++] Thirdparty: bump zstd to 1.5.6 (#40837)
* [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for row-major (#40867)
* [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder (#40876)
* [C++] Suppress shorten-64-to-32 warnings in CUDA/Skyhook codes (#40883)
* [C++] Fix unused function build error (#40984)
* [C++][Parquet] RleBooleanDecoder supports DecodeArrow with nulls (#40995)
* [C++][FS][Azure] Adjust DeleteDir/DeleteDirContents/GetFileInfoSelector behaviors against Azure for generic filesystem tests (#41068)
* [C++][Parquet] Avoid allocating buffer object in RecordReader's SkipRecords (#39818)
- Drop apache-arrow-pr40230-glog-0.7.patch
- Drop apache-arrow-pr40275-glog-0.7-2.patch
- Belated inclusion of submission without changelog by Shani Hadiyanto
* disable static devel packages by default: The CMake targets require them for all builds, if not disabled
* Add subpackages for Apache Arrow Flight and Flight SQL
* Sat Mar 23 2024 Ben Greiner - Update to 15.0.2
## Bug Fixes
* [C++][Acero] Increase size of Acero TempStack (#40007)
* [C++][Dataset] Add missing Protobuf static link dependency (#40015)
* [C++] Possible data race when reading metadata of a parquet file (#40111)
* [C++] Make span SFINAE standards-conforming to enable compilation with nvcc (#40253)
* Wed Feb 28 2024 Ben Greiner - Reenable logging
* Add apache-arrow-pr40230-glog-0.7.patch
* Add apache-arrow-pr40275-glog-0.7-2.patch
* now requires glog devel files to be present for apache-arrow-devel; ArrowConfig.cmake fails otherwise
* gh#apache/arrow#40181
* gh#apache/arrow#40230
* gh#apache/arrow#40275
* Fri Feb 23 2024 Ben Greiner - Update to 15.0.1
## Bug Fixes
* [C++] "iso_calendar" kernel returns incorrect results for array length > 32 (#39360)
* [C++] Explicit error in ExecBatchBuilder when appending var length data exceeds offset limit (int32 max) (#39383)
* [C++][Parquet] Pass memory pool to decoders (#39526)
* [C++][Parquet] Validate page sizes before truncating to int32 (#39528)
* [C++] Fix tail-word access cross buffer boundary in `CompareBinaryColumnToRow` (#39606)
* [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (#39585)
* [Release] Update platform tags for macOS wheels to macosx_10_15 (#39657)
* [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711)
* [C++] Fix tail-byte access cross buffer boundary in key hash avx2 (#39800)
* [C++][Acero] Fix AsOfJoin with differently ordered schemas than the output (#39804)
* [C++] Expression ExecuteScalarExpression execute empty args function with a wrong result (#39908)
* [C++] Strip extension metadata when importing a registered extension (#39866)
* [C#] Restore support for .NET 4.6.2 (#40008)
* [C++] Fix out-of-line data size calculation in BinaryViewBuilder::AppendArraySlice (#39994)
* [C++][CI][Parquet] Fixing parquet column_writer_test building (#40175)
## New Features and Improvements
* [C++] PollFlightInfo does not follow rule of 5
* [C++] Fix filter and take kernel for month_day_nano intervals (#39795)
* [C++] Thirdparty: Bump zlib to 1.3.1 (#39877)
* [C++] Add missing "#include " (#40010)
- Release 15.0.0
## Bug Fixes
* [C++] Bring back case_when tests for union types (#39308)
* [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (#39234)
* [C++][Python] Add a no-op kernel for dictionary_encode(dictionary) (#38349)
* [C++] Use the latest tagged version of flatbuffers (#38192)
* [C++] Don't use MSVC_VERSION to determine -fms-compatibility-version (#36595)
* [C++] Optimize hash kernels for Dictionary ChunkedArrays (#38394)
* [C++][Gandiva] Avoid registering exported functions multiple times in gandiva (#37752)
* [C++][Acero] Fix race condition caused by straggling input in the as-of-join node (#37839)
* [C++][Parquet] add more closed file checks for ParquetFileWriter (#38390)
* [C++][FlightRPC] Add missing app_metadata arguments (#38231)
* [C++][Parquet] Fix Valgrind memory leak in arrow-dataset-file-parquet-encryption-test (#38306)
* [C++][Parquet] Don't initialize OpenSSL explicitly with OpenSSL 1.1 (#38379)
* [C++] Re-generate flatbuffers C++ for Skyhook (#38405)
* [C++] Avoid passing null pointer to LZ4 frame decompressor (#39125)
* [C++] Add missing explicit size_t cast for i386 (#38557)
* [C++] Fix: add TestingEqualOptions for gtest functions. (#38642)
* [C++][Gandiva] Use arrow io util to replace std::filesystem::path in gandiva (#38698)
* [C++] Protect against PREALLOCATE preprocessor defined on macOS (#38760)
* [C++] Check variadic buffer counts in bounds (#38740)
* [C++][FS][Azure] Do nothing for CreateDir("/container", true) (#38783)
* Fix TestArrowReaderAdHoc.ReadFloat16Files to use new uncompressed files (#38825)
* [C++] S3FileSystem export s3 sdk config "use_virtual_addressing" to arrow::fs::S3Options (#38858)
* [C++][Gandiva] Fix Gandiva to_date function's validation for suppress errors parameter (#38987)
* [C++][Parquet] Fix spelling (#38959)
* [C++] Fix spelling (acero) (#38961)
* [C++] Fix spelling (compute) (#38965)
* [C++] Fix spelling (util) (#38967)
* [C++] Fix spelling (dataset) (#38969)
* [C++] Fix spelling (filesystem) (#38972)
* [C++] Fix spelling (#38978)
* [C++] Fix spelling (#38980)
* [C++][Acero] union node output batches should be unordered (#39046)
* [C++][CI] Fix Valgrind failures (#39127)
* [C++] Remove needless system Protobuf dependency with -DARROW_HDFS=ON (#39137)
* [C++][Compute] Fix negative duration division (#39158)
* [C++] Add missing data copy in StreamDecoder::Consume(data) (#39164)
* [C++] Remove compiler warnings with -Wconversion -Wno-sign-conversion in public headers (#39186)
* [C++][Benchmarking] Remove hardcoded min times (#39307)
* [C++] Don't use "if constexpr" in lambda (#39334)
* [C++] Disable -Werror=attributes for Azure SDK's identity.hpp (#39448)
* [C++] Fix compile warning (#39389)
* [CI][JS] Force node 20 on JS build on arm64 to fix build issues (#39499)
* [C++] Disable parallelism for jemalloc external project (#39522)
* [C++][Parquet] Fix crash in test_parquet_dataset_lazy_filtering (#39632)
* [C++] Disable parallelism for all `make`-based externalProjects when CMake >= 3.28 is used
## New Features and Improvements
* [C++][JSON] Change the max rows to Unlimited(int_32) (#38582)
* [C++][Python] Add \"Z\" to the end of timestamp print string when tz defined (#39272)
* [C++][Python] DLPack implementation for Arrow Arrays (producer) (#38472)
* [C++] Diffing of Run-End Encoded arrays (#35003)
* [C++][Python][R] Allow users to adjust S3 log level by environment variable (#38267)
* [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats (#35345)
* [C++] Use Cast() instead of CastTo() for Scalar in test (#39044)
* [C++][Python][Parquet] Implement Float16 logical type (#36073)
* [C++] Add Utf8View and BinaryView to the c ABI (#38443)
* [C++][Parquet] Add api to get RecordReader from RowGroupReader (#37003)
* [C++] Expose a span converter for Buffer and ArraySpan (#38027)
* [C++] Add A Dictionary Compaction Function For DictionaryArray (#37418)
* [C++] Add arrow::ipc::StreamDecoder::Reset() (#37970)
* [C++] Implement file reads for Azure filesystem (#38269)
* [C++][Integration] Add C++ Utf8View implementation (#37792)
* [C++][Gandiva] Add external function registry support (#38116)
* [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC v2/LLJIT (#39098)
* [C++] Feature: support concatenate recordbatches. (#37896)
* [C++] Add support for specifying custom Array opening and closing delimiters to arrow::PrettyPrintDelimiters (#38187)
* [R] Allow code() to return package name prefix. (#38144)
* [C++][Benchmark] Add non-stream Codec Compression/Decompression (#38067)
* [C++][Parquet] Change DictEncoder dtor checking to warning log (#38118)
* [C++][Parquet] Support reading parquet files with multiple gzip members (#38272)
* [C++][Parquet] check the decompressed page size same as size in page header (#38327)
* [C++][Azure] Use properties for input stream metadata (#38524)
* [C++][FS][Azure] Implement file writes (#38780)
* [C++] Implement GetFileInfo for a single file in Azure filesystem (#38505)
* [C++][CMake] Use transitive dependency for system GoogleTest (#38340)
* [C++][Parquet] Use new encrypted files for page index encryption test (#38347)
* Add validation logic for offsets and values to arrow.array.ListArray.fromArrays (#38531)
* [C++][Acero] Create a sorted merge node (#38380)
* [C++][Benchmark] Adding benchmark for LZ4/Snappy Compression (#38453)
* [C++] Support LogicalNullCount for DictionaryArray (#38681)
* [C++][Parquet] Faster scalar BYTE_STREAM_SPLIT (#38529)
* [C++][Gandiva] Support registering external C functions (#38632)
* [C++] Implement GetFileInfo(selector) for Azure filesystem (#39009)
* [C++][FS][Azure] Implement CreateDir() (#38708)
* [C++][FS][Azure] Implement DeleteDir() (#38793)
* [C++][FS][Azure] Implement DeleteDirContents() (#38888)
* [C++] : Implement AzureFileSystem::DeleteRootDirContents (#39151)
* [C++][FS][Azure] Implement CopyFile() (#39058)
* [C++][Go][Parquet] Add tests for reading Float16 files in parquet-testing (#38753)
* [C++][FS][Azure] Rename AzurePath to AzureLocation (#38773)
* [C++] Implement directory semantics even when the storage account doesn't support HNS (#39361)
* [C++][Parquet] Update parquet.thrift to sync with 2.10.0 (#38815)
* [C++] Replace "#ifdef ARROW_WITH_GZIP" in dataset test to ARROW_WITH_ZLIB (#38853)
* [C++][Parquet] Using length to optimize bloom filter read (#38863)
* [C++][Parquet] Minor: making parquet TypedComparator operation as const method (#38875)
* [C++] DatasetWriter release rows_in_flight_throttle when allocate writing failed (#38885)
* [C++][Parquet] Move EstimatedBufferedValueBytes from TypedColumnWriter to ColumnWriter (#39055)
* [C++] Stop installing internal bpacking_simd* headers (#38908)
* [C++][Gandiva] Refactor function holder to return arrow Result (#38873)
* [C++] Use Cast() instead of CastTo() for Dictionary Scalar in test (#39362)
* [C++] Use Cast() instead of CastTo() for Timestamp Scalar in test (#39060)
* [C++] Use Cast() instead of CastTo() for List Scalar in test (#39353)
* [C++][Parquet] Support row group filtering for nested paths for struct fields (#39065)
* [C++] Refactor the Azure FS tests and filesystem class instantiation (#39207)
* [C++][Parquet] Optimize FLBA record reader (#39124)
* Create module info compiler plugin (#39135)
* [C++] : Try to make Buffer::device_type_ non-optional (#39150)
* [C++][Parquet] Remove deprecated AppendRowGroup(int64_t num_rows) (#39209)
* [C++][Parquet] Avoid WriteRecordBatch from produce zero-sized RowGroup (#39211)
* [C++] Support binary to fixed_size_binary cast (#39236)
* [C++][Azure][FS] Add default credential auth configuration (#39263)
* [C++] Don't install bundled Azure SDK for C++ with CMake 3.28+ (#39269)
* [C++][FS] : Remove the AzureBackend enum and add more flexible connection options (#39293)
* [C++][FS] : Inform caller of container not-existing when checking for HNS support (#39298)
* [C++][FS][Azure] Add workload identity auth configuration (#39319)
* [C++][FS][Azure] Add managed identity auth configuration (#39321)
* [C++] Forward arguments to ExceptionToStatus all the way to Status::FromArgs (#39323)
* [C++] Flaky DatasetWriterTestFixture.MaxRowsOneWriteBackpresure test (#39379)
* [C++] Add ForceCachedHierarchicalNamespaceSupport to help with testing (#39340)
* [C++][FS][Azure] Add client secret auth configuration (#39346)
* [C++] Reduce function.h includes (#39312)
* [C++] Use Cast() instead of CastTo() for Parquet (#39364)
* [C++][Parquet] Vectorize decode plain on FLBA (#39414)
* [C++][Parquet] Style: Using arrow::Buffer data_as api rather than reinterpret_cast (#39420)
* [C++][ORC] Upgrade ORC to 1.9.2 (#39431)
* [C++] Use default Azure credentials implicitly and support anonymous credentials explicitly (#39450)
* [C++][Parquet] Allow reading dictionary without reading data via ByteArrayDictionaryRecordReader (#39153)
- Disable logging until compatibility with glog is restored gh#apache/arrow#40181
* Mon Jan 15 2024 Ben Greiner - Update to 14.0.2
## New Features and Improvements
* GH-38449 - [Release][Go][macOS] Use local test data if possible (#38450)
* GH-38591 - [Parquet][C++] Remove redundant open calls in ParquetFileFormat::GetReaderAsync (#38621)
## Bug Fixes
* GH-38345 - [Release] Use local test data for verification if possible (#38362)
* GH-38438 - [C++] Dataset: Trying to fix the async bug in Parquet dataset (#38466)
* GH-38577 - Reading parquet file behavior change from 13.0.0 to 14.0.0
* GH-38618 - [C++] S3FileSystem: fix regression in deleting explicitly created sub-directories (#38845)
* GH-38861 - [C++] Add missing “-framework Security” to Libs.private in arrow.pc (#38869)
* GH-39072 - [Release][CI] Python3.11-devel is required for the verification job on AlmaLinux 8 (#39073)
* GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS (#39082)
* Thu Jan 11 2024 pgajdos@suse.com - disable some tests for s390x [bsc#1218592]
* Mon Nov 13 2023 Ondřej Súkup - update 14.0.1
* GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests
* GH-38607 - [Python] Disable PyExtensionType autoload
- update to 14.0.1
* very long list of changes can be found here: https://arrow.apache.org/release/14.0.0.html
* Fri Aug 25 2023 Ben Greiner - Update to 13.0.0
## Acero
* Handling of unaligned buffers in input nodes can be configured programmatically or by setting the environment variable ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when an unaligned buffer is detected GH-35498.
## Compute
* Several new functions have been added:
  - aggregate functions “first”, “last”, “first_last” GH-34911;
  - vector functions “cumulative_prod”, “cumulative_min”, “cumulative_max” GH-32190;
  - vector function “pairwise_diff” GH-35786.
* Sorting now works on dictionary arrays, with much better performance than the naive approach of sorting the decoded dictionary GH-29887. Sorting also works on struct arrays, and nested sort keys are supported using FieldRef GH-33206.
* The check_overflow option has been removed from CumulativeSumOptions as it was redundant with the availability of two different functions: “cumulative_sum” and “cumulative_sum_checked” GH-35789.
* Run-end encoded filters are efficiently supported GH-35749.
* Duration types are supported with the “is_in” and “index_in” functions GH-36047. They can be multiplied with all integer types GH-36128.
* “is_in” and “index_in” now cast their inputs more flexibly: they first attempt to cast the value set to the input type, then in the other direction if the former fails GH-36203.
* Multiple bugs have been fixed in “utf8_slice_codeunits” when the stop option is omitted GH-36311.
## Dataset
* A custom schema can now be passed when writing a dataset GH-35730. The custom schema can alter nullability or metadata information, but is not allowed to change the datatypes written.
## Filesystems
* The S3 filesystem now writes files in equal-sized chunks, for compatibility with Cloudflare’s “R2” Storage GH-34363.
* A long-standing issue where S3 support could crash at shutdown because of resources still being alive after S3 finalization has been fixed GH-36346. Now, attempts to use S3 resources (such as making filesystem calls) after S3 finalization should result in a clean error.
* The GCS filesystem accepts a new option to set the project id GH-36227.
## IPC
* Nullability and metadata information for sub-fields of map types is now preserved when deserializing Arrow IPC GH-35297.
## Orc
* The Orc adapter now maps Arrow field metadata to Orc type attributes when writing, and vice-versa when reading GH-35304.
## Parquet
* It is now possible to write additional metadata while a ParquetFileWriter is open GH-34888.
* Writing a page index can be enabled selectively per-column GH-34949. In addition, page header statistics are not written anymore if the page index is enabled for the given column GH-34375, as the information would be redundant and less efficiently accessed.
* Parquet writer properties allow specifying the sorting columns GH-35331. The user is responsible for ensuring that the data written to the file actually complies with the given sorting.
* CRC computation has been implemented for v2 data pages GH-35171. It was already implemented for v1 data pages.
* Writing compliant nested types is now enabled by default GH-29781. This should not have any negative implication.
* Attempting to load a subset of an Arrow extension type is now forbidden GH-20385. Previously, if an extension type’s storage was nested (for example a “Point” extension type backed by a struct), it was possible to selectively load some of the columns of the storage type.
## Substrait
* Support for various functions has been added: “stddev”, “variance”, “first”, “last” (GH-35247, GH-35506).
* Deserializing sorts is now supported GH-32763. However, some features, such as clustered sort direction or custom sort functions, are not implemented.
## Miscellaneous
* FieldRef sports additional methods to get a flattened version of nested fields GH-14946. Compared to their non-flattened counterparts, the methods GetFlattened, GetAllFlattened, GetOneFlattened and GetOneOrNoneFlattened combine a child’s null bitmap with its ancestors’ null bitmaps such as to compute the field’s overall logical validity bitmap.
* In other words, given the struct array [null, {'x': null}, {'x': 5}], FieldRef("x")::Get might return [0, null, 5] while FieldRef("x")::GetFlattened will always return [null, null, 5].
* Scalar::hash() has been fixed for sliced nested arrays GH-35360.
* A new floating-point to decimal conversion algorithm exhibits much better precision GH-35576.
* It is now possible to cast between scalars of different list-like types GH-36309.
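
The difference between the plain and flattened FieldRef lookups described above can be illustrated with a small pure-Python model. This is only a sketch of the bitmap logic, not Arrow's implementation: the helper names `materialize` and `flatten_validity` are hypothetical, validity bitmaps are modelled as lists of booleans, and the value 0 stored under the null parent slot is an arbitrary placeholder.

```python
from typing import List, Optional


def materialize(values: List[int], validity: List[bool]) -> List[Optional[int]]:
    """Render a (values, validity) pair as a list, with None for null slots."""
    return [v if ok else None for v, ok in zip(values, validity)]


def flatten_validity(child: List[bool], *ancestors: List[bool]) -> List[bool]:
    """AND the child's validity bitmap with each ancestor's validity bitmap."""
    combined = list(child)
    for ancestor in ancestors:
        combined = [c and a for c, a in zip(combined, ancestor)]
    return combined


# Model of the struct array [null, {'x': null}, {'x': 5}].
parent_validity = [False, True, True]   # the first struct slot is null
child_values = [0, 0, 5]                # storage of field "x" (0 is a placeholder)
child_validity = [True, False, True]    # the child's own bitmap ignores the parent

# Non-flattened view: only the child's own bitmap is applied.
raw = materialize(child_values, child_validity)

# Flattened view: parent nulls are propagated into the child.
flat = materialize(child_values, flatten_validity(child_validity, parent_validity))

print(raw)   # [0, None, 5]
print(flat)  # [None, None, 5]
```

In this model the first slot differs between the two views because only the flattened variant takes the parent's null into account.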
* Mon Jun 12 2023 Ben Greiner - Update to 12.0.1
* [GH-35423] - [C++][Parquet] Parquet PageReader Force decompression buffer resize smaller (#35428)
* [GH-35498] - [C++] Relax EnsureAlignment check in Acero from requiring 64-byte aligned buffers to requiring value-aligned buffers (#35565)
* [GH-35519] - [C++][Parquet] Fixing exception handling in parquet FileSerializer (#35520)
* [GH-35538] - [C++] Remove unnecessary status.h include from protobuf (#35673)
* [GH-35730] - [C++] Add the ability to specify custom schema on a dataset write (#35860)
* [GH-35850] - [C++] Don't disable optimization with RelWithDebInfo (#35856)
- Drop cflags.patch -- fixed upstream
* Thu May 18 2023 Ben Greiner - Update to 12.0.0
* Run-End Encoded Arrays have been implemented and are accessible (GH-32104)
* The FixedShapeTensor Logical value type has been implemented using ExtensionType (GH-15483, GH-34796)
## Compute
* New kernel to convert timestamp with timezone to wall time (GH-33143)
* Cast kernels are now built into libarrow by default (GH-34388)
## Acero
* Acero has been moved out of libarrow into its own shared library, allowing for smaller builds of the core libarrow (GH-15280)
* Exec nodes now can have a concept of “ordering” and will reject non-sensible plans (GH-34136)
* New exec nodes: “pivot_longer” (GH-34266), “order_by” (GH-34248) and “fetch” (GH-34059)
* Breaking Change: Reorder output fields of “group_by” node so that keys/segment keys come before aggregates (GH-33616)
## Substrait
* Add support for the round function GH-33588
* Add support for the cast expression element GH-31910
* Added API reference documentation GH-34011
* Added an extension relation to support segmented aggregation GH-34626
* The output of the aggregate relation now conforms to the spec GH-34786
## Parquet
* Added support for DeltaLengthByteArray encoding to the Parquet writer (GH-33024)
* NaNs are correctly handled now for Parquet predicate push-downs (GH-18481)
* Added support for reading Parquet page indexes (GH-33596) and writing page indexes (GH-34053)
* Parquet writer can write columns in parallel now (GH-33655)
* Fixed incorrect number of rows in Parquet V2 page headers (GH-34086)
* Fixed incorrect Parquet page null_count when stats are disabled (GH-34326)
* Added support for reading BloomFilters to the Parquet Reader (GH-34665)
* Parquet File-writer can now add additional key-value metadata after it has been opened (GH-34888)
* Breaking Change: The default row group size for the Arrow writer changed from 64Mi rows to 1Mi rows. GH-34280
## ORC
* Added support for the union type in ORC writer (GH-34262)
* Fixed ORC CHAR type mapping with Arrow (GH-34823)
* Fixed timestamp type mapping between ORC and arrow (GH-34590)
## Datasets
* Added support for reading JSON datasets (GH-33209)
* Dataset writer now supports specifying a function callback to construct the file name in addition to the existing file name template (GH-34565)
## Filesystems
* GcsFileSystem::OpenInputFile avoids unnecessary downloads (GH-34051)
## Other changes
* Convenience Append(std::optional...) methods have been added to array builders ([GH-14863](https://github.com/apache/arrow/issues/14863))
* A deprecated OpenTelemetry header was removed from the Flight library (GH-34417)
* Fixed crash in “take” kernels on ExtensionArrays with an underlying dictionary type (GH-34619)
* Fixed bug where the C-Data bridge did not preserve nullability of map values on import (GH-34983)
* Added support for EqualOptions to RecordBatch::Equals (GH-34968)
* zstd dependency upgraded to v1.5.5 (GH-34899)
* Improved handling of “logical” nulls such as with union and RunEndEncoded arrays (GH-34361)
* Fixed incorrect handling of uncompressed body buffers in IPC reader, added IpcWriteOptions::min_space_savings for optional compression optimizations (GH-15102)
* Mon Apr 03 2023 Andreas Schwab - cflags.patch: fix option order to compile with optimisation
- Adjust constraints
* Wed Mar 29 2023 Ben Greiner - Remove gflags-static. It was only needed due to a packaging error with gflags which is about to be fixed in Tumbleweed
- Disable build of the jemalloc memory pool backend
* It requires every consuming application to LD_PRELOAD libjemalloc.so.2, even when it is not set as the default memory pool, due to static TLS block allocation errors
* Usage of the bundled jemalloc as a workaround is not desired (gh#apache/arrow#13739)
* jemalloc does not seem to have a clear advantage over the system glibc allocator: https://ursalabs.org/blog/2021-r-benchmarks-part-1
* This overrides the default behavior documented in https://arrow.apache.org/docs/cpp/memory.html#default-memory-pool
* Sun Mar 12 2023 Ben Greiner - Update to v11.0.0
* ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100)
* ARROW-11776 - [C++][Java] Support parquet write from ArrowReader to file (#14151)
* ARROW-13938 - [C++] Date and datetime types should autocast from strings
* ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018)
* ARROW-14999 - [C++] Optional field name equality checks for map and list type (#14847)
* ARROW-15538 - [C++] Expanding coverage of math functions from Substrait to Acero (#14434)
* ARROW-15592 - [C++] Add support for custom output field names in a substrait::PlanRel (#14292)
* ARROW-15732 - [C++] Do not use any CPU threads in execution plan when use_threads is false (#15104)
* ARROW-16782 - [Format] Add REE definitions to FlatBuffers (#14176)
* ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656)
* ARROW-17301 - [C++] Implement compute function "binary_slice" (#14550)
* ARROW-17509 - [C++] Simplify async scheduler by removing the need to call End (#14524)
* ARROW-17520 - [C++] Implement Substrait SetRel (UnionAll) (#14186)
* ARROW-17610 - [C++] Support additional source types in SourceNode (#14207)
* ARROW-17613 - [C++] Add function execution API for a preconfigured kernel (#14043)
* ARROW-17640 - [C++] Add File Handling Test cases for GlobFile handling in Substrait Read (#14132)
* ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer (#14191)
* ARROW-17825 - [C++] Allow the possibility to write several tables in ORCFileWriter (#14219)
* ARROW-17836 - [C++] Allow specifying alignment of buffers (#14225)
* ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext that will store a plan's shared data structures (#14227)
* ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource (#14250)
* ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in Flight SQL (#14266)
* ARROW-17932 - [C++] Implement streaming RecordBatchReader for JSON (#14355)
* ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395)
* ARROW-17966 - [C++] Adjust to new format for Substrait optional arguments (#14415)
* ARROW-17975 - [C++] Create at-fork facility (#14594)
* ARROW-17980 - [C++] As-of-Join Substrait extension (#14485)
* ARROW-17989 - [C++][Python] Enable struct_field kernel to accept string field names (#14495)
* ARROW-18008 - [Python][C++] Add use_threads to run_substrait_query
* ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425)
* ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139
* ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723)
* ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be uninitialized (#14480)
* ARROW-18144 - [C++] Improve JSONTypeError error message in testing (#14486)
* ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552)
* ARROW-18206 - [C++][CI] Add a nightly build for C++20 compilation (#14571)
* ARROW-18235 - [C++][Gandiva] Fix the like function implementation for escape chars (#14579)
* ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0
* ARROW-18253 - [C++][Parquet] Add additional bounds safety checks (#14592)
* ARROW-18259 - [C++][CMake] Add support for system Thrift CMake package (#14597)
* ARROW-18280 - [C++][Python] Support slicing to end in list_slice kernel (#14749)
* ARROW-18282 - [C++][Python] Support step >= 1 in list_slice kernel (#14696)
* ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc provided by vcpkg (#14609)
* ARROW-18342 - [C++] AsofJoinNode support for Boolean data field (#14658)
* ARROW-18350 - [C++] Use std::to_chars instead of std::to_string (#14666)
* ARROW-18367 - [C++] Enable the creation of named table relations (#14681)
* ARROW-18373 - Fix component drop-down, add license text (#14688)
* ARROW-18377 - MIGRATION: Automate component labels from issue form content (#15245)
* ARROW-18395 - [C++] Move select-k implementation into separate module
* ARROW-18402 - [C++] Expose DeclarationInfo (#14765)
* ARROW-18406 - [C++] Can't build Arrow with Substrait on Ubuntu 20.04 (#14735)
* ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in building plasma-glib (#14739)
* ARROW-18413 - [C++][Parquet] Expose page index info from ColumnChunkMetaData (#14742)
* ARROW-18419 - [C++] Update vendored fast_float (#14817)
* ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex (#14803)
* ARROW-18421 - [C++][ORC] Add accessor for stripe information in reader (#14806)
* ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode (#14934)
* ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942)
* GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in. (#14900)
* GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake package (#15251)
* GH-14937 - [C++] Add rank kernel benchmarks (#14938)
* GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED encoding (#15140)
* GH-15072 - [C++] Move the round functionality into a separate module (#15073)
* GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit (#15182)
* GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097)
* GH-15100 - [C++][Parquet] Add benchmark for reading strings from Parquet (#15101)
* GH-15151 - [C++] Adding RecordBatchReaderSource to solve an issue in R API (#15183)
* GH-15185 - [C++][Parquet] Improve documentation for Parquet Reader column_indices (#15184)
* GH-15199 - [C++][Substrait] Allow AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198)
* GH-15200 - [C++] Created benchmarks for round kernels. (#15201)
* GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch (#15240)
* GH-15226 - [C++] Add DurationType to hash kernels (#33685)
* GH-15237 - [C++] Add ::arrow::Unreachable() using std::string_view (#15238)
* GH-15239 - [C++][Parquet] Parquet writer writes decimal as int32/64 (#15244)
* GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case when the scalar is null (#15291)
* GH-33607 - [C++] Support optional additional arguments for inline visit functions (#33608)
* GH-33657 - [C++] arrow-dataset.pc doesn't depend on parquet.pc without ARROW_PARQUET=ON (#33665)
* PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated fields (#14366)
* PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader (#14142)
* PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should reuse scratch space (#14509)
* PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader ReadBatch and Skip (#14523)
* PARQUET-2209 - [parquet-cpp] Optimize skip for the case that number of values to skip equals page size (#14545)
* PARQUET-2210 - [C++][Parquet] Skip pages based on header metadata using a callback (#14603)
* PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field (#14556)
- Remove unused python3-arrow package declaration
* Add options as recommended for python support
- Provide test data for unittests
- Don't use system jemalloc but bundle it in order to avoid static TLS errors in consuming packages like python-pyarrow
* gh#apache/arrow#13739
* Sun Aug 28 2022 Stefan Brüns - Revert ccache change: using ccache in a pristine buildroot just slows down OBS builds (use --ccache for local builds).
- Remove unused gflags-static-devel dependency.
* Mon Aug 22 2022 John Vandenberg - Speed up builds with ccache
* Sat Aug 06 2022 Stefan Brüns - Update to v9.0.0. No (current) changelog provided
- Spec file cleanup:
* Remove lots of duplicate, unused, or wrong build dependencies
* Do not package outdated Readmes and Changelogs
- Enable tests, disable ones requiring external test data