Changelog for openucx-tools-1.15.0-2.2.x86_64.rpm:
* Mon Feb 26 2024 Dominique Leuenberger
- Use %patch -P N instead of deprecated %patchN.

* Mon Oct 02 2023 Nicolas Morey
- Update to 1.15.0
  - UCP
    - Added 2-stage pipeline protocol in the new protocol infrastructure
    - Added reset and abort functionality of rendezvous protocols in the new infrastructure
    - Added zero-copy rendezvous data send protocol in the new infrastructure
    - Added support for user memory handle in the new protocol infrastructure
    - Added option to force ODP registration for certain memory types
    - Enabled lock-free memory region deregistration
    - Updated allow/deny transport list feature to control auxiliary transport selection
    - Multiple performance improvements of the new protocol infrastructure
    - Multiple improvements in error and debug messages
    - Fixed assertion when sending from non-contiguous GPU buffer to managed buffer
    - Fixed the race condition on endpoint configurations
    - Fixed endpoint reconfiguration issues due to asymmetrical selection
    - Fixed endpoint reconfiguration error due to wrong locality detection
    - Fixed crash during connection manager cleanup
    - Fixed rkey index calculation for rendezvous protocol
    - Fixed rcache dump function
    - Removed logging from rkey unpack in release mode
    - Fixed double free of rkey in rendezvous protocol
    - Fixed rendezvous pipeline protocol error flow
    - Fixed error handling in rendezvous get zcopy protocol
    - Replay pending requests of wireup EP CM during connection establishment to prevent potential ordering issues and wrong configuration
    - Pass user-provided memory type to the function that checks whether the buffer can be sent inline
    - Avoid memory registration during UCP context initialization
    - Fixed CPU/device atomics selection in the new protocol infrastructure
    - Multiple fixes in the new protocol infrastructure information output
  - UCT
    - Split UCT_MD_MKEY_PACK_FLAG_INVALIDATE into two flags for RMA and AMO
    - Added put_zcopy and get_zcopy scheme support for self transport
    - Added base implementation of is_reachable_v2 API using intra/inter flag
    - Introduced MD capability for non-blocking registration memory types
    - Added check for dmabuf kernel support in ROCm memory domain
    - Fixed exported memh packing
    - Fixed an error in checking return status of multi-threaded memory registration function
  - RDMA CORE (IB, ROCE, etc.)
    - Added implementation of is_reachable_v2 routine to IB interface
    - Added option to control CQE zipping per CQ RX/TX direction
    - Added option to specify how DCI selects port under RoCE LAG
    - Added hw_dcs to the list of policies to select DCI by an endpoint
    - Removed implicit on-demand paging
    - Added option to set RoCE LAG DCT port for response under queue affinity mode
    - Improved IB memlock limit logging
    - Fixed dma-buf based memory region registration
    - Fixed memory handle data corruption when PCIe relaxed ordering is enabled
    - Fixed performance degradation when indirect atomic key is not supported by the hardware
    - Fixed remote access error to strict-order keys because of wrong offset
    - Added check for UAR support to memory domain opening
    - Fixed updating port counters for DevX QP
    - Fixed ibv_create_cq error message on nodes without InfiniBand
    - Fixed performance degradation due to using 2 paths on NDR400 by default
    - Removed unnecessary async lock which otherwise would block UD progress
  - UCS
    - Added ucs_string_buffer_rbrk() to split tokens
    - Fixed lane selection and added bandwidth estimation for Sapphire Rapids family
    - Fixed displaying wrong environment variable suggestions
    - Fixed VFS warning output
    - Fixed SEGV in ucs_debug_backtrace_next(), upon previous SEGV handling, due to ENOMEM situation
    - Fixed memory corruption when using UCX_MPOOL_FIFO=y
  - UCM
    - Fixed conditional jump patching
    - Fixed mremap() override
  - Tests
    - Added a ROCm Docker container for testing
    - Added option to send client_id in iodemo test
    - Added support for multiple connections to the same server in iodemo test
    - Added synchronization before exit to hello world examples
    - Fixed wrong usage of ep_close in examples
  - Tools
    - Added user-side memcpy option for AM benchmarks in ucx_perftest
    - Added Wireshark Lua dissectors for some UCX protocols
    - Fixed memory access flags in perftest
    - Removed support for librte from perf
    - Fixed worker flush deadlock when using multiple workers in ucx_perftest
  - Build
    - Added support for binutils 2.40
    - Added versioned dependency to switch between packages with the same names
    - Added a separate xpmem deb subpackage
    - Added aarch64 support to the binary distribution pipeline
    - Removed dependency on libnuma
  - Documentation
    - Updated ucp_worker_release_address description
- Refresh openucx-s390x-support.patch against latest sources

* Tue Jul 25 2023 Nicolas Morey
- Update to v1.14.1
  - Fixed ROCm to prevent the locking of host pinned memory
  - Added CUDA 12 based UCX builds to the release flow
  - Increased the maximal number of endpoint configurations
  - Fixed filter for slow lanes in selection logic
  - Fixed TCP transport bandwidth calculation
  - Fixed device detection for ROCm
  - Fixed compatibility with CUDA 12
  - Fixed rendezvous threshold for multi-path configurations
  - Fixed error message in case of static link
  - Fixed BlueField-3 detection
  - Multiple fixes for Azure CI pipeline

* Mon Mar 20 2023 Nicolas Morey
- Update to v1.14.0
  - UCP
    - Added API for querying transport and device names on endpoint
    - Added API for querying datatype object
    - Added API for exporting and importing memory keys (no implementation yet)
    - Added support for non-persistent active message header
    - Added infrastructure to print protocols v2 performance
    - Multiple performance improvements for protocols v2
    - Added support for non-contiguous datatypes for rendezvous protocols v2
    - Added support for reset and abort request in protocols v2
    - Added support for user memory handles in RMA API
    - Added multi-rail support for RMA API in protocols v2
    - Added support for up to 16 different lanes per endpoint
    - Added support for dmabuf memory registration in protocols v2
    - Added strong fence mode for ucp_worker_fence() API
  - UCT
    - Added new uct_md_mem_attach() API to support exported memory handles
    - Added remote completion mode for endpoint flush (via new flag)
    - Added support for dmabuf registration
    - Added new uct_ep_connect_to_ep_v2() API
    - Added new uct_mem_reg_v2() API
    - Added new uct_md_query_v2() API
    - Added support for IPv6 loopback address in TCP transport
  - RDMA CORE (IB, ROCE, etc.)
    - Added ECE (enhanced connection establishment) support for RC and DC transports
    - Added support for hardware DCS in DC transport
    - Added UD interface and endpoint resource information to VFS
    - Added CQ creation via DEVX API
    - Removed support for accelerated IB transports over legacy experimental verbs
  - UCS
    - Added support for auto-correction of user environment variables
  - UCM
    - Implemented CUDA bistro hooks for aarch64 (to enable memory cache on this platform)
    - Added support for CUDA virtual/stream-ordered memory with cudaMallocAsync
  - Documentation
    - Added FAQ for using pkg-config tool to build applications with UCX
  - Tools
    - Added runtime library version to the 'ucx_info -v' output
    - Added support for memory types in ucx_info
  - Many bugfixes. See NEWS.
- Drop patches merged upstream:
  - UCS-DEBUG-replace-PTR-with-void.patch
  - gcc13-fix.patch
- Refresh openucx-s390x-support.patch

* Mon Mar 06 2023 Martin Liška
- Add upstream gcc13-fix.patch fix.

* Mon Jan 16 2023 Andreas Schwab
- openucx-s390x-support.patch: fix use of clz builtin for 64-bit value

* Tue Oct 04 2022 Nicolas Morey-Chaisemartin
- Update openucx-s390x-support.patch to add missing ucs_ffs32 on s390x
- Drop baselibs.conf as openucx only works on 64-bit systems

* Tue Sep 27 2022 Nicolas Morey-Chaisemartin
- Update to v1.13.1 (jsc#PED-912)
  - Core
    - Added new objects to VFS: local and remote address of endpoint, statistics of ucp_ep_create success/failure, failed/destroyed endpoints
    - Added support for UCX static libraries
    - Added profiling for rkey management routines
    - PCIe relaxed order enabled by default for AMD CPUs
    - Fixed not deallocating memory from ucp_mem_unmap if no rcache
    - Fixed versioning infrastructure
    - Multiple code improvements: refactoring, debug prints and assertions, etc.
    - Multiple improvements in build, test and docs infrastructure
    - Added new objects to VFS (md, component, log_level, etc.)
    - Added configuration variable to specify which loadable modules are allowed
    - Added build-time configuration to disable sigaction overriding
  - UCP
    - Added API to pass pre-registered memory handle to UCP operations
    - Added implementation of AM rendezvous protocol
    - Added 2-stage pipeline rendezvous protocol for GPU
    - Added support for fragment mem_type for v1 pipeline proto, disabled by default
    - Added active message support for proto v2
    - Added UCP memory registration cache
    - Improved adaptive progress: deactivate iface when all p2p lanes are destroyed
    - Added support for user memh in proto_v1
    - Added support for selecting local address when creating a client endpoint
    - Added option to limit GPUDirect RDMA size in rendezvous protocol (UCX_RNDV_MEMTYPE_DIRECT_SIZE)
    - Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter
    - Resolving remote EP ID when creating local EP disabled by default
    - Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
    - Added ucp_worker_address_query() API
    - Updated ucp_ep_query() API for getting local and remote addresses
    - Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
    - Added new client/server connection establishment packet header format
    - Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
    - Added iov zcopy support to RMA operations
    - Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
    - Added support for modifying UCT and UCS configs by ucp_config_modify() API
    - Optimized unpacked rkeys memory consumption
    - Added request flag to influence latency vs. bandwidth protocol
    - Reduced memory management overhead with new protocols
    - Improved performance calculations for new protocols
    - Added AMO support with GPU memory target using new protocols
    - Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
    - Added support for user-defined alignment in Active Messages
    - Added support for offload tag sync in new protocols
    - Updated ucp_atomic_post() to use NBX flow
  - UCT
    - Introduced API uct_md_mkey_pack_v2
    - Introduced UCT iface features API
    - Introduced max_inflight_eps parameter in perf_attr API
    - Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer
    - Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking
    - Disabled PEER_FAILURE capability for XPMEM
    - Added API uct_iface_is_reachable_v2()
    - Added IPv6 address support in TCP
    - Added latency estimation to uct_iface_estimate_perf()
    - Adjusted knem and cma overhead cost
    - Increased built-in TCP keep-alive interval to 2 seconds
  - RDMA CORE (IB, ROCE, etc.)
    - Introduced NDR autorecognition
    - Introduced CQE zipping support
    - Set the default MAX_RD_ATOMIC to maximum value supported by the hardware
    - Disabled mlx5 ifaces on verbs MD
    - Added detection of IB NDR devices
    - Added check for CQ overrun in assert mode
    - Added bitmap usage for releasing detached DCIs
    - Added configuration for requests ack frequency with DevX
    - Added remote QP info to TX error CQE traces
  - ROCM
    - Increased maximum number of HSA agents
  - UCS
    - Added topo module infrastructure
    - Added memtrack and rcache information to VFS
    - Added API for a per-process aggregate-sum statistics report
    - Added memory pool set data structure
    - Added new ptr_array API for bulk allocation
    - Added ucs_string_buffer_append_flags() for string buffer
    - Added ucs_ffs32()
    - Added ucs_vsnprintf_safe() which always adds '\0'
    - Added thread-safe put to ptr_map
    - Improved accuracy of the topology distance estimation
    - Added prints of leaked callbacks from the callback queue
    - Removed a diagnostic message when fuse thread is stopped
    - Added configurable limit for the memory consumed by rcache
    - Added configuration for VFS (FUSE) thread affinity
    - Added memory limit support to memtrack
  - Packaging
    - Added cmake config files for better integration with external cmake based projects
  - Tools
    - Added loop-back transport support in ucx_perftest
    - Split ucx_perftest into separate modules
    - Added process placement option for ucx_info
    - Extended parameters correctness check in ucx_perftest
- Backported UCS-DEBUG-replace-PTR-with-void.patch from upstream to fix compilation

* Thu Jan 13 2022 Nicolas Morey-Chaisemartin
- Fix UCM bistro support on non-s390x archs
- Add ucm-fix-UCX_MEM_MALLOC_RELOC.patch to disable malloc relocations by default (bsc#1194369)

* Thu Sep 23 2021 Nicolas Morey-Chaisemartin
- Update to v1.11.1 (jsc#SLE-19260)

* Wed Feb 24 2021 Nicolas Morey-Chaisemartin
- Update openucx-s390x-support.patch to fix mmap syscall on s390x (bsc#1182691)
  - Core
    - Added support for UCX monitoring using virtual file system (VFS)/FUSE
    - Added support for applications with static CUDA runtime linking
    - Added support for a configuration file
    - Updated clang format configuration
  - UCP
    - Added rendezvous API for active messages
    - Added user-defined name to context, worker, and endpoint objects
    - Added flag to silence request leak check
    - Added API for endpoint performance evaluation
    - Added API ucp_request_query
    - Added API ucp_lib_query
    - Added bandwidth optimizations for multi-lane in new protocols
    - Added support for multi-rail over lanes with BW ratio >= 1/4
    - Added support for tracking outstanding requests and aborting those in case of connection failure
    - Refactored keep-alive protocol
    - Added device ID to wireup protocol
    - Added support for up to 128 transport layer resources in UCP context
    - Added support for CUDA memory allocations with ucp_mem_map
    - Increased UCP_WORKER_MAX_EP_CONFIG to 64
    - Adjusted memory type zcopy threshold when UCX_ZCOPY_THRESH is set
    - Refactored wireup protocols, rendezvous, get, zcopy protocols
    - Added put zcopy multi-rail
    - Improved logging for new protocols
    - Added system topology information
    - Added new eager offload protocols
  - UCT
    - Extended connection establishment API
    - Added active message (AM) alignment in iface params
    - Added active message short IOV API
    - Added support for interface query by operation and memory type
    - Added API to get allocation base address and length
    - Added md_dereg_v2 API
  - UCS
    - Added log filter by source file name
    - Added checking for last element in fraglist queue
    - Added a method to get IP address from sockaddr
    - Added memory usage limits to registration cache
  - RDMA CORE (IB, ROCE, etc.)
    - Added report of QP info in case of completion with error
    - Refactored FC send operations
    - Added support for DevX unique QPN allocation
    - Optimized endpoint lookup for DCI
    - Added support for RDMA sub-function (SF)
    - Added support for DCI via DEVX
    - Added DCI pool per LAG port
    - Added support for RoCE IP reachability check using a subnet mask
    - Added active message short IOV for UD/DC/RC mlx, UD/RC verbs
    - Added endpoint keep-alive check for UD
    - Suppressed warning if device can't be opened
    - Added support for multiple flush cancel without completion
    - Ignore devices with invalid GID
    - Added support for SRQ linked list reordering
    - Added flush by flow control on old devices
    - Added support for configurable rdma_resolve_addr/route timeout
  - Shared memory
    - Added active message short IOV support for posix, sysv, and self transports
  - TCP
    - Added support for peer failure in case of CONNECT_TO_EP
    - Added support for active message short IOV
  - See NEWS for a complete changelog and bug fixes
- Refresh openucx-s390x-support against latest sources