Changelog for libopenblas_serial-devel-0.3.5-64.2.x86_64.rpm :
* Mon Jan 07 2019 Ismail Dönmez
- Update to version 0.3.5
  common:
  * Loop unrolling in TRMV has been enabled again.
  * A domain error in the thread workload distribution for SYRK has been fixed.
  * gmake builds will now automatically add -fPIC to the build options if the platform requires it.
  * A pthreads key leakage (and associated crash on dlclose) in the USE_TLS codepath was fixed.
  * Building of the utest cases on systems that do not provide an implementation of complex.h was fixed.
  x86_64:
  * The SkylakeX code was changed to compile on OSX.
  * Unwanted application of the -march=skylake-avx512 option to the common code parts of a DYNAMIC_ARCH build was fixed.
  * Improved performance of SGEMM for small workloads on Skylake X.
  * Performance of SGEMM and DGEMM was improved on Haswell.
  armv8:
  * A configuration error that broke the CNRM2 kernel was corrected.
  * Compilation of the GEMM kernels with CMAKE was fixed.
  * DYNAMIC_ARCH builds are now available with CMAKE as well.
  * Using CMAKE for cross-compilation to the new cpu TARGETs introduced in 0.3.4 now works.
  power:
  * A problem in cpu autodetection for AIX has been corrected.
* Fri Dec 07 2018 Ismail Dönmez
- Update to version 0.3.4
  common:
  * The new, experimental thread-local memory allocation had inadvertently been left enabled for gmake builds in 0.3.3 despite the announcement. It is now disabled by default, and single-threaded builds will keep using the old allocator even if the USE_TLS option is turned on.
  * OpenBLAS will now provide enough buffer space for at least 50 threads by default.
  * The output of openblas_get_config() now contains the version number.
  * A serious thread safety bug in GEMV operation with small M and large N size has been fixed.
  * The code will now automatically call blas_thread_init after a fork if needed before handling a call to openblas_set_num_threads.
  * Accesses to parallelized level3 functions from multiple callers are now serialized to avoid thread races (unless using OpenMP).
    This should provide better performance than the known-threadsafe (but non-default) USE_SIMPLE_THREADED_LEVEL3 option.
  * When building LAPACK with gfortran, -frecursive is now (again) enabled by default to ensure correct behaviour.
  * The OpenBLAS version of cblas.h now supports both CBLAS_ORDER and CBLAS_LAYOUT as the name of the matrix row/column order option.
  * Externally set LDFLAGS are now passed through to the final compile/link steps to facilitate setting platform-specific linker flags.
  * A potential race condition during the build of LAPACK (that would usually manifest itself as a failure to build TESTING/MATGEN) has been fixed.
  * xHEMV has been changed to stay single-threaded for small input sizes where the overhead of multithreading exceeds any possible gains.
  * CSWAP and ZSWAP have been limited to a single thread except on ARMV8 or ThunderX hardware with sizable input.
  * Linker flags for the PGI compiler have been updated.
  * Behaviour of AXPY with zero increments is now handled in the C interface, correcting the result on at least Intel Atom.
  * The result matrix from calling SGELSS with an all-zero input matrix is now zeroed completely.
  x86_64:
  * Autodetection of AMD Ryzen2 has been fixed (again).
  * CMAKE builds now support labeling of an INTERFACE64=1 build of the library with the _64 suffix.
  * AVX512 version of DGEMM has been added and the AVX512 SGEMM kernel has been sped up by rewriting with C intrinsics.
  * Fixed compilation on RHEL5/CENTOS5 (issue with typename __WAIT_STATUS).
  armv8:
  * DYNAMIC_ARCH support is now available for 64bit ARM.
  * Cross-compiling for ARMV8 under iOS now works.
  * cpu-specific code has been rearranged to make better use of both hardware commonalities and model-specific compiler optimizations.
  * XGENE1 has been removed as a TARGET, superseded by the improved generic ARMV8 support.
  armv7:
  * Older assembly mnemonics have been converted to UAL form to allow building with clang 7.0
* Tue Oct 09 2018 Dmitry Roshchin
- Update to version 0.3.3
  common:
  * thread memory allocation has been switched back to the method used before version 0.3.1 due to unexpected problems caused by the new code under some circumstances.
  * LAPACK PR272 has been integrated, which fixes spurious errors in DSYEVR and related functions caused by missing conversion from ILAENV to ILAENV_2STAGE in several _2stage routines.
  x86_64:
  * added AVX512 implementations of SDOT, DDOT, SAXPY, DAXPY, DSCAL, DGEMVN and DSYMVL
  * added a workaround for a cygwin issue that prevented compilation of AVX512 code
* Fri Aug 17 2018 idonmezAATTsuse.com
- Update to version 0.3.2
  common:
  * Fixes for regressions caused by the rewrite of the thread initialization code in 0.3.1
  x86_64:
  * Added autodetection of AMD Ryzen 2
  * Fixed build with older versions of MSVC
  power:
  * Fixed cpu autodetection for the BSDs
  mips64:
  * Fixed utest errors in AXPY, DSDOT, ROT and SWAP
- Version 0.3.1
  common:
  * Rewritten thread initialization code with significantly reduced overhead
  * Added CBLAS interfaces to the IxAMIN BLAS extension functions
  * Fixed the lapack-test target
  * CMAKE builds now create an OpenBLASConfig.cmake file
  * ZAXPY now uses a single thread for small input sizes
  * The LAPACK code was updated from Reference-LAPACK/lapack#253
  power:
  * Corrected CROT and ZROT behaviour with zero INC_X
  armv7:
  * Corrected xDOT behaviour with zero INC_X or INC_Y
  x86_64:
  * Retired some older targets of DYNAMIC_ARCH builds to a new option DYNAMIC_OLDER; this affects PENRYN, DUNNINGTON, OPTERON, OPTERON_SSE3, BOBCAT, ATOM and NANO (which will still be supported via the slower PRESCOTT kernels when this option is not set)
  * Added an option DYNAMIC_LIST that (used in conjunction with DYNAMIC_ARCH) allows specifying the list of x86_64 targets to include.
    Any target not on the list will be supported by the Sandybridge or Nehalem kernels if available, or by Prescott.
  * Improved SWITCH_RATIO on Haswell for increased GEMM throughput
  * Added initial support for Intel Skylake X, including an AVX512 SGEMM kernel
  * Added autodetection of Intel Cannon Lake series as Skylake X
  * Added a default L2 cache size for hypervisors that return zero here (Chromebook)
  * Fixed a name clash with recent Windows10 headers that broke the build with (at least) recent mingw from MSYS2
  * Fixed a link error in mixed clang/gfortran builds with OpenMP
  * Updated the OSX deployment target to 10.8
  * Switched on parallel make for builds on MS Windows by default
  x86:
  * Fixed SSWAP and DSWAP behaviour with zero INC_X and INC_Y
- Version 0.3.0
  common:
  * Fixed some more thread race and locking bugs
  * Added preliminary support for calling an OpenMP build of the library from multiple threads
  * Removed performance impact of thread locks added in 0.2.20 on OpenMP code
  * General code cleanup
  * Optimized DSDOT implementation
  * Improved thread distribution for GEMM
  * Corrected IMATCOPY/OMATCOPY implementation
  * Fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations
  * Cmake build improvements
  * pkgconfig file now contains build options
  * openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build
  * Corrections and improvements for systems with more than 64 cpus
  * LAPACK code updated to 3.8.0 including later fixes
  * Added ReLAPACK, a recursive implementation of several LAPACK functions
  * Rewrote ROTMG to handle cases that the netlib code failed to address
  * Disabled (broken) multithreading code for xTRMV
  * corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard
  * Shared memory access failures on startup are now handled more gracefully
  * Restored utests from earlier releases (and made them pass on all affected systems)
  sparc:
  * several fixes for cpu
    autodetection
  arm:
  * Added support for CortexA53 and A72
  * Added autodetection for ThunderX2T99
  * Made most optimized kernels the default for generic ARMv8 targets
  x86_64:
  * Parallelized DDOT kernel for Haswell
  * Changed alignment directives in assembly kernels to boost performance on OSX
  * Fixed register handling in the GEMV microkernels (bug exposed by gcc7)
  * Added support for building on OpenBSD and Dragonfly
  * Updated compiler options to work with Intel release 2018
  * Support fully optimized build with clang/flang on Microsoft Windows
  * Fixed building on AIX
  ibm z:
  * added optimized BLAS 1/2 functions
  mips:
  * Fixed cpu autodetection helper code
  * Added mips32 1004K cpu (Mediatek MT7621 and similar SoC)
  * Added mips64 I6500 cpu
- Remove c_xerbla_no-void-return.patch: fixed upstream.
* Tue Jan 30 2018 roAATTsuse.de
- add openblas-s390.patch to build on s390 (bsc#1079513).
* Fri Jan 05 2018 eichAATTsuse.com
- Switch from gcc6 to gcc7 as additional compiler flavor for HPC on SLES.
- Fix library package requires - use HPC macro (boo#1074890).
- Fix unexpanded rpm macro in environment module file for HPC (boo#1074897).
* Mon Nov 27 2017 normandAATTlinux.vnet.ibm.com
- Add -mvsx option for the ppc64 architecture (not required for ppc64le) to avoid
  ./kernel/power/sasum_microk_power8.c:41:3: error: '__vector' undeclared (first use in this function); ...
* Tue Oct 17 2017 eichAATTsuse.com
- Add magic to limit the number of flavors built in the OBS to non-HPC ones.
* Thu Oct 12 2017 eichAATTsuse.com
- Generate baselib.conf dynamically and only for the non-HPC builds: this avoids issues with the source validator.
* Fri Sep 08 2017 eichAATTsuse.com
- Convert openblas to multibuild.
- Add HPC build using environment modules. (FATE#321708).
- fix-arm64-cpuid-return.patch Fix CPUID detection on ARM (From OHPC).
* Wed Aug 09 2017 dmitry_rAATTopensuse.org
- Remove migration %post scripts for old library names
* Sat Jul 29 2017 badshah400AATTgmail.com
- Update to version 0.2.20:
  * common:
    - Improved CMake support
    - Fixed several thread race and locking bugs
    - Fixed default LAPACK optimization level
    - Updated LAPACK to 3.7.0
    - Added ReLAPACK (https://github.com/HPAC/ReLAPACK), make BUILD_RELAPACK=1
  * POWER:
    - Optimizations for Power9
    - Fixed several Power8 assembly bugs
  * ARM:
    - New optimized Vulcan and ThunderX2T99 targets
    - Support for ARMV7 SOFT_FP ABI (make ARM_SOFTFP_ABI=1)
    - Detect all cpu cores including offline ones
    - Fix compilation with CLANG
    - Support building a shared library for Android
  * MIPS:
    - Fixed several threading issues
    - Fix compilation with CLANG
  * x86_64:
    - Detect Intel Bay Trail and Apollo Lake
    - Detect Intel Sky Lake and Kaby Lake
    - Detect Intel Knights Landing
    - Detect AMD A8, A10, A12 and Ryzen
    - Support 64bit builds with Visual Studio
    - Fix building with Intel and PGI compilers
    - Fix building with MINGW and TDM-GCC
    - Fix cmake builds for Haswell and related cpus
    - Fix building for Sandybridge with CLANG 3.9
    - Add support for the FLANG compiler
  * IBM Z:
    - New target z13 with BLAS3 optimizations
- Drop 0001-Fix-power8-asm.patch; fixed upstream.
- Minor rebase of c_xerbla_no-void-return.patch and openblas-noexecstack.patch for updated version.
- Remove installed pkgconfig file as it is not adapted to the library names we use.
* Thu May 18 2017 meissnerAATTsuse.com
- 0001-Fix-power8-asm.patch: fixed power8 assembly (bsc#1039397)
* Wed Sep 07 2016 idonmezAATTsuse.com
- Update to version 0.2.19
  POWER:
  * Optimize BLAS on Power8
  * Fixed Julia+OpenBLAS bugs on Power8
  MIPS:
  * Optimize BLAS on MIPS P5600 and I6400
  ARM:
  * Improved on ARM Cortex-A57
* Wed Apr 13 2016 dmitry_rAATTopensuse.org
- Update to version 0.2.18
  ARM:
  * Provide DGEMM 8x4 kernel for Cortex-A57
  POWER:
  * Optimize S and C BLAS3 on Power8
  * Optimize BLAS2/1 on Power8
* Mon Mar 21 2016 dmitry_rAATTopensuse.org
- Update to version 0.2.17
  * Enable BUILD_LAPACK_DEPRECATED=1 by default.
* Wed Mar 16 2016 idonmezAATTsuse.com
- Update to version 0.2.16
  * Upgrade LAPACK to 3.6.0 version.
  * Disable multi-threading for small size swap and ger.
  * Improve small zger, zgemv, ztrmv using stack allocation.
  * Let openblas_get_num_threads return the number of active threads.
  * Fix LAPACK Dormbr, Dormlq bug.
  * Avoid potential getenv segfault.
  * Import LAPACK svn bugfix #142-#147,#150-#155
  x86/x86_64:
  * Optimize trsm kernels for AMD Bulldozer, Piledriver, Steamroller.
  * Detect Intel Avoton.
  * Detect AMD Trinity, Richland, E2-3200.
  * Optimize c/zgemv for AMD Bulldozer, Piledriver, Steamroller
  * Fix bug with scipy linalg test.
  ARM:
  * Support and optimize Cortex-A57 AArch64.
  * Update ARMV6 kernels.
  * Improve DGEMM for ARM Cortex-A57.
  POWER:
  * Fix detection of POWER architecture.
  * Optimize D and Z BLAS3 functions for Power8.
- Remove openblas-libs.patch, not needed.
* Tue Oct 27 2015 dmitry_rAATTopensuse.org
- Update to version 0.2.15
  * Enable MAX_STACK_ALLOC flags by default.
  * Improve ger and gemv for small matrices.
  * Improve gemv parallel with small m and large n case.
  * Improve ?imatcopy when lda==ldb
  * Add vecLib benchmarks
  * Fix LAPACK lantr for row major matrices
  * Fix LAPACKE lansy
  * Import bug fixes for LAPACKE s/dormlq, c/zunmlq
  * Raise the signal when pthread_create fails
  * Drop obsolete openblas-arm64-build.patch
  x86/x86-64:
  * Support pure C generic kernels for x86/x86-64.
  * Support Intel Broadwell and Skylake by Haswell kernels.
  * Support AMD Excavator by Steamroller kernels.
  * Optimize s/d/c/zdot for Intel SandyBridge and Haswell.
  * Optimize s/d/c/zdot for AMD Piledriver and Steamroller.
  * Optimize s/d/c/zaxpy for Intel SandyBridge and Haswell.
  * Optimize s/d/c/zaxpy for AMD Piledriver and Steamroller.
  * Optimize d/c/zscal for Intel Haswell, dscal for Intel SandyBridge.
  * Optimize d/c/zscal for AMD Bulldozer, Piledriver and Steamroller.
  * Optimize s/dger for Intel SandyBridge.
  * Optimize s/dsymv for Intel SandyBridge.
  * Optimize ssymv for Intel Haswell.
  * Optimize dgemv for Intel Nehalem and Haswell.
  * Optimize dtrmm for Intel Haswell.
  ARM:
  * Support Android NDK armeabi-v7a-hard ABI (-mfloat-abi=hard)
  * Fix lock, rpcc bugs
  POWER:
  * Support ppc64le platform (ELF ABI v2)
  * Support POWER7/8 by POWER6 kernels.
* Wed Jul 29 2015 dmitry_rAATTopensuse.org
- Change library name suffix
  * drop openblas-soname.patch
- Add RPM %post script for manual BLAS/LAPACK update-alternatives configuration update
- Use update-alternatives mechanism for OpenBLAS variants (serial, openmp, pthreads). pthreads variant is default for x86 and x86_64, OpenMP for other architectures.
- Fix build on ARM64
  * openblas-arm64-build.patch
- Add update-alternatives mechanism for CBLAS
- Provide cmake module
- Delete info about host cpu from openblas_config.h for dynamic arch
- Add update-alternatives to 'preup' and 'post' requires list for libraries
- Add README.SUSE
* Wed Mar 25 2015 dmitry_rAATTopensuse.org
- Update to version 0.2.14
  * Improve ger and gemv for small matrices by stack allocation. e.g.
    make -DMAX_STACK_ALLOC=2048
  * Introduce openblas_get_num_threads and openblas_get_num_procs.
  * Add ATLAS-style ?geadd function.
  * Fix c/zsyr bug with negative incx.
  * Fix race condition during shutdown causing a crash in gotoblas_set_affinity().
  x86/x86-64:
  * Support AMD Steamroller.
  ARM:
  * Add Cortex-A9 and Cortex-A15 targets.
* Wed Dec 03 2014 dmitry_rAATTopensuse.org
- Update to version 0.2.13
  * Add SYMBOLPREFIX and SYMBOLSUFFIX makefile options for adding a prefix or suffix to all exported symbol names in the shared library.
  * Remove openblas-0.1.0-soname.patch
  * Add openblas-soname.patch
  * Rebase openblas-noexecstack.patch
  x86/x86-64:
  * Add generic kernel files for x86-64. make TARGET=GENERIC
  * Fix a bug of sgemm kernel on Intel Sandy Bridge.
  * Fix c_check bug on some amd64 systems.
  ARM:
  * Support APM's X-Gene 1 AArch64 processors.
  * Optimize trmm and sgemm.
* Fri Oct 17 2014 dmitry_rAATTopensuse.org
- Update to version 0.2.12
  * Added CBLAS interface for ?omatcopy and ?imatcopy.
  * Enable ?gemm3m functions.
  * Added benchmark for ?gemm3m.
  * Optimized multithreading lower limits.
  * Disabled SYMM3M and HEMM3M functions because of segment violations.
  x86/x86-64:
  * Improved axpy and symv performance on AMD Bulldozer.
  * Improved gemv performance on modern Intel and AMD CPUs.
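
Note on ?omatcopy (0.2.12 entry above): the routine family is commonly described as an out-of-place scaled copy or transpose, B := alpha * op(A). The following is only a minimal pure-Python sketch of that semantics, assuming row-major flat storage with leading dimensions. The name omatcopy_ref and its argument order are hypothetical and chosen for illustration; this is not the OpenBLAS CBLAS prototype or its implementation.

```python
def omatcopy_ref(trans, rows, cols, alpha, a, lda, b, ldb):
    """Illustrative sketch: B := alpha * op(A) on row-major flat lists.

    trans is "N" (plain copy) or "T" (transpose). `a` holds a rows x cols
    matrix with leading dimension lda. `b` receives a rows x cols matrix
    ("N") or a cols x rows matrix ("T") with leading dimension ldb.
    """
    if trans not in ("N", "T"):
        raise ValueError("trans must be 'N' or 'T'")
    for i in range(rows):
        for j in range(cols):
            value = alpha * a[i * lda + j]
            if trans == "N":
                b[i * ldb + j] = value   # same position, scaled
            else:
                b[j * ldb + i] = value   # swapped indices, scaled
    return b


# Example: scale a 2 x 3 matrix by 2 and transpose it into a 3 x 2 matrix.
a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]
b = omatcopy_ref("T", 2, 3, 2.0, a, 3, [0.0] * 6, 2)
print(b)  # [2.0, 8.0, 4.0, 10.0, 6.0, 12.0]
```

The in-place variant (?imatcopy) would overwrite A itself, which this sketch does not cover.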