Changelog for python312-dask-complete-2024.8.2-169.3.noarch.rpm :
* Sun Sep 08 2024 Dirk Müller
- update to 2024.8.2:
  * Avoid capturing code of xdist @fjetter
  * Reduce memory footprint of culling P2P rechunking
  * Add tests for choosing default rechunking method
  * Increase visibility of GPU CI updates @charlesbluca
  * Bump test_pause_while_idle timeout @fjetter
  * Concatenate small input chunks before P2P rechunking
  * Remove dump cluster from gen_cluster @fjetter
  * Bump `numpy>=1.24` and `pyarrow>=14.0.1` minimum versions
  * Fix PipInstall plugin on Worker @hendrikmakait
  * Remove more Python 3.10 compatibility code @jrbourbeau
  * Use task-based rechunking to prechunk along partial boundaries @hendrikmakait
  * Ensure client_desires_keys does not corrupt Scheduler state @fjetter
  * Bump minimum ``cloudpickle`` to 3 @jrbourbeau
* Thu Aug 29 2024 Ben Greiner
- Update to 2024.8.1
  * Improve output chunksizes for reshaping Dask Arrays
  * Improve scheduling efficiency for Xarray Rechunk-GroupBy-Reduce patterns
  * Drop support for Python 3.9
- Release 2024.8.0
  * Improve efficiency and performance of slicing with positional indexers
  * Improve scheduling efficiency for Xarray GroupBy-Reduce patterns
- Release 2024.7.1
  * More resilient distributed lock
- Release 2024.7.0
  * Drop support for pandas 1.x
  * Publish-subscribe APIs deprecated
- Overhaul multibuild setup: Prepare for python313
* Thu Jul 11 2024 Ben Greiner
- Do not pin to numpy < 2. Dask supports it, downstream packages must take care of their own.
* Mon Jul 08 2024 Steve Kowalik
- Update to 2024.6.2:
  * profile._f_lineno: handle next_line being None in Python 3.13
  * Cache global query-planning config
  * Python 3.13 fixes
  * Fix test_map_freq_to_period_start for pandas=3
  * Tokenizing memmap arrays will now avoid materializing the array into memory.
  * Fix test_dt_accessor with query planning disabled
  * Remove deprecated dask.compatibility module
  * Ensure compatibility for xarray.NamedArray
  * Avoid rounding error in test_prometheus_collect_count_total_by_cost_multipliers
  * Log key collision count in update_graph log event
  * Rename safe to expected in Scheduler.remove_worker
  * Eagerly update aggregate statistics for TaskPrefix instead of calculating them on-demand
  * Improve graph submission time for P2P rechunking by avoiding unpack recursion into indices
  * Add safe keyword to remove-worker event
  * Improved errors and reduced logging for P2P RPC calls
  * Adjust P2P tests for dask-expr
  * Iterate over copy of Server.digests_total_since_heartbeat to avoid RuntimeError
  * Add Prometheus gauge for task groups
  * Fix too strict assertion in shuffle code for pandas subclasses
  * Reduce noise from erring tasks that are not supposed to be running
* Fri Apr 26 2024 Ben Greiner
- Update to 2024.4.2
  * Trivial Merge Implementation
  * Auto-partitioning in read_parquet
- Release 2024.4.1
  * Fix an error when importing dask.dataframe with Python 3.11.9.
- Release 2024.4.0
  * Query planning fixes
  * GPU metric dashboard fixes
- Release 2024.3.1
  * Demote an exception to a warning if dask-expr is not installed when upgrading.
- Release 2024.3.0
  * Query planning
  * Sunset of Pandas 1.X support
* Tue Mar 05 2024 Ben Greiner
- Update to 2024.2.1
  * Allow silencing dask.DataFrame deprecation warning
  * More robust distributed scheduler for rare key collisions
  * More robust adaptive scaling on large clusters
- The test subpackage now directly depends on pandas-test which does not use pytest-asyncio anymore
* Wed Feb 14 2024 Ben Greiner
- Update to 2024.2.0
  * Deprecate Dask DataFrame implementation
  * Improved tokenization
  * https://docs.dask.org/en/stable/changelog.html#v2024-2-0
- Really drop python39 from testing instead of testing it with every other test flavor
* Tue Feb 06 2024 Dirk Müller
- drop python39 from testing
* Sun Feb 04 2024 Ben Greiner
- Add python312 test flavor
* Tue Jan 30 2024 Dirk Müller
- update to 2024.1.1:
  * This release contains compatibility updates for the latest pandas and scipy releases. See :pr:`10834`, :pr:`10849`, :pr:`10845`, and :pr-distributed:`8474` from `crusaderky`_ for details.
* Sat Jan 20 2024 Dirk Müller
- update to 2024.1.0:
  * Released on January 12, 2024
  * P2P rechunking now utilizes the relationships between input and output chunks. For situations that do not require all-to-all data transfer, this may significantly reduce the runtime and memory/disk footprint. It also enables task culling.
  * The fastparquet Parquet engine has been deprecated. Users should migrate to the pyarrow engine by installing PyArrow and removing engine="fastparquet" in read_parquet or to_parquet calls.
  * This release improves serialization robustness for arbitrary data. Previously there were some cases where serialization could fail for non-msgpack serializable data. In those cases we now fall back to using pickle.
  * Deprecate shuffle keyword in favour of shuffle_method for DataFrame methods (:pr:`10738`) `Hendrik Makait`_
  * Deprecate automatic argument inference in repartition
  * Deprecate compute parameter in set_index
  * Deprecate inplace in eval
  * Deprecate Series.view
  * Deprecate npartitions="auto" for set_index & sort_values
* Mon Dec 18 2023 Dirk Müller
- update to 2023.12.1:
  * Dask DataFrames are now much more performant by using a logical query planner.
  * ``read_parquet`` will now infer the Arrow types ``pa.date32()``, ``pa.date64()`` and ``pa.decimal()`` as an ``ArrowDtype`` in pandas. These dtypes are backed by the original Arrow array, and thus avoid the conversion to NumPy object.
  * This release contains several updates that fix a possible deadlock introduced in 2023.9.2 and improve the robustness of P2P-based merging when the cluster is dynamically scaling up.
  * The ``distributed.scheduler.pickle`` configuration option is no longer supported.
    As of the 2023.4.0 release, ``pickle`` is used to transmit task graphs, so can no longer be disabled. We now raise an informative error when ``distributed.scheduler.pickle`` is set to ``False``.
  * Update DataFrame page
  * Add changelog entry for ``dask-expr`` switch
  * [Dask.order] Remove non-runnable leaf nodes from ordering
  * Update installation docs
  * Fix software environment link in docs
  * Avoid converting non-strings to arrow strings for read_parquet
  * Dask.order rewrite using a critical path approach
  * Avoid substituting keys that occur multiple times
  * Add missing image to docs
  * Update landing page
  * Make meta check simpler in dispatch
  * Pin PR Labeler
  * Reorganize docs index a bit
  * Avoid ``RecursionError`` when failing to pickle key in ``SpillBuffer`` and using ``tblib=3``
  * Allow tasks to override ``is_rootish`` heuristic
  * Remove GPU executor (:pr-distributed:`8399`) `Hendrik Makait`_
  * Do not rely on logging for subprocess cluster
  * Update gpuCI ``RAPIDS_VER`` to ``24.02``
  * Ensure output chunks in P2P rechunking are distributed homogeneously (:pr-distributed:`8207`) `Florian Jetter`_
  * Trivial: fix typo (:pr-distributed:`8395`)
* Sat Dec 02 2023 Dirk Müller
- update to 2023.12.0:
  * Bokeh 3.3.0 compatibility
  * Add ``network`` marker to ``test_pyarrow_filesystem_option_real_data``
  * Bump GPU CI to CUDA 11.8 (:pr:`10656`)
  * Tokenize ``pandas`` offsets deterministically
  * Add tokenize ``pd.NA`` functionality
  * Update gpuCI ``RAPIDS_VER`` to ``24.02`` (:pr:`10636`)
  * Fix precision handling in ``array.linalg.norm`` (:pr:`10556`) `joanrue`_
  * Add ``axis`` argument to ``DataFrame.clip`` and ``Series.clip`` (:pr:`10616`) `Richard (Rick) Zamora`_
  * Update changelog entry for in-memory rechunking (:pr:`10630`) `Florian Jetter`_
  * Fix flaky ``test_resources_reset_after_cancelled_task``
  * Bump GPU CI to CUDA 11.8
  * Bump ``conda-incubator/setup-miniconda``
  * Add debug logs to P2P scheduler plugin
  * ``O(1)`` access for ``/info/task/`` endpoint
  * Remove stringification
    from shuffle annotations
  * Don't cast ``int`` metrics to ``float``
  * Drop asyncio TCP backend
  * Add offload support to ``context_meter.add_callback``
  * Test that ``sync()`` propagates contextvars
  * Fix ``test_statistical_profiling_cycle``
  * Replace ``Client.register_plugin``'s ``idempotent`` argument with ``.idempotent`` attribute on plugins
  * Fix test report generation
  * Install ``pyarrow-hotfix`` on ``mindeps-pandas`` CI
  * Reduce memory usage of scheduler process - optimize ``scheduler.py::TaskState`` class
  * Update cuDF test with explicit ``dtype=object``
  * Fix ``Cluster`` / ``SpecCluster`` calls to async close methods
* Thu Nov 16 2023 Ondřej Súkup
- Update to 2023.11.0
  * Zero-copy P2P Array Rechunking
  * Deprecating PyArrow <14.0.1
  * Improved PyArrow filesystem for Parquet
  * Improve Type Reconciliation in P2P Shuffling
  * official support for Python 3.12
  * Reduced memory pressure for multi array reductions
  * improved P2P shuffling robustness
  * Reduced scheduler CPU load for large graphs
* Sun Sep 10 2023 Ben Greiner
- Update to 2023.9.1
  ## Enhancements
  * Stricter data type for dask keys (GH#10485) crusaderky
  * Special handling for None in DASK_ environment variables (GH#10487) crusaderky
  ## Bug Fixes
- Release 2023.9.0
  ## Bug Fixes
  * Remove support for np.int64 in keys (GH#10483) crusaderky
  * Fix _partitions dtype in meta for shuffling (GH#10462) Hendrik Makait
  * Don’t use exception hooks to shorten tracebacks (GH#10456) crusaderky
- Release 2023.8.1
  ## Enhancements
  * Adding support for cgroup v2 to cpu_count (GH#10419) Johan Olsson
  * Support multi-column groupby with sort=True and split_out>1 (GH#10425) Richard (Rick) Zamora
  * Add DataFrame.enforce_runtime_divisions method (GH#10404) Richard (Rick) Zamora
  * Enable file mode="x" with a single_file=True for Dask DataFrame to_csv (GH#10443) Genevieve Buckley
  ## Bug Fixes
  * Fix ValueError when running to_csv in append mode with single_file as True (GH#10441)
- Release 2023.8.0
  ## Enhancements
  * Fix for
    make_timeseries performance regression (GH#10428) Irina Truong
- Release 2023.7.1
  * This release updates Dask DataFrame to automatically convert text data using object data types to string[pyarrow] if pandas>=2 and pyarrow>=12 are installed. This should result in significantly reduced memory consumption and increased computation performance in many workflows that deal with text data. You can disable this change by setting the dataframe.convert-string configuration value to False with dask.config.set({"dataframe.convert-string": False})
  ## Enhancements
  * Convert to pyarrow strings if proper dependencies are installed (GH#10400) James Bourbeau
  * Avoid repartition before shuffle for p2p (GH#10421) Patrick Hoefler
  * API to generate random Dask DataFrames (GH#10392) Irina Truong
  * Speed up dask.bag.Bag.random_sample (GH#10356) crusaderky
  * Raise helpful ValueError for invalid time units (GH#10408) Nat Tabris
  * Make repartition a no-op when divisions match (divisions provided as a list) (GH#10395) Nicolas Grandemange
  ## Bug Fixes
  * Use dataframe.convert-string in read_parquet token (GH#10411) James Bourbeau
  * Category dtype is lost when concatenating MultiIndex (GH#10407) Irina Truong
  * Fix FutureWarning: The provided callable...
    (GH#10405) Irina Truong
  * Enable non-categorical hive-partition columns in read_parquet (GH#10353) Richard (Rick) Zamora
  * concat ignoring DataFrame without columns (GH#10359) Patrick Hoefler
- Release 2023.7.0
  ## Enhancements
  * Catch exceptions when attempting to load CLI entry points (GH#10380) Jacob Tomlinson
  ## Bug Fixes
  * Fix typo in _clean_ipython_traceback (GH#10385) Alexander Clausen
  * Ensure that df is immutable after from_pandas (GH#10383) Patrick Hoefler
  * Warn consistently for inplace in Series.rename (GH#10313) Patrick Hoefler
- Release 2023.6.1
  ## Enhancements
  * Remove no longer supported clip_lower and clip_upper (GH#10371) Patrick Hoefler
  * Support DataFrame.set_index(..., sort=False) (GH#10342) Miles
  * Cleanup remote tracebacks (GH#10354) Irina Truong
  * Add dispatching mechanisms for pyarrow.Table conversion (GH#10312) Richard (Rick) Zamora
  * Choose P2P even if fusion is enabled (GH#10344) Hendrik Makait
  * Validate that rechunking is possible earlier in graph generation (GH#10336) Hendrik Makait
  ## Bug Fixes
  * Fix issue with header passed to read_csv (GH#10355) GALI PREM SAGAR
  * Respect dropna and observed in GroupBy.var and GroupBy.std (GH#10350) Patrick Hoefler
  * Fix H5FD_lock error when writing to hdf with distributed client (GH#10309) Irina Truong
  * Fix for total_mem_usage of bag.map() (GH#10341) Irina Truong
  ## Deprecations
  * Deprecate DataFrame.fillna/Series.fillna with method (GH#10349) Irina Truong
  * Deprecate DataFrame.first and Series.first (GH#10352) Irina Truong
- Release 2023.6.0
  ## Enhancements
  * Add missing not in predicate support to read_parquet (GH#10320) Richard (Rick) Zamora
  ## Bug Fixes
  * Fix for incorrect value_counts (GH#10323) Irina Truong
  * Update empty describe top and freq values (GH#10319) James Bourbeau
* Sat Jun 10 2023 ecsos
- Add %{?sle15_python_module_pythons}
* Mon Jun 05 2023 Steve Kowalik
- Tighten bokeh requirement to match distributed.
* Fri May 26 2023 Ben Greiner
- Update to 2023.5.1
  * This release drops support for Python 3.8. As of this release Dask supports Python 3.9, 3.10, and 3.11.
  ## Enhancements
  * Drop Python 3.8 support (GH#10295) Thomas Grainger
  * Change Dask Bag partitioning scheme to improve cluster saturation (GH#10294) Jacob Tomlinson
  * Generalize dd.to_datetime for GPU-backed collections, introduce get_meta_library utility (GH#9881) Charles Blackmon-Luca
  * Add na_action to DataFrame.map (GH#10305) Patrick Hoefler
  * Raise TypeError in DataFrame.nsmallest and DataFrame.nlargest when columns is not given (GH#10301) Patrick Hoefler
  * Improve sizeof for pd.MultiIndex (GH#10230) Patrick Hoefler
  * Support duplicated columns in a bunch of DataFrame methods (GH#10261) Patrick Hoefler
  * Add numeric_only support to DataFrame.idxmin and DataFrame.idxmax (GH#10253) Patrick Hoefler
  * Implement numeric_only support for DataFrame.quantile (GH#10259) Patrick Hoefler
  * Add support for numeric_only=False in DataFrame.std (GH#10251) Patrick Hoefler
  * Implement numeric_only=False for GroupBy.cumprod and GroupBy.cumsum (GH#10262) Patrick Hoefler
  * Implement numeric_only for skew and kurtosis (GH#10258) Patrick Hoefler
  * mask and where should accept a callable (GH#10289) Irina Truong
  * Fix conversion from Categorical to pa.dictionary in read_parquet (GH#10285) Patrick Hoefler
  ## Bug Fixes
  * Spurious config on nested annotations (GH#10318) crusaderky
  * Fix rechunking behavior for dimensions with known and unknown chunk sizes (GH#10157) Hendrik Makait
  * Enable drop to support mismatched partitions (GH#10300) James Bourbeau
  * Fix divisions construction for to_timestamp (GH#10304) Patrick Hoefler
  * pandas ExtensionDtype raising in Series reduction operations (GH#10149) Patrick Hoefler
  * Fix regression in da.random interface (GH#10247) Eray Aslan
  * da.coarsen doesn’t trim an empty chunk in meta (GH#10281) Irina Truong
  * Fix dtype inference for engine="pyarrow" in read_csv (GH#10280) Patrick Hoefler
- Release 2023.5.0
  ## Enhancements
  * Implement numeric_only=False for GroupBy.corr and GroupBy.cov (GH#10264) Patrick Hoefler
  * Add support for numeric_only=False in DataFrame.var (GH#10250) Patrick Hoefler
  * Add numeric_only support to DataFrame.mode (GH#10257) Patrick Hoefler
  * Add DataFrame.map to dask.DataFrame API (GH#10246) Patrick Hoefler
  * Adjust for DataFrame.applymap deprecation and all NA concat behaviour change (GH#10245) Patrick Hoefler
  * Enable numeric_only=False for DataFrame.count (GH#10234) Patrick Hoefler
  * Disallow array input in mask/where (GH#10163) Irina Truong
  * Support numeric_only=True in GroupBy.corr and GroupBy.cov (GH#10227) Patrick Hoefler
  * Add numeric_only support to GroupBy.median (GH#10236) Patrick Hoefler
  * Support mimesis=9 in dask.datasets (GH#10241) James Bourbeau
  * Add numeric_only support to min, max and prod (GH#10219) Patrick Hoefler
  * Add numeric_only=True support for GroupBy.cumsum and GroupBy.cumprod (GH#10224) Patrick Hoefler
  * Add helper to unpack numeric_only keyword (GH#10228) Patrick Hoefler
  ## Bug Fixes
  * Fix clone + from_array failure (GH#10211) crusaderky
  * Fix dataframe reductions for ea dtypes (GH#10150) Patrick Hoefler
  * Avoid scalar conversion deprecation warning in numpy=1.25 (GH#10248) James Bourbeau
  * Make sure transform output has the same index as input (GH#10184) Irina Truong
  * Fix corr and cov on a single-row partition (GH#9756) Irina Truong
  * Fix test_groupby_numeric_only_supported and test_groupby_aggregate_categorical_observed upstream errors (GH#10243) Irina Truong
- Release 2023.4.1
  ## Enhancements
  * Implement numeric_only support for DataFrame.sum (GH#10194) Patrick Hoefler
  * Add support for numeric_only=True in GroupBy operations (GH#10222) Patrick Hoefler
  * Avoid deep copy in DataFrame.__setitem__ for pandas 1.4 and up (GH#10221) Patrick Hoefler
  * Avoid calling Series.apply with _meta_nonempty (GH#10212) Patrick Hoefler
  * Unpin sqlalchemy and fix compatibility issues (GH#10140) Patrick
    Hoefler
  ## Bug Fixes
  * Partially revert default client discovery (GH#10225) Florian Jetter
  * Support arrow dtypes in Index meta creation (GH#10170) Patrick Hoefler
  * Repartitioning raises with extension dtype when truncating floats (GH#10169) Patrick Hoefler
  * Adjust empty Index from fastparquet to object dtype (GH#10179) Patrick Hoefler
- Release 2023.4.0
  ## Enhancements
  * Override old default values in update_defaults (GH#10159) Gabe Joseph
  * Add a CLI command to list and get a value from dask config (GH#9936) Irina Truong
  * Handle string-based engine argument to read_json (GH#9947) Richard (Rick) Zamora
  * Avoid deprecated GroupBy.dtypes (GH#10111) Irina Truong
  ## Bug Fixes
  * Revert grouper-related changes (GH#10182) Irina Truong
  * GroupBy.cov raising for non-numeric grouping column (GH#10171) Patrick Hoefler
  * Updates for Index supporting numpy numeric dtypes (GH#10154) Irina Truong
  * Preserve dtype for partitioning columns when read with pyarrow (GH#10115) Patrick Hoefler
  * Fix annotations for to_hdf (GH#10123) Hendrik Makait
  * Handle None column name when checking if columns are all numeric (GH#10128) Lawrence Mitchell
  * Fix valid_divisions when passed a tuple (GH#10126) Brian Phillips
  * Maintain annotations in DataFrame.categorize (GH#10120) Hendrik Makait
  * Fix handling of missing min/max parquet statistics during filtering (GH#10042) Richard (Rick) Zamora
  ## Deprecations
  * Deprecate use_nullable_dtypes= and add dtype_backend= (GH#10076) Irina Truong
  * Deprecate convert_dtype in Series.apply (GH#10133) Irina Truong
- Drop dask-pr10042-parquetstats.patch
* Tue Apr 04 2023 Ben Greiner
- Drop python38 test flavor
* Thu Mar 30 2023 Ben Greiner
- Enable pyarrow in the [complete] extra
* Mon Mar 27 2023 Ben Greiner
- Update to 2023.3.2
  ## Enhancements
  * Deprecate observed=False for groupby with categoricals (GH#10095) Irina Truong
  * Deprecate axis= for some groupby operations (GH#10094) James Bourbeau
  * The axis keyword in DataFrame.rolling/Series.rolling
    is deprecated (GH#10110) Irina Truong
  * DataFrame._data deprecation in pandas (GH#10081) Irina Truong
  * Use importlib_metadata backport to avoid CLI UserWarning (GH#10070) Thomas Grainger
  * Port option parsing logic from dask.dataframe.read_parquet to to_parquet (GH#9981) Anton Loukianov
  ## Bug Fixes
  * Avoid using dd.shuffle in groupby-apply (GH#10043) Richard (Rick) Zamora
  * Enable null hive partitions with pyarrow parquet engine (GH#10007) Richard (Rick) Zamora
  * Support unknown shapes in *_like functions (GH#10064) Doug Davis
  ## Maintenance
  * Restore Entrypoints compatibility (GH#10113) Jacob Tomlinson
  * Allow pyarrow build to continue on failures (GH#10097) James Bourbeau
  * Fix test_set_index_on_empty with pyarrow strings active (GH#10054) Irina Truong
  * Temporarily skip pyarrow_compat tests with pandas 2.0 (GH#10063) James Bourbeau
* Sun Mar 26 2023 Ben Greiner
- Add dask-pr10042-parquetstats.patch gh#dask/dask#10042
- Enable python311 build: numba is not a strict requirement
* Sat Mar 11 2023 Ben Greiner
- Update to v2023.3.1
  ## Enhancements
  * Support pyarrow strings in MultiIndex (GH#10040) Irina Truong
  * Improved support for pyarrow strings (GH#10000) Irina Truong
  * Fix flaky RuntimeWarning during array reductions (GH#10030) James Bourbeau
  * Extend complete extras (GH#10023) James Bourbeau
  * Raise an error with dataframe.convert_string=True and pandas<2.0 (GH#10033) Irina Truong
  * Rename shuffle/rechunk config option/kwarg to method (GH#10013) James Bourbeau
  * Add initial support for converting pandas extension dtypes to arrays (GH#10018) James Bourbeau
  * Remove randomgen support (GH#9987) Eray Aslan
  ## Bug Fixes
  * Skip rechunk when rechunking to the same chunks with unknown sizes (GH#10027) Hendrik Makait
  * Custom utility to convert parquet filters to pyarrow expression (GH#9885) Richard (Rick) Zamora
  * Consider numpy scalars and 0d arrays as scalars when padding (GH#9653) Justus Magin
  * Fix parquet overwrite behavior after an adaptive read_parquet
    operation (GH#10002) Richard (Rick) Zamora
  ## Maintenance
  * Remove stale hive-partitioning code from pyarrow parquet engine (GH#10039) Richard (Rick) Zamora
  * Increase minimum supported pyarrow to 7.0 (GH#10024) James Bourbeau
  * Revert “Prepare drop packunpack (GH#9994)” (GH#10037) Florian Jetter
  * Have codecov wait for more builds before reporting (GH#10031) James Bourbeau
  * Prepare drop packunpack (GH#9994) Florian Jetter
  * Add CI job with pyarrow strings turned on (GH#10017) James Bourbeau
  * Fix test_groupby_dropna_with_agg for pandas 2.0 (GH#10001) Irina Truong
  * Fix test_pickle_roundtrip for pandas 2.0 (GH#10011) James Bourbeau
* Wed Mar 08 2023 Benjamin Greiner
- Update dependencies
- Skip one more test failing because of missing pyarrow
* Wed Mar 08 2023 Dirk Müller
- update to 2023.3.0:
  * Bag must not pick p2p as shuffle default (:pr:`10005`)
  * Minor follow-up to P2P by default (:pr:`10008`) `James Bourbeau`_
  * Add minimum version to optional ``jinja2`` dependency (:pr:`9999`) `Charles Blackmon-Luca`_
  * Enable P2P shuffling by default
  * P2P rechunking
  * Efficient `dataframe.convert_string` support for `read_parquet`
  * Allow p2p shuffle kwarg for DataFrame merges
  * Change ``split_row_groups`` default to "infer"
  * Add option for converting string data to use ``pyarrow`` strings
  * Add support for multi-column ``sort_values``
  * ``Generator`` based random-number generation in ``dask.array``
  * Support ``numeric_only`` for simple groupby aggregations for ``pandas`` 2.0 compatibility
  * Fix profilers plot not being aligned to context manager enter time
  * Relax dask.dataframe assert_eq type checks
  * Restore ``describe`` compatibility for ``pandas`` 2.0
  * Improving deploying Dask docs
  * More docs for ``DataFrame.partitions``
  * Update docs with more information on default Delayed scheduler
  * Deployment Considerations documentation
  * Temporarily rerun flaky tests
  * Update parsing of FULL_RAPIDS_VER/FULL_UCX_PY_VER
  * Increase minimum supported versions to ``pandas=1.3``
    and ``numpy=1.21``
  * Fix ``std`` to work with ``numeric_only`` for ``pandas`` 2.0
  * Temporarily ``xfail`` ``test_roundtrip_partitioned_pyarrow_dataset`` (:pr:`9977`)
  * Fix copy on write failure in `test_idxmaxmin` (:pr:`9944`)
  * Bump ``pre-commit`` versions (:pr:`9955`) `crusaderky`_
  * Fix ``test_groupby_unaligned_index`` for ``pandas`` 2.0
  * Un-``xfail`` ``test_set_index_overlap_2`` for ``pandas`` 2.0
  * Fix ``test_merge_by_index_patterns`` for ``pandas`` 2.0
  * Bump jacobtomlinson/gha-find-replace from 2 to 3 (:pr:`9953`)
  * Fix ``test_rolling_agg_aggregate`` for ``pandas`` 2.0 compatibility
  * Bump ``black`` to ``23.1.0``
  * Run GPU tests on python 3.8 & 3.10 (:pr:`9940`)
  * Fix ``test_to_timestamp`` for ``pandas`` 2.0 (:pr:`9932`)
  * Fix an error with ``groupby`` ``value_counts`` for ``pandas`` 2.0 compatibility
  * Config converter: replace all dashes with underscores
* Sun Feb 26 2023 Ben Greiner
- Prepare test multiflavors for python311, but skip python311
  * Numba is not ready for python 3.11 yet gh#numba/numba#8304
* Fri Feb 17 2023 Ben Greiner
- Update to 2023.2.0
  ## Enhancements
  * Update numeric_only default in quantile for pandas 2.0 (GH#9854) Irina Truong
  * Make repartition a no-op when divisions match (GH#9924) James Bourbeau
  * Update datetime_is_numeric behavior in describe for pandas 2.0 (GH#9868) Irina Truong
  * Update value_counts to return correct name in pandas 2.0 (GH#9919) Irina Truong
  * Support new axis=None behavior in pandas 2.0 for certain reductions (GH#9867) James Bourbeau
  * Filter out all-nan RuntimeWarning at the chunk level for nanmin and nanmax (GH#9916) Julia Signell
  * Fix numeric meta_nonempty index creation for pandas 2.0 (GH#9908) James Bourbeau
  * Fix DataFrame.info() tests for pandas 2.0 (GH#9909) James Bourbeau
  ## Bug Fixes
  * Fix GroupBy.value_counts handling for multiple groupby columns (GH#9905) Charles Blackmon-Luca
* Sun Feb 05 2023 Ben Greiner
- Update to 2023.1.1
  ## Enhancements
  * Add to_backend method to Array and _Frame
    (GH#9758) Richard (Rick) Zamora
  * Small fix for timestamp index divisions in pandas 2.0 (GH#9872) Irina Truong
  * Add numeric_only to DataFrame.cov and DataFrame.corr (GH#9787) James Bourbeau
  * Fixes related to group_keys default change in pandas 2.0 (GH#9855) Irina Truong
  * infer_datetime_format compatibility for pandas 2.0 (GH#9783) James Bourbeau
  ## Bug Fixes
  * Fix serialization bug in BroadcastJoinLayer (GH#9871) Richard (Rick) Zamora
  * Satisfy broadcast argument in DataFrame.merge (GH#9852) Richard (Rick) Zamora
  * Fix pyarrow parquet columns statistics computation (GH#9772) aywandji
  ## Documentation
  * Fix “duplicate explicit target name” docs warning (GH#9863) Chiara Marmo
  * Fix code formatting issue in “Defining a new collection backend” docs (GH#9864) Chiara Marmo
  * Update dashboard documentation for memory plot (GH#9768) Jayesh Manani
  * Add docs section about no-worker tasks (GH#9839) Florian Jetter
  ## Maintenance
  * Additional updates for detecting a distributed scheduler (GH#9890) James Bourbeau
  * Update gpuCI RAPIDS_VER to 23.04 (GH#9876)
  * Reverse precedence between collection and distributed default (GH#9869) Florian Jetter
  * Update xarray-contrib/issue-from-pytest-log to version 1.2.6 (GH#9865) James Bourbeau
  * Don't require dask config shuffle default (GH#9826) Florian Jetter
  * Un-xfail datetime64 Parquet roundtripping tests for new fastparquet (GH#9811) James Bourbeau
  * Add option to manually run upstream CI build (GH#9853) James Bourbeau
  * Use custom timeout in CI builds (GH#9844) James Bourbeau
  * Remove kwargs from make_blockwise_graph (GH#9838) Florian Jetter
  * Ignore warnings on persist call in test_setitem_extended_API_2d_mask (GH#9843) Charles Blackmon-Luca
  * Fix running S3 tests locally (GH#9833) James Bourbeau
- Release 2023.1.0
  ## Enhancements
  * Use distributed default clients even if no config is set (GH#9808) Florian Jetter
  * Implement ma.where and ma.nonzero (GH#9760) Erik Holmgren
  * Update zarr store creation functions (GH#9790)
    Ryan Abernathey
  * iteritems compatibility for pandas 2.0 (GH#9785) James Bourbeau
  * Accurate sizeof for pandas string[python] dtype (GH#9781) crusaderky
  * Deflate sizeof() of duplicate references to pandas object types (GH#9776) crusaderky
  * GroupBy.__getitem__ compatibility for pandas 2.0 (GH#9779) James Bourbeau
  * append compatibility for pandas 2.0 (GH#9750) James Bourbeau
  * get_dummies compatibility for pandas 2.0 (GH#9752) James Bourbeau
  * is_monotonic compatibility for pandas 2.0 (GH#9751) James Bourbeau
  * numpy=1.24 compatibility (GH#9777) James Bourbeau
  ## Documentation
  * Remove duplicated encoding kwarg in docstring for to_json (GH#9796) Sultan Orazbayev
  * Mention SubprocessCluster in LocalCluster documentation (GH#9784) Hendrik Makait
  * Move Prometheus docs to dask/distributed (GH#9761) crusaderky
  ## Maintenance
  * Temporarily ignore RuntimeWarning in test_setitem_extended_API_2d_mask (GH#9828) James Bourbeau
  * Fix flaky test_threaded.py::test_interrupt (GH#9827) Hendrik Makait
  * Update xarray-contrib/issue-from-pytest-log in upstream report (GH#9822) James Bourbeau
  * pip install dask on gpuCI builds (GH#9816) Charles Blackmon-Luca
  * Bump actions/checkout from 3.2.0 to 3.3.0 (GH#9815)
  * Resolve sqlalchemy import failures in mindeps testing (GH#9809) Charles Blackmon-Luca
  * Ignore sqlalchemy.exc.RemovedIn20Warning (GH#9801) Thomas Grainger
  * xfail datetime64 Parquet roundtripping tests for pandas 2.0 (GH#9786) James Bourbeau
  * Remove sqlalchemy 1.3 compatibility (GH#9695) McToel
  * Reduce size of expected DoK sparse matrix (GH#9775) Elliott Sales de Andrade
  * Remove executable flag from dask/dataframe/io/orc/utils.py (GH#9774) Elliott Sales de Andrade
- Drop dask-pr9777-np1.24.patch
* Mon Jan 02 2023 Ben Greiner
- Update to 2022.12.1
  ## Enhancements
  * Support dtype_backend="pandas|pyarrow" configuration (GH#9719) James Bourbeau
  * Support cupy.ndarray to cudf.DataFrame dispatching in dask.dataframe (GH#9579) Richard (Rick) Zamora
  * Make
    filesystem-backend configurable in read_parquet (GH#9699) Richard (Rick) Zamora
  * Serialize all pyarrow extension arrays efficiently (GH#9740) James Bourbeau
  ## Bug Fixes
  * Fix bug when repartitioning with tz-aware datetime index (GH#9741) James Bourbeau
  * Partial functions in aggs may have arguments (GH#9724) Irina Truong
  * Add support for simple operation with pyarrow-backed extension dtypes (GH#9717) James Bourbeau
  * Rename columns correctly in case of SeriesGroupby (GH#9716) Lawrence Mitchell
  ## Maintenance
  * Add zarr to Python 3.11 CI environment (GH#9771) James Bourbeau
  * Add support for Python 3.11 (GH#9708) Thomas Grainger
  * Bump actions/checkout from 3.1.0 to 3.2.0 (GH#9753)
  * Avoid np.bool8 deprecation warning (GH#9737) James Bourbeau
  * Make sure dev packages aren’t overwritten in upstream CI build (GH#9731) James Bourbeau
  * Avoid adding data.h5 and mydask.html files during tests (GH#9726) Thomas Grainger
- Release 2022.12.0
  ## Enhancements
  * Remove statistics-based set_index logic from read_parquet (GH#9661) Richard (Rick) Zamora
  * Add support for use_nullable_dtypes to dd.read_parquet (GH#9617) Ian Rose
  * Fix map_overlap in order to accept pandas arguments (GH#9571) Fabien Aulaire
  * Fix pandas 1.5+ FutureWarning in .str.split(..., expand=True) (GH#9704) Jacob Hayes
  * Enable column projection for groupby slicing (GH#9667) Richard (Rick) Zamora
  * Support duplicate column cum-functions (GH#9685) Ben
  * Improve error message for failed backend dispatch call (GH#9677) Richard (Rick) Zamora
  ## Bug Fixes
  * Revise meta creation in arrow parquet engine (GH#9672) Richard (Rick) Zamora
  * Fix da.fft.fft for array-like inputs (GH#9688) James Bourbeau
  * Fix groupby-aggregation when grouping on an index by name (GH#9646) Richard (Rick) Zamora
  ## Maintenance
  * Avoid PytestReturnNotNoneWarning in test_inheriting_class (GH#9707) Thomas Grainger
  * Fix flaky test_dataframe_aggregations_multilevel (GH#9701) Richard (Rick) Zamora
  * Bump mypy version (GH#9697)
    crusaderky
  * Disable dashboard in test_map_partitions_df_input (GH#9687) James Bourbeau
  * Use latest xarray-contrib/issue-from-pytest-log in upstream build (GH#9682) James Bourbeau
  * xfail ttest_1samp for upstream scipy (GH#9670) James Bourbeau
  * Update gpuCI RAPIDS_VER to 23.02 (GH#9678)
- Add dask-pr9777-np1.24.patch gh#dask/dask#9777
- Move to PEP517 build
* Mon Nov 21 2022 Ben Greiner
- Go back to bokeh 2.4
  * gh#dask/dask#9659
  * we provide a legacy bokeh2 instead
* Sun Nov 20 2022 Ben Greiner
- Update to version 2022.11.1
  ## Enhancements
  * Restrict bokeh=3 support (GH#9673) Gabe Joseph (ignored in rpm, fixed by bokeh 3.0.2, see gh#dask/dask#9659)
  * Updates for fastparquet evolution (GH#9650) Martin Durant
  ## Maintenance
  * Revert importlib.metadata workaround (GH#9658) James Bourbeau
- Release 2022.11.0
  ## Enhancements
  * Generalize from_dict implementation to allow usage from other backends (GH#9628) GALI PREM SAGAR
  ## Bug Fixes
  * Avoid pandas constructors in dask.dataframe.core (GH#9570) Richard (Rick) Zamora
  * Fix sort_values with Timestamp data (GH#9642) James Bourbeau
  * Generalize array checking and remove pd.Index call in _get_partitions (GH#9634) Benjamin Zaitlen
  * Fix read_csv behavior for header=0 and names (GH#9614) Richard (Rick) Zamora
  ## Maintenance
  * Allow bokeh=3 (GH#9659) James Bourbeau
  * Add pre-commit to catch breakpoint() (GH#9638) James Bourbeau
  * Bump xarray-contrib/issue-from-pytest-log from 1.1 to 1.2 (GH#9635)
  * Remove blosc references (GH#9625) Naty Clementi
  * Harden test_repartition_npartitions (GH#9585) Richard (Rick) Zamora
- Release 2022.10.2
  * This was a hotfix and has no changes in this repository.
The necessary fix was in dask/distributed, but we decided to bump this version number for consistency.- Release 2022.10.1 [#]# Enhancements * Enable named aggregation syntax (GH#9563) ChrisJar * Add extension dtype support to set_index (GH#9566) James Bourbeau * Redesigning the array HTML repr for clarity (GH#9519) Shingo OKAWA [#]# Bug Fixes * Fix merge with emtpy left DataFrame (GH#9578) Ian Rose [#]# Maintenance * Require Click 7.0+ in Dask (GH#9595) John A Kirkham * Temporarily restrict bokeh<3 (GH#9607) James Bourbeau * Resolve importlib-related failures in upstream CI (GH#9604) Charles Blackmon-Luca * Remove setuptools host dep, add CLI entrypoint (GH#9600) Charles Blackmon-Luca * More Backend dispatch class type annotations (GH#9573) Ian Rose- Create a -test subpackage in order to avoid rpmlint errors- Drop extra conftest: included in sdist. * Fri Oct 21 2022 Ben Greiner - Update to version 2022.10.0 * Backend library dispatching for IO in Dask-Array and Dask-DataFrame (GH#9475) Richard (Rick) Zamora * Add new CLI that is extensible (GH#9283) Doug Davis * Groupby median (GH#9516) Ian Rose * Fix array copy not being a no-op (GH#9555) David Hoese * Add support for string timedelta in map_overlap (GH#9559) Nicolas Grandemange * Shuffle-based groupby for single functions (GH#9504) Ian Rose * Make datetime.datetime tokenize idempotantly (GH#9532) Martin Durant * Support tokenizing datetime.time (GH#9528) Tim Paine * Avoid race condition in lazy dispatch registration (GH#9545) James Bourbeau * Do not allow setitem to np.nan for int dtype (GH#9531) Doug Davis * Stable demo column projection (GH#9538) Ian Rose * Ensure pickle-able binops in delayed (GH#9540) Ian Rose * Fix project CSV columns when selecting (GH#9534) Martin Durant * Update Parquet best practice (GH#9537) Matthew Rocklin- move -all metapackage to -complete, mirroring upstream\'s [complete] extra. 

* Fri Sep 30 2022 Arun Persaud
- update to version 2022.9.2:
  * Enhancements
    + Remove factorization logic from array auto chunking (:pr:`9507`) `James Bourbeau`_
  * Documentation
    + Add docs on running Dask in a standalone Python script (:pr:`9513`) `James Bourbeau`_
    + Clarify custom-graph multiprocessing example (:pr:`9511`) `nouman`_
  * Maintenance
    + Groupby sort upstream compatibility (:pr:`9486`) `Ian Rose`_

* Fri Sep 16 2022 Arun Persaud
- update to version 2022.9.1:
  * New Features
    + Add "DataFrame" and "Series" "median" methods (:pr:`9483`) `James Bourbeau`_
  * Enhancements
    + Shuffle "groupby" default (:pr:`9453`) `Ian Rose`_
    + Filter by list (:pr:`9419`) `Greg Hayes`_
    + Added "distributed.utils.key_split" functionality to "dask.utils.key_split" (:pr:`9464`) `Luke Conibear`_
  * Bug Fixes
    + Fix overlap so that "set_index" doesn't drop rows (:pr:`9423`) `Julia Signell`_
    + Fix assigning pandas "Series" to column when "ddf.columns.min()" raises (:pr:`9485`) `Erik Welch`_
    + Fix metadata comparison "stack_partitions" (:pr:`9481`) `James Bourbeau`_
    + Provide default for "split_out" (:pr:`9493`) `Lawrence Mitchell`_
  * Deprecations
    + Allow "split_out" to be "None", which then defaults to "1" in "groupby().aggregate()" (:pr:`9491`) `Ian Rose`_
  * Documentation
    + Fixing "enforce_metadata" documentation, not checking for dtypes (:pr:`9474`) `Nicolas Grandemange`_
    + Fix "it's" --> "its" typo (:pr:`9484`) `Nat Tabris`_
  * Maintenance
    + Workaround for parquet writing failure using some datetime series but not others (:pr:`9500`) `Ian Rose`_
    + Filter out "numeric_only" warnings from "pandas" (:pr:`9496`) `James Bourbeau`_
    + Avoid "set_index(..., inplace=True)" where not necessary (:pr:`9472`) `James Bourbeau`_
    + Avoid passing groupby key list of length one (:pr:`9495`) `James Bourbeau`_
    + Update "test_groupby_dropna_cudf" based on "cudf" support for "group_keys" (:pr:`9482`) `James Bourbeau`_
    + Remove "dd.from_bcolz" (:pr:`9479`) `James Bourbeau`_
    + Added "flake8-bugbear" to "pre-commit" hooks (:pr:`9457`) `Luke Conibear`_
    + Bind loop variables in function definitions ("B023") (:pr:`9461`) `Luke Conibear`_
    + Added assert for comparisons ("B015") (:pr:`9459`) `Luke Conibear`_
    + Set top-level default shell in CI workflows (:pr:`9469`) `James Bourbeau`_
    + Removed unused loop control variables ("B007") (:pr:`9458`) `Luke Conibear`_
    + Replaced "getattr" calls for constant attributes ("B009") (:pr:`9460`) `Luke Conibear`_
    + Pin "libprotobuf" to allow nightly "pyarrow" in the upstream CI build (:pr:`9465`) `Joris Van den Bossche`_
    + Replaced mutable data structures for default arguments ("B006") (:pr:`9462`) `Luke Conibear`_
    + Changed "flake8" mirror and updated version (:pr:`9456`) `Luke Conibear`_

* Sat Sep 10 2022 Arun Persaud
- update to version 2022.9.0:
  * Enhancements
    + Enable automatic column projection for "groupby" aggregations (:pr:`9442`) `Richard (Rick) Zamora`_
    + Accept superclasses in NEP-13/17 dispatching (:pr:`6710`) `Gabe Joseph`_
  * Bug Fixes
    + Rename "by" columns internally for cumulative operations on the same "by" columns (:pr:`9430`) `Pavithra Eswaramoorthy`_
    + Fix "get_group" with categoricals (:pr:`9436`) `Pavithra Eswaramoorthy`_
    + Fix caching-related "MaterializedLayer.cull" performance regression (:pr:`9413`) `Richard (Rick) Zamora`_
  * Documentation
    + Add maintainer documentation page (:pr:`9309`) `James Bourbeau`_
  * Maintenance
    + Revert skipped fastparquet test (:pr:`9439`) `Pavithra Eswaramoorthy`_
    + "tmpfile" does not end files with period on empty extension (:pr:`9429`) `Hendrik Makait`_
    + Skip failing fastparquet test with latest release (:pr:`9432`) `James Bourbeau`_

* Thu Sep 01 2022 Steve Kowalik
- Update to 2022.8.1:
  * Implement ma.*_like functions (:pr:`9378`) `Ruth Comer`_
  * Fuse compatible annotations (:pr:`9402`) `Ian Rose`_
  * Shuffle-based groupby aggregation for high-cardinality groups (:pr:`9302`) `Richard (Rick) Zamora`_
  * Unpack namedtuple (:pr:`9361`) `Hendrik Makait`_
  * Fix SeriesGroupBy cumulative functions with axis=1 (:pr:`9377`) `Pavithra Eswaramoorthy`_
  * Sparse array reductions (:pr:`9342`) `Ian Rose`_
  * Fix make_meta while using categorical column with index (:pr:`9348`) `Pavithra Eswaramoorthy`_
  * Don't allow incompatible keywords in DataFrame.dropna (:pr:`9366`) `Naty Clementi`_
  * Make set_index handle entirely empty dataframes (:pr:`8896`) `Julia Signell`_
  * Improve dataclass handling in unpack_collections (:pr:`9345`) `Hendrik Makait`_
  * Fix bag sampling when there are some smaller partitions (:pr:`9349`) `Ian Rose`_
  * Add support for empty partitions to da.min/da.max functions (:pr:`9268`) `geraninam`_
  * Use entry_points utility in sizeof (:pr:`9390`) `James Bourbeau`_
  * Add entry_points compatibility utility (:pr:`9388`) `Jacob Tomlinson`_
  * Upload environment file artifact for each CI build (:pr:`9372`) `James Bourbeau`_
  * Remove werkzeug pin in CI (:pr:`9371`) `James Bourbeau`_
  * Fix type annotations for dd.from_pandas and dd.from_delayed (:pr:`9362`) `Jordan Yap`_
  * Ensure make_meta doesn't hold ref to data (:pr:`9354`) `Jim Crist-Harif`_
  * Revise divisions logic in from_pandas (:pr:`9221`) `Richard (Rick) Zamora`_
  * Warn if user sets index with existing index (:pr:`9341`) `Julia Signell`_
  * Add keepdims keyword for da.average (:pr:`9332`) `Ruth Comer`_
  * Change repr methods to avoid Layer materialization (:pr:`9289`) `Richard (Rick) Zamora`_
  * Make sure order kwarg will not crash the astype method (:pr:`9317`) `Genevieve Buckley`_
  * Fix bug for cumsum on cupy chunked dask arrays (:pr:`9320`) `Genevieve Buckley`_
  * Match input and output structure in _sample_reduce (:pr:`9272`) `Pavithra Eswaramoorthy`_
  * Include meta in array serialization (:pr:`9240`) `Frédéric BRIOL`_
  * Fix Index.memory_usage (:pr:`9290`) `James Bourbeau`_
  * Fix division calculation in dask.dataframe.io.from_dask_array (:pr:`9282`) `Jordan Yap`_
  * Switch js-yaml for yaml.js in config converter (:pr:`9306`) `Jacob Tomlinson`_
  * Update da.linalg.solve for SciPy 1.9.0 compatibility (:pr:`9350`) `Pavithra Eswaramoorthy`_
  * Update test_getitem_avoids_large_chunks_missing (:pr:`9347`) `Pavithra Eswaramoorthy`_
  * Import loop_in_thread fixture in tests (:pr:`9337`) `James Bourbeau`_
  * Temporarily xfail test_solve_sym_pos (:pr:`9336`) `Pavithra Eswaramoorthy`_
  * Update gpuCI RAPIDS_VER to 22.10 (:pr:`9314`)
  * Return Dask array if all axes are squeezed (:pr:`9250`) `Pavithra Eswaramoorthy`_
  * Make cycle reported by toposort shorter (:pr:`9068`) `Erik Welch`_
  * Unknown chunk slicing - raise informative error (:pr:`9285`) `Naty Clementi`_
  * Fix bug in HighLevelGraph.cull (:pr:`9267`) `Richard (Rick) Zamora`_
  * Sort categories (:pr:`9264`) `Pavithra Eswaramoorthy`_
  * Use max (instead of sum) for calculating warnsize (:pr:`9235`) `Pavithra Eswaramoorthy`_
  * Fix bug when filtering on partitioned column with pyarrow (:pr:`9252`) `Richard (Rick) Zamora`_
  * Add type annotations to dd.from_pandas and dd.from_delayed (:pr:`9237`) `Michael Milton`_
  * Update test_plot_multiple for upcoming bokeh release (:pr:`9261`) `James Bourbeau`_
  * Add typing to common array properties (:pr:`9255`) `Illviljan`_

* Mon Jul 11 2022 Arun Persaud
- update to version 2022.7.0:
  * Enhancements
    + Support "pathlib.PurePath" in "normalize_token" (:pr:`9229`) `Angus Hollands`_
    + Add "AttributeNotImplementedError" for properties so IPython glob search works (:pr:`9231`) `Erik Welch`_
    + "map_overlap": multiple dataframe handling (:pr:`9145`) `Fabien Aulaire`_
    + Read entrypoints in "dask.sizeof" (:pr:`7688`) `Angus Hollands`_
  * Bug Fixes
    + Fix "TypeError: 'Serialize' object is not subscriptable" when writing parquet dataset with "Client(processes=False)" (:pr:`9015`) `Lucas Miguel Ponce`_
    + Correct dtypes when "concat" with an empty dataframe (:pr:`9193`) `Pavithra Eswaramoorthy`_
  * Documentation
    + Highlight note about persist (:pr:`9234`) `Pavithra Eswaramoorthy`_
    + Update release-procedure to include more detail and helpful commands (:pr:`9215`) `Julia Signell`_
    + Better SEO for Futures and Dask vs. Spark pages (:pr:`9217`) `Sarah Charlotte Johnson`_
  * Maintenance
    + Use "math.prod" instead of "np.prod" on lists, tuples, and iters (:pr:`9232`) `crusaderky`_
    + Only import IPython if type checking (:pr:`9230`) `Florian Jetter`_
    + Tougher mypy checks (:pr:`9206`) `crusaderky`_

* Fri Jun 24 2022 Ben Greiner
- Update to 2022.6.1
  * Enhancements
    - Dask in pyodide (GH#9053) Ian Rose
    - Create dask.utils.show_versions (GH#9144) Sultan Orazbayev
    - Better error message for unsupported numpy operations on dask.dataframe objects. (GH#9201) Julia Signell
    - Add allow_rechunk kwarg to dask.array.overlap function (GH#7776) Genevieve Buckley
    - Add minutes and hours to dask.utils.format_time (GH#9116) Matthew Rocklin
    - More retries when writing parquet to remote filesystem (GH#9175) Ian Rose
  * Bug Fixes
    - Timedelta deterministic hashing (GH#9213) Fabien Aulaire
    - Enum deterministic hashing (GH#9212) Fabien Aulaire
    - shuffle_group(): avoid converting to arrays (GH#9157) Mads R. B. Kristensen
  * Deprecations
    - Deprecate extra format_time utility (GH#9184) James Bourbeau
- Release 2022.6.0
  * Enhancements
    - Add feature to show names of layer dependencies in HLG JupyterLab repr (GH#9081) Angelos Omirolis
    - Add arrow schema extraction dispatch (GH#9169) GALI PREM SAGAR
    - Add sort_results argument to assert_eq (GH#9130) Pavithra Eswaramoorthy
    - Add weeks to parse_timedelta (GH#9168) Matthew Rocklin
    - Warn that cloudpickle is not always deterministic (GH#9148) Pavithra Eswaramoorthy
    - Switch parquet default engine (GH#9140) Jim Crist-Harif
    - Use deterministic hashing with _iLocIndexer / _LocIndexer (GH#9108) Fabien Aulaire
    - Enforce consistent schema in to_parquet pyarrow (GH#9131) Jim Crist-Harif
  * Bug Fixes
    - Fix pyarrow.StringArray pickle (GH#9170) Jim Crist-Harif
    - Fix parallel metadata collection in pyarrow engine (GH#9165) Richard (Rick) Zamora
    - Improve pyarrow partitioning logic (GH#9147) James Bourbeau
    - pyarrow 8.0 partitioning fix (GH#9143) James Bourbeau
- Release 2022.05.2
  * Enhancements
    - Add a dispatch for non-pandas Grouper objects and use it in GroupBy (GH#9074) brandon-b-miller
    - Error if read_parquet & to_parquet files intersect (GH#9124) Jim Crist-Harif
    - Visualize task graphs using ipycytoscape (GH#9091) Ian Rose
- Release 2022.05.1
  * New Features
    - Add DataFrame.from_dict classmethod (GH#9017) Matthew Powers
    - Add from_map function to Dask DataFrame (GH#8911) Richard (Rick) Zamora
  * Enhancements
    - Improve to_parquet error for appended divisions overlap (GH#9102) Jim Crist-Harif
    - Enabled user-defined process-initializer functions (GH#9087) ParticularMiner
    - Mention align_dataframes=False option in map_partitions error (GH#9075) Gabe Joseph
    - Add kwarg enforce_ndim to dask.array.map_blocks() (GH#8865) ParticularMiner
    - Implement Series.GroupBy.fillna / DataFrame.GroupBy.fillna methods (GH#8869) Pavithra Eswaramoorthy
    - Allow fillna with Dask DataFrame (GH#8950) Pavithra Eswaramoorthy
    - Update error message for assignment with 1-d dask array (GH#9036) Pavithra Eswaramoorthy
    - Collection Protocol (GH#8674) Doug Davis
    - Patch around pandas ArrowStringArray pickling (GH#9024) Jim Crist-Harif
    - Band-aid for compute_as_if_collection (GH#8998) Ian Rose
    - Add p2p shuffle option (GH#8836) Matthew Rocklin
  * Bug Fixes
    - Fixup column projection with no columns (GH#9106) Jim Crist-Harif
    - Blockwise cull NumPy dtype (GH#9100) Ian Rose
    - Fix column-projection bug in from_map (GH#9078) Richard (Rick) Zamora
    - Prevent nulls in index for non-numeric dtypes (GH#8963) Jorge López
    - Fix is_monotonic methods for more than 8 partitions (GH#9019) Julia Signell
    - Handle enumerate and generator inputs to from_map (GH#9066) Richard (Rick) Zamora
    - Revert is_dask_collection; back to previous implementation (GH#9062) Doug Davis
    - Fix Blockwise.clone does not handle iterable literal arguments correctly (GH#8979) JSKenyon
    - Array setitem hardmask (GH#9027) David Hassell
    - Fix overlapping divisions error on append (GH#8997) Ian Rose
  * Deprecations
    - Add pre-deprecation warnings for read_parquet kwargs chunksize and aggregate_files (GH#9052) Richard (Rick) Zamora
- Release 2022.05.0
  * This is a bugfix release with doc changes only
- Release 2022.04.2
  * This release includes several deprecations/breaking API changes to dask.dataframe.read_parquet and dask.dataframe.to_parquet:
    - to_parquet no longer writes _metadata files by default. If you want to write a _metadata file, you can pass in write_metadata_file=True.
    - read_parquet now defaults to split_row_groups=False, which results in one Dask dataframe partition per parquet file when reading in a parquet dataset. If you’re working with large parquet files you may need to set split_row_groups=True to reduce your partition size.
    - read_parquet no longer calculates divisions by default. If you require read_parquet to return dataframes with known divisions, please set calculate_divisions=True.
    - read_parquet has deprecated the gather_statistics keyword argument. Please use the calculate_divisions keyword argument instead.
    - read_parquet has deprecated the require_extension keyword argument. Please use the parquet_file_extension keyword argument instead.
  * New Features
    - Add removeprefix and removesuffix as StringMethods (GH#8912) Jorge López
  * Enhancements
    - Call fs.invalidate_cache in to_parquet (GH#8994) Jim Crist-Harif
    - Change to_parquet default to write_metadata_file=None (GH#8988) Jim Crist-Harif
    - Let arg reductions pass keepdims (GH#8926) Julia Signell
    - Change split_row_groups default to False in read_parquet (GH#8981) Richard (Rick) Zamora
    - Improve NotImplementedError message for da.reshape (GH#8987) Jim Crist-Harif
    - Simplify to_parquet compute path (GH#8982) Jim Crist-Harif
    - Raise an error if you try to use vindex with a Dask object (GH#8945) Julia Signell
    - Avoid pre_buffer=True when a precache method is specified (GH#8957) Richard (Rick) Zamora
    - from_dask_array uses blockwise instead of merging graphs (GH#8889) Bryan Weber
    - Use pre_buffer=True for “pyarrow” Parquet engine (GH#8952) Richard (Rick) Zamora
  * Bug Fixes
    - Handle dtype=None correctly in da.full (GH#8954) Tom White
    - Fix dask-sql bug caused by blockwise fusion (GH#8989) Richard (Rick) Zamora
    - to_parquet errors for non-string column names (GH#8990) Jim Crist-Harif
    - Make sure da.roll works even if shape is 0 (GH#8925) Julia Signell
    - Fix recursion error issue with set_index (GH#8967) Paul Hobson
    - Stringify BlockwiseDepDict mapping values when produces_keys=True (GH#8972) Richard (Rick) Zamora
    - Use DataFrameIOLayer in DataFrame.from_delayed (GH#8852) Richard (Rick) Zamora
    - Check that values for the in predicate in read_parquet are correct (GH#8846) Bryan Weber
    - Fix bug for reduction of zero dimensional arrays (GH#8930) Tom White
    - Specify dtype when deciding division using np.linspace in read_sql_query (GH#8940) Cheun Hong
  * Deprecations
    - Deprecate gather_statistics from read_parquet (GH#8992) Richard (Rick) Zamora
    - Change require_extension to top-level parquet_file_extension read_parquet kwarg (GH#8935) Richard (Rick) Zamora
- Release 2022.04.1
  * New Features
    - Add missing NumPy ufuncs: abs, left_shift, right_shift, positive. (GH#8920) Tom White
  * Enhancements
    - Avoid collecting parquet metadata in pyarrow when write_metadata_file=False (GH#8906) Richard (Rick) Zamora
    - Better error for failed wildcard path in dd.read_csv() (fixes [#8878]) (GH#8908) Roger Filmyer
    - Return da.Array rather than dd.Series for non-ufunc elementwise functions on dd.Series (GH#8558) Julia Signell
    - Let get_dummies use meta computation in map_partitions (GH#8898) Julia Signell
    - Masked scalars input to da.from_array (GH#8895) David Hassell
    - Raise ValueError in merge_asof for duplicate kwargs (GH#8861) Bryan Weber
  * Bug Fixes
    - Make is_monotonic work when some partitions are empty (GH#8897) Julia Signell
    - Fix custom getter in da.from_array when inline_array=False (GH#8903) Ian Rose
    - Correctly handle dict-specification for rechunk. (GH#8859) Richard
    - Fix merge_asof: drop index column if left_on == right_on (GH#8874) Gil Forsyth
  * Deprecations
    - Warn users that engine='auto' will change in future (GH#8907) Jim Crist-Harif
    - Remove pyarrow-legacy engine from parquet API (GH#8835) Richard (Rick) Zamora
- Release 2022.04.0
  * This is the first release with support for Python 3.10
  * New Features
    - Add Python 3.10 support (GH#8566) James Bourbeau
  * Enhancements
    - Add check on dtype.itemsize in order to produce a useful error (GH#8860) Davide Gavio
    - Add mild typing to common utils functions (GH#8848) Matthew Rocklin
    - Add sanity checks to divisions setter (GH#8806) Jim Crist-Harif
    - Use Blockwise and map_partitions for more tasks (GH#8831) Bryan Weber
  * Bug Fixes
    - Fix dataframe.merge_asof to preserve right_on column (GH#8857) Sarah Charlotte Johnson
    - Fix “Buffer dtype mismatch” for pandas >= 1.3 on 32bit (GH#8851) Ben Greiner
    - Fix slicing fusion by altering SubgraphCallable getter (GH#8827) Ian Rose
  * Deprecations
    - Remove support for PyPy (GH#8863) James Bourbeau
    - Drop setuptools at runtime (GH#8855) crusaderky
    - Remove dataframe.tseries.resample.getnanos (GH#8834) Sarah Charlotte Johnson
- Drop dask-fix8169-pandas13.patch and dask-py310-test.patch

* Sun Mar 27 2022 Ben Greiner
- dask.dataframe requires dask.bag (revealed by swifter test suite)

* Fri Mar 25 2022 Ben Greiner
- Update to 2022.3.0
  * Bag: add implementation for reservoir sampling
  * Add ma.count to Dask array
  * Change to_parquet default to compression="snappy"
  * Add weights parameter to dask.array.reduction
  * Add ddf.compute_current_divisions to get divisions on a sorted index or column
  * Pass __name__ and __doc__ through on DelayedLeaf
  * Raise exception for not implemented merge how option
  * Move Bag.map_partitions to Blockwise
  * Improve error messages for malformed config files
  * Revise column-projection optimization to capture common dask-sql patterns
  * Useful error for empty divisions
  * Scipy 1.8.0 compat: copy private classes into dask/array/stats.py
- Release 2022.2.1
  * Add aggregate functions first and last to dask.dataframe.pivot_table
  * Add std() support for datetime64 dtype for pandas-like objects
  * Add materialized task counts to HighLevelGraph and Layer html reprs
  * Do not allow iterating a DataFrameGroupBy
  * Fix missing newline after info() call on empty DataFrame
  * Add groupby.compute as a not implemented method
  * Improve multi dataframe join performance
  * Include bool type for Index
  * Allow ArrowDatasetEngine subclass to override pandas->arrow conversion also for partitioned write
  * Increase performance of k-diagonal extraction in da.diag() and da.diagonal()
  * Change linspace creation to match numpy when num equal to 0
  * Tokenize dataclasses
  * Update tokenize to treat dict and kwargs differently
- Release 2022.2.0
  * Add region to to_zarr when using existing array
  * Add engine_kwargs support to dask.dataframe.to_sql
  * Add include_path_column arg to read_json
  * Add expand_dims to Dask array
  * Add scheduler option to assert_eq utilities
  * Fix eye inconsistency with NumPy for dtype=None
  * Fix concatenate inconsistency with NumPy for axis=None
  * Type annotations, part 1
  * Really allow any iterable to be passed as a meta
  * Use map_partitions (Blockwise) in to_parquet
- Update dask-fix8169-pandas13.patch
- Add dask-py310-test.patch -- gh#dask/dask#8566
- Make the distributed/dask update sync requirement even more obvious.
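
The packaging note above asks that dask and distributed always be updated together. As a rough illustration only, the check below compares two release strings and, if both packages happen to be installed, the installed versions. The helper names (`normalize`, `versions_in_sync`, `installed_versions_in_sync`) are hypothetical and not part of dask or of this package; the sketch assumes plain numeric release numbers such as 2022.05.2 and falls back to string equality for anything else.

```python
from __future__ import annotations

from importlib import metadata


def normalize(version: str) -> tuple[int, ...]:
    """Turn '2022.05.2' into (2022, 5, 2) so zero padding does not matter."""
    return tuple(int(part) for part in version.split("."))


def versions_in_sync(dask_version: str, distributed_version: str) -> bool:
    """Return True when both release numbers denote the same release."""
    try:
        return normalize(dask_version) == normalize(distributed_version)
    except ValueError:
        # Non-numeric parts (dev or local versions): compare the raw strings.
        return dask_version == distributed_version


def installed_versions_in_sync() -> bool | None:
    """Compare installed dask and distributed; None if either is missing."""
    try:
        return versions_in_sync(
            metadata.version("dask"), metadata.version("distributed")
        )
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(versions_in_sync("2022.05.2", "2022.5.2"))
    print(installed_versions_in_sync())
```

A packager could run something like this in a test phase to catch a dask update that was not accompanied by the matching distributed update.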

* Sat Jan 29 2022 Ben Greiner
- Update to 2022.1.1
  * Add dask.dataframe.series.view()
  * Update tz for fastparquet + pandas 1.4.0
  * Cleaning up misc tests for pandas compat
  * Moving to SQLAlchemy >= 1.4
  * Pandas compat: Filter sparse warnings
  * Fail if meta is not a pandas object
  * Use fsspec.parquet module for better remote-storage read_parquet performance
  * Move DataFrame ACA aggregations to HLG
  * Add optional information about originating function call in DataFrameIOLayer
  * Blockwise array creation redux
  * Refactor config default search path retrieval
  * Add optimize_graph flag to Bag.to_dataframe function
  * Make sure that delayed output operations still return lists of paths
  * Pandas compat: Fix to_frame name to not pass None
  * Pandas compat: Fix axis=None warning
  * Expand Dask YAML config search directories
  * Fix groupby.cumsum with series grouped by index
  * Fix derived_from for pandas methods
  * Enforce boolean ascending for sort_values
  * Fix parsing of __setitem__ indices
  * Avoid divide by zero in slicing
  * Downgrade meta error in
  * Pandas compat: Deprecate append when pandas >= 1.4.0
  * Replace outdated columns argument with meta in DataFrame constructor
  * Refactor deploying docs
  * Pin coverage in CI
  * Move cached_cumsum imports to be from dask.utils
  * Update gpuCI RAPIDS_VER to 22.04
  * Update docstring for from_delayed function
  * Handle plot_width / plot_height deprecations
  * Remove unnecessary pyyaml importorskip
  * Specify scheduler in DataFrame assert_eq

* Tue Jan 25 2022 Ben Greiner
- Revert python310 enablement -- gh#dask/distributed#5460

* Tue Jan 25 2022 Dirk Müller
- reenable python 3.10 build as distributed is also reenabled

* Thu Jan 20 2022 Ben Greiner
- Update to 2022.1.0
  * Add groupby.shift method (GH#8522) kori73
  * Add DataFrame.nunique (GH#8479) Sarah Charlotte Johnson
  * Add da.ndim to match np.ndim (GH#8502) Julia Signell
  * Replace interpolation with method and method with internal_method (GH#8525) Julia Signell
  * Remove daily stock demo utility (GH#8477) James Bourbeau
  * Add Series and Index is_monotonic* methods (GH#8304) Daniel Mesejo-León
  * Deprecate token keyword argument to map_blocks (GH#8464) James Bourbeau
  * Deprecation warning for default value of boundary kwarg in map_overlap (GH#8397) Genevieve Buckley
- Skip python310: Not supported by distributed yet -- gh#dask/distributed#5350

* Wed Sep 22 2021 Ben Greiner
- Update to 2021.09.1
  * Fix groupby for future pandas
  * Remove warning filters in tests that are no longer needed
  * Add link to diagnostic visualize function in local diagnostic docs
  * Add datetime_is_numeric to dataframe.describe
  * Remove references to pd.Int64Index in anticipation of deprecation
  * Use loc if needed for series __get_item__
  * Specifically ignore warnings on mean for empty slices
  * Skip groupby nunique test for pandas >= 1.3.3
  * Implement ascending arg for sort_values
  * Replace operator.getitem
  * Deprecate zero_broadcast_dimensions and homogeneous_deepmap
  * Add error if drop_index is negative
  * Allow scheduler to be an Executor
  * Handle asarray/asanyarray cases where like is a dask.Array
  * Fix index_col duplication if index_col is type str
  * Add dtype and order to asarray and asanyarray definitions
  * Deprecate dask.dataframe.Series.__contains__
  * Fix edge case with like-arrays in _wrapped_qr
  * Deprecate boundary_slice kwarg: kind for pandas compat
- Release 2021.09.0
  * Fewer open files
  * Add FileNotFound to expected http errors
  * Add DataFrame.sort_values to API docs
  * Change to dask.order: be more eager at times
  * Add pytest color to CI
  * FIX: make_people works with processes scheduler
  * Adds deep param to Dataframe copy method and restrict it to False
  * Fix typo in configuration docs
  * Update formatting in DataFrame.query docstring
  * Un-xfail sparse tests for 0.13.0 release
  * Add axes property to DataFrame and Series
  * Add CuPy support in da.unique (values only)
  * Unit tests for sparse.zeros_like (xfailed)
  * Add explicit like kwarg support to array creation functions
  * Separate Array and DataFrame mindeps builds
  * Fork out percentile_dispatch to dask.array
  * Ensure filepath exists in to_parquet
  * Update scheduler plugin usage in test_scheduler_highlevel_graph_unpack_import
  * Add DataFrame.shuffle to API docs
  * Order requirements alphabetically
- Release 2021.08.1
  * Add ignore_metadata_file option to read_parquet (pyarrow-dataset and fastparquet support only)
  * Add reference to pytest-xdist in dev docs
  * Include tz in meta from to_datetime
  * CI Infra Docs
  * Include invalid DataFrame key in assert_eq check
  * Use __class__ when creating DataFrames
  * Use development version of distributed in gpuCI build
  * Ignore whitespace when gufunc signature
  * Move pandas import and percentile dispatch refactor
  * Add colors to represent high level layer types
  * Upstream instance fix
  * Add dask.widgets and migrate HTML reprs to jinja2
  * Remove wrap_func_like_safe, not required with NumPy >= 1.17
  * Fix threaded scheduler memory backpressure regression
  * Add percentile dispatch
  * Use a publicly documented attribute obj in groupby rather than private _selected_obj
  * Specify module to import rechunk from
  * Use dict to store data for {nan,}arg{min,max} in certain cases
  * Fix blocksize description formatting in read_pandas
  * Fix "point" -> "pointers" typo in docs
- Release 2021.08.0
  * Fix to_orc delayed compute behavior
  * Don't convert to low-level task graph in compute_as_if_collection
  * Fix multifile read for hdf
  * Resolve warning in distributed tests
  * Update to_orc collection name
  * Resolve skipfooter problem
  * Raise NotImplementedError for non-indexable arg passed to to_datetime
  * Ensure we error on warnings from distributed
  * Added dict format in to_bag accessories of DataFrame
  * Delayed docs indirect dependencies
  * Add tooltips to graphviz high-level graphs
  * Close 2021 User Survey
  * Reorganize CuPy tests into multiple files
  * Refactor and Expand Dask-Dataframe ORC API
  * Don't enforce columns if enforce=False
  * Fix map_overlap trimming behavior when drop_axis is not None
  * Mark gpuCI CuPy test as flaky
  * Avoid using Delayed in to_csv and to_parquet
  * Removed redundant check_dtypes
  * Use pytest.warns instead of raises for checking parquet engine deprecation
  * Bump RAPIDS_VER in gpuCI to 21.10
  * Add back pyarrow-legacy test coverage for pyarrow>=5
  * Allow pyarrow>=5 in to_parquet and read_parquet
  * Skip CuPy tests requiring NEP-35 when NumPy < 1.20 is available
  * Add tail and head to SeriesGroupby
  * Update Zoom link for monthly meeting
  * Add gpuCI build script
  * Deprecate daily_stock utility
  * Add distributed.nanny to configuration reference docs
  * Require NumPy 1.18+ & Pandas 1.0+
- Add dask-fix8169-pandas13.patch -- gh#dask/dask#8169

* Sun Aug 08 2021 Ben Greiner
- Update to 2021.7.2
  * This is the last release with support for NumPy 1.17 and pandas 0.25. Beginning with the next release, NumPy 1.18 and pandas 1.0 will be the minimum supported versions.
  * Add dask.array SVG to the HTML Repr
  * Avoid use of Delayed in to_parquet
  * Temporarily pin pyarrow<5 in CI
  * Add deprecation warning for top-level ucx and rmm config values
  * Remove skips from doctests (4 of 6)
  * Remove skips from doctests (5 of 6)
  * Adds missing prepend/append functionality to da.diff
  * Change graphviz font family to sans
  * Fix read-csv name - when path is different, use different name for task
  * Update configuration reference for ucx and rmm changes
  * Add meta support to __setitem__
  * NEP-35 support for slice_with_int_dask_array
  * Unpin fastparquet in CI
  * Remove skips from doctests (3 of 6)
- Release 2021.7.1
  * Make array assert_eq check dtype
  * Remove skips from doctests (6 of 6)
  * Remove experimental feature warning from actors docs
  * Remove skips from doctests (2 of 6)
  * Separate out Array and Bag API
  * Implement lazy Array.__iter__
  * Clean up places where we inadvertently iterate over arrays
  * Add numeric_only kwarg to DataFrame reductions
  * Add pytest marker for GPU tests
  * Add support for histogram2d in dask.array
  * Remove skips from doctests (1 of 6)
  * Add node size scaling to the Graphviz output for the high level graphs
  * Update old Bokeh links
  * Temporarily pin fastparquet in CI
  * Add dask.array import to progress bar docs
  * Use separate files for each DataFrame API function and method
  * Fix pyarrow-dataset ordering bug
  * Generalize unique aggregate
  * Raise NotImplementedError when using pd.Grouper
  * Add aggregate_files argument to enable multi-file partitions in read_parquet
  * Un-xfail test_daily_stock
  * Update access configuration docs
  * Use packaging for version comparisons
  * Handle infinite loops in merge_asof

* Fri Jul 16 2021 Ben Greiner
- Update to 2021.07.0
  * Include fastparquet in upstream CI build
  * Blockwise: handle non-string constant dependencies
  * fastparquet now supports new time types, including ns precision
  * Avoid ParquetDataset API when appending in ArrowDatasetEngine
  * Add retry logic to test_shuffle_priority
  * Use strict channel priority in CI
  * Support nested dask.distributed imports
  * Should check module name only, not the entire directory filepath
  * Updates due to https://github.com/dask/fastparquet/pull/623
  * da.eye fix for chunks=-1
  * Temporarily xfail test_daily_stock
  * Set priority annotations in SimpleShuffleLayer
  * Blockwise: stringify constant key inputs
  * Allow mixing dask and numpy arrays in @guvectorize
  * Don't sample dict result of a shuffle group when calculating its size
  * Fix scipy tests
  * Deterministically tokenize datetime.date
  * Add sample_rows to read_csv-like
  * Fix typo in config.deserialize docstring
  * Remove warning filter in test_dataframe_picklable
  * Improvements to histogramdd
  * Make PY_VERSION private
- Release 2021.06.2
  * layers.py compare parts_out with set(self.parts_out)
  * Make check_meta understand pandas dtypes better
  * Remove "Educational Resources" doc page
- Release 2021.06.1
  * Replace funding page with 'Supported By' section on dask.org
  * Add initial deprecation utilities
  * Enforce dtype conservation in ufuncs that explicitly use dtype=
  * Add Coiled to list of paid support organizations
  * Small tweaks to the HTML repr for Layer & HighLevelGraph
  * Add dark mode support to HLG HTML repr
  * Remove compatibility entries for old distributed
  * Implementation of HTML repr for HighLevelGraph layers
  * Update default blockwise token to avoid DataFrame column name clash
  * Use dispatch concat for merge_asof
  * Fix upstream freq tests
  * Use more context managers from the standard library
  * Simplify skips in parquet tests
  * Remove check for outdated bokeh
  * More test coverage uploads
  * Remove ImportError catching from dask/__init__.py
  * Allow DataFrame.join() to take a list of DataFrames to merge with
  * Fix maximum recursion depth exception in dask.array.linspace
  * Fix docs links
  * Initial da.select() implementation and test
  * Layers must implement get_output_keys method
  * Don't include or expect freq in divisions
  * A HighLevelGraph abstract layer for map_overlap
  * Always include kwarg name in drop
  * Only rechunk for median if needed
  * Add add_(prefix|suffix) to DataFrame and Series
  * Move read_hdf to Blockwise
  * Make Layer.get_output_keys officially an abstract method
  * Non-dask-arrays and broadcasting in ravel_multi_index
  * Fix for paths ending with "/" in parquet overwrite
  * Fixing calling .visualize() with filename=None
  * Generate unique names for SubgraphCallable
  * Pin fsspec to 2021.5.0 in CI
  * Evaluate graph lazily if meta is provided in from_delayed
  * Add meta support for DatetimeTZDtype
  * Add dispatch label to automatic PR labeler
  * Fix HDFS tests
- Release 2021.06.0
  * Remove abstract tokens from graph keys in rewrite_blockwise
  * Ensure correct column order in csv project_columns
  * Renamed inner loop variables to avoid duplication
  * Do not return delayed object from to_zarr
  * Array: correct number of outputs in apply_gufunc
  * Rewrite da.fromfunction with da.blockwise
  * Rename make_meta_util to make_meta
  * Repartition before shuffle if the requested partitions are less than input partitions
  * Blockwise: handle constant key inputs
  * Added raise to apply_gufunc
  * Show failing tests summary in CI
  * sizeof sets in Python 3.9
  * Warn if using pandas datetimelike string in dataframe.__getitem__
  * Highlight the client.dashboard_link
  * Easier link for subscribing to the Google calendar
  * Automatically show graph visualization in Jupyter notebooks
  * Add autofunction for unify_chunks in API docs
- Release 2021.05.1
  * Pandas compatibility
  * Fix optimize_dataframe_getitem bug
  * Update make_meta import in docs
  * Implement da.searchsorted
  * Fix format string in error message
  * Fix read_sql_table returning wrong result for single column loads
  * Add slack join link in support.rst
  * Remove unused alphabet variable
  * Fix meta creation in case of object
  * Add dispatch for union_categoricals
  * Consolidate array Dispatch objects
  * Move DataFrame dispatch.registers to their own file
  * Fix delayed with dataclasses where init=False
  * Allow a column to be named divisions
  * Stack nd array with unknown chunks
  * Promote the 2021 Dask User Survey
  * Fix typo in DataFrame.set_index()
  * Cleanup array API reference links
  * Accept axis tuple for flip to be consistent with NumPy
  * Bump pre-commit hook versions
  * Cleanup to_zarr docstring
  * Fix the docstring of read_orc
  * Doc ipyparallel & mpi4py concurrent.futures
  * Update tests to support CuPy 9
  * Fix some HighLevelGraph documentation inaccuracies
  * Fix spelling in Series getitem error message

* Tue May 18 2021 Ben Greiner
- update to version 2021.5.0
  * Remove deprecated kind kwarg to comply with pandas 1.3.0 (GH#7653) Julia Signell
  * Fix bug in DataFrame column projection (GH#7645) Richard (Rick) Zamora
  * Merge global annotations when packing (GH#7565) Mads R. B. Kristensen
  * Avoid inplace= in pandas set_categories (GH#7633) James Bourbeau
  * Change the active-fusion default to False for Dask-Dataframe (GH#7620) Richard (Rick) Zamora
  * Array: remove extraneous code from RandomState (GH#7487) Gabe Joseph
  * Implement str.concat when others=None (GH#7623) Daniel Mesejo-León
  * Fix dask.dataframe in sandboxed environments (GH#7601) Noah D. Brenowitz
  * Support for cupyx.scipy.linalg (GH#7563) Benjamin Zaitlen
  * Move timeseries and daily-stock to Blockwise (GH#7615) Richard (Rick) Zamora
  * Fix bugs in broadcast join (GH#7617) Richard (Rick) Zamora
  * Use Blockwise for DataFrame IO (parquet, csv, and orc) (GH#7415) Richard (Rick) Zamora
  * Adding chunk & type information to Dask HighLevelGraphs (GH#7309) Genevieve Buckley
  * Add pyarrow sphinx intersphinx_mapping (GH#7612) Ray Bell
  * Remove skip on test freq (GH#7608) Julia Signell
  * Defaults in read_parquet parameters (GH#7567) Ray Bell
  * Remove ignore_abc_warning (GH#7606) Julia Signell
  * Harden DataFrame merge between column-selection and index (GH#7575) Richard (Rick) Zamora
  * Get rid of ignore_abc decorator (GH#7604) Julia Signell
  * Remove kwarg validation for bokeh (GH#7597) Julia Signell
  * Add loky example (GH#7590) Naty Clementi
  * Delayed: nout when arguments become tasks (GH#7593) Gabe Joseph
  * Update distributed version in mindep CI build (GH#7602) James Bourbeau
  * Support all or no overlap between partition columns and real columns (GH#7541) Richard (Rick) Zamora
- Stress that python-distributed, if used, has to have a matching version number. Always update at the same time.

* Mon May 03 2021 Arun Persaud
- update to version 2021.4.1:
  * Handle Blockwise HLG pack/unpack for concatenate=True (:pr:`7455`) Richard (Rick) Zamora
  * map_partitions: use tokenized info as name of the SubgraphCallable (:pr:`7524`) Mads R. B.
Kristensen * Using tmp_path and tmpdir to avoid temporary files and directories hanging in the repo (:pr:`7592`) Naty Clementi * Contributing to docs (development guide) (:pr:`7591`) Naty Clementi * Add more packages to Python 3.9 CI build (:pr:`7588`) James Bourbeau * Array: Fix NEP-18 dispatching in finalize (:pr:`7508`) Gabe Joseph * Misc fixes for numpydoc (:pr:`7569`) Matthias Bussonnier * Avoid pandas level= keyword deprecation (:pr:`7577`) James Bourbeau * Map e.g. .repartition(freq="M") to .repartition(freq="MS") (:pr:`7504`) Ruben van de Geer * Remove hash seeding in parallel CI runs (:pr:`7128`) Elliott Sales de Andrade * Add defaults in parameters in to_parquet (:pr:`7564`) Ray Bell * Simplify transpose axes cleanup (:pr:`7561`) Julia Signell * Make ValueError in len(index_names) > 1 explicit it's using fastparquet (:pr:`7556`) Ray Bell * Fix dict-column appending for pyarrow parquet engines (:pr:`7527`) Richard (Rick) Zamora * Add a documentation auto label (:pr:`7560`) Doug Davis * Add dask.delayed.Delayed to docs so it can be referenced by other sphinx docs (:pr:`7559`) Doug Davis * Fix upstream idxmaxmin for uneven split_every (:pr:`7538`) Julia Signell * Make normalize_token for pandas Series/DataFrame future proof (no direct block access) (:pr:`7318`) Joris Van den Bossche * Redesigned __setitem__ implementation (:pr:`7393`) David Hassell * histogram, histogramdd improvements (docs; return consistencies) (:pr:`7520`) Doug Davis * Force nightly pyarrow in the upstream build (:pr:`7530`) Joris Van den Bossche * Fix Configuration Reference (:pr:`7533`) Benjamin Zaitlen * Use .to_parquet on dask.dataframe in doc string (:pr:`7528`) Ray Bell * Avoid double msgpack serialization of HLGs (:pr:`7525`) Mads R. B. Kristensen * Encourage usage of yaml.safe_load() in configuration doc (:pr:`7529`) Hristo Georgiev * Fix reshape bug. Add relevant test. Fixes #7171.
(:pr:`7523`) JSKenyon * Support custom_metadata= argument in to_parquet (:pr:`7359`) Richard (Rick) Zamora * Clean some documentation warnings (:pr:`7518`) Daniel Mesejo-León * Getting rid of more docs warnings (:pr:`7426`) Julia Signell * Added product (alias of prod) (:pr:`7517`) Freyam Mehta * Fix upstream __array_ufunc__ tests (:pr:`7494`) Julia Signell * Escape from map_overlap to map_blocks if depth is zero (:pr:`7481`) Genevieve Buckley * Add check_type to array assert_eq (:pr:`7491`) Julia Signell * Fri Apr 09 2021 Benjamin Greiner - Reenable 32bit tests after distributed is not cythonized anymore gh#dask/dask#7489 * Sun Apr 04 2021 Arun Persaud - update to version 2021.4.0: * Adding support for multidimensional histograms with dask.array.histogramdd (:pr:`7387`) Doug Davis * Update docs on number of threads and workers in default LocalCluster (:pr:`7497`) cameron16 * Add labels automatically when certain files are touched in a PR (:pr:`7506`) Julia Signell * Extract ignore_order from kwargs (:pr:`7500`) GALI PREM SAGAR * Only provide installation instructions when distributed is missing (:pr:`7498`) Matthew Rocklin * Start adding isort (:pr:`7370`) Julia Signell * Add ignore_order parameter in dd.concat (:pr:`7473`) Daniel Mesejo-León * Use powers-of-two when displaying RAM (:pr:`7484`) Guido Imperiale * Added License Classifier (:pr:`7485`) Tom Augspurger * Replace conda with mamba (:pr:`7227`) Guido Imperiale * Fix typo in array docs (:pr:`7478`) James Lamb * Use concurrent.futures in local scheduler (:pr:`6322`) John A Kirkham * Tue Mar 30 2021 Ben Greiner - Update to 2021.3.1 * Add a dispatch for is_categorical_dtype to handle non-pandas objects (GH#7469) brandon-b-miller * Use multiprocessing.Pool in test_read_text (GH#7472) John A Kirkham * Add missing meta kwarg to gufunc class (GH#7423) Peter Andreas Entschev * Example for memory-mapped Dask array (GH#7380) Dieter Weber * Fix NumPy upstream failures xfail pandas and fastparquet failures (GH#7441) 
Julia Signell * Fix bug in repartition with freq (GH#7357) Ruben van de Geer * Fix __array_function__ dispatching for tril/triu (GH#7457) Peter Andreas Entschev * Use concurrent.futures.Executors in a few tests (GH#7429) John A Kirkham * Require NumPy >=1.16 (GH#7383) Guido Imperiale * Minor sort_values housekeeping (GH#7462) Ryan Williams * Ensure natural sort order in parquet part paths (GH#7249) Ryan Williams * Remove global env mutation upon running test_config.py (GH#7464) Hristo * Update NumPy intersphinx URL (GH#7460) Gabe Joseph * Add rot90 (GH#7440) Trevor Manz * Update docs for required package for endpoint (GH#7454) Nick Vazquez * Master -> main in slice_array docstring (GH#7453) Gabe Joseph * Expand dask.utils.is_arraylike docstring (GH#7445) Doug Davis * Simplify BlockwiseIODeps importing (GH#7420) Richard (Rick) Zamora * Update layer annotation packing method (GH#7430) James Bourbeau * Drop duplicate test in test_describe_empty (GH#7431) John A Kirkham * Add Series.dot method to dataframe module (GH#7236) Madhu94 * Added df kurtosis-method and testing (GH#7273) Jan Borchmann * Avoid quadratic-time performance for HLG culling (GH#7403) Bruce Merry * Temporarily skip problematic sparse test (GH#7421) James Bourbeau * Update some CI workflow names (GH#7422) James Bourbeau * Fix HDFS test (GH#7418) Julia Signell * Make changelog subtitles match the hierarchy (GH#7419) Julia Signell * Add support for normalize in value_counts (GH#7342) Julia Signell * Avoid unnecessary imports for HLG Layer unpacking and materialization (GH#7381) Richard (Rick) Zamora * Bincount fix slicing (GH#7391) Genevieve Buckley * Add sliding_window_view (GH#7234) Deepak Cherian * Fix typo in docs/source/develop.rst (GH#7414) Hristo * Switch documentation builds for PRs to readthedocs (GH#7397) James Bourbeau * Adds sort_values to dask.DataFrame (GH#7286) gerrymanoim * Pin sqlalchemy<1.4.0 in CI (GH#7405) James Bourbeau * Comment fixes (GH#7215) Ryan Williams * Dead code removal / 
fixes (GH#7388) Ryan Williams * Use single thread for pa.Table.from_pandas calls (GH#7347) Richard (Rick) Zamora * Replace 'container' with 'image' (GH#7389) James Lamb * DOC hyperlink repartition (GH#7394) Ray Bell * Pass delimiter to fsspec in bag.read_text (GH#7349) Martin Durant * Update read_hdf default mode to "r" (GH#7039) rs9w33 * Embed literals in SubgraphCallable when packing Blockwise (GH#7353) Mads R. B. Kristensen * Update test_hdf.py to not reuse file handlers (GH#7044) rs9w33 * Require additional dependencies: cloudpickle, partd, fsspec, toolz (GH#7345) Julia Signell * Prepare Blockwise + IO infrastructure (GH#7281) Richard (Rick) Zamora * Remove duplicated imports from test_slicing.py (GH#7365) Hristo * Add test deps for pip development (GH#7360) Julia Signell * Support int slicing for non-NumPy arrays (GH#7364) Peter Andreas Entschev * Automatically cancel previous CI builds (GH#7348) James Bourbeau * dask.array.asarray should handle case where xarray class is in top-level namespace (GH#7335) Tom White * HighLevelGraph length without materializing layers (GH#7274) Gabe Joseph * Drop support for Python 3.6 (GH#7006) James Bourbeau * Fix fsspec usage in create_metadata_file (GH#7295) Richard (Rick) Zamora * Change default branch from master to main (GH#7198) Julia Signell * Add Xarray to CI software environment (GH#7338) James Bourbeau * Update repartition argument name in error text (GH#7336) Eoin Shanaghy * Run upstream tests based on commit message (GH#7329) James Bourbeau * Use pytest.register_assert_rewrite on util modules (GH#7278) Bruce Merry * Add example on using specific chunk sizes in from_array() (GH#7330) James Lamb * Move NumPy skip into test (GH#7247) Julia Signell- Update package descriptions- Add dask-delayed and dask-diagnostics packages- Drop dask-multiprocessing package merged into main- Skip python36: upstream dropped support for Python < 3.7- Drop dask-pr7247-numpyskip.patch merged upstream- Test more optional requirements
for better compatibility assurance. * Sun Mar 07 2021 Ben Greiner - Update to 2021.3.0 * This is the first release with support for Python 3.9 and the last release with support for Python 3.6 * Bump minimum version of distributed (GH#7328) James Bourbeau * Fix percentiles_summary with dask_cudf (GH#7325) Peter Andreas Entschev * Temporarily revert recent Array.__setitem__ updates (GH#7326) James Bourbeau * Blockwise.clone (GH#7312) Guido Imperiale * NEP-35 duck array update (GH#7321) James Bourbeau * Don’t allow setting .name for array (GH#7222) Julia Signell * Use nearest interpolation for creating percentiles of integer input (GH#7305) Kyle Barron * Test exp with CuPy arrays (GH#7322) John A Kirkham * Check that computed chunks have right size and dtype (GH#7277) Bruce Merry * pytest.mark.flaky (GH#7319) Guido Imperiale * Contributing docs: add note to pull the latest git tags before pip installing Dask (GH#7308) Genevieve Buckley * Support for Python 3.9 (GH#7289) Guido Imperiale * Add broadcast-based merge implementation (GH#7143) Richard (Rick) Zamora * Add split_every to graph_manipulation (GH#7282) Guido Imperiale * Typo in optimize docs (GH#7306) Julius Busecke * dask.graph_manipulation support for xarray.Dataset (GH#7276) Guido Imperiale * Add plot width and height support for Bokeh 2.3.0 (GH#7297) James Bourbeau * Add NumPy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from (GH#6997) Illviljan * Remove “cleanup” task in DataFrame on-disk shuffle (GH#7260) Sinclair Target * Use development version of distributed in CI (GH#7279) James Bourbeau * Moving high level graph pack/unpack Dask (GH#7179) Mads R. B. Kristensen * Improve performance of merge_percentiles (GH#7172) Ashwin Srinath * DOC: add dask-sql and fugue (GH#7129) Ray Bell * Example for working with categoricals and parquet (GH#7085) McToel * Adds tree reduction to bincount (GH#7183) Thomas J. 
Fan * Improve documentation of name in from_array (GH#7264) Bruce Merry * Fix cumsum for empty partitions (GH#7230) Julia Signell * Add map_blocks example to dask array creation docs (GH#7221) Julia Signell * Fix performance issue in dask.graph_manipulation.wait_on() (GH#7258) Guido Imperiale * Replace coveralls with codecov.io (GH#7246) Guido Imperiale * Pin to a particular black rev in pre-commit (GH#7256) Julia Signell * Minor typo in documentation: array-chunks.rst (GH#7254) Magnus Nord * Fix bugs in Blockwise and ShuffleLayer (GH#7213) Richard (Rick) Zamora * Fix parquet filtering bug for "pyarrow-dataset" with pyarrow-3.0.0 (GH#7200) Richard (Rick) Zamora * graph_manipulation without NumPy (GH#7243) Guido Imperiale * Support for NEP-35 (GH#6738) Peter Andreas Entschev * Avoid running unit tests during doctest CI build (GH#7240) James Bourbeau * Run doctests on CI (GH#7238) Julia Signell * Cleanup code quality on set arithmetics (GH#7196) Guido Imperiale * Add dask.array.delete (GH#7125) Julia Signell * Unpin graphviz now that new conda-forge recipe is built (GH#7235) Julia Signell * Don’t use NumPy 1.20 from conda-forge on Mac (GH#7211) Guido Imperiale * map_overlap: Don’t rechunk axes without overlap (GH#7233) Deepak Cherian * Pin graphviz to avoid issue with latest conda-forge build (GH#7232) Julia Signell * Use html_css_files in docs for custom CSS (GH#7220) James Bourbeau * Graph manipulation: clone, bind, checkpoint, wait_on (GH#7109) Guido Imperiale * Fix handling of filter expressions in parquet pyarrow-dataset engine (GH#7186) Joris Van den Bossche * Extend __setitem__ to more closely match numpy (GH#7033) David Hassell * Clean up Python 2 syntax (GH#7195) Guido Imperiale * Fix regression in Delayed._length (GH#7194) Guido Imperiale * __dask_layers__() tests and tweaks (GH#7177) Guido Imperiale * Properly convert HighLevelGraph in multiprocessing scheduler (GH#7191) Jim Crist-Harif * Don’t fail fast in CI (GH#7188) James Bourbeau- Add
dask-pr7247-numpyskip.patch -- gh#dask/dask#7247 * Wed Feb 17 2021 Ben Greiner - Run the full test suite: use rootdir conftest.py * importable optional dependencies are skipped automatically * can use network marker to skip network tests- Don't package and test -dataframe and -array for python36 flavor, because python36-numpy and depending packages were dropped from Tumbleweed with version 1.20.- Skip more distributed tests occasionally failing * Mon Feb 08 2021 Ben Greiner - Update to version 2021.2.0 * Add percentile support for NEP-35 (GH#7162) Peter Andreas Entschev * Added support for Float64 in column assignment (GH#7173) Nils Braun * Coarsen rechunking error (GH#7127) Davis Bennett * Fix upstream CI tests (GH#6896) Julia Signell * Revise HighLevelGraph Mapping API (GH#7160) Guido Imperiale * Update low-level graph spec to use any hashable for keys (GH#7163) James Bourbeau * Generically rebuild a collection with different keys (GH#7142) Guido Imperiale * Make easier to link issues in PRs (GH#7130) Ray Bell * Add dask.array.append (GH#7146) D-Stacks * Allow dask.array.ravel to accept array_like argument (GH#7138) D-Stacks * Fixes link in array design doc (GH#7152) Thomas J. Fan * Fix example of using blockwise for an outer product (GH#7119) Bruce Merry * Deprecate HighlevelGraph.dicts in favor of .layers (GH#7145) Amit Kumar * Align FastParquetEngine with pyarrow engines (GH#7091) Richard (Rick) Zamora * Merge annotations (GH#7102) Ian Rose * Simplify contents of parts list in read_parquet (GH#7066) Richard (Rick) Zamora * check_meta(): use __class__ when checking DataFrame types (GH#7099) Mads R. B.
Kristensen * Cache several properties (GH#7104) Illviljan * Fix parquet getitem optimization (GH#7106) Richard (Rick) Zamora * Add cytoolz back to CI environment (GH#7103) James Bourbeau * Thu Jan 28 2021 Ben Greiner - Update to version 2021.1.1 * Partially fix cumprod (GH#7089) Julia Signell * Test pandas 1.1.x / 1.2.0 releases and pandas nightly (GH#6996) Joris Van den Bossche * Use assign to avoid SettingWithCopyWarning (GH#7092) Julia Signell * 'mode' argument passed to bokeh.output_file() (GH#7034) (GH#7075) patquem * Skip empty partitions when doing groupby.value_counts (GH#7073) Julia Signell * Add error messages to assert_eq() (GH#7083) James Lamb * Make cached properties read-only (GH#7077) Illviljan- Changelog for 2021.01.0 * map_partitions with review comments (GH#6776) Kumar Bharath Prabhu * Make sure that population is a real list (GH#7027) Julia Signell * Propagate storage_options in read_csv (GH#7074) Richard (Rick) Zamora * Remove all BlockwiseIO code (GH#7067) Richard (Rick) Zamora * Fix CI (GH#7069) James Bourbeau * Add option to control rechunking in reshape (GH#6753) Tom Augspurger * Fix linalg.lstsq for complex inputs (GH#7056) Johnnie Gray * Add compression='infer' default to read_csv (GH#6960) Richard (Rick) Zamora * Revert parameter changes in svd_compressed #7003 (GH#7004) Eric Czech * Skip failing s3 test (GH#7064) Martin Durant * Revert BlockwiseIO (GH#7048) Richard (Rick) Zamora * Add some cross-references to DataFrame.to_bag() and
Series.to_bag() (GH#7049) Rob Malouf * Rewrite matmul as blockwise without contraction/concatenate (GH#7000) Rafal Wojdyla * Use functools.cached_property in da.shape (GH#7023) Illviljan * Use meta value in series non_empty (GH#6976) Julia Signell * Revert “Temporarly pin sphinx version to 3.3.1 (GH#7002)” (GH#7014) Rafal Wojdyla * Revert python-graphviz pinning (GH#7037) Julia Signell * Accidentally committed print statement (GH#7038) Julia Signell * Pass dropna and observed in agg (GH#6992) Julia Signell * Add index to meta after .str.split with expand (GH#7026) Ruben van de Geer * CI: test pyarrow 2.0 and nightly (GH#7030) Joris Van den Bossche * Temporarily pin python-graphviz in CI (GH#7031) James Bourbeau * Underline section in numpydoc (GH#7013) Matthias Bussonnier * Keep normal optimizations when adding custom optimizations (GH#7016) Matthew Rocklin * Temporarily pin sphinx version to 3.3.1 (GH#7002) Rafal Wojdyla * DOC: Misc formatting (GH#6998) Matthias Bussonnier * Add inline_array option to from_array (GH#6773) Tom Augspurger * Revert “Initial pass at blockwise array creation routines (GH#6931)” (GH#6995) James Bourbeau * Set npartitions in set_index (GH#6978) Julia Signell * Upstream config serialization and inheritance (GH#6987) Jacob Tomlinson * Bump the minimum time in test_minimum_time (GH#6988) Martin Durant * Fix pandas dtype inference for read_parquet (GH#6985) Richard (Rick) Zamora * Avoid data loss in set_index with sorted=True (GH#6980) Richard (Rick) Zamora * Bugfix in read_parquet for handling un-named indices with index=False (GH#6969) Richard (Rick) Zamora * Use __class__ when comparing meta data (GH#6981) Mads R. B. Kristensen * Comparing string versions won’t always work (GH#6979) Rafal Wojdyla * Fix GH#6925 (GH#6982) sdementen * Initial pass at blockwise array creation routines (GH#6931) Ian Rose * Simplify has_parallel_type() (GH#6927) Mads R. B.
Kristensen * Handle annotation unpacking in BlockwiseIO (GH#6934) Simon Perkins * Avoid deprecated yield_fixture in test_sql.py (GH#6968) Richard (Rick) Zamora * Remove bad graph logic in BlockwiseIO (GH#6933) Richard (Rick) Zamora * Get config item if variable is None (GH#6862) Jacob Tomlinson * Update from_pandas docstring (GH#6957) Richard (Rick) Zamora * Prevent fuse_roots from clobbering annotations (GH#6955) Simon Perkins * Wed Jan 13 2021 Benjamin Greiner - update to version 2020.12.0 * Switched to CalVer for versioning scheme. * Introduced new APIs for HighLevelGraph to enable sending high-level representations of task graphs to the distributed scheduler. * Introduced new HighLevelGraph layer objects including BasicLayer, Blockwise, BlockwiseIO, ShuffleLayer, and more. * Added support for applying custom Layer-level annotations like priority, retries, etc. with the dask.annotations context manager. * Updated minimum supported version of pandas to 0.25.0 and NumPy to 1.15.1. * Support for the pyarrow.dataset API to read_parquet. * Several fixes to Dask Array’s SVD.- For a full list of changes see https://docs.dask.org/en/latest/changelog.html- Clean requirements- Fix incorrect usage of python3_only macro- Test with pytest-xdist in order to avoid hang after test