Changelog for python3-dask-multiprocessing-1.1.1-bp154.1.2.noarch.rpm :
* Tue Feb 16 2021 Yuchen Lin
- Do not build the python2 package since python-joblib is python3 only

* Sat Feb 02 2019 Arun Persaud
- update to version 1.1.1:
  * Array
    + Add support for cupy.einsum (:pr:`4402`) Johnnie Gray
    + Provide byte size in chunks keyword (:pr:`4434`) Adam Beberg
    + Raise more informative error for histogram bins and range (:pr:`4430`) James Bourbeau
  * DataFrame
    + Lazily register more cudf functions and move to backends file (:pr:`4396`) Matthew Rocklin
    + Fix ORC tests for pyarrow 0.12.0 (:pr:`4413`) Jim Crist
    + rearrange_by_column: ensure that shuffle arg defaults to 'disk' if it's None in dask.config (:pr:`4414`) George Sakkis
    + Implement filters for _read_pyarrow (:pr:`4415`) George Sakkis
    + Avoid checking against types in is_dataframe_like (:pr:`4418`) Matthew Rocklin
    + Pass username as 'user' when using pyarrow (:pr:`4438`) Roma Sokolov
  * Delayed
    + Fix DelayedAttr return value (:pr:`4440`) Matthew Rocklin
  * Documentation
    + Use SVG for pipeline graphic (:pr:`4406`) John A Kirkham
    + Add doctest-modules to py.test documentation (:pr:`4427`) Daniel Severo
  * Core
    + Work around psutil 5.5.0 not allowing pickling Process objects Dimplexion

* Sun Jan 20 2019 Arun Persaud
- specfile:
  * update copyright year
- update to version 1.1.0:
  * Array
    + Fix the average function when there is a masked array (:pr:`4236`) Damien Garaud
    + Add allow_unknown_chunksizes to hstack and vstack (:pr:`4287`) Paul Vecchio
    + Fix tensordot for 27+ dimensions (:pr:`4304`) Johnnie Gray
    + Fixed block_info with axes.
      (:pr:`4301`) Tom Augspurger
    + Use safe_wraps for matmul (:pr:`4346`) Mark Harfouche
    + Use chunks="auto" in array creation routines (:pr:`4354`) Matthew Rocklin
    + Fix np.matmul in dask.array.Array.__array_ufunc__ (:pr:`4363`) Stephan Hoyer
    + COMPAT: Re-enable multifield copy->view change (:pr:`4357`) Diane Trout
    + Calling np.dtype on a delayed object works (:pr:`4387`) Jim Crist
    + Rework normalize_array for numpy data (:pr:`4312`) Marco Neumann
  * DataFrame
    + Add fill_value support for series comparisons (:pr:`4250`) James Bourbeau
    + Add schema name in read_sql_table for empty tables (:pr:`4268`) Mina Farid
    + Adjust check for bad chunks in map_blocks (:pr:`4308`) Tom Augspurger
    + Add dask.dataframe.read_fwf (:pr:`4316`) @slnguyen
    + Use atop fusion in dask dataframe (:pr:`4229`) Matthew Rocklin
    + Use parallel_types() in from_pandas (:pr:`4331`) Matthew Rocklin
    + Change DataFrame._repr_data to method (:pr:`4330`) Matthew Rocklin
    + Install pyarrow fastparquet for Appveyor (:pr:`4338`) Gábor Lipták
    + Remove explicit pandas checks and provide cudf lazy registration (:pr:`4359`) Matthew Rocklin
    + Replace isinstance(..., pandas) with is_dataframe_like (:pr:`4375`) Matthew Rocklin
    + ENH: Support 3rd-party ExtensionArrays (:pr:`4379`) Tom Augspurger
    + Pandas 0.24.0 compat (:pr:`4374`) Tom Augspurger
  * Documentation
    + Fix link to 'map_blocks' function in array api docs (:pr:`4258`) David Hoese
    + Add a paragraph on Dask-Yarn in the cloud docs (:pr:`4260`) Jim Crist
    + Copy edit documentation (:pr:`4267`), (:pr:`4263`), (:pr:`4262`), (:pr:`4277`), (:pr:`4271`), (:pr:`4279`), (:pr:`4265`), (:pr:`4295`), (:pr:`4293`), (:pr:`4296`), (:pr:`4302`), (:pr:`4306`), (:pr:`4318`), (:pr:`4314`), (:pr:`4309`), (:pr:`4317`), (:pr:`4326`), (:pr:`4325`), (:pr:`4322`), (:pr:`4332`), (:pr:`4333`) Miguel Farrajota
    + Fix typo in code example (:pr:`4272`) Daniel Li
    + Doc: Update array-api.rst (:pr:`4259`) (:pr:`4282`) Prabakaran Kumaresshan
    + Update hpc doc (:pr:`4266`) Guillaume Eynard-Bontemps
    + Doc: Replace from_avro with read_avro in documents (:pr:`4313`) Prabakaran Kumaresshan
    + Remove reference to "get" scheduler functions in docs (:pr:`4350`) Matthew Rocklin
    + Fix typo in docstring (:pr:`4376`) Daniel Saxton
    + Added documentation for dask.dataframe.merge (:pr:`4382`) Jendrik Jördening
  * Core
    + Avoid recursion in dask.core.get (:pr:`4219`) Matthew Rocklin
    + Remove verbose flag from pytest setup.cfg (:pr:`4281`) Matthew Rocklin
    + Support Pytest 4.0 by specifying marks explicitly (:pr:`4280`) Takahiro Kojima
    + Add High Level Graphs (:pr:`4092`) Matthew Rocklin
    + Fix SerializableLock locked and acquire methods (:pr:`4294`) Stephan Hoyer
    + Pin boto3 to earlier version in tests to avoid moto conflict (:pr:`4276`) Martin Durant
    + Treat None as missing in config when updating (:pr:`4324`) Matthew Rocklin
    + Update Appveyor to Python 3.6 (:pr:`4337`) Gábor Lipták
    + Use parse_bytes more liberally in dask.dataframe/bytes/bag (:pr:`4339`) Matthew Rocklin
    + Add a better error message when cloudpickle is missing (:pr:`4342`) Mark Harfouche
    + Support pool= keyword argument in threaded/multiprocessing get functions (:pr:`4351`) Matthew Rocklin
    + Allow updates from arbitrary Mappings in config.update, not only dicts.
      (:pr:`4356`) Stuart Berg
    + Move dask/array/top.py code to dask/blockwise.py (:pr:`4348`) Matthew Rocklin
    + Add has_parallel_type (:pr:`4395`) Matthew Rocklin
    + CI: Update Appveyor (:pr:`4381`) Tom Augspurger
    + Ignore non-readable config files (:pr:`4388`) Jim Crist

* Sat Dec 01 2018 Arun Persaud
- update to version 1.0.0:
  * Array
    + Add nancumsum/nancumprod unit tests (:pr:`4215`) Guido Imperiale
  * DataFrame
    + Add index to to_dask_dataframe docstring (:pr:`4232`) James Bourbeau
    + Test and fix when appending categoricals with fastparquet (:pr:`4245`) Martin Durant
    + Don't reread metadata when passing ParquetFile to read_parquet (:pr:`4247`) Martin Durant
  * Documentation
    + Copy edit documentation (:pr:`4222`) (:pr:`4224`) (:pr:`4228`) (:pr:`4231`) (:pr:`4230`) (:pr:`4234`) (:pr:`4235`) (:pr:`4254`) Miguel Farrajota
    + Updated doc for the new scheduler keyword (:pr:`4251`) @milesial
  * Core
    + Avoid a few warnings (:pr:`4223`) Matthew Rocklin
    + Remove dask.store module (:pr:`4221`) Matthew Rocklin
    + Remove AUTHORS.md Jim Crist

* Thu Nov 22 2018 Arun Persaud
- update to version 0.20.2:
  * Array
    + Avoid fusing dependencies of atop reductions (:pr:`4207`) Matthew Rocklin
  * Dataframe
    + Improve memory footprint for dataframe correlation (:pr:`4193`) Damien Garaud
    + Add empty DataFrame check to boundary_slice (:pr:`4212`) James Bourbeau
  * Documentation
    + Copy edit documentation (:pr:`4197`) (:pr:`4204`) (:pr:`4198`) (:pr:`4199`) (:pr:`4200`) (:pr:`4202`) (:pr:`4209`) Miguel Farrajota
    + Add stats module namespace (:pr:`4206`) James Bourbeau
    + Fix link in dataframe documentation (:pr:`4208`) James Bourbeau

* Mon Nov 12 2018 Arun Persaud
- update to version 0.20.1:
  * Array
    + Only allocate the result space in wrapped_pad_func (:pr:`4153`) John A Kirkham
    + Generalize expand_pad_width to expand_pad_value (:pr:`4150`) John A Kirkham
    + Test da.pad with 2D linear_ramp case (:pr:`4162`) John A Kirkham
    + Fix import for broadcast_to.
      (:pr:`4168`) samc0de
    + Rewrite Dask Array's pad to add only new chunks (:pr:`4152`) John A Kirkham
    + Validate index inputs to atop (:pr:`4182`) Matthew Rocklin
  * Core
    + Dask.config set and get normalize underscores and hyphens (:pr:`4143`) James Bourbeau
    + Only subs on core collections, not subclasses (:pr:`4159`) Matthew Rocklin
    + Add block_size=0 option to HTTPFileSystem. (:pr:`4171`) Martin Durant
    + Add traverse support for dataclasses (:pr:`4165`) Armin Berres
    + Avoid optimization on sharedicts without dependencies (:pr:`4181`) Matthew Rocklin
    + Update the pytest version for TravisCI (:pr:`4189`) Damien Garaud
    + Use key_split rather than funcname in visualize names (:pr:`4160`) Matthew Rocklin
  * Dataframe
    + Add fix for DataFrame.__setitem__ for index (:pr:`4151`) Anderson Banihirwe
    + Fix column choice when passing list of files to fastparquet (:pr:`4174`) Martin Durant
    + Pass engine_kwargs from read_sql_table to sqlalchemy (:pr:`4187`) Damien Garaud
  * Documentation
    + Fix documentation in Delayed best practices example that returned an empty list (:pr:`4147`) Jonathan Fraine
    + Copy edit documentation (:pr:`4164`) (:pr:`4175`) (:pr:`4185`) (:pr:`4192`) (:pr:`4191`) (:pr:`4190`) (:pr:`4180`) Miguel Farrajota
    + Fix typo in docstring (:pr:`4183`) Carlos Valiente

* Tue Oct 30 2018 Arun Persaud
- update to version 0.20.0:
  * Array
    + Fuse Atop operations (:pr:`3998`), (:pr:`4081`) Matthew Rocklin
    + Support da.asanyarray on dask dataframes (:pr:`4080`) Matthew Rocklin
    + Remove unnecessary endianness check in datetime test (:pr:`4113`) Elliott Sales de Andrade
    + Set name=False in array foo_like functions (:pr:`4116`) Matthew Rocklin
    + Remove dask.array.ghost module (:pr:`4121`) Matthew Rocklin
    + Fix use of getargspec in dask array (:pr:`4125`) Stephan Hoyer
    + Adds dask.array.invert (:pr:`4127`), (:pr:`4131`) Anderson Banihirwe
    + Raise informative error on arg-reduction on unknown chunksize (:pr:`4128`), (:pr:`4135`) Matthew Rocklin
    + Normalize reversed slices in dask array (:pr:`4126`) Matthew Rocklin
  * Bag
    + Add bag.to_avro (:pr:`4076`) Martin Durant
  * Core
    + Pull num_workers from config.get (:pr:`4086`), (:pr:`4093`) James Bourbeau
    + Fix invalid escape sequences with raw strings (:pr:`4112`) Elliott Sales de Andrade
    + Raise an error on the use of the get= keyword and set_options (:pr:`4077`) Matthew Rocklin
    + Add import for Azure DataLake storage, and add docs (:pr:`4132`) Martin Durant
    + Avoid collections.Mapping/Sequence (:pr:`4138`) Matthew Rocklin
  * Dataframe
    + Include index keyword in to_dask_dataframe (:pr:`4071`) Matthew Rocklin
    + add support for duplicate column names (:pr:`4087`) Jan Koch
    + Implement min_count for the DataFrame methods sum and prod (:pr:`4090`) Bart Broere
    + Remove pandas warnings in concat (:pr:`4095`) Matthew Rocklin
    + DataFrame.to_csv header option to only output headers in the first chunk (:pr:`3909`) Rahul Vaidya
    + Remove Series.to_parquet (:pr:`4104`) Justin Dennison
    + Avoid warnings and deprecated pandas methods (:pr:`4115`) Matthew Rocklin
    + Swap 'old' and 'previous' when reporting append error (:pr:`4130`) Martin Durant
  * Documentation
    + Copy edit documentation (:pr:`4073`), (:pr:`4074`), (:pr:`4094`), (:pr:`4097`), (:pr:`4107`), (:pr:`4124`), (:pr:`4133`), (:pr:`4139`) Miguel Farrajota
    + Fix typo in code example (:pr:`4089`) Antonino Ingargiola
    + Add pycon 2018 presentation (:pr:`4102`) Javad
    + Quick description for gcsfs (:pr:`4109`) Martin Durant
    + Fixed typo in docstrings of read_sql_table method (:pr:`4114`) TakaakiFuruse
    + Make target directories in redirects if they don't exist (:pr:`4136`) Matthew Rocklin

* Wed Oct 10 2018 Arun Persaud
- update to version 0.19.4:
  * Array
    + Implement apply_gufunc(..., axes=..., keepdims=...)
      (:pr:`3985`) Markus Gonser
  * Bag
    + Fix typo in datasets.make_people (:pr:`4069`) Matthew Rocklin
  * Dataframe
    + Added percentiles options for dask.dataframe.describe method (:pr:`4067`) Zhenqing Li
    + Add DataFrame.partitions accessor similar to Array.blocks (:pr:`4066`) Matthew Rocklin
  * Core
    + Pass get functions and Clients through scheduler keyword (:pr:`4062`) Matthew Rocklin
  * Documentation
    + Fix typo in hpc example (missing = in kwarg) (:pr:`4068`) Matthias Bussonier
    + Extensive copy-editing: (:pr:`4065`), (:pr:`4064`), (:pr:`4063`) Miguel Farrajota

* Mon Oct 08 2018 Arun Persaud
- update to version 0.19.3:
  * Array
    + Make da.RandomState extensible to other modules (:pr:`4041`) Matthew Rocklin
    + Support unknown dims in ravel no-op case (:pr:`4055`) Jim Crist
    + Add basic infrastructure for cupy (:pr:`4019`) Matthew Rocklin
    + Avoid asarray and lock arguments for from_array(getitem) (:pr:`4044`) Matthew Rocklin
    + Move local imports in corrcoef to global imports (:pr:`4030`) John A Kirkham
    + Move local indices import to global import (:pr:`4029`) John A Kirkham
    + Fix-up Dask Array's fromfunction w.r.t. dtype and kwargs (:pr:`4028`) John A Kirkham
    + Don't use dummy expansion for trim_internal in overlapped (:pr:`3964`) Mark Harfouche
    + Add unravel_index (:pr:`3958`) John A Kirkham
  * Bag
    + Sort result in Bag.frequencies (:pr:`4033`) Matthew Rocklin
    + Add support for npartitions=1 edge case in groupby (:pr:`4050`) James Bourbeau
    + Add new random dataset for people (:pr:`4018`) Matthew Rocklin
    + Improve performance of bag.read_text on small files (:pr:`4013`) Eric Wolak
    + Add bag.read_avro (:pr:`4000`) (:pr:`4007`) Martin Durant
  * Dataframe
    + Added an index parameter to :meth:`dask.dataframe.from_dask_array` for creating a dask DataFrame from a dask Array with a given index.
      (:pr:`3991`) Tom Augspurger
    + Improve sub-classability of dask dataframe (:pr:`4015`) Matthew Rocklin
    + Fix failing hdfs test [test-hdfs] (:pr:`4046`) Jim Crist
    + fuse_subgraphs works without normal fuse (:pr:`4042`) Jim Crist
    + Make path for reading many parquet files without prescan (:pr:`3978`) Martin Durant
    + Index in dd.from_dask_array (:pr:`3991`) Tom Augspurger
    + Making skiprows accept lists (:pr:`3975`) Julia Signell
    + Fail early in fastparquet read for nonexistent column (:pr:`3989`) Martin Durant
  * Core
    + Add support for npartitions=1 edge case in groupby (:pr:`4050`) James Bourbeau
    + Automatically wrap large arguments with dask.delayed in map_blocks/partitions (:pr:`4002`) Matthew Rocklin
    + Fuse linear chains of subgraphs (:pr:`3979`) Jim Crist
    + Make multiprocessing context configurable (:pr:`3763`) Itamar Turner-Trauring
  * Documentation
    + Extensive copy-editing (:pr:`4049`), (:pr:`4034`), (:pr:`4031`), (:pr:`4020`), (:pr:`4021`), (:pr:`4022`), (:pr:`4023`), (:pr:`4016`), (:pr:`4017`), (:pr:`4010`), (:pr:`3997`), (:pr:`3996`) Miguel Farrajota
    + Update shuffle method selection docs [skip ci] (:pr:`4048`) James Bourbeau
    + Remove docs/source/examples, point to examples.dask.org (:pr:`4014`) Matthew Rocklin
    + Replace readthedocs links with dask.org (:pr:`4008`) Matthew Rocklin
    + Updates DataFrame.to_hdf docstring for returned values [skip ci] (:pr:`3992`) James Bourbeau

* Mon Sep 17 2018 Arun Persaud
- update to version 0.19.2:
  * Array
    + apply_gufunc implements automatic inference of function output dtypes (:pr:`3936`) Markus Gonser
    + Fix array histogram range error when array has nans (#3980) James Bourbeau
    + Issue 3937 follow up, int type checks. (#3956) Yu Feng
    + from_array: add @martindurant's explanation of how hashing is done for an array.
      (#3965) Mark Harfouche
    + Support gradient with coordinate (#3949) Keisuke Fujii
  * Core
    + Fix use of has_keyword with partial in Python 2.7 (#3966) Mark Harfouche
    + Set pyarrow as default for HDFS (#3957) Matthew Rocklin
  * Documentation
    + Use dask_sphinx_theme (#3963) Matthew Rocklin
    + Use JupyterLab in Binder links from main page Matthew Rocklin
    + DOC: fixed sphinx syntax (#3960) Tom Augspurger

* Sat Sep 08 2018 Arun Persaud
- update to version 0.19.1:
  * Array
    + Don't enforce dtype if result has no dtype (:pr:`3928`) Matthew Rocklin
    + Fix NumPy issubtype deprecation warning (:pr:`3939`) Bruce Merry
    + Fix arg reduction tokens to be unique with different arguments (:pr:`3955`) Tobias de Jong
    + Coerce numpy integers to ints in slicing code (:pr:`3944`) Yu Feng
    + Linalg.norm ndim along axis partial fix (:pr:`3933`) Tobias de Jong
  * Dataframe
    + Deterministic DataFrame.set_index (:pr:`3867`) George Sakkis
    + Fix divisions in read_parquet when dealing with filters #3831 [#3930] (:pr:`3923`) (:pr:`3931`) @andrethrill
    + Fixing returning type in categorical.as_known (:pr:`3888`) Sriharsha Hatwar
    + Fix DataFrame.assign for callables (:pr:`3919`) Tom Augspurger
    + Include partitions with no width in repartition (:pr:`3941`) Matthew Rocklin
    + Don't constrict stage/k dtype in dataframe shuffle (:pr:`3942`) Matthew Rocklin
  * Documentation
    + DOC: Add hint on how to render task graphs horizontally (:pr:`3922`) Uwe Korn
    + Add try-now button to main landing page (:pr:`3924`) Matthew Rocklin

* Sun Sep 02 2018 arun@gmx.de
- specfile:
  * remove devel from noarch
- update to version 0.19.0:
  * Array
    + Fix argtopk split_every bug (:pr:`3810`) Guido Imperiale
    + Ensure result computing dask.array.isnull() always gives a numpy array (:pr:`3825`) Stephan Hoyer
    + Support concatenate for scipy.sparse in dask array (:pr:`3836`) Matthew Rocklin
    + Fix argtopk on 32-bit systems.
      (:pr:`3823`) Elliott Sales de Andrade
    + Normalize keys in rechunk (:pr:`3820`) Matthew Rocklin
    + Allow shape of dask.array to be a numpy array (:pr:`3844`) Mark Harfouche
    + Fix numpy deprecation warning on tuple indexing (:pr:`3851`) Tobias de Jong
    + Rename ghost module to overlap (:pr:`3830`) `Robert Sare`_
    + Re-add the ghost import to da __init__ (:pr:`3861`) Jim Crist
    + Ensure copy preserves masked arrays (:pr:`3852`) Tobias de Jong
  * DataFrame
    + Added dtype and sparse keywords to :func:`dask.dataframe.get_dummies` (:pr:`3792`) Tom Augspurger
    + Added :meth:`dask.dataframe.to_dask_array` for converting a Dask Series or DataFrame to a Dask Array, possibly with known chunk sizes (:pr:`3884`) Tom Augspurger
    + Changed the behavior of :meth:`dask.array.asarray` for dask dataframe and series inputs. Previously, the series was eagerly converted to an in-memory NumPy array before creating a dask array with known chunk sizes. This caused unexpectedly high memory usage. Now, no intermediate NumPy array is created, and a Dask array with unknown chunk sizes is returned (:pr:`3884`) Tom Augspurger
    + DataFrame.iloc (:pr:`3805`) Tom Augspurger
    + When reading multiple paths, expand globs.
      (:pr:`3828`) Irina Truong
    + Added index column name after resample (:pr:`3833`) Eric Bonfadini
    + Add (lazy) shape property to dataframe and series (:pr:`3212`) Henrique Ribeiro
    + Fix failing hdfs test [test-hdfs] (:pr:`3858`) Jim Crist
    + Fixes for pyarrow 0.10.0 release (:pr:`3860`) Jim Crist
    + Rename to_csv keys for diagnostics (:pr:`3890`) Matthew Rocklin
    + Match pandas warnings for concat sort (:pr:`3897`) Tom Augspurger
    + Include filename in read_csv (:pr:`3908`) Julia Signell
  * Core
    + Better error message on import when missing common dependencies (:pr:`3771`) Danilo Horta
    + Drop Python 3.4 support (:pr:`3840`) Jim Crist
    + Remove expired deprecation warnings (:pr:`3841`) Jim Crist
    + Add DASK_ROOT_CONFIG environment variable (:pr:`3849`) `Joe Hamman`_
    + Don't cull in local scheduler, do cull in delayed (:pr:`3856`) Jim Crist
    + Increase conda download retries (:pr:`3857`) Jim Crist
    + Add python_requires and Trove classifiers (:pr:`3855`) @hugovk
    + Fix collections.abc deprecation warnings in Python 3.7.0 (:pr:`3876`) Jan Margeta
    + Allow dot jpeg to xfail in visualize tests (:pr:`3896`) Matthew Rocklin
    + Add Python 3.7 to travis.yml (:pr:`3894`) Matthew Rocklin
    + Add expand_environment_variables to dask.config (:pr:`3893`) `Joe Hamman`_
  * Docs
    + Fix typo in import statement of diagnostics (:pr:`3826`) John Mrziglod
    + Add link to YARN docs (:pr:`3838`) Jim Crist
    + fix of minor typos in landing page index.html (:pr:`3746`) Christoph Moehl
    + Update delayed-custom.rst (:pr:`3850`) Anderson Banihirwe
    + DOC: clarify delayed docstring (:pr:`3709`) Scott Sievert
    + Add new presentations (:pr:`3880`) @javad94
    + Add dask array normalize_chunks to documentation (:pr:`3878`) Daniel Rothenberg
    + Docs: Fix link to snakeviz (:pr:`3900`) Hans Moritz Günther
    + Add missing ` to docstring (:pr:`3915`) @rtobar
- changes from version 0.18.2:
  * Array
    + Reimplemented argtopk to make it release the GIL (:pr:`3610`) Guido Imperiale
    + Don't overlap on non-overlapped dimensions in map_overlap (:pr:`3653`) Matthew Rocklin
    + Fix linalg.tsqr for dimensions of uncertain length (:pr:`3662`) Jeremy Chen
    + Break apart uneven array-of-int slicing to separate chunks (:pr:`3648`) Matthew Rocklin
    + Align auto chunks to provided chunks, rather than shape (:pr:`3679`) Matthew Rocklin
    + Adds endpoint and retstep support for linspace (:pr:`3675`) James Bourbeau
    + Implement .blocks accessor (:pr:`3689`) Matthew Rocklin
    + Add block_info keyword to map_blocks functions (:pr:`3686`) Matthew Rocklin
    + Slice by dask array of ints (:pr:`3407`) Guido Imperiale
    + Support dtype in arange (:pr:`3722`) Guido Imperiale
    + Fix argtopk with uneven chunks (:pr:`3720`) Guido Imperiale
    + Raise error when replace=False in da.choice (:pr:`3765`) James Bourbeau
    + Update chunks in Array.__setitem__ (:pr:`3767`) Itamar Turner-Trauring
    + Add a chunksize convenience property (:pr:`3777`) Jacob Tomlinson
    + Fix and simplify array slicing behavior when step < 0 (:pr:`3702`) Ziyao Wei
    + Ensure to_zarr with return_stored True returns a Dask Array (:pr:`3786`) John A Kirkham
  * Bag
    + Add last_endline optional parameter in to_textfiles (:pr:`3745`) George Sakkis
  * Dataframe
    + Add aggregate function for rolling objects (:pr:`3772`) Gerome Pistre
    + Properly tokenize cumulative groupby aggregations (:pr:`3799`) Cloves Almeida
  * Delayed
    + Add the @ operator to the delayed objects (:pr:`3691`) Mark Harfouche
    + Add delayed best practices to documentation (:pr:`3737`) Matthew Rocklin
    + Fix @delayed decorator for methods and add tests (:pr:`3757`) Ziyao Wei
  * Core
    + Fix extra progressbar (:pr:`3669`) Mike Neish
    + Allow tasks back onto ordering stack if they have one dependency (:pr:`3652`) Matthew Rocklin
    + Prefer end-tasks with low numbers of dependencies when ordering (:pr:`3588`) Tom Augspurger
    + Add assert_eq to top-level modules (:pr:`3726`) Matthew Rocklin
    + Test that dask collections can hold scipy.sparse arrays (:pr:`3738`) Matthew Rocklin
    + Fix setup of lz4 decompression functions
      (:pr:`3782`) Elliott Sales de Andrade
    + Add datasets module (:pr:`3780`) Matthew Rocklin

* Sun Jun 24 2018 arun@gmx.de
- update to version 0.18.1:
  * Array
    + from_array now supports scalar types and nested lists/tuples in input, just like all numpy functions do. It also produces a simpler graph when the input is a plain ndarray (:pr:`3556`) Guido Imperiale
    + Fix slicing of big arrays due to cumsum dtype bug (:pr:`3620`) Marco Rossi
    + Add Dask Array implementation of pad (:pr:`3578`) John A Kirkham
    + Fix array random API examples (:pr:`3625`) James Bourbeau
    + Add average function to dask array (:pr:`3640`) James Bourbeau
    + Tokenize ghost_internal with axes (:pr:`3643`) Matthew Rocklin
    + from_array: special handling for ndarray, list, and scalar types (:pr:`3568`) Guido Imperiale
    + Add outer for Dask Arrays (:pr:`3658`) John A Kirkham
  * DataFrame
    + Add Index.to_series method (:pr:`3613`) Henrique Ribeiro
    + Fix missing partition columns in pyarrow-parquet (:pr:`3636`) Martin Durant
  * Core
    + Minor tweaks to CI (:pr:`3629`) Guido Imperiale
    + Add back dask.utils.effective_get (:pr:`3642`) Matthew Rocklin
    + DASK_CONFIG dictates config write location (:pr:`3621`) Jim Crist
    + Replace 'collections' key in unpack_collections with unique key (:pr:`3632`) Yu Feng
    + Avoid deepcopy in dask.config.set (:pr:`3649`) Matthew Rocklin
- changes from version 0.18.0:
  * Array
    + Add to/read_zarr for Zarr-format datasets and arrays (:pr:`3460`) Martin Durant
    + Experimental addition of generalized ufunc support, apply_gufunc, gufunc, and as_gufunc (:pr:`3109`) (:pr:`3526`) (:pr:`3539`) Markus Gonser
    + Avoid unnecessary rechunking tasks (:pr:`3529`) Matthew Rocklin
    + Compute dtypes at runtime for fft (:pr:`3511`) Matthew Rocklin
    + Generate UUIDs for all da.store operations (:pr:`3540`) Martin Durant
    + Correct internal dimension of Dask's SVD (:pr:`3517`) John A Kirkham
    + BUG: do not raise IndexError for identity slice in array.vindex (:pr:`3559`) Scott Sievert
    + Adds isneginf and isposinf
      (:pr:`3581`) John A Kirkham
    + Drop Dask Array's learn module (:pr:`3580`) John A Kirkham
    + added sfqr (short-and-fat) as a counterpart to tsqr… (:pr:`3575`) Jeremy Chen
    + Allow 0-width chunks in dask.array.rechunk (:pr:`3591`) Marc Pfister
    + Document Dask Array's nan_to_num in public API (:pr:`3599`) John A Kirkham
    + Show block example (:pr:`3601`) John A Kirkham
    + Replace token= keyword with name= in map_blocks (:pr:`3597`) Matthew Rocklin
    + Disable locking in to_zarr (needed for using to_zarr in a distributed context) (:pr:`3607`) John A Kirkham
    + Support Zarr Arrays in to_zarr/from_zarr (:pr:`3561`) John A Kirkham
    + Added recursion to array/linalg/tsqr to better manage the single core bottleneck (:pr:`3586`) `Jeremy Chan`_
  * Dataframe
    + Add to/read_json (:pr:`3494`) Martin Durant
    + Adds index to unsupported arguments for DataFrame.rename method (:pr:`3522`) James Bourbeau
    + Adds support to subset Dask DataFrame columns using numpy.ndarray, pandas.Series, and pandas.Index objects (:pr:`3536`) James Bourbeau
    + Raise error if meta columns do not match dataframe (:pr:`3485`) Christopher Ren
    + Add index to unsupported argument for DataFrame.rename (:pr:`3522`) James Bourbeau
    + Adds support for subsetting DataFrames with pandas Index/Series and numpy ndarrays (:pr:`3536`) James Bourbeau
    + Dataframe sample method docstring fix (:pr:`3566`) James Bourbeau
    + fixes dd.read_json to infer file compression (:pr:`3594`) Matt Lee
    + Adds n to sample method (:pr:`3606`) James Bourbeau
    + Add fastparquet ParquetFile object support (:pr:`3573`) @andrethrill
  * Bag
    + Rename method= keyword to shuffle= in bag.groupby (:pr:`3470`) Matthew Rocklin
  * Core
    + Replace get= keyword with scheduler= keyword (:pr:`3448`) Matthew Rocklin
    + Add centralized dask.config module to handle configuration for all Dask subprojects (:pr:`3432`) (:pr:`3513`) (:pr:`3520`) Matthew Rocklin
    + Add dask-ssh CLI Options and Description.
      (:pr:`3476`) @beomi
    + Read whole files fix regardless of header for HTTP (:pr:`3496`) Martin Durant
    + Adds synchronous scheduler syntax to debugging docs (:pr:`3509`) James Bourbeau
    + Replace dask.set_options with dask.config.set (:pr:`3502`) Matthew Rocklin
    + Update sphinx readthedocs-theme (:pr:`3516`) Matthew Rocklin
    + Introduce "auto" value for normalize_chunks (:pr:`3507`) Matthew Rocklin
    + Fix check in configuration with env=None (:pr:`3562`) Simon Perkins
    + Update sizeof definitions (:pr:`3582`) Matthew Rocklin
    + Remove --verbose flag from travis-ci (:pr:`3477`) Matthew Rocklin
    + Remove "da.random" from random array keys (:pr:`3604`) Matthew Rocklin

* Mon May 21 2018 arun@gmx.de
- update to version 0.17.5:
  * Compatibility with pandas 0.23.0 (:pr:`3499`) Tom Augspurger

* Sun May 06 2018 arun@gmx.de
- update to version 0.17.4:
  * Dataframe
    + Add support for indexing Dask DataFrames with string subclasses (:pr:`3461`) James Bourbeau
    + Allow using both sorted_index and chunksize in read_hdf (:pr:`3463`) Pierre Bartet
    + Pass filesystem to arrow piece reader (:pr:`3466`) Martin Durant
    + Switches to using dask.compat string_types (#3462) James Bourbeau
- changes from version 0.17.3:
  * Array
    + Add einsum for Dask Arrays (:pr:`3412`) Simon Perkins
    + Add piecewise for Dask Arrays (:pr:`3350`) John A Kirkham
    + Fix handling of nan in broadcast_shapes (:pr:`3356`) John A Kirkham
    + Add isin for dask arrays (:pr:`3363`) Stephan Hoyer
    + Overhauled topk for Dask Arrays: faster algorithm, particularly for large k's; added support for multiple axes, recursive aggregation, and an option to pick the bottom k elements instead. (:pr:`3395`) Guido Imperiale
    + The topk API has changed from topk(k, array) to the more conventional topk(array, k). The legacy API still works but is now deprecated.
      (:pr:`2965`) Guido Imperiale
    + New function argtopk for Dask Arrays (:pr:`3396`) Guido Imperiale
    + Fix handling partial depth and boundary in map_overlap (:pr:`3445`) John A Kirkham
    + Add gradient for Dask Arrays (:pr:`3434`) John A Kirkham
  * DataFrame
    + Allow t as shorthand for table in to_hdf for pandas compatibility (:pr:`3330`) Jörg Dietrich
    + Added top level isna method for Dask DataFrames (:pr:`3294`) Christopher Ren
    + Fix selection on partition column on read_parquet for engine="pyarrow" (:pr:`3207`) Uwe Korn
    + Added DataFrame.squeeze method (:pr:`3366`) Christopher Ren
    + Added infer_divisions option to read_parquet to specify whether read engines should compute divisions (:pr:`3387`) Jon Mease
    + Added support for inferring division for engine="pyarrow" (:pr:`3387`) Jon Mease
    + Provide more informative error message for meta= errors (:pr:`3343`) Matthew Rocklin
    + add orc reader (:pr:`3284`) Martin Durant
    + Default compression for parquet now always Snappy, in line with pandas (:pr:`3373`) Martin Durant
    + Fixed bug in Dask DataFrame and Series comparisons with NumPy scalars (:pr:`3436`) James Bourbeau
    + Remove outdated requirement from repartition docstring (:pr:`3440`) Jörg Dietrich
    + Fixed bug in aggregation when only a Series is selected (:pr:`3446`) Jörg Dietrich
    + Add default values to make_timeseries (:pr:`3421`) Matthew Rocklin
  * Core
    + Support traversing collections in persist, visualize, and optimize (:pr:`3410`) Jim Crist
    + Add scheduler= keyword to compute and persist.
      This replaces common use of the get= keyword (:pr:`3448`) Matthew Rocklin

* Sat Mar 24 2018 arun@gmx.de
- update to version 0.17.2:
  * Array
    + Add broadcast_arrays for Dask Arrays (:pr:`3217`) John A Kirkham
    + Add bitwise_* ufuncs (:pr:`3219`) John A Kirkham
    + Add optional axis argument to squeeze (:pr:`3261`) John A Kirkham
    + Validate inputs to atop (:pr:`3307`) Matthew Rocklin
    + Avoid calls to astype in concatenate if all parts have the same dtype (:pr:`3301`) `Martin Durant`_
  * DataFrame
    + Fixed bug in shuffle due to aggressive truncation (:pr:`3201`) Matthew Rocklin
    + Support specifying categorical columns on read_parquet with categories=[…] for engine="pyarrow" (:pr:`3177`) Uwe Korn
    + Add dd.tseries.Resampler.agg (:pr:`3202`) Richard Postelnik
    + Support operations that mix dataframes and arrays (:pr:`3230`) Matthew Rocklin
    + Support extra Scalar and Delayed args in dd.groupby._Groupby.apply (:pr:`3256`) Gabriele Lanaro
  * Bag
    + Support joining against single-partitioned bags and delayed objects (:pr:`3254`) Matthew Rocklin
  * Core
    + Fixed bug when using unexpected but hashable types for keys (:pr:`3238`) Daniel Collins
    + Fix bug in task ordering so that we break ties consistently with the key name (:pr:`3271`) Matthew Rocklin
    + Avoid sorting tasks in order when the number of tasks is very large (:pr:`3298`) Matthew Rocklin

* Fri Mar 02 2018 sebix+novell.com@sebix.at
- correctly package bytecode
- use %license macro

* Fri Feb 23 2018 arun@gmx.de
- update to version 0.17.1:
  * Array
    + Corrected dimension chunking in indices (:issue:`3166`, :pr:`3167`) Simon Perkins
    + Inline store_chunk calls for store's return_stored option (:pr:`3153`) John A Kirkham
    + Compatibility with struct dtypes for NumPy 1.14.1 release (:pr:`3187`) Matthew Rocklin
  * DataFrame
    + Bugfix to allow column assignment of pandas datetimes (:pr:`3164`) Max Epstein
  * Core
    + New file-system for HTTP(S), allowing direct loading from specific URLs (:pr:`3160`) `Martin Durant`_
    + Fix bug when tokenizing partials with no keywords (:pr:`3191`) Matthew Rocklin
    + Use more recent LZ4 API (:pr:`3157`) `Thrasibule`_
    + Introduce output stream parameter for progress bar (:pr:`3185`) `Dieter Weber`_

* Sat Feb 10 2018 arun@gmx.de
- update to version 0.17.0:
  * Array
    + Added support for object-type arrays for nansum, nanmin, and nanmax (:issue:`3133`) Keisuke Fujii
    + Update error handling when len is called with empty chunks (:issue:`3058`) Xander Johnson
    + Fixes a metadata bug with store's return_stored option (:pr:`3064`) John A Kirkham
    + Fix a bug in optimization.fuse_slice to properly handle when first input is None (:pr:`3076`) James Bourbeau
    + Support arrays with unknown chunk sizes in percentile (:pr:`3107`) Matthew Rocklin
    + Tokenize scipy.sparse arrays and np.matrix (:pr:`3060`) Roman Yurchak
  * DataFrame
    + Support month timedeltas in repartition(freq=...) (:pr:`3110`) Matthew Rocklin
    + Avoid mutation in dataframe groupby tests (:pr:`3118`) Matthew Rocklin
    + read_csv, read_table, and read_parquet accept iterables of paths (:pr:`3124`) Jim Crist
    + Deprecates the dd.to_delayed function in favor of the existing method (:pr:`3126`) Jim Crist
    + Return dask.arrays from df.map_partitions calls when the UDF returns a numpy array (:pr:`3147`) Matthew Rocklin
    + Change handling of columns and index in dd.read_parquet to be more consistent, especially in handling of multi-indices (:pr:`3149`) Jim Crist
    + fastparquet append=True allowed to create new dataset (:pr:`3097`) `Martin Durant`_
    + dtype rationalization for sql queries (:pr:`3100`) `Martin Durant`_
  * Bag
    + Document that the bag.map_partitions function may receive either a list or a generator.
(:pr:`3150`) Nir * Core + Change default task ordering to prefer nodes with few dependents and then many downstream dependencies (:pr:`3056`) Matthew Rocklin + Add color= option to visualize to color by task order (:pr:`3057`) (:pr:`3122`) Matthew Rocklin + Deprecate dask.bytes.open_text_files (:pr:`3077`) Jim Crist + Remove short-circuit hdfs reads handling due to maintenance costs. May be re-added in a more robust manner later (:pr:`3079`) Jim Crist + Add dask.base.optimize for optimizing multiple collections without computing. (:pr:`3071`) Jim Crist + Rename dask.optimize module to dask.optimization (:pr:`3071`) Jim Crist + Change task ordering to do a full traversal (:pr:`3066`) Matthew Rocklin + Adds an optimize_graph keyword to all to_delayed methods to allow controlling whether optimizations occur on conversion. (:pr:`3126`) Jim Crist + Support using pyarrow for hdfs integration (:pr:`3123`) Jim Crist + Move HDFS integration and tests into dask repo (:pr:`3083`) Jim Crist + Remove write_bytes (:pr:`3116`) Jim Crist * Thu Jan 11 2018 arunAATTgmx.de - specfile: * update copyright year - update to version 0.16.1: * Array + Fix handling of scalar percentile values in "percentile" (:pr:`3021`) `James Bourbeau`_ + Prevent "bool()" coercion from calling compute (:pr:`2958`) `Albert DeFusco`_ + Add "matmul" (:pr:`2904`) `John A Kirkham`_ + Support N-D arrays with "matmul" (:pr:`2909`) `John A Kirkham`_ + Add "vdot" (:pr:`2910`) `John A Kirkham`_ + Explicit "chunks" argument for "broadcast_to" (:pr:`2943`) `Stephan Hoyer`_ + Add "meshgrid" (:pr:`2938`) `John A Kirkham`_ and (:pr:`3001`) `Markus Gonser`_ + Preserve singleton chunks in "fftshift"/"ifftshift" (:pr:`2733`) `John A Kirkham`_ + Fix handling of negative indexes in "vindex" and raise errors for out-of-bounds indexes (:pr:`2967`) `Stephan Hoyer`_ + Add "flip", "flipud", "fliplr" (:pr:`2954`) `John A Kirkham`_ + Add "float_power" ufunc (:pr:`2962`) (:pr:`2969`) `John A Kirkham`_ + 
Compatibility for changes to structured arrays in the upcoming NumPy 1.14 release (:pr:`2964`) `Tom Augspurger`_ + Add "block" (:pr:`2650`) `John A Kirkham`_ + Add "frompyfunc" (:pr:`3030`) `Jim Crist`_ * DataFrame + Fixed naming bug in cumulative aggregations (:issue:`3037`) `Martijn Arts`_ + Fixed "dd.read_csv" when "names" is given but "header" is not set to "None" (:issue:`2976`) `Martijn Arts`_ + Fixed "dd.read_csv" so that passing instances of "CategoricalDtype" in "dtype" will result in known categoricals (:pr:`2997`) `Tom Augspurger`_ + Prevent "bool()" coercion from calling compute (:pr:`2958`) `Albert DeFusco`_ + "DataFrame.read_sql()" (:pr:`2928`) on an empty database table returns an empty dask dataframe `Apostolos Vlachopoulos`_ + Compatibility for reading Parquet files written by PyArrow 0.8.0 (:pr:`2973`) `Tom Augspurger`_ + Correctly handle the column name (`df.columns.name`) when reading in "dd.read_parquet" (:pr:`2973`) `Tom Augspurger`_ + Fixed "dd.concat" losing the index dtype when the data contained a categorical (:issue:`2932`) `Tom Augspurger`_ + Add "dd.Series.rename" (:pr:`3027`) `Jim Crist`_ + "DataFrame.merge()" (:pr:`2960`) now supports merging on a combination of columns and the index `Jon Mease`_ + Removed the deprecated "dd.rolling_*" methods, in preparation for their removal in the next pandas release (:pr:`2995`) `Tom Augspurger`_ + Fix metadata inference bug in which single-partition series were mistakenly special-cased (:pr:`3035`) `Jim Crist`_ + Add support for "Series.str.cat" (:pr:`3028`) `Jim Crist`_ * Core + Improve 32-bit compatibility (:pr:`2937`) `Matthew Rocklin`_ + Change task prioritization to avoid upwards branching (:pr:`3017`) `Matthew Rocklin`_ * Sun Nov 19 2017 arunAATTgmx.de - update to version 0.16.0: * Fix install of fastparquet on travis (#2897) * Fix port for bokeh dashboard (#2889) * fix hdfs3 version * Modify hdfs import to point to hdfs3 (#2894) * Explicitly pass in pyarrow 
filesystem for parquet (#2881) * COMPAT: Ensure lists for multiple groupby keys (#2892) * Avoid list index error in repartition_freq (#2873) * Finish moving `infer_storage_options` (#2886) * Support arrow in `to_parquet`. Several other parquet cleanups. (#2868) * Bugfix: Filesystem object not passed to pyarrow reader (#2527) * Fix py34 build * Fixup s3 tests (#2875) * Close resource profiler process on __exit__ (#2871) * Add changelog for to_parquet changes. [ci skip] * A few parquet cleanups (#2867) * Fixed fillna with Series (#2810) * Error nicely on parse dates failure in read_csv (#2863) * Fix empty dataframe partitioning for numpy 1.10.4 (#2862) * Test `unique`'s inverse mapping's shape (#2857) * Move `thread_state` out of the top namespace (#2858) * Explain unique's steps (#2856) * fix and test for issue #2811 (#2818) * Minor tweaks to `_unique_internal` optional result handling (#2855) * Update dask interface during XArray integration (#2847) * Remove unnecessary map_partitions in aggregate (#2712) * Simplify `_unique_internal` (#2850) * Add more tests for read_parquet(engine='pyarrow') (#2822) * Do not raise exception when calling set_index on empty dataframe [#2819] (#2827) * Test unique on more data (#2846) * Do not raise an exception on set_index on a text column with empty partitions [#2820] (#2831) * Compat for bokeh 0.12.10 (#2844) * Support `return_*` arguments with `unique` (#2779) * Fix installing of pandas dev (#2838) * Squash a few warnings in dask.array (#2833) * Array optimizations don't elide some getter calls (#2826) * test against pandas rc (#2814) * df.astype(categorical_dtype) -> known categoricals (#2835) * Fix cloudpickle test (#2836) * BUG: Quantile with missing data (#2791) * API: remove dask.async (#2828) * Adds comma to flake8 section in setup.cfg (#2817) * Adds asarray and asanyarray to the dask.array public API (#2787) * flake8 now checks bare excepts (#2816) * CI: Update for new flake8 / pycodestyle (#2808) * Fix concat series bug 
(#2800) * Typo in the docstring of read_parquet's filters param (#2806) * Docs update (#2803) * minor doc changes in bag.core (#2797) * da.random.choice works with array args (#2781) * Support broadcasting 0-length dimensions (#2784) * ResourceProfiler plot works with a single point (#2778) * Implement Dask Array's unique to be lazy (#2775) * Dask Collection Interface * Reduce test memory usage (#2782) * Deprecate vnorm (#2773) * add auto-import of gcsfs (#2776) * Add allclose (#2771) * Remove `random.different_seeds` from API docs (#2772) * Follow-up for atleast_nd (#2765) * Use get_worker().client.get if available (#2762) * Link PR for "Allow tuples as sharedict keys" (#2766) * Allow tuples as sharedict keys (#2763) * update docs to use flatten vs concat (#2764) * Add atleast_nd functions (#2760) * Consolidate changelog for 0.15.4 (#2759) * Add changelog template for future date (#2758) * Mon Oct 30 2017 arunAATTgmx.de - update to version 0.15.4: * Drop s3fs requirement (#2750) * Support -1 as an alias for dimension size in chunks (#2749) * Handle zero dimension when rechunking (#2747) * Pandas 0.21 compatibility (#2737) * API: Add `.str` accessor for Categorical with object dtype (#2743) * Fix install failures * Reduce memory usage * A few test cleanups * Fix #2720 (#2729) * Pass on file_scheme to fastparquet (#2714) * Support indexing with np.int (#2719) * Tree reduction support for dask.bag.Bag.foldby (#2710) * Update link to IPython parallel docs (#2715) * Call mkdir from correct namespace in array.to_npy_stack. 
(#2709) * add int96 times to parquet writer (#2711) * Sun Sep 24 2017 arunAATTgmx.de - update to version 0.15.3: * add .github/PULL_REQUEST_TEMPLATE.md file * Make `y` optional in dask.array.learn (#2701) * Add apply_over_axes (#2702) * Use apply_along_axis name in Dask (#2704) * Tweak apply_along_axis's pre-NumPy 1.13.0 error (#2703) * Add apply_along_axis (#2698) * Use travis conditional builds (#2697) * Skip days in daily_stock that have nan values (#2693) * TST: Have array assert_eq check scalars (#2681) * Add schema keyword to read_sql (#2582) * Only install pytest-runner if needed (#2692) * Remove resize tool from bokeh plots (#2688) * Add ptp (#2691) * Catch warning from numpy in subs (#2457) * Publish Series methods in dataframe api (#2686) * Fix norm keepdims (#2683) * Dask array slicing with boolean arrays (#2658) * repartition works with mixed categoricals (#2676) * Merge pull request #2667 from martindurant/parquet_file_schema * Fix for parquet file schemes * Optional axis argument for cumulative functions (#2664) * Remove partial_by_order * Support literals in atop * [ci skip] Add flake8 note in developer doc page (#2662) * Add filenames return for ddf.to_csv and bag.to_textfiles as they both… (#2655) * CLN: Remove redundant code, fix typos (#2652) * [docs] company name change from Continuum to Anaconda (#2660) * Fix what happened when combining partition_on and append in to_parquet (#2645) * WIP: Add user defined aggregations (#2344) * [docs] new cheatsheet (#2649) * Masked arrays (#2301) * Indexing with an unsigned integer array (#2647) * ENH: Allow the groupby by param to handle columns and index levels (#2636) * update copyright date (#2642) * python setup.py test runs py.test (#2641) * Avoid using operator.itemgetter in dask.dataframe (#2638) * Add `*_like` array creation functions (#2640) * Consistent slicing names (#2601) * Replace Continuum Analytics with Anaconda Inc. 
(#2631) * Implement Series.str[index] (#2634) * Support complex data with vnorm (#2621) - changes from version 0.15.2: * BUG: setitem should update divisions (#2622) * Allow dataframe.loc with numpy array (#2615) * Add link to Stack Overflow's mcve docpage to support docs (#2612) * Improve dtype inference and reflection (#2571) * Add ediff1d (#2609) * Optimize concatenate on singleton sequences (#2610) * Add diff (#2607) * Document norm in Dask Array API (#2605) * Add norm (#2597) * Don't check for memory leaks in distributed tests (#2603) * Include computed collection within sharedict in delayed (#2583) * Reorg array (#2595) * Remove `expand` parameter from df.str.split (#2593) * Normalize `meta` on call to `dd.from_delayed` (#2591) * Remove bare `except:` blocks and test that none exist. (#2590) * Adds choose method to dask.array.Array (#2584) * Generalize vindex in dask.array (#2573) * Clear `_cached_keys` on name change in dask.array (#2572) * Don't render None for unknown divisions (#2570) * Add missing initialization to CacheProfiler (#2550) * Add argwhere, *nonzero, where (cond) (#2539) * Fix indices error message (#2565) * Fix and secure some references (#2563) * Allow read_hdf to accept an iterable of files (#2547) * Allow split on rechunk on first pass (#2560) * Improvements to dask.array.where (#2549) * Adds isin method to dask.dataframe.DataFrame (#2558) * Support dask array conditional in compress (#2555) * Clarify ResourceProfiler docstring [ci skip] (#2553) * In compress, use Dask to expand condition array (#2545) * Support compress with axis as None (#2541) * df.idxmax/df.idxmin work with empty partitions (#2542) * FIX typo in accumulate docstring (#2552) * da.where works with non-bool condition (#2543) * da.repeat works with negative axis (#2544) * Check metadata in `dd.from_delayed` (#2534) * TST: clean up test directories in shuffle (#2535) * Do not attempt to compute divisions on an empty dataframe. 
(#2529) * Remove deprecated bag behavior (#2525) * Updates read_hdf docstring (#2518) * Add dd.to_timedelta (#2523) * Better error message for read_csv (#2522) * Remove spurious keys from map_overlap graph (#2520) * Do not compare x.dim with None in array. (#1847) * Support concat for categorical MultiIndex (#2514) * Support for callables in df.assign (#2513) * Thu May 04 2017 toddrme2178AATTgmail.com - Implement single-spec version - Update source URL. - Split classes into their own subpackages to lighten base dependencies. - Update to version 0.15.1 * Add storage_options to to_textfiles and to_csv (:pr:`2466`) * Rechunk and simplify rfftfreq (:pr:`2473`), (:pr:`2475`) * Better support ndarray subclasses (:pr:`2486`) * Import star in dask.distributed (:pr:`2503`) * Threadsafe cache handling with tokenization (:pr:`2511`) - Update to version 0.15.0 + Array * Add dask.array.stats submodule (:pr:`2269`) * Support ``ufunc.outer`` (:pr:`2345`) * Optimize fancy indexing by reducing graph overhead (:pr:`2333`) (:pr:`2394`) * Faster array tokenization using alternative hashes (:pr:`2377`) * Added the matmul ``@`` operator (:pr:`2349`) * Improved coverage of the ``numpy.fft`` module (:pr:`2320`) (:pr:`2322`) (:pr:`2327`) (:pr:`2323`) * Support NumPy's ``__array_ufunc__`` protocol (:pr:`2438`) + Bag * Fix bug where reductions on bags with no partitions would fail (:pr:`2324`) * Add broadcasting and variadic ``db.map`` top-level function. 
Also remove auto-expansion of tuples as map arguments (:pr:`2339`) * Rename ``Bag.concat`` to ``Bag.flatten`` (:pr:`2402`) + DataFrame * Parquet improvements (:pr:`2277`) (:pr:`2422`) + Core * Move dask.async module to dask.local (:pr:`2318`) * Support callbacks with nested scheduler calls (:pr:`2397`) * Support pathlib.Path objects as URIs (:pr:`2310`) - Update to version 0.14.3 + DataFrame * Pandas 0.20.0 support - Update to version 0.14.2 + Array * Add da.indices (:pr:`2268`), da.tile (:pr:`2153`), da.roll (:pr:`2135`) * Simultaneously support drop_axis and new_axis in da.map_blocks (:pr:`2264`) * Rechunk and concatenate work with unknown chunksizes (:pr:`2235`) and (:pr:`2251`) * Support non-numpy container arrays, notably sparse arrays (:pr:`2234`) * Tensordot contracts over multiple axes (:pr:`2186`) * Allow delayed targets in da.store (:pr:`2181`) * Support interactions against lists and tuples (:pr:`2148`) * Constructor plugins for debugging (:pr:`2142`) * Multi-dimensional FFTs (single chunk) (:pr:`2116`) + Bag * to_dataframe enforces consistent types (:pr:`2199`) + DataFrame * Set_index always fully sorts the index (:pr:`2290`) * Support compatibility with pandas 0.20.0 (:pr:`2249`), (:pr:`2248`), and (:pr:`2246`) * Support Arrow Parquet reader (:pr:`2223`) * Time-based rolling windows (:pr:`2198`) * Repartition can now create more partitions, not just fewer (:pr:`2168`) + Core * Always use absolute paths when on a POSIX file system (:pr:`2263`) * Support user-provided graph optimizations (:pr:`2219`) * Refactor path handling (:pr:`2207`) * Improve fusion performance (:pr:`2129`), (:pr:`2131`), and (:pr:`2112`) - Update to version 0.14.1 + Array * Micro-optimize optimizations (:pr:`2058`) * Change slicing optimizations to avoid fusing raw numpy arrays (:pr:`2075`) (:pr:`2080`) * Dask.array operations now work on numpy arrays (:pr:`2079`) * Reshape now works in a much broader set of cases (:pr:`2089`) * Support the Python deepcopy protocol (:pr:`2090`) * Allow 
user-provided FFT implementations in ``da.fft`` (:pr:`2093`) + Bag + DataFrame * Fix to_parquet with empty partitions (:pr:`2020`) * Optional ``npartitions='auto'`` mode in ``set_index`` (:pr:`2025`) * Optimize shuffle performance (:pr:`2032`) * Support efficient repartitioning along time windows like ``repartition(freq='12h')`` (:pr:`2059`) * Improve speed of categorize (:pr:`2010`) * Support single-row dataframe arithmetic (:pr:`2085`) * Automatically avoid shuffle when setting the index with a sorted column (:pr:`2091`) * Improve integer-NA handling in read_csv (:pr:`2098`) + Delayed * Repeated attribute access on delayed objects uses the same key (:pr:`2084`) + Core * Improve naming of nodes in dot visuals to avoid generic ``apply`` (:pr:`2070`) * Ensure that worker processes have different random seeds (:pr:`2094`) - Update to version 0.14.0 + Array * Fix corner cases with zero shape and misaligned values in ``arange`` * Improve concatenation efficiency (:pr:`1923`) * Avoid hashing in ``from_array`` if name is provided (:pr:`1972`) + Bag * Repartition can now increase the number of partitions (:pr:`1934`) * Fix bugs in some reductions with empty partitions (:pr:`1939`), (:pr:`1950`), (:pr:`1953`) + DataFrame * Support non-uniform categoricals (:pr:`1877`), (:pr:`1930`) * Groupby cumulative reductions (:pr:`1909`) * DataFrame.loc indexing now supports lists (:pr:`1913`) * Improve multi-level groupbys (:pr:`1914`) * Improved HTML and string repr for DataFrames (:pr:`1637`) * Parquet append (:pr:`1940`) * Add ``dd.demo.daily_stock`` function for teaching (:pr:`1992`) + Delayed * Add ``traverse=`` keyword to delayed to optionally avoid traversing nested data structures (:pr:`1899`) * Support Futures in from_delayed functions (:pr:`1961`) * Improve serialization of decorated delayed functions (:pr:`1969`) + Core * Improve Windows path parsing in corner cases (:pr:`1910`) * Rename tasks when fusing (:pr:`1919`) * Add top-level ``persist`` function 
(:pr:`1927`) * Propagate ``errors=`` keyword in byte handling (:pr:`1954`) * Dask.compute traverses Python collections (:pr:`1975`) * Structural sharing between graphs in dask.array and dask.delayed (:pr:`1985`) - Update to version 0.13.0 + Array * Mandatory dtypes on dask.array. All operations maintain dtype information and UDF functions like map_blocks now require a dtype= keyword if it cannot be inferred. (:pr:`1755`) * Support arrays without known shapes, such as those arising when slicing arrays with arrays or converting dataframes to arrays (:pr:`1838`) * Support mutation by setting one array with another (:pr:`1840`) * Tree reductions for covariance and correlations. (:pr:`1758`) * Add SerializableLock for better use with distributed scheduling (:pr:`1766`) * Improved atop support (:pr:`1800`) * Rechunk optimization (:pr:`1737`), (:pr:`1827`) + Bag * Avoid wrong results when recomputing the same groupby twice (:pr:`1867`) + DataFrame * Add ``map_overlap`` for custom rolling operations (:pr:`1769`) * Add ``shift`` (:pr:`1773`) * Add Parquet support (:pr:`1782`) (:pr:`1792`) (:pr:`1810`), (:pr:`1843`), (:pr:`1859`), (:pr:`1863`) * Add missing methods combine, abs, autocorr, sem, nsmallest, first, last, prod (:pr:`1787`) * Approximate nunique (:pr:`1807`), (:pr:`1824`) * Reductions with multiple output partitions (for operations like drop_duplicates) (:pr:`1808`), (:pr:`1823`) (:pr:`1828`) * Add delitem and copy to DataFrames, increasing mutation support (:pr:`1858`) + Delayed * Changed behaviour for ``delayed(nout=0)`` and ``delayed(nout=1)``: ``delayed(nout=1)`` does not default to ``out=None`` anymore, and ``delayed(nout=0)`` is also enabled. That is, functions that return tuples of length 1 or 0 are now handled correctly. This is especially handy when functions with a variable number of outputs are wrapped by ``delayed``. E.g. 
a trivial example: ``delayed(lambda *args: args, nout=len(vals))(*vals)`` + Core * Refactor core byte ingest (:pr:`1768`), (:pr:`1774`) * Improve import time (:pr:`1833`) - update to version 0.12.0: * update changelog (#1757) * Avoids spurious warning message in concatenate (#1752) * CLN: cleanup dd.multi (#1728) * ENH: da.ufuncs now support DataFrame/Series (#1669) * Faster array slicing (#1731) * Avoid calling list on partitions (#1747) * Fix slicing error with None and ints (#1743) * Add da.repeat (#1702) * ENH: add dd.DataFrame.resample (#1741) * Unify column names in dd.read_csv (#1740) * replace empty with random in test to avoid nans * Update diagnostics plots (#1736) * Allow atop to change chunk shape (#1716) * ENH: DataFrame.loc now supports 2d indexing (#1726) * Correct shape when indexing with Ellipsis and None * ENH: Add DataFrame.pivot_table (#1729) * CLN: cleanup DataFrame class handling (#1727) * ENH: Add DataFrame.combine_first (#1725) * ENH: Add DataFrame all/any (#1724) * micro-optimize _deps (#1722) * A few small tweaks to da.Array.astype (#1721) * BUG: Fixed metadata lookup failure in Accessor (#1706) * Support auto-rechunking in stack and concatenate (#1717) * Forward `get` kwarg in df.to_csv (#1715) * Add rename support for multi-level columns (#1712) * Update paid support section * Add `drop` to reset_index (#1711) * Cull dask.arrays on slicing (#1709) * Update dd.read_* functions in docs * WIP: Feature/dataframe aggregate (implements #1619) (#1678) * Add da.round (#1708) * Executor -> Client * Add support of getitem for multilevel columns (#1697) * Prepend optimization keywords with name of optimization (#1690) * Add dd.read_table (#1682) * Fix dd.pivot_table dtype to be deterministic (#1693) * da.random with state is consistent across sizes (#1687) * Remove `raises`, use pytest.raises instead (#1679) * Remove unnecessary calls to list (#1681) * Dataframe tree reductions (#1663) * Add global optimizations to compute (#1675) * TST: rename 
dataframe eq to assert_eq (#1674) * ENH: Add DataFrame/Series.align (#1668) * CLN: dataframe.io (#1664) * ENH: Add DataFrame/Series clip_xxx (#1667) * Clear divisions on single_partitions_merge (#1666) * ENH: add dd.pivot_table (#1665) * Typo in `use-cases`? (#1670) * add distributed follow link doc page * Dataframe elemwise (#1660) * Windows file and endline test handling (#1661) * remove old badges * Fix #1656: failures when parallel testing (#1657) * Remove use of multiprocessing.Manager (#1652) (#1653) * A few fixes for `map_blocks` (#1654) * Automatically expand chunking in atop (#1644) * Add AppVeyor configuration (#1648) * TST: move flake8 to travis script (#1655) * CLN: Remove unused funcs (#1638) * Implementing .size and groupby size method (#1627) (#1649) * Use strides, shape, and offset in memmap tokenize (#1646) * Validate scalar metadata is scalar (#1642) * Convert readthedocs links for their .org -> .io migration for hosted projects (#1639) * CLN: little cleanup of dd.categorical (#1635) * Signature of Array.transpose matches numpy (#1632) * Error nicely when indexing Array with Array (#1629) * ENH: add DataFrame.get_xtype_counts (#1634) * PEP8: some fixes (#1633) - changes from version 0.11.1: * support uniform index partitions in set_index(sorted) (#1626) * Groupby works with multiprocessing (#1625) * Use a nonempty index in _maybe_partial_time_string * Fix segfault in groupby-var * Support Pandas 0.19.0 * Deprecations (#1624) * work-around for ddf.info() failing because of https://github.com/pydata/pandas/issues/14368 (#1623) * .str accessor needs to pass through both args & kwargs (#1621) * Ensure dtype is provided in additional tests (#1620) * coerce rounded numbers to int in dask.array.ghost (#1618) * Use assert_eq everywhere in dask.array tests (#1617) * Update documentation (#1606) * Support new_axes= keyword in atop (#1612) * pass through node_attr and edge_attr in dot_graph (#1614) * Add swapaxes to dask array (#1611) * add clip to Array (#1610) 
* Add atop(concatenate=False) keyword argument (#1609) * Better error message on metadata inference failure (#1598) * ENH/API: Enhanced Categorical Accessor (#1574) * PEP8: dataframe fix except E127,E402,E501,E731 (#1601) * ENH: dd.get_dummies for categorical Series (#1602) * PEP8: some fixes (#1605) * Fix da.learn tests for scikit-learn release (#1597) * Suppress warnings in psutil (#1589) * avoid more timeseries warnings (#1586) * Support inplace operators in dataframe (#1585) * Squash warnings in resample (#1583) * expand imports for dask.distributed (#1580) * Add indicator keyword to dd.merge (#1575) * Error loudly if `nrows` used in read_csv (#1576) * Add versioneer (#1569) * Strengthen statement about gitter for developers in docs * Raise IndexError on out of bounds slice. (#1579) * ENH: Support Series in read_hdf (#1577) * COMPAT/API: DataFrame.categorize missing values (#1578) * Add `pipe` method to dask.dataframe (#1567) * Sample from `read_bytes` ends on a delimiter (#1571) * Remove mention of bag join in docs (#1568) * Tokenize mmap works without filename (#1570) * String accessor works with indexes (#1561) * corrected links to documentation from Examples (#1557) * Use conda-forge channel in travis (#1559) * add s3fs to travis.yml (#1558) * ENH: DataFrame.select_dtypes (#1556) * Improve slicing performance (#1539) * Check meta in `__init__` of _Frame * Fix metadata in Series.getitem * A few changes to `dask.delayed` (#1542) * Fixed read_hdf example (#1544) * add section on distributed computing with link to toc * Fix spelling (#1535) * Only fuse simple indexing with getarray backends (#1529) * Deemphasize graphs in docs (#1531) * Avoid pickle when tokenizing __main__ functions (#1527) * Add changelog doc going up to dask 0.6.1 (2015-07-23). 
(#1526) * update dataframe docs * update index * Update to highlight the use of glob-based file naming option for df exports (#1525) * Add custom docstring to dd.to_csv, mentioning that one file per partition is written (#1524) * Run slow tests in Travis for all Python versions, even if coverage check is disabled. (#1523) * Unify example doc pages into one (#1520) * Remove lambda/inner functions in dask.dataframe (#1516) * Add documentation for dataframe metadata (#1514) * "dd.map_partitions" works with scalar outputs (#1515) * meta_nonempty returns types of correct size (#1513) * add memory use note to tsqr docstring * Fix slow consistent keyname test (#1510) * Chunks check (#1504) * Fix last 'line' in sample; prevents open quotes. (#1495) * Create new threadpool when operating from thread (#1487) * Add finalize- prefix to dask.delayed collections * Move key-split from distributed to dask * State that delayed values should be lists in bag.from_delayed (#1490) * Use lists in db.from_sequence (#1491) * Implement user-defined aggregations (#1483) * Field access works with non-scalar fields (#1484) - Update to 0.11.0 * DataFrames now enforce knowing full metadata (columns, dtypes) everywhere. Previously we would operate in an ambiguous state when functions lost dtype information (such as apply). Now all dataframes always know their dtypes and raise errors asking for information if they are unable to infer (which they usually can). Some internal attributes like _pd and _pd_nonempty have been moved. * The internals of the distributed scheduler have been refactored to transition tasks between explicit states. This improves resilience, reasoning about scheduling, plugin operation, and logging. It also makes the scheduler code easier to understand for newcomers. * Breaking Changes + The distributed.s3 and distributed.hdfs namespaces are gone. Use protocols in normal methods like read_text('s3://...') instead. 
+ Dask.array.reshape now errs in some cases where previously it would have created a very large number of tasks - update to version 0.10.2: * raise informative error on merge(on=frame) * Fix crash with -OO Python command line (#1388) * [WIP] Read hdf partitioned (#1407) * Add dask.array.digitize. (#1409) * Adding documentation to create dask DataFrame from HDF5 (#1405) * Unify shuffle algorithms (#1404) * dd.read_hdf: clear errors on exceeding row numbers (#1406) * Rename `get_division` to `get_partition` * Add nice error messages on import failures * Use task-based shuffle in hash_joins (#1383) * Fixed #1381: Reimplemented DataFrame.repartition(npartition=N) so it doesn't require indexing and just coalesces existing partitions without shuffling/balancing (#1396) * Import visualize from dask.diagnostics in docs * Backport `equal_nans` to older versions of numpy * Improve checks for dtype and shape in dask.array * Progress bar process should be a daemon * LZMA may not be available in python 3 (#1391) * dd.to_hdf: multiple files multiprocessing avoid locks (#1384) * dir works with numeric column names * Dataframe groupby works with numeric column names * Use fsync when appending to partd * Fix pickling issue in dataframe to_bag * Add documentation for dask.dataframe.to_hdf * Fixed a copy-paste typo in DataFrame.map_partitions docstring * Fix 'visualize' import location in diagnostics documentation (#1376) * update cheat sheet (#1371) - update to version 0.10.1: * `inline` no longer removes keys (#1356) * avoid c: in infer_storage_options (#1369) * Protect reductions against empty partitions (#1361) * Add doc examples for dask.array.histogram. 
(#1363) * Fix typo in pip install requirements path (#1364) * avoid unnecessary dependencies between save tasks in dataframe.to_hdf (#1293) * remove xfail mark for blosc missing const * Add `anon=True` for read from s3 test * `subs` doesn't needlessly compare keys and values * Use pytest.importorskip instead of try/except/return pattern * Fixes for bokeh 0.12.0 * Multiprocess scheduler handles unpickling errors * array.random with array-like parameters (#1327) * Fixes issue #1337 (#1338) * Remove dask runtime dependence on mock 2.7 backport. * Load known but external protocols automatically (#1325) * Add center argument to Series/DataFrame.rolling (#1280) * Add Bag.random_sample method. (#1332) * Correct docs install command and add missing required packages (#1333) * Mark the 4 slowest tests as slow to get a faster suite by default. (#1334) * Travis: Install mock package in Python 2.7. * Automatic blocksize for read_csv based on available memory and number of cores. * Replace "Matthew Rocklin" with "Dask Development Team" (#1329) * Support column assignment in DataFrame (#1322) * Few travis fixes, pandas version >= 0.18.0 (#1314) * Don't run hdf test if pytables package is not present. (#1323) * Add delayed.compute to api docs. * Support datetimes in DataFrame._build_pd (#1319) * Test setting the index with datetime with timezones, which is a pandas-defined dtype (#1315) * Add s3fs to requirements (#1316) * Pass dtype information through in Series.astype (#1320) * Add draft of development guidelines (#1305) * Skip tests needing optional package when it's not present. 
(#1318) * DOC: Document DataFrame.categorize * make dd.to_csv support writing to multiple csv files (#1303) * quantiles for repartitioning (#1261) * DOC: Minimal doc for get_sync (#1312) * Pass through storage_options in db.read_text (#1304) * Fixes #1237: correctly propagate storage_options through read_* APIs and use urlsplit to automatically get remote connection settings (#1269) * TST: Travis build matrix to specify numpy/pandas ver (#1300) * amend doc string to Bag.to_textfiles * Return dask.Delayed when saving files with compute=False (#1286) * Support empty or small dataframes in from_pandas (#1290) * Add validation and tests for order breaking name_function (#1275) * ENH: dataframe now supports partial string selection (#1278) * Fix typo in spark-dask docs * added note and verbose exception about CSV parsing errors (#1287) - update to version 0.10.0: * Add parametrization to merge tests * Add more challenging types to nonempty_sample_df test * Windows fixes * TST: Fix coveralls badge (#1276) * Sort index on shuffle (#1274) * Update specification docs to reflect new spec. * Add groupby docs (#1273) * Update spark docs * Rolling class receives normal arguments (unchecked other than pandas call), stores at * Reduce communication in rolling operations #1242 (#1270) * Fix Shuffle (#1255) * Work on earlier versions of Pandas * Handle additional Pandas types * Use non-empty fake dataframe in merge operations * Add failing test for merge case * Add utility function to create sample dataframe * update release procedure * amend doc string to Bag.to_textfiles (#1258) * Drop Python 2.6 support (#1264) * Clean DataFrame naming conventions (#1263) * Fix some bugs in the rolling implementation. 
* Fix core.get to use new spec * Make graph definition recursive * Handle empty partitions in dask.bag.to_textfiles * test index.min/max * Add regression test for non-ndarray slicing * Standardize dataframe keynames * bump csv sample size to 256k (#1253) * Switch tests to utils.tmpdir (#1251) * Fix dot_graph filename split bug * Correct documentation to reflect argument existing now. * Allow non-zero axis for .rolling (for application over columns) * Fix scheduler behavior for top-level lists * Various spelling mistakes in docstrings, comments, exception messages, and a filename * Fix typo. (#1247) * Fix tokenize in dask.delayed * Remove unused imports, pep8 fixes * Fix bug in slicing optimization * Add Task Shuffle (#1186) * Add bytes API (#1224) * Add dask_key_name to docs, fix bug in methods * Allow formatting in dask.dataframe.to_hdf path and key parameters * Match pandas' exceptions a bit closer in the rolling API. Also, correct computation f * Add tests to package (#1231) * Document visualize method (#1234) * Skip new rolling API's tests if the pandas we have is too old. * Improve df_or_series.rolling(...) implementation. * Remove `iloc` property on `dask.dataframe` * Support for the new pandas rolling API. * test delayed names are different under kwargs * Add Hussain Sultan to AUTHORS * Add `optimize_graph` keyword to multiprocessing get * Add `optimize_graph` keyword to `compute` * Add dd.info() (#1213) * Cleanup base tests * Add groupby documentation stub * pngmath is deprecated in sphinx 1.4 * A few docfixes * Extract dtype in dd.from_bcolz * Throw NotImplementedError if old toolz.accumulate * Add isnull and notnull for dataframe * Add dask.bag.accumulate * Fix categorical partitioning * create single lock for glob read_hdf * Fix failing from_url doctest * Add missing api to bag docs * Add Skipper Seabold to AUTHORS. 
* Don't use mutable default argument * Fix typo * Ensure to_task_dasks always returns a task * Fix dir for dataframe objects * Infer metadata in dd.from_delayed * Fix some closure issues in dask.dataframe * Add storage_options keyword to read_csv * Define finalize function for dask.dataframe.Scalar * py26 compatibility * add stacked logos to docs * test from-array names * rename from_array tasks * add atop to array docs * Add motivation and example to delayed docs * splat out delayed values in compute docs * Fix optimize docs * add html page with logos * add dask logo to documentation images * Few pep8 cleanups to dask.dataframe.groupby * Groupby aggregate works with list of columns * Use different names for input and output in from_array * Don't enforce same column names * don't write header for first block in csv * Add var and std to DataFrame groupby (#1159) * Move conda recipe to conda-forge (#1162) * Use function names in map_blocks and elemwise (#1163) * add hyphen to delayed name (#1161) * Avoid shuffles when merging with Pandas objects (#1154) * Add DataFrame.eval * Ensure future imports * Add db.Bag.unzip * Guard against shape attributes that are not sequences * Add dask.array.multinomial - update to version 0.9.0: * No upstream changelog - update to version 0.8.2: * No upstream changelog - update to version 0.8.1: * No upstream changelog - update to version 0.8.0: * No upstream changelog - update to version 0.7.5: * No upstream changelog - update to version 0.7.5: * No upstream changelog - update to version 0.7.0: * No upstream changelog - update to version 0.6.1: * No upstream changelog * Tue Jul 14 2015 toddrme2178AATTgmail.com - Update to 0.6.0 * No upstream changelog * Tue May 19 2015 toddrme2178AATTgmail.com - Update to 0.5.0 * No upstream changelog * Thu Apr 09 2015 toddrme2178AATTgmail.com - Initial version