Changelog for python311-scikit-learn-1.4.2-1.1.x86_64.rpm :

* Mon Apr 15 2024 Dirk Müller - update to 1.4.2:
* This release only includes support for numpy 2.
* Tue Feb 20 2024 Ben Greiner - Update to 1.4.1.post1
* Metadata Routing
* Fix routing issue with ColumnTransformer when used inside another meta-estimator. #28188 by Adrin Jalali.
* No error is raised when no metadata is passed to a metaestimator that includes a sub-estimator which doesn’t support metadata routing. #28256 by Adrin Jalali.
* Fix multioutput.MultiOutputRegressor and multioutput.MultiOutputClassifier to work with estimators that don’t consume any metadata when metadata routing is enabled. #28240 by Adrin Jalali.
* DataFrame Support
* Enhancement Fix Pandas and Polars dataframes are validated directly without duck-typing checks. #28195 by Thomas Fan.
* Changes impacting many modules
* Efficiency Fix Partial revert of #28191 to avoid a performance regression for estimators relying on euclidean pairwise computation with sparse matrices. The impacted estimators are:
  - sklearn.metrics.pairwise_distances_argmin
  - sklearn.metrics.pairwise_distances_argmin_min
  - sklearn.cluster.AffinityPropagation
  - sklearn.cluster.Birch
  - sklearn.cluster.SpectralClustering
  - sklearn.neighbors.KNeighborsClassifier
  - sklearn.neighbors.KNeighborsRegressor
  - sklearn.neighbors.RadiusNeighborsClassifier
  - sklearn.neighbors.RadiusNeighborsRegressor
  - sklearn.neighbors.LocalOutlierFactor
  - sklearn.neighbors.NearestNeighbors
  - sklearn.manifold.Isomap
  - sklearn.manifold.TSNE
  - sklearn.manifold.trustworthiness
  #28235 by Julien Jerphanion.
* Fixes a bug for all scikit-learn transformers when using set_output with transform set to pandas or polars. The bug could lead to wrong naming of the columns of the returned dataframe. #28262 by Guillaume Lemaitre.
* When users try to use a method in StackingClassifier, SelectFromModel, RFE, SelfTrainingClassifier, OneVsOneClassifier, OutputCodeClassifier or OneVsRestClassifier that their sub-estimators don’t implement, the AttributeError now reraises in the traceback. #28167 by Stefanie Senger.
- Release 1.4.0
* HistGradientBoosting Natively Supports Categorical DTypes in DataFrames
* Polars output in set_output
* Missing value support for Random Forest
* Add support for monotonic constraints in tree-based models
* Enriched estimator displays
* Metadata Routing Support
* Improved memory and runtime efficiency for PCA on sparse data
* Highlights and detailed changelog:
* https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_4_0.html
* https://scikit-learn.org/stable/whats_new/v1.4.html#release-notes-1-4
- Enable python312 test flavor, avoid testing it with the other flavors
- Prepare for python39 flavor drop
* Tue Nov 07 2023 Dirk Müller - update to 1.3.2:
* All dataset fetchers now accept `data_home` as any object that implements the :class:`os.PathLike` interface, for instance, :class:`pathlib.Path`.
* Fixes a bug in :class:`decomposition.KernelPCA` by forcing the output of the internal :class:`preprocessing.KernelCenterer` to be a default array. When the arpack solver is used, it expects an array with a `dtype` attribute.
* Fixes a bug for metrics using `zero_division=np.nan` (e.g. :func:`~metrics.precision_score`) within a parallel loop (e.g. :func:`~model_selection.cross_val_score`) where the singleton for `np.nan` will be different in the sub-processes.
* Do not leak data via non-initialized memory in decision tree pickle files and make the generation of those files deterministic.
* Ridge models with `solver='sparse_cg'` may have slightly different results with scipy>=1.12, because of an underlying change in the scipy solver.
* The `set_output` API correctly works with list input.
* :class:`calibration.CalibratedClassifierCV` can now handle models that produce large prediction scores.
* Wed Aug 09 2023 Steve Kowalik - Skip another recalcitrant test on 32 bit.
* Tue Aug 01 2023 Markéta Machová - Python flavors shifted again, drop test-py38, add test-py311
* Tue Jul 25 2023 Markéta Machová - Update to 1.3.0
* We are in the process of introducing a new way to route metadata such as sample_weight throughout the codebase, which would affect how meta-estimators such as pipeline.Pipeline and model_selection.GridSearchCV route metadata.
* Originally hosted in the scikit-learn-contrib repository, cluster.HDBSCAN has been adopted into scikit-learn.
* A new category encoding strategy preprocessing.TargetEncoder encodes the categories based on a shrunk estimate of the average target values for observations belonging to that category.
* The classes tree.DecisionTreeClassifier and tree.DecisionTreeRegressor now support missing values.
* model_selection.ValidationCurveDisplay is now available to plot results from model_selection.validation_curve
* The class ensemble.HistGradientBoostingRegressor supports the Gamma deviance loss function via loss="gamma".
* Similarly to preprocessing.OneHotEncoder, the class preprocessing.OrdinalEncoder now supports aggregating infrequent categories into a single output for each feature.
* More changes, see https://scikit-learn.org/stable/whats_new/v1.3.html
* Sat Jun 10 2023 ecsos - Add %{?sle15_python_module_pythons}
* Wed Feb 08 2023 Arun Persaud - update to version 1.2.1:
* Changed models
  + The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
  + Fix The fitted components in MiniBatchDictionaryLearning might differ. The online updates of the sufficient statistics now properly take the sizes of the batches into account. #25354 by Jérémie du Boisberranger.
  + Fix The categories_ attribute of preprocessing.OneHotEncoder now always contains an array of objects when using predefined categories that are strings. Predefined categories encoded as bytes will no longer work with X encoded as strings. #25174 by Tim Head.
* Changes impacting all modules
  + Fix Support pandas.Int64 dtyped y for classifiers and regressors. #25089 by Tim Head.
  + Fix Remove spurious warnings for estimators internally using neighbors search methods. #25129 by Julien Jerphanion.
  + Fix Fix a bug where the current configuration was ignored in estimators using n_jobs > 1. This bug was triggered for tasks dispatched by the auxiliary thread of joblib as sklearn.get_config used to access an empty thread local configuration instead of the configuration visible from the thread where joblib.Parallel was first called. #25363 by Guillaume Lemaitre.
* Changelog
  o sklearn.base
    + Fix Fix a regression in BaseEstimator.__getstate__ that would prevent certain estimators from being pickled when using Python 3.11. #25188 by Benjamin Bossan.
    + Fix Inheriting from base.TransformerMixin will only wrap the transform method if the class defines transform itself. #25295 by Thomas Fan.
  o sklearn.datasets
    + Fix Fix an inconsistency in datasets.fetch_openml between liac-arff and pandas parser when a leading space is introduced after the delimiter. The ARFF specs require ignoring the leading space. #25312 by Guillaume Lemaitre.
  o sklearn.decomposition
    + Fix Fixed a bug in decomposition.MiniBatchDictionaryLearning where the online updates of the sufficient statistics were not correct when calling partial_fit on batches of different sizes. #25354 by Jérémie du Boisberranger.
    + Fix decomposition.DictionaryLearning better supports readonly NumPy arrays. In particular, it better supports large datasets which are memory-mapped when it is used with coordinate descent algorithms (i.e. when fit_algorithm='cd'). #25172 by Julien Jerphanion.
  o sklearn.ensemble
    + Fix ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier and ensemble.ExtraTreesRegressor now support sparse readonly datasets. #25341 by Julien Jerphanion.
  o sklearn.feature_extraction
    + Fix feature_extraction.FeatureHasher raises an informative error when the input is a list of strings. #25094 by Thomas Fan.
  o sklearn.linear_model
    + Fix Fix a regression in linear_model.SGDClassifier and linear_model.SGDRegressor that makes them unusable with the verbose parameter set to a value greater than 0. #25250 by Jérémie Du Boisberranger.
  o sklearn.manifold
    + Fix manifold.TSNE now works correctly when output type is set to pandas. #25370 by Tim Head.
  o sklearn.model_selection
    + Fix model_selection.cross_validate with multimetric scoring: when some scorers fail, the non-failing scorers now return proper scores instead of error_score values. #23101 by András Simon and Thomas Fan.
  o sklearn.neural_network
    + Fix neural_network.MLPClassifier and neural_network.MLPRegressor no longer raise warnings when fitting data with feature names. #24873 by Tim Head.
  o sklearn.preprocessing
    + Fix preprocessing.FunctionTransformer.inverse_transform correctly supports DataFrames that are all numerical when check_inverse=True. #25274 by Thomas Fan.
    + Fix preprocessing.SplineTransformer.get_feature_names_out correctly returns feature names when extrapolations="periodic". #25296 by Thomas Fan.
  o sklearn.tree
    + Fix tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor now support sparse readonly datasets. #25341 by Julien Jerphanion.
  o sklearn.utils
    + Fix Restore utils.check_array’s behaviour for pandas Series of type boolean. The type is maintained, instead of converting to float64. #25147 by Tim Head.
* API Change utils.fixes.delayed is deprecated in 1.2.1 and will be removed in 1.5. Instead, import utils.parallel.delayed and use it in conjunction with the newly introduced utils.parallel.Parallel to ensure proper propagation of the scikit-learn configuration to the workers. #25363 by Guillaume Lemaitre.
* Sun Jan 15 2023 Ben Greiner - Update to version 1.2.0
* Pandas output with set_output API
* Interaction constraints in Histogram-based Gradient Boosting Trees
* New and enhanced displays
* Faster parser in fetch_openml
* Experimental Array API support in LinearDiscriminantAnalysis
* Improved efficiency of many estimators
- Drop sklearn-pr24283-gradient-segfault.patch
- PEP517 build
* Thu Oct 27 2022 Ben Greiner - Update to version 1.1.3
* This bugfix release only includes fixes for compatibility with the latest SciPy release >= 1.9.2.
- Update sklearn-pr24283-gradient-segfault.patch
* Tue Oct 11 2022 Ben Greiner - Update dependencies
- Add sklearn-pr24283-gradient-segfault.patch
* gh#scikit-learn/scikit-learn#24283
- Update test suite setup.
* Sat Sep 10 2022 Arun Persaud - update to version 1.1.2:
* Changes
  + Fix A default HTML representation is shown for meta-estimators with invalid parameters. #24015 by Thomas Fan.
  + Fix Add support for F-contiguous arrays for estimators and functions whose back-end have been changed in 1.1. #23990 by Julien Jerphanion.
  + Fix Wheels are now available for MacOS 10.9 and greater. #23833 by Thomas Fan.
* sklearn.base + Fix The get_params method of the BaseEstimator class now supports estimators with type-type params that have the get_params method. #24017 by Henry Sorsky.
* sklearn.cluster + Fix Fixed a bug in cluster.Birch that could trigger an error when splitting a node if there are duplicates in the dataset. #23395 by Jérémie du Boisberranger.
* sklearn.feature_selection + Fix feature_selection.SelectFromModel defaults to selection threshold 1e-5 when the estimator is either linear_model.ElasticNet or linear_model.ElasticNetCV with l1_ratio equal to 1, or linear_model.LassoCV. #23636 by Hao Chun Chang.
* sklearn.impute + Fix impute.SimpleImputer uses the dtype seen in fit for transform when the dtype is object. #22063 by Thomas Fan.
* sklearn.linear_model
  + Fix Use dtype-aware tolerances for the validation of gram matrices (passed by users or precomputed). #22059 by Malte S. Kurz.
  + Fix Fixed an error in linear_model.LogisticRegression with solver="newton-cg", fit_intercept=True, and a single feature. #23608 by Tom Dupre la Tour.
* sklearn.manifold + Fix manifold.TSNE now throws a ValueError when fit with perplexity>=n_samples to ensure mathematical correctness of the algorithm. #10805 by Mathias Andersen and #23471 by Meekail Zain.
* sklearn.metrics + Fix Fixed error message of metrics.coverage_error for 1D array input. #23548 by Hao Chun Chang.
* sklearn.preprocessing + Fix preprocessing.OrdinalEncoder.inverse_transform correctly handles use cases where unknown_value or encoded_missing_value is nan. #24087 by Thomas Fan.
* sklearn.tree + Fix Fixed invalid memory access bug during fit in tree.DecisionTreeRegressor and tree.DecisionTreeClassifier. #23273 by Thomas Fan.
* Tue May 31 2022 Arun Persaud - specfile
* updated numpy, scipy, and matplotlib requirements
- update to version 1.1.1:
* Enhancement The error message is improved when importing model_selection.HalvingGridSearchCV, model_selection.HalvingRandomSearchCV, or impute.IterativeImputer without importing the experimental flag. #23194 by Thomas Fan.
* Enhancement Added an extension in doc/conf.py to automatically generate the list of estimators that handle NaN values. #23198 by Lise Kleiber, Zhehao Liu and Chiara Marmo.
* sklearn.datasets + Fix Avoid timeouts in datasets.fetch_openml by not passing a timeout argument, #23358 by Loïc Estève.
* sklearn.decomposition + Fix Avoid spurious warning in decomposition.IncrementalPCA when n_samples == n_components. #23264 by Lucy Liu.
* sklearn.feature_selection + Fix The partial_fit method of feature_selection.SelectFromModel now conducts validation for max_features and feature_names_in parameters. #23299 by Long Bao.
* sklearn.metrics + Fix Fixes metrics.precision_recall_curve to compute precision-recall at 100% recall. The Precision-Recall curve now displays the last point corresponding to a classifier that always predicts the positive class: recall=100% and precision=class balance. #23214 by Stéphane Collot and Max Baak.
* sklearn.preprocessing + Fix preprocessing.PolynomialFeatures with degree equal to 0 will raise error when include_bias is set to False, and outputs a single constant array when include_bias is set to True. #23370 by Zhehao Liu.
* sklearn.tree + Fix Fixes performance regression with low cardinality features for tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.GradientBoostingClassifier, and ensemble.GradientBoostingRegressor. #23410 by Loïc Estève.
* sklearn.utils + Fix utils.class_weight.compute_sample_weight now works with sparse y. #23115 by kernc.
- changes from version 1.1.0: long changelog, see https://scikit-learn.org/stable/whats_new/v1.1.html#version-1-1-0
* Mon May 30 2022 Steve Kowalik - Split up to using multibuild per Python version since the test suite may take a while.
* Wed Feb 02 2022 Steve Kowalik - Update to 1.0.2:
* Fixed an infinite loop in cluster.SpectralClustering by moving an iteration counter from try to except. #21271 by Tyler Martin.
* datasets.fetch_openml is now thread safe. Data is first downloaded to a temporary subfolder and then renamed. #21833 by Siavash Rezazadeh.
* Fixed the constraint on the objective function of decomposition.DictionaryLearning, decomposition.MiniBatchDictionaryLearning, decomposition.SparsePCA and decomposition.MiniBatchSparsePCA to be convex and match the referenced article. #19210 by Jérémie du Boisberranger.
* ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor, and ensemble.RandomTreesEmbedding now raise a ValueError when bootstrap=False and max_samples is not None. #21295 by Haoyin Xu.
* Solve a bug in ensemble.GradientBoostingClassifier where the exponential loss was computing the positive gradient instead of the negative one. #22050 by Guillaume Lemaitre.
* Fixed feature_selection.SelectFromModel by improving support for base estimators that do not set feature_names_in_. #21991 by Thomas Fan.
* Fix a bug in linear_model.RidgeClassifierCV where the method predict was performing an argmax on the scores obtained from decision_function instead of returning the multilabel indicator matrix. #19869 by Guillaume Lemaitre.
* linear_model.LassoLarsIC now correctly computes AIC and BIC. An error is now raised when n_features > n_samples and when the noise variance is not provided. #21481 by Guillaume Lemaitre and Andrés Babino.
* Fixed an unnecessary error when fitting manifold.Isomap with a precomputed dense distance matrix where the neighbors graph has multiple disconnected components. #21915 by Tom Dupre la Tour.
* All sklearn.metrics.DistanceMetric subclasses now correctly support read-only buffer attributes. This fixes a regression introduced in 1.0.0 with respect to 0.24.2. #21694 by Julien Jerphanion.
* neighbors.KDTree and neighbors.BallTree correctly support read-only buffer attributes. #21845 by Thomas Fan.
* Fixes compatibility bug with NumPy 1.22 in preprocessing.OneHotEncoder. #21517 by Thomas Fan.
* Prevents tree.plot_tree from drawing out of the boundary of the figure. #21917 by Thomas Fan.
* Support loading pickles of decision tree models when the pickle has been generated on a platform with a different bitness. A typical example is to train and pickle the model on 64 bit machine and load the model on a 32 bit machine for prediction. #21552 by Loïc Estève.
* Non-fit methods in the following classes do not raise a UserWarning when fitted on DataFrames with valid feature names: covariance.EllipticEnvelope, ensemble.IsolationForest, ensemble.AdaBoostClassifier, neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor. #21199 by Thomas Fan.
* Fixed calibration.CalibratedClassifierCV to take into account sample_weight when computing the base estimator prediction when ensemble=False. #20638 by Julien Bohné.
* Fixed a bug in calibration.CalibratedClassifierCV with method="sigmoid" that was ignoring the sample_weight when computing the Bayesian priors. #21179 by Guillaume Lemaitre.
* Compute y_std properly with multi-target in sklearn.gaussian_process.GaussianProcessRegressor allowing proper normalization in multi-target scene. #20761 by Patrick de C. T. R. Ferreira.
* Fixed a bug in feature_extraction.CountVectorizer and feature_extraction.TfidfVectorizer by raising an error when ‘min_df’ or ‘max_df’ are floating-point numbers greater than 1. #20752 by Alek Lefebvre.
* linear_model.LogisticRegression now raises a better error message when the solver does not support sparse matrices with int64 indices. #21093 by Tom Dupre la Tour.
* neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor with metric="precomputed" raise an error for bsr and dok sparse matrices in methods: fit, kneighbors and radius_neighbors, due to handling of explicit zeros in bsr and dok sparse graph formats. #21199 by Thomas Fan.
* pipeline.Pipeline.get_feature_names_out correctly passes feature names out from one step of a pipeline to the next. #21351 by Thomas Fan.
* svm.SVC and svm.SVR check for an inconsistency in its internal representation and raise an error instead of segfaulting. This fix also resolves CVE-2020-28975. #21336 by Thomas Fan.
* manifold.TSNE now avoids numerical underflow issues during affinity matrix computation.
* manifold.Isomap now connects disconnected components of the neighbors graph along some minimum distance pairs, instead of changing every infinite distance to zero.
* Many others, see full changelog at https://scikit-learn.org/dev/whats_new/v1.0.html
* Sun Jun 06 2021 Dirk Müller - update to 0.24.2:
* a lot of bugfixes, see https://scikit-learn.org/stable/whats_new/v0.24.html
- drop scikit-learn-pr19101-npfloat.patch: upstream
* Sat Feb 13 2021 Ben Greiner - Add scikit-learn-pr19101-npfloat.patch in order to work with NumPy 1.20
* Fri Jan 22 2021 Benjamin Greiner - Skip python36 because SciPy 1.6.0 dropped it
- optionally enable more tests with matplotlib and pandas by --with extratests
* Fri Jan 22 2021 andy great - Skip test_convergence_dtype_consistency on 32 bit arch due to precision-related errors. https://github.com/scikit-learn/scikit-learn/issues/19230
- Remove explicit dependency python-matplotlib
* Wed Jan 20 2021 andy great - Remove assert_allclose-for-FP-comparison.patch, fixed.
- Update to version 0.24.1.
* sklearn.metrics
* Fix numerical stability bug that could happen in metrics.adjusted_mutual_info_score and metrics.mutual_info_score with NumPy 1.20+.
* sklearn.semi_supervised
* Fix semi_supervised.SelfTrainingClassifier now accepts a meta-estimator (e.g. ensemble.StackingClassifier). The validation of this estimator is done on the fitted estimator, once we know the existence of the method predict_proba.
- Updates for version 0.24.0.
* sklearn.base
* Fix base.BaseEstimator.get_params now will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.
* sklearn.calibration
* Efficiency calibration.CalibratedClassifierCV.fit now supports parallelization via joblib.Parallel using argument n_jobs.
* Enhancement Allow calibration.CalibratedClassifierCV use with prefit pipeline.Pipeline where the data X is not array-like, sparse matrix or dataframe at the start. #17546 by Lucy Liu.
* Enhancement Add ensemble parameter to calibration.CalibratedClassifierCV, which enables implementation of calibration via an ensemble of calibrators (current method) or just one calibrator using all the data (similar to the built-in feature of sklearn.svm estimators with the probability=True parameter).
* sklearn.cluster
* Enhancement cluster.AgglomerativeClustering has a new parameter compute_distances. When set to True, distances between clusters are computed and stored in the distances_ attribute even when the parameter distance_threshold is not used. This new parameter is useful to produce dendrogram visualizations, but introduces a computational and memory overhead.
* Enhancement cluster.SpectralClustering and cluster.spectral_clustering have a new keyword argument verbose. When set to True, additional messages will be displayed which can aid with debugging. #18052 by Sean O. Stalley.
* Enhancement Added cluster.kmeans_plusplus as public function. Initialization by KMeans++ can now be called separately to generate initial cluster centroids.
* API Change cluster.MiniBatchKMeans attributes, counts_ and init_size_, are deprecated and will be removed in 1.1 (renaming of 0.26).
* sklearn.compose
* Fix compose.ColumnTransformer will skip transformers whose column selector is a list of bools that are all False.
* Fix compose.ColumnTransformer now displays the remainder in the diagram display. #18167 by Thomas Fan.
* Fix compose.ColumnTransformer enforces strict count and order of column names between fit and transform by raising an error instead of a warning, following the deprecation cycle.
* sklearn.covariance
* API Change Deprecates cv_alphas_ in favor of cv_results_['alphas'] and grid_scores_ in favor of split scores in cv_results_ in covariance.GraphicalLassoCV. cv_alphas_ and grid_scores_ will be removed in version 1.1 (renaming of 0.26).
* sklearn.cross_decomposition
* Fixed a bug in cross_decomposition.PLSSVD which would sometimes return components in the reversed order of importance.
* Fixed a bug in cross_decomposition.PLSSVD, cross_decomposition.CCA, and cross_decomposition.PLSCanonical, which would lead to incorrect predictions for est.transform(Y) when the training data is single-target.
* Fix Increases the stability of cross_decomposition.CCA
* API Change For decomposition.NMF, the init value, when init=None and n_components <= min(n_samples, n_features), will be changed from 'nndsvd' to 'nndsvda' in 1.1 (renaming of 0.26).
* API Change The bounds of the n_components parameter are now restricted:
* into [1, min(n_samples, n_features, n_targets)], for cross_decomposition.PLSSVD, cross_decomposition.CCA, and cross_decomposition.PLSCanonical.
* into [1, n_features] for cross_decomposition.PLSRegression.
* An error will be raised in 1.1 (renaming of 0.26).
* API Change For cross_decomposition.PLSSVD, cross_decomposition.CCA, and cross_decomposition.PLSCanonical, the x_scores_ and y_scores_ attributes were deprecated and will be removed in 1.1 (renaming of 0.26). They can be retrieved by calling transform on the training data. The norm_y_weights attribute will also be removed. #17095 by Nicolas Hug.
* API Change For cross_decomposition.PLSRegression, cross_decomposition.PLSCanonical, cross_decomposition.CCA, and cross_decomposition.PLSSVD, the x_mean_, y_mean_, x_std_, and y_std_ attributes were deprecated and will be removed in 1.1 (renaming of 0.26).
* Fix decomposition.TruncatedSVD becomes deterministic by using the random_state. It controls the weights’ initialization of the underlying ARPACK solver.
* sklearn.datasets
* Feature datasets.fetch_openml now validates md5 checksum of arff files downloaded or cached to ensure data integrity.
* Enhancement datasets.fetch_openml now allows argument as_frame to be ‘auto’, which tries to convert returned data to pandas DataFrame unless data is sparse. #17396 by Jiaxiang.
* Enhancement datasets.fetch_covtype now supports the optional argument as_frame; when it is set to True, the returned Bunch object’s data and frame members are pandas DataFrames, and the target member is a pandas Series.
* Enhancement datasets.fetch_kddcup99 now supports the optional argument as_frame; when it is set to True, the returned Bunch object’s data and frame members are pandas DataFrames, and the target member is a pandas Series.
* Enhancement datasets.fetch_20newsgroups_vectorized now supports loading as a pandas DataFrame by setting as_frame=True.
* API Change The default value of as_frame in datasets.fetch_openml is changed from False to ‘auto’.
* Many more updates and fixes.
- Skip tests for test_fetch_openml_verify_checksum[True] and test_fetch_openml_verify_checksum[False], not sure why they fail.
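The public cluster.kmeans_plusplus function mentioned in the 0.24.0 notes can be called on its own to obtain initial centroids, as in the sketch below. The two synthetic blobs are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

# Assumed toy data: two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 2)),
])

# Returns the chosen centers and the row indices of X they were taken from.
centers, indices = kmeans_plusplus(X, n_clusters=2, random_state=0)

print(centers.shape, indices)
```

The returned centers are actual rows of `X`, which makes them usable as an explicit `init` array for a later KMeans fit.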
 