Prepare release 0.14 (#1262)

mfeurer · web-flow · commit d940e0ebfe70 · 2023-07-03T11:17:36.000+02:00
* Bump version number and add changelog

* Incorporate feedback from Pieter

* Fix unit test

* Make assert less strict

* Update release notes

* Fix indent
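
Note on the relaxed assert: the batching test only exercises the multi-batch path when the server holds more results than ``batch_size``. The sketch below illustrates that reasoning with a hypothetical ``list_all`` pager and a fake listing call holding an assumed 1100 items. Both are stand-ins invented for illustration. They are not OpenML-Python's ``_list_all`` implementation, and the 1100 figure is not a measured server count.

```python
from typing import Callable, List, Tuple


def fake_listing_call(offset: int, size: int, total: int = 1100) -> List[int]:
    """Hypothetical stand-in for a server listing endpoint holding `total` items."""
    return list(range(offset, min(offset + size, total)))


def list_all(
    listing_call: Callable[[int, int], List[int]], batch_size: int
) -> Tuple[List[int], int]:
    """Hypothetical pager: request batches until a short batch signals the end."""
    results: List[int] = []
    n_calls = 0
    offset = 0
    while True:
        batch = listing_call(offset, batch_size)
        n_calls += 1
        results.extend(batch)
        if len(batch) < batch_size:
            # A batch shorter than batch_size means nothing is left to fetch.
            break
        offset += batch_size
    return results, n_calls


# With batch_size=1050 and 1100 items, a second request is needed.
res, n_calls = list_all(fake_listing_call, batch_size=1050)
assert len(res) > 1050 and n_calls > 1

# With batch_size=2000 a single request returns everything, so the
# multi-batch path would never be exercised.
res_big, n_calls_big = list_all(fake_listing_call, batch_size=2000)
assert n_calls_big == 1
```

Under these assumptions, getting back more than ``batch_size`` results implies that more than one request was made, which is what the assert in the test relies on.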
diff --git a/doc/progress.rst b/doc/progress.rst
@@ -6,25 +6,55 @@
 Changelog
 =========
 
+0.14.0
+~~~~~~
+
+**IMPORTANT:** This release paves the way towards a breaking update of OpenML-Python. From version
+0.15, functions that had the option to return a pandas DataFrame will return a pandas DataFrame
+by default. This version (0.14) emits a warning if you still use the old access functionality.
+More concretely:
+
+* In 0.15 we will drop the ability to return dictionaries in listing calls and only provide
+  pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame
+  (using ``output_format="dataframe"``).
+* In 0.15 we will drop the ability to return datasets as numpy arrays and only provide
+  pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame
+  (using ``dataset_format="dataframe"``).
+
+Furthermore, from version 0.15, OpenML-Python will no longer download datasets and dataset metadata
+by default. This version (0.14) emits a warning if you don't explicitly specify the desired behavior.
+
+Please see the pull requests #1258 and #1260 for further information.
+
+* ADD #1081: New flag that allows disabling downloading dataset features.
+* ADD #1132: New flag that forces a redownload of cached data.
+* FIX #1244: Fixes a rare bug where task listing could fail when the server returned invalid data.
+* DOC #1229: Fixes a comment string for the main example.
+* DOC #1241: Fixes a comment in an example.
+* MAINT #1124: Improve naming of helper functions that govern the cache directories.
+* MAINT #1223, #1250: Update tools used in pre-commit to the latest versions (``black==23.3.0``, ``mypy==1.3.0``, ``flake8==6.0.0``).
+* MAINT #1253: Update the citation request to the JMLR paper.
+* MAINT #1246: Warn the user that checking for duplicate runs on the server cannot be done without an API key.
+
 0.13.1
 ~~~~~~
 
- * ADD #1081 #1132: Add additional options for (not) downloading datasets ``openml.datasets.get_dataset`` and cache management.
- * ADD #1028: Add functions to delete runs, flows, datasets, and tasks (e.g., ``openml.datasets.delete_dataset``).
- * ADD #1144: Add locally computed results to the ``OpenMLRun`` object's representation if the run was created locally and not downloaded from the server.
- * ADD #1180: Improve the error message when the checksum of a downloaded dataset does not match the checksum provided by the API.
- * ADD #1201: Make ``OpenMLTraceIteration`` a dataclass.
- * DOC #1069: Add argument documentation for the ``OpenMLRun`` class.
- * DOC #1241 #1229 #1231: Minor documentation fixes and resolve documentation examples not working.
- * FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the ``OpenMLRun`` object and in ``format_prediction``.
- * FIX #1198: Support numpy 1.24 and higher.
- * FIX #1216: Allow unknown task types on the server. This is only relevant when new task types are added to the test server.
- * FIX #1223: Fix mypy errors for implicit optional typing.
- * MAINT #1155: Add dependabot github action to automatically update other github actions.
- * MAINT #1199: Obtain pre-commit's flake8 from github.com instead of gitlab.com.
- * MAINT #1215: Support latest numpy version.
- * MAINT #1218: Test Python3.6 on Ubuntu 20.04 instead of the latest Ubuntu (which is 22.04).
- * MAINT #1221 #1212 #1206 #1211: Update github actions to the latest versions.
+* ADD #1081 #1132: Add additional options for (not) downloading datasets ``openml.datasets.get_dataset`` and cache management.
+* ADD #1028: Add functions to delete runs, flows, datasets, and tasks (e.g., ``openml.datasets.delete_dataset``).
+* ADD #1144: Add locally computed results to the ``OpenMLRun`` object's representation if the run was created locally and not downloaded from the server.
+* ADD #1180: Improve the error message when the checksum of a downloaded dataset does not match the checksum provided by the API.
+* ADD #1201: Make ``OpenMLTraceIteration`` a dataclass.
+* DOC #1069: Add argument documentation for the ``OpenMLRun`` class.
+* DOC #1241 #1229 #1231: Minor documentation fixes and resolve documentation examples not working.
+* FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the ``OpenMLRun`` object and in ``format_prediction``.
+* FIX #1198: Support numpy 1.24 and higher.
+* FIX #1216: Allow unknown task types on the server. This is only relevant when new task types are added to the test server.
+* FIX #1223: Fix mypy errors for implicit optional typing.
+* MAINT #1155: Add dependabot github action to automatically update other github actions.
+* MAINT #1199: Obtain pre-commit's flake8 from github.com instead of gitlab.com.
+* MAINT #1215: Support latest numpy version.
+* MAINT #1218: Test Python3.6 on Ubuntu 20.04 instead of the latest Ubuntu (which is 22.04).
+* MAINT #1221 #1212 #1206 #1211: Update github actions to the latest versions.
 
 0.13.0
 ~~~~~~
diff --git a/openml/__version__.py b/openml/__version__.py
@@ -3,4 +3,4 @@
 # License: BSD 3-Clause
 
 # The following line *must* be the last in the module, exactly as formatted:
-__version__ = "0.14.0dev"
+__version__ = "0.14.0"
diff --git a/tests/test_utils/test_utils.py b/tests/test_utils/test_utils.py
@@ -22,17 +22,22 @@ def test_list_all(self):
 
     def test_list_all_with_multiple_batches(self):
         res = openml.utils._list_all(
-            listing_call=openml.tasks.functions._list_tasks, output_format="dict", batch_size=2000
+            listing_call=openml.tasks.functions._list_tasks, output_format="dict", batch_size=1050
         )
         # Verify that test server state is still valid for this test to work as intended
-        #  -> If the number of results is less than 2000, the test can not test the
-        #  batching operation.
-        assert len(res) > 2000
+        #  -> If the number of results is less than 1050, the test cannot exercise the
+        #  batching operation. With more than 1050 results we know that batching
+        #  was triggered. 1050 appears to be a number of tasks that is available on a
+        #  fresh test server.
+        assert len(res) > 1050
         openml.utils._list_all(
             listing_call=openml.tasks.functions._list_tasks,
             output_format="dataframe",
-            batch_size=2000,
+            batch_size=1050,
         )
+        # Comparing the number of tasks is not possible as other unit tests running in
+        # parallel might be adding or removing tasks!
+        # assert len(res) <= len(res2)
 
     @unittest.mock.patch("openml._api_calls._perform_api_call", side_effect=mocked_perform_api_call)
     def test_list_all_few_results_available(self, _perform_api_call):