Tags · scaleapi/nucleus-python-client

v0.18.3

[DE-7859] Expose pHash on DatasetItem (v0.18.3) (#461)

* [DE-7859] Expose pHash on DatasetItem (v0.18.3)

Add a `phash` field to the DatasetItem dataclass and thread it through
`from_json`. Because every SDK method that returns a DatasetItem
(items_and_annotation_generator, items_generator, query_items,
dataset.items, iloc/refloc/loc) deserializes through DatasetItem.from_json,
exposing the field there is sufficient — no per-method changes required.
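
A minimal sketch of why a single change in `from_json` is enough. `ItemSketch` is a hypothetical stand-in for DatasetItem, and the payload keys (`reference_id`, `image_location`, `phash`) are assumptions for illustration, not the SDK's confirmed wire format.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ItemSketch:
    """Hypothetical stand-in for DatasetItem; not the SDK's real class."""

    reference_id: str
    image_location: Optional[str] = None
    phash: Optional[str] = None  # may be missing from some responses

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "ItemSketch":
        # Single deserialization point: every caller that builds items
        # through here sees the new field without further changes.
        return cls(
            reference_id=payload["reference_id"],
            image_location=payload.get("image_location"),
            phash=payload.get("phash"),  # None when the key is absent
        )


with_hash = ItemSketch.from_json({"reference_id": "img-1", "phash": "abc123"})
without_hash = ItemSketch.from_json({"reference_id": "img-2"})
```

Using `payload.get` keeps deserialization tolerant of responses that omit the hash.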

Also adds a top-level CLAUDE.md with release/branch conventions and
architecture pointers for future Claude Code sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Tighten phash field comment

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Loosen test_dataset_append_async — don't pin to step counts

The upload job pipeline plans with N total_steps initially, then
dynamically collapses to a single step once it knows how to
short-circuit (small input → batched upload). By the time
sleep_until_complete() returns, status() always reports total_steps=1,
completed_steps=1 — so the hard-coded expectation of 5/5 deterministically
fails on the current backend.

Drop the step-count assertions and keep the meaningful invariants:
job completed successfully, progress is 1.00, and
completed_steps == total_steps (whatever they are).
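
The loosened invariants could look roughly like this. The status keys and the `"Completed"` value are assumed for illustration; the real `status()` payload may differ.

```python
from typing import Any, Dict


def assert_job_finished(status: Dict[str, Any]) -> None:
    # Keys are assumed for illustration, not taken from the SDK.
    assert status["status"] == "Completed"
    assert float(status["job_progress"]) == 1.0
    # Compare the counters to each other instead of pinning them to 5.
    assert status["completed_steps"] == status["total_steps"]


# Both the collapsed (1/1) and the originally planned (5/5) shapes pass.
collapsed = {"status": "Completed", "job_progress": "1.00",
             "completed_steps": 1, "total_steps": 1}
planned = {"status": "Completed", "job_progress": "1.00",
           "completed_steps": 5, "total_steps": 5}
assert_job_finished(collapsed)
assert_job_finished(planned)
```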

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix flaky dedup tests: compare unique_item_ids as sets

`test_deduplicate_*_by_ids` runs dedup over the surviving set returned
from a prior dedup and asserts the second result equals the first.
The set of survivors is well-defined, but the backend doesn't guarantee
a stable list order across runs — the "kept" list depends on the order
in which the deduplication loop visits items, and that order can differ
between the whole-dataset (cursor-paginated) and by-ids (batched-by-input)
code paths.

Asserting list equality therefore fails intermittently when the same
items come back in a different order. Switch all four call sites
(image / video-scene / video-url / by-ids-returns-job) to set comparison.
The other invariants (length, `original_count`) still hold.
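
As a small illustration with made-up IDs, an order-insensitive check keeps the length assertion and compares the survivors as sets:

```python
# Hypothetical survivor lists from two dedup runs over the same items.
first_run = ["item_a", "item_c", "item_b"]
second_run = ["item_b", "item_a", "item_c"]

lists_match = first_run == second_run  # order-sensitive comparison
same_length = len(first_run) == len(second_run)
sets_match = set(first_run) == set(second_run)  # order-insensitive
```

Keeping the length check alongside the set comparison still catches a run that returns a duplicated ID.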

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Exclude phash from DatasetItem __eq__

Adding `phash` as a regular dataclass field made every `item1 == item2`
comparison sensitive to whether the backend had populated the hash —
which it doesn't on every endpoint (some handlers cherry-pick columns
and exclude phash, others select all columns and include it). Tests
that constructed a DatasetItem locally and then compared it to the
backend round-trip (test_append_and_export, test_slice_dataset_item_iterator)
broke as a result.

phash is a derived value (computed from image_location), so two items
with the same source image should compare equal regardless of whether
their hashes happen to be populated. Mark the field `compare=False` so
auto-generated __eq__ ignores it, matching the natural semantics.
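
A minimal sketch of the `compare=False` behavior, using a hypothetical `ComparableItem` class rather than the real DatasetItem:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComparableItem:
    """Hypothetical stand-in for DatasetItem."""

    reference_id: str
    image_location: Optional[str] = None
    # compare=False leaves this field out of the generated __eq__.
    phash: Optional[str] = field(default=None, compare=False)


local = ComparableItem("img-1", "s3://bucket/img-1.jpg")
round_tripped = ComparableItem("img-1", "s3://bucket/img-1.jpg", phash="abc123")
other = ComparableItem("img-2", "s3://bucket/img-2.jpg", phash="abc123")

items_equal = local == round_tripped  # differ only in phash
items_differ = local != other  # differ in compared fields
```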

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test_dataset_tags: poll fresh get_tags() instead of asserting on remove_tags response

The DELETE /tags handler refetches the tag list immediately after the
delete and returns it. In prod that refetch can hit a read replica that
hasn't yet replayed the DELETE, so the response includes the just-deleted
tag — making the test fail. A separate follow-up request always sees the
correct state (verified against api.scale.com — first poll is already
consistent at ~25ms round-trip).

Tighten the test against the post-state by polling get_tags() with a 5s
settle window, rather than trusting the remove_tags response. The same change
is applied to the idempotent-remove follow-up assertion. A backend fix is
deferred, since the inconsistency is bounded and not user-impacting.
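
One way to sketch the settle-window polling. `wait_until_tag_absent` is a hypothetical helper, not the test's actual code; it takes any zero-argument callable that returns the current tag list.

```python
import time
from typing import Callable, List


def wait_until_tag_absent(
    get_tags: Callable[[], List[str]],
    tag: str,
    timeout_s: float = 5.0,
    interval_s: float = 0.1,
) -> bool:
    """Poll get_tags() until `tag` is gone or the settle window expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if tag not in get_tags():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Simulated backend: the first read is stale, the second is consistent.
_responses = iter([["vendor-a", "stale"], ["vendor-a"]])
settled = wait_until_tag_absent(lambda: next(_responses), "stale", interval_s=0.01)
```

In a real test the callable would be something like `dataset.get_tags`, and the assertion would be on the returned boolean.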

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

May 20, 2026
280f3bc

v0.18.2

Add dataset tags to SDK for identification (DE-7033) (#456)

Expose dataset tags through the Python SDK so customers can identify
datasets labeled by Scale vs other vendors via the API.

- Add `tags` field to DatasetInfo model (returned by dataset.info())
- Add get_tags(), add_tags(), remove_tags() methods to Dataset class
- Use POST /tags/remove instead of DELETE to avoid proxy body-stripping
- Use pydantic v1/v2 compat shim for null-coercion validator
- Guard against passing a bare string instead of a list
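
The bare-string guard matters because a Python string is itself a sequence of characters, so `"scale"` would otherwise be treated as five one-letter tags. A minimal sketch, assuming the guard raises (the SDK may handle it differently); `normalize_tags` is a hypothetical helper:

```python
from typing import List, Sequence, Union


def normalize_tags(tags: Union[str, Sequence[str]]) -> List[str]:
    # Assumption: reject a bare string instead of silently iterating
    # over its characters.
    if isinstance(tags, str):
        raise TypeError("tags must be a list of strings, e.g. ['scale']")
    return list(tags)


normalized = normalize_tags(["scale", "vendor-a"])
```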

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

May 8, 2026
5c4b847

v0.18.1

[DE-7784] Migrate sdk image deduplication to async mode (#459)

* Add support for async dedup

* Update sphinx package versions

* Update sdk version to 0.18.1

* Remove support for sync dedup in sdk

* Update CHANGELOG

* Address greptile

May 6, 2026
3321deb

v0.17.14

Enforce mutually exclusive api_key and limited_access_key inputs in NucleusClient and fix all remaining Sphinx doc build warnings (#457)

* Make different auth keys mutually exclusive

* Fix mypy errors

* Re-add UploadResponse export from init file

* Undo removal of unused imports
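
A mutual-exclusivity check of this kind might look like the sketch below. `pick_auth_key` is a hypothetical helper; the exception type and the behavior when neither key is given are assumptions, not the SDK's confirmed behavior.

```python
from typing import Optional


def pick_auth_key(
    api_key: Optional[str] = None,
    limited_access_key: Optional[str] = None,
) -> Optional[str]:
    # Assumption: supplying both keys is rejected with ValueError.
    if api_key is not None and limited_access_key is not None:
        raise ValueError("Pass either api_key or limited_access_key, not both.")
    # Returns None when neither key is given; a real client may fall
    # back to another source at that point.
    return api_key if api_key is not None else limited_access_key


chosen = pick_auth_key(limited_access_key="<LIMITED_ACCESS_KEY>")
```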

Apr 16, 2026
93cb518

v0.17.13

Release v0.17.13 (#455)

* Update package version to enable release of latest changes on master

* Update CHANGELOG

Mar 6, 2026
400dfd8

v0.17.12

[DE-6999] Enable image deduplication within nucleus sdk (#452)

* Enable deduplication in nucleus sdk

* Lint fixes

* Fix import order

* Add tests for deduplication sdk

* Fix isort import formatting errors

* Add fixture for image dataset specifically for dedup

* Fix image dataset creation syntax

* Create image dataset synchronously

* Make dataset_with_duplicates fixture sync

* Add dedup test for scene made with video url

* Document difference between deduplicate and deduplicate_by_ids better in docstring

* Add tests to cover all ingestion forms

* Refactor tests to use DEDUP_DEFAULT_TEST_THRESHOLD constant

* Use try-finally for dataset creation and deletion

* Make edge case test docstrings more detailed

* Remove deprecated video sync upload tests

* Update test_jobs to be deterministic

* Split jobs tests into listing and retrieval separately

* Fix docstring typo

Mar 2, 2026
878ca05

v0.17.11

Added support for limited access keys (to be used alongside or in place of API keys)

Example usage:
```
c = nucleus.NucleusClient(limited_access_key="<LIMITED_ACCESS_KEY>")
```

```
c = nucleus.NucleusClient(api_key="<API_KEY>", limited_access_key="<LIMITED_ACCESS_KEY>")
```

```
c = nucleus.NucleusClient(api_key="<API_KEY>")
```

Nov 14, 2025
671f475

v0.17.9

export_class_label support

Mar 13, 2025
eb389ce

v0.17.8

Task-based annotations

Jan 2, 2025
b9ec464

v0.17.7

 v0.17.7

Nov 6, 2024
b06ea97