Tags: scaleapi/nucleus-python-client

v0.18.3

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[DE-7859] Expose pHash on DatasetItem (v0.18.3) (#461)

* [DE-7859] Expose pHash on DatasetItem (v0.18.3)

Add a `phash` field to the DatasetItem dataclass and thread it through
`from_json`. Because every SDK method that returns a DatasetItem
(items_and_annotation_generator, items_generator, query_items,
dataset.items, iloc/refloc/loc) deserializes through DatasetItem.from_json,
exposing the field there is sufficient — no per-method changes required.
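
A minimal sketch of why a single `from_json` change is enough. The class `SketchDatasetItem`, the accessor `items_generator` below, and the JSON key names (`image_url`, `reference_id`, `phash`) are hypothetical simplifications, not the real `nucleus.DatasetItem` schema. The point is only that every accessor funnels raw payloads through one classmethod, so reading the new key there reaches all callers.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Optional


@dataclass
class SketchDatasetItem:
    """Hypothetical, heavily simplified stand-in for nucleus.DatasetItem."""

    image_location: Optional[str] = None
    reference_id: Optional[str] = None
    # New optional field; stays None when a response omits it.
    phash: Optional[str] = None

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "SketchDatasetItem":
        # Single deserialization point (key names are assumptions).
        return cls(
            image_location=payload.get("image_url"),
            reference_id=payload.get("reference_id"),
            phash=payload.get("phash"),
        )


def items_generator(pages: Iterable[List[Dict[str, Any]]]) -> Iterator[SketchDatasetItem]:
    """Hypothetical accessor: every item is built via from_json."""
    for page in pages:
        for raw in page:
            yield SketchDatasetItem.from_json(raw)
```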

Also adds a top-level CLAUDE.md with release/branch conventions and
architecture pointers for future Claude Code sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Tighten phash field comment

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Loosen test_dataset_append_async — don't pin to step counts

The upload job pipeline initially plans N total_steps, then
dynamically collapses to a single step once it knows it can
short-circuit (small input → batched upload). By the time
sleep_until_complete() returns, status() always reports total_steps=1,
completed_steps=1, so the hard-coded expectation of 5/5 deterministically
fails on the current backend.

Drop the step-count assertions and keep the meaningful invariants:
job completed successfully, progress is 1.00, and
completed_steps == total_steps (whatever they are).
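
The kept invariants can be sketched as a small check over a status mapping. The helper name `assert_job_finished` and the keys `status`, `progress`, `completed_steps`, `total_steps`, along with the `"Completed"` value, are assumptions for illustration. They are not necessarily the exact shape returned by the SDK's `status()`.

```python
from typing import Any, Mapping


def assert_job_finished(status: Mapping[str, Any]) -> None:
    """Hypothetical check mirroring the loosened test's invariants."""
    # The job finished successfully (status value is an assumption)...
    assert status["status"] == "Completed", status
    # ...reports full progress, whether given as "1.00" or 1.0...
    assert float(status["progress"]) == 1.0, status
    # ...and ran every step it planned, however many that turned out to be.
    assert status["completed_steps"] == status["total_steps"], status
```

Because the check never pins a specific step count, it passes for both a 1/1 and a 5/5 plan.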

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix flaky dedup tests: compare unique_item_ids as sets

`test_deduplicate_*_by_ids` runs dedup over the surviving set returned
from a prior dedup and asserts the second result equals the first.
The set of survivors is well-defined, but the backend doesn't guarantee
a stable list order across runs — the "kept" list depends on the order
in which the deduplication loop visits items, and that order can differ
between the whole-dataset (cursor-paginated) and by-ids (batched-by-input)
code paths.

Asserting list equality therefore fails intermittently when the same
items come back in a different order. Switch all four call sites
(image / video-scene / video-url / by-ids-returns-job) to set comparison.
The other invariants (length, `original_count`) still hold.
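
An order-insensitive comparison along these lines is sketched below. `assert_same_survivors` is a hypothetical test helper. The uniqueness check is an extra guard added here, because length-plus-set equality alone would accept `["a", "a", "b"]` vs `["a", "b", "b"]`.

```python
from typing import Sequence


def assert_same_survivors(first: Sequence[str], second: Sequence[str]) -> None:
    """Hypothetical helper: compare two dedup results ignoring order."""
    # Same number of kept items, with no repeats inside either result.
    assert len(first) == len(second)
    assert len(set(first)) == len(first)
    # Membership is well-defined across code paths; list order is not.
    assert set(first) == set(second)


whole_dataset_run = ["item_3", "item_1", "item_7"]
by_ids_run = ["item_1", "item_7", "item_3"]

# Plain list equality is order-sensitive and would fail here...
assert whole_dataset_run != by_ids_run
# ...while the set-based comparison passes.
assert_same_survivors(whole_dataset_run, by_ids_run)
```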

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Exclude phash from DatasetItem __eq__

Adding `phash` as a regular dataclass field made every `item1 == item2`
comparison sensitive to whether the backend had populated the hash —
which it doesn't on every endpoint (some handlers cherry-pick columns
and exclude phash, others select all columns and include it). Tests
that constructed a DatasetItem locally and then compared it to the
backend round-trip (test_append_and_export, test_slice_dataset_item_iterator)
broke as a result.

phash is a derived value (computed from image_location), so two items
with the same source image should compare equal regardless of whether
their hashes happen to be populated. Mark the field `compare=False` so
auto-generated __eq__ ignores it, matching the natural semantics.
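
The effect of `compare=False` on a dataclass can be shown with a hypothetical, simplified item class (`SketchItem` is not the real `DatasetItem`). `dataclasses.field(compare=False)` removes the field from the generated `__eq__`, so a locally built item and a round-tripped item that differ only in `phash` compare equal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchItem:
    """Hypothetical, simplified item; not the real nucleus.DatasetItem."""

    image_location: str
    reference_id: str
    # Derived from image_location, so left out of the generated __eq__.
    phash: Optional[str] = field(default=None, compare=False)


local = SketchItem("s3://bucket/cat.jpg", "cat")
round_trip = SketchItem("s3://bucket/cat.jpg", "cat", phash="c3a1f0e2")
assert local == round_trip  # phash is ignored by __eq__
```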

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test_dataset_tags: poll fresh get_tags() instead of asserting on remove_tags response

The DELETE /tags handler refetches the tag list immediately after the
delete and returns it. In prod that refetch can hit a read replica that
hasn't yet replayed the DELETE, so the response includes the just-deleted
tag — making the test fail. A separate follow-up request always sees the
correct state (verified against api.scale.com — first poll is already
consistent at ~25ms round-trip).

Assert on the post-delete state instead: poll get_tags() with a 5s
settle window rather than trusting the remove_tags response. The same
change applies to the idempotent-remove follow-up assertion. The backend
fix is deferred, since the inconsistency is bounded and not user-impacting.
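
A generic polling helper with a settle window could look like this sketch. `wait_for_tags`, its parameters, and the 0.1 s poll interval are assumptions. Only the 5 s window comes from the commit message. The helper takes any zero-argument callable, such as a bound `dataset.get_tags`, and compares tags without regard to order.

```python
import time
from typing import Callable, List


def wait_for_tags(
    get_tags: Callable[[], List[str]],
    expected: List[str],
    timeout_s: float = 5.0,
    interval_s: float = 0.1,
) -> List[str]:
    """Hypothetical helper: re-read tags until they match or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        tags = get_tags()  # a fresh request each time, not a cached response
        if sorted(tags) == sorted(expected):
            return tags
        if time.monotonic() >= deadline:
            raise AssertionError(f"tags {tags!r} never settled to {expected!r}")
        time.sleep(interval_s)
```

Using a monotonic clock keeps the window correct even if the wall clock is adjusted mid-test.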

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

v0.18.2

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Add dataset tags to SDK for identification (DE-7033) (#456)

Expose dataset tags through the Python SDK so customers can identify
datasets labeled by Scale vs other vendors via the API.

- Add `tags` field to DatasetInfo model (returned by dataset.info())
- Add get_tags(), add_tags(), remove_tags() methods to Dataset class
- Use POST /tags/remove instead of DELETE to avoid proxy body-stripping
- Use pydantic v1/v2 compat shim for null-coercion validator
- Guard against passing a bare string instead of a list
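
The last two bullets can be sketched in plain Python, without pydantic. The helper names `coerce_null_tags` and `require_tag_list` are hypothetical, and so is raising `TypeError`. The SDK's actual validator and guard may differ.

```python
from typing import Iterable, List, Optional


def coerce_null_tags(value: Optional[Iterable[str]]) -> List[str]:
    """Hypothetical null-coercion: a null tags value becomes an empty list."""
    return [] if value is None else list(value)


def require_tag_list(tags: Iterable[str]) -> List[str]:
    """Hypothetical guard for add_tags()/remove_tags()-style arguments."""
    # A str is itself an iterable of str, so list("scale") would silently
    # become ['s', 'c', 'a', 'l', 'e']; reject it explicitly instead.
    if isinstance(tags, str):
        raise TypeError("tags must be a list of strings, not a single string")
    return list(tags)
```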

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

v0.18.1

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[DE-7784] Migrate sdk image deduplication to async mode (#459)

* Add support for async dedup

* Update sphinx package versions

* Update sdk version to 0.18.1

* Remove support for sync dedup in sdk

* Update CHANGELOG

* Address Greptile review comments

v0.17.14

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Enforce mutually exclusive api_key and limited_access_key inputs in NucleusClient and fix all remaining Sphinx doc build warnings (#457)

* Make different auth keys mutually exclusive

* Fix mypy errors

* Re-add UploadResponse export from init file

* Undo removal of unused imports
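
A mutual-exclusivity check of this kind might look like the sketch below. `pick_credential` and the choice of `ValueError` are hypothetical. The sketch deliberately returns `None` when neither key is passed rather than guessing at the client's real fallback behaviour. Under this rule, the combined `api_key=` plus `limited_access_key=` form shown for v0.17.11 would be expected to be rejected from v0.17.14 onward.

```python
from typing import Optional


def pick_credential(
    api_key: Optional[str] = None,
    limited_access_key: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical check: accept at most one of the two credentials."""
    if api_key is not None and limited_access_key is not None:
        raise ValueError("Pass either api_key or limited_access_key, not both.")
    if api_key is not None:
        return "api_key"
    if limited_access_key is not None:
        return "limited_access_key"
    # Any fallback when neither key is given is out of scope for this sketch.
    return None
```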

v0.17.13

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Release v0.17.13 (#455)

* Update package version to enable release of latest changes on master

* Update CHANGELOG

v0.17.12

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
[DE-6999] Enable image deduplication within nucleus sdk (#452)

* Enable deduplication in nucleus sdk

* Lint fixes

* Fix import order

* Add tests for deduplication sdk

* Fix isort import formatting errors

* Add fixture for image dataset specifically for dedup

* Fix image dataset creation syntax

* Create image dataset synchronously

* Make dataset_with_duplicates fixture sync

* Add dedup test for scene made with video url

* Better document the difference between deduplicate and deduplicate_by_ids in the docstring

* Add tests to cover all ingestion forms

* Refactor tests to use DEDUP_DEFAULT_TEST_THRESHOLD constant

* Use try-finally for dataset creation and deletion

* Make edge case test docstrings more detailed

* Remove deprecated video sync upload tests

* Update test_jobs to be deterministic

* Split jobs tests into listing and retrieval separately

* Fix docstring typo
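
The "try-finally for dataset creation and deletion" pattern can be sketched with an in-memory stand-in. `FakeClient` and `run_with_temp_dataset` are hypothetical, not the SDK's client or the repository's fixtures. The `finally` clause deletes the dataset even when the test body raises, so failing tests don't leak datasets.

```python
from typing import Callable, Dict, List


class FakeClient:
    """Hypothetical in-memory stand-in for a client that manages datasets."""

    def __init__(self) -> None:
        self.datasets: Dict[str, List[str]] = {}
        self._next_id = 0

    def create_dataset(self, name: str) -> str:
        self._next_id += 1
        dataset_id = f"ds_{self._next_id}"
        self.datasets[dataset_id] = []
        return dataset_id

    def delete_dataset(self, dataset_id: str) -> None:
        self.datasets.pop(dataset_id, None)


def run_with_temp_dataset(client: FakeClient, body: Callable[[str], None]) -> None:
    """Create a dataset, run the test body, and always delete the dataset."""
    dataset_id = client.create_dataset("dedup-test")
    try:
        body(dataset_id)
    finally:
        # Runs on success and on failure alike.
        client.delete_dataset(dataset_id)
```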

v0.17.11

Added support for limited access keys (to be used alongside or in place of API keys).

Example usage:
```python
import nucleus
c = nucleus.NucleusClient(limited_access_key="<LIMITED_ACCESS_KEY>")
```

```python
import nucleus
c = nucleus.NucleusClient(api_key="<API_KEY>", limited_access_key="<LIMITED_ACCESS_KEY>")
```

```python
import nucleus
c = nucleus.NucleusClient(api_key="<API_KEY>")
```

v0.17.9

export_class_label support

v0.17.8

Task-based annotations

v0.17.7

 v0.17.7