Tags: scaleapi/nucleus-python-client
[DE-7859] Expose pHash on DatasetItem (v0.18.3) (#461)

* [DE-7859] Expose pHash on DatasetItem (v0.18.3)

  Add a `phash` field to the DatasetItem dataclass and thread it through `from_json`. Because every SDK method that returns a DatasetItem (items_and_annotation_generator, items_generator, query_items, dataset.items, iloc/refloc/loc) deserializes through DatasetItem.from_json, exposing the field there is sufficient; no per-method changes are required.

  Also adds a top-level CLAUDE.md with release/branch conventions and architecture pointers for future Claude Code sessions.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Tighten phash field comment

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Loosen test_dataset_append_async: don't pin to step counts

  The upload job pipeline plans with N total_steps initially, then dynamically collapses to a single step once it knows how to short-circuit (small input → batched upload). By the time sleep_until_complete() returns, status() always reports total_steps=1, completed_steps=1, so the hard-coded expectation of 5/5 deterministically fails on the current backend.

  Drop the step-count assertions and keep the meaningful invariants: job completed successfully, progress is 1.00, and completed_steps == total_steps (whatever they are).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix flaky dedup tests: compare unique_item_ids as sets

  `test_deduplicate_*_by_ids` runs dedup over the surviving set returned from a prior dedup and asserts the second result equals the first. The set of survivors is well-defined, but the backend doesn't guarantee a stable list order across runs: the "kept" list depends on the order in which the deduplication loop visits items, and that order can differ between the whole-dataset (cursor-paginated) and by-ids (batched-by-input) code paths. Asserting list equality therefore fails intermittently when the same items come back in a different order.

  Switch all four call sites (image / video-scene / video-url / by-ids-returns-job) to set comparison. The other invariants (length, `original_count`) still hold.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Exclude phash from DatasetItem __eq__

  Adding `phash` as a regular dataclass field made every `item1 == item2` comparison sensitive to whether the backend had populated the hash, which it doesn't on every endpoint (some handlers cherry-pick columns and exclude phash, others select all columns and include it). Tests that constructed a DatasetItem locally and then compared it to the backend round-trip (test_append_and_export, test_slice_dataset_item_iterator) broke as a result.

  phash is a derived value (computed from image_location), so two items with the same source image should compare equal regardless of whether their hashes happen to be populated. Mark the field `compare=False` so auto-generated __eq__ ignores it, matching the natural semantics.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test_dataset_tags: poll fresh get_tags() instead of asserting on remove_tags response

  The DELETE /tags handler refetches the tag list immediately after the delete and returns it. In prod that refetch can hit a read replica that hasn't yet replayed the DELETE, so the response includes the just-deleted tag, making the test fail. A separate follow-up request always sees the correct state (verified against api.scale.com; first poll is already consistent at ~25ms round-trip).

  Tighten the test against the post-state by polling get_tags() with a 5s settle window, rather than trusting the remove_tags response. Same change applied to the idempotent-remove follow-up assertion.

  Backend deferred: the inconsistency is bounded and not user-impacting.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
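The effect of excluding `phash` from `__eq__` can be illustrated with a minimal sketch. The `Item` class below is a hypothetical stand-in, not the SDK's actual `DatasetItem`; the field names other than `phash` and `image_location`, and the sample hash value, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for DatasetItem (not the SDK's real class)."""

    image_location: str
    reference_id: Optional[str] = None
    # compare=False keeps this field out of the auto-generated __eq__.
    phash: Optional[str] = field(default=None, compare=False)


# An item built locally (no hash) and the "same" item as a backend
# might return it (hash populated). The hash value is made up.
local = Item("s3://bucket/cat.jpg", "cat-1")
round_trip = Item("s3://bucket/cat.jpg", "cat-1", phash="c3d4e5f6a7b80912")

print(local == round_trip)  # True: phash is ignored by __eq__
print(round_trip.phash)     # the value is still stored and readable
```

The field stays readable on the instance; only equality (and the other auto-generated comparison methods) skips it.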
Add dataset tags to SDK for identification (DE-7033) (#456)

Expose dataset tags through the Python SDK so customers can identify datasets labeled by Scale vs other vendors via the API.

- Add `tags` field to DatasetInfo model (returned by dataset.info())
- Add get_tags(), add_tags(), remove_tags() methods to Dataset class
- Use POST /tags/remove instead of DELETE to avoid proxy body-stripping
- Use pydantic v1/v2 compat shim for null-coercion validator
- Guard against passing a bare string instead of a list

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
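The bare-string guard matters because a Python string is itself iterable, so `"scale"` passed where a list is expected would silently become five one-character tags. A minimal sketch of such a guard follows; `normalize_tags` is a hypothetical helper and the SDK's actual check and error type may differ.

```python
from typing import List, Union


def normalize_tags(tags: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: reject a bare string, return a list of tags."""
    if isinstance(tags, str):
        # list("scale") would be ['s', 'c', 'a', 'l', 'e'], so fail loudly.
        raise TypeError(
            f"tags must be a list of strings, got the bare string {tags!r}; "
            f"did you mean [{tags!r}]?"
        )
    result = list(tags)
    for tag in result:
        if not isinstance(tag, str):
            raise TypeError(f"every tag must be a string, got {tag!r}")
    return result


print(normalize_tags(["scale", "vendor-b"]))
```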
Enforce mutually exclusive api_key and limited_access_key inputs in NucleusClient and fix all remaining Sphinx doc build warnings (#457)

* Make different auth keys mutually exclusive
* Fix mypy errors
* Re-add UploadResponse export from init file
* Undo removal of unused imports
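A mutual-exclusivity check of this kind can be sketched as below. `resolve_auth_key` is a hypothetical function, and the choice of `ValueError`, the error messages, and the behaviour when neither key is given are assumptions; the actual `NucleusClient` constructor may behave differently (for example, by falling back to an environment variable).

```python
from typing import Optional, Tuple


def resolve_auth_key(
    api_key: Optional[str] = None,
    limited_access_key: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical check: exactly one of the two keys may be supplied."""
    if api_key is not None and limited_access_key is not None:
        raise ValueError("Pass either api_key or limited_access_key, not both.")
    if api_key is not None:
        return ("api_key", api_key)
    if limited_access_key is not None:
        return ("limited_access_key", limited_access_key)
    # Assumption: with no key at all, this sketch simply raises.
    raise ValueError("One of api_key or limited_access_key is required.")


print(resolve_auth_key(api_key="<API_KEY>"))
print(resolve_auth_key(limited_access_key="<LIMITED_ACCESS_KEY>"))
```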
[DE-6999] Enable image deduplication within nucleus sdk (#452)

* Enable deduplication in nucleus sdk
* Lint fixes
* Fix import order
* Add tests for deduplication sdk
* Fix isort import formatting errors
* Add fixture for image dataset specifically for dedup
* Fix image dataset creation syntax
* Create image dataset synchronously
* Make dataset_with_duplicates fixture sync
* Add dedup test for scene made with video url
* Document difference between deduplicate and deduplicate_by_ids better in docstring
* Add tests to cover all ingestion forms
* Refactor tests to use DEDUP_DEFAULT_TEST_THRESHOLD constant
* Use try-finally for dataset creation and deletion
* Make edge case test docstrings more detailed
* Remove deprecated video sync upload tests
* Update test_jobs to be deterministic
* Split jobs tests into listing and retrieval separately
* Fix docstring typo
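For intuition only, threshold-based deduplication can be sketched locally. This assumes the threshold is a maximum Hamming distance between integer perceptual hashes, which the commit messages here do not state; `hamming` and `greedy_dedup` are hypothetical helpers and do not reflect how the Nucleus backend is implemented.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def greedy_dedup(hashes: Dict[str, int], threshold: int) -> List[str]:
    """Keep an item only if it is farther than `threshold` bits
    from every item already kept (visiting items in dict order)."""
    kept: List[str] = []
    for item_id, h in hashes.items():
        if all(hamming(h, hashes[k]) > threshold for k in kept):
            kept.append(item_id)
    return kept


# Made-up 8-bit hashes: "b" differs from "a" by one bit, "c" by eight.
hashes = {"a": 0b11110000, "b": 0b11110001, "c": 0b00001111}
survivors = greedy_dedup(hashes, threshold=2)
print(survivors)  # ['a', 'c']

# Re-running over the survivors alone leaves the same set of items.
again = greedy_dedup({k: hashes[k] for k in survivors}, threshold=2)
print(set(again) == set(survivors))  # True
```

Comparing the two results as sets rather than lists keeps the check independent of the order in which items are visited.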
Added support for limited access keys (to be used together with, or as a substitute for, API keys)

Example usage:

```
import nucleus

c = nucleus.NucleusClient(limited_access_key="<LIMITED_ACCESS_KEY>")
```

```
c = nucleus.NucleusClient(api_key="<API_KEY>", limited_access_key="<LIMITED_ACCESS_KEY>")
```

```
c = nucleus.NucleusClient(api_key="<API_KEY>")
```