https://dashboard.scale.com/nucleus
Aggregate metrics in ML are not good enough. To improve production ML, you need to understand your models' qualitative failure modes, fix them by gathering more data, and curate diverse scenarios.
Scale Nucleus helps you:
- Visualize your data
- Curate interesting slices within your dataset
- Review and manage annotations
- Measure and debug your model performance
Nucleus is a new way—the right way—to develop ML models, helping us move away from the concept of one dataset and towards a paradigm of collections of scenarios.
$ pip install -e .
$ pip install git+ssh://git@github.com/scaleapi/nucleus-python-client.git
The first step to using the Nucleus library is instantiating a client object. The client abstraction authenticates the user and acts as the gateway for users to interact with their datasets, models, and model runs.
import nucleus
client = nucleus.NucleusClient("YOUR_API_KEY_HERE")
Create a dataset, then fetch it using the dataset_id from the response:
response = client.create_dataset({"name": "My Dataset"})
dataset = client.get_dataset(response["dataset_id"])
List your datasets:
datasets = client.list_datasets()
Delete a dataset by specifying the target dataset ID. A response code of 200 indicates successful deletion.
client.delete_dataset("YOUR_DATASET_ID")
You can append both local images and images from the web. Each image object is a dictionary with three fields:
datasetItem1 = {"image_url": "http://<my_image_url>", "reference_id": "my_image_name.jpg",
"metadata": {"label": "0"}}
The append function expects a list of datasetItems to upload, like this:
response = dataset.append({"items": [datasetItem1]})
If you're uploading a local image, you can specify a filepath as the image_url.
datasetItem2 = {"image_url": "./data_folder/my_img_001.png", "reference_id": "my_img_001.png",
"metadata": {"label": "1"}}
response = dataset.append({"items": [datasetItem2]}, local=True)
For particularly large item uploads, consider using one of the example scripts located in references. These scripts upload items in batches for easier debugging.
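The batching idea can be sketched in a few lines. This is a minimal illustration, not the code from the example scripts: make_item, chunked, and upload_in_batches are hypothetical helper names, the batch size of 100 is an arbitrary assumption, and the upload function is passed in as a parameter so the sketch does not depend on any particular client call.

```python
import os
from typing import Callable, Iterator, List


def make_item(image_path: str, label: str) -> dict:
    """Build one item dict with the three fields described above.

    Hypothetical helper: uses the file name as the reference_id.
    """
    return {
        "image_url": image_path,
        "reference_id": os.path.basename(image_path),
        "metadata": {"label": label},
    }


def chunked(items: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upload_in_batches(
    append_fn: Callable[[dict], object],
    items: List[dict],
    batch_size: int = 100,  # assumed default, tune for your payload sizes
) -> list:
    """Call append_fn once per batch and collect the responses.

    If one batch fails, the exception points at a small, known slice
    of items instead of the whole upload.
    """
    responses = []
    for batch in chunked(items, batch_size):
        responses.append(append_fn({"items": batch}))
    return responses
```

With a dataset object like the one above, the upload function could be supplied as `lambda payload: dataset.append(payload, local=True)`.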
The dataset info tells us the dataset name, the number of dataset items, model_runs, and slice_ids.
dataset.info
There are three methods to access individual Dataset Items:
(1) Dataset Items are accessible by reference id
item = dataset.refloc("my_img_001.png")
(2) Dataset Items are accessible by index
item = dataset.iloc(0)
(3) Dataset Items are accessible by the dataset_item_id assigned internally
item = dataset.loc("dataset_item_id")
Upload groundtruth annotations for the items in your dataset. Box2DAnnotation has the same format as https://dashboard.scale.com/nucleus/docs/api#add-ground-truth
response = dataset.annotate({"annotations": [Box2DAnnotation, ..., Box2DAnnotation]})
For particularly large payloads, please refer to the accompanying scripts in references.
The model abstraction is intended to represent a unique architecture. Models are independent of any dataset.
response = client.add_model({"name": "My Model", "reference_id": "model-0.5", "metadata": {"iou_thr": 0.5}})
In contrast to the model abstraction, the model run abstraction represents an experiment. A model run is associated with both a model and a dataset. A model run is meant to represent "the predictions of model y on dataset x".
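The relationship between the two abstractions can be pictured with a small data-structure sketch. The classes below are purely illustrative stand-ins (ModelSketch and ModelRunSketch are hypothetical names, not classes from the Nucleus library), and the dataset ids are made up; the point is only that one model can back several runs, each tied to exactly one dataset.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelSketch:
    """Illustrative stand-in for a model: no dataset appears here."""
    name: str
    reference_id: str
    metadata: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ModelRunSketch:
    """Illustrative stand-in for a model run: one model on one dataset."""
    model_reference_id: str
    dataset_id: str


# One model, mirroring the add_model payload above.
model = ModelSketch("My Model", "model-0.5", {"iou_thr": 0.5})

# Two experiments with the same model on two (made-up) datasets.
run_a = ModelRunSketch(model.reference_id, "dataset_a")
run_b = ModelRunSketch(model.reference_id, "dataset_b")
```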
Creating a model run returns a ModelRun object.
model_run = dataset.create_model_run({"reference_id": "model-0.5"})
Returns the associated model_id, human-readable name of the run, status, and user-specified metadata.
model_run.info
The predict method populates the model_run object with predictions. Returns the associated model_id, human-readable name of the run, status, and user-specified metadata. Takes a list of Box2DPredictions within the payload, where Box2DPrediction is formulated as in https://dashboard.scale.com/nucleus/docs/api#upload-model-outputs
payload = {"annotations": List[Box2DPrediction]}
model_run.predict(payload)
You can access the model run predictions for an individual dataset_item through three methods:
(1) User-specified reference_id
model_run.refloc("my_img_001.png")
(2) Index
model_run.iloc(0)
(3) Internally maintained dataset_item_id
model_run.loc("dataset_item_id")
The commit action indicates that the user is finished uploading predictions associated with this model run. Committing a model run kicks off Nucleus internal processes to calculate performance metrics like IoU. After being committed, a ModelRun object becomes immutable.
model_run.commit()
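For readers unfamiliar with IoU (intersection over union), the metric mentioned above, here is a minimal sketch of how it is commonly computed for two axis-aligned boxes. Nucleus calculates its metrics internally after a commit; this code is not its implementation. The function name box_iou and the (x, y, width, height) box layout are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box layout: (x, y, width, height), with (x, y) the top-left corner.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Return intersection area divided by union area for two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b

    # Overlap along each axis, clamped at zero when the boxes do not touch.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    union = aw * ah + bw * bh - intersection
    # Two zero-area boxes have no meaningful overlap; report 0.0.
    return intersection / union if union > 0 else 0.0


# A groundtruth box and a prediction shifted right by half its width:
# intersection is 5 * 10 = 50, union is 100 + 100 - 50 = 150.
iou = box_iou((0, 0, 10, 10), (5, 0, 10, 10))
```

Here iou comes out to one third. A prediction is usually counted as matching a groundtruth box when its IoU meets some threshold, which is presumably the role of a value like the iou_thr of 0.5 stored in the model metadata earlier.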