
Commit 42dce79

Various fixes for integration tests (#111)

* update model bundles demo code for integration test
* add back sync
* various fixes

1 parent 50c8365 commit 42dce79

File tree

5 files changed: +80 -25 lines changed

docs/concepts/batch_jobs.md

Lines changed: 1 addition & 0 deletions

@@ -25,6 +25,7 @@ batch_job = client.batch_async_request(
         {"x": 2, "y": "hello"},
         {"x": 3, "y": "world"},
     ],
+    gpus=0,
     labels={
         "team": "MY_TEAM",
         "product": "MY_PRODUCT",

docs/concepts/endpoint_predictions.md

Lines changed: 2 additions & 2 deletions

@@ -3,7 +3,6 @@
 Once endpoints have been created, users can send tasks to them to make
 predictions. The following code snippet shows how to send tasks to endpoints.
 
-
 === "Sending a Task to an Async Endpoint"
     ```py
     import os
@@ -35,7 +34,8 @@ predictions. The following code snippet shows how to send tasks to endpoints.
     client = LaunchClient(api_key=os.getenv("LAUNCH_API_KEY"))
     endpoint = client.get_model_endpoint("demo-endpoint-streaming")
     response = endpoint.predict(request=EndpointRequest(args={"x": 2, "y": "hello"}))
-    print(response)
+    for chunk in response:
+        print(chunk)
     ```

 ::: launch.model_endpoint.EndpointRequest
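The streaming fix above iterates over the response instead of printing it once; per the docs in this commit, each chunk arrives as a server-sent event (SSE). A minimal, self-contained sketch of pulling payloads out of a raw SSE stream (the `data:` framing and the helper name are assumptions for illustration, not part of the Launch client):

```python
def iter_sse_payloads(lines):
    """Yield the payload of each `data:` line in a raw SSE stream."""
    for line in lines:
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

# A made-up stream such as a streaming endpoint might emit, one event per chunk.
stream = ['data: {"token": "hel"}', "", 'data: {"token": "lo"}', ""]
chunks = list(iter_sse_payloads(stream))  # → ['{"token": "hel"}', '{"token": "lo"}']
```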

docs/concepts/model_bundles.md

Lines changed: 14 additions & 20 deletions

@@ -1,15 +1,16 @@
 # Model Bundles
 
 Model Bundles are deployable models that can be used to make predictions. They
-are created by packaging a model up into a deployable format. 
+are created by packaging a model up into a deployable format.
 
 ## Creating Model Bundles
 
 There are five methods for creating model bundles:
 [`create_model_bundle_from_callable_v2`](/api/client/#launch.client.LaunchClient.create_model_bundle_from_callable_v2),
 [`create_model_bundle_from_dirs_v2`](/api/client/#launch.client.LaunchClient.create_model_bundle_from_dirs_v2),
 [`create_model_bundle_from_runnable_image_v2`](/api/client/#launch.client.LaunchClient.create_model_bundle_from_runnable_image_v2),
-[`create_model_bundle_from_triton_enhanced_runnable_image_v2`](/api/client/#launch.client.LaunchClient.create_model_bundle_from_triton_enhanced_runnable_image_v2), and [`create_model_bundle_from_streaming_enhanced_runnable_image_v2`](/api/client/#launch.client.LaunchClient.create_model_bundle_from_streaming_enhanced_runnable_image_v2).
+[`create_model_bundle_from_triton_enhanced_runnable_image_v2`](/api/client/#launch.client.LaunchClient.create_model_bundle_from_triton_enhanced_runnable_image_v2),
+and [`create_model_bundle_from_streaming_enhanced_runnable_image_v2`](/api/client/#launch.client.LaunchClient.create_model_bundle_from_streaming_enhanced_runnable_image_v2).
 
 The first directly pickles a user-specified `load_predict_fn`, a function which
 loads the model and returns a `predict_fn`, a function which takes in a request.
@@ -20,7 +21,8 @@ requests at port 5005 using HTTP and exposes `POST /predict` and
 `GET /readyz` endpoints.
 The fourth is a variant of the third that also starts an instance of the NVidia
 Triton framework for efficient model serving.
-The fifth is a variant of the third that responds with a stream of SSEs at `POST /stream` (the user can decide whether `POST /predict` is also exposed).
+The fifth is a variant of the third that responds with a stream of SSEs at `POST /stream` (the user
+can decide whether `POST /predict` is also exposed).
 
 Each of these modes of creating a model bundle is called a "Flavor".
 
@@ -57,7 +59,6 @@ Each of these modes of creating a model bundle is called a "Flavor".
 * You want to use a `RunnableImageFlavor`
 * You also want to support token streaming while the model is generating
 
-
 === "Creating From Callables"
     ```py
     import os
@@ -132,7 +133,7 @@ Each of these modes of creating a model bundle is called a "Flavor".
     """)
 
     requirements_filename = os.path.join(directory, "requirements.txt")
-    with open(predict_filename, "w") as f:
+    with open(requirements_filename, "w") as f:
         f.write("""
         pytest==7.2.1
         numpy
@@ -157,13 +158,13 @@ Each of these modes of creating a model bundle is called a "Flavor".
         __root__: int
 
     BUNDLE_PARAMS = {
-        "model_bundle_name": "test-bundle",
+        "model_bundle_name": "test-bundle-from-dirs",
        "base_paths": [directory],
        "load_predict_fn_module_path": "predict.my_load_predict_fn",
        "load_model_fn_module_path": "model.my_load_model_fn",
        "request_schema": MyRequestSchema,
        "response_schema": MyResponseSchema,
-       "requirements_path": "requirements.txt",
+       "requirements_path": requirements_filename,
        "pytorch_image_tag": "1.7.1-cuda11.0-cudnn8-runtime",
     }
 
@@ -173,6 +174,7 @@ Each of these modes of creating a model bundle is called a "Flavor".
     # Clean up files from demo
     os.remove(model_filename)
     os.remove(predict_filename)
+    os.remove(requirements_filename)
     os.rmdir(directory)
     ```
 
@@ -197,9 +199,7 @@ Each of these modes of creating a model bundle is called a "Flavor".
        "response_schema": MyResponseSchema,
        "repository": "...",
        "tag": "...",
-       "command": [
-           ...
-       ],
+       "command": ...,
        "env": {
            "TEST_KEY": "test_value",
        },
@@ -227,14 +227,12 @@ Each of these modes of creating a model bundle is called a "Flavor".
 
 
     BUNDLE_PARAMS = {
-        "model_bundle_name": "test-bundle",
+        "model_bundle_name": "test-triton-bundle",
        "request_schema": MyRequestSchema,
        "response_schema": MyResponseSchema,
        "repository": "...",
        "tag": "...",
-       "command": [
-           ...
-       ],
+       "command": ...,
        "env": {
            "TEST_KEY": "test_value",
        },
@@ -274,12 +272,8 @@ Each of these modes of creating a model bundle is called a "Flavor".
        "response_schema": MyResponseSchema,
        "repository": "...",
        "tag": "...",
-       "command": [ # optional; if provided, will also expose the /predict endpoint
-           ...
-       ],
-       "streaming_command": [ # required
-           ...
-       ],
+       "command": ..., # optional; if provided, will also expose the /predict endpoint
+       "streaming_command": ..., # required
        "env": {
            "TEST_KEY": "test_value",
        },
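The last hunk's comments encode the routing rule for streaming-enhanced bundles: `streaming_command` is required and always serves `POST /stream`, while `command` is optional and, if provided, also exposes `POST /predict`. A small sketch of that rule (the helper is ours, for illustration only):

```python
def exposed_routes(bundle_params: dict) -> list:
    """Routes a streaming-enhanced runnable image bundle would expose,
    following the optional/required comments in the diff above."""
    routes = ["/stream"]                  # streaming_command always serves /stream
    if bundle_params.get("command"):      # optional; also exposes /predict
        routes.append("/predict")
    return routes

stream_only = exposed_routes({"streaming_command": ["serve"]})                      # → ["/stream"]
both = exposed_routes({"streaming_command": ["serve"], "command": ["serve"]})       # → ["/stream", "/predict"]
```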

docs/concepts/model_endpoints.md

Lines changed: 6 additions & 3 deletions

@@ -10,7 +10,8 @@ of CPUs, amount of memory, GPU count, and type of GPU.
 Endpoints can be asynchronous, synchronous, or streaming. Asynchronous endpoints return
 a future immediately after receiving a request, and the future can be used to
 retrieve the prediction once it is ready. Synchronous endpoints return the
-prediction directly after receiving a request. Streaming endpoints are variants of synchronous endpoints that return a stream of SSEs instead of a single HTTP response.
+prediction directly after receiving a request. Streaming endpoints are variants of synchronous
+endpoints that return a stream of SSEs instead of a single HTTP response.
 
 !!! info
     # Choosing the right inference mode
@@ -90,7 +91,8 @@ endpoint = client.create_model_endpoint(
 
 ## Creating Streaming Model Endpoints
 
-Streaming model endpoints are variants of sync model endpoints that are useful for tasks with strict requirements on perceived latency. Streaming endpoints are more expensive than async endpoints.
+Streaming model endpoints are variants of sync model endpoints that are useful for tasks with strict
+requirements on perceived latency. Streaming endpoints are more expensive than async endpoints.
 !!! Note
     Streaming model endpoints require at least 1 `min_worker`.
 
@@ -104,6 +106,7 @@ endpoint = client.create_model_endpoint(
     model_bundle="test-streaming-bundle",
     cpus=1,
     min_workers=1,
+    per_worker=1,
     endpoint_type="streaming",
     update_if_exists=True,
     labels={
@@ -131,7 +134,7 @@ from launch import LaunchClient
 
 client = LaunchClient(api_key=os.getenv("LAUNCH_API_KEY"))
 client.edit_model_endpoint(
-    model_endpoint="demo-endpoint",
+    model_endpoint="demo-endpoint-sync",
     max_workers=2,
 )
 ```
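The endpoints above bound their worker counts with `min_workers` and `max_workers` (and the note says streaming endpoints need at least 1 `min_worker`). As an illustration only of that clamping relationship, not the actual Launch autoscaling policy, which this commit does not show:

```python
def bound_workers(desired: int, min_workers: int, max_workers: int) -> int:
    """Clamp a desired worker count into [min_workers, max_workers]."""
    return max(min_workers, min(desired, max_workers))

idle = bound_workers(0, 1, 2)  # → 1  (streaming endpoints keep at least 1 worker)
busy = bound_workers(5, 1, 2)  # → 2  (capped by max_workers)
```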

launch/client.py

Lines changed: 57 additions & 0 deletions

@@ -1904,6 +1904,63 @@ def _streaming_request(
         )
         return response
 
+    def _sync_request(
+        self,
+        endpoint_name: str,
+        url: Optional[str] = None,
+        args: Optional[Dict] = None,
+        return_pickled: bool = False,
+    ) -> Dict[str, Any]:
+        """
+        Not recommended for use; instead use the functions provided by SyncEndpoint.
+        Makes a request to the Sync Model Endpoint at endpoint_id, and blocks until request
+        completion or timeout. Endpoint at endpoint_id must be a SyncEndpoint, otherwise this
+        request will fail.
+
+        Parameters:
+            endpoint_name: The name of the endpoint to make the request to
+
+            url: A url that points to a file containing model input. Must be accessible by Scale
+                Launch, hence it needs to either be public or a signedURL. **Note**: the contents of
+                the file located at ``url`` are opened as a sequence of ``bytes`` and passed to the
+                predict function. If you instead want to pass the url itself as an input to the
+                predict function, see ``args``.
+
+            args: A dictionary of arguments to the ``predict`` function defined in your model
+                bundle. Must be json-serializable, i.e. composed of ``str``, ``int``, ``float``,
+                etc. If your ``predict`` function has signature ``predict(foo, bar)``, then args
+                should be a dictionary with keys ``foo`` and ``bar``. Exactly one of ``url`` and
+                ``args`` must be specified.
+
+            return_pickled: Whether the python object returned is pickled, or directly written to
+                the file returned.
+
+        Returns:
+            A dictionary with key either ``"result_url"`` or ``"result"``, depending on the value
+            of ``return_pickled``. If ``return_pickled`` is true, the key will be ``"result_url"``,
+            and the value is a signedUrl that contains a cloudpickled Python object,
+            the result of running inference on the model input.
+            Example output:
+                ``https://foo.s3.us-west-2.amazonaws.com/bar/baz/qux?xyzzy``
+
+            Otherwise, if ``return_pickled`` is false, the key will be ``"result"``,
+            and the value is the output of the endpoint's ``predict`` function, serialized as json.
+        """
+        validate_task_request(url=url, args=args)
+        endpoint = self.get_model_endpoint(endpoint_name)
+        endpoint_id = endpoint.model_endpoint.id  # type: ignore
+        with ApiClient(self.configuration) as api_client:
+            api_instance = DefaultApi(api_client)
+            payload = dict_not_none(return_pickled=return_pickled, url=url, args=args)
+            request = EndpointPredictV1Request(**payload)
+            query_params = frozendict({"model_endpoint_id": endpoint_id})
+            response = api_instance.create_sync_inference_task_v1_sync_tasks_post(  # type: ignore
+                body=request,
+                query_params=query_params,
+                skip_deserialization=True,
+            )
+        resp = json.loads(response.response.data)
+        return resp
+
     def _async_request(
         self,
         endpoint_name: str,
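The docstring of the added `_sync_request` pins down its return shape: a dict keyed by `"result_url"` when `return_pickled` is true, else by `"result"`. A small sketch of dispatching on that documented shape (the helper name is ours, not part of the client):

```python
def unpack_sync_response(resp: dict):
    """Return ("url", value) for pickled results or ("result", value) otherwise,
    following the single-key response shape documented in _sync_request."""
    if "result_url" in resp:
        return ("url", resp["result_url"])
    return ("result", resp["result"])

plain = unpack_sync_response({"result": "[1, 2, 3]"})  # → ("result", "[1, 2, 3]")
```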
