and [`output`](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#inputs-and-outputs)
 properties. These properties will allow Triton to load the Python model with
-[Minimal Model Configuration](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#minimal-model-configuration)
+[Minimal Model Configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#minimal-model-configuration)
examples/decoupled/README.md
+3 -2 (3 additions & 2 deletions)
@@ -36,7 +36,8 @@ how to write a decoupled model where each request can generate 0 to many respons
 These files are heavily commented to describe each function call.
 These example models are designed to show the flexibility available to decoupled models
 and in no way should be used in production. These examples circumvents
-the restriction placed by the [instance count](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#instance-groups)
inferentia/README.md
+5 -4 (5 additions & 4 deletions)
@@ -239,22 +239,23 @@ their need.
 
 To enable dynamic batching, `--enable_dynamic_batching`
 flag needs to be specified. `gen_triton_model.py` supports following three
-options for configuring [Triton's dynamic batching](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md):
+options for configuring [Triton's dynamic batching](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md):
 
-1.`--preferred_batch_size`: Please refer to [model configuration documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#preferred-batch-sizes) for details on preferred batch size. To optimize
+1.`--preferred_batch_size`: Please refer to [model configuration documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#preferred-batch-sizes) for details on preferred batch size. To optimize
 performance, this is recommended to be multiples of engaged neuron cores.
 For example, if each instance is using 2 neuron cores, `preferred_batch_size`
 could be 2, 4 or 6.
 2.`--max_queue_delay_microseconds`: Please refer to
-[model configuration documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#delayed-batching) for details.
+[model configuration documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#delayed-batching) for details.
 3.`--disable_batch_requests_to_neuron`: Enable the non-default way for Triton to
 handle batched requests. Triton backend will send each request to neuron
 separately, irrespective of if the Triton server requests are batched.
 This flag is recommended when users want to optimize performance with models
 that do not perform well with batching without the flag.
 
 Additionally, `--max_batch_size` will affect the maximum batching limit. Please
-refer to the [model configuration documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#maximum-batch-size)
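For readers following the `inferentia/README.md` hunk, the "multiples of engaged neuron cores" recommendation can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `preferred_batch_sizes` and `dynamic_batching_block` are hypothetical and are not part of `gen_triton_model.py`, and the queue delay of 100 microseconds is an arbitrary placeholder. The rendered text uses the `dynamic_batching` fields from Triton's model configuration format; whether `gen_triton_model.py` emits exactly this layout is an assumption.

```python
def preferred_batch_sizes(neuron_cores, max_batch_size):
    """Return the multiples of neuron_cores that do not exceed max_batch_size.

    Hypothetical helper: mirrors the README's advice that preferred batch
    sizes be multiples of the neuron cores used by each instance.
    """
    if neuron_cores < 1 or max_batch_size < 1:
        raise ValueError("neuron_cores and max_batch_size must be positive")
    return list(range(neuron_cores, max_batch_size + 1, neuron_cores))


def dynamic_batching_block(sizes, max_queue_delay_microseconds):
    """Render a dynamic_batching fragment in model-configuration text form.

    Hypothetical helper: the exact output of gen_triton_model.py may differ.
    """
    lines = ["dynamic_batching {"]
    if sizes:
        joined = ", ".join(str(s) for s in sizes)
        lines.append("  preferred_batch_size: [ %s ]" % joined)
    lines.append(
        "  max_queue_delay_microseconds: %d" % max_queue_delay_microseconds
    )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # README example: 2 neuron cores per instance, sizes capped at 6.
    sizes = preferred_batch_sizes(neuron_cores=2, max_batch_size=6)
    # 100 is a placeholder delay, not a recommended value.
    print(dynamic_batching_block(sizes, max_queue_delay_microseconds=100))
```

With 2 neuron cores and a cap of 6, the helper yields the sizes 2, 4 and 6 used as the example in the README text.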