AWS ML Blog · Thursday · June 18, 2026 · FREE

Amazon SageMaker AI Async Inference now supports inline request payloads

sagemaker · aws · async-inference · inline-payload

Amazon SageMaker AI Async Inference now supports inline request payloads: callers can send the payload directly in the invocation request body instead of first uploading it to Amazon S3. For small payloads this removes the upload step, which can reduce latency and operational overhead and makes it easier to integrate applications that generate small inference requests. The feature is available in all AWS Regions where SageMaker AI Async Inference is supported. The existing S3-based payload option is unchanged and remains available for larger payloads. Inline payloads are particularly useful for real-time or near-real-time inference scenarios where minimizing round trips is important.
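As a rough illustration of the choice described above, the sketch below builds the request arguments for an async invocation and picks between an inline payload and an S3 reference. `EndpointName`, `ContentType`, and `InputLocation` are existing parameters of the `invoke_endpoint_async` call in boto3's `sagemaker-runtime` client. Everything else is an assumption made for illustration: the summary names neither the inline field nor a size limit, so `ASSUMED_INLINE_FIELD`, `ASSUMED_INLINE_LIMIT_BYTES`, and the helper `build_async_request` are hypothetical.

```python
from typing import Any, Dict, Optional

# ASSUMPTION: the summary gives no size limit for inline payloads.
# This threshold is illustrative only.
ASSUMED_INLINE_LIMIT_BYTES = 1024

# ASSUMPTION: hypothetical name for the inline-payload field.
# The summary does not name it; check the API reference for the real one.
ASSUMED_INLINE_FIELD = "Body"


def build_async_request(
    endpoint_name: str,
    payload: bytes,
    content_type: str,
    s3_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for an async invocation (hypothetical helper).

    Small payloads are attached inline; larger ones fall back to the
    S3 object the caller has already uploaded.
    """
    request: Dict[str, Any] = {
        "EndpointName": endpoint_name,
        "ContentType": content_type,
    }
    if len(payload) <= ASSUMED_INLINE_LIMIT_BYTES:
        # Inline path: no S3 upload needed.
        request[ASSUMED_INLINE_FIELD] = payload
    elif s3_uri is not None:
        # Existing path: reference the uploaded S3 object.
        request["InputLocation"] = s3_uri
    else:
        raise ValueError("payload exceeds the assumed inline limit and no S3 URI was given")
    return request
```

The returned dictionary is only a plan for the call. With the S3 path it could be passed as `client.invoke_endpoint_async(**request)`. The inline path depends on the real field name and limit, which should be taken from the original post or the API reference.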

// why it matters

Simplifies async inference for small payloads by removing the need for S3 uploads.

Sources

Primary · AWS ML Blog
▸ Read original at aws.amazon.com
