Salad Inference Endpoints (SIE)

Introduction to SIE

Salad Inference Endpoints (SIE) let you start running production-quality ML models across thousands of dedicated GPUs right away, for a fraction of the cost of public cloud. With SIE you pay only for the time the ML model takes to process your request, which keeps your bill to a minimum. SIE has ZERO COLD START TIME: Salad runs the infrastructure on your behalf, and when you aren't using a model, the same running instances serve other customers' requests.

Frequently Asked Questions

How are my costs calculated?

You are only charged for the time that it takes for your model to run on the GPU-enabled hardware. The time is measured from the moment the host hardware receives your request until the response is sent back. Network transit time is NOT included in your compute time. Even though your bill shows an hourly rate, compute time is tracked at the millisecond level. Invoices are sent out automatically at the end of each month based on your usage.
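
To make the arithmetic concrete, here is a minimal sketch of how an hourly rate converts to a per-request charge when compute time is counted in milliseconds. The rate of 0.36 USD per hour and the function names are hypothetical, chosen only for illustration. Actual SIE pricing and any rounding applied on invoices are not specified here.

```python
from decimal import Decimal

# Number of milliseconds in one hour, used to prorate an hourly rate.
MS_PER_HOUR = Decimal(3_600_000)


def request_cost(compute_ms: int, hourly_rate_usd: Decimal) -> Decimal:
    """Prorated cost of a request that used `compute_ms` of compute time.

    `compute_ms` is the time between the host receiving the request and
    sending the response; network transit time is excluded.
    """
    if compute_ms < 0:
        raise ValueError("compute_ms must be non-negative")
    return hourly_rate_usd * Decimal(compute_ms) / MS_PER_HOUR


def monthly_total(compute_ms_per_request: list[int], hourly_rate_usd: Decimal) -> Decimal:
    """Sum the compute time of all requests in a month, then prorate once."""
    return request_cost(sum(compute_ms_per_request), hourly_rate_usd)


# Hypothetical rate, for illustration only (not an actual SIE price).
rate = Decimal("0.36")

print(request_cost(250, rate))                # one request of 250 ms
print(monthly_total([250, 750, 1000], rate))  # three requests, 2000 ms total
```

Using `Decimal` rather than floats avoids binary rounding noise on very small per-request amounts.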

How are cold starts handled?

Since SIE models run across the Salad network 24/7, there are NO COLD START times. As soon as SIE receives your request, it dispatches it to one of the running instances of the model, giving you the best possible response time for your inference products.

How do I access SIE?

SIE is available to any Salad customer with valid billing information added to their organization. Once you have configured your account, grab your API key and head over to the SIE-specific guide on how to access these models.