All About SaladCloud Container Engine
This document is a must-read before getting started with SaladCloud. Here, we detail what SaladCloud is, the nature of our network, how SaladCloud works, unique traits of our distributed network, the choice of consumer GPUs over data center GPUs and other frequently asked questions.
-
What is Salad?
- SaladCloud is the world’s largest distributed cloud network with 1000s of consumer GPUs at the lowest cost. Our cloud is powered by unused latent compute shared by individuals & businesses around the world.
-
What is SaladCloud Container Engine (SCE)?
- Workloads are deployed to SaladCloud via docker containers. SCE is a massively scalable orchestration engine, purpose-built to simplify this container development.
- Containerize your model and inference server, choose the hardware and we take care of the rest.
-
How big is the SaladCloud Compute Network?
- Over 2 Million+ GPU owners in 190+ countries are part of the SaladCloud ecosystem. 450,000+ GPUs having contributed significant compute and everyday, 11,000+ GPUs are active on the network.
-
What kind of GPUs does SaladCloud have?
- All GPUs on SaladCloud belong to the RTX/GTX class of GPUs from Nvidia. Our GPU selection policy is strict and we only onboard AI-enabled, high performance compute capable GPUs to the network.
-
How does SaladCloud work?
- GPUs on SaladCloud are similar to spot instances. On one side, we have GPU owners who contribute their resources to SaladCloud when not in use. Some providers share GPUs for 20-22 hours a day. Others share GPUs for 1-2 hours per day. Users running workloads on SaladCloud select the GPU types and quantity they need. SaladCloud handles all the orchestration in the backend and ensures you will have uninterrupted GPU time as per requirements.
-
How does security work on Salad?
- Every day, 100s of businesses run production workloads on SaladCloud securely. We have several layers of security to keep your containers safe, encrypting them in transit, and at rest. Containers run in an isolated environment on our nodes - keeping your data isolated and also ensuring you have the same compute environment regardless of the machine you’re running on.
-
Why do owners share GPUs with Salad?
- Owners earn rewards (in the form of Salad balance) for sharing their compute. Many compute providers earn 200 per month on SaladCloud as a reward that they exchange for games, gift cards and more.
-
What if a host tries to access my container?
-
Our constant host intrusion detection tests look for operations like folder access, opening a shell, etc. If a host machine tries to access the linux environment, we automatically implode the environment and blacklist the machine. We’re also bringing Falco into our runtime for a more robust set of checks.
-
This diagram explains the SaladCloud architecture to protect workloads from malicious software and suppliers from malicious workloads.
-
-
-
What are some unique traits of SaladCloud compared to other clouds?
- Since SaladCloud is a compute-share network, our GPUs have longer cold start times than usual, and are subject to interruption.
- We only have RTX/GTX class of GPUs from Nvidia. Our thesis is that most AI/ML production workloads get better cost-performance on consumer-grade GPUs.
- The highest vRAM on the network is 24 GB.
- Workloads requiring extremely low latency times are not a fit for our network.
-
What are SaladCloud Endpoints/APIs?
- SaladCloud has one API offering today - a full-featured Transcription API. We will be adding more APIs to our suite in the coming months.
-
How does latency work on Salad?
- Since our GPUs are consumer grade and shared, higher latency is to be expected. The best use cases for SaladCloud’s network are ones that DO NOT have extremely low latency requirements.
- Many companies run production workloads on SaladCloud with acceptable latency times. One of SaladCloud’s largest users is an AI image generation tool that serves millions of users. Our team can work with you on the right architecture to ensure your latency requirements are met.
-
How do I troubleshoot issues on Salad?
- All SaladCloud users have access to logs by configuring external logging. Axiom is our preferred external logging service provider. Your logs are seamlessly transmitted to Axiom for troubleshooting.
- SaladCloud users can also view their logs directly in the portal.
- SaladCloud users can access a terminal in a running container instance.
-
What should I be aware of before deploying a workload to Salad?
- Cold start time is high on SaladCloud. Give it a few minutes to let your containers up and running.
- Instances will be interrupted with no warning. Architect your application accordingly, such as retrying failed requests, and running multiple replicas to provide coverage during automatic fail-overs.
- Network performance will vary from node to node, due the distributed, residential nature of the network.
- For Mac developers, be sure to build your containers for amd64, not arm64.
- If you’re looking to implement networking on your instances within a container group by leveraging SaladCloud’s container gateway, you’ll need to configure IPv6 within your container image.
-
What happens if a GPU goes offline?
- SaladCloud Container Engine automatically reallocates your workload to another GPU (same type and class) when a resource goes offline.
-
How can I be sure a GPU is performant?
- We use a proprietary trust rating system to index node performance, forecast availability, and select the optimal hardware configuration for deployment. We also run proprietary tests on every GPU to determine their fit for our network.
-
Do you have a checklist to track before deploying a workload to Salad?
- This checklist is for SCE. For checklists for APIs and SGS, check the relevant docs page.
- Test your docker container in a local environment
- Make sure you select enough vCPU, RAM, and vRAM for your application
- Setup container gateway or a job queue
- Make sure you’re not running an empty container (i.e. Ubuntu) without setting the command to “sleep infinity”.
- Run your workload with at least 3 replicas to account for latency.
- This checklist is for SCE. For checklists for APIs and SGS, check the relevant docs page.
-
Are consumer-grade GPUs good for AI workloads?
- Yes. Many consumer GPUs on our network offer comparable or even better performance for AI workloads. You can read all our benchmarks here.
-
What is the expected uptime on SaladCloud?
-
SaladCloud’s uptime is fundamentally different from other providers. While individual nodes on SaladCloud have a reliability of 90%-95%, the combined reliability of the system is higher, thanks to redundancy and higher no. of nodes.
-
Here is a real example over 24 hours for a production AI image generation workload with 100 requested nodes. As we would expect, it’s fairly uncommon for all 100 to be running at the same time, but 100% of the time, we have at least 82 live nodes. For this customer, 82 simultaneous nodes offered plenty of throughput to keep up with their own internal SLOs, and provided a 0-downtime experience.
-
-
What is the right SaladCloud product for my workload?
-
SaladCloud has three product lines:
- SaladCloud Container Engine (SCE): a managed container engine built for deploying GPU-intensive containers at a massive scale.
- SaladCloud Gateway Service (SGS): a proxy service that routes requests from VPN operators or other customers who need distributed residential IP infrastructure to SaladCloud nodes.
- SaladCloud Transcription API: the lowest-priced transcription service in the market with unparalleled accuracy.
-
-
How does the billing work?
- SaladCloud bills per use at the org level. You can view your current bill at the org level.
- You are billed per-second only for container instances in the “running” state. Instances in other states such as “downloading” and “allocating” are not billed.
- Bandwidth is not metered, and there are no “stale charges” related to storage for containers that are not running.
-
What about compliance on Salad?
- We are SOC 2 Type 1 compliant. You can read more about our compliance here.
Was this page helpful?