Getting started with SaladCloud
This document is a must-read before getting started with SaladCloud. Here, we detail what Salad is, the nature of our network, how SaladCloud works, unique traits of our distributed network, the choice of consumer GPUs over datacenter GPUs and other frequently asked questions.
-
What is Salad?
- Salad is the world’s largest distributed cloud network, offering thousands of consumer GPUs at the lowest cost. Our cloud is powered by unused, latent compute shared by individuals and businesses around the world.
-
How big is the Salad Compute Network?
- More than 2 million GPU owners in 190+ countries are part of the Salad ecosystem. Over 450,000 GPUs have contributed significant compute, and more than 11,000 GPUs are active on the network every day.
-
What kind of GPUs does Salad have?
- All GPUs on Salad belong to the RTX/GTX class of GPUs from Nvidia. Our GPU selection policy is strict: we only onboard high-performance GPUs that are capable of running AI workloads.
-
How does Salad work?
- GPUs on Salad are similar to spot instances. On one side, we have GPU owners who contribute their resources to Salad when not in use. Some providers share GPUs for 20-22 hours a day. Others share GPUs for 1-2 hours per day. Users running workloads on Salad select the GPU types and quantity they need. Salad handles all the orchestration in the backend and ensures you will have uninterrupted GPU time as per requirements.
-
How does security work on Salad?
- Every day, hundreds of businesses run production workloads on Salad securely. We have several layers of security to keep your containers safe, including encryption in transit and at rest. Containers run in an isolated environment on our nodes - keeping your data isolated and also ensuring you have the same compute environment regardless of the machine you’re running on.
-
Why do owners share GPUs with Salad?
- Owners earn rewards (in the form of Salad balance) for sharing their compute. Many compute providers earn 200 per month on Salad as a reward that they exchange for games, gift cards and more.
-
What if a host tries to access my container?
- Our continuous host intrusion detection tests look for operations such as folder access or opening a shell. If a host machine tries to access the Linux environment, we automatically destroy the environment and blacklist the machine. We’re also bringing Falco into our runtime for a more robust set of checks.
-
This diagram explains the SaladCloud architecture to protect workloads from malicious software and suppliers from malicious workloads.
-
What are some unique traits of Salad compared to other clouds?
- Since Salad is a computeshare network, our GPUs have longer cold start times than usual, and are subject to interruption.
- We only have RTX/GTX class of GPUs from Nvidia. Our thesis is that most AI/ML production workloads get better cost-performance on consumer-grade GPUs.
- The highest vRAM on the network is 24 GB.
- Workloads requiring extremely low latency times are not a fit for our network.
-
What is Salad Container Engine? How does that help me?
- Workloads are deployed to Salad as Docker containers. Salad Container Engine (SCE) is a massively scalable orchestration engine, purpose-built to simplify this kind of container-based deployment.
- Containerize your model and inference server, choose the hardware and we take care of the rest. No extra DevOps work or changes to your workflow.
-
What are Salad Endpoints/APIs?
- Salad has two API offerings today: a Transcription API and a Dreambooth API for Stable Diffusion fine-tuning. We will be adding more APIs to our suite in the coming months.
-
How does latency work on Salad?
- Since our GPUs are consumer grade and shared, higher latency is to be expected. The best use cases for Salad’s network are ones that DO NOT have extremely low latency requirements.
- Many companies run production workloads on Salad with acceptable latency times. One of Salad’s largest users is an AI image generation tool that serves millions of users. Our team can work with you on the right architecture to ensure your latency requirements are met.
-
How do I troubleshoot issues on Salad?
- All Salad users have access to logs by configuring external logging. Axiom is our preferred external logging service provider. Your logs are seamlessly transmitted to Axiom for troubleshooting.
- Salad users can also view their logs directly in the portal.
- Salad users can access a terminal in a running container instance.
-
What should I be aware of before deploying a workload to Salad?
- Cold start time is high on Salad. Allow a few minutes for your containers to get up and running.
- Instances will be interrupted with no warning. Architect your application accordingly, for example by retrying failed requests and running multiple replicas to provide coverage during automatic fail-overs.
- Network performance will vary from node to node, due to the distributed, residential nature of the network.
- For Mac developers, be sure to build your containers for amd64, not arm64.
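
As an illustration of the "retry failed requests" advice above, here is a minimal sketch of a client-side retry helper with exponential backoff. It is not part of any Salad SDK: the function name `call_with_retries` and all of its parameters are hypothetical, and `send` stands in for whatever request your application makes to its own workload.

```python
import random
import time


def call_with_retries(send, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    send is any zero-argument callable that raises an exception on failure,
    for example a wrapper around an HTTP request to your own service.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception as err:  # narrow this to your client's error types
            last_error = err
            if attempt == max_attempts - 1:
                break
            # Exponential backoff with jitter, so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError(f"request failed after {max_attempts} attempts") from last_error
```

Because an interrupted instance is replaced by another one, a retry that arrives a moment later may be served by a different replica. Retries like this are only safe if your requests are idempotent or deduplicated on the server side.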
-
What happens if a GPU goes offline?
- Salad Container Engine automatically reallocates your workload to another GPU (same type and class) when a resource goes offline.
-
How can I be sure a GPU is performant?
- We use a proprietary trust rating system to index node performance, forecast availability, and select the optimal hardware configuration for deployment. We also run proprietary tests on every GPU to determine their fit for our network.
-
Do you have a checklist to track before deploying a workload to Salad?
- This checklist is for SCE. For checklists for APIs and SGS, check the relevant docs page.
- Test your Docker container in a local environment
- Make sure you select enough vCPU, RAM, and vRAM for your application
- Set up a container gateway or a job queue
- Make sure you’re not running an empty container (e.g. Ubuntu) without setting the command to “sleep infinity”.
- Run your workload with at least 3 replicas to account for latency.
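
For the "enough vRAM" item, a rough back-of-the-envelope check can help before you pick a GPU. The sketch below is only a rule of thumb, not a Salad tool: it counts model weights alone, the 1.2x overhead factor for activations and runtime buffers is an assumption, and the function names are hypothetical. The 24 GB ceiling is the highest vRAM on the network mentioned earlier in this FAQ.

```python
MAX_VRAM_GB = 24  # highest vRAM on the network, per this FAQ


def estimate_weights_gb(num_params, bytes_per_param=2):
    """Approximate size of the model weights in GiB (2 bytes per parameter = fp16)."""
    return num_params * bytes_per_param / 1024**3


def fits_in_vram(num_params, bytes_per_param=2, overhead=1.2, vram_gb=MAX_VRAM_GB):
    """True if weights plus an assumed overhead factor fit in the given vRAM."""
    return estimate_weights_gb(num_params, bytes_per_param) * overhead <= vram_gb


for name, params in [("7B model", 7e9), ("13B model", 13e9)]:
    size = estimate_weights_gb(params)
    print(f"{name}: ~{size:.1f} GiB of fp16 weights, fits in 24 GB: {fits_in_vram(params)}")
```

Real memory use depends on batch size, sequence length, and the inference server, so treat the result as a first filter and confirm with a local test run.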
-
Are consumer-grade GPUs good for AI workloads?
- Yes. Many consumer GPUs on our network offer comparable or even better performance for AI workloads. You can read all our benchmarks here.
-
What is the expected uptime on SaladCloud?
-
Uptime on Salad works differently than on other providers. While individual nodes on Salad have a reliability of 90%-95%, the combined reliability of the system is higher, thanks to redundancy and the large number of nodes.
-
Here is a real example over 24 hours for a production AI image generation workload with 100 requested nodes. As we would expect, it’s fairly uncommon for all 100 to be running at the same time, but 100% of the time, we have at least 82 live nodes. For this customer, 82 simultaneous nodes offered plenty of throughput to keep up with their own internal SLOs, and provided a 0-downtime experience.
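
To see why redundancy raises the combined reliability, here is a minimal sketch that treats each node as independently live with probability p (a simple binomial model). The independence assumption is a simplification, the function name is hypothetical, and the output is a model estimate rather than the measured data from the example above.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability that at least k of n independent nodes are live,
    when each node is live with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that at least one of 3 replicas is live at 90% per-node reliability:
# 1 - 0.1**3 = 0.999
print(f"at least 1 of 3 live: {prob_at_least(1, 3, 0.90):.3f}")

# Chance that at least 82 of 100 requested nodes are live, at both ends of
# the 90%-95% per-node reliability range.
for p in (0.90, 0.95):
    print(f"p={p:.2f}: P(at least 82 of 100 live) = {prob_at_least(82, 100, p):.4f}")
```

The same function can be used to size a deployment: increase n until the probability of having the k nodes your throughput needs is acceptable.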
-
-
What is the right Salad product for my workload?
- Salad has four product lines: Salad Container Engine (SCE), Salad Gateway Service (SGS), Salad Transcription API and Salad Dreambooth API.
-
How does the billing work?
- Salad bills per use at the organization level, which is also where you can view your current bill.
- You are billed per-second only for container instances in the “running” state. Instances in other states such as “downloading” and “allocating” are not billed.
- Bandwidth is not metered.
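
The per-second rule can be illustrated with a small worked calculation. This is a sketch only: the event format and the function name are hypothetical, and the hourly rate is a made-up number, not a real Salad price. Only time spent in the "running" state is counted.

```python
from datetime import datetime, timedelta


def billable_seconds(events, end):
    """Sum the seconds an instance spent in the "running" state.

    events is a time-ordered list of (timestamp, state) pairs, each marking the
    moment the instance entered that state; end closes the final interval.
    """
    total = 0.0
    for (start, state), (next_start, _) in zip(events, events[1:] + [(end, None)]):
        if state == "running":
            total += (next_start - start).total_seconds()
    return total


t0 = datetime(2024, 1, 1, 12, 0, 0)
events = [
    (t0, "allocating"),                             # not billed
    (t0 + timedelta(seconds=30), "downloading"),    # not billed
    (t0 + timedelta(seconds=150), "running"),       # billed per second
]
end = t0 + timedelta(seconds=3750)  # one hour of running time after start-up

HOURLY_RATE = 0.25  # hypothetical price in USD per hour, not a real Salad rate
seconds = billable_seconds(events, end)
cost = seconds * HOURLY_RATE / 3600
print(seconds, round(cost, 4))
```

In this example the 150 seconds spent allocating and downloading add nothing to the bill; only the 3,600 seconds in the running state are charged.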
-
What about compliance on Salad?
- We are SOC 2 Type 1 compliant. You can read more about our compliance here.