Overview
This recipe runs BAAI/bge-m3 with
Hugging Face Text Embeddings Inference on a Salad GPU. It
serves dense embeddings for RAG, semantic search, document retrieval, and similarity search workloads.
BGE-M3 is useful when you need one embedding model that handles multilingual content, short queries, and longer
documents. The model supports more than 100 languages, accepts inputs up to 8192 tokens, and returns 1024-dimensional
dense vectors that can be stored in a vector database.
The BGE-M3 model family also supports sparse and ColBERT-style retrieval modes through FlagEmbedding. This Salad recipe
is focused on the dense embedding endpoint exposed by TEI because that is the interface most RAG frameworks and vector
search systems expect.
Quick Start
- Open the SaladCloud Portal.
- Deploy the BGE-M3 Embeddings recipe.
- Enter a Container Group Name.
- Decide whether to enable Require Container Gateway Authentication:
  - Enabled: requests must include your SaladCloud API key.
  - Disabled: anyone with the URL can call the embedding service.
- Deploy and wait for the first startup to finish.
The model is downloaded from Hugging Face at startup, so it can take several minutes before the deployment becomes
ready.
Once the deployment is ready, send requests to /embed for TEI’s native embedding API or to /v1/embeddings for the
OpenAI-compatible API.
Defaults
The recipe comes preconfigured with these defaults:
- Server: Hugging Face Text Embeddings Inference
- Model ID: BAAI/bge-m3
- Container image: ghcr.io/huggingface/text-embeddings-inference:89-1.9
- Command equivalent: text-embeddings-router --model-id BAAI/bge-m3 --port 3000
- Container port: 3000
- Host bind: ::
- Data type: float16
- Max batch tokens: 16384
- Max client batch size: 32
- Readiness probe: GET /health
- Authentication: enabled by default
API Endpoints
Useful TEI endpoints include:
- GET /health - readiness probe and health check
- GET /info - model and server metadata
- GET /docs - Swagger documentation
- POST /embed - TEI native dense embedding endpoint
- POST /v1/embeddings - OpenAI-compatible embeddings endpoint
- POST /similarity - TEI similarity endpoint
Authentication
Require Container Gateway Authentication is available in the deployment form and is enabled by default.
- Enabled: every request must include the Salad-Api-Key header.
- Disabled: anyone with the deployment URL can call the API.
Example Request
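Use /embed to generate dense vectors with TEI’s native API. Send a POST request with a JSON body whose inputs field
holds the text (or list of texts) to embed. When authentication is enabled, include the Salad-Api-Key header.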
Each returned vector should contain 1024 floating-point values.
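A request to /embed can be sketched in Python with only the standard library. This is a minimal sketch, not the recipe's official client: BASE_URL and API_KEY are placeholders you must replace, the helper names are hypothetical, and the response is assumed to be a JSON list containing one vector per input.

```python
import json
import urllib.request

# Placeholders (assumptions): use your own deployment URL and SaladCloud API key.
BASE_URL = "https://your-deployment.example"
API_KEY = "YOUR_SALAD_API_KEY"


def build_embed_request(texts, base_url=BASE_URL, api_key=API_KEY):
    """Build a POST /embed request. Pass api_key=None if authentication is disabled."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Salad-Api-Key"] = api_key
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embed",
        data=body,
        headers=headers,
        method="POST",
    )


def embed(texts, **kwargs):
    """Send the request and return the parsed JSON (assumed: one vector per input)."""
    request = build_embed_request(texts, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


# Example usage against a live deployment:
# vectors = embed(["What is SaladCloud?"])
# print(len(vectors), len(vectors[0]))
```

Keeping request construction separate from the network call makes it easy to inspect the URL, headers, and body before sending anything.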
OpenAI-Compatible Request
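Use /v1/embeddings if your client expects an OpenAI-compatible embeddings API. Send a POST request with a JSON body
containing the input text and the model name. When authentication is enabled, include the Salad-Api-Key header.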
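The OpenAI-compatible call can be sketched the same way. The sketch below assumes the usual OpenAI embeddings shapes: a request body with model and input, and a response whose data array holds objects with index and embedding fields. The function names and the base URL you pass in are hypothetical.

```python
import json
import urllib.request


def build_openai_embeddings_request(texts, base_url, api_key=None, model="BAAI/bge-m3"):
    """Build a POST /v1/embeddings request in the OpenAI embeddings format."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when Container Gateway Authentication is enabled.
        headers["Salad-Api-Key"] = api_key
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/embeddings",
        data=body,
        headers=headers,
        method="POST",
    )


def extract_vectors(response_json):
    """Return the embeddings ordered by their index field."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Example usage against a live deployment (URL and key are placeholders):
# req = build_openai_embeddings_request(["hello"], "https://your-deployment.example", "KEY")
# with urllib.request.urlopen(req, timeout=30) as resp:
#     vectors = extract_vectors(json.loads(resp.read()))
```

Sorting by index is a defensive choice so that each vector lines up with its input even if a server returns items out of order.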
Test A Deployment
Check health with GET /health, then request an embedding and confirm that the returned vector length is 1024.
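One way to script these checks is sketched below in Python. It assumes that GET /health answers with HTTP 200 once the model is loaded, and it treats any connection or HTTP error as "not ready". The function names are hypothetical, and the base URL and API key are values you supply.

```python
import urllib.error
import urllib.request

EXPECTED_DIM = 1024  # dense vector size stated for BGE-M3 in this recipe


def is_healthy(base_url, api_key=None, timeout=10):
    """Return True if GET /health responds with HTTP 200, False on any error."""
    headers = {"Salad-Api-Key": api_key} if api_key else {}
    request = urllib.request.Request(base_url.rstrip("/") + "/health", headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def check_vectors(vectors, expected_dim=EXPECTED_DIM):
    """Return True if there is at least one vector and each has expected_dim numbers."""
    return bool(vectors) and all(
        len(vector) == expected_dim
        and all(isinstance(value, (int, float)) for value in vector)
        for vector in vectors
    )


# Example usage (placeholders):
# if is_healthy("https://your-deployment.example", "KEY"):
#     print("ready")
```

Because the model is downloaded at first startup, polling is_healthy every few seconds is a reasonable way to wait for readiness before running check_vectors on a real response.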
Tuning Notes
- Keep MODEL_ID set to BAAI/bge-m3 unless you intentionally want to repurpose the recipe.
- BGE-M3 supports long inputs, but long batches use more VRAM. Lower MAX_BATCH_TOKENS if replicas run out of memory.
- Increase MAX_CLIENT_BATCH_SIZE only after load testing your expected request shape.
- If you want private or gated Hugging Face models in a customized deployment, add HF_TOKEN in Advanced Configuration.
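On the client side, the batch size limit means a large corpus should be split into several requests. The following Python sketch shows one way to do that; the chunked helper is hypothetical, and its default of 32 simply mirrors the recipe's default max client batch size.

```python
def chunked(items, batch_size=32):
    """Yield consecutive slices of items, each holding at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Example: 70 documents become three requests of 32, 32, and 6 inputs.
documents = [f"document {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in chunked(documents)]
print(batch_sizes)  # [32, 32, 6]
```

If you raise MAX_CLIENT_BATCH_SIZE on the server, pass the same value as batch_size so the client and server stay in step.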