Run Ollama
Ollama is a toolkit for deploying and serving Large Language Models (LLMs). It enables local operation of open-source models such as Llama 2, simplifies setup and configuration (including GPU usage), and provides a library of supported models.
Ollama API
Ollama exposes an API that can:
- Generate text completions using different language models and tags.
- Stream responses in JSON format or receive them as single objects.
- Include optional parameters such as images, formatting options, and system messages.
- Maintain conversational memory using the context parameter.
- Control response streaming and model memory retention.
For detailed instructions and examples, refer to the Ollama documentation.
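As a quick illustration, here is a minimal sketch of calling the completions endpoint with Python's requests library. It assumes Ollama is reachable at its default local address, and the model name llama2 is just an example; it streams the response and reuses the returned context parameter for conversational memory:

```python
import json

import requests

OLLAMA_URL = "http://localhost:11434"  # assumes Ollama's default port

def generate(prompt, model="llama2", context=None):
    """Stream a completion and return the final context for follow-up turns."""
    payload = {"model": model, "prompt": prompt}
    if context is not None:
        payload["context"] = context  # carries conversational memory forward
    with requests.post(f"{OLLAMA_URL}/api/generate", json=payload, stream=True) as resp:
        resp.raise_for_status()
        # Responses stream as one JSON object per line until "done" is true.
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                print()
                return chunk.get("context")

ctx = generate("Why is the sky blue?")
generate("Summarize that in one sentence.", context=ctx)
```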
Deploying Ollama on Salad
Container
Ollama provides a pre-built Docker image, available from Docker Hub: https://hub.docker.com/r/ollama/ollama
In order to deploy the container on Salad, you will need to specify the image: ollama/ollama. All other options can be configured after deployment through API requests to the running container.
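For example, once the container is running, you could pull a model through Ollama's pull endpoint. This is a sketch using Python's requests library; the gateway URL below is a hypothetical placeholder for the access domain name assigned to your deployment:

```python
import json

import requests

GATEWAY_URL = "https://your-unique-name.salad.cloud"  # hypothetical placeholder URL

# Pull a model onto the deployed container; status updates stream back
# as one JSON object per line (e.g. "pulling manifest", then "success").
with requests.post(f"{GATEWAY_URL}/api/pull", json={"name": "llama2"}, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        status = json.loads(line)
        print(status.get("status"))
```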
Required - Container Gateway Setup
In addition, you need to enable the Container Gateway and specify the port your API will be served on. The default port for Ollama is 11434.
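Once the gateway is configured, you can verify that the deployment is reachable: Ollama answers a plain GET on its root path with the text "Ollama is running". A minimal check, again with a hypothetical gateway URL:

```python
import requests

# Assumes the Container Gateway forwards traffic to port 11434.
resp = requests.get("https://your-unique-name.salad.cloud/")  # hypothetical URL
print(resp.status_code, resp.text)  # expect: 200 Ollama is running
```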