Ollama is a toolkit for deploying and serving Large Language Models (LLMs). It enables local operation of open-source models such as Llama 2, simplifies setup and configuration (including GPU usage), and provides a library of supported models.

Ollama API

Ollama exposes an API that can (see the sketch after this list):

  • Generate text completions using different language models and tags.
  • Stream responses in JSON format or receive them as single objects.
  • Include optional parameters such as images, formatting options, and system messages.
  • Maintain conversational memory using the context parameter.
  • Control response streaming and model memory retention.

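The sketch below illustrates several of these capabilities in Python: a streaming call to the `/api/generate` endpoint that reads the line-delimited JSON response and reuses the returned `context` tokens for a follow-up turn. It assumes a `llama2` model is already available on the server and that Ollama is reachable at `localhost:11434`; adjust both for your setup.

```python
import json
import requests

# Base URL of a locally running Ollama instance; adjust the host if
# Ollama runs elsewhere (e.g. a Salad deployment's access domain).
OLLAMA_URL = "http://localhost:11434"

def generate(prompt: str, context: list | None = None) -> tuple[str, list]:
    """Stream a completion from /api/generate and return the text
    plus the context tokens for follow-up requests."""
    payload = {
        "model": "llama2",   # any model tag available on the server
        "prompt": prompt,
        "stream": True,      # responses arrive as one JSON object per line
    }
    if context is not None:
        payload["context"] = context  # carries conversational memory forward

    text, final_context = "", []
    with requests.post(f"{OLLAMA_URL}/api/generate", json=payload, stream=True) as r:
        r.raise_for_status()
        for line in r.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            text += chunk.get("response", "")
            if chunk.get("done"):
                final_context = chunk.get("context", [])
    return text, final_context

# First turn, then a follow-up that reuses the returned context.
answer, ctx = generate("Why is the sky blue?")
follow_up, _ = generate("Summarize that in one sentence.", context=ctx)
```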
For detailed instructions and examples, refer to the Ollama documentation.

Deploying Ollama on Salad

Container

Ollama provides a pre-built Docker image, available on Docker Hub: https://hub.docker.com/r/ollama/ollama

In order to deploy the container on Salad, you will need to specify the image: ollama/ollama:latest

All other options can be specified via API requests.
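For example, rather than baking a model into the image, you can ask the running instance to download one through Ollama's `/api/pull` endpoint. A minimal sketch, assuming `llama2` as the model and a placeholder hostname (substitute the access domain shown for your container group):

```python
import json
import requests

# Placeholder access domain assigned by the Container Gateway;
# replace it with the URL shown for your container group.
BASE_URL = "https://your-container-group.salad.cloud"

# Ask the running Ollama instance to download a model. The response
# streams JSON status objects until the pull completes.
with requests.post(f"{BASE_URL}/api/pull", json={"name": "llama2"}, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if line:
            print(json.loads(line).get("status"))
```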

Required - Container Gateway Setup

In addition, you need to specify the port your API will be available through. The default port for Ollama is 11434.
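Once the gateway is configured to forward traffic to port 11434, a quick health check confirms the deployment is reachable; Ollama's root endpoint responds with "Ollama is running". Again, the hostname below is a placeholder for your deployment's access domain:

```python
import requests

# The Container Gateway terminates TLS and forwards traffic to the
# port configured above (11434 by default for Ollama). The hostname
# is a placeholder; use your deployment's access domain.
BASE_URL = "https://your-container-group.salad.cloud"

resp = requests.get(BASE_URL, timeout=10)
print(resp.status_code, resp.text)  # expect 200 and "Ollama is running"
```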