Cog is an open-source tool that offers CLI tools, Python modules, and an HTTP prediction server, facilitating the development of AI applications. Some applications utilize only its prediction server, while others also leverage its CLI tools to manage container images.

All Cog-based applications can be easily run on SaladCloud. Here are the main scenarios and suggested approaches:

Scenario 1: Deploy the Cog-based images directly on SaladCloud
Run the images without any modifications. If a load balancer is in place for inbound traffic, override the ENTRYPOINT and CMD settings of the images by using the SaladCloud Portal or APIs, and configure the Cog HTTP prediction server to use IPv6 before starting the server.

Scenario 2: Build a wrapper image for SaladCloud based on an existing Cog-based image
Create a new Dockerfile without modifying the original Dockerfile. Introduce new features and incorporate IPv6 support if applications need to process inbound traffic through a load balancer.

Scenario 3: Build an image using the Cog HTTP prediction server for SaladCloud
Use only the Cog HTTP prediction server without its CLI tools. Work directly with the Dockerfile for flexible and precise control over how the image is built.

Scenario 4: Build an image using both Cog CLI tools and the HTTP prediction server for SaladCloud
Use the Cog CLI tools and the cog.yaml file to manage the Dockerfile and image.

Scenario 1: Deploy the Cog-based images directly on SaladCloud

All Cog-based images can directly run on SaladCloud without any modifications, and you can leverage a load balancer or a job queue along with Cloud Storage for input and output.

To receive inbound traffic through a load balancer, containers running on SaladCloud need to listen on an IPv6 port. However, the Cog HTTP prediction server currently listens only on IPv4, and this cannot be changed via an environment variable. The issue can be easily addressed by executing an additional command when running an image on SaladCloud. Let’s take the BLIP and Whisper images, built by Replicate, as examples and walk through the steps to run these Cog-based images on SaladCloud.

First, run the BLIP image locally to verify the location of the site-packages directory and check the source code of the Cog HTTP prediction server:

# Check the installation directory of the server code.

ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest python -m site

sys.path = [
    '/src',
    '/root/.pyenv/versions/3.8.13/lib/python38.zip',
    '/root/.pyenv/versions/3.8.13/lib/python3.8',
    '/root/.pyenv/versions/3.8.13/lib/python3.8/lib-dynload',
    '/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.8/site-packages' (doesn't exist)
ENABLE_USER_SITE: True

# Locate the server code file.

ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest cat /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py

# Check the host used by the server.

ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest cat /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py | grep -A2 0.0.0.0

        host="0.0.0.0",
        port=5000,

After locating the server code file, “http.py”, we simply need to execute an additional ‘sed’ command that modifies the code for IPv6 by replacing ‘0.0.0.0’ with ‘::’.

sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py
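
To confirm the substitution, you can run the same command in a throwaway container and print the modified line (using the path verified above):

# Verify the substitution in a one-off container.

ubuntu@asus:~$ docker run --rm -it r8.im/salesforce/blip:latest sh -c "sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py && grep 'host=' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py"

        host="::",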

Most Cog-based containers start by running “python -m cog.server.http” (defined by a ‘CMD’ in the Dockerfile) as PID 1. Let’s verify this with the BLIP image:

# Run the container.

ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest

INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)

ubuntu@asus:~$ docker ps

CONTAINER ID   IMAGE                          COMMAND                  CREATED              STATUS              PORTS      NAMES
865dac78a8ef   r8.im/salesforce/blip:latest   "python -m cog.serve…"   About a minute ago   Up About a minute   5000/tcp   jovial_johnson

# Open a new terminal and run 'ps -ef' within the container.

ubuntu@asus:~$ docker exec -it 865 ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0 12 18:34 pts/0    00:00:30 /root/.pyenv/versions/3.8.13/bin/python -m cog.server.http
root       103     0  0 18:39 pts/1    00:00:00 ps -ef

# PID 1 is 'python -m cog.server.http'.

Some Cog-based containers use Tini as the init process within the container. In this setup, Tini is specified as the first command (defined by an ‘ENTRYPOINT’) and runs as PID 1, while ‘python -m cog.server.http’ is specified as its parameter (defined by the ‘CMD’) and runs as its single child process. Let’s check the Whisper image:

# Run the container.

ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/openai/whisper

INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)

ubuntu@asus:~$ docker ps

CONTAINER ID   IMAGE                  COMMAND                     CREATED              STATUS              PORTS                    NAMES
9397ce073a7e   r8.im/openai/whisper   "/sbin/tini -- python..."   About a minute ago   Up About a minute   0.0.0.0:5000->5000/tcp   awesome_hawking

# Open a new terminal and run 'ps -ef' within the container.

ubuntu@asus:~$ docker exec -it 939 ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 15:27 pts/0    00:00:00 /sbin/tini -- python -m cog.server.http
root         7     1  1 15:27 pts/0    00:00:03 /root/.pyenv/versions/3.11.4/bin/python -m cog.server.http
root        79     7  0 15:27 pts/0    00:00:00 /root/.pyenv/versions/3.11.4/bin/python -c from multiprocessing.resource_tracker import main;main(
root        80     7 11 15:27 pts/0    00:00:23 /root/.pyenv/versions/3.11.4/bin/python -c from multiprocessing.spawn import spawn_main; spawn_mai
root       116     0  0 15:30 pts/1    00:00:00 ps -ef

# PID 1 is '/sbin/tini', and 'python -m cog.server.http' runs as its child process.
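
In Dockerfile terms, the two layouts typically look like the following (an illustrative sketch, not the actual Replicate Dockerfiles):

# Without Tini (BLIP-style): the server itself runs as PID 1.

CMD ["python", "-m", "cog.server.http"]

# With Tini (Whisper-style): Tini runs as PID 1 and supervises the server.

ENTRYPOINT ["/sbin/tini", "--"]
CMD ["python", "-m", "cog.server.http"]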

Regardless of whether Tini is used, we just need to override the original CMD in the image with a new one when running it on SaladCloud: execute ‘sed’ to modify the server code for IPv6, and then run the original command “python -m cog.server.http” to start the server.

# Before

python -m cog.server.http

# After

sh -c "sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py && python -m cog.server.http"

# '&&' means the second command executes only if the first command succeeds with an exit status of zero.

Let’s conduct some tests locally to ensure the workaround works as expected.

For the BLIP image without Tini:

ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest sh -c "sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py && python -m cog.server.http"

INFO:     Started server process [10]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
INFO:     ::1:40436 - "GET / HTTP/1.1" 200 OK
INFO:     ::1:52836 - "GET / HTTP/1.1" 200 OK
INFO:     ::1:43312 - "POST /predictions HTTP/1.1" 200 OK

ubuntu@asus:~$ docker ps

CONTAINER ID   IMAGE                          COMMAND       CREATED         STATUS         PORTS      NAMES
4abbe2e7c4d9   r8.im/salesforce/blip:latest   "/bin/bash"   2 minutes ago   Up 2 minutes   5000/tcp   musing_jemison

# Open a new terminal and enter the container.

ubuntu@asus:~$ docker exec -it 4ab /bin/bash

root@4abbe2e7c4d9:/src# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:41 pts/0    00:00:00 sh -c sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8
root         8     1 38 21:41 pts/0    00:00:29 /root/.pyenv/versions/3.8.13/bin/python -m cog.server.http
root        78     0  0 21:42 pts/1    00:00:00 /bin/bash
root        86    78  0 21:42 pts/1    00:00:00 ps -ef

# Send some HTTP requests to the server via the IPv6 port

root@4abbe2e7c4d9:/src# curl http://[::1]:5000/

{"docs_url":"/docs","openapi_url":"/openapi.json"}

root@4abbe2e7c4d9:/src# curl -s -X POST   -H "Content-Type: application/json"   -d $'{ "input": {
      "task": "image_captioning",
      "image": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" } }' http://[::1]:5000/predictions

{"status":"succeeded","output":"Caption: a woman sitting on the beach with a dog"}

For the Whisper image with Tini:

ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/openai/whisper:latest sh -c "sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py && python -m cog.server.http"

INFO:     Application startup complete.
INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)

ubuntu@asus:~$ docker ps

CONTAINER ID   IMAGE                         COMMAND                  CREATED              STATUS          PORTS      NAMES
70a73843fb34   r8.im/openai/whisper:latest   "/sbin/tini -- sh -c…"   About a minute ago   Up 59 seconds   5000/tcp   competent_hopper

# Open a new terminal and enter the container.

ubuntu@asus:~$ docker exec -it 70a /bin/bash

root@70a73843fb34:/src# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 21:51 pts/0    00:00:00 /sbin/tini -- sh -c sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py && python -m cog.server.http
root         7     1  0 21:51 pts/0    00:00:00 sh -c sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py && python -m cog.server.http
root         9     7  0 21:51 pts/0    00:00:03 /root/.pyenv/versions/3.11.4/bin/python -m cog.server.http
root        81     9  0 21:51 pts/0    00:00:00 /root/.pyenv/versions/3.11.4/bin/python -c from multiprocessing.resource_tracker import main;main(13)
root        82     9 25 21:51 pts/0    00:01:35 /root/.pyenv/versions/3.11.4/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=16, pipe_handle=18) --multiprocessing-fork
root       108     0  0 21:52 pts/1    00:00:00 /bin/bash
root       137   108  0 21:57 pts/1    00:00:00 ps -ef

Now we can create a container group with the following parameters to run the BLIP image on SaladCloud:

# Image Source

r8.im/salesforce/blip:latest

# Replica Count

3 or more

# Resource

2 vCPUs, 12GB Memory
Any GPU types with 12 GB or more VRAM

# Command

Command:   sh
Argument1: -c
Argument2: sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py && python -m cog.server.http

# Container Gateway

Enabled, Port 5000

# Readiness Probe

Enabled
Protocol: HTTP/1.X
Path: /
Port: 5000
Initial Delay Seconds: 60
Period Seconds: 10
Timeout Seconds: 5
Success Threshold: 1
Failure Threshold: 3

The provided command and its arguments override both the ENTRYPOINT and CMD of the original image, ensuring that the original command ‘python -m cog.server.http’ runs only after the Cog HTTP prediction server code has been successfully modified to listen on an IPv6 port.

The Readiness Probe evaluates whether a container is ready to accept traffic from the load balancer. Specifically, the HTTP/1.X probe checks the health of an HTTP service running inside the container by sending an HTTP GET request and inspecting the response code.

The Cog HTTP prediction server in this BLIP image will be ready to answer requests only after the models have been downloaded and loaded into the GPU. Since this server version doesn’t provide a dedicated path operation function for health checks, the root path can be used instead.
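
You can approximate the probe locally by checking only the status code returned by the root path (a quick sanity check; the actual probe is run by SaladCloud):

# Print just the HTTP status code; the probe succeeds on 200.

root@4abbe2e7c4d9:/src# curl -s -o /dev/null -w "%{http_code}\n" http://[::1]:5000/

200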

After the container group is deployed, an access domain name will be generated, which can be used to access the application.

Let’s perform some inference tests using the Access Domain Name:

# image_captioning

ubuntu@asus:~$ curl -s -X POST   -H "Content-Type: application/json"   -d $'{ "input": {
      "task": "image_captioning",
      "image": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" } }' https://ham-herbs-2xxhjy5xoatmq45k.salad.cloud/predictions

{"status":"succeeded","output":"Caption: a woman sitting on the beach with a dog"}

# visual_question_answering

ubuntu@asus:~$ curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{ "input": {
      "task": "visual_question_answering",
      "question": "where is the dog?",
      "image": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" } }' \
  https://ham-herbs-2xxhjy5xoatmq45k.salad.cloud/predictions

{"status":"succeeded","output":"Answer: on beach"}

# image_text_matching

ubuntu@asus:~$ curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{ "input": {
      "task": "image_text_matching",
      "caption": "a dog and a women are sitting at the beach",
      "image": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" } }' \
  https://ham-herbs-2xxhjy5xoatmq45k.salad.cloud/predictions

{"status":"succeeded","output":"The image and text is matched with a probability of 0.9977.\nThe image feature and text feature has a cosine similarity of 0.5178."}

Let’s create another container group with the following parameters to run the Whisper image on SaladCloud:

# Image Source

r8.im/openai/whisper

# Replica Count

3 or more

# Resource

2 vCPUs, 12GB Memory
Any GPU types with 12 GB or more VRAM

# Command

Command:   /sbin/tini
Argument1: --
Argument2: sh
Argument3: -c
Argument4: sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py && python -m cog.server.http

# Container Gateway

Enabled, Port 5000

# Readiness Probe

Enabled
Protocol: exec

Command: python
Argument1: -c
Argument2: import requests,sys;sys.exit(0 if 'READY' in requests.get('http://[::1]:5000/health-check').text else -1)

Initial Delay Seconds: 60
Period Seconds: 10
Timeout Seconds: 5
Success Threshold: 1
Failure Threshold: 3

Because the command and its arguments override both the ENTRYPOINT and CMD of the original image, we need to specify ‘/sbin/tini’ in place of the ENTRYPOINT and pass the other four arguments, which replace the CMD, as parameters of ‘/sbin/tini’. This ensures that Tini is still used as the init process within the container, and the original command ‘python -m cog.server.http’ runs only after the Cog HTTP prediction server code has been successfully modified to listen on an IPv6 port.

The Cog HTTP prediction server in this Whisper image provides a dedicated ‘/health-check’ endpoint with a path operation function that reports the server’s running status (STARTING, READY, BUSY or FAILED).
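
Inside a running container, you can query this endpoint directly; the exact JSON shape varies by Cog version, but the response body contains the current status string (e.g. ‘READY’):

root@70a73843fb34:/src# curl -s http://[::1]:5000/health-check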

The Readiness Probe with the exec protocol runs the given command inside the container. If the command returns an exit code of 0 (which can be checked with ‘echo $?’), the container is considered healthy; any other exit code indicates the container is not ready yet.

The Python one-liner above runs periodically to determine whether the server is in the READY status, indicating that the models have been loaded successfully and the Cog HTTP prediction server is ready to accept traffic from the load balancer.
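
If the one-liner becomes hard to read or extend, an equivalent standalone script makes the logic and error handling explicit (a sketch; you would need to ship it into the image yourself, e.g. via a wrapper image as in Scenario 2):

# readiness_check.py: exit 0 only when the server reports READY.

import sys
import requests

try:
    resp = requests.get("http://[::1]:5000/health-check", timeout=3)
    # The body contains the current status string: STARTING, READY, BUSY or FAILED.
    sys.exit(0 if "READY" in resp.text else 1)
except requests.RequestException:
    # Connection refused or timeout: the server isn't up yet, so not ready.
    sys.exit(1)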

By default, the functions for health checks and predictions in the Cog HTTP prediction server are declared with ‘async def’ and run within the same main thread. As a result, a long-running synchronous prediction can delay timely responses to readiness probes. To address this, there are two options: adjust the ‘Timeout Seconds’ and/or ‘Failure Threshold’ of the Readiness Probe to suit the specific context, or declare the prediction function with ‘def’ in the server code, which makes the server handle each prediction in a separate worker thread.
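
To see why the declaration matters, here is a minimal standalone FastAPI sketch (Cog’s server is built on FastAPI and Uvicorn; this is not Cog’s actual code). A blocking call inside an ‘async def’ endpoint stalls the event loop and therefore the health check, while a plain ‘def’ endpoint is dispatched to a worker thread:

# fastapi_blocking_demo.py: run with 'uvicorn fastapi_blocking_demo:app'.

import time
from fastapi import FastAPI

app = FastAPI()

@app.get("/health-check")
async def health_check():
    # Served on the event loop; responsive only if the loop isn't blocked.
    return {"status": "READY"}

@app.post("/predict-async")
async def predict_async():
    time.sleep(30)  # Blocks the event loop: health checks stall for 30s.
    return {"status": "succeeded"}

@app.post("/predict-sync")
def predict_sync():
    time.sleep(30)  # Runs in a worker thread: health checks stay responsive.
    return {"status": "succeeded"}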

Scenario 2: Build a wrapper image for SaladCloud based on an existing Cog-based image

If you want to introduce new features, such as adding an I/O worker to the Cog HTTP prediction server, you can create a wrapper image based on an existing Cog-based image, without needing to modify its original Dockerfile.
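
For instance, an I/O worker might pull jobs from a queue and forward them to the local prediction server. Below is a minimal sketch; JOB_QUEUE_URL, the job shape, and the callback_url field are assumptions for illustration, not SaladCloud or Cog features:

# io_worker.py: a hypothetical worker bridging a job queue and the Cog server.

import os
import time
import requests

PREDICT_URL = "http://127.0.0.1:5000/predictions"  # Cog server in the same container
QUEUE_URL = os.environ["JOB_QUEUE_URL"]            # hypothetical queue endpoint

while True:
    job = requests.get(QUEUE_URL, timeout=30).json()  # fetch the next job (shape assumed)
    if not job:
        time.sleep(1)
        continue
    result = requests.post(PREDICT_URL, json={"input": job["input"]}, timeout=600).json()
    requests.post(job["callback_url"], json=result, timeout=30)  # deliver the output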

In the new Dockerfile, you can begin with the original image, introduce additional features and then incorporate IPv6 support if a load balancer is required. There are multiple approaches when working directly with the Dockerfile:

Option 1: Run ‘sed’ command to modify the server code while building the image.

Create the Dockerfile for the BLIP wrapper image:

# Dockerfile of the wrapper image

# Use the original image as the base image.

FROM r8.im/salesforce/blip:latest

# Introduce necessary features here.

# Modify the Cog HTTP prediction server code to use IPv6.

RUN sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py

# The original first command is not changed.

Build and test the image locally:

ubuntu@asus:~$ docker image build -t docker.io/saladtechnologies/sip:0.0.1-migrated-blip -f Dockerfile .

ubuntu@asus:~$ docker run --rm --gpus all -it docker.io/saladtechnologies/sip:0.0.1-migrated-blip

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)

ubuntu@asus:~$ docker push docker.io/saladtechnologies/sip:0.0.1-migrated-blip

Because we don’t modify the CMD of the original image, the Command is not needed when running this wrapper image on SaladCloud. Below are the parameters for the deployment on SaladCloud:

# Image Source

docker.io/saladtechnologies/sip:0.0.1-migrated-blip

# Replica Count

3 or more

# Resource

2 vCPUs, 12GB Memory
Any GPU types with 12 GB or more VRAM

# Command (not needed)

# Container Gateway

Enabled, Port 5000

# Readiness Probe

Enabled
Protocol: HTTP/1.X
Path: /
Port: 5000
Initial Delay Seconds: 60
Period Seconds: 10
Timeout Seconds: 5
Success Threshold: 1
Failure Threshold: 3

Option 2: Add socat in the image to route IPv6 to IPv4.

Create the Dockerfile for the BLIP wrapper image:

# Dockerfile of the wrapper image

# Use the original image as the base image.

FROM r8.im/salesforce/blip:latest

# Introduce necessary features here.

# Install socat.

RUN apt-get update && apt-get install -y socat net-tools

# '&' runs socat in the background.
# Access to Port 8888 on IPv6 is forwarded to Port 5000 on IPv4.

CMD ["sh", "-c", "socat TCP6-LISTEN:8888,fork TCP4:127.0.0.1:5000 & python -m cog.server.http"]

Regardless of whether Tini is used in the original image, we just need to override the original CMD with a new one in the wrapper image: run a new shell ‘sh’ to execute ‘socat’ in the background, and then run the original command “python -m cog.server.http” to start the server.

Build and test the image locally:

ubuntu@asus:~$ docker image build -t docker.io/saladtechnologies/sip:0.0.1-migrated-socat-blip -f Dockerfile.socat .

ubuntu@asus:~$ docker run --rm --gpus all -it docker.io/saladtechnologies/sip:0.0.1-migrated-socat-blip

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)

# Open a new terminal to enter the container.

ubuntu@asus:~$ docker ps
CONTAINER ID   IMAGE                                             COMMAND                  CREATED         STATUS         PORTS      NAMES
fe31764c0005   saladtechnologies/sip:0.0.1-migrated-socat-blip   "sh -c 'socat TCP6-L…"   9 seconds ago   Up 8 seconds   5000/tcp   great_dubinsky

ubuntu@asus:~$ docker exec -it fe3 /bin/bash

root@fe31764c0005:/src# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 18:10 pts/0    00:00:00 sh -c socat TCP6-LISTEN:8888,fork TCP4:127.0.0.1:5000 & python -m cog.server.http
root         8     1  0 18:10 pts/0    00:00:00 socat TCP6-LISTEN:8888,fork TCP4:127.0.0.1:5000
root         9     1 73 18:10 pts/0    00:00:42 /root/.pyenv/versions/3.8.13/bin/python -m cog.server.http
root        75     0  0 18:10 pts/1    00:00:00 /bin/bash
root        88    75  0 18:11 pts/1    00:00:00 ps -ef

# Check whether both IPv4 and IPv6 are enabled.

root@fe31764c0005:/src# netstat -topln

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name     Timer
tcp        0      0 0.0.0.0:5000            0.0.0.0:*               LISTEN      8/python             off (0.00/0/0)
tcp6       0      0 :::8888                 :::*                    LISTEN      7/socat              off (0.00/0/0)

# Send some HTTP requests to the server via the IPv4 port and the IPv6 port.

root@fe31764c0005:/src# curl http://[::1]:8888/

{"docs_url":"/docs","openapi_url":"/openapi.json"}

root@fe31764c0005:/src# curl http://127.0.0.1:5000/

{"docs_url":"/docs","openapi_url":"/openapi.json"}

ubuntu@asus:~$ docker push docker.io/saladtechnologies/sip:0.0.1-migrated-socat-blip

Because the wrapper image already defines the required CMD, the Command is not needed when running it on SaladCloud. However, both the Container Gateway and the HTTP/1.X probe need to be configured to use port 8888, since socat listens on IPv6 port 8888. The probe won’t succeed until the Cog HTTP prediction server is ready on IPv4 port 5000, because socat simply forwards the probe’s requests to the server.

Below are the parameters for the deployment on SaladCloud:

# Image Source

docker.io/saladtechnologies/sip:0.0.1-migrated-socat-blip

# Replica Count

3 or more

# Resource

2 vCPUs, 12GB Memory
Any GPU types with 12 GB or more VRAM

# Command (not needed)

# Container Gateway

Enabled, Port 8888

# Readiness Probe

Enabled
Protocol: HTTP/1.X
Path: /
Port: 8888
Initial Delay Seconds: 60
Period Seconds: 10
Timeout Seconds: 5
Success Threshold: 1
Failure Threshold: 3

Scenario 3: Build an image using Cog HTTP prediction server for SaladCloud

Using only the Cog HTTP prediction server without its CLI tools is feasible if you need flexible and fine-grained control over how the image is built. You can refer to this guide on writing a Dockerfile from scratch by leveraging the PyTorch image for your AI applications.
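
As a rough sketch of that approach (the base image tag and file names here are assumptions; adjust them to your application):

# Dockerfile built from scratch around the Cog HTTP prediction server.

FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

# The 'cog' Python package provides the HTTP prediction server; no Cog CLI is used.

RUN pip install cog

WORKDIR /src

# cog.yaml tells the server where to find the predictor, e.g. 'predict.py:Predictor'.

COPY cog.yaml predict.py /src/

# Apply one of the IPv6 workarounds from Scenarios 1 and 2 if a load balancer is needed.

CMD ["python", "-m", "cog.server.http"]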

Scenario 4: Build an image using both Cog CLI tools and HTTP prediction server for SaladCloud

If you prefer to use Cog CLI tools to manage the Dockerfile and image, you need to modify the ‘cog.yaml’ and ‘predict.py’ files to add socat support in the image for running on SaladCloud. You can refer to this guide for more information. Alternatively, you can use the approaches in Scenario 1 or Scenario 2 to add IPv6 support later.