Cog is an open-source tool that offers CLI tools, Python modules, and an HTTP prediction server,
facilitating the development of AI applications. Some applications utilize only its prediction server, while others also
leverage its CLI tools to manage container images.
All Cog-based applications can be easily run on SaladCloud. Here are the main scenarios and suggested approaches:
| No | Scenario | Description |
| --- | --- | --- |
| 1 | Deploy the Cog-based images directly on SaladCloud | Run the images without any modifications. If a load balancer is in place for inbound traffic, override the ENTRYPOINT and CMD settings of the images by using the SaladCloud Portal or APIs, and configure the Cog HTTP prediction server to use IPv6 before starting the server. |
| 2 | Build a wrapper image for SaladCloud based on an existing Cog-based image | Create a new Dockerfile without needing to modify the original Dockerfile. Introduce new features and incorporate IPv6 support if applications need to process inbound traffic through a load balancer. |
| 3 | Build an image using Cog HTTP prediction server for SaladCloud | Use only the Cog HTTP prediction server without its CLI tools. Work directly with the Dockerfile for flexible and precise control over the construction of the image. |
| 4 | Build an image using both Cog CLI tools and HTTP prediction server for SaladCloud | Use the Cog CLI tools and the cog.yaml file to manage the Dockerfile and image. |
Scenario 1: Deploy the Cog-based images directly on SaladCloud
All Cog-based images can directly run on SaladCloud without any modifications, and you can leverage a load balancer or a
job queue along with Cloud Storage for input and output.
To receive inbound traffic through a load balancer, containers running on SaladCloud need to listen on an IPv6 port.
However, the Cog HTTP prediction server currently listens only on IPv4, and the listening address cannot be configured via an environment variable.
This issue can be easily addressed by executing an additional command when running an image on SaladCloud. Let’s take
the BLIP and Whisper images, built by Replicate, as examples and walk through the steps to run these Cog-based images on
SaladCloud.
First, run the BLIP image locally to verify the location of the
site-packages directory and check the source code of the Cog HTTP prediction server:
# Check the installation directory of the server code.
ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest python -m site
sys.path = [
'/src',
'/root/.pyenv/versions/3.8.13/lib/python38.zip',
'/root/.pyenv/versions/3.8.13/lib/python3.8',
'/root/.pyenv/versions/3.8.13/lib/python3.8/lib-dynload',
'/root/.pyenv/versions/3.8.13/lib/python3.8/site-packages',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.8/site-packages' (doesn't exist)
ENABLE_USER_SITE: True
# Locate the server code file.
ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest cat /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py
# Check the host used by the server.
ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest cat /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py | grep -A2 0.0.0.0
host="0.0.0.0",
port=5000,
After locating the server code file named “http.py”, we simply need to execute an additional ‘sed’ command to modify the
code for IPv6 by replacing ‘0.0.0.0’ with ‘::’.
sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py
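Before deploying, you can confirm that the replacement takes effect by combining the ‘sed’ command with the earlier ‘grep’ check (an illustrative verification; the file path shown is specific to this BLIP image):
# Confirm the replacement took effect; expect host="::" in the output.
ubuntu@asus:~$ docker run --rm -it r8.im/salesforce/blip:latest sh -c "sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py && grep -A1 'host=' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py"
host="::",
port=5000,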
Most Cog-based containers start by running “python -m cog.server.http” (defined by a ‘CMD’ within the Dockerfile) as PID
1, but let’s check the BLIP image:
# Run the container.
ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
ubuntu@asus:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
865dac78a8ef r8.im/salesforce/blip:latest "python -m cog.serve…" About a minute ago Up About a minute 5000/tcp jovial_johnson
# Open a new terminal and run 'ps -ef' within the container.
ubuntu@asus:~$ docker exec -it 865 ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 12 18:34 pts/0 00:00:30 /root/.pyenv/versions/3.8.13/bin/python -m cog.server.http
root 103 0 0 18:39 pts/1 00:00:00 ps -ef
# The PID 1 is 'python -m cog.server.http'.
Some Cog-based containers use Tini as the init process within the container. In this
setup, Tini is specified as the first command (defined by an ‘ENTRYPOINT’) and runs as PID 1; ‘python -m
cog.server.http’ is specified as its parameter (defined by a ‘CMD’) and runs as its single child process. Let’s
check the Whisper image:
# Run the container.
ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/openai/whisper
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
ubuntu@asus:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9397ce073a7e r8.im/openai/whisper "/sbin/tini -- python..." About a minute ago Up About a minute 0.0.0.0:5000->5000/tcp awesome_hawking
# Open a new terminal and run 'ps -ef' within the container.
ubuntu@asus:~$ docker exec -it 939 ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 15:27 pts/0 00:00:00 /sbin/tini -- python -m cog.server.http
root 7 1 1 15:27 pts/0 00:00:03 /root/.pyenv/versions/3.11.4/bin/python -m cog.server.http
root 79 7 0 15:27 pts/0 00:00:00 /root/.pyenv/versions/3.11.4/bin/python -c from multiprocessing.resource_tracker import main;main(
root 80 7 11 15:27 pts/0 00:00:23 /root/.pyenv/versions/3.11.4/bin/python -c from multiprocessing.spawn import spawn_main; spawn_mai
root 116 0 0 15:30 pts/1 00:00:00 ps -ef
# The PID1 is '/sbin/tini' and 'python -m cog.server.http' runs as its child process.
Regardless of whether Tini is used, we just need to override the original CMD in the image with a new one when running
it on SaladCloud: execute ‘sed’ to modify the server code for IPv6, and then run the original command “python -m
cog.server.http” to start the server.
# Before
python -m cog.server.http
# After
sh -c "sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py && python -m cog.server.http"
# '&&' means the second command executes only if the first command succeeds with an exit status of zero.
Let’s conduct some tests locally to ensure the workaround works as expected.
For the BLIP image without Tini:
ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/salesforce/blip:latest sh -c "sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py && python -m cog.server.http"
INFO: Started server process [10]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
INFO: ::1:40436 - "GET / HTTP/1.1" 200 OK
INFO: ::1:52836 - "GET / HTTP/1.1" 200 OK
INFO: ::1:43312 - "POST /predictions HTTP/1.1" 200 OK
ubuntu@asus:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4abbe2e7c4d9 r8.im/salesforce/blip:latest "/bin/bash" 2 minutes ago Up 2 minutes 5000/tcp musing_jemison
# Open a new terminal and enter the container.
ubuntu@asus:~$ docker exec -it 4ab /bin/bash
root@4abbe2e7c4d9:/src# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 21:41 pts/0 00:00:00 sh -c sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8
root 8 1 38 21:41 pts/0 00:00:29 /root/.pyenv/versions/3.8.13/bin/python -m cog.server.http
root 78 0 0 21:42 pts/1 00:00:00 /bin/bash
root 86 78 0 21:42 pts/1 00:00:00 ps -ef
# Send some HTTP requests to the server via the IPv6 port
root@4abbe2e7c4d9:/src# curl http://[::1]:5000/
{"docs_url":"/docs","openapi_url":"/openapi.json"}
root@4abbe2e7c4d9:/src# curl -s -X POST -H "Content-Type: application/json" -d $'{ "input": {
"task": "image_captioning",
"image": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" } }' http://[::1]:5000/predictions
{"status":"succeeded","output":"Caption: a woman sitting on the beach with a dog"}
For the Whisper image with Tini:
ubuntu@asus:~$ docker run --rm --gpus all -it r8.im/openai/whisper:latest sh -c "sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py && python -m cog.server.http"
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
ubuntu@asus:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
70a73843fb34 r8.im/openai/whisper:latest "/sbin/tini -- sh -c…" About a minute ago Up 59 seconds 5000/tcp competent_hopper
# Open a new terminal and enter the container.
ubuntu@asus:~$ docker exec -it 70a /bin/bash
root@70a73843fb34:/src# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 21:51 pts/0 00:00:00 /sbin/tini -- sh -c sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py && python -m cog.server.http
root 7 1 0 21:51 pts/0 00:00:00 sh -c sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py && python -m cog.server.http
root 9 7 0 21:51 pts/0 00:00:03 /root/.pyenv/versions/3.11.4/bin/python -m cog.server.http
root 81 9 0 21:51 pts/0 00:00:00 /root/.pyenv/versions/3.11.4/bin/python -c from multiprocessing.resource_tracker import main;main(13)
root 82 9 25 21:51 pts/0 00:01:35 /root/.pyenv/versions/3.11.4/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=16, pipe_handle=18) --multiprocessing-fork
root 108 0 0 21:52 pts/1 00:00:00 /bin/bash
root 137 108 0 21:57 pts/1 00:00:00 ps -ef
Now we can create a container group with the following parameters to run the BLIP image on SaladCloud:
# Image Source
r8.im/salesforce/blip:latest
# Replica Count
3 or more
# Resource
2 vCPUs, 12GB Memory
Any GPU types with 12 GB or more VRAM
# Command
Command: sh
Argument1: -c
Argument2: sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py && python -m cog.server.http
# Container Gateway
Enabled, Port 5000
# Readiness Probe
Enabled
Protocol: HTTP/1.X
Path: /
Port: 5000
Initial Delay Seconds: 60
Period Seconds: 10
Timeout Seconds: 5
Success Threshold: 1
Failure Threshold: 3
The provided command and its arguments
override both the ENTRYPOINT and CMD of the original image, ensuring that the original first command ‘python -m
cog.server.http’ is run only after the Cog HTTP prediction server code has been successfully modified to listen on an
IPv6 port.
The Readiness Probe evaluates whether a container is ready to accept traffic from the load balancer. Specifically,
the HTTP/1.X probe
checks the health of an HTTP service running inside the container by sending an HTTP GET request and checking the
response code.
The Cog HTTP prediction server in this BLIP image is ready to answer requests only after the models have been downloaded
and loaded into the GPU. Since there isn’t a dedicated path operation function for health checks in this server
version, the root path can be used instead.
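Conceptually, the HTTP/1.X probe behaves like the following check (an illustrative curl equivalent run inside the container, not the probe’s actual implementation):
# Exits with 0 only when the server answers the GET request with a success code.
curl -sf --max-time 5 http://[::1]:5000/ > /dev/null && echo "ready" || echo "not ready"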
After the container group is deployed, an access domain name will be generated, which can be used to access the
application.
Let’s perform some inference tests using the Access Domain Name:
# image_captioning
ubuntu@asus:~$ curl -s -X POST -H "Content-Type: application/json" -d $'{ "input": {
"task": "image_captioning",
"image": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" } }' https://ham-herbs-2xxhjy5xoatmq45k.salad.cloud/predictions
{"status":"succeeded","output":"Caption: a woman sitting on the beach with a dog"}
# visual_question_answering
ubuntu@asus:~$ curl -s -X POST \
-H "Content-Type: application/json" \
-d $'{ "input": {
"task": "visual_question_answering",
"question": "where is the dog?",
"image": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" } }' \
https://ham-herbs-2xxhjy5xoatmq45k.salad.cloud/predictions
{"status":"succeeded","output":"Answer: on beach"}
# image_text_matching
ubuntu@asus:~$ curl -s -X POST \
-H "Content-Type: application/json" \
-d $'{ "input": {
"task": "image_text_matching",
"caption": "a dog and a women are sitting at the beach",
"image": "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" } }' \
https://ham-herbs-2xxhjy5xoatmq45k.salad.cloud/predictions
{"status":"succeeded","output":"The image and text is matched with a probability of 0.9977.\nThe image feature and text feature has a cosine similarity of 0.5178."}
Let’s create another container group with the following parameters to run the Whisper image on SaladCloud:
# Image Source
r8.im/openai/whisper
# Replica Count
3 or more
# Resource
2 vCPUs, 12GB Memory
Any GPU types with 12 GB or more VRAM
# Command
Command: /sbin/tini
Argument1: --
Argument2: sh
Argument3: -c
Argument4: sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py && python -m cog.server.http
# Container Gateway
Enabled, Port 5000
# Readiness Probe
Enabled
Protocol: exec
Command: python
Argument1: -c
Argument2: import requests,sys;sys.exit(0 if 'READY' in requests.get('http://[::1]:5000/health-check').text else -1)
Initial Delay Seconds: 60
Period Seconds: 10
Timeout Seconds: 5
Success Threshold: 1
Failure Threshold: 3
Because the command and its arguments
override both the ENTRYPOINT and CMD of the original image, we need to specify ‘/sbin/tini’ for the ENTRYPOINT and pass
the other four arguments, which form the CMD, as parameters of ‘/sbin/tini’. This ensures that Tini is still used as the init
process within the container, and the original command ‘python -m cog.server.http’ is run only after the Cog HTTP
prediction server code has been successfully modified to listen on an IPv6 port.
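To reproduce this setup locally before deploying, you can override the ENTRYPOINT explicitly with docker (an illustrative equivalent of the parameters above):
# Override the ENTRYPOINT and CMD together, keeping Tini as PID 1.
ubuntu@asus:~$ docker run --rm --gpus all -it --entrypoint /sbin/tini r8.im/openai/whisper:latest -- sh -c "sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py && python -m cog.server.http"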
The Cog HTTP prediction server in this Whisper image provides a dedicated ‘/health-check’ endpoint and an operation
function for health checks that reports the server’s running status (STARTING, READY, BUSY or FAILED).
The Readiness Probe with the exec protocol runs the given command inside the container. If the command returns an exit
code of 0 (which can be verified with ‘echo $?’), the container is considered healthy; any other exit code indicates the
container is not ready yet.
The Python one-liner provided here runs regularly to determine whether the container is in the READY status, indicating
that the models have been loaded successfully and the Cog HTTP prediction server is ready to accept traffic from the load
balancer.
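For readability, here is an expanded, equivalent form of that one-liner (a sketch; it assumes the ‘requests’ library is available inside the image, which the one-liner itself already relies on):
# probe.py - an expanded form of the exec probe command above.
import sys
import requests

# Query the Cog HTTP prediction server's health-check endpoint over IPv6 loopback.
response = requests.get("http://[::1]:5000/health-check")

# Exit 0 (ready) only when the reported status contains 'READY';
# any other exit code tells SaladCloud the container is not ready yet.
sys.exit(0 if "READY" in response.text else 1)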
By default, the functions for health checks and predictions in the Cog HTTP prediction server are declared with ‘async
def’ and run within the same main thread. However, long-running synchronous predictions could delay timely
responses to readiness probes. To address this issue, we have two options: adjusting the ‘Timeout Seconds’ and/or
‘Failure Threshold’ in the Readiness Probe based on the specific context, or declaring the prediction function with ‘def’ in
the server code. The latter approach spawns a new thread to handle a prediction when a new request arrives.
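This behavior comes from FastAPI, which the Cog HTTP prediction server is built on: ‘async def’ route handlers run on the event loop in the main thread, while plain ‘def’ handlers are dispatched to a worker thread pool. The following standalone sketch illustrates the difference (it is not Cog’s actual server code):
# fastapi_sketch.py - illustrates 'async def' vs 'def' route handlers.
import time
from fastapi import FastAPI

app = FastAPI()

@app.get("/health-check")
async def health_check():
    # Runs on the event loop; must stay responsive for readiness probes.
    return {"status": "READY"}

@app.post("/predictions")
def predictions():
    # A plain 'def' handler runs in a worker thread, so this long
    # synchronous call does not block /health-check responses.
    time.sleep(30)  # stand-in for a long-running synchronous prediction
    return {"status": "succeeded"}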
Scenario 2: Build a wrapper image for SaladCloud based on an existing Cog-based image
If you want to introduce new features, such as adding an I/O worker to the Cog HTTP prediction server, you can create a
wrapper image based on an existing Cog-based image, without needing to modify its original Dockerfile.
In the new Dockerfile, you can begin with the original image, introduce additional features and then incorporate IPv6
support if a load balancer is required. There are multiple approaches when working directly with the Dockerfile:
Option 1: Run the ‘sed’ command to modify the server code while building the image.
Create the Dockerfile for the BLIP wrapper image:
# Dockerfile of the wrapper image
# Use the original image as the base image.
FROM r8.im/salesforce/blip:latest
# Introduce necessary features here.
# Modify the Cog HTTP prediction server code to use IPv6.
RUN sed -i 's/0.0.0.0/::/g' /root/.pyenv/versions/3.8.13/lib/python3.8/site-packages/cog/server/http.py
# The original first command is not changed.
Build and test the image locally:
ubuntu@asus:~$ docker image build -t docker.io/saladtechnologies/sip:0.0.1-migrated-blip -f Dockerfile .
ubuntu@asus:~$ docker run --rm --gpus all -it docker.io/saladtechnologies/sip:0.0.1-migrated-blip
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
ubuntu@asus:~$ docker push docker.io/saladtechnologies/sip:0.0.1-migrated-blip
Because we don’t modify the CMD of the original image, the Command is not needed when running this wrapper image on
SaladCloud. Below are the parameters for the deployment on SaladCloud:
# Image Source
docker.io/saladtechnologies/sip:0.0.1-migrated-blip
# Replica Count
3 or more
# Resource
2 vCPUs, 12GB Memory
Any GPU types with 12 GB or more VRAM
# Command (not needed)
# Container Gateway
Enabled, Port 5000
# Readiness Probe
Enabled
Protocol: HTTP/1.X
Path: /
Port: 5000
Initial Delay Seconds: 60
Period Seconds: 10
Timeout Seconds: 5
Success Threshold: 1
Failure Threshold: 3
Option 2: Add socat in the image to route IPv6 to IPv4.
Create the Dockerfile for the BLIP wrapper image:
# Dockerfile of the wrapper image
# Use the original image as the base image.
FROM r8.im/salesforce/blip:latest
# Introduce necessary features here.
# Install socat and net-tools.
RUN apt-get update && apt-get install -y socat net-tools
# '&' means socat is running in the background.
# Access to Port 8888 on IPv6 is forwarded to Port 5000 on IPv4.
CMD ["sh", "-c", "socat TCP6-LISTEN:8888,fork TCP4:127.0.0.1:5000 & python -m cog.server.http"]
Regardless of whether Tini is used in the original image, we just need to override the original CMD with a new one in
the wrapper image: run a new shell ‘sh’ to execute ‘socat’ in the background, and then run the original command “python
-m cog.server.http” to start the server.
Build and test the image locally:
ubuntu@asus:~$ docker image build -t docker.io/saladtechnologies/sip:0.0.1-migrated-socat-blip -f Dockerfile.socat .
ubuntu@asus:~$ docker run --rm --gpus all -it docker.io/saladtechnologies/sip:0.0.1-migrated-socat-blip
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
# Open a new terminal to enter the container.
ubuntu@asus:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fe31764c0005 saladtechnologies/sip:0.0.1-migrated-socat-blip "sh -c 'socat TCP6-L…" 9 seconds ago Up 8 seconds 5000/tcp great_dubinsky
ubuntu@asus:~$ docker exec -it fe3 /bin/bash
root@fe31764c0005:/src# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 18:10 pts/0 00:00:00 sh -c socat TCP6-LISTEN:8888,fork TCP4:127.0.0.1:5000 & python -m cog.server.http
root 8 1 0 18:10 pts/0 00:00:00 socat TCP6-LISTEN:8888,fork TCP4:127.0.0.1:5000
root 9 1 73 18:10 pts/0 00:00:42 /root/.pyenv/versions/3.8.13/bin/python -m cog.server.http
root 75 0 0 18:10 pts/1 00:00:00 /bin/bash
root 88 75 0 18:11 pts/1 00:00:00 ps -ef
# Check whether both IPv4 and IPv6 are enabled.
root@fe31764c0005:/src# netstat -topln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name Timer
tcp 0 0 0.0.0.0:5000 0.0.0.0:* LISTEN 8/python off (0.00/0/0)
tcp6 0 0 :::8888 :::* LISTEN 7/socat off (0.00/0/0)
# Send some HTTP requests to the server via the IPv4 port and the IPv6 port.
root@fe31764c0005:/src# curl http://[::1]:8888/
{"docs_url":"/docs","openapi_url":"/openapi.json"}
root@fe31764c0005:/src# curl http://127.0.0.1:5000/
{"docs_url":"/docs","openapi_url":"/openapi.json"}
ubuntu@asus:~$ docker push docker.io/saladtechnologies/sip:0.0.1-migrated-socat-blip
Because we don’t modify the CMD of the original image, the Command is not needed when running this wrapper image on
SaladCloud. However, both the Container Gateway and the HTTP/1.X probe need to be configured to use Port 8888, because
socat is listening on Port 8888 over IPv6; the HTTP/1.X probe to socat on Port 8888 over IPv6 won’t succeed
until the Cog HTTP prediction server is ready on Port 5000 over IPv4.
Below are the parameters for the deployment on SaladCloud:
# Image Source
docker.io/saladtechnologies/sip:0.0.1-migrated-socat-blip
# Replica Count
3 or more
# Resource
2 vCPUs, 12GB Memory
Any GPU types with 12 GB or more VRAM
# Command (not needed)
# Container Gateway
Enabled, Port 8888
# Readiness Probe
Enabled
Protocol: HTTP/1.X
Path: /
Port: 8888
Initial Delay Seconds: 60
Period Seconds: 10
Timeout Seconds: 5
Success Threshold: 1
Failure Threshold: 3
Scenario 3: Build an image using Cog HTTP prediction server for SaladCloud
Using only the Cog HTTP prediction server without its CLI tools is feasible if you need flexible and fine-grained
control over how the image is built. You can refer to
this guide on writing a
Dockerfile from scratch by leveraging the PyTorch image for your AI applications.
Scenario 4: Build an image using both Cog CLI tools and HTTP prediction server for SaladCloud
If you prefer to use Cog CLI tools to manage the Dockerfile and image, you need to modify the ‘cog.yaml’ and
‘predict.py’ files to add socat support in the image for running on SaladCloud. You can refer to
this guide for more
information. Alternatively, you can use the approaches in Scenario 1 or Scenario 2 to add IPv6 support later.
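As a rough illustration of the cog.yaml change (a hypothetical fragment only; see the linked guide for the complete approach, including the predict.py changes):
# cog.yaml - hypothetical fragment: install socat during the image build.
build:
  gpu: true
  run:
    - "apt-get update && apt-get install -y socat"
predict: "predict.py:Predictor"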