High Level

Regardless of your choice of Stable Diffusion inference server, models, or extensions, the basic process is as follows:

  1. Get a docker image that runs your inference server
  2. Copy any models and extensions you want into the docker image
  3. Ensure the container is listening on an IPv6 address
  4. Push the new image up to a container registry
  5. Deploy the image as a SaladCloud container group
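
In shell terms, steps 1, 2, and 4 boil down to a docker build and push, sketched below with an illustrative image name. The rest of this guide walks through each step in detail.

docker build -t yourusername/fooocus-api:preloaded .
docker push yourusername/fooocus-api:preloaded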

Find a Docker Image

Find a docker image of Fooocus. Here is one that we have verified works on SaladCloud:

Fooocus

  1. Git Repo: https://github.com/mrhan1993/Fooocus-API
  2. Docker Image: konieshadow/fooocus-api:v0.4.1.1
  3. Model Directory: /app/repositories/Fooocus/models

Note that we’re using Fooocus-API, not the official Fooocus image, because Fooocus does not expose a REST API and we need a wrapper to make it work with SaladCloud.
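
You can pull the verified image directly to try it out before customizing it:

docker pull konieshadow/fooocus-api:v0.4.1.1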

Download Your Models and Extensions

Fooocus comes with a set of pretrained checkpoints, LoRAs, and VAEs, in addition to a prompt expansion model. By default, the base docker image downloads these models on start. However, we’re going to add them to the docker image directly, so our container can become productive much faster after it enters the “running” state.

Create a Dockerfile

  1. Create a new file called Dockerfile and open it in your preferred text editor. At this point, your directory should look like this:

    .
    └── Dockerfile
    
  2. Copy the following into your Dockerfile:

# We're going to use this verified Fooocus API image as a base
FROM konieshadow/fooocus-api:v0.4.1.1

ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
ENV NVIDIA_VISIBLE_DEVICES=all

# Now we pre-install the models used by Fooocus.
# This way, the container will be ready to go much faster after container download is complete.
ENV MODEL_DIR=/app/repositories/Fooocus/models
ENV CKPT_PATH=${MODEL_DIR}/checkpoints
ENV VAE_PATH=${MODEL_DIR}/vae_approx
ENV LORA_PATH=${MODEL_DIR}/loras
ENV PROMPT_EXPANSION_PATH=${MODEL_DIR}/prompt_expansion
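# ADD with a URL downloads each file at build time, baking the weights into an
# image layer rather than fetching them on every container start.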
ADD https://huggingface.co/lllyasviel/fav_models/resolve/main/fav/juggernautXL_v8Rundiffusion.safetensors?download=true ${CKPT_PATH}/juggernautXL_v8Rundiffusion.safetensors
ADD https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_offset_example-lora_1.0.safetensors?download=true ${LORA_PATH}/sd_xl_offset_example-lora_1.0.safetensors
ADD https://huggingface.co/lllyasviel/misc/resolve/main/xlvaeapp.pth?download=true ${VAE_PATH}/xlvaeapp.pth
ADD https://huggingface.co/lllyasviel/misc/resolve/main/vaeapp_sd15.pt?download=true ${VAE_PATH}/vaeapp_sd15.pth
ADD https://huggingface.co/lllyasviel/misc/resolve/main/xl-to-v1_interposer-v3.1.safetensors?download=true ${VAE_PATH}/xl-to-v1_interposer-v3.1.safetensors
ADD https://huggingface.co/lllyasviel/misc/resolve/main/fooocus_expansion.bin?download=true ${PROMPT_EXPANSION_PATH}/fooocus_expansion/pytorch_model.bin

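# Launch the Fooocus API server on port 8888, listening on all addresses (which
# covers the IPv6 requirement from the high-level steps). --skip-pip skips
# dependency installation at startup, and --preload-pipeline loads the model
# pipeline before serving so the first request is fast.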
CMD ["python", "main.py", "--host", "*", "--port", "8888", "--skip-pip", "--preview-option", "none", "--always-gpu", "--preload-pipeline" ]

Build and Test Your Docker Image

  1. Build the docker image. You should change the specified tag to suit your purpose.
docker build -t saladtechnologies/fooocus-api:preloaded-2.4.3 .
  2. (Recommended) Run the docker image locally to confirm it works as expected:
docker run -it --rm --gpus all -p 8888:8888 --name fooocus \
saladtechnologies/fooocus-api:preloaded-2.4.3

Navigate to http://localhost:8888/docs in your browser to see the API docs for Fooocus API.
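
Fooocus API is built on FastAPI, so you can also confirm the server is up from the command line by fetching the OpenAPI spec (the /openapi.json path is the FastAPI default, not anything Fooocus-specific):

curl -s http://localhost:8888/openapi.json | jq '.info'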

  3. Test a text-to-image request

See the docs for more information on submitting a text-to-image request. Here’s an example JSON request body:

{
  "prompt": "beautiful clouds at sunset over a calm lake",
  "require_base64": true
}

Submit this to the /v1/generation/text-to-image endpoint as a POST request:

curl -X 'POST' \
  'http://localhost:8888/v1/generation/text-to-image' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "beautiful clouds at sunset over a calm lake",
  "require_base64": true
}'

You will receive a JSON response that includes the generated image as a Base64-encoded string:

[
  {
    "base64": "base64encodedimage",
    "url": "http://*:8888/files/2024-08-01/e3dd0868-dfca-45cb-b718-adf65c745c10-0.png",
    "seed": "5413726088212485283",
    "finish_reason": "SUCCESS"
  }
]
  4. Decode the base64-encoded string into your image. You can do this in a free browser tool such as https://codebeautify.org/base64-to-image-converter, or using CLI tools like jq and base64. For the CLI method, first save your response to a file called response.json, then run the following command:

jq -r '.[0].base64' response.json | base64 -d > image.png
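
You can also combine the request and the decode into a single pipeline, using the same endpoint and request body as above:

curl -s -X 'POST' \
  'http://localhost:8888/v1/generation/text-to-image' \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "beautiful clouds at sunset over a calm lake", "require_base64": true}' \
  | jq -r '.[0].base64' | base64 -d > image.png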

Push and Deploy Your Docker Image

  1. Push your docker image up to Docker Hub (or the container registry of your choice).
docker push saladtechnologies/fooocus-api:preloaded-2.4.3
  2. Deploy your image on SaladCloud, using either the Portal or the SaladCloud Public API

    We’re going to give our container group a descriptive name and fill in the configuration form. We’re going to use 3 replicas to ensure coverage during node interruptions and reallocations.

    Since Fooocus uses an SDXL-based model, we’re going to give ourselves fairly powerful hardware: 4 vCPUs, 24 GB of RAM, and an RTX 3090 Ti GPU.

    We want to add the container gateway to our deployment, so that we get a URL we can use to access it. Make sure to set the port to 8888 (or whatever port you specified in the CMD of your Dockerfile).

    We need to enable a startup probe and a liveness probe, to make sure the container gateway only routes requests to nodes that are ready for them. A sketch of the equivalent API request follows below.
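
    If you prefer the SaladCloud Public API over the Portal, the request looks roughly like the sketch below. This is a rough sketch, not a complete request: the organization name, project name, API key, and GPU class ID are placeholders, the startup and liveness probes are omitted for brevity, and you should consult the SaladCloud API reference for the exact schema.

curl -X 'POST' \
  'https://api.salad.com/api/public/organizations/your-org/projects/your-project/containers' \
  -H 'Salad-Api-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "fooocus-api",
  "replicas": 3,
  "restart_policy": "always",
  "container": {
    "image": "saladtechnologies/fooocus-api:preloaded-2.4.3",
    "resources": {
      "cpu": 4,
      "memory": 24576,
      "gpu_classes": ["your-gpu-class-id"]
    }
  },
  "networking": {
    "protocol": "http",
    "port": 8888,
    "auth": false
  }
}'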

Interact with Your Deployment

  1. Wait for the deployment to be ready.

    1. First, SaladCloud pulls your container image into our own internal high-performance cache.

    2. Next, SaladCloud locates eligible nodes for your workload, based on the configuration you provided.

    3. Next, SaladCloud begins downloading the cached container image to the nodes that have been assigned to your workload.

      This step can take tens of minutes in some cases, depending on the size of the image and the internet speed of the individual nodes. Note that our progress bars are only estimates and do not necessarily reflect real-time download status. These slow cold starts, and the possibility of nodes being interrupted by their host without warning, are why we always want to provision multiple replicas.

    4. Eventually, you will see instances listed as “running”, with a green check in the “ready” column.

  2. Submit your prompt to the provided Access Domain Name. You will get back a JSON response within 10-20 seconds. See above for how to submit the request and process the response; a combined sketch follows below.
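
    As a final check, you can wait for the gateway to report ready and then run the same text-to-image request against your Access Domain Name. The domain below is a placeholder; substitute the one shown in the Portal for your container group.

ACCESS_DOMAIN='https://your-container-group.salad.cloud'

# Poll until the API docs respond, then submit the request and decode the image
until curl -sf "$ACCESS_DOMAIN/docs" > /dev/null; do sleep 10; done
curl -s -X 'POST' "$ACCESS_DOMAIN/v1/generation/text-to-image" \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "beautiful clouds at sunset over a calm lake", "require_base64": true}' \
  | jq -r '.[0].base64' | base64 -d > image.png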