Deploy Flux on Salad, generated in flux

High Level

Regardless of your choice of image-generation inference server, models, or extensions, the basic process is as follows:

  1. Get a docker image that runs your inference server
  2. Copy any models and extensions you want into the docker image
  3. Ensure the container is listening on an ipv6 address
  4. Push the new image up to a container registry
  5. Deploy the image as a salad container group

Find a Docker Image

Find a docker image of ComfyUI. Here’s one we’ve verified works on Salad:

ComfyUI

Note that once deployed, you will be interacting with this as an API, and not through the browser user interface. Support for Flux is available in the newest releases, so make sure you are pulling the latest version of the image.

Download Your Models and Extensions

We’re going to use the Flux1-schnell model, at 8-bit precision. You can download it here. We’re using the schnell version because it’s the fastest, and it has a license allowing for commercial use. We’re using this convenient fp8 checkpoint because it runs well on consumer gpus, and allows for a simpler workflow in ComfyUI.

Create a Dockerfile

  1. Create a new file called Dockerfile and open it in your preferred text editor. At this point, your directory should look like this:
.
├── Dockerfile
└── flux1-schnell-fp8.safetensors
  1. Save an API-Formatted workflow.json file in the same directory as your Dockerfile. This file will be used to warm up the server before starting it. You can generate this file by using the ComfyUI web interface to create a prompt, and then saving it in API format. See below for more details.
.
├── Dockerfile
├── flux1-schnell-fp8.safetensors
└── workflow_api_flux_fp8.json
  1. Copy the following into your Dockerfile:
# We're going to use this verified comfyui image as a base
FROM ghcr.io/ai-dock/comfyui:v2-cuda-12.1.1-base-22.04-v0.2.0

# Disable the authentication provided by the base image
ENV WEB_ENABLE_AUTH=false
# Startup can take a little longer with larger models.
# This configures the worker binary's built-in health check.
ENV STARTUP_CHECK_MAX_TRIES=30

# Now we copy our model into the image
ENV MODEL_DIR=/opt/ComfyUI/models
ENV CHECKPOINT_DIR=${MODEL_DIR}/checkpoints
COPY flux1-schnell-fp8.safetensors ${CHECKPOINT_DIR}/

# We also need to copy the comfyui-api binary into the image, since ComfyUI
# is fully asyncronous by default, and has no convenient way to retrieve 
# generated images.
ADD https://github.com/SaladTechnologies/comfyui-api/releases/download/1.4.0/comfyui-api .
RUN chmod +x comfyui-api

# OPTIONAL: Warmup the server by running a workflow before starting the server.
# The comfyui-api supports a warmup mode, where it will run a provided workflow before starting the server.
# This example assumes you have a workflow json file in the same directory as your Dockerfile.
COPY workflow_api_flux_fp8.json .
ENV WARMUP_PROMPT_FILE=workflow_api_flux_fp8.json

CMD ["./comfyui-api"]

Note that we are including a simple wrapper binary to the image to make it easier to retrieve generated images. ComfyUI accepts prompts into a queue, and then eventually saves images to the local filesystem. This makes it difficult to use in a stateless environment like Salad. This additional binary extends the ComfyUI /prompt API to allow either receiving the generated images in the response body, or having complete images submitted to a provided webhook url.

Build and Test Your Docker Image

  1. Build the docker image. You should change the specified tag to suit your purpose.
docker build -t saladtechnologies/comfyui:comfy0.2.0-api1.4.0-flux1-schnell-fp8 .
  1. (Recommended) Run the docker image locally to confirm it works as expected

    docker run -it --rm --gpus all -p 3000:3000 -p 8188:8188 --name comfyui \
    saladtechnologies/comfyui:comfy0.2.0-api1.4.0-flux1-schnell-fp8
    

    Using it here locally, we’re going to expose port 3000, which is required for the wrapper, and port 8188 that will let us access the web ui locally to make it easier to get the prompt object we need for the api.

    1. Go to http://localhost:8188/ in your browser. You should see something like this:

      Default Comfy UI Interface

    2. Drag in this image to populate the workflow. Taken from the ComfyUI documentation Drag in the image to populate the workflow

    You should see something like the following: Populated workflow

    1. Click “Queue Prompt” to generate an image. The first image should be identical to the one we used to populate the workflow, because they use the same seed. If you generate another image, it should be different. Here’s mine.

      A glass bottle with a galaxy and the planet earth inside, a glass containing white wine, a plate with mixed vegetables, and a plate with a piece of pie.

    2. Enable Dev Mode Options via the settings menu

      Click the gear to open the settings menu

      Check the box labeled "Enable Dev mode Options"

      You should see a new option in the menu, “Save (API Format)”:

      Notice the new button labeled "Save (API Format)"

    3. Click the “Save (API Format)” button, and save it. You’ll get a file called “workflow_api.json” that contains everything ComfyUI needs to run that prompt again.

{
  "6": {
    "inputs": {
      "text": "a bottle with a beautiful rainbow galaxy inside it on top of a wooden table in the middle of a modern kitchen beside a plate of vegetables and mushrooms and a wine glasse that contains a planet earth with a plate with a half eaten apple pie on it",
      "clip": [
        "30",
        1
      ]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP Text Encode (Positive Prompt)"
    }
  },
  "8": {
    "inputs": {
      "samples": [
        "31",
        0
      ],
      "vae": [
        "30",
        2
      ]
    },
    "class_type": "VAEDecode",
    "_meta": {
      "title": "VAE Decode"
    }
  },
  "9": {
    "inputs": {
      "filename_prefix": "Flux",
      "images": [
        "8",
        0
      ]
    },
    "class_type": "SaveImage",
    "_meta": {
      "title": "Save Image"
    }
  },
  "27": {
    "inputs": {
      "width": 1024,
      "height": 1024,
      "batch_size": 1
    },
    "class_type": "EmptySD3LatentImage",
    "_meta": {
      "title": "EmptySD3LatentImage"
    }
  },
  "30": {
    "inputs": {
      "ckpt_name": "flux1-schnell-fp8.safetensors"
    },
    "class_type": "CheckpointLoaderSimple",
    "_meta": {
      "title": "Load Checkpoint"
    }
  },
  "31": {
    "inputs": {
      "seed": 1030319533692526,
      "steps": 4,
      "cfg": 1,
      "sampler_name": "euler",
      "scheduler": "simple",
      "denoise": 1,
      "model": [
        "30",
        0
      ],
      "positive": [
        "6",
        0
      ],
      "negative": [
        "33",
        0
      ],
      "latent_image": [
        "27",
        0
      ]
    },
    "class_type": "KSampler",
    "_meta": {
      "title": "KSampler"
    }
  },
  "33": {
    "inputs": {
      "text": "",
      "clip": [
        "30",
        1
      ]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP Text Encode (Negative Prompt)"
    }
  }
}

You might notice this is kind of an unintuitive prompting format, but it does capture the nodes and connections used by ComfyUI. In my experience, the ComfyUI web ui is the best way to design your prompts, rather than trying to create a workflow json file like this from scratch.

  1. Submit the prompt to the wrapper API on port 3000, using Postman or any http request tool of your choice.

    Submitting the request via Postman

    You should submit a POST request to http://127.0.0.1:3000/prompt with a JSON request body like this, where the value of “prompt” is that workflow json we created previously.

    {
    	"prompt": { ... }
    }
    
  2. In a couple seconds you should receive a response like this:

    {
       "id": "randomuuid",
       "prompt": { ... },
       "images": ["base64encodedimage"]
    }
    
  3. Decode the base64 encoded string into your image. You can do this in a free browser tool such as https://codebeautify.org/base64-to-image-converter

    Decoding the returned base64 image

    or using CLI tools like jq and base64. For this method, first save your response to a file called response.json. Then, run the following command:

    jq -r '.images[0]' response.json | base64 -d > image.png
    

Push and Deploy Your Docker Image

  1. Push your docker image up to docker hub (or the container registry of your choice.)
docker push saladtechnologies/comfyui:comfy0.2.0-api1.4.0-flux1-schnell-fp8
  1. Deploy your image on Salad, using either the Portal or the Salad Public API

Untitled

We’re going to name our container group something obvious, and fill in the configuration form. Flux is a significantly larger model than previous image generation models, so we’re going to want pretty robust hardware.

Untitled

Untitled

Untitled

Additionally, we will want to configure our startup and readiness probes (endpoints provided by the wrapper), and enable the container gateway on port 3000. We’ve disabled authentication for this example, but you may want to enable it. If you enable authentication, requests must be submitted with your Salad API Key in the Salad-Api-Key header.

Untitled

Click Deploy, and wait for the deployment to come up.

Interact with Your Deployment

  1. Wait for the deployment to be ready.

    1. First, Salad pulls your container image into our own internal high-performance cache. Untitled

    2. Next, Salad begins downloading the cached container image to the nodes that have been assigned to your workload. Untitled This step can take tens of minutes in some cases, depending on the size of the image, and the internet speed of the individual nodes. Note that our progress bars are only estimates, and do not necessarily reflect real-time download status. These slow cold starts, and the possibility of nodes being interrupted by their host without warning, are why we always want to provision multiple replicas.

    3. Eventually, you will see instances listed as “running”, with a green check in the “ready” column.

      Untitled

  2. Submit your prompt to the provided Access Domain Name. You will get back a json response within a few seconds. See above for how to submit the request and process the response.

    Untitled