How to Deploy Flux (ComfyUI)
A guide to deploying Flux1-Schnell on SaladCloud with ComfyUI
High Level
Regardless of your choice of image-generation inference server, models, or extensions, the basic process is as follows:
- Get a docker image that runs your inference server
- Copy any models and extensions you want into the docker image
- Ensure the container is listening on an IPv6 address
- Push the new image up to a container registry
- Deploy the image as a SaladCloud container group
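One way to check the IPv6 requirement from the list above is to attempt a TCP connection over IPv6 from inside the running container. The following is a minimal sketch and not part of the original setup; the function name is made up for illustration, and it only confirms that something accepts connections on the given port.

```python
import socket


def accepts_ipv6_connections(port, host="::1", timeout=2.0):
    """Return True if a TCP connection over IPv6 to (host, port) succeeds."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((host, port))
        return True
    except OSError:
        # Covers refused connections, timeouts, and hosts without IPv6 support.
        return False
```

For example, `accepts_ipv6_connections(3000)` run inside the container would probe the wrapper port used later in this guide. A `False` result means either nothing is listening on an IPv6 address or IPv6 is unavailable in that environment.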
Find a Docker Image
Find a docker image of ComfyUI. Here’s one we’ve verified works on Salad:
ComfyUI
- Git Repo: https://github.com/ai-dock/comfyui
- Docker Image: ghcr.io/ai-dock/comfyui:v2-cuda-12.1.1-base-22.04-v0.2.0
- Model Directory: /opt/ComfyUI/models
- Custom Node Directory: /opt/ComfyUI/custom_nodes/
Note that once deployed, you will be interacting with this as an API, and not through the browser user interface. Support for Flux is available in the newest releases, so make sure you are pulling the latest version of the image.
Download Your Models and Extensions
We’re going to use the Flux1-schnell model at 8-bit precision. You can download it here. We’re using the schnell version because it’s the fastest, and its license allows commercial use. We’re using this convenient fp8 checkpoint because it runs well on consumer GPUs and allows for a simpler workflow in ComfyUI.
Create a Dockerfile
- Create a new file called Dockerfile and open it in your preferred text editor. At this point, your directory should look like this:
- Save an API-Formatted workflow.json file in the same directory as your Dockerfile. This file will be used to warm up the server before starting it. You can generate this file by using the ComfyUI web interface to create a prompt, and then saving it in API format. See below for more details.
- Copy the following into your Dockerfile:
Note that we are adding a simple wrapper binary to the image to make it easier to retrieve generated images. ComfyUI accepts prompts into a queue, and then eventually saves images to the local filesystem. This makes it difficult to use in a stateless environment like Salad. The wrapper extends the ComfyUI /prompt API so that you can either receive the generated images in the response body, or have completed images submitted to a webhook URL that you provide.
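The two delivery modes described above can be sketched as a small helper that builds the request body for the wrapper's /prompt endpoint. The "prompt" key is the one this guide uses when submitting a workflow; the "webhook" key name, the example URL, and the placeholder workflow are assumptions made for illustration, so check the wrapper's own documentation for the exact field names.

```python
import json


def build_prompt_body(workflow, webhook_url=None):
    """Build a JSON-serializable body for a POST to the wrapper's /prompt endpoint.

    workflow    -- the API-format workflow exported from the ComfyUI web UI
    webhook_url -- optional; the "webhook" key name is an assumption
    """
    body = {"prompt": workflow}
    if webhook_url is not None:
        # Assumed field: ask the wrapper to send finished images to this URL
        # instead of returning them in the response body.
        body["webhook"] = webhook_url
    return body


# Placeholder workflow with a single made-up node, only to show the shape.
example_workflow = {"1": {"class_type": "ExampleNode", "inputs": {}}}

inline_body = build_prompt_body(example_workflow)
webhook_body = build_prompt_body(example_workflow, "https://example.com/hook")

print(json.dumps(inline_body))
print(json.dumps(webhook_body))
```

Keeping the body construction in one place makes it easy to switch between the two modes without touching the code that sends the request.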
Build and Test Your Docker Image
- Build the docker image. You should change the specified tag to suit your purpose.
- (Recommended) Run the docker image locally to confirm it works as expected.
For this local run, we’re going to expose port 3000, which the wrapper requires, and port 8188, which lets us access the web UI locally and makes it easier to get the prompt object we need for the API.
- Go to http://localhost:8188/ in your browser. You should see something like this:
- Drag in this image to populate the workflow (taken from the ComfyUI documentation).
You should see something like the following:
- Click “Queue Prompt” to generate an image. The first image should be identical to the one we used to populate the workflow, because they use the same seed. If you generate another image, it should be different. Here’s mine.
- Enable Dev Mode Options via the settings menu.
You should see a new option in the menu, “Save (API Format)”:
- Click the “Save (API Format)” button, and save it. You’ll get a file called “workflow_api.json” that contains everything ComfyUI needs to run that prompt again.
- You might notice this is a somewhat unintuitive prompting format, but it does capture the nodes and connections used by ComfyUI. In my experience, the ComfyUI web UI is the best way to design your prompts, rather than trying to write a workflow JSON file like this from scratch.
- Submit the prompt to the wrapper API on port 3000, using Postman or any HTTP request tool of your choice.
You should submit a POST request to http://127.0.0.1:3000/prompt with a JSON request body like this, where the value of “prompt” is the workflow JSON we created previously.
- In a couple of seconds you should receive a response like this:
- Decode the base64-encoded string into your image. You can do this in a free browser tool such as https://codebeautify.org/base64-to-image-converter or using CLI tools like jq and base64. For this method, first save your response to a file called response.json. Then, run the following command:
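As an alternative to the browser tool or the CLI route above, the decoding step can be sketched in Python. This assumes the response stores its base64 strings in a list under an "images" key and that the images are PNGs; neither detail is confirmed by this guide, so adjust the key and file extension to match the response you actually receive. The function name is hypothetical.

```python
import base64
import json
from pathlib import Path


def save_images(response, out_dir="."):
    """Decode each base64 string under the assumed "images" key and write it to disk.

    Returns the list of file paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, encoded in enumerate(response.get("images", [])):
        # The .png extension is an assumption about the image format.
        path = out / f"image-{index}.png"
        path.write_bytes(base64.b64decode(encoded))
        paths.append(path)
    return paths


def save_images_from_file(response_path="response.json", out_dir="."):
    """Load a saved JSON response from disk and decode its images."""
    with open(response_path, encoding="utf-8") as f:
        return save_images(json.load(f), out_dir)
```

Calling `save_images_from_file()` in the directory that holds response.json would write one file per decoded image and return their paths.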
Push and Deploy Your Docker Image
- Push your docker image up to docker hub (or the container registry of your choice.)
- Deploy your image on Salad, using either the Portal or the SaladCloud Public API
We’re going to name our container group something obvious, and fill in the configuration form. Flux is a significantly larger model than previous image generation models, so we’re going to want pretty robust hardware.
Additionally, we will want to configure our startup and readiness probes (endpoints provided by the wrapper), and enable the container gateway on port 3000. We’ve disabled authentication for this example, but you may want to enable it. If you enable authentication, requests must be submitted with your SaladCloud API Key in the Salad-Api-Key header.
Click Deploy, and wait for the deployment to come up.
Interact with Your Deployment
- Wait for the deployment to be ready.
  - First, SaladCloud pulls your container image into our own internal high-performance cache.
  - Next, SaladCloud begins downloading the cached container image to the nodes that have been assigned to your workload. This step can take tens of minutes in some cases, depending on the size of the image, and the internet speed of the individual nodes. Note that our progress bars are only estimates, and do not necessarily reflect real-time download status. These slow cold starts, and the possibility of nodes being interrupted by their host without warning, are why we always want to provision multiple replicas.
  - Eventually, you will see instances listed as “running”, with a green check in the “ready” column.
- Submit your prompt to the provided Access Domain Name. You will get back a JSON response within a few seconds. See above for how to submit the request and process the response.
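Submitting to the deployed container group can be sketched with Python's standard library. The /prompt path, the "prompt" body key, and the Salad-Api-Key header come from this guide; the function names, the timeout value, and the placeholder domain in the usage note are illustrative assumptions.

```python
import json
import urllib.request


def build_request(base_url, body, api_key=None):
    """Create a POST request for <base_url>/prompt without sending it."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when authentication is enabled on the container gateway.
        headers["Salad-Api-Key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/prompt",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call might look like `send(build_request("https://<your-access-domain-name>", {"prompt": workflow}, api_key="<your-api-key>"))`, where `workflow` is the dictionary loaded from your saved API-format workflow file. The same helper works for local testing with `http://127.0.0.1:3000` and no API key.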