Using the IMDS, you can reallocate a replica from within the running container. Here, we’ll write a Python script using our IMDS SDK that will reallocate the running replica if it doesn’t have enough VRAM free. You also can use the JSON request endpoint.

Running NVIDIA-SMI retrieving the free VRAM

In this example, we’re running nvidia-smi to check how much VRAM is available and piping the output to a log file. Then, we parse it and compare. If it doesn’t have enough free VRAM (we’re choosing 2GB must be free here), it’ll fail the test.

Python
import subprocess

#Does nothing if the test succeeds
def nodeSuccess():
    print("Passed VRAM check")

#Does nothing if the test fails
def nodeFail():
    print("Failed VRAM check, only", result, "MB free.")

#Runs NVIDIA SMI to check how much VRAM is free
subprocess.call("nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits > output.log")

#Reads the score from the VRAM usage log file
with open("output.log") as read:
    result = int(read.read())
    if result <= 2048: nodeSuccess()
    else: nodeFail()

In this example, the machine has only 500MB of VRAM free. A we would expect, if this runs it will fail as it’s below the minimum requirement of 2048MB.

Failed VRAM check, only 500 MB free.

Using the IMDS SDK to reallocate a node

Now, using the Python IMDS SDK we can automatically reallocate this replica to search for a new replica that has free VRAM. We’ll change the nodeFail function to instead call the IMDS SDK.

Python
from salad_cloud_imds_sdk import SaladCloudImdsSdk, Environment
from salad_cloud_imds_sdk.models import ReallocateContainer
import subprocess

#Does nothing if the test succeeds
def nodeSuccess():
    print("Passed VRAM check")

#Reallocates if the test fails
def nodeFail():
    print("Failed VRAM check, only", result, "MB free.")

    sdk = SaladCloudImdsSdk(
    base_url=Environment.DEFAULT.value,
    timeout=10000
    )

    request_body = ReallocateContainer(
        reason="NotEnoughVRAM"
    )

    sdk.metadata.reallocate_container(request_body=request_body)


#Runs NVIDIA SMI to check how much VRAM is free
subprocess.call("nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits > output.log")

#Reads the score from the VRAM usage log file
with open("output.log") as read:
    result = int(read.read())
    if result >= 2048: nodeSuccess()
    else: nodeFail()

Now, when a replica runs the check and fails, it will call to the IMDS SDK and automatically reallocate the replica. Below are examples in other languages for the IMDS SDK usage.

Using the JSON request endpoint to reallocate a node

We also can reallocate the node by sending a POST request to the IMDS reallocate endpoint. Here, we’ll use the same Python example, but with the POST request instead of the IMDS SDK.

Python
import requests
import subprocess

#Does nothing if the test succeeds
def nodeSuccess():
    print("Passed VRAM check")

#Does nothing if the test fails
def nodeFail():
    print("Failed VRAM check, only", result, "MB free.")

    url = "http://169.254.169.254/v1/reallocate"
    headers = {'Content-Type': 'application/json'}
    body = {"Reason": "NotEnoughVRAM"}

    response = requests.post(url, headers=headers, json=body)
    print(response)


#Runs NVIDIA SMI to check how much VRAM is free
subprocess.call("nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits > output.log")

#Reads the score from the VRAM usage log file
with open("output.log") as read:
    result = int(read.read())
    if result >= 2048: nodeSuccess()
    else: nodeFail()

Now, when the test fails, the code will send a JSON POST request to the IMDS endpoint to reallocate the node.