Training a custom YOLOv8 model for $0.25 (Tutorial)


Introduction: Training a Custom YOLO Model for Logo Detection

In the realm of AI and machine learning, the power of customization cannot be overstated. Our previous article covered deploying a pre-trained YOLOv8 model on Salad's cloud infrastructure for real-time object tracking and analysis. The ability to train custom models for specific tasks opens up a world of possibilities, and now we're taking it a step further: training a custom YOLO (You Only Look Once) model on Salad cloud.

Running inference with pre-trained models, as we've previously seen, is relatively light on computational resources. Training custom models is a different story: it is far more GPU-intensive, time-consuming, and often expensive. This is particularly true for deep learning models used in object detection, where numerous parameters are fine-tuned over extensive datasets. The process involves repeatedly passing large amounts of data through the model, occupying GPU resources for extended periods. This intensity not only extends the training duration but also adds to the overall cost, especially in cloud-based environments.

Acknowledging these challenges, our current project aims to train several YOLO models using different base models. We will be focusing on three critical aspects:

  1. Processing Times: We will monitor the duration taken by each model to train, providing insights into the efficiency of different YOLO configurations.
  2. Cost Analysis: Given that training is a resource-heavy task, understanding the financial implications is crucial. We will compare the costs of training each model on Salad's cloud platform, offering a clear perspective on the budgetary requirements for such tasks.
  3. Model Accuracy: The ultimate test of any model is its performance. We will evaluate the accuracy of each trained model, understanding how the training complexity translates into detection precision.

Dataset: The Foundation of Our Model

In our example we will use the "Flickr Logos 27" dataset, but we will make sure that swapping in any other dataset is easy and straightforward. By the end of the project we want a model that can detect particular logos in videos and pictures.

High-level overview of "Flickr Logos 27" dataset:

It comprises 810 images spread across 27 logo classes. All images are annotated with bounding boxes.

Brands included in the dataset: Adidas, Apple, Google, Coca Cola, Ferrari and others.

Preparing for YOLOv8: Structuring the Dataset

To train a YOLOv8 model, our dataset needs to be in a format that the model can understand and learn from effectively.

We need to organize our training and validation images and labels like this:
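A typical Ultralytics-style layout, consistent with the paths in the YAML below (the test split is optional), looks like this:

{dataset_name}/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/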

We also need a dataset YAML to define paths to the images and the class names:

path: /app/training/training # Dataset root path
test: images/test
train: images/train
val: images/val

names:
  0: Adidas
  1: Apple
  ...

Each image in our dataset will be accompanied by a .txt file in YOLO format, detailing the objects present. These annotations are crucial, as they provide the model with the necessary information about the location and class of each logo in an image:

9 0.498 0.491 0.982 0.927
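Each line follows the pattern class_id x_center y_center width height, with all coordinates normalized to the image dimensions. As an illustration, here is a small helper (our own, with made-up pixel values) that converts a pixel-space box into such a line:

def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    # Convert a pixel-space bounding box to a normalized YOLO label line
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.3f} {y_center:.3f} {width:.3f} {height:.3f}"

print(to_yolo_line(9, 7, 22, 760, 578, 767, 600))  # -> "9 0.500 0.500 0.982 0.927"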

Preparing data is said to be 90% of the work in data science, so make sure your dataset is in the right format. Every dataset is specific, so we will not go too deep into the one we used: we had to shuffle the files a little and create a YAML file based on our data. The scripts we used to do that are in the git repo.

Since we want to make our process as easy and persistent as possible, we will copy our training data into an Azure File Share with the following directory structure, where "training" is the name of the file share:

training/{dataset_name}/images/

training/{dataset_name}/labels/

training/{dataset_name}/yolo8.yaml
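One way to copy the data up is with azcopy, assuming you have a SAS token for the share (the account name and token below are placeholders):

azcopy copy "./training/{dataset_name}" \
    "https://<account>.file.core.windows.net/training?<SAS-token>" --recursive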

Solution Architecture:

Training Different YOLOv8 Models

In our project we train 3 variants of the YOLOv8 model:

  1. YOLOv8 Nano (n)
  2. YOLOv8 Small (s)
  3. YOLOv8 Medium (m)

Each model offers different performance and complexity trade-offs, making them suitable for various use cases.
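Switching between variants only means pointing at a different base checkpoint; these are the standard Ultralytics weight files:

from ultralytics import YOLO

# The three base checkpoints used in this comparison
nano = YOLO('yolov8n.pt')
small = YOLO('yolov8s.pt')
medium = YOLO('yolov8m.pt')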

Hyperparameter Considerations

Training models requires careful selection of hyperparameters. Here are the choices we’ve made for our training:

  • Epochs: We will train each model for 50 epochs.
  • Batch Size: To ensure consistency and fairness in our comparison, all models will be trained with a batch size of 8.
  • Image Resolution: We will go with the default resolution of 640.
  • Workers: 1

Sample Training Command with YOLOv8 Nano

Here's how you can initiate training using the YOLOv8 model with Python:

from ultralytics import YOLO

# Load the YOLOv8 Nano model
model = YOLO('yolov8n.pt')
# Start the training process
results = model.train(data='yolo8.yaml', imgsz=640, epochs=10, batch=8, name='yolov8n_custom')

Explanation of Command Line Arguments

  • model: This specifies the YOLO model to be used. In our example, we use yolov8n.pt, which refers to the YOLOv8 Nano model.
  • imgsz (Image Size): Defines the resolution of the input images for training. The default value is 640, but this can be adjusted based on the dataset and GPU capabilities.
  • data: The path to the YAML file containing the dataset configuration. This file includes paths to training, validation, and test image directories, and the class names.
  • epochs: This determines the number of complete passes through the training dataset. We've set it to 10 for our example, but this can vary based on the size of your dataset and the desired level of model convergence.
  • batch (Batch Size): Indicates the number of training samples to work through before updating the internal model parameters. The value depends on the available GPU memory; a higher batch size generally requires more memory.
  • name: The name assigned to the training run. This is useful for organizing and identifying results, especially when conducting multiple experiments.

We will replace all the values with arguments so that we can easily switch them during deployment to Salad.
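A minimal sketch of that parameterization using argparse (the argument names are our own choice):

import argparse
from ultralytics import YOLO

parser = argparse.ArgumentParser()
parser.add_argument('--model', default='yolov8n.pt')
parser.add_argument('--data', default='yolo8.yaml')
parser.add_argument('--epochs', type=int, default=50)
parser.add_argument('--batch', type=int, default=8)
parser.add_argument('--imgsz', type=int, default=640)
parser.add_argument('--name', default='yolov8n_custom')
args = parser.parse_args()

model = YOLO(args.model)
model.train(data=args.data, imgsz=args.imgsz, epochs=args.epochs,
            batch=args.batch, name=args.name)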

We will also run the validation command in order to check our model performance:

model.val()
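model.val() returns a metrics object; with the current Ultralytics API the headline numbers can be read off like this:

# Validate on the val split defined in the dataset YAML
metrics = model.val()
print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50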

Resuming Interrupted Trainings

In the world of deep learning, resuming training from a saved state is a crucial feature, particularly beneficial when training is unexpectedly interrupted or when you wish to continue refining a model with additional data or more training epochs.

YOLO simplifies the process of resuming training. When resuming, the model not only reloads the weights from the last saved checkpoint but also restores the optimizer state, learning rate scheduler, and the current epoch number. This seamless integration ensures that training can continue precisely from where it was paused.

To resume training, you simply use the resume argument within the training method and provide the path to the .pt file containing the partially trained model weights. We will slightly modify the code to check whether intermediate weights are available and start from where training stopped:

import os
from ultralytics import YOLO

last_model_path = f"runs/detect/{output_model_name}/weights/last.pt"
if os.path.isfile(last_model_path):
    # Reload the last checkpoint and continue training from it
    model = YOLO(last_model_path)
    model.train(resume=True)

Keep in mind that checkpoints in Ultralytics YOLO are automatically saved at the end of every epoch, or at a fixed interval if the save_period argument is used. Therefore, to successfully resume a training session, you need to have at least one completed epoch.
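For example, to additionally snapshot weights every 5 epochs, pass save_period to the train call:

# Save an extra checkpoint every 5 epochs on top of the per-epoch last.pt
model.train(data='yolo8.yaml', epochs=50, batch=8, save_period=5, name='yolov8n_custom')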

Creating a Persistent Training Environment with Salad and Azure File Shares

To ensure a smooth and uninterrupted training process on Salad, especially when dealing with containerized environments, it's crucial to have a persistent storage solution. Azure File Shares provide an ideal way to maintain and access training data and model weights across different container sessions. In our setup, we utilize two Azure File Shares: runs and training.

Directory Structure and Data Management

  • Training Directory: Within the training file share, we have a sub-directory named {dataset_name}. This organization allows us to switch between different datasets efficiently, facilitating the training of various models without hassle.
  • Synchronizing Data: To manage and synchronize our training data and weights, we'll employ a bash script as the entry point for our train.py. This script will automate several crucial tasks, ensuring our training process is both efficient and consistent.

Process workflow

Functionality of the Bash Script

  1. Download Training Data: The script will pull all necessary training data, including images, labels, and the YOLOv8 YAML configuration file, into our container.
  2. Retrieve Weights from Previous Runs: It will download weights from previous training runs to the container, ensuring continuity.
  3. Handle Training Process:
    • The script initiates the train.py.
    • If a runs folder with the same model output name exists, the training resumes. Otherwise, it starts a new training session with the base model.
  4. Regular Sync to Azure: Every minute, the script will copy the contents of the runs folder from the container to Azure File Share, maintaining a backup and allowing for process continuity.
  5. Monitoring Training Completion:
    • Upon the completion of train.py, a done.txt file is created.
    • The script continuously checks for this file and, once detected, terminates the process.
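Putting these steps together, a minimal sketch of such an entrypoint script might look like this (assuming azcopy is available in the container and SAS URLs for the two shares are supplied via environment variables; the variable names are ours):

#!/bin/bash
# 1. Pull training data (images, labels, YAML) into the container
azcopy copy "$AZURE_TRAINING_SAS" /app/training --recursive

# 2. Pull weights from previous runs, if any
azcopy copy "$AZURE_RUNS_SAS" /app/runs --recursive

# 3. Start (or resume) training in the background
python train.py &

# 4. Sync the runs folder back to Azure every minute until done.txt appears
while [ ! -f done.txt ]; do
    sleep 60
    azcopy copy /app/runs "$AZURE_RUNS_SAS" --recursive
done

# Final sync once training has finished
azcopy copy /app/runs "$AZURE_RUNS_SAS" --recursive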

Training Results:

To ensure a fair and uniform comparison across the different YOLOv8 model trainings, we picked a consistent set of hardware and training parameters. We chose the RTX 4080 (16 GB) GPU for all our experiments, priced at a very low rate of $0.28 per hour. Each training run was conducted over 50 epochs, with a batch size of 8 and an image resolution of 640.

YOLOv8 Nano Model Training: Efficiency and Speed

The first model in our comparative study is the YOLOv8 Nano, renowned for its compact yet powerful architecture. With 3.2 million parameters, it is the smallest model in the YOLOv8 family, yet capable of real-time performance, even on CPU-based systems.

Duration of Training: The training session for the YOLOv8 Nano model completed in just about 10 minutes. At $0.28 per hour, that works out to roughly 5 cents for training the model.

Training results: once our model is trained, we can find all the weights along with training and validation results in our Azure storage.
We will first check the confusion matrix:

We can see that some of the labels are detected quite well, but a few are correctly detected in less than half of the cases.

Here we can compare the ground-truth labels on a picture with what our trained model predicted:

YOLOv8 Small Model Training:

Duration of Training: The training session for the YOLOv8 Small model completed in about 15 minutes.

Training Results:

We can see that this model performs better than the Nano.

YOLOv8 Medium Model Training:

Duration of Training: The training session for the YOLOv8 Medium model completed in about 40 minutes.

This time we stopped the process and reallocated it to another machine to check whether it could resume from the last saved weights, and it did so successfully.

Training Results:

We can see that bigger models give us better results.

Using the Custom Model in Real Life:

We will now load the medium model into our inference code and run it on a Coca-Cola video. To do that, we just need to specify the model path in our script. Check out our previous article for a step-by-step guide on how to deploy inference using Salad.

# Load our custom-trained medium model
model = YOLO("yolov8m-custom.pt")
# Run tracking on a frame; persist=True keeps track IDs across frames
results = model.track(image, persist=True)
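For a full video, the same call sits inside a frame-reading loop. A minimal sketch, assuming a local file named coca_cola.mp4 and opencv-python installed:

import cv2
from ultralytics import YOLO

model = YOLO("yolov8m-custom.pt")
cap = cv2.VideoCapture("coca_cola.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # persist=True keeps track IDs consistent across frames
    results = model.track(frame, persist=True)
    annotated = results[0].plot()  # draw boxes and track IDs on the frame
    cv2.imshow("tracking", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()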

Let’s check the results:

As a result, we can see that our custom-trained model works not only on images but also on videos, with tracking capabilities added.

Conclusion: Streamlined Training and Deployment of YOLOv8 Models on Salad cloud

In a concise and cost-effective operation, we successfully trained three YOLOv8 models of varying sizes on a custom dataset and put the medium model to the test in a real-world scenario. Attaining these training results, with the longest run taking approximately 40 minutes, at a total cost of only about a quarter of a dollar highlights the efficiency and cost-effectiveness of advanced model training in Salad's cloud environment. Training custom computer vision models has never been this accessible.