Master Image Segmentation with PyTorch U-NET

Table of Contents

  1. Introduction
  2. Building the Model
  3. Preprocessing the Data
  4. Loading the Data
  5. Training the Model
  6. Evaluating Model Performance
  7. Saving and Loading the Model
  8. Visualizing Predictions
  9. Conclusion
  10. Frequently Asked Questions

Introduction

In this article, we will learn how to perform image segmentation in PyTorch. We will start by building a model from scratch and setting up the data loading pipeline, using the Albumentations library for data augmentation. Then, we will train the model on the Carvana dataset from Kaggle's Carvana Image Masking Challenge.

Image Segmentation in PyTorch

Image segmentation is the process of dividing an image into multiple segments or regions to simplify image analysis. It is a fundamental task in computer vision and has applications in various fields, such as object recognition, medical imaging, and self-driving cars.

Prerequisites

Before we start, make sure you have a basic understanding of PyTorch and deep learning concepts. Familiarity with Python programming and the Albumentations library is also beneficial.

Let's dive into the details and build our image segmentation model step by step.

Building the Model

The first step in our journey is to build the model architecture. We will create a model similar to U-Net, which is a popular architecture for image segmentation tasks. The major difference in our implementation is the use of padded convolutions, which simplifies the data loading part. We will also use the Albumentations library for data augmentation.

To begin, let's create the model.py file and import the necessary libraries:

import torch
import torch.nn as nn
import torchvision.transforms.functional as tf

Double Convolution

Next, we'll define a class called DoubleConvolution to represent the double convolution blocks used in our model. Each block consists of two 3x3 convolutional layers, each followed by batch normalization and a ReLU activation. Here's the implementation of the DoubleConvolution class:

class DoubleConvolution(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(DoubleConvolution, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.conv(x)

The DoubleConvolution class takes the number of input channels and output channels as arguments. It applies two consecutive convolutional layers with batch normalization and ReLU activation.
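Because both convolutions use a 3x3 kernel with stride 1 and padding 1, each block preserves the spatial dimensions of its input; this is what later lets us keep the data loading simple. A quick sanity check (the channel counts and input size below are arbitrary example values):

block = DoubleConvolution(in_channels=3, out_channels=64)
x = torch.rand(1, 3, 160, 160)
print(block(x).shape)  # torch.Size([1, 64, 160, 160]) - height and width unchanged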

U-Net Architecture

Now, let's define the main UNet class representing our U-Net architecture. The U-Net architecture consists of an encoder and a decoder with skip connections between them. Here's the implementation of the UNet class:

class UNet(nn.Module):
    def __init__(self, in_channels, out_channels, features=[64, 128, 256, 512]):
        super(UNet, self).__init__()
        self.downs = nn.ModuleList()
        self.ups = nn.ModuleList()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        for feature in features:
            self.downs.append(DoubleConvolution(in_channels, feature))
            in_channels = feature

        for feature in reversed(features):
            self.ups.append(
                nn.ConvTranspose2d(feature * 2, feature, kernel_size=2, stride=2)
            )
            self.ups.append(DoubleConvolution(feature * 2, feature))

        self.bottleneck = DoubleConvolution(features[-1], features[-1] * 2)
        self.final_conv = nn.Conv2d(features[0], out_channels, kernel_size=1)

    def forward(self, x):
        skip_connections = []

        for down in self.downs:
            x = down(x)
            skip_connections.append(x)
            x = self.pool(x)

        x = self.bottleneck(x)
        skip_connections = skip_connections[::-1]

        for index in range(0, len(self.ups), 2):
            x = self.ups[index](x)
            skip_connection = skip_connections[index // 2]

            if x.shape != skip_connection.shape:
                x = tf.resize(x, size=skip_connection.shape[2:])

            x = torch.cat((skip_connection, x), dim=1)
            x = self.ups[index + 1](x)

        return self.final_conv(x)

The UNet class takes the number of input channels, the number of output channels, and a list of feature sizes as arguments. It initializes the encoder (down) and decoder (up) blocks, the pooling and transposed convolution layers, the bottleneck, and a final 1x1 convolution that maps to the desired number of output channels. In the forward pass, skip connections from the encoder are concatenated with the upsampled feature maps in the decoder; if the shapes do not match exactly, the upsampled tensor is resized to the skip connection's spatial size.

Testing the Model

To ensure our model works correctly, let's write a test function and create a test case using random input data. Here's an example of how to test the model:

def test_model():
    torch.manual_seed(42)
    in_channels, out_channels = 3, 1  # RGB input, single-channel binary mask output
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    test_input = torch.rand((1, in_channels, 256, 256)).to(device)

    model = UNet(in_channels, out_channels).to(device)
    predictions = model(test_input)

    assert predictions.shape == (1, out_channels, 256, 256), "Incorrect output shape"
    print("Model test passed!")

if __name__ == "__main__":
    test_model()

In the test_model() function, we set a random seed, define the number of input and output channels (3 for an RGB image, 1 for a binary mask), create a random input tensor of size (1, in_channels, 256, 256), move it to the appropriate device, and pass it through the model. We then assert that the output shape is as expected and print a success message if the test passes.

Preprocessing the Data

Before we can train the model, we need to preprocess the data. This involves resizing the images, performing data augmentation, and normalizing the pixel values. We will use the PIL library for image loading and the Albumentations library for data augmentation.

import os
from PIL import Image
from torch.utils.data import Dataset
import numpy as np

class CarvanaDataset(Dataset):
    def __init__(self, image_dir, mask_dir, transform=None):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.transform = transform
        self.images = os.listdir(image_dir)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image_path = os.path.join(self.image_dir, self.images[index])
        mask_path = os.path.join(self.mask_dir, self.images[index].replace(".jpg", "_mask.gif"))

        image = np.array(Image.open(image_path).convert("RGB"))
        mask = np.array(Image.open(mask_path).convert("L"), dtype=np.float32)
        mask[mask == 255.0] = 1.0  # convert mask values from {0, 255} to {0, 1}

        if self.transform:
            augmented = self.transform(image=image, mask=mask)
            image = augmented["image"]
            mask = augmented["mask"]

        return image, mask

In the CarvanaDataset class, we define the image directory, mask directory, and an optional transformation function. The __len__() method returns the total number of images in the dataset, and __getitem__() loads each image and its corresponding mask. We convert them to NumPy arrays, binarize the mask to {0, 1}, and apply the transformation function if specified.

To use the dataset, we need to create a PyTorch data loader:

import albumentations as A
from albumentations.pytorch import ToTensorV2
from torch.utils.data import DataLoader

train_transforms = A.Compose([
    A.Resize(height=128, width=128),
    # Add additional augmentations here
    A.Normalize(mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0), max_pixel_value=255.0),
    ToTensorV2(),
])

val_transforms = A.Compose([
    A.Resize(height=128, width=128),
    A.Normalize(mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0), max_pixel_value=255.0),
    ToTensorV2(),
])

train_dataset = CarvanaDataset(train_image_dir, train_mask_dir, transform=train_transforms)
val_dataset = CarvanaDataset(val_image_dir, val_mask_dir, transform=val_transforms)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers, pin_memory=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=True)

The train_transforms and val_transforms are Albumentations pipelines applied jointly to the images and masks of the training and validation sets, respectively. You can add further augmentations based on your specific requirements, as sketched below. We then instantiate the CarvanaDataset class with the relevant directories and transformations and create data loaders with the specified batch size, shuffling, and number of workers.
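For the training split, it is common to add random flips and rotations on top of the resize and normalization steps. A minimal sketch of such a pipeline, assuming the same Albumentations setup as above (the probabilities and rotation limit are illustrative choices, not values from the original setup):

import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transforms = A.Compose([
    A.Resize(height=128, width=128),
    A.Rotate(limit=35, p=0.5),   # random rotation up to +/- 35 degrees
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.1),
    A.Normalize(mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0), max_pixel_value=255.0),
    ToTensorV2(),
])

Because Albumentations applies the same geometric transforms to the image and the mask passed alongside it, the augmented pairs stay aligned.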

Loading the Data

With the data preprocessing complete, we can now focus on loading the data. We will use the DataLoader class provided by PyTorch to streamline the process.

def get_loaders(train_image_dir, train_mask_dir, val_image_dir, val_mask_dir, batch_size, train_transforms, val_transforms, num_workers, pin_memory):
    train_dataset = CarvanaDataset(train_image_dir, train_mask_dir, transform=train_transforms)
    val_dataset = CarvanaDataset(val_image_dir, val_mask_dir, transform=val_transforms)

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers, pin_memory=pin_memory)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=pin_memory)

    return train_loader, val_loader

if __name__ == "__main__":
    train_loader, val_loader = get_loaders(train_image_dir, train_mask_dir, val_image_dir, val_mask_dir, batch_size, train_transforms, val_transforms, num_workers, pin_memory)
    print("Data loaders created successfully!")

In the get_loaders() function, we create instances of the CarvanaDataset class for both the training and validation data. We then create the data loaders by specifying the desired batch size, shuffling, number of workers, and pinning memory.

Training the Model

Now that we have our data loaders, we can proceed with training the model. The training process involves iterating over the data loader and performing forward and backward passes to update the model's weights.

def train_model(loader, model, optimizer, criterion, device):
    model.train()
    total_loss = 0

    for batch_idx, (data, targets) in enumerate(loader):
        data = data.to(device)
        targets = targets.float().unsqueeze(1).to(device)  # add a channel dimension; BCEWithLogitsLoss expects float targets

        optimizer.zero_grad()
        outputs = model(data)
        loss = criterion(outputs, targets)

        loss.backward()
        optimizer.step()
        total_loss += loss.item()

        if batch_idx % 10 == 0:
            print(f"Batch {batch_idx}/{len(loader)}, Loss: {loss.item()}")

    return total_loss / len(loader)

The train_model() function takes a data loader, model, optimizer, loss criterion, and device as arguments. Inside the function, we set the model to training mode, initialize the total loss variable, and iterate over the data loader.

For each batch, we send the data and targets to the specified device, zero the optimizer's gradients, perform a forward pass through the model, compute the loss, perform backpropagation, and update the model's weights using the optimizer. We also keep track of the total loss for this epoch.

Once the epoch is complete, we return the average loss per batch.

To train the model, we need to set up the training process:

if __name__ == "__main__":
    model = UNet(in_channels, out_channels).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.BCEWithLogitsLoss().to(device)

    for epoch in range(num_epochs):
        train_loss = train_model(train_loader, model, optimizer, criterion, device)
        print(f"Epoch {epoch}/{num_epochs}, Training Loss: {train_loss}")
        # Add validation and evaluation here

In the main part of the script, we instantiate the model, optimizer, and loss criterion. We then loop over the specified number of epochs and call the train_model() function to train the model on the provided training data.
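The snippet above assumes a handful of hyperparameters, paths, and settings defined earlier in the script. A minimal, illustrative set of definitions (the values and directory names below are assumptions you should adjust to your hardware and dataset):

# Illustrative hyperparameters and paths; adjust to your setup.
learning_rate = 1e-4
batch_size = 16
num_epochs = 3
num_workers = 2
pin_memory = True
in_channels = 3    # RGB input images
out_channels = 1   # single-channel binary mask
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_image_dir = "data/train_images"   # hypothetical paths
train_mask_dir = "data/train_masks"
val_image_dir = "data/val_images"
val_mask_dir = "data/val_masks"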

Evaluating Model Performance

During training, it's important to evaluate the model's performance on a separate validation set to monitor its progress and prevent overfitting. We can use the validation loader to calculate relevant metrics such as accuracy or loss.

def check_accuracy(loader, model, device):
    model.eval()
    num_correct = 0
    num_pixels = 0
    dice_score = 0

    with torch.no_grad():
        for data, targets in loader:
            data = data.to(device)
            targets = targets.float().unsqueeze(1).to(device)

            predictions = torch.sigmoid(model(data))  # the model outputs logits, so apply a sigmoid first
            predictions = (predictions > 0.5).float()

            num_correct += (predictions == targets).sum()
            num_pixels += targets.numel()
            dice_score += (2 * (predictions * targets).sum() + 1e-8) / (predictions.sum() + targets.sum() + 1e-8)

    accuracy = float(num_correct) / num_pixels * 100
    dice_score = dice_score / len(loader)

    print(f"Accuracy: {accuracy:.2f}%")
    print(f"Dice Score: {dice_score:.2f}")

The check_accuracy() function takes a data loader, model, and device as arguments. Inside the function, we set the model to evaluation mode and initialize variables for counting the number of correct predictions and pixels. We also calculate the dice score to measure the model's performance.

We iterate over the data loader, apply a sigmoid to the model's logits, threshold the probabilities at 0.5, compare the resulting binary predictions with the targets, and update the corresponding counters. Finally, we calculate the accuracy and Dice score and print them.

To evaluate the model, we can add the following code after the training loop:

if __name__ == "__main__":
    # ...
    for epoch in range(num_epochs):
        train_loss = train_model(train_loader, model, optimizer, criterion, device)
        print(f"Epoch {epoch}/{num_epochs}, Training Loss: {train_loss}")

        check_accuracy(val_loader, model, device)

Saving and Loading the Model

Saving and loading the model's state allows us to resume training or use the trained model for inference without having to retrain it from scratch. We can utilize the torch.save() and torch.load() functions to handle model checkpointing.

def save_checkpoint(state, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    print("Checkpoint saved successfully!")

def load_checkpoint(model, optimizer, filename='checkpoint.pth.tar'):
    checkpoint = torch.load(filename)
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    print("Checkpoint loaded successfully!")

The save_checkpoint() function takes a dictionary containing the model's state and other relevant information and saves it to the specified filename. The load_checkpoint() function loads the saved checkpoint and updates the model and optimizer's states accordingly.

In our main script, we can add the following code to save and load checkpoints:

if __name__ == "__main__":
    # ...
    if load_model:
        load_checkpoint(model, optimizer)

    # Training loop
    for epoch in range(num_epochs):
        train_loss = train_model(train_loader, model, optimizer, criterion, device)
        print(f"Epoch {epoch}/{num_epochs}, Training Loss: {train_loss}")

        check_accuracy(val_loader, model, device)

        if save_model:
            save_checkpoint({
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict(),
            })

Here, we rely on two boolean flags, load_model and save_model, defined near the top of the script (for example, both set to False by default). If load_model is set to True, the script loads the saved checkpoint before starting the training loop. If save_model is set to True, the script saves the model's state at the end of each epoch. If you only need the trained model for inference, you can restore it without the optimizer, as sketched below.
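A minimal sketch for inference-only loading, assuming the checkpoint layout produced by save_checkpoint() above (the helper name load_model_for_inference is just an illustrative choice):

def load_model_for_inference(filename="checkpoint.pth.tar", in_channels=3, out_channels=1):
    # Rebuild the architecture, restore the saved weights, and switch to eval mode.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = UNet(in_channels, out_channels).to(device)
    checkpoint = torch.load(filename, map_location=device)
    model.load_state_dict(checkpoint["state_dict"])
    model.eval()
    return model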

Visualizing Predictions

To gain insight into the model's performance and visualize its predictions, we can save some example images with their corresponding predictions to a folder:

def save_predictions_as_images(loader, model, folder, device):
    model.eval()

    if not os.path.exists(folder):
        os.makedirs(folder)

    with torch.no_grad():
        for idx, (data, _) in enumerate(loader):
            data = data.to(device)
            predictions = torch.sigmoid(model(data))  # logits -> probabilities
            predictions = (predictions > 0.5).float()

            for i in range(len(data)):
                img = (data[i] * 255).byte().cpu().numpy().transpose(1, 2, 0)
                pred = (predictions[i] * 255).byte().cpu().numpy().squeeze()
                Image.fromarray(img).save(os.path.join(folder, f"image_{idx * len(data) + i}.jpg"))
                Image.fromarray(pred).save(os.path.join(folder, f"prediction_{idx * len(data) + i}.jpg"))

    print(f"Predictions saved in {folder}")

The save_predictions_as_images() function takes a data loader, model, output folder path, and device as arguments. Inside the function, we set the model to evaluation mode and create the output folder if it does not exist.

We iterate over the data loader, generate predictions with the trained model (applying a sigmoid and thresholding at 0.5), and save the input images and the corresponding binary masks as separate JPEG files in the specified folder.

To generate and save predictions, we can add the following code after the training loop:

if __name__ == "__main__":
    # ...
    for epoch in range(num_epochs):
        train_loss = train_model(train_loader, model, optimizer, criterion, device)
        print(f"Epoch {epoch}/{num_epochs}, Training Loss: {train_loss}")

        check_accuracy(val_loader, model, device)

        if save_model:
            save_checkpoint({
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict(),
            })

        save_predictions_as_images(val_loader, model, "predictions", device)

Conclusion

In this article, we have built an image segmentation model in PyTorch from scratch. We have covered the model architecture, data preprocessing, data loading, training process, and model evaluation. We have also discussed saving and loading model checkpoints, as well as visualizing predictions.

By following this guide, you should now have a good understanding of how to approach image segmentation tasks and build your own models in PyTorch. Remember to experiment with different architectures, loss functions, and hyperparameters to achieve better results.

Frequently Asked Questions

Q: What is image segmentation? A: Image segmentation is the process of dividing an image into multiple segments or regions to simplify image analysis. It is a fundamental task in computer vision and has applications in various fields, such as object recognition, medical imaging, and self-driving cars.

Q: What is the U-Net architecture? A: The U-Net architecture is a popular deep learning architecture for image segmentation tasks. It consists of an encoder-decoder design with skip connections between corresponding encoder and decoder stages. The encoder captures context and extracts features from the input, while the decoder upsamples the features and generates the segmentation map.

Q: What is data augmentation? A: Data augmentation is a technique used to artificially increase the size of the training dataset by applying random transformations to the input data. It helps to reduce overfitting and improve the model's generalization. Common data augmentation techniques include flipping, rotating, scaling, cropping, and adding noise to the images.

Q: How do I choose the appropriate loss function for image segmentation? A: The choice of loss function depends on the specific requirements of your task. For binary image segmentation, commonly used loss functions are binary cross-entropy and Dice loss. For multi-class segmentation, you can use cross-entropy loss or Dice loss, possibly with variants such as focal loss or Tversky loss.
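As a concrete illustration, a soft Dice loss for binary segmentation can be written in a few lines of PyTorch. This is a minimal sketch (the smoothing constant is an assumed value), not the only way to formulate it:

import torch
import torch.nn as nn

class DiceLoss(nn.Module):
    """Soft Dice loss for binary segmentation; expects raw logits."""
    def __init__(self, smooth=1e-8):
        super().__init__()
        self.smooth = smooth

    def forward(self, logits, targets):
        probs = torch.sigmoid(logits)
        probs = probs.reshape(probs.size(0), -1)
        targets = targets.reshape(targets.size(0), -1)
        intersection = (probs * targets).sum(dim=1)
        dice = (2 * intersection + self.smooth) / (probs.sum(dim=1) + targets.sum(dim=1) + self.smooth)
        return 1 - dice.mean()

In practice, this is often combined with BCEWithLogitsLoss by summing the two losses.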

Q: How can I improve the performance of my image segmentation model? A: There are several ways to improve the performance of your image segmentation model:

  1. Increase the size of your training dataset or use data augmentation techniques to introduce more variation.
  2. Experiment with different model architectures, such as U-Net variants, to find the one that works best for your task.
  3. Fine-tune the hyperparameters, including learning rate, batch size, optimizer, and regularization techniques (a learning-rate scheduler sketch is shown after this list).
  4. Use pre-trained models and transfer learning to leverage knowledge from other related tasks or domains.
  5. Explore advanced techniques like ensembling, semi-supervised learning, or active learning.
  6. Continuously evaluate your model's performance and analyze its weaknesses to guide future improvement efforts.
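For the third point, a simple way to tune the learning rate during training is to lower it automatically when the monitored loss stops improving. A minimal sketch using PyTorch's ReduceLROnPlateau, plugged into the training loop defined earlier in this article (the factor and patience values are illustrative assumptions):

from torch.optim.lr_scheduler import ReduceLROnPlateau

scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=3)

for epoch in range(num_epochs):
    train_loss = train_model(train_loader, model, optimizer, criterion, device)
    # Ideally monitor a validation loss here; the training loss is used only to keep
    # the sketch self-contained with the objects defined earlier in this article.
    scheduler.step(train_loss)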

Remember that improving model performance is an iterative process, and it may require experimentation and fine-tuning based on your specific task and dataset.
