Brain Tumor Segmentation using Attention-UNet

Muhammad Usman Khan, Dr. Eid Rehman

Tags: Medical AI · Deep Learning · Attention-UNet · Segmentation

Abstract

This notebook is part of the NeuroInsight project, software designed to revolutionize the way medical professionals analyze brain MRI scans for abnormalities such as cancer. Our research demonstrates how we trained a convolutional neural network using the Attention-UNet architecture to perform highly accurate brain tumor segmentation.

Our model automates tumor segmentation, tumor-grade classification (HGG/LGG), and the generation of clinical reports.

1. About the Dataset

The dataset utilized for this research is BraTS 2019, a well-known benchmark for developing and evaluating algorithms for brain tumor analysis from multi-modal MRI scans. It is released as part of the Multimodal Brain Tumor Segmentation Challenge (BraTS); earlier BraTS data also forms a task in the Medical Segmentation Decathlon.

Dataset Overview

Four individual brain MRI scans showing T1, T1ce, T2, and FLAIR modalities alongside a combined colored Mask
Figure 1: MRI Modalities (T1, T1ce, T2, FLAIR) provided per patient case

Ground Truth Annotations

Each label map contains voxel-wise annotations for specific tumor sub-regions:

A sequence of masks breaking down the tumor into individual classes: Original Segmentation, Not Tumor, Non-Enhancing Tumor, Edema, and Enhancing Tumor
Figure 2: Breakdown of Ground Truth Segmentation Labels

2. Data Splitting

To train and evaluate our model effectively, we split our dataset into three distinct parts. We implemented Stratified Splitting so that the relative proportions of High-Grade Glioma (HGG) and Low-Grade Glioma (LGG) samples are preserved in each set, preventing a skewed split from biasing training or evaluation.

Bar chart showing Stratified Distribution of HGG/LGG across Test, Train, and Validation splits
Figure 3: Stratified Distribution of HGG and LGG cases across Dataset Splits
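In code, this kind of stratified splitting is typically done with scikit-learn's `train_test_split`. The sketch below is illustrative: the patient IDs, class counts, and split fractions are hypothetical, not the project's actual values.

```python
from sklearn.model_selection import train_test_split

# Hypothetical patient IDs and grade labels (counts are illustrative,
# not the actual BraTS 2019 case numbers).
patients = [f"case_{i:03d}" for i in range(100)]
grades = ["HGG"] * 75 + ["LGG"] * 25

# Hold out a test set first, then split the remainder into train/val,
# stratifying on grade both times so the HGG/LGG ratio is preserved.
train_val_ids, test_ids, train_val_y, test_y = train_test_split(
    patients, grades, test_size=0.2, stratify=grades, random_state=42)
train_ids, val_ids, train_y, val_y = train_test_split(
    train_val_ids, train_val_y, test_size=0.125,
    stratify=train_val_y, random_state=42)
```

Because the split is stratified, each subset keeps roughly the same 75/25 HGG-to-LGG ratio as the full cohort.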

3. Data Scaling & Preprocessing

Z-score normalization (or standardization) is a vital preprocessing technique that transforms data to have a mean of 0 and a standard deviation of 1. This is particularly important for medical imaging tasks where pixel values can range widely (e.g., maximum values in FLAIR images exceeding 1273.0), which could otherwise slow down the training process.

Z-Score Normalization Formula

$$z = \frac{x - \mu}{\sigma}$$

Z-score normalization helps scale these raw pixel values into a directly comparable range, drastically improving the stability and computational efficiency of the neural network's training phase.
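A minimal NumPy sketch of this normalization (the intensity range and array shape below are illustrative):

```python
import numpy as np

def z_score(volume):
    """Standardize an image or volume to zero mean, unit variance."""
    # Small epsilon guards against division by zero on constant inputs.
    return (volume - volume.mean()) / (volume.std() + 1e-8)

# Synthetic slice spanning a FLAIR-like raw intensity range.
raw = np.random.default_rng(1).uniform(0.0, 1273.0, size=(240, 240))
scaled = z_score(raw)
```

After scaling, `scaled.mean()` is approximately 0 and `scaled.std()` approximately 1, regardless of the raw intensity range.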

4. Data Generator

To manage the significant memory footprint of volumetric MRI data, we developed a custom Python Data Generator. The following preprocessing steps occur dynamically within this generator:

  1. Loading and Normalizing: The __load_and_normalize function loads the T1CE and FLAIR MRI modalities using nibabel. Images are Z-score normalized (subtracting the mean, then dividing by the standard deviation), ensuring a zero-centered distribution.
  2. Processing Masks: The __load_segmentation function loads the segmentation mask and remaps label 4 to label 3, so the class labels are contiguous (0–3), which simplifies one-hot encoding and the categorical loss.
  3. Slice Selection: The generator selects slices within a specified active range (default 50 to 130) to eliminate empty background slices, reducing memory usage and focusing purely on informative data.
  4. Resizing and Stacking: T1CE and FLAIR slices are resized to a consistent shape of 128x128 pixels. The modalities are then stacked together along the channel dimension to form a multi-channel input for the model.
  5. One-Hot Encoding: The 128x128 segmentation mask is one-hot encoded into a categorical mask with 4 channels (one per class), matching the format expected by the categorical crossentropy loss.
  6. Shuffling: At the end of each epoch, an on_epoch_end callback shuffles patient data to ensure random, unbiased sampling during iterative training.
Diagram showing conversion of an Original Array of 1x240x240 into a One-Hot Encoded array of 1x240x240x4
Figure 4: One-Hot Encoding process for Segmentation Masks
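The six steps above can be sketched as a single generator class. This is a plain-Python stand-in: the real generator subclasses tf.keras.utils.Sequence and reads NIfTI files with nibabel, whereas here `load_volume` is an injected callable, the resize is nearest-neighbour, and all names and defaults are illustrative.

```python
import numpy as np

class BratsSliceGenerator:
    """Illustrative sketch of the custom slice-wise data generator."""

    def __init__(self, patient_ids, load_volume, batch_size=2,
                 slice_range=(50, 130), img_size=128, n_classes=4):
        self.patient_ids = list(patient_ids)
        self.load_volume = load_volume      # (patient_id, modality) -> 3D array
        self.batch_size = batch_size
        self.slice_range = slice_range      # active axial slices only
        self.img_size = img_size
        self.n_classes = n_classes

    def __len__(self):
        return int(np.ceil(len(self.patient_ids) / self.batch_size))

    def _normalize(self, vol):
        return (vol - vol.mean()) / (vol.std() + 1e-8)   # z-score

    def _resize(self, img):
        # Nearest-neighbour resize to (img_size, img_size); the project
        # would typically use cv2.resize or skimage instead.
        r = np.linspace(0, img.shape[0] - 1, self.img_size).astype(int)
        c = np.linspace(0, img.shape[1] - 1, self.img_size).astype(int)
        return img[np.ix_(r, c)]

    def __getitem__(self, idx):
        ids = self.patient_ids[idx * self.batch_size:(idx + 1) * self.batch_size]
        X, y, (lo, hi) = [], [], self.slice_range
        for pid in ids:
            t1ce = self._normalize(self.load_volume(pid, "t1ce"))
            flair = self._normalize(self.load_volume(pid, "flair"))
            seg = self.load_volume(pid, "seg").copy()
            seg[seg == 4] = 3                # remap label 4 -> contiguous 0-3
            for s in range(lo, min(hi, t1ce.shape[2])):
                # Stack modalities along the channel dimension.
                X.append(np.stack([self._resize(t1ce[:, :, s]),
                                   self._resize(flair[:, :, s])], axis=-1))
                mask = self._resize(seg[:, :, s]).astype(int)
                y.append(np.eye(self.n_classes, dtype=np.float32)[mask])  # one-hot
        return np.asarray(X, dtype=np.float32), np.asarray(y)

    def on_epoch_end(self):
        np.random.shuffle(self.patient_ids)  # reshuffle patients each epoch
```

Each batch then has shape (slices, 128, 128, 2) for the inputs and (slices, 128, 128, 4) for the one-hot masks.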

5. Model Architecture Specifications

| Component | Details |
| --- | --- |
| Input Dimensions | 128 × 128 × 2 (FLAIR and T1CE MRI slices) |
| Encoder Path | 3 blocks of Conv2D (ReLU) + MaxPooling, filters expanding 32 → 64 → 128 |
| Bottleneck Layer | 2 Conv2D layers with 256 filters each |
| Attention Gates | Applied before each decoder skip connection (128, 64, 32 filters) |
| Decoder Path | 3 UpSampling blocks with Conv2D layers and concatenated attention-filtered features |
| Output Layer | 1 × 1 Conv2D + Softmax activation (4-class categorical segmentation) |
| Loss Function | Categorical crossentropy |
| Optimization | Adam optimizer (learning rate = 0.001) |
| Training Efficiency | Mixed-precision training (float16) with a batch size of 2 |
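The distinguishing component of this architecture is the additive attention gate from the Attention U-Net design. The NumPy sketch below shows the per-pixel arithmetic only: in the actual model the projections are 1×1 Conv2D layers, whereas here they are plain matrix products, and all shapes and weights are illustrative.

```python
import numpy as np

def attention_gate(x, g, W_x, W_g, psi):
    """Additive attention gate applied to a skip connection.

    x : skip-connection features, shape (H, W, C_x)
    g : decoder gating signal resampled to (H, W, C_g)
    W_x, W_g, psi : per-pixel projection weights standing in for 1x1 convs
    """
    theta = x @ W_x                            # project skip features
    phi = g @ W_g                              # project gating signal
    f = np.maximum(theta + phi, 0.0)           # ReLU on the sum
    alpha = 1.0 / (1.0 + np.exp(-(f @ psi)))   # sigmoid coefficients, (H, W, 1)
    return x * alpha                           # suppress irrelevant regions

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 32))          # encoder skip features
g = rng.standard_normal((16, 16, 64))          # decoder gating signal
W_x = 0.1 * rng.standard_normal((32, 16))
W_g = 0.1 * rng.standard_normal((64, 16))
psi = 0.1 * rng.standard_normal((16, 1))
gated = attention_gate(x, g, W_x, W_g, psi)
```

Because the attention coefficients lie in (0, 1), the gate can only attenuate skip-connection features, focusing the decoder on tumor-relevant regions.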

6. Evaluation Metrics

We evaluated our trained model using standard medical image segmentation metrics.
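The Dice similarity coefficient is the de-facto standard metric for BraTS-style segmentation; assuming it was among the metrics used, a minimal NumPy sketch:

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1e-6):
    """Dice similarity coefficient for one pair of binary masks."""
    y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)

def mean_dice(y_true_onehot, y_pred_onehot):
    """Average Dice across the channels of one-hot encoded masks."""
    return np.mean([dice_coefficient(y_true_onehot[..., c], y_pred_onehot[..., c])
                    for c in range(y_true_onehot.shape[-1])])
```

A perfect prediction scores 1.0, disjoint masks score near 0, and half-overlapping masks of equal size score 0.5.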

7. Neuro Insight: Clinical Application

To bridge the gap between academic research and accessible clinical utility, the Attention-UNet segmentation model detailed above was integrated into a full-stack medical application named Neuro Insight.

This platform lets medical professionals, without any programming expertise, upload 3D MRI scans, run automated segmentation, classify tumor grade (HGG/LGG), and generate detailed, shareable clinical reports.

Screen 1 - Dashboard View
Screen 2 - Modal to upload 3D MRIs
Screen 3 - MRI Uploaded
Screen 4 - Inference generated
Screen 5 - Report generated
Screen 6 - Report in PDF
Screen 7 - Home page
Screen 8 - Teams page