Written by Brian Hulela
03 Sep 2025 • 18:52
Intersection over Union (IoU) is a fundamental metric used in computer vision, particularly in tasks such as object detection, image segmentation, and tracking.
It provides a quantitative measure of how well a predicted region matches the ground truth region.
Mathematically, IoU is defined as the ratio of the overlap area between two regions to the area of their union:

$$\text{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

Where:

- $A$ is the ground truth bounding box or mask.
- $B$ is the predicted bounding box or mask.
- $|A \cap B|$ is the area of intersection.
- $|A \cup B|$ is the area of union.

The result lies in the range $0 \le \text{IoU} \le 1$:

- $\text{IoU} = 0$ means no overlap at all.
- $\text{IoU} = 1$ means perfect overlap.
Suppose we have two bounding boxes:

- Ground truth $A$: top-left $(2, 2)$, bottom-right $(6, 6)$.
- Prediction $B$: top-left $(4, 4)$, bottom-right $(8, 8)$.

The areas are:

$$|A| = 4 \times 4 = 16, \qquad |B| = 4 \times 4 = 16$$

The intersection box spans from $(4, 4)$ to $(6, 6)$:

$$|A \cap B| = 2 \times 2 = 4$$

The union area:

$$|A \cup B| = |A| + |B| - |A \cap B| = 16 + 16 - 4 = 28$$

So the IoU is:

$$\text{IoU} = \frac{4}{28} \approx 0.14$$

This indicates a relatively poor overlap between the predicted and ground truth boxes.
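The same computation in a minimal Python sketch (the `iou` helper and the $(x_1, y_1, x_2, y_2)$ corner convention are illustrative choices, not tied to any particular library):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap at all
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
             - inter)
    return inter / union if union else 0.0

# The worked example above: 4 / 28 ≈ 0.143
print(iou((2, 2, 6, 6), (4, 4, 8, 8)))
```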
Object detection models like YOLO, Faster R-CNN, and SSD rely heavily on IoU for evaluation. A prediction is considered correct if its IoU with the ground truth is above a certain threshold.
For example, in the Pascal VOC challenge, a threshold of 0.5 is used: if $\text{IoU} \geq 0.5$, the detection counts as a True Positive.
In MS COCO, Average Precision is computed at multiple IoU thresholds (from 0.5 to 0.95, in steps of 0.05) and the results are averaged, giving a more robust metric known as mean Average Precision (mAP).
This makes IoU the backbone of evaluating detection performance.
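To make the thresholding concrete, here is a hedged sketch that assumes each detection has already been matched to its best ground truth box and carries that IoU value (`count_true_positives` is a hypothetical helper; real evaluators such as the COCO API also handle confidence ordering and one-to-one matching):

```python
import numpy as np

def count_true_positives(best_ious, threshold=0.5):
    """Count detections whose best-match IoU clears the threshold."""
    return int((np.asarray(best_ious) >= threshold).sum())

best_ious = [0.82, 0.48, 0.65, 0.30]  # one best-match IoU per detection

# Pascal VOC style: a single threshold of 0.5
print(count_true_positives(best_ious, 0.5))  # -> 2

# COCO style: sweep thresholds 0.5, 0.55, ..., 0.95
for t in np.arange(0.5, 1.0, 0.05):
    print(round(float(t), 2), count_true_positives(best_ious, t))
```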
When an object detector outputs multiple overlapping bounding boxes for the same object, Non-Maximum Suppression is applied. IoU is used to decide which boxes to keep and which to discard.
The algorithm keeps the box with the highest confidence score.
It removes other boxes with IoU above a certain threshold (e.g., 0.5) with respect to the kept box.
This ensures that each object is represented by only one bounding box.
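A minimal sketch of greedy NMS under the same axis-aligned box assumption (production pipelines typically use optimized implementations such as `torchvision.ops.nms` instead):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes, clamped to zero overlap."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, suppress heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)   # highest-confidence box remaining
        keep.append(best)
        # Drop boxes whose IoU with the kept box exceeds the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.75]
print(non_max_suppression(boxes, scores))  # -> [0, 2]: the duplicate is removed
```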
In semantic and instance segmentation, IoU is used to compare predicted masks with ground truth masks. Here, the intersection and union are computed over pixels rather than bounding boxes.
For example, with predicted mask $P$ and ground truth mask $G$:

$$\text{IoU} = \frac{|P \cap G|}{|P \cup G|}$$

This metric is often referred to as the Jaccard Index, which is mathematically equivalent to IoU.
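A pixel-wise version is only a few lines of NumPy, assuming two boolean masks of the same shape:

```python
import numpy as np

def mask_iou(pred, gt):
    """Pixel-wise IoU (Jaccard index) of two boolean masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union else 0.0

# Two 4x4 masks of 9 pixels each, overlapping in a 2x2 patch
pred = np.zeros((4, 4), bool); pred[0:3, 0:3] = True
gt = np.zeros((4, 4), bool); gt[1:4, 1:4] = True
print(mask_iou(pred, gt))  # 4 / 14 ≈ 0.286
```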
In object tracking tasks, IoU helps measure how well a predicted bounding box follows the same object across frames. High IoU scores between consecutive frames indicate stable tracking.
IoU-based tracking can become unreliable in fast-paced scenes, where large object motion between frames shrinks the overlap; capturing at a higher frame rate (e.g., with a high-speed camera sensor) reduces frame-to-frame displacement and mitigates this.
In LiDAR- or depth-based perception systems (e.g., self-driving cars), IoU is extended into three dimensions. The formula remains the same, but intersection and union volumes replace areas:

$$\text{IoU}_{3D}(A, B) = \frac{V(A \cap B)}{V(A \cup B)}$$
This is crucial in evaluating 3D bounding box predictions for autonomous navigation and robotics.
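A sketch under the simplifying assumption of axis-aligned 3D boxes, given as $(x_1, y_1, z_1, x_2, y_2, z_2)$ corners; real 3D benchmarks usually involve rotated (yawed) boxes, whose intersection requires polygon clipping:

```python
def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes (x1, y1, z1, x2, y2, z2)."""
    # Overlap extent along each axis, clamped at zero
    dx = max(0, min(a[3], b[3]) - max(a[0], b[0]))
    dy = max(0, min(a[4], b[4]) - max(a[1], b[1]))
    dz = max(0, min(a[5], b[5]) - max(a[2], b[2]))
    inter = dx * dy * dz
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    union = vol_a + vol_b - inter
    return inter / union if union else 0.0

# Two unit-offset 2x2x2 cubes sharing a 1x1x1 corner: 1 / 15 ≈ 0.067
print(iou_3d((0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3)))
```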
Generalized IoU (GIoU): Adds a penalty term when predicted and ground truth boxes do not overlap, making the metric more informative during training (see the sketch after this list).
Distance IoU (DIoU) and Complete IoU (CIoU): Introduce distance and aspect ratio terms to improve optimization in bounding box regression.
Soft IoU: Used in segmentation where probabilistic masks are considered, allowing for smoother evaluation.
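To illustrate the GIoU idea, here is a small sketch that subtracts the enclosing-box penalty from plain IoU (an illustrative implementation of the published formula, not taken from any library):

```python
def giou(a, b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes; ranges over (-1, 1]."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou_val = inter / union if union else 0.0
    # Area of the smallest axis-aligned box enclosing both a and b
    c_area = ((max(a[2], b[2]) - min(a[0], b[0]))
              * (max(a[3], b[3]) - min(a[1], b[1])))
    return iou_val - (c_area - union) / c_area if c_area else iou_val

# Both pairs have IoU = 0, but GIoU still reflects how far apart they are
print(giou((0, 0, 2, 2), (3, 0, 5, 2)))   # nearby  -> -0.2
print(giou((0, 0, 2, 2), (8, 0, 10, 2)))  # farther -> -0.6
```

Because the penalty grows as the boxes drift apart, the signal does not vanish once they stop overlapping, which is what makes GIoU useful as a training loss.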
Intersection over Union provides a consistent and interpretable way to measure spatial agreement between predictions and ground truth. Its simplicity, bounded range, and adaptability across 2D and 3D tasks make it one of the most widely used metrics in computer vision.
From filtering redundant detections in NMS to serving as the foundation of accuracy benchmarks in challenges like COCO, IoU has become indispensable in developing and evaluating machine perception systems.