Create a Bounding Box from a Segmentation Mask in Python
If you have a segmentation mask and want to convert it to a bounding box using Python, this tutorial should help. This tutorial assumes your segmentation mask is an image with black pixels as background and some shade of gray or white as foreground.
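The code later in this tutorial matches one exact pixel value, but a black-background mask can be turned into a boolean foreground array in more than one way. Here is a minimal NumPy sketch on a made-up array. The array and the names `foreground`, `object_values` and `per_object` are purely illustrative and do not come from the tutorial.

```python
import numpy as np

# A tiny synthetic grayscale mask: 0 (black) is background, other values are objects.
mask = np.array([
    [0, 0,   0,   0, 0],
    [0, 128, 128, 0, 0],
    [0, 128, 128, 0, 255],
    [0, 0,   0,   0, 255],
], dtype=np.uint8)

# Treat any non-black pixel as foreground (a single binary mask).
foreground = mask > 0

# If each object has its own gray value, split the mask into one boolean mask per value.
object_values = [int(v) for v in np.unique(mask) if v != 0]
per_object = {v: mask == v for v in object_values}

print(int(foreground.sum()))  # 6
print(object_values)          # [128, 255]
```

Use the first form when the whole foreground is one object. Use the per-value form when, as with Blender pass indices, each object is rendered with a distinct value.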
For example, if you are taking our course 3D Rendered Datasets in Blender for Beginners (Immersive Limit), you may want to create segmentations with the method shown in the video Image Segmentation with Blender 2.83 (YouTube), then use the code below to create bounding boxes. The course is not required to follow this tutorial. You may also find the video Run Length Encoding (YouTube) helpful for understanding run-length encoding (RLE).
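Because RLE comes up here, the following is a hedged sketch of uncompressed run-length encoding for a binary mask. It assumes COCO-style conventions: pixels are read in column-major (Fortran) order, and the first count is the number of background pixels, so it is 0 when the first pixel is foreground. The function name `binary_mask_to_rle` is hypothetical. If you rely on COCO tooling, `pycocotools` provides its own (compressed) encoder, which you should prefer.

```python
import numpy as np

def binary_mask_to_rle(binary_mask):
    """Uncompressed RLE of a binary mask (assumed COCO-style conventions)."""
    mask = np.asarray(binary_mask)
    # Flatten column by column (Fortran order) as 0/1 values.
    flat = (mask != 0).ravel(order="F").astype(np.int8)
    # Positions where the value changes mark the start of a new run.
    change = np.flatnonzero(np.diff(flat)) + 1
    boundaries = np.concatenate([[0], change, [flat.size]])
    counts = np.diff(boundaries).tolist()
    # Runs must start with background, so prepend a zero-length run if needed.
    if flat.size and flat[0] == 1:
        counts = [0] + counts
    return {"counts": counts, "size": list(mask.shape)}

example = np.array([[0, 1, 1],
                    [0, 1, 0]], dtype=np.uint8)
print(binary_mask_to_rle(example))  # {'counts': [2, 3, 1], 'size': [2, 3]}
```

Finding change points with `np.diff` keeps the encoding vectorized, in the same spirit as the bounding-box code, rather than looping over pixels in Python.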
A bounding box is a rectangular box that completely encloses an object in an image. We will use OpenCV to open the image and NumPy to calculate the bounding box: the smallest axis-aligned rectangle containing every foreground pixel, given by the minimum and maximum x and y coordinates of those pixels. You could compute this by reading pixels one at a time in a Python loop, but NumPy's vectorized operations run in compiled C code and are far faster.
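To make the comparison concrete, here is a sketch with both approaches side by side on a small synthetic mask. The helper names `bbox_loop` and `bbox_numpy` are illustrative. Both return the box in the same `(x_min, x_max, y_min, y_max)` order the tutorial uses, or `None` when no pixel matches.

```python
import numpy as np

def bbox_loop(mask, value):
    """Scan every pixel in pure Python (slow, shown only for comparison)."""
    x_min = x_max = y_min = y_max = None
    for y, row in enumerate(mask):
        for x, pixel in enumerate(row):
            if pixel == value:
                x_min = x if x_min is None else min(x_min, x)
                x_max = x if x_max is None else max(x_max, x)
                y_min = y if y_min is None else min(y_min, y)
                y_max = y if y_max is None else max(y_max, y)
    if x_min is None:
        return None
    return x_min, x_max, y_min, y_max

def bbox_numpy(mask, value):
    """Same result using vectorized NumPy operations."""
    ys, xs = np.nonzero(mask == value)  # row indices, column indices
    if xs.size == 0:
        return None
    return int(xs.min()), int(xs.max()), int(ys.min()), int(ys.max())

mask = np.zeros((6, 8), dtype=np.uint8)
mask[2:5, 3:7] = 1  # object occupies rows 2-4 and columns 3-6

print(bbox_loop(mask, 1))   # (3, 6, 2, 4)
print(bbox_numpy(mask, 1))  # (3, 6, 2, 4)
```

On a full-size render you can time the two functions with Python's `timeit` module to see the difference on your own hardware. The tutorial's code, which reads the mask from disk with OpenCV, uses the NumPy approach.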
import numpy as np
import cv2
############
# Insert your code here to create/fetch your segmentation image
# My code was set up in Blender to output a segmentation png like this:
# seg_path = output_path / 'Segmentation_1.png'
############
# Read the image as a single-channel (grayscale) array; imread returns None on failure
im = cv2.imread(str(seg_path), cv2.IMREAD_GRAYSCALE)
# Segmentation color value (in Blender this is likely your pass index)
seg_value = 1
if im is None:
    # Handle the error case where the segmentation image cannot be read
    print("Error: Segmentation image could not be read.")
else:
    # Row (y) and column (x) indices of every pixel equal to seg_value.
    # If any non-black pixel counts as foreground, use (im > 0) instead.
    ys, xs = np.where(im == seg_value)
    if xs.size == 0:
        # Handle the case where no pixel has seg_value (empty segmentation)
        print(f"Error: No pixels with value {seg_value} in the segmentation image.")
    else:
        # Bounding box as (x_min, x_max, y_min, y_max), inclusive pixel coordinates
        x_min, x_max = int(xs.min()), int(xs.max())
        y_min, y_max = int(ys.min()), int(ys.max())
        bbox = x_min, x_max, y_min, y_max
        #########
        # Do what you need to do with the bbox, for example add it to your annotation file
        #########
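If your render contains several objects with different pass indices, or your annotation file expects COCO's `[x, y, width, height]` layout, this sketch may help. The helper names `bboxes_for_all_values` and `to_coco_xywh` are hypothetical. The `+ 1` treats the min/max values as inclusive pixel coordinates, so width and height count pixels. That is a convention choice, so check what your annotation tooling expects.

```python
import numpy as np

def bboxes_for_all_values(mask):
    """Return {pixel_value: (x_min, x_max, y_min, y_max)} for every nonzero value."""
    boxes = {}
    for value in np.unique(mask):
        if value == 0:
            continue  # 0 is the black background
        ys, xs = np.nonzero(mask == value)
        boxes[int(value)] = (int(xs.min()), int(xs.max()), int(ys.min()), int(ys.max()))
    return boxes

def to_coco_xywh(bbox):
    """Convert (x_min, x_max, y_min, y_max) to [x, y, width, height] (inclusive bounds assumed)."""
    x_min, x_max, y_min, y_max = bbox
    return [x_min, y_min, x_max - x_min + 1, y_max - y_min + 1]

mask = np.zeros((10, 10), dtype=np.uint8)
mask[1:4, 2:6] = 1   # pass index 1: rows 1-3, columns 2-5
mask[6:9, 7:10] = 2  # pass index 2: rows 6-8, columns 7-9

boxes = bboxes_for_all_values(mask)
print(boxes)                   # {1: (2, 5, 1, 3), 2: (7, 9, 6, 8)}
print(to_coco_xywh(boxes[1]))  # [2, 1, 4, 3]
```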
License
All code on this page is licensed under the MIT License
MIT License Copyright (c) 2023 Immersive Limit LLC Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.