Image Segmentation Based on Color

Image segmentation is a fundamental task in image processing and computer vision, where the goal is to partition an image into multiple segments or regions. Each segment typically corresponds to a different object or part of the scene. Color-based segmentation is one of the most popular approaches to segmenting an image, especially when color information is a key distinguishing feature between regions.

In this article, we’ll delve into how color-based image segmentation works, explain the mathematical concepts involved, and provide an example.

1. Color Spaces

To perform color-based segmentation, it’s essential to understand color spaces. A color space is a way to represent colors in a digital image using numerical values. The most common color spaces used for segmentation are:

  • RGB (Red, Green, Blue): A color is represented by three channels, each corresponding to the intensity of red, green, and blue components, with values typically ranging from 0 to 255.

    • Mathematically, a pixel is represented as Pixel = (R, G, B), where R, G, and B represent the intensities of red, green, and blue, respectively.
  • HSV (Hue, Saturation, Value): A more intuitive color space for humans, especially when it comes to segmenting objects based on color.

    • Hue (H) represents the type of color (e.g., red, blue).

    • Saturation (S) represents the purity or intensity of the color.

    • Value (V) represents the brightness of the color.

    • Mathematically, a pixel in the HSV color space can be represented as:

      \text{Pixel} = (H, S, V)

      where H, S, and V represent hue, saturation, and value, respectively.

2. Thresholding for Segmentation

Thresholding is a simple yet effective method for color-based segmentation. The idea is to define thresholds for the color values to separate regions of interest from the background.

For example, in an RGB image, suppose we want to segment all red objects. We can set thresholds for the R, G, and B channels, such as:

If R > 150 and G < 100 and B < 100, then classify the pixel as red.

Pixels that meet these conditions are considered part of the red object, while others are not.

In the HSV color space, we can apply a threshold on the hue to segment a specific color:

If H ∈ [170°, 10°] (a red hue range that wraps around 0°), classify the pixel as red.

This method is effective for simple images with clearly defined colors but can struggle when colors overlap or when the lighting conditions vary significantly.
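
As a rough illustration, the sketch below thresholds red pixels in HSV using OpenCV and NumPy. The filename `fruits.jpg`, the saturation/value lower bounds, and the exact hue bands are illustrative assumptions, not values from the text, and would need tuning for a real image.

```python
import cv2
import numpy as np

# Load an image (placeholder filename) and convert BGR -> HSV.
img = cv2.imread("fruits.jpg")              # OpenCV reads images in BGR order
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Red wraps around hue 0 in OpenCV's 0-179 hue scale, so use two bands.
# The saturation/value lower bounds of 100 are illustrative, not canonical.
mask_low  = cv2.inRange(hsv, np.array([0, 100, 100]),   np.array([10, 255, 255]))
mask_high = cv2.inRange(hsv, np.array([170, 100, 100]), np.array([179, 255, 255]))
red_mask = cv2.bitwise_or(mask_low, mask_high)   # 255 where "red", 0 elsewhere

# Keep only the pixels classified as red.
red_only = cv2.bitwise_and(img, img, mask=red_mask)
```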

3. Clustering for Segmentation: K-Means Algorithm

Another method for color-based segmentation is to use clustering algorithms like K-Means. The idea is to group pixels with similar colors into clusters, where each cluster represents a different region.

The K-Means algorithm works as follows (a code sketch appears after these steps):

  1. Convert the image into a feature vector: In color segmentation, each pixel’s color (in RGB or HSV) can be treated as a vector in 3D space. For an RGB image, the feature vector of a pixel is:

    \mathbf{x} = (R, G, B)
  2. Choose the number of clusters, K: This defines how many segments you want in the image. For instance, if you expect three objects in the image, set K = 3.

  3. Randomly initialize K cluster centroids: These are the starting points for the clusters in color space.

  4. Assign each pixel to the nearest centroid: Using a distance metric (typically the Euclidean distance), assign each pixel to the closest centroid. The Euclidean distance between two pixels x_i = (R_i, G_i, B_i) and x_j = (R_j, G_j, B_j) is given by:

    d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{(R_i - R_j)^2 + (G_i - G_j)^2 + (B_i - B_j)^2}
  5. Recompute the centroids: For each cluster, calculate the mean of all the pixels assigned to that cluster, and update the centroid.

  6. Repeat steps 4 and 5 until convergence: The algorithm continues iterating until the centroids no longer change significantly.
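
A minimal sketch of these steps using OpenCV's built-in `cv2.kmeans` (one of several possible implementations); the filename, K = 3, and the stopping criteria are illustrative assumptions.

```python
import cv2
import numpy as np

# Flatten the image (placeholder filename) into an (N, 3) array of float32 color vectors.
img = cv2.imread("fruits.jpg")
pixels = img.reshape(-1, 3).astype(np.float32)

# K = 3 clusters; stop after 10 iterations or when centroids move by less than 1.0.
K = 3
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Paint each pixel with its cluster centroid to visualize the segmentation.
segmented = centers.astype(np.uint8)[labels.flatten()].reshape(img.shape)
```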

4. Example: Segmenting an Image of Fruits

Consider an image with three types of fruit: apples (red), bananas (yellow), and grapes (purple). We want to segment the image based on these colors.

Step 1: Convert the image into RGB space

Each pixel in the image is represented as an (R, G, B) vector. For example, a pixel of an apple might be (200, 30, 50), a banana might be (255, 240, 60), and a grape might be (100, 50, 150).

Step 2: Apply K-Means Clustering

We choose K = 3, as we expect three main color regions. The algorithm clusters the pixels into three groups based on their RGB values.

Step 3: Segment the Image

After clustering, we label each pixel based on the cluster it belongs to. All pixels in the “apple” cluster will form the apple segment, all pixels in the “banana” cluster form the banana segment, and so on.

Step 4: Visualize the Segmentation

The result is a segmented image where the apples, bananas, and grapes are isolated into distinct regions based on their color.

5. Advanced Methods: Gaussian Mixture Models (GMM)

For more complex segmentation tasks, Gaussian Mixture Models (GMM) can be used. This approach models the color distribution of each segment as a Gaussian distribution. Each segment corresponds to a different Gaussian, and the image is segmented by determining the probability that a pixel belongs to each Gaussian distribution.

In the RGB color space, the probability that a pixel x = (R, G, B) belongs to a segment is given by:

P(\mathbf{x}) = \frac{1}{(2\pi)^{3/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \mu)^\top \Sigma^{-1} (\mathbf{x} - \mu) \right)

where:

  • μ is the mean of the Gaussian (the average color of the segment).
  • Σ is the covariance matrix, representing the spread of the color distribution.
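
A brief sketch of this idea using scikit-learn's `GaussianMixture` (an assumed dependency, not mentioned in the text); three components and the filename are illustrative.

```python
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture   # scikit-learn assumed to be installed

# Flatten the image (placeholder filename) into an (N, 3) array of color vectors.
img = cv2.imread("fruits.jpg")
pixels = img.reshape(-1, 3).astype(np.float64)

# Fit one full-covariance Gaussian per expected segment (3 components is illustrative).
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(pixels)

# Assign each pixel to the Gaussian with the highest posterior probability.
labels = gmm.predict(pixels).reshape(img.shape[:2])

# gmm.means_ holds each segment's average color (mu); gmm.covariances_ holds each Sigma.
```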

6. Evaluation Metrics

To evaluate the quality of the segmentation, we can use metrics like the following (a small sketch for computing them appears after the list):

  • Intersection over Union (IoU): This measures the overlap between the predicted segment and the ground truth.
  • Accuracy: The percentage of correctly classified pixels.
  • Dice Coefficient: A similarity measure between two sets of pixels, particularly useful for medical imaging segmentation.
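
A small sketch of how these metrics can be computed for binary masks with NumPy; the mask arguments are hypothetical placeholders for a predicted segment and its ground truth.

```python
import numpy as np

def segmentation_metrics(pred_mask: np.ndarray, gt_mask: np.ndarray) -> dict:
    """Compute IoU, pixel accuracy, and the Dice coefficient for two binary masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return {
        "iou": intersection / union if union else 1.0,
        "accuracy": (pred == gt).mean(),
        "dice": 2 * intersection / total if total else 1.0,
    }
```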

Segmentation in RGB Vector Space

This section describes a process for image segmentation in RGB vector space that uses Euclidean distance and covariance-based measures to determine whether a pixel belongs to a specific color region. Below is a more detailed breakdown of the concepts with full mathematical explanations.

Objective:

The goal of color-based segmentation is to classify pixels of an image based on their color values. The segmentation process aims to identify pixels whose color is similar to a pre-defined or average color that represents the region or object of interest.

Euclidean Distance in RGB Space:

In this method, each pixel in an image is represented as a point in a 3D RGB color space, where the axes correspond to the Red (R), Green (G), and Blue (B) components.

Let:

  • z = (z_R, z_G, z_B) represent the RGB values of a pixel in the image.
  • a = (a_R, a_G, a_B) represent the RGB values of the average color for the region we wish to segment.

To determine how similar the pixel color z is to the average color a, we calculate the Euclidean distance between them, denoted by D(z, a), as:

D(\mathbf{z}, \mathbf{a}) = \|\mathbf{z} - \mathbf{a}\| = \sqrt{(z_R - a_R)^2 + (z_G - a_G)^2 + (z_B - a_B)^2}

This distance measures how far the pixel color is from the average color. If the distance is below a pre-defined threshold D_0, the pixel is considered to belong to the target region. Otherwise, it is classified as background or part of another region.

Segmentation Criterion:

The locus of points such that the Euclidean distance D(z, a) ≤ D_0 forms a sphere of radius D_0 in the RGB color space. Points (pixels) inside this sphere are considered similar to the average color, while points outside are not.

The binary segmentation mask can be created by coding points inside the sphere as “foreground” (e.g., white) and points outside as “background” (e.g., black).
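
A minimal sketch of this criterion with NumPy; the average color a, the threshold D_0, and the filename are illustrative assumptions rather than values from the text.

```python
import cv2
import numpy as np

# Load the image (placeholder filename) and work in RGB order.
img = cv2.cvtColor(cv2.imread("fruits.jpg"), cv2.COLOR_BGR2RGB).astype(np.float64)

a = np.array([200.0, 30.0, 50.0])   # illustrative average color of the target region
D0 = 60.0                           # illustrative distance threshold

# Euclidean distance of every pixel from the average color (vectorized over the image).
dist = np.linalg.norm(img - a, axis=2)

# Binary mask: white (255) inside the sphere of radius D0, black (0) outside.
mask = np.where(dist <= D0, 255, 0).astype(np.uint8)
```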

Generalized Distance Measure:

In more complex cases, we can generalize this distance formula to account for varying degrees of variance in different directions of the color space. This leads to the Mahalanobis distance, which takes into account the covariance matrix C of the color samples.

The generalized distance formula is given by:

D(\mathbf{z}, \mathbf{a}) = \sqrt{ (\mathbf{z} - \mathbf{a})^\top \mathbf{C}^{-1} (\mathbf{z} - \mathbf{a}) }

Where:

  • C is the covariance matrix of the RGB values of the sample color points (used to describe the distribution of the color data).
  • C^{-1} is the inverse of the covariance matrix.

The locus of points where this distance is less than or equal to a threshold D_0 forms an ellipsoid in RGB space rather than a sphere. The shape of the ellipsoid reflects the variance of the color data in each direction (Red, Green, and Blue). Points inside the ellipsoid are considered similar to the average color.
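
A sketch of the generalized distance, assuming a set of reference pixels sampled from the region of interest; the samples below are synthetic stand-ins, not real data.

```python
import numpy as np

# Synthetic stand-in for real reference pixels sampled from the region of interest.
rng = np.random.default_rng(0)
samples = rng.normal(loc=[200, 30, 50], scale=[15, 10, 10], size=(500, 3))

a = samples.mean(axis=0)              # average color of the samples
C = np.cov(samples, rowvar=False)     # 3x3 covariance matrix of the samples
C_inv = np.linalg.inv(C)

def generalized_distance(z: np.ndarray) -> float:
    """Mahalanobis-style distance of a pixel color z from the average color a."""
    d = z - a
    return float(np.sqrt(d @ C_inv @ d))

# A pixel belongs to the region if generalized_distance(z) <= D0 for a chosen threshold D0.
```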

Special Case: Identity Matrix

If the covariance matrix C is the identity matrix (i.e., C = I), the generalized distance formula simplifies back to the standard Euclidean distance, as multiplying by the identity does not change the geometry.

Bounding Box Approximation:

To reduce computational complexity, instead of using ellipsoids or spheres, we can use a bounding box around the region of interest. The bounding box is an axis-aligned box in RGB space, where:

  • The box is centered on the average color a.
  • The dimensions of the box are proportional to the standard deviation of the RGB values along each axis.

For example, for the red axis, the width of the box would be based on the standard deviation of the red values from the color samples. This approach avoids computing distances and simplifies the segmentation task.

Practical Implementation:

  • For every pixel z, we check whether it lies within the bounding box (see the sketch after this list), which is much simpler than checking whether it is inside an ellipsoid or computing square roots as required by the Euclidean distance.
  • This method provides a faster, but potentially less accurate, way to perform color-based segmentation, especially for real-time applications.
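
A sketch of the bounding-box test with NumPy; the reference samples and the box half-width of 1.5 standard deviations are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in for reference pixels of the color of interest (RGB, float).
rng = np.random.default_rng(0)
samples = rng.normal(loc=[200, 30, 50], scale=[15, 10, 10], size=(500, 3))

a = samples.mean(axis=0)       # box center: the average color
sigma = samples.std(axis=0)    # per-channel standard deviation
k = 1.5                        # box half-width in standard deviations (illustrative)
lower, upper = a - k * sigma, a + k * sigma

def in_box(image: np.ndarray) -> np.ndarray:
    """Binary mask: True where all three channels of a pixel lie inside the box."""
    return np.all((image >= lower) & (image <= upper), axis=2)
```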

Figure: Segmentation in RGB space. (a) Original image with colors of interest shown enclosed by a rectangle. (b) Result of segmentation in RGB vector space.

Color Edge Detection

This section covers color edge detection and the mathematical formulation needed to process the gradient of color images in the RGB color space. The key points and concepts are broken down below.

Edge detection is critical for segmenting images based on the transitions between regions with different properties. In this section, the focus is on detecting edges specifically in color images by considering the gradients of the RGB channels.

Problems with Scalar Gradient:

Previously, gradient operators were introduced for grayscale images, where the gradient is defined for a scalar function. In color images, however, the RGB components form a vector-valued function. Applying gradient operators to each color channel individually and combining the results can lead to errors, so a new definition of the gradient is required when working with vector quantities such as colors in RGB space.

Gradient in RGB Vector Space

Let the vectors r, g, and b represent unit vectors along the R (red), G (green), and B (blue) axes in the RGB color space. To compute the gradient, we define two new vectors that represent the partial derivatives of the image intensities:

  • Vector u: This represents the partial derivatives of R, G, and B with respect to x:

    \mathbf{u} = \frac{\partial R}{\partial x} \mathbf{r} + \frac{\partial G}{\partial x} \mathbf{g} + \frac{\partial B}{\partial x} \mathbf{b}

  • Vector v: This represents the partial derivatives of R, G, and B with respect to y:

    \mathbf{v} = \frac{\partial R}{\partial y} \mathbf{r} + \frac{\partial G}{\partial y} \mathbf{g} + \frac{\partial B}{\partial y} \mathbf{b}

These vectors capture how the color intensities change across the image in the x- and y-directions, respectively.

Computation of Second-Order Quantities

Next, we compute quantities that represent the changes in the vectors u\mathbf{u} and v\mathbf{v}, which correspond to the gradients of the color components. These are defined as dot products:

  • g_xx: Represents the rate of change of the image with respect to x:

    g_{xx} = \mathbf{u} \cdot \mathbf{u} = \left( \frac{\partial R}{\partial x} \right)^2 + \left( \frac{\partial G}{\partial x} \right)^2 + \left( \frac{\partial B}{\partial x} \right)^2

  • g_yy: Represents the rate of change of the image with respect to y:

    g_{yy} = \mathbf{v} \cdot \mathbf{v} = \left( \frac{\partial R}{\partial y} \right)^2 + \left( \frac{\partial G}{\partial y} \right)^2 + \left( \frac{\partial B}{\partial y} \right)^2

  • g_xy: Represents the mixed term that combines changes in both x and y:

    g_{xy} = \mathbf{u} \cdot \mathbf{v} = \frac{\partial R}{\partial x} \frac{\partial R}{\partial y} + \frac{\partial G}{\partial x} \frac{\partial G}{\partial y} + \frac{\partial B}{\partial x} \frac{\partial B}{\partial y}

Direction of Maximum Rate of Change

The goal of edge detection is to identify areas where there is a sharp change in image intensity, which corresponds to the maximum rate of change in the image. The direction θ(x, y) of the maximum rate of change at any point (x, y) is given by:

\theta(x, y) = \frac{1}{2} \tan^{-1} \left( \frac{2 g_{xy}}{g_{xx} - g_{yy}} \right)

This formula calculates the angle where the maximum change in image intensity occurs.

Magnitude of the Maximum Rate of Change

Once the direction θ(x, y) is computed, the value of the maximum rate of change at a point (x, y), denoted F_θ(x, y), is given by:

F_\theta(x, y) = \left[ \frac{1}{2} \left( (g_{xx} + g_{yy}) + (g_{xx} - g_{yy}) \cos 2\theta(x, y) + 2 g_{xy} \sin 2\theta(x, y) \right) \right]^{1/2}

This equation combines the second-order derivatives to compute the magnitude of the change in image intensity, which is useful for detecting edges.

Interpretation and Application

This method extends the concept of the gradient from scalar functions (grayscale images) to vector functions (color images), allowing the detection of edges in color images. The technique described by Di Zenzo (1986) focuses on calculating the direction and magnitude of the gradient for color images, providing a more accurate representation of edges than traditional methods.

The full derivation of these formulas is fairly involved; Di Zenzo (1986) is the recommended reference for a deeper treatment. The partial derivatives required for this computation can be obtained using well-known edge detection kernels, such as the Sobel operators.
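
A compact sketch of this computation using OpenCV Sobel derivatives; the filename and kernel size are illustrative, and np.arctan2 is used in place of a plain arctangent to avoid division by zero (equivalent to the formula in the text up to the usual angle conventions).

```python
import cv2
import numpy as np

# Load the image (placeholder filename) as float for derivative computations.
img = cv2.imread("fruits.jpg").astype(np.float64)

# Per-channel partial derivatives via Sobel operators (3x3 kernels, illustrative choice).
dx = [cv2.Sobel(img[:, :, c], cv2.CV_64F, 1, 0, ksize=3) for c in range(3)]
dy = [cv2.Sobel(img[:, :, c], cv2.CV_64F, 0, 1, ksize=3) for c in range(3)]

# Second-order quantities g_xx, g_yy, g_xy summed over the three channels.
gxx = sum(d * d for d in dx)
gyy = sum(d * d for d in dy)
gxy = sum(dxi * dyi for dxi, dyi in zip(dx, dy))

# Direction of maximum rate of change and its magnitude F_theta at every pixel.
theta = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)
F = np.sqrt(np.maximum(
    0.5 * ((gxx + gyy) + (gxx - gyy) * np.cos(2 * theta) + 2 * gxy * np.sin(2 * theta)),
    0.0))
```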

References

  • Gonzalez, R.C., & Woods, R.E. (2018). Digital Image Processing. Pearson.
  • Di Zenzo, S. (1986). A note on the gradient of a multi-image. Computer Vision, Graphics, and Image Processing, 33(1), 116-125.
  • Jain, A.K., & Dubes, R.C. (1988). Algorithms for Clustering Data. Prentice-Hall.
