Object Recognition

Object recognition is a crucial technology in the field of computer vision and artificial intelligence, enabling machines to identify and classify various objects within digital images or video feeds. By mimicking the human ability to recognize and categorize objects, object recognition has paved the way for advancements across industries, from autonomous driving and facial recognition to medical imaging and industrial automation. With recent strides in machine learning and deep learning, particularly with convolutional neural networks (CNNs), the accuracy and efficiency of object recognition systems have dramatically improved.

Patterns and Pattern Classes

[Figure: A tree description of an image]

In pattern recognition, understanding “patterns” and “pattern classes” is fundamental. Here’s a structured breakdown of the mathematical concepts and examples discussed.

1. Patterns and Pattern Classes

  • Pattern: An arrangement of descriptors (or features) used to represent an object or entity.
  • Descriptor: An attribute or characteristic of an object, often called a “feature” in pattern recognition literature.
  • Pattern Class: A family of patterns sharing common properties. Each pattern class is denoted by $v_i$, where $i$ is the class number.

In machine-based pattern recognition, the goal is to assign patterns to their respective classes with minimal human intervention. This involves various techniques and mathematical tools.

2. Common Pattern Arrangements

Three common forms of patterns are vectors, strings, and trees, each suited to different types of descriptions:

a. Pattern Vectors (Quantitative Descriptions)

  • Represented by bold lowercase letters (e.g., x, y, z), pattern vectors are used for quantitative features.

  • A pattern vector is a column matrix:

    $$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

    where each $x_i$ is a descriptor and $n$ is the total number of descriptors. This vector can also be written as:

    $$\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$$

    where $T$ denotes transposition.

Example: Iris Classification (Fisher, 1936)
  • Each iris flower is described by two measurements: petal length and width, forming a 2D pattern vector:

    $$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

    Here, $x_1$ is petal length and $x_2$ is petal width. The pattern classes $v_1$, $v_2$, and $v_3$ represent the varieties “setosa,” “virginica,” and “versicolor,” respectively.

  • By plotting these vectors in a 2D Euclidean space, each flower is represented as a point, allowing us to distinguish classes based on the separation of points.
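
The snippet below is a minimal Python sketch of such 2D pattern vectors (using NumPy); the petal measurements are illustrative placeholder values, not Fisher's original data.

```python
import numpy as np

# Minimal sketch: each flower is a 2D pattern vector (petal length, petal width).
# The measurement values below are illustrative, not Fisher's actual data.
setosa     = np.array([[1.4, 0.2], [1.5, 0.2], [1.3, 0.3]])   # class v1
versicolor = np.array([[4.5, 1.5], [4.1, 1.3], [4.7, 1.4]])   # class v3

# Each row is a pattern vector x = (x1, x2)^T; plotting the rows as points
# in 2D Euclidean space shows how well the classes separate.
for x in setosa:
    print("setosa pattern vector:", x)
```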

b. Pattern Vectors in Higher Dimensions

In some cases, descriptors are sampled at specific intervals, creating high-dimensional vectors. For example:

$$x_1 = r(u_1),\; x_2 = r(u_2),\; \ldots,\; x_n = r(u_n)$$

Here, each $u_i$ is a sampled point, and $r(u_i)$ is the amplitude at that point. These vectors form points in an $n$-dimensional Euclidean space, where the patterns of each class appear as “clouds” of points.

c. Structural Descriptions (Strings and Trees)

For certain applications, structural descriptions are more effective than quantitative measures.

i. String Descriptions

Patterns with repeated structures can be represented as strings. For example, a staircase pattern may consist of alternating primitive elements, such as “A” and “B”. A staircase can then be represented by the string:

$$A\,B\,A\,B\,A\,\ldots$$

The order of these elements captures the pattern structure.

ii. Tree Descriptions

Tree structures represent hierarchical relationships, as seen in spatial and image analysis. For example, in a satellite image of a city, we could use the following hierarchy:

  • Root node: Entire image
  • Next level: Distinguishes “downtown” and “residential areas”
  • Further levels: Separate “housing,” “highways,” and “shopping malls” within residential areas

This hierarchical representation allows for detailed structural analysis.
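
As a rough illustration, the hierarchy above can be modeled as a nested dictionary; the Python sketch below is only a toy data structure whose node names mirror the example.

```python
# Minimal sketch of the hierarchical (tree) description from the text,
# modeled as a nested dictionary. Node names mirror the example above.
image_tree = {
    "image": {
        "downtown": {},
        "residential": {"housing": {}, "highways": {}, "shopping malls": {}},
    }
}

def depth(node):
    """Return the depth of the tree (the root counts as level 1)."""
    if not node:
        return 0
    return 1 + max(depth(child) for child in node.values())

print(depth(image_tree))  # -> 3 (image -> residential -> housing, etc.)
```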

3. Pattern Recognition Techniques

Selecting descriptors impacts the success of pattern recognition. Examples include:

  • Quantitative Techniques: These use measurements to create pattern vectors, as seen in iris classification.
  • Structural Techniques: These analyze spatial relationships, like fingerprint minutiae (e.g., ridge endings and branch points), which are used in fingerprint recognition.

Example Problem: Recognition of Shapes with Noise

Suppose we want to classify noisy shapes, each represented by a 1D signature. By sampling the signature at intervals, we obtain a pattern vector in $n$-dimensional space. Alternatively, we could compute statistical features, such as moments, for each signature and use these in our pattern vector.
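
A minimal Python sketch of both options is shown below; the noisy sine-like signature, the choice of 16 samples, and the three central moments are illustrative assumptions, not part of the original problem statement.

```python
import numpy as np

# Minimal sketch: turn a noisy 1D signature r(u) into a pattern vector,
# either by sampling it at n points or by computing a few statistical moments.
# The signature below (a noisy sine) is purely illustrative.
u = np.linspace(0, 2 * np.pi, 360)
r = 1.0 + 0.3 * np.sin(3 * u) + 0.05 * np.random.randn(u.size)

# Option 1: sample the signature at n roughly evenly spaced points.
n = 16
x_sampled = r[:: u.size // n][:n]          # pattern vector in n-dimensional space

# Option 2: use low-order central moments of the signature as features.
mean = r.mean()
x_moments = np.array([mean,
                      np.mean((r - mean) ** 2),   # 2nd central moment (variance)
                      np.mean((r - mean) ** 3)])  # 3rd central moment
print(x_sampled.shape, x_moments)
```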

Recognition Based on Decision-Theoretic Methods

Recognition based on decision-theoretic methods is a fundamental approach in pattern recognition that leverages statistical probability and optimization techniques to classify patterns. These methods rely on decision functions that use feature vectors, distances, and probability distributions to assign patterns to classes, with a primary goal of minimizing misclassification errors.

Here, we expand on the mathematical concepts behind these methods, including the Minimum Distance Classifier, the Bayes Classifier, and Correlation-Based Matching, along with examples that illustrate these principles in action.

1. Decision-Theoretic Framework

In decision-theoretic pattern recognition, we use a decision function $d_i(x)$ for each class $v_i$, where $x = (x_1, x_2, \ldots, x_n)^T$ is an $n$-dimensional feature vector. A pattern is assigned to the class with the highest value of $d_i(x)$, that is:

$$\text{Assign } x \text{ to class } v_i \text{ if } d_i(x) > d_j(x) \text{ for all } j \neq i$$

The set of points $x$ where $d_i(x) = d_j(x)$ forms the decision boundary separating class $v_i$ from class $v_j$. Decision boundaries play a critical role in defining the regions of the feature space assigned to each class.
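
The sketch below illustrates this decision rule in Python with two arbitrary placeholder decision functions; it only demonstrates the argmax mechanism, not any particular classifier.

```python
# Minimal sketch of the decision rule: evaluate each class's decision function
# d_i(x) and assign x to the class with the largest value. The decision
# functions here are arbitrary placeholders chosen only to illustrate the rule.
decision_functions = {
    "v1": lambda x: -abs(x - 1.0),   # d_1 peaks near x = 1
    "v2": lambda x: -abs(x - 5.0),   # d_2 peaks near x = 5
}

def assign(x):
    """Return the class whose decision function is largest at x."""
    return max(decision_functions, key=lambda c: decision_functions[c](x))

print(assign(2.0))  # -> 'v1'
```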

2. Minimum Distance Classifier

The Minimum Distance Classifier is a straightforward classifier that assigns a pattern vector $x$ to the class whose prototype vector (often the mean vector) is closest to $x$ in Euclidean distance. This method assumes that each class $v_i$ can be represented by a prototype or mean vector $m_i$ defined as:

$$m_i = \frac{1}{N_i} \sum_{x \in v_i} x$$

where $N_i$ is the number of training samples in class $v_i$.

Euclidean Distance Calculation

For a new pattern $x$, we compute the Euclidean distance $D_j(x)$ from $x$ to each class prototype $m_j$ as:

$$D_j(x) = \| x - m_j \| = \sqrt{\sum_{k=1}^{n} (x_k - m_{j,k})^2}$$

where $m_{j,k}$ denotes the $k$-th component of the mean vector $m_j$.

Classification Rule

The pattern $x$ is assigned to the class with the smallest $D_j(x)$:

$$\text{Assign } x \text{ to class } v_i \text{ if } D_i(x) < D_j(x) \text{ for all } j \neq i$$

Example: Suppose we classify two types of iris flowers based on petal length and width. The mean vectors for each class are calculated, and the classifier computes the Euclidean distance between a new flower’s feature vector and each class mean. The class with the smallest distance is chosen as the classification for the new flower.
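
A minimal Python sketch of this classifier is given below; the training samples are illustrative placeholder measurements, and the class means and Euclidean distances follow the formulas above.

```python
import numpy as np

# Minimal sketch of a minimum distance classifier. The iris-style training
# data below is illustrative; in practice the mean vectors are computed
# from real training samples for each class.
classes = {
    "setosa":     np.array([[1.4, 0.2], [1.5, 0.2], [1.3, 0.3]]),
    "versicolor": np.array([[4.5, 1.5], [4.1, 1.3], [4.7, 1.4]]),
}

# Prototype (mean) vector m_i for each class.
means = {name: samples.mean(axis=0) for name, samples in classes.items()}

def classify(x):
    """Assign x to the class whose mean vector is closest in Euclidean distance."""
    return min(means, key=lambda name: np.linalg.norm(x - means[name]))

print(classify(np.array([1.6, 0.25])))  # -> 'setosa'
```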

3. Bayes Classifier

The Bayes Classifier is an optimal statistical classifier that minimizes the average probability of misclassification by assigning patterns based on posterior probabilities. It uses Bayes’ theorem to compute the posterior probability $P(v_i \mid x)$, the probability that $x$ belongs to class $v_i$ given the observed data.

Bayes Decision Rule

A pattern $x$ is assigned to the class $v_i$ if:

$$P(v_i \mid x) > P(v_j \mid x) \quad \text{for all } j \neq i$$

Using Bayes’ theorem, the posterior probability $P(v_i \mid x)$ can be rewritten as:

$$P(v_i \mid x) = \frac{P(x \mid v_i)\, P(v_i)}{P(x)}$$

where:

  • $P(x \mid v_i)$: Likelihood of observing $x$ given that it belongs to class $v_i$.
  • $P(v_i)$: Prior probability of class $v_i$.
  • $P(x)$: Marginal probability of observing $x$ across all classes, acting as a normalizing constant.

The decision rule simplifies to:

$$\text{Assign } x \text{ to class } v_i \text{ if } P(x \mid v_i)\, P(v_i) > P(x \mid v_j)\, P(v_j) \text{ for all } j \neq i$$

Zero-One Loss Function

A common choice in Bayes classification is the 0-1 loss function, where the loss is 0 for a correct classification and 1 for an incorrect one. The Bayes classifier then minimizes the probability of misclassification by assigning $x$ to the class with the highest posterior probability.

Gaussian Bayes Classifier

If the probability densities $P(x \mid v_i)$ follow a multivariate Gaussian distribution, the likelihood of $x$ given class $v_i$, with mean vector $m_i$ and covariance matrix $C_i$, is:

$$P(x \mid v_i) = \frac{1}{(2\pi)^{n/2} |C_i|^{1/2}} \exp\!\left( -\frac{1}{2} (x - m_i)^T C_i^{-1} (x - m_i) \right)$$

where $|C_i|$ is the determinant of $C_i$.

The Bayes decision function for each class then becomes:

$$d_i(x) = -\frac{1}{2} (x - m_i)^T C_i^{-1} (x - m_i) - \frac{1}{2} \ln |C_i| + \ln P(v_i)$$

The classifier assigns $x$ to the class with the highest $d_i(x)$.

Example: In remote sensing, multispectral data from different spectral bands can be used to classify land cover types. Each pixel is represented as a vector, and the Bayes classifier assigns each pixel to a land cover class based on estimated Gaussian densities for each class.
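
Below is a minimal Python sketch of the Gaussian Bayes decision function given above; the class means, covariance matrices, and priors are illustrative placeholders rather than estimates from real multispectral data.

```python
import numpy as np

# Minimal sketch of a Gaussian Bayes classifier using the decision function
# d_i(x) = -1/2 (x - m_i)^T C_i^{-1} (x - m_i) - 1/2 ln|C_i| + ln P(v_i).
# The means, covariances, and priors below are illustrative placeholders;
# in practice they are estimated from training samples for each class.
params = {
    "water":      (np.array([20.0, 15.0]), np.array([[4.0, 1.0], [1.0, 3.0]]), 0.3),
    "vegetation": (np.array([60.0, 80.0]), np.array([[9.0, 2.0], [2.0, 8.0]]), 0.7),
}

def decision(x, mean, cov, prior):
    """Evaluate the Gaussian Bayes decision function d_i(x)."""
    diff = x - mean
    maha = diff @ np.linalg.inv(cov) @ diff
    return -0.5 * maha - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior)

def classify(x):
    """Assign x to the class with the highest decision function value."""
    return max(params, key=lambda c: decision(x, *params[c]))

print(classify(np.array([22.0, 18.0])))  # -> 'water'
```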

4. Correlation-Based Matching

Correlation-based matching is used when patterns are matched with predefined templates, especially in applications involving images or shapes. This method compares patterns based on similarity rather than probability, using a correlation coefficient as a measure of similarity.

Normalized Correlation Coefficient

The normalized correlation coefficient between a pattern $x$ and a template $y$ is given by:

$$r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$, respectively. The coefficient $r(x, y)$ ranges from $-1$ to $1$:

  • $r(x, y) = 1$ indicates a perfect positive correlation.
  • $r(x, y) = -1$ indicates a perfect negative correlation.
  • $r(x, y) = 0$ indicates no correlation.

Template Matching Rule

Assign $x$ to the class represented by template $y_i$ if $r(x, y_i)$ is the largest correlation among all templates:

$$\text{Assign } x \text{ to class } v_i \text{ if } r(x, y_i) > r(x, y_j) \text{ for all } j \neq i$$

Example: In satellite image processing, a correlation-based matcher can identify a specific feature within an image (such as the eye of a hurricane) by finding the region with the highest correlation with a template of the feature.
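
A minimal Python sketch of template matching with the normalized correlation coefficient is given below; the templates and the pattern are illustrative one-dimensional vectors rather than image patches.

```python
import numpy as np

# Minimal sketch of correlation-based matching: compute the normalized
# correlation coefficient r(x, y) between a pattern and each template, and
# pick the template with the largest r. The vectors below are illustrative.
def corr(x, y):
    """Normalized correlation coefficient in [-1, 1]."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / (np.sqrt(np.sum(xc ** 2)) * np.sqrt(np.sum(yc ** 2)))

templates = {
    "ramp":  np.array([0.0, 1.0, 2.0, 3.0, 4.0]),
    "pulse": np.array([0.0, 0.0, 4.0, 0.0, 0.0]),
}
pattern = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

best = max(templates, key=lambda name: corr(pattern, templates[name]))
print(best)  # -> 'ramp'
```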

Practical Applications of Decision-Theoretic Methods

Example 1: Character Recognition Using the Minimum Distance Classifier

In automated reading of bank checks, characters are represented by feature vectors based on pixel densities or boundary points. The minimum distance classifier compares a new character’s vector with the prototype vectors of each character class, assigning it to the class with the smallest distance.

Example 2: Land Cover Classification Using the Bayes Classifier

In remote sensing, each pixel of a multispectral image is classified as water, urban, or vegetation using the Bayes classifier. This classifier uses prior knowledge of land cover probabilities and multispectral data from sample regions to compute mean vectors and covariance matrices for each class.

Decision-theoretic methods provide a robust framework for pattern recognition, using probabilistic and mathematical criteria to minimize misclassification risks. The choice of a method depends on the nature of the data, the degree of overlap between classes, and the computational requirements of the application.

Structural Methods

In pattern recognition, structural methods leverage the structural relationships of a pattern’s shape rather than relying solely on numerical representations. Unlike quantitative approaches, structural techniques consider the inherent structure of patterns, making them particularly effective for boundary shape recognition. A commonly used method within structural pattern recognition involves string representations of shapes, which enables the classification of complex shapes through symbolic descriptions.

In this discussion, we cover two structural methods for boundary shape recognition based on string representations: Matching Shape Numbers and String Matching.

Matching Shape Numbers

In many cases, shapes can be represented by numbers based on their boundaries, known as shape numbers. This method is similar to the minimum distance concept introduced for pattern vectors, but here, it applies specifically to the comparison of boundaries using shape numbers. The idea is to compare shapes based on how similar their boundary representations are.

  1. Shape Representation with Shape Numbers: A shape number provides a numerical description of a closed boundary, often encoded using a 4-directional chain code. Each segment of the boundary is assigned a numerical value depending on its direction relative to the previous segment. For example, a rightward movement might be coded as 0, an upward movement as 1, a leftward movement as 2, and a downward movement as 3.

  2. Defining Similarity (Degree of Similarity): The degree of similarity between two shapes $a$ and $b$ is determined by comparing their shape numbers at successive orders for as long as they coincide. If $s_j(a)$ and $s_j(b)$ denote the shape numbers of shapes $a$ and $b$ at the $j$-th order, the shapes are similar up to order $k$ if:

    $$s_j(a) = s_j(b) \quad \text{for} \quad j = 4, 6, 8, \ldots, k$$

    In this case, the degree of similarity is the largest value of $k$ for which the shape numbers coincide.

  3. Defining Distance (Dissimilarity Measure): The distance $D(a, b)$ between two shapes $a$ and $b$ is the inverse of their degree of similarity:

    $$D(a, b) = \frac{1}{k}$$

    This distance measure satisfies several properties:

    • Non-negativity: $D(a, b) \geq 0$
    • Identity of Indiscernibles: $D(a, b) = 0$ if and only if $a = b$
    • Symmetry: $D(a, b) = D(b, a)$

    Example: Suppose we have two shapes, each represented by a 4-directional chain code sequence, whose shape numbers coincide up to the 6th order. In this case, the degree of similarity is $k = 6$ and the distance is $D(a, b) = \frac{1}{6}$. If another pair of shapes matches only up to the 4th order, their distance is $D(a, b) = \frac{1}{4}$, indicating that they are less similar than the previous pair.

    A larger degree of similarity (higher $k$) implies that the shapes are more alike, whereas a larger distance $D$ means they are less similar. A minimal code sketch of this comparison follows the list.
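
Below is the minimal sketch referenced above; it assumes the shape numbers of each shape have already been computed at successive even orders and stored by order, and the digit strings are illustrative values rather than shape numbers derived from real boundaries.

```python
# Minimal sketch of the degree-of-similarity computation. It assumes the
# shape numbers of each shape are precomputed at successive even orders and
# stored in a dict keyed by order; the digit strings below are illustrative
# chain-code-based values, not shape numbers of real shapes.
shape_a = {4: "0321", 6: "003221", 8: "00332211"}
shape_b = {4: "0321", 6: "003221", 8: "00302211"}

def degree_of_similarity(a, b, orders=(4, 6, 8)):
    """Largest order k up to which the shape numbers of a and b coincide."""
    k = 0
    for j in orders:
        if a.get(j) != b.get(j):
            break
        k = j
    return k

k = degree_of_similarity(shape_a, shape_b)
print(k, 1.0 / k if k else float("inf"))  # degree of similarity and distance D = 1/k
```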

String Matching

String matching is another approach used for comparing shapes based on boundary representations encoded as strings. This method allows for matching and alignment of boundaries represented as strings, where each boundary point is translated into a symbol, creating a sequential symbolic representation of the boundary.

  1. String Representation: Suppose we have two region boundaries, $a$ and $b$, represented by strings $S_a$ and $S_b$. A boundary is encoded into a string by assigning each boundary segment a symbol (such as a chain code), resulting in strings that can be compared symbol by symbol.

  2. Counting Matches and Mismatches: Let $M$ be the number of matching symbols between $S_a$ and $S_b$, where a match occurs when the symbols at a given position in both strings are the same. The number of mismatches is then:

    $$b = \max(|S_a|, |S_b|) - M$$

    where $|S_a|$ and $|S_b|$ denote the lengths of the strings $S_a$ and $S_b$. If $M = \max(|S_a|, |S_b|)$, then $b = 0$, which implies that the two strings are identical.

  3. Similarity Measure (Ratio of Matches): A similarity measure between the two strings can be defined as:

    $$R = \frac{M}{\max(|S_a|, |S_b|)}$$

    where $R$ approaches 1 for a perfect match and 0 when there are no matching symbols. This measure is maximized by aligning the strings so as to maximize $M$.

    Example: Suppose we compare two strings $S_a =$ “0123” and $S_b =$ “0121” of the same length (4). Comparing the symbols at each position, we find that three positions match. Therefore, $M = 3$ and:

    $$R = \frac{3}{4} = 0.75$$

    indicating a high degree of similarity between the two shapes.

  4. Normalization and Optimal Alignment: To achieve the best comparison, it may be necessary to shift one of the strings circularly (wrapping around the end) and compute $R$ for each shift. The highest value of $R$ across all shifts indicates the best alignment and the closest match between the two strings, as in the sketch below.
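
The sketch below implements this comparison in Python; the chain-code strings are the same illustrative values used in the example above.

```python
# Minimal sketch of string matching between two boundary strings. M counts
# positions with identical symbols, R = M / max(|Sa|, |Sb|), and one string
# is shifted circularly to find the best alignment. The example strings are
# the illustrative chain-code sequences from the text.
def matches(sa, sb):
    """Number of positions where the two strings carry the same symbol."""
    return sum(1 for x, y in zip(sa, sb) if x == y)

def best_similarity(sa, sb):
    """Maximum R over all circular shifts of sb."""
    n = max(len(sa), len(sb))
    best = 0.0
    for shift in range(len(sb)):
        shifted = sb[shift:] + sb[:shift]
        best = max(best, matches(sa, shifted) / n)
    return best

print(best_similarity("0123", "0121"))  # -> 0.75
```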

