55:148 Digital Image Processing
55:247 Image Analysis and Understanding
Chapter 8, Part III
Image understanding: Point distribution
models
Point distribution models
- The Point Distribution Model (PDM) is a powerful shape description
technique that may subsequently be used in locating new instances of such
shapes in other images.
- It is most useful for describing features that have well understood
`general' shape, but which cannot be easily described by a rigid model.
- The PDM is a relatively recent development that has seen enormous
application in a short time.
- The PDM approach assumes the existence of a set of M examples (a training
set) from which to derive a statistical description of the shape and its
variation.
- In our context, we take this to mean some number of instances of the shape
represented by a boundary (a sequence of pixel co-ordinates).
- In addition, some number N of landmark points is selected on each
boundary; these points are chosen to correspond to a feature of the underlying
object.
- It is intuitively clear that if the hands so represented were in `about
the same place', so would the N landmark points be.
- Variations in the positions of these points would then be attributable to
natural variation between individuals.
- We may expect, though, that these differences would be `small' measured on
the scale of the overall shape.
- The PDM approach allows us to model these `small' differences (and,
indeed, to identify which are truly small, and which are more significant).
- Aligning the training data
- It is necessary first to align all the training shapes in an approximate
sense.
- This is done by selecting for each example a suitable translation, scaling
and rotation to ensure that they all correspond as closely as possible -
informally, the transformations are chosen to reduce (in a least squares
sense) the difference between an aligned shape and a `mean' shape derived from
the whole set.
- Specifically, suppose we wish to align just two shapes - each of these is
described by a vector of N co-ordinate pairs;
- A transform is composed of translation, rotation, and scaling represented
by matrix R applied to x2
- The best transform can be found by minimizing
- This minimization is a routine application of a least squares approach -
partial derivatives of E are calculated with respect to the unknowns (theta,
s, tx and ty), and set to zero, leaving simultaneous
linear equations to solve.
- This general idea is used to co-align all M shapes using the following
algorithm;
- Step 3 of this algorithm is necessary since otherwise it is
ill-conditioned (underconstrained); without doing this, convergence would not
occur. Final convergence may be tested by examining the differences involved
in realigning the shapes to the mean.
- This approach assumes that each of the landmark points is of equal
significance, but that may not be the case. If for some reason one of them
moves around the shape less than others, it has a desirable stability that we
might wish to exploit during the alignment. This can be done by introducing a
(diagonal) weight matrix W into equation 8.10
- The elements of W indicate the relative `stability' of each of the
landmarks in which a high number indicates high stability (so counts for more
in the error computation), and a low number the opposite.
- There are various ways of measuring this; one is to compute for each shape
the distance between landmarks k and l, and to let Vkl be the
variance in these distances.
- A high variance would indicate high mobility, and so setting the weight
for the k-th point to
-
- would have the desired weighting effect.
- Knowledge of this mean allows explicit measurement of the variation and
co-variation exhibited by each landmark and landmark pair; we can write
- Doing this for each training vector, we can calculate the 2N x 2N
covariance matrix
- This matrix has some particularly useful properties.
- If we imagine the aligned training set plotted in 2N dimensions, it will
exhibit variation more in some directions than others (these directions will
not, of course, in general align with the co-ordinate axes) - these variations
are important properties of the shape we are describing.
- What these directions are, and their (relative) importance, may be derived
from an eigen-decomposition of S - that is, solving the equation
- Solutions to equation (8.12) provide the eigenvectors pi
and eigenvalues lambdai of S; conventionally, we assume
\lambdai >= \lambdai+1.
- It can be shown that the eigenvectors associated with larger eigenvalues
correspond to the directions of larger variation in the underlying data - they
provide the modes of variation.
- Thus solving the equation and finding the highest eigenvalues tells us
where the variation in the model is most likely to occur.
- It is well known that a set of eigenvectors provides a basis, meaning that
we can represent any vector x as a linear combination of the 2N different
pi. If we write
-
- then for any vector x, a vector b exists such that
- where the components of b indicate how much variation is exhibited with
respect to each of the eigenvectors.
- Using the observation that the eigenvectors of lower index describe most
of the changes in the training set, we may expect that the contributions from
p2N, p2N+1, ... will play a small role, thus
- will be good for sufficiently high t
- This permits a dimensional compression of the representation - if there is
a lot of structure in the data, t will be low (relative to 2N) and good shape
description will be possible very compactly by representing the shape as
bt rather than x.
- One approach to this is to calculate lambdatotal, the sum of
the lambdai, and choose t such that
- The choice of alpha here will govern how much of the variation seen in the
training set can be recaptured by the compacted model.
- Further, it can be shown that the variance of bi over the
training set will be the associated lambdai; accordingly, for `well
behaved' shapes we might expect
-
- - that is, most of the population is within 3 sigma of the mean.
- This allows us to generate, from knowledge of P and lambdai,
plausible shapes that are not part of the training set.
Example - Metacarpal analysis
- We can illustrate this theory with an example taken from automatic hand
X-ray analysis.
- The finger bones (metacarpals) have characteristic long, thin shape with
bulges near the ends - precise shape differs from individual to individual,
and as an individual ages.
- Scrutiny of bone shape is of great value in diagnosing bone aging
disorders and is widely used by pediatricians.
- From a collection of X-rays, 40 landmarks (so vectors are 80 dimensional)
were picked out by hand on a number (approximately 50) of segmented
metacarpals.
- Next figure illustrates (after alignment) the mean shape, together with
the actual positions of the landmark points from the entire data set.
- The covariance matrix and its eigenvectors associated with the
variation are extracted; the relative contribution of the most influential
components is illustrated in Table~\ref{tab.PDM}.
- From this we see that more than 95% of the shape variation is captured by
the first eight modes of variation.
- Next figure illustrates the effect of varying the first mode of the mean
shape by up to 2.5 sqrt{lambda_1}.
- This mode, which accounts for more than 60% of the variation seen in the
data, captures the (asymmetric) thickening and thining of bones (relative to
their length) which is an obvious characteristic of maturity.
- In this example, it is clear that 2.5 is an unlikely factor for
sqrt{lambda_1} since the resulting shapes are too extreme - thus we may expect
b_1 to be smaller in magnitude for this application.
- Next figure similarly illustrates extremes of the third mode.
- The shape change here is somewhat subtler; part of what is captured is a
bending (in banana fashion) of the bone.
- Both extremes have a plausible `bone-like' look about them.
Fitting models to data
- A strength of this approach is that it permits plausible shapes to be
fitted to new data.
- Given an image in which we wish to locate an instance of a modelled shape
(specifically, given an edge map of the image, so having information about
where boundaries are most likely to lie), we require to know
- the mean shape x
- the transformation matrix P_t
- the particular shape parameter vector b_t
- the particular pose (translation, rotation, scale)
- The mean shape and the transformation matrix are known from the model
construction
- The identification of b_t and the pose is an optimization problem
- locate the parameters that best fit the data at hand, subject to certain
constraints.
- These constraints would include the known limits on reasonable values
for the components of b_t, and might also include domain knowledge about
plausible positions for the object to constrain the pose.
- In the metacarpal example, this would include knowledge that a bone lies
within the hand silhouette, is aligned with the finger and is of a known
approximate size.
- This approach may be used successfully with a number of well known
optimization algorithms.
- It is likely, however, that convergence would be slow.
- An alternative, quicker approach is to use the PDM as the basis of an
Active Shape Model (ASM).
- Here, we iterate toward the best fit by examining an approximate fit,
locating improved positions for the landmark points, then recalculating pose
and parameters.
Fitting an active shape model (ASM)
- Step 2 assumes that a suitable target can be found, which may not always
be true.
- If there is none, the landmark can be left where it is, and the model
constraints will eventually pull it into a reasonable position.
- There is also the option of locating targets by more sophisticated means
than simple intensity gradient measurements.
- The algorithm is illustrated in the next figure.
- Note that the model locates the correct position despite the proximity of
strong boundaries that could distract it - this does not occur because the
shape of the boundary is tightly bound in.
Extensions
- In a short time, the literature on PDMs and ASMs has become very extensive
- the technique lends itself to a very wide range of problems, but has some
drawbacks.
- The placing of the landmark points for construction of the training
set is clearly very labor intensive, and in some application error-prone.
- Automatic placing of these points has been addressed.
- Efficiency of the approach has also been enhanced by the common idea of a
multi-resolution attack.
- Using a coarse-to-fine strategy can produce benefits in both quality of
final fit, and reduction of computational load
- As presented, the approach is strictly linear in the sense that control
points may only move along a straight line (albeit with respect to directions
of maximum variation);
- non-linear effects are produced by combining contributions from
different modes;
- aside from being imperfect, this results in a representation that is not
as compact as it might be if the non-linear aspects were explicitly modeled.
- This problem has been addressed in two ways;
- introduction of the Polynomial Regression PDM which assumes dependence
between the modes, with minor modes being polynomial combinations of major
ones, and
- extension of the linear model by permitting polar relationships between
modes, thereby efficiently capturing the ability of (parts of) objects to
rotate around one another.
Last Modified: April 1, 1997