Mathematical Sciences Department
Worcester Polytechnic Institute
Worcester, MA 01609
January 15, 1998
Politicians and advertisers are often accused of manipulating images. Many mathematicians are up to their elbows in image manipulation as well, but their work is of a very different sort. Rather than trying to put the right spin on an election issue or to increase sales, these scientists provide the concepts, tools, and algorithms of such modern marvels as electronic picture postcards beamed from Mars, critical courtroom evidence teased from an amateur's video tape, and non-invasive medical diagnoses using images of functioning organs.
Mathematics is a core discipline in many areas of imaging:
No image is perfect. Perhaps the photographer moved the camera slightly the moment the shutter opened. (Astronauts have trouble with this one. It's hard to hold still in zero gravity!) Or turbulence in the atmosphere between a ground-based telescope and a distant star blurred the image recorded by the telescope's photodetectors. Because of poor lighting and mediocre recording techniques, more mundane devices like bank surveillance cameras are cursed from the start with gritty, low-resolution images. In all of these settings, image restoration techniques can help tease out a clearer picture by separating the image from whatever noise, clutter, or blur has crept in.
Images that require restoration are stored digitally. Information such as color, hue, and intensity at each pixel (picture element---the dots on a computer monitor, for example) are represented by numeric values. The effect of noise and blurring is to corrupt those values. Pixel by pixel, postprocessing works to restore a recorded image by returning the values of color, hue, and so on at each pixel to some best estimate of the true value.
The mathematical challenge of image restoration is solving an inverse problem, finding the clear image that best explains the distorted image at hand, given some assumptions about the intervening processes of noise and blurring. Solving the inverse problem is difficult because many different original images can appear to be nearly the same after they are distorted.
Such problems are ill-posed because blurred images that are only very slightly different from one another could have come from radically different original images. The corresponding calculations are ill-conditioned because, like finding the coordinates of the point of intersection of two lines that are nearly parallel, the calculated answer---the best guess of the original image---is very sensitive to round-off errors in the calculation and to errors in the data that describe the problem. Even worse, corruption by random noise may have produced a distorted image that corresponds to no actual original image; that corrupted image is analogous to noisy data points that theoretically should lie on a line but don't. Image restoration must recover a plausible clear, original image without ever knowing the right answer, the "true" original image.
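The nearly-parallel-lines analogy can be made concrete in a few lines of Python; the numbers below are invented purely for illustration:

```python
import numpy as np

# Two nearly parallel lines: a tiny change in the data moves their
# intersection point dramatically (an ill-conditioned system).
A = np.array([[1.0, 1.0],
              [1.0, 1.001]])
b1 = np.array([2.0, 2.001])   # intersection at (1, 1)
b2 = np.array([2.0, 2.002])   # right-hand side perturbed by only 0.001

x1 = np.linalg.solve(A, b1)   # -> approximately (1, 1)
x2 = np.linalg.solve(A, b2)   # -> approximately (0, 2)

# A 0.001 change in the data moved the answer by a full unit; the
# condition number (here in the thousands) quantifies that sensitivity.
sensitivity = np.linalg.cond(A)
```

The same amplification of tiny data errors, scaled up to systems with one unknown per pixel, is what makes naive image restoration fall apart.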
The classic setting for solving the image restoration inverse problem is regularized least squares. The received image is modeled as the sum of noise and a so-called blurring operator acting on the true image. The action of the blurring operator and the added noise provide a model that connects the unknown clear image to the available data describing the distorted image. The clear image is chosen to be the one that minimizes the sum of the squares of the difference between its blurred version and the distorted image at hand. Just as with a least-squares fit of a line to a cloud of experimental data, this least squares process requires solving a (very large!) set of ill-conditioned linear equations.
The structure of the coefficient matrix of that set of equations is often closely related to the discrete Fourier transform. The Fourier transform captures information about spatial frequencies in the image, and the coefficients in this matrix circulate row by row among the frequencies sampled by the discrete transform. (Consequently, such matrices are called circulant.) Exploiting both the mathematical structure of these matrices and their connections with Fourier transforms is crucial to designing algorithms that solve image restoration problems quickly and robustly.
To help counter the excess sensitivity arising from the ill conditioning inherent in image restoration, a technique known as regularization imposes smoothness requirements on the restored image. Loosely, such requirements discourage quick changes in contrast and other image characteristics between adjoining portions of the image. Choosing the degree of smoothing requires a mixture of mathematical precision and computational experience: rigorous mathematical arguments can often prove that regularized least squares will converge to a solution (a restored image), but computational experience is needed to adjust the regularization parameters to produce the most useful images from the least computational effort. Many of the ideas of regularization were introduced in the middle of this century by the Russian mathematician A. N. Tikhonov.
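A minimal sketch of these ideas, assuming a 1-D signal, a periodic (circulant) blur, and a hand-picked regularization parameter---none of which come from the production algorithms the article describes: because circulant matrices are diagonalized by the discrete Fourier transform, Tikhonov-regularized least squares collapses to a pointwise formula in frequency.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 1-D "image" with a sharp feature, blurred by a periodic 5-point
# moving average plus noise -- a toy version of the restoration model.
n = 64
true = np.zeros(n); true[20:40] = 1.0
kernel = np.zeros(n); kernel[:5] = 1.0 / 5.0
blurred = np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(true)))
noisy = blurred + 0.01 * rng.standard_normal(n)

# Circulant matrices are diagonalized by the DFT, so the regularized
# least-squares solution is computed frequency by frequency:
#   X(f) = conj(K(f)) Y(f) / (|K(f)|^2 + alpha)
K = np.fft.fft(kernel)
Y = np.fft.fft(noisy)
alpha = 1e-3          # Tikhonov parameter, chosen by hand for this toy
restored = np.real(np.fft.ifft(np.conj(K) * Y / (np.abs(K) ** 2 + alpha)))
```

Setting alpha to zero would divide by near-zero values of K at some frequencies and amplify the noise wildly; the regularization term caps that amplification at the cost of some smoothing.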
Image restoration is a formidable computation because of the size of the blurring operator---its dimension is the number of pixels in the image (90,000 in just one square inch of laser printer output!)---and because of its inherent ill conditioning. Nonetheless, these types of tools are regularly applied to improve pictures obtained from medical apparatus and to enhance images transmitted from satellites, though not without a price. Regularization renders the restoration problem practically solvable by neutralizing some of the ill conditioning, but it also removes sharp edges and similar distinguishing features. Once distinctive facial characteristics, for example, can become unrecognizable.
Unlike regularized least squares image restoration, one of the newest techniques for image restoration can preserve distinctive lines and edges. It trades solving a system of linear equations that minimize squared error summed over all pixels for a much harder computational problem, solving a nonlinear partial differential equation that minimizes another metric of image quality, its total variation. The total variation is the integral over the image of the magnitude of the gradient of intensity or of some other characteristic of the image, and total variation (TV) is the name of this alternative to linear image restoration.
Roughly, TV permits image intensity to have sharp jumps, but it limits spurious oscillations. Regularized least squares, on the other hand, tends to smooth out sharp jumps because it controls the second derivative of image intensity, its "spatial acceleration." Consequently, TV methods are best suited to "blocky" images. Leonid Rudin of Cognitech, Inc., and Stanley Osher of UCLA, formerly of Cognitech, came to this approach through methods used for tracking shock fronts in gas dynamics calculations, a setting that also seeks to preserve sharp boundaries without introducing extraneous detail.
TV methods restore the image by solving a partial differential equation that is equivalent to the minimum total variation condition. The connection between minimum total variation and the differential equation is analogous to the connection between minimum energy and the differential equation describing the motion of an object; the mathematical ideas come from the calculus of variations.
The computational challenge is solving the nonlinear partial differential equation, but shifting perspective to the differential equation point of view opens up a new set of tools to image processors, techniques of numerical linear algebra for solving systems of equations that approximate partial differential equations. The latest techniques in TV restoration use sophisticated iterative methods to solve the matrix equations that approximate the partial differential equation.
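The flavor of TV minimization can be seen in a toy 1-D version, solved here by naive gradient descent on a smoothed total-variation term rather than by the sophisticated iterative PDE solvers used in practice; all the parameters are invented for illustration:

```python
import numpy as np

# Minimize  sum (u - y)^2  +  lam * sum |u[i+1] - u[i]|
# by gradient descent, with a small eps smoothing |.| near zero.
rng = np.random.default_rng(1)
n = 100
true = np.where(np.arange(n) < 50, 0.0, 1.0)   # a sharp "blocky" edge
y = true + 0.1 * rng.standard_normal(n)        # noisy measurement

u = y.copy()
lam, eps, step = 0.5, 1e-2, 0.05
for _ in range(1500):
    d = np.diff(u)
    w = d / np.sqrt(d ** 2 + eps)              # derivative of smoothed |d|
    tv_grad = np.zeros(n)
    tv_grad[:-1] -= w                          # each difference pulls on
    tv_grad[1:] += w                           # both of its endpoints
    u -= step * (2.0 * (u - y) + lam * tv_grad)

# The noise on the flat stretches is ironed out, but the jump at i = 50
# survives -- exactly the behavior least-squares smoothing cannot deliver.
```

Because the penalty charges for total variation rather than squared slope, a single large jump costs no more than the same rise spread over many pixels, so edges are not penalized away.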
The power of TV techniques has been displayed in such public settings as the Los Angeles trial of those charged with beating truck driver Reginald Denny. Rudin used a TV-based algorithm to enhance a photograph taken from a helicopter of one of Denny's attackers. The identifying feature in question was a tattoo, and preservation of edges in that part of the image was critical. (See Image Processing Makes Its Mark in Court in the December 1993 SIAM News.)
TV restoration has also been applied to other inverse problems, such as electrical impedance tomography, where edges and jumps are important. (Tomography is described below.)
The images formed by astronomical telescopes are cursed by two effects, twinkle and shake. Twinkle comes from distortion in the atmosphere, and shake from mechanical and thermal distortion of the telescope structure and its mirror. The techniques of mathematical control theory can reduce the effects of both even before the image is formed, thereby giving image restoration algorithms a substantial head start.
The ideal earth-based astronomical telescope is built on bedrock, high on a remote mountain. The solid foundation stabilizes the telescope against wind and other potential causes of vibration, while the altitude and isolation minimize atmospheric degradation. The Hubble space telescope carries this logic to its natural extreme, but even the Hubble's accuracy is limited by the effects of thermal stresses and other forces that shift the phase of the incoming light ever so slightly.
In a vacuum, light from a distant star would arrive at the telescope mirror as a single planar wavefront. The only limit on resolution would be diffraction by the telescope aperture. In fact, tiny local variations in the index of refraction of the atmosphere induce small phase errors that make the incoming plane wave look more like a sheet of crumpled paper. The mirror then adds phase errors of its own; even a theoretically perfect mirror will be distorted by thermal stresses, not to mention the effects of small vibrations in the telescope structure.
Active and adaptive optics attempt to compensate for these phase errors by using as a reference the phase error in the wavefront from a guide star, either a bright natural star very near the target image or a "star" created by directing a laser into the atmosphere. Active optics delicately nudges the primary mirror with hydraulic actuators to correct the very low frequency errors induced by thermal and gravitational distortion of the telescope's primary mirror. Since the primary mirror is too big to respond fast enough, the control algorithms of adaptive optics adjust a separate set of mirrors to correct the higher frequency errors caused by atmospheric irregularities and telescope vibration. In both cases, the techniques of control theory are used to adjust the mirrors to minimize the distortion in the image of the guide star.
These tactics produce impressive results. One simulation shows that the resolution of a telescope can improve from being barely able to spot an object the size of a house trailer in earth's orbit to detecting a hand waving from the trailer's window! But this real-time control requires extraordinarily high-speed computation---up to 10 billion floating point operations per second are needed to implement adaptive optics for an instrument like the 3.5-m telescope at the Starfire Optical Range of the U.S. Air Force Philips Laboratory in New Mexico.
To battle the demons of shake and twinkle, many modern astronomical telescopes are now built with deformable mirrors that can be adjusted dynamically. Real-time control of the separate actuators of such a mirror can neutralize distinctly different sources of error, such as wind shake and time-varying atmospheric distortion. Each source has its own characteristic temporal frequencies; those of wind shake, for example, are typically much higher than those of atmospheric turbulence.
The trick to adjusting the mirror actuators on the fly is to choose a basis set of mirror deformations, known as mirror control modes, which best control each disturbance at a bandwidth matched to its characteristic frequency. Correcting all disturbances at a common bandwidth will either allow high frequency errors to sneak through if this bandwidth is set too low, or add unnecessary noise to the correction of low frequency errors if the bandwidth is too high.
Brent Ellerbroek of Philips Laboratory, Charles Van Loan and Nikos Pistianis of Cornell University, and Robert Plemmons of Wake Forest University have developed a multiple bandwidth modal control strategy that can minimize mean squared phase error in the image across multiple sources of error. Their technique advances previous approaches by enabling the simultaneous optimization of both the mirror control modes and the associated control bandwidths without choosing in advance the basis set of control modes. That is, the control system itself makes its own mathematical decisions about the best trade-offs among errors to sample and ways to counteract them.
Mathematically, the optimal control modes could be determined by simultaneously diagonalizing all of the matrices that characterize the optical performance of the system, but that approach is computationally expensive. Instead, Ellerbroek et al. use a maximization process that works with various pairs of optical performance matrices. Their technique can show substantial improvement over single bandwidth control, and it lends itself to parallel implementation and near real-time computing.
The compressed images that fly over today's World Wide Web benefit from image compression technology developed several decades ago for satellites exploring our planetary neighbors. The image compression tools for tomorrow's World Wide Web are likely to be based entirely on mathematical ideas perfected even more recently. Besides facilitating more rapid transmission of images, these new methods of image compression enable storage, retrieval, and sorting of such vast archives as the FBI's fingerprint records.
JPEG (for Joint Photographic Experts Group) is one of the standard image compression techniques currently in use. It can compress full-color or gray-scale images of natural scenes by about 20:1, an improvement anyone browsing the Web can appreciate. The scientists monitoring transmissions from the Mars Pathfinder can appreciate it even better, for they will be receiving data at a mere 40 bits per second (about 700 times slower than a typical modem!) when the spacecraft reaches Mars.
The essence of a compression scheme like JPEG is an algorithm that removes the least useful data from a digital representation of the image. That removal results in some loss of information, but if "least useful" is defined appropriately, then such losses are tolerable. For example, since the human eye is more sensitive to changes in brightness than to changes in color, JPEG preserves brightness information on a more detailed scale than color information. If the compressed image is to be analyzed in some other way, say by an automatic image analysis algorithm, a different compromise may be needed, but the basic approach is unchanged.
The first step in a compression process like JPEG is determining the frequency with which some characteristic of the image, say brightness, varies across small blocks of the image. (The standard JPEG block is 8 pixels by 8 pixels.) A discrete Fourier (cosine) transform of the brightness data for that block will identify the magnitude of the spatial variation occurring at each of the frequencies sampled; the lowest frequency---zero---is just the average over the entire block. Except for the very high frequencies that weren't sampled by the Fourier transform, the original brightness data can be recovered with reasonable accuracy from the frequency information. These computations are speeded along by the aptly named fast Fourier transform, a famous and much refined algorithm described in a 1965 paper by James W. Cooley and John W. Tukey.
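The 8-by-8 transform step can be written down directly from the definition of the discrete cosine transform; this is a sketch of the idea, not a production codec:

```python
import numpy as np

# The 8x8 discrete cosine transform used by JPEG, built from its
# definition: C[k, n] = c(k) * cos(pi * (2n + 1) * k / 16).
N = 8
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0] /= np.sqrt(2.0)        # orthonormalizing factor for the k = 0 row

# A smooth brightness ramp, like a patch of sky shading from dark to light.
block = np.outer(np.linspace(0.0, 255.0, N), np.ones(N))
coeffs = C @ block @ C.T    # 2-D DCT: transform along both directions

# For a smooth block, almost all of the energy lands in the low-frequency
# corner, which is why quantizing away the high frequencies costs so little.
low = np.abs(coeffs[:2, :2]).sum()
total = np.abs(coeffs).sum()
```

Because the rows of C form an orthonormal set, the transform is exactly invertible (block equals C.T @ coeffs @ C); the information loss in JPEG comes entirely from the quantization step that follows, not from the transform itself.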
Rather than store the exact Fourier amplitude of each spatial frequency, the amplitudes are measured against some quantization scale, say, 0, ..., 64, rounding as needed. Since the higher frequencies typically have small amplitudes, they are often badly approximated in this step. Information has now been lost in two ways, first by ignoring the higher spatial frequencies and then by approximating the amplitudes of the lower frequencies that were retained. But these losses typically have little impact on human perceptions of the image. The next step is Huffman coding, which further shrinks the volume of data, but without loss of information. Huffman codes, which were introduced by D. A. Huffman in 1952, represent amplitude values by means of a binary tree, a kind of family tree in which each parent has only two children. A single ancestor appears at the top of the tree; the bottom, the set of all descendants in the last generation, contains every possible brightness amplitude value on the quantized scale.
Rather than attempt to store the binary number that represents a particular brightness amplitude, a Huffman code stores the directions for stepping through the tree starting at the top to reach the proper value in the bottom line of descendants. ("Go left at the first generation, right at the second, ...") Since there are fewer generations than descendants (e.g., a single parent plus 6 generations for 64 descendants) and the left-right directions can be stored using 0's and 1's, the Huffman code is much shorter (logarithmically shorter) than the binary representation of the brightness amplitudes. Indeed, the Huffman code is optimal in the sense of being the shortest possible if the probability of occurrence of each of the symbols is a power of 1/2.
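The tree-building step can be sketched in a few lines using a priority queue; this is the classic textbook construction, not the exact JPEG table format:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code: frequent symbols get short bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate one-symbol input
        return {next(iter(freq)): "0"}
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        # Going left prepends a 0 to the path, going right prepends a 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# 'a' (most frequent) gets a 1-bit code; 'b' and 'c' get 2 bits each.
```

Merging the two least frequent subtrees at every step pushes rare symbols deep into the tree and keeps common symbols near the root, which is exactly the logarithmic saving described above.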
The JPEG compressed image consists of the Huffman codes for the Fourier amplitudes of the brightness and other information for each of the blocks. Recovering the image requires translating the Huffman codes, then recovering the approximate values of brightness, etc. for each 8 by 8 block of pixels from their approximate Fourier amplitudes.
A separate issue, one that can't be discussed here, is correcting for errors introduced during the transmission of the compressed image from deep space. Mathematical ideas like the famousReed-Solomon error correcting codes save the day.
Wavelet transforms are often a better tool for capturing the spatial variations of an image than Fourier transforms. Consequently, wavelets may well displace Fourier transforms in the next generation of image compression algorithms, in addition to their many other applications. They are already the mathematical power behind the FBI's fingerprint data base. The Fourier-based JPEG algorithm fails to compress fingerprints successfully because it introduces "tiling" artifacts where its blocks connect and because it loses such important fine scale, legally admissible detail as sweat pores.
The difference between Fourier and wavelet transforms is most pronounced when an image contains strong high frequency components, as in the sharp transition between the black and white ridges in a fingerprint. The Fourier transform samples the frequencies in integer multiples (harmonics) of the base frequency. When an image like a fingerprint contains important high frequency information, preserving an accurate Fourier transform requires keeping a large number of frequency amplitude values in order to retain the critical high frequencies.
Wavelet transforms, on the other hand, more easily accommodate a wide range of frequencies and spatial scales because both frequency and spatial extent can be controlled, unlike the infinite spatial extent of the cosines and sines of the Fourier transform. For example, a wavelet transform can be localized spatially near the boundary between black and white in a fingerprint to capture the important high frequency changes there while retaining much less information in the uniform white spaces between the dark ridges.
The FBI's national standard for digitized fingerprint records was developed by Tom Hopper of the FBI's Criminal Justice Information Services Division and Jonathan Bradley and Chris Brislawn of the Computer Research and Applications Group at Los Alamos National Laboratory. Their work met daunting compression demands: the FBI has about 200 million fingerprint records. To record the digitized, uncompressed version of just one record would require about seven 3.5" high density floppy disks! Their approach achieves an average compression of 15:1, in effect squeezing two records onto one disk.
In essence, this successful compression is accomplished by replacing the Fourier transform step of image compression with a suitable wavelet transform. The frequency used by the wavelet transform can be scaled in octaves (powers of two), not just the integer multiples of the Fourier transform's frequency harmonics, so it better retains high frequency data. Since its spatial action can be localized by translation, a luxury unavailable in the Fourier transform, it avoids tiling artifacts and it puts its power to use where it is needed to preserve detail. A final step of Huffman coding further compresses the wavelet transform version of the fingerprint image.
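The spatial localization at work here is easiest to see with the simplest wavelet of all, the Haar wavelet (the FBI standard uses more refined wavelets, so take this only as a sketch of the principle):

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform: pairwise averages and details."""
    avg = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # coarse, half-resolution view
    det = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # local differences (detail)
    return avg, det

# A crude stand-in for one scan line of a fingerprint: flat white space
# with a single sharp dark ridge.
x = np.zeros(16)
x[7:11] = 1.0

avg, det = haar_step(x)
# The detail coefficients are nonzero only at the two edges of the ridge;
# the flat stretches contribute nothing, which is where the compression
# comes from -- and the edges themselves are kept exactly in place.
```

Repeating the step on the averages gives coarser and coarser octaves, the powers-of-two frequency scaling mentioned above.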
Ronald Coifman of Yale University employs several types of wavelets in what he calls a "mathematical orchestration" to compress images with ratios on the order of 100:1. The restored image is formed from three superimposed, distinct pieces, as in a musical orchestration. One part is the largest one-third of one percent of the wavelet expansion coefficients of the original image (300:1 compression!). This approximation has the diffuse character of a watercolor. The residual, the difference between the original image and the watercolor reconstruction, is itself compressed using a kind of wavelet called a brushlet to produce the second component of the image orchestration. That rendering of the residual is reminiscent of Van Gogh's heavy brush strokes. The third part of the orchestration is a compression of the residual remaining after the first two compression steps. It has the pointillist look---small dots and tiny brushstrokes---of one of Georges Seurat's paintings. The three parts are combined to produce the restored image.
Besides powerful image compression, this mathematical orchestration idea also aids in filtering noise from MRI and other images. These tools are illustrated on the 1998 Mathematics Awareness Week poster.
Microsoft's Encarta encyclopaedia contains 7,000 color images on one CD-ROM. All of these images are based on fractal image compression, a tool especially well suited for on-line images that can be compressed at leisure but must be accessed rapidly. This commercial application is based on work of Michael Barnsley of Iterated Systems, Inc. and his colleagues.
Rather than Fourier or wavelet transforms, this technique uses iterated maps, a tool for generating fractals, to approximate an image. These mappings are constructed so that they are contractive, that is, so that they spiral in on a certain image, known as an attractor, when they are applied repeatedly. A family of such mappings is known as an iterated function system; in the plane an iterated function system could be built up by mappings of the form w(x, y) = (ax + by + c, dx + ey + f), for example. By an appropriate choice of parameters (a, b, ..., f in the example), a member of that family can be constructed so that its attractor---the result of repeatedly applying it to an arbitrary set of values---is a good approximation to part of an image, say the gray-scale values at each of the pixels in an 8 by 8 array from a black and white photograph. A set of such mappings can recover the entire photograph block by block because each mapping is a kind of specialized recipe whose repeated application can recreate its portion of the desired image from a featureless initial image.
The compressed form of an 8 by 8 block of pixels is just the set of parameter values that defines the member of the iterated function system that (approximately) recovers that block. Such data is analogous to the Fourier or wavelet coefficients obtained from those image compression techniques. For a gray scale image with 512 distinct intensity values at each pixel, for example, the iterated function system parameters for an 8 by 8 block require 26 bits of data. Since the original image requires 128 bits of data, the fractal compression ratio is about 5:1. Restricting the possible inputs to the iterated map, effectively giving it a head start, can increase the compression ratio to 8:1. Even better ratios are possible, depending on the image. As with Fourier and wavelet compression, the final step is a further lossless data compression step such as Huffman coding.
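The "spiraling in" of a contractive affine map can be demonstrated directly; the coefficients below are invented for illustration, whereas Barnsley's encoder chooses them to reproduce actual image blocks:

```python
import numpy as np

# A contractive affine map w(x, y) = (ax + by + c, dx + ey + f).
# Because the linear part shrinks distances, repeated application
# converges to a unique fixed point -- the attractor of this one-map system.
a, b, c = 0.5, 0.2, 1.0
d, e, f = -0.1, 0.4, 2.0

def w(p):
    x, y = p
    return np.array([a * x + b * y + c, d * x + e * y + f])

p = np.array([100.0, -50.0])     # any starting point will do
for _ in range(60):
    p = w(p)

# The fixed point can also be found directly by solving p = Mp + t.
M = np.array([[a, b], [d, e]])
t = np.array([c, f])
fixed = np.linalg.solve(np.eye(2) - M, t)
```

In fractal compression only the six parameters are stored; the decoder recreates the attractor by iterating, which is why decompression is fast even though compression (searching for good parameters) is slow.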
Tomography is the mathematical tool that permits constructing a view of a cross section of a patient's body, for example, from X-ray or magnetic resonance or electrical impedance views through the body. Mathematically, the problem of classical X-ray tomography is one of recovering the values of a function from its integral along straight lines. Medically, tomography provides a range of tools for imaging the human body and vastly extending a physician's diagnostic powers.
The tomography problem is a bit like recovering a ball player's game-by-game batting average from the batting average for the entire season. (Think of a batting average as a sum---a kind of integral over the season---of the fraction of hits in each game. )
From the season average alone, a strong start and a weak finish can't be distinguished from a uniformly mediocre season. But if additional information were available---say, batting averages that weighted different parts of the season more heavily than others---a large set of equations could be constructed whose solution would be the player's average from each game.
In medical applications, the "batting averages" can be X-ray intensity values measured for rays passing through a given cross section at different angles around the body. Each intensity value is an integral of the body's density along the straight-line path of the X-ray.
The mathematical (and computational) problem is recovering the values of the density function at discrete points of the body's cross section. Those density values determine the light and dark regions in the reconstructed picture of that section of the body.
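A toy version of this recovery problem, with a 2 by 2 "body" and a handful of rays (the geometry is invented for illustration), shows both the reconstruction and the ambiguity that plagues it:

```python
import numpy as np

# A 2x2 "body" of unknown densities, probed by rays along its rows,
# columns, and one diagonal. Each measurement is the sum (a discrete
# line integral) of the densities the ray passes through.
true = np.array([1.0, 2.0,
                 3.0, 4.0])        # densities, stored row-major

A = np.array([
    [1, 1, 0, 0],   # ray through the top row
    [0, 0, 1, 1],   # ray through the bottom row
    [1, 0, 1, 0],   # ray through the left column
    [0, 1, 0, 1],   # ray through the right column
    [1, 0, 0, 1],   # ray along the main diagonal
], dtype=float)
data = A @ true                    # the measured ray sums

# Least squares recovers the densities from the ray sums.
recovered, *_ = np.linalg.lstsq(A, data, rcond=None)
```

Note that the diagonal ray is essential: with row and column sums alone, adding some amount t to one diagonal pair of pixels and subtracting it from the other leaves every measurement unchanged, a discrete version of the non-uniqueness discussed above.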
Like image restoration, the tomography problem is ill-conditioned in the sense that small changes in the data can produce radically different solutions; two batting averages that differed in only the third digit could be the result of radically different game-by-game performances. This phenomenon aggravates the computational problem because round-off error in the computations can tend to shift the density profile toward an incorrect picture.
As in image restoration, the mathematical problem is also poorly posed; noise and other imperfections in a given set of X-ray values may leave no reasonable body profile which accounts exactly for the data. The mathematical setting provides an interpretation for these difficulties in terms of the range and domain of certain operators, and it offers some nearly equivalent, nearby problems whose answers are reasonable body profiles.
Mathematically similar ideas also support other imaging techniques. One of the most promising of those still in the research stage is electrical impedance tomography (EIT), in which potentially damaging X-rays are replaced by measurements of electrical impedance (complex resistance). One of the medical attractions of this approach, which is being developed by Margaret Cheney and her colleagues at RPI, is its enhanced ability to differentiate tumors from healthy tissue. For example, cancerous breast tissue has about four times the conductivity (inverse resistance) of healthy tissue, even though it is very difficult to identify reliably in conventional X-ray mammography because healthy and cancerous tissue are nearly equally opaque to X-rays.
EIT data can be collected by surrounding a patient's upper body, say, with electrodes that can measure the electric field generated when a mild current is applied. The problem is to deduce from electric measurements on the surface the two-dimensional pattern of the tissue conductivity within the cross-section of the chest surrounded by the electrodes. This inverse problem is much more difficult mathematically and computationally than X-ray tomography because the data at one point is influenced by interactions all across the region, not just the behavior along the straight-line path of a high energy X-ray.
The interactions between the electric field and the conductivity of the tissue within the chest are described by a partial differential equation, a relationship that describes the local balance of current flow in terms of rates of change (derivatives) of tissue conductivity and electric potential across the chest. If the conductivity of the chest tissue were known, the boundary data, the potential at the electrodes on the skin, would provide enough additional information to permit constructing a solution of the differential equation, a function that describes the potential throughout the chest.
The difficult inverse problem is determining both the potential and the conductivity (for it is the conductivity values that signal tumors) from the partial differential equation and the boundary data. Ultimately, a map of conductivity values provides the diagnostician with an image of a section of the body analogous to an X-ray image but offering greatly improved contrast between tumors and healthy tissue.
Like many other problems in imaging, the standard approach to the inverse problem of EIT is to estimate in some clever way both the potential and the conductivity functions that minimize the error (often the least squares error) between the observed boundary data and that predicted by the estimated potential and conductivity. In practice, the differential equation is replaced by an approximating system of algebraic equations and the unknown functions by arrays of values. The dual challenges are finding accurate, stable computational algorithms and establishing precise mathematical estimates of the relations between the estimated and the actual conductivity values, particularly in the face of random noise.
Much of the work of imaging finds its way to computer screens, and many other kinds of data are represented as visual images there as well. Representing and manipulating those images presents another set of mathematical challenges. For example, many three-dimensional objects---including some of the characters and objects in the computer-animated feature film Toy Story---are built up from geometric primitives that must be manipulated in a coordinated way to represent motion or deformation. But the illumination of an object has as much influence on its perception by human viewers as its shape, adding another burden to the task of representing three-dimensional shapes on a two-dimensional screen.
Peter Schröder and his colleagues at the California Institute of Technology are exploiting the multiresolution powers of wavelets for illumination computations, for geometric molding, and for other applications in computer graphics. What Schröder calls "the twin challenges of computer graphics" are the complexity of the images and the need for rapid updates during interactive use. Wavelets are powerful because they carry information economically and because they operate naturally over a broad range of spatial scales, storing detail only when it's required and zooming in for a close-up when needed.
A completely faithful rendering of the illumination of a scene like the interior of a room requires tracking the multiple reflections of light from distributed sources like lamps and windows from all the surfaces in the room---walls, floors, mirrors, furniture, and so on. The radiosity level on any surface---illumination energy per unit area---is determined by an integral equation that incorporates both the emitted light and that reflected from all of the surfaces. (The integral is essentially a sum of the repeated reflections from all the surfaces in the scene.) A natural approximation to the continuum of surfaces is using finite elements, replacing every complex surface with a collection of small facets, each having constant radiosity, say. Then the problem of determining the level of illumination on each surface becomes one of solving a large system of linear equations for the radiosity level on each facet.
Unfortunately, this approximation has two drawbacks: size and density. A reasonable scene can easily require 10,000 facets, leading to a system of equations with a 10,000 by 10,000 coefficient matrix. Furthermore, most of those coefficients are nonzero, because the light from one facet interacts with that from almost every other. The difficulties here are unlike many other imaging problems, where coefficient matrices are either sparse (most entries don't require explicit storage because they are zero) or have a special structure, such as circulant form, that facilitates the use of fast Fourier transforms in the computations.
But representing the illumination levels using wavelets instead of the simple-minded constant scheme leads to a coefficient matrix with many small, if not zero, entries. In essence, the multiscale properties of wavelets work to accentuate the natural attenuation of light intensity caused by successive reflections. Discarding the smaller coefficients greatly reduces the number of nonzero terms in the coefficient matrix, thereby enabling the use of economical sparse matrix solution algorithms to find the approximate radiosity of each facet.
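The sparsification effect can be imitated in miniature. The sketch below is not Schröder's scheme: it applies one level of an unnormalized Haar transform (simple averages and differences) to a made-up dense matrix whose entries decay smoothly, as light attenuating with distance between facets might. Because the matrix varies smoothly, most of the difference coefficients are tiny and can be discarded.

```python
import numpy as np

def haar_2d(A):
    """One level of a 2D Haar-style transform: averages and differences."""
    n = A.shape[0]
    H = np.zeros((n, n))
    for i in range(n // 2):
        H[i, 2 * i] = H[i, 2 * i + 1] = 0.5               # averaging rows
        H[n // 2 + i, 2 * i] = 0.5                        # differencing rows
        H[n // 2 + i, 2 * i + 1] = -0.5
    return H @ A @ H.T

# A dense stand-in for a radiosity matrix: entries decay smoothly with
# the "distance" between facets i and j.
n = 64
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
A = 1.0 / (1.0 + np.abs(i - j))

W = haar_2d(A)
small = np.abs(W) < 1e-3               # threshold: coefficients to discard
print("fraction of A below threshold:", (np.abs(A) < 1e-3).mean())
print("fraction of W below threshold:", small.mean())
```

Every entry of the original matrix is above the threshold, while a sizable fraction of the transformed matrix falls below it and can be dropped before solving.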
The upshot is an adequate, rapidly computed rendition of the illumination of the image, an excellent compromise, since a fully accurate rendering could take hours or even days to compute. In a discussion of various wavelet applications in computer graphics, Schröder suggests that exploiting this attenuation property to simplify the radiosity coefficient matrix "is not unlike many image compression algorithms, which similarly begin with an image, wavelet transform it, and then threshold without creating too much distortion after reconstruction."
Although many images in medicine and other settings are generated using sophisticated combinations of mathematics, computation, and engineering, they are often analyzed manually. But why should an experienced cardiologist waste valuable time outlining blocked blood vessels on a digital subtraction angiogram if that process could be automated without drawing false boundaries or being misled by noise in the image? Mathematical and computational ideas known as level set methods and fast marching methods are beginning to automate such edge and pattern detection tasks, ones that depend on finding accurate boundaries within images.
The key idea, developed by Ravi Malladi and James Sethian of the University of California-Berkeley, is to imagine a propagating interface that starts as a small seed and grows inside a shape, like a ripple spreading toward the edge of a pond. Slowing the speed of the evolving front as it nears a boundary in the image will capture the unknown shape that surrounds the initial seed. For example, Malladi and Sethian make the motion of the spreading ripple dependent on the rate of change of image intensity if the goal is the boundary of a light object against a dark background. Using this approach, they have developed software that extracts beating hearts from MRI images, measures cortical brain structures, and locates tumors. Central to these accomplishments are level set and fast marching methods.
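The idea of slowing the front near a boundary can be caricatured in a few lines. This is not Malladi and Sethian's actual stopping function, just one common variant: the front's speed falls toward zero wherever image intensity changes rapidly, so the ripple stalls at edges. The synthetic image and the constant are invented for illustration.

```python
import numpy as np

# Edge-stopping speed function (a sketch): speed is near 1 in flat regions
# of the image and collapses where the intensity gradient is large.
n = 64
x = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
I = (X**2 + Y**2 < 0.25).astype(float)   # bright disk on a dark background

gx = np.gradient(I, axis=0)              # image gradient components
gy = np.gradient(I, axis=1)
speed = 1.0 / (1.0 + 100.0 * (gx**2 + gy**2))

print(speed[n // 2, n // 2])             # full speed inside the flat disk
print(speed.min())                       # nearly stopped at the disk's edge
```

A front seeded inside the disk and propagated with this speed would sweep quickly across the interior and then pile up against the boundary, which is the object's extracted outline.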
Level set methods, introduced by Sethian and Stanley Osher, follow a moving interface by representing its two-dimensional figure in three-dimensional space. The ripple becomes a level curve, like a contour line of constant altitude on a map of a mountain. If the higher-dimensional surface, the mountain, has the right shape, then the evolution of the ripple will correspond to slicing the mountain at successively lower altitudes. Conveniently, the same mathematical ideas apply to representing interfaces in three dimensions; they are level sets of four-dimensional surfaces.
This higher-dimensional perspective couples the description of the level curve to its altitude on the particular surface that captures its evolution. If the velocity of the moving front is known at every point, then a deceptively simple nonlinear partial differential equation relates the speed of the moving front, a rate of change with respect to time, to its curvature (the reciprocal of the radius of a tangent circle), a quantity that depends on rates of change in space.
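A minimal finite-difference sketch of that equation is possible in a few lines, assuming the standard level set form phi_t = kappa * |grad phi| (motion by curvature) and a circular initial front; the grid size and time step below are arbitrary choices.

```python
import numpy as np

# Level set sketch: evolve phi_t = kappa * |grad phi| (motion by curvature).
# The zero level set of phi starts as the unit circle and should shrink.
n, L = 101, 2.0
x = np.linspace(-L, L, n)
X, Y = np.meshgrid(x, x, indexing="ij")
phi = np.sqrt(X**2 + Y**2) - 1.0       # signed distance to the unit circle
h = x[1] - x[0]
dt = 0.1 * h**2                        # small step for this parabolic PDE

def curvature_step(phi):
    px = np.gradient(phi, h, axis=0)
    py = np.gradient(phi, h, axis=1)
    pxx = np.gradient(px, h, axis=0)
    pyy = np.gradient(py, h, axis=1)
    pxy = np.gradient(px, h, axis=1)
    grad2 = px**2 + py**2 + 1e-12      # regularized to avoid division by zero
    # kappa * |grad phi| written directly in derivatives of phi
    kg = (pxx * py**2 - 2.0 * px * py * pxy + pyy * px**2) / grad2
    return phi + dt * kg               # positive curvature moves the front inward

area0 = float((phi < 0).mean())        # fraction of the grid inside the front
for _ in range(200):
    phi = curvature_step(phi)
area1 = float((phi < 0).mean())
print(area0, area1)                    # the enclosed area shrinks
```

The zero level set contracts, consistent with the theorem mentioned later that a simple closed curve evolving by curvature shrinks smoothly toward a circle and then a point.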
Fast marching methods, introduced by Sethian, go one step further and construct the arrival time of the propagating ripple immediately. If the speed of the front depends only on such local properties as the image gradient, then the time at which the front passes any given point can be constructed as a single stationary surface; the height of that surface at a point is the time at which the moving front passes it. In this setting, the rates of spatial and temporal change are connected by a relation known as the nonlinear Eikonal equation. Borrowing schemes from discrete mathematics on networks and using efficient sorting algorithms make the fast marching method extraordinarily rapid.
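The marriage of a sorted (Dijkstra-like) sweep with an upwind update for the eikonal equation can be sketched as follows. This is a minimal first-order variant for illustration, not Sethian's production algorithm: it uses a heap to freeze grid points in order of arrival time and the standard one- and two-sided upwind updates.

```python
import heapq
import numpy as np

def fast_marching(speed, src):
    """Minimal first-order fast marching for |grad T| = 1/speed on a
    unit-spaced grid, with T = 0 at the source point src."""
    n, m = speed.shape
    T = np.full((n, m), np.inf)
    T[src] = 0.0
    frozen = np.zeros((n, m), bool)
    heap = [(0.0, src)]
    while heap:
        t, (i, j) = heapq.heappop(heap)
        if frozen[i, j]:
            continue
        frozen[i, j] = True                      # accept smallest tentative time
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < m and not frozen[a, b]:
                # smallest current estimate among neighbours along each axis
                tx = min(T[a - 1, b] if a > 0 else np.inf,
                         T[a + 1, b] if a < n - 1 else np.inf)
                ty = min(T[a, b - 1] if b > 0 else np.inf,
                         T[a, b + 1] if b < m - 1 else np.inf)
                f = 1.0 / speed[a, b]
                lo, hi = sorted((tx, ty))
                if hi == np.inf or hi - lo >= f:   # one-sided upwind update
                    t_new = lo + f
                else:                              # two-sided quadratic update
                    t_new = 0.5 * (lo + hi + np.sqrt(2 * f * f - (hi - lo) ** 2))
                if t_new < T[a, b]:
                    T[a, b] = t_new
                    heapq.heappush(heap, (t_new, (a, b)))
    return T

# With unit speed, arrival times approximate Euclidean distance from the source.
T = fast_marching(np.ones((21, 21)), (10, 10))
print(T[10, 20], T[20, 20])
```

Because each point is frozen exactly once, in order of arrival time, the cost is dominated by the heap operations, which is the source of the method's speed.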
Malladi and Sethian couple level set and fast marching methods to provide efficient, accurate tools for detecting and extracting images in a variety of applications, including medical imaging, industrial part location, and optical character recognition.
Many other substantial mathematical and computational issues have also been addressed using these ideas. One is the general behavior of curves that evolve with velocity proportional to curvature. For example, could a collapsing simple closed curve (one that does not intersect itself) fold over on itself instead of transforming smoothly into a shrinking circle, then a point? The rigorous mathematical answer is no; a Java applet illustrates the impact of the theorem. Another issue is the computational difficulty arising at the corner that forms when two evolving ripples intersect. The curvature term in the governing partial differential equation is singular at this point because a corner is a circle with zero radius. A trick from computational fluid dynamics known as artificial viscosity has the effect of smoothing the perfect corner and removing the singularity so that the computation can continue.
Level set and fast marching methods are fruitful tools for applications ranging from medicine to microelectronics, and they are raising a host of significant mathematical questions about the evolution of surfaces and the properties of solutions of partial differential equations.
Mathematical ideas as old as curvature and as new as wavelets form the foundation of imaging. Coupled with high speed computation and ingenious engineering, mathematics is permitting us to see further into space and more clearly into our own bodies. That combination of mathematics and technology is also illuminating our view of the world by putting vast libraries of images at our fingertips and by enabling us to visualize overwhelming masses of data. Mathematics has surely expanded our vision of ourselves and of our world.