For the first time, MIT scientists have applied a computer model of how the brain processes visual information to a complex, real-world task: recognizing the objects in a busy street scene. The researchers were pleasantly surprised at the power of this new approach.
"People have been talking about computers imitating the brain for a long time," said Tomaso Poggio, the Eugene McDermott Professor of Brain and Cognitive Sciences and a member of the McGovern Institute for Brain Research at MIT. "That was Alan Turing's original motivation in the 1940s. But in the last 50 years, computer science and AI (artificial intelligence) have developed independently of neuroscience."
"Our work is biologically inspired computer science," said Poggio, who is also co-director of the Center for Biological and Computational Learning.
"We developed a model of the visual system that was meant to be useful for neuroscientists in designing and interpreting experiments, but that also could be used for computer science," said Thomas Serre, a postdoctoral associate in Poggio's lab and lead author of a paper on the work in the March 2007 IEEE Transactions on Pattern Analysis and Machine Intelligence.
"We chose street scene recognition as an example because it has a restricted set of object categories, and it has practical social applications," said Serre.
Near-term applications include population surveillance and assistance for automobile drivers; eventually, applications could include visual search engines, biomedical imaging analysis and robots with realistic vision. On the neuroscience end, this research is essential for designing augmented sensory prostheses, such as ones that could replicate the computations carried out by damaged nerves from the retina.
"And once you have a good model of how the human brain works," Serre explained, "you can break it to mimic a brain disorder." One brain disorder that involves distortions in visual perception is schizophrenia, but nobody understands the neurobiological basis for those distortions.
"The versatility of the biological model turns computer vision from a trick into something really useful," said co-author Stanley Bileschi, a postdoctoral associate in the Poggio lab.
Recognizing scenes
The IEEE paper describes how the team "showed" the model randomly selected images so that it could "learn" to identify commonly occurring features in real-world objects such as trees and people. In so-called supervised training sessions, the model used those features to label, by category, examples of objects found in digital photographs of street scenes, such as buildings and cars. The photographs come from a street scene database compiled by Bileschi.
Compared to traditional computer-vision systems, the biological model was surprisingly versatile. Traditional systems are engineered for specific object classes. For instance, systems engineered to detect faces or recognize textures are poor at detecting cars. In the biological model, the same algorithm can learn to detect widely different types of objects.
To test the model, the team presented full street scenes consisting of previously unseen examples from the street scene database. The model scanned each scene and, based on its supervised training, recognized the objects in it. The upshot is that the model was able to learn from examples, which, according to Poggio, is a hallmark of artificial intelligence.
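The train-then-scan procedure described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two-number feature vector, the nearest-centroid classifier, the "sky" and "building" labels and all function names are hypothetical and are not taken from the IEEE paper or the street scene database.

```python
import numpy as np

def features(window):
    """Toy feature vector (assumed): mean brightness and mean horizontal contrast."""
    return np.array([window.mean(), np.abs(np.diff(window, axis=1)).mean()])

def train(examples):
    """Supervised step: average the feature vectors of each labeled category."""
    by_label = {}
    for window, label in examples:
        by_label.setdefault(label, []).append(features(window))
    return {label: np.mean(vecs, axis=0) for label, vecs in by_label.items()}

def classify(window, centroids):
    """Assign the category whose average feature vector is closest."""
    f = features(window)
    return min(centroids, key=lambda label: np.linalg.norm(f - centroids[label]))

def scan(scene, centroids, size=4):
    """Label each non-overlapping size-by-size window of the scene."""
    labels = []
    for top in range(0, scene.shape[0] - size + 1, size):
        row = []
        for left in range(0, scene.shape[1] - size + 1, size):
            window = scene[top:top + size, left:left + size]
            row.append(classify(window, centroids))
        labels.append(row)
    return labels

# Hypothetical labeled training windows: smooth bright "sky", striped "building".
stripes = np.tile([0.0, 1.0], (4, 2))
training = [
    (np.full((4, 4), 0.9), "sky"),
    (np.full((4, 4), 0.8), "sky"),
    (stripes, "building"),
    (stripes * 0.8, "building"),
]
centroids = train(training)

# A previously unseen scene: a smooth region next to a striped region.
scene = np.hstack([np.full((4, 4), 0.7), stripes * 0.9])
result = scan(scene, centroids)
print(result)
```

The same `train` and `classify` functions handle any set of labels, which loosely mirrors the article's point that one algorithm can learn different object categories from examples. The real system uses far richer features than the two numbers assumed here.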
Modeling object recognition
Teaching a computer how to recognize objects has been exceedingly difficult because a computer model has two paradoxical goals. It needs to create a representation for a particular object that is very specific, such as a horse as opposed to a cow or a unicorn. At the same time, the representation must be sufficiently "invariant" to discard meaningless changes in pose, illumination and other aspects of appearance.
Even a child's brain handles these contradictory tasks easily in rapid object recognition. Pixel-like information enters from the retina and passes through the hierarchical architecture of the visual cortex. What makes the Poggio lab's model so innovative and powerful is that, computationally speaking, it mimics the brain's own hierarchy. Specifically, the "layers" within the model replicate the way neurons transform their inputs into outputs, as measured by neural recordings in physiology labs.
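One way to see how a layered model can be both specific and invariant is to pair a template-matching stage with a max-pooling stage. The sketch below is a simplified illustration under stated assumptions, not the lab's implementation: the Gaussian-style tuning function, the 3-by-3 patterns and the function names are all hypothetical.

```python
import numpy as np

def template_responses(image, template):
    """Selectivity stage: score how well each image patch matches the template."""
    th, tw = template.shape
    rows = image.shape[0] - th + 1
    cols = image.shape[1] - tw + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = image[i:i + th, j:j + tw]
            # Assumed tuning curve: 1.0 for an exact match, smaller otherwise.
            out[i, j] = np.exp(-np.sum((patch - template) ** 2))
    return out

def max_pool(responses):
    """Invariance stage: keep the strongest response and discard its position."""
    return responses.max()

def scene_with(pattern, top, left, size=8):
    """Place a small pattern on an otherwise empty size-by-size image."""
    img = np.zeros((size, size))
    img[top:top + pattern.shape[0], left:left + pattern.shape[1]] = pattern
    return img

plus = np.array([[0.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [0.0, 1.0, 0.0]])
corners = np.array([[1.0, 0.0, 1.0],
                    [0.0, 0.0, 0.0],
                    [1.0, 0.0, 1.0]])

# Same object at two positions, then a different object.
at_origin = max_pool(template_responses(scene_with(plus, 0, 0), plus))
shifted = max_pool(template_responses(scene_with(plus, 4, 3), plus))
different = max_pool(template_responses(scene_with(corners, 4, 3), plus))
print(at_origin, shifted, different)
```

The pooled response is identical wherever the plus shape appears, which is the invariance, and it drops sharply for the other pattern, which is the specificity. Stacking several such pairs of stages gives a hierarchy in the spirit of the one the article describes.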
Making it more useful
The model used in the street scene application mimics only the computations the brain uses for rapid object recognition. The lab is now elaborating the model to include the brain's feedback loops from the cognitive centers. This slower form of object recognition provides time for context and reflection, such as: If I see a car, it must be on the road and not in the sky.
Giving the model the ability to recognize such semantic features will empower it for broader applications, including managing seemingly insurmountable amounts of data, work tasks or even e-mail. The team is also working on a model for recognizing motions and actions, such as walking or talking, which could be used to filter videos for anomalous behaviors, or for smarter movie editing.
Additional co-authors are Maximilian Riesenhuber, now at the Georgetown University Medical Center, and Lior Wolf, now at Tel Aviv University.
The street scene database is freely available at cbcl.mit.edu/. This research was partially funded by the Defense Advanced Research Projects Agency (DARPA), the Office of Naval Research, the National Science Foundation and the National Institutes of Health.
A version of this article appeared in MIT Tech Talk on February 28, 2007.