As MTV images flash on the screen, we instantly recognize a face and a car after just a brief glimpse. We need more time, however, to scan a scene when searching for our car in a parking lot or a friend's face in a crowd. To understand fully how we recognize objects in visually cluttered scenes, neuroscientists must first know how the brain filters the initial input. Yet little is known about how the brain represents briefly glimpsed scenes of visual clutter, even those with just two or three objects.
Now, researchers in MIT's McGovern Institute for Brain Research have found strong evidence for a new understanding of how the brain processes several objects at once, organizing the visual information in a systematic, predictable manner for higher levels of the brain to interpret. The study appears in the September issue of the Journal of Neuroscience.
"Our brains recognize objects so easily we never think about it; however, it is a very complex problem," explains James DiCarlo, head of the research team and an assistant professor in MIT's Department of Brain and Cognitive Sciences (BCS). "We must solve this problem before we can create artificial visual machines. We can teach a robot to recognize [isolated] keys, but it has difficulty recognizing keys next to a wallet."
To dissect the problem, DiCarlo's group studied brain activity during controlled clutter situations. First authors Davide Zoccolan, a postdoctoral associate, and David Cox, a graduate student in DiCarlo's lab, showed monkeys quick glimpses of scenes with single objects, or with two or three objects. The exposures were too brief for the monkeys to scan the scene or direct attention towards any one object. Using microelectrodes, the team measured the responses of 104 different neurons in the monkeys' inferotemporal (IT) visual cortex, a brain region important for visual recognition.
Scientists know that each IT neuron responds more intensely to some objects than others, suggesting that these neurons play a role in object discrimination. For example, a neuron may fire intensely at the sight of a key, but barely at all at the sight of a wallet. How does that neuron respond to the sight of both a key and a wallet in one scene? The limited existing data suggested that the presentation of the wallet degraded the responses of IT neurons responding to the key, as if noise randomly blurred the brain's representation of the key.
However, the McGovern team's study reveals how the brain may cleverly avoid being confused by simple clutter. They found that additional objects in the scene do change an IT neuron's response. But this change obeys a remarkably systematic rule: The neuron's response to two objects together is the average of its responses to each item separately, that is, halfway between the responses elicited by a key and a wallet individually. The same rule holds true when three objects are presented.
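The averaging rule can be illustrated with a few lines of arithmetic. The sketch below is only an illustration, not the researchers' analysis code: the function name `predicted_response` and the firing rates (in spikes per second) are hypothetical values chosen to make the rule concrete.

```python
from statistics import mean


def predicted_response(single_object_rates):
    """Predict a neuron's response to several objects shown together
    as the average of its responses to each object shown alone."""
    if not single_object_rates:
        raise ValueError("need at least one single-object response")
    return mean(single_object_rates)


# Hypothetical firing rates (spikes per second) for one IT neuron.
key_rate = 40.0     # strong response to a key alone
wallet_rate = 10.0  # weak response to a wallet alone
cup_rate = 16.0     # response to a third object alone

# Two objects: the prediction falls halfway between the two single rates.
print(predicted_response([key_rate, wallet_rate]))            # 25.0

# Three objects: the same rule, averaged over all three.
print(predicted_response([key_rate, wallet_rate, cup_rate]))  # 22.0
```

Under these made-up numbers, the neuron's response to the key and wallet together stays between its two single-object responses rather than adding up to 50.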
This finding suggests that the simultaneous presentation of multiple objects does not randomly blur the brain's representation of an individual object, but rather changes the representation in a predictable manner. This underlying rule may be one way the visual system organizes its input so that higher brain levels can easily interpret it, instantly recognizing multiple objects after even just a brief glimpse. "Although other possible rules would also allow this," DiCarlo says, "the averaging rule may be one that strikes a balance between preserving the information that both objects are present, while also keeping the responses of neurons to reasonable levels."
"This process surely has limits," he continues. "As scenes become more cluttered, the brain must begin to rely on additional processes, such as attention and eye movements. But the brain already can do a lot of remarkably powerful work before such processes kick in."
This work was supported by the NIH, the Pew Charitable Trusts, an International Human Frontier Science Program postdoctoral fellowship, and a National Defense Science and Engineering Graduate Fellowship.