Many people preserve their babies' priceless first smiles, words or steps on video, but Associate Professor Deb Roy, head of the MIT Media Lab's Cognitive Machines research group, is taking parental attentiveness to a whole new level.
Roy is recording nearly all of his new son's waking hours in an ambitious attempt to use these data to unravel the mystery of how humans naturally acquire language within the context of their primary social setting. He will pay particular attention to the role of physical and social context in how his son, 9 months old, learns early words and early grammatical constructions.
Roy's vast recording and analysis effort, known as "The Human Speechome Project" (speech + home), will yield some 400,000 hours of audio and video data over three years. Roy will present a paper on the Speechome Project at the 28th Annual Cognitive Science Conference in July.
"Just as the Human Genome Project illuminates the innate genetic code that shapes us, the Speechome project is an important first step toward creating a map of how the environment shapes human development and learning," said Frank Moss, director of the Media Lab.
To conduct the Human Speechome Project, Roy has installed 11 overhead, omni-directional fisheye video cameras and 14 ceiling-mounted microphones that can record all activity in his home. A 5-terabyte disk cache in the basement stores data temporarily until it is physically carted back to the Media Lab for analysis.
Recording an average of 12 to 14 hours a day, Roy and his wife are already gathering more than 300 gigabytes of compressed data daily. To retain control over their privacy, every room is equipped with video-off and audio-off controls, as well as an "oops" button that erases previously recorded data.
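For readers curious how these figures fit together, a rough back-of-envelope check is possible; the sketch below is illustrative only, using the approximate values quoted in this article (25 sensor streams, roughly 13 recording hours a day, 300 gigabytes a day over three years), not the project's actual accounting.

```python
# Back-of-envelope check of the article's figures (assumed values, not project code).
streams = 11 + 14          # 11 cameras plus 14 microphones
hours_per_day = 13         # midpoint of the reported 12-14 hours
days = 3 * 365             # three-year recording period

stream_hours = streams * hours_per_day * days
print(f"Total recorded stream-hours: {stream_hours:,}")
# ~355,875 -- in line with the quoted figure of some 400,000 hours

daily_gb = 300             # compressed data gathered per day
total_tb = daily_gb * days / 1000
print(f"Total compressed data: ~{total_tb:,.0f} TB")
# ~329 TB -- comfortably within the petabyte storage system described below
```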
Once at the Media Lab, the data is stored in a massive petabyte (1 million gigabyte) disk storage system donated by several companies: Bell Microproducts, Seagate Technology, Marvell and Zetera. To test hypotheses of how children learn, Roy's team will develop machine learning systems that "step into the shoes" of his son by processing the sights and sounds of three years of life at home. The effort constitutes one of the most extensive scientific analyses of long-term infant learning patterns ever undertaken.
"It is not enough to simply capture and store all these data using conventional means," Roy noted. "Instead we need to keep all the information online so that we can do rapid exploration of patterns hidden within the data."
Given the voluminous outpouring of data, new visualization techniques have emerged from the project in an effort to expose basic movement patterns within the home (e.g., a person moving from room to room), as well as more complex behaviors (e.g., changing a diaper or putting away dishes). In addition, a variety of speech and video processing algorithms are under development to start making sense of behavioral and communication patterns embedded in the data.
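The article does not detail these algorithms. Purely as an illustration of the simplest case, detecting that something moved between two overhead-camera frames, here is a minimal frame-differencing sketch; the function name, frame sizes, and threshold are all hypothetical, and the project's actual processing is far more sophisticated.

```python
import numpy as np

def motion_centroid(prev_frame: np.ndarray, frame: np.ndarray, threshold: int = 30):
    """Return the (row, col) centroid of pixels that changed between two
    grayscale frames, or None if nothing moved. A toy stand-in for the
    Speechome team's actual video-processing pipeline."""
    # Widen the dtype before subtracting so the difference cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    moved = np.argwhere(diff > threshold)  # coordinates of changed pixels
    if moved.size == 0:
        return None
    return moved.mean(axis=0)              # average position of the motion

# Hypothetical usage: two 480x640 grayscale frames with a bright moving blob.
prev = np.zeros((480, 640), dtype=np.uint8)
curr = prev.copy()
curr[200:260, 300:360] = 200
print(motion_centroid(prev, curr))         # -> approx [229.5, 329.5]
```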
In supporting Roy's project, Moss noted the potential applications that technology developed for Speechome might have.
"Equally exciting are the 'spinoff' opportunities that could result from this research. The innovative tools that are being developed for storing and mining thousands of terabytes of speech and video data offer enormous potential for breaking open new business opportunities for a broad range of industries -- from security to Internet commerce," Moss said.
The project has received seed funding from the National Science Foundation (NSF). Christopher Kello, director of the Perception, Action and Cognition program of the NSF, said, "This project will constitute an unprecedented record of the interactions and environmental cues that contribute to the acquisition of language, as well as other social, cognitive and motor skills."
Any downside? "After we had installed all the equipment for data collection and in-house storage, our home electricity bill quadrupled," Roy said.
A version of this article appeared in MIT Tech Talk on May 17, 2006.