1. Introduction
Infants are born into a rich and complex environment from which they construct mental representations to model structure that they find in the world. These representations enable infants to understand and predict their surroundings and ultimately to achieve their goals. They accomplish this using a combination of evolved innate structures and powerful learning algorithms.
To explore issues of early language learning, we have developed CELL (Cross-channel Early Lexical Learning), a computational model that acquires words from multimodal sensory input. Set in an information-theoretic framework, the model acquires a lexicon by finding and statistically modeling consistent intermodal structure. The model was implemented using current methods of computer vision and speech processing, which allow the system to process natural speech and images directly without relying on manual annotation or transcription. Although the model is limited in its ability to handle complex scenes and noisy acoustic signals, it demonstrates the potential of these techniques for modeling the cognitive processes involved in language acquisition.
CELL learns by finding and modeling consistent structure across channels of sensor data. The model relies on two kinds of mechanisms: innate mechanisms that specify how speech and visual input are represented and compared, and probabilistic learning mechanisms that integrate information across modalities. The innate mechanisms are motivated by empirical findings in the infant development literature. CELL has been implemented for the task of learning shape names from a database of infant-directed speech recordings paired with images of objects. (1)
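One standard information-theoretic way to score how consistently a spoken segment and a visual category co-occur is mutual information. The sketch below is not taken from CELL: it assumes each utterance-image pair has already been reduced to two hypothetical boolean events ("this speech segment was heard", "this shape was seen"), and the function name `mutual_information` is illustrative. CELL's actual representations, matching criteria, and scoring are not described in this section.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information, in bits, between two binary events.

    pairs: list of (speech_match, shape_match) booleans, one per
    utterance-image pair. Both events are assumed, illustrative inputs.
    """
    n = len(pairs)
    joint = Counter(pairs)  # counts of each (speech, shape) combination
    p_speech = sum(1 for s, _ in pairs if s) / n
    p_shape = sum(1 for _, v in pairs if v) / n
    marg_speech = {True: p_speech, False: 1 - p_speech}
    marg_shape = {True: p_shape, False: 1 - p_shape}

    mi = 0.0
    for (s, v), count in joint.items():
        p_sv = count / n  # observed joint probability (always > 0 here)
        mi += p_sv * math.log2(p_sv / (marg_speech[s] * marg_shape[v]))
    return mi


# A word that is heard exactly when the shape is present...
consistent = [(True, True), (False, False)] * 4
# ...versus a word whose occurrence is unrelated to the shape.
unrelated = [(True, True), (True, False), (False, True), (False, False)]

print(mutual_information(consistent))  # 1.0
print(mutual_information(unrelated))   # 0.0
```

High mutual information flags a speech-vision pairing that recurs more often than chance would predict, which is the kind of consistent cross-channel structure the text describes; independent events score zero.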
2. Problems of early lexical acquisition
CELL addresses three interrelated questions of early lexical acquisition. First, how do infants discover speech segments that correspond to the words of their language? Second, how do they learn perceptually grounded semantic categories? Third, tying these questions together, how do infants learn to associate linguistic units with the appropriate semantic categories?
Discovering the spoken units of a language is difficult because most utterances contain multiple connected words. Natural speech has no equivalent of the spaces between printed words: there are no reliable pauses or other cues that separate the continuous flow of words. Imagine hearing a foreign language for the first time. Without knowing any of its words, try to determine where the word boundaries fall in an utterance, or even how many words it contains. Infants first attempting to segment spoken input face a similarly difficult challenge, often referred to as the speech segmentation or word discovery problem. Our goal was to understand and model how semantically salient words are identified and extracted from fluent speech.
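To make the difficulty concrete, suppose an utterance is treated as a sequence of n units (syllables, say) with no boundary cues. Each of the n - 1 gaps between adjacent units may or may not be a word boundary, so there are 2^(n-1) candidate segmentations. The brute-force sketch below enumerates them; the syllable example and the function name `segmentations` are illustrative assumptions, not part of CELL.

```python
def segmentations(units):
    """Yield every way to split a sequence of units into contiguous chunks."""
    if not units:
        yield []
        return
    # Choose the length of the first chunk, then segment the remainder.
    for i in range(1, len(units) + 1):
        head = units[:i]
        for rest in segmentations(units[i:]):
            yield [head] + rest


syllables = ["pret", "ty", "ba", "by"]  # hypothetical unsegmented input
candidates = list(segmentations(syllables))
print(len(candidates))  # 8, i.e. 2 ** (4 - 1)
for c in candidates:
    print(c)
```

Only one of the eight candidates, `[['pret', 'ty'], ['ba', 'by']]`, matches the intended words, and the count doubles with every added syllable, so a learner needs some additional source of evidence to choose among them.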