The ease with which people perceive and enjoy music provides cognitive science with significant challenges. Among the most important of these is the perception of time and temporal regularity in auditory sequences. Listeners tend to perceive musical sequences as highly regular; people without any musical training snap their fingers or clap their hands to the temporal structure they perceive in music with seemingly little effort. In particular, listeners hear sounded musical events in terms of durational categories corresponding to the eighth-notes, quarter-notes, half-notes, and so forth, of musical notation. This effortless ability to perceive temporal regularity in musical sequences is remarkable because the actual event durations in music performances deviate significantly from the regularity of duration categories (Clarke, 1989; Gabrielsson, 1987; Palmer, 1989; Repp, 1990). In addition, listeners perceive these temporal fluctuations or deviations from duration categories as systematically related to performers' musical intentions (Clarke, 1985; Palmer, 1996a; Sloboda, 1983; Todd, 1985). For example, listeners tend to perceive duration-lengthening near structural boundaries as indicative of phrase endings (while still hearing regularity). Thus, on the one hand, listeners perceive durations categorically in spite of temporal fluctuations, while on the other hand listeners perceive those fluctuations as related to the musical intentions of performers (Sloboda, 1985; Palmer, 1996a). Music performance provides an excellent example of the temporal fluctuations with which listeners must cope in the perception of music and other complex auditory sequences.
The perceptual constancy that listeners experience in the presence of physical change is not unique to music. Listeners recognize speech, for example, amidst tremendous variability across speakers. Early views of speaker normalization treated extralinguistic (nonstructural) variance as noise to be filtered out in speech recognition. More recently, talker-specific characteristics of speech such as gender, dialect, and speaking rate are viewed as helpful for the identification of linguistic categories (cf. Nygaard, Sommers, & Pisoni, 1994; Pisoni, 1997). We take a similar view here: stimulus variability in music performances may help listeners identify rhythmic categories. Patterns of temporal variability in music performance have been shown to be systematic and intentional (Bengtsson & Gabrielsson, 1983; Palmer, 1989), and are likely to be perceptually informative.
We describe an approach to rhythm perception that addresses both the perceptual categorization of continuously changing temporal events and perceptual sensitivity to those temporal fluctuations in music performance. Our approach assumes that people perceive a rhythm (a complex, temporally patterned sequence of durations) in relation to the activity of a small system of internal oscillations that reflects the rhythm's temporal structure. Internal self-sustained oscillations are the perceptual correlates of beats; multiple internal oscillations that operate at different periods (but with specific phase and period relations) correspond to the hierarchical levels of temporal structure perceived in music. The relationship between this system of internal oscillations and the external rhythm of an auditory sequence governs both listeners' categorization of temporal intervals and their response to temporal fluctuations as deviations from categorical expectations.
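To make this assumption concrete, the sketch below implements one such self-sustained oscillation as a discrete-time unit that adapts its phase and period to event onsets. It loosely follows the phase-attractive coupling form of Large & Kolen (1994), but the coupling function and the strengths eta_phase and eta_period are illustrative choices, not the parameters of the model tested in our experiments.

```python
import math

class AdaptiveOscillator:
    """A minimal sketch of one internal self-sustained oscillation.

    Loosely follows the phase- and period-adaptation of Large & Kolen
    (1994); coupling strengths are illustrative assumptions.
    """

    def __init__(self, period, eta_phase=0.6, eta_period=0.3):
        self.period = period          # current period estimate (s)
        self.next_beat = 0.0          # time of the next expected beat (s)
        self.eta_phase = eta_phase    # phase-coupling strength
        self.eta_period = eta_period  # period-coupling strength

    def process_onset(self, t):
        """Adapt to an event onset at time t; return its relative phase."""
        # Signed phase of the onset relative to the expected beat, in
        # (-0.5, 0.5]: positive means the event arrived late.
        phi = (t - self.next_beat) / self.period
        phi -= round(phi)
        # Phase-attractive coupling pulls the expected beat toward the onset.
        force = math.sin(2 * math.pi * phi) / (2 * math.pi)
        self.next_beat += self.eta_phase * force * self.period
        # Period adaptation lets the unit speed up or slow down with rubato.
        self.period *= 1 + self.eta_period * force
        # Advance the expectation past the current onset.
        while self.next_beat <= t:
            self.next_beat += self.period
        return phi
```

The residual phases returned by process_onset are the deviations from categorical expectation referred to above: near zero when the oscillation is tracking well, and systematically positive where a performer lengthens events.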
This article describes a computational model of listeners' perceptual response: a dynamical system that tracks temporal structure amidst the expressive variations of music performance and interprets deviations from its temporal expectations as musically expressive. We test the model in two experiments by examining its response to performances in which the same pianists performed the same piece of music with different interpretations (Palmer, 1996a; Palmer & van de Sande, 1995). We consider two types of expressive timing common to music performance that correlate with performers' musical intentions: lengthening of events that mark phrase structure boundaries, and temporal spread or asynchrony among chord tones (tones that are notated as simultaneous) that marks the melody (primary musical voice). Two aspects of the model of rhythm perception are assessed. First, we evaluate the model's ability to track different temporal periodicities within music performances; this tests its capacity for following temporal regularity in the face of significant temporal fluctuation. Second, we compare the temporal irregularities that the model detects with the structural intentions of performers; this gauges its sensitivity to musically expressive temporal gestures that are known to be informative for listeners. In addition, we observe that some types of small but systematic temporal irregularities (chord asynchronies) can improve tracking in the presence of much larger temporal fluctuations (rubato). Comparisons of the model's beat-tracking of systematic temporal fluctuations and of random fluctuations in simulated performances indicate that performed deviations from precise temporal regularity are not noise; rather, temporal fluctuations are informative for listeners in a variety of ways. In the next section, we review music-theoretic descriptions of temporal structures in music; in the following section, we describe the temporal fluctuations that occur in music performance.
1.1. Rhythm, metrical structure, and music notation
Generally speaking, rhythm is the whole feeling of movement in time, including pulse, phrasing, harmony, and meter (Apel, 1972; Lerdahl & Jackendoff, 1983). More commonly, however, rhythm refers to the temporal patterning of event durations in an auditory sequence. Beats are perceived pulses that mark equally spaced (subjectively isochronous) points in time, either in the form of sounded events or hypothetical (unsounded) time points. Beat perception is established by the presence of musical events; however, once a sense of beat has been established, it may continue in the mind of the listener even if the event train temporarily comes into conflict with the pulse series, or after the event train ceases (Cooper & Meyer, 1960). This point is an important motivator for our theoretical approach; once established, beat perception must be able to continue in the presence of stimulus conflict or in the absence of stimulus input. Music theories describe metrical structure as an alternation of strong and weak beats over time. One theory conceptualizes metrical structure as a grid of beats at various time scales (Lerdahl & Jackendoff, 1983), as shown in Fig. 1; these are similar to metrical grids proposed in phonological theories of speech (Liberman & Prince, 1977). According to this notational convention, horizontal rows of dots represent levels of beats, and the relative spacing and alignment among the dots at adjacent levels captures the relationship between the hypothetical periods and phases of the beat levels. Metrical accents are indicated in the grid by the number of coinciding dots. Points at which many beats coincide are called strong beats; points at which few beats coincide are called weak beats. Although these metrical grids are idealized (music performances contain more complex period and phase relationships among beat levels than those captured by metrical grids), the music-theoretic invariants reflected in these grids inform our model of the perception of temporal regularity in music.
[Figure 1 omitted: a metrical grid for a piece notated in 3/8 (after Lerdahl & Jackendoff, 1983); horizontal rows of dots represent beat levels, and coinciding dots mark strong beats.]
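The grid itself is straightforward to represent computationally. The sketch below builds beat levels for a 3/8 meter like that of Fig. 1, using periods of one, three, and six eighth-notes (our illustrative choice of levels), and reads off metrical accent as the number of coinciding dots.

```python
def metrical_accent(position, level_periods):
    """Number of beat levels whose dots coincide at a grid position.

    position: time point in eighth-note units from the start.
    level_periods: the period of each beat level, in eighth-note units.
    """
    return sum(1 for p in level_periods if position % p == 0)

# Illustrative levels for 3/8: eighth-note, one-measure (3 eighths),
# and two-measure (6 eighths) beat levels.
levels = [1, 3, 6]
print([metrical_accent(pos, levels) for pos in range(6)])
# -> [3, 1, 1, 2, 1, 1]: position 0 is the strongest beat (all levels
#    coincide), position 3 is intermediate, and the rest are weak.
```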
Western conventions of music notation provide a categorical approximation to the timing of a music performance. Music notation specifies event durations categorically; durations of individual events are notated as integer multiples or subdivisions of the most prominent or salient metrical level. Events are grouped into measures that convey specific temporal patterns of accentuation (i.e. the meter). For example, the musical piece notated in Fig. 1 with a time signature of 3/8 uses an eighth-note as its basic durational element, and the durational equivalent of three eighth-notes defines a metrical unit of one measure, in which the first position is a strong beat and the others are weaker. Although notated durations refer to event onset-to-offset intervals, listeners tend to perceive musical events in terms of onset-to-onset intervals (or inter-onset intervals, IOIs), due to the increased salience of onsets relative to offsets. Hereafter we refer to musical event durations in terms of IOIs.
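As a concrete illustration of this representation, the sketch below computes IOIs from a list of onset times and rounds each to its nearest duration category; the onsets, the 0.25 s eighth-note, and the category inventory are hypothetical values chosen for illustration.

```python
# Hypothetical onset times (s) from a slightly expressive performance,
# assuming a nominal eighth-note duration of 0.25 s.
onsets = [0.00, 0.26, 0.49, 0.76, 1.27, 1.52]
iois = [b - a for a, b in zip(onsets, onsets[1:])]

eighth = 0.25
categories = {1: "eighth", 2: "quarter", 3: "dotted quarter"}  # in eighths

for ioi in iois:
    # Assign each IOI to the category with the nearest nominal duration.
    n = min(categories, key=lambda k: abs(ioi - k * eighth))
    print(f"IOI {ioi:.2f} s -> {categories[n]} (nominal {n * eighth:.2f} s)")
```

Context-free rounding of this kind succeeds only while fluctuations stay small; large expressive fluctuations defeat it, which is one motivation for the oscillator-based approach developed below.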
In this article we focus on the role of meter in the perception of rhythm. Listeners' perception of duration categories in an auditory sequence is influenced by the underlying meter; the same auditory sequence can be interpreted to have a different rhythmic pattern when presented in different metrical contexts (Clarke, 1987; Palmer & Krumhansl, 1990). To model meter perception, we assume that a small set of internal oscillations operates at periods approximating those of the hierarchical metrical levels shown in Fig. 1. When driven by musical rhythms, such oscillations phase-lock to the external musical events. Previous work has shown that this framework provides both flexibility in tracking temporally fluctuating rhythms (Large & Kolen, 1994; Large, 1996) and a concurrent ability to discriminate temporal deviations (Large & Jones, 1999). In the current study, we extend this framework to a more natural and complex case that provides a robust test of the model: multivoiced music performances that contain large temporal fluctuations. Most important, the model proposed here predicts that temporal fluctuations can aid the perception of auditory events, as we show in two experiments. The next section describes what information is available in the temporal fluctuations of music performance.
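Reusing the AdaptiveOscillator sketched earlier, meter perception under this assumption amounts to running a small bank of such units with nested periods over the same onsets; the periods and onset times below are illustrative. (In the full model, coupling is also gated by an attentional pulse so that onsets far from an expected beat exert little force, Large & Jones, 1999; this sketch omits that gating.)

```python
# A hypothetical onset list in 3/8 with mild rubato; two oscillations
# track it simultaneously at the eighth-note and measure levels.
onsets = [0.00, 0.25, 0.51, 0.77, 1.02, 1.26, 1.55, 1.81, 2.06]
bank = {
    "eighth (0.25 s)": AdaptiveOscillator(period=0.25),
    "measure (0.75 s)": AdaptiveOscillator(period=0.75),
}

for t in onsets:
    for name, osc in bank.items():
        phi = osc.process_onset(t)
        # Near-zero residuals indicate phase-locking at that level;
        # systematic residuals flag expressive deviation.
        print(f"t = {t:4.2f} s  {name}: residual phase {phi:+.2f}")
```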
1.2. Temporal fluctuations in music performance
The complex timing of music performance often reflects a musician's attempt to convey an interpretation of musical structure to listeners. The structural flexibility typical of Western tonal music allows performers to interpret musical pieces in different ways. Performers highlight interpretations of musical structure through the use of expressive variations in frequency, timing, intensity, and timbre (cf. Clarke, 1988; Nakamura, 1987; Palmer, 1997; Repp, 1992; Sloboda, 1983). For example, different performers can interpret the same musical piece with different phrase structures (Palmer, 1989, 1992); each performance reflects slowing down or pausing at events that are intended as phrase endings, similar to phrase-final lengthening in speech. Furthermore, listeners are influenced by these temporal fluctuations; the presence of phrase-final lengthening in different performances of the same music influenced listeners' judgments of phrase structure, indicating that the characteristic temporal fluctuations are information-bearing (Palmer, 1988). Thus, a common view is that temporal fluctuations in music performance serve to express structural relationships such as phrase structure (Clarke, 1982; Gabrielsson, 1974); these large temporal fluctuations provide a challenging test for the model of beat perception described here.
Temporal fluctuations in music performance may also mark the relative importance of different musical parts or voices. Musical instruments such as the piano provide few timbral cues to differentiate among simultaneous voices, and the problem of determining which tones or features belong to the same voice or part over time is difficult; this problem is often referred to as stream segregation (cf. Bregman, 1990). Most Western tonal music contains multiple voices that co-occur, and performers are usually given some freedom to interpret the relative importance of those voices. Performers often provide cues such as temporal or intensity fluctuations that emphasize the melody, or most important part (Randel, 1986). Early recordings of piano performance documented a tendency of pianists to play chordal tones (tones notated as simultaneous) with asynchronies of up to 70 ms across chord-tone onsets (Henderson, 1936; Vernon, 1936). Palmer (1996a) compared pianists' notated interpretations of melody (the most important voice) with the expressive timing patterns of their performances. Events interpreted as melody were louder and preceded other events in chords by 20-50 ms (termed melody leads). Although the relative importance of intensity and temporal cues in melody perception is unknown (see also Repp, 1996), the temporal cues alone subsequently affected listeners' perception of melodic intentions in some performances (Palmer, 1996a). Thus, these small temporal fluctuations provide a subtle test for the model we describe here.
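Both kinds of fluctuation are simple to state quantitatively. The sketch below generates a hypothetical performance of the sort used to probe the model: nominal chord onsets are stretched approaching a phrase boundary, and melody tones are shifted earlier than the rest of their chords. The magnitudes (20% lengthening, a 30 ms melody lead) are illustrative values within the ranges reported above, not measurements from the performances we analyze.

```python
def perform(nominal_onsets, phrase_end, lengthen=1.2, melody_lead=0.03):
    """Apply phrase-final lengthening and melody leads to chord onsets.

    nominal_onsets: metronomic chord-onset times (s).
    phrase_end: index of the phrase-final chord; the IOI leading into
        it is stretched by `lengthen`, delaying it and all later chords.
    melody_lead: seconds by which the melody tone precedes its chord.
    Returns (melody_onsets, accompaniment_onsets).
    """
    onsets = list(nominal_onsets)
    # Phrase-final lengthening: stretch the IOI into the boundary event.
    delay = (onsets[phrase_end] - onsets[phrase_end - 1]) * (lengthen - 1)
    for i in range(phrase_end, len(onsets)):
        onsets[i] += delay
    # Melody lead: the melody tone of each chord sounds slightly early.
    melody = [t - melody_lead for t in onsets]
    return melody, onsets

melody, accomp = perform([i * 0.25 for i in range(8)], phrase_end=4)
```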
Which cues in music performances mark metrical structure? Although a variety of cues bear some relationship to meter, no single cue marks the meter. Melody leads tend to coincide with meter: pianists placed larger asynchronies (melody preceding other note events) on strong metrical beats than on weak beats, in both well-learned and unpracticed performances (Palmer, 1989, 1996a). Performers also mark the meter with variations in event intensity or duration (Shaffer, Clarke, & Todd, 1985; Sloboda, 1983). Which cues mark the meter most strongly can change with musical context. Drake and Palmer (1993) examined cues for metrical, melodic, and rhythmic grouping structures in piano performances of simple melodies and complex multivoiced music. Metrical accents and rhythmic groups (groups of short and long durations) were marked by intensity, with strong metrical beats and long notated durations performed louder than other events. However, the performance cues that coincided with important metrical locations changed across musical contexts. These findings suggest that performance cues alone may not explain listeners' perception of metrical regularity across many contexts. We test a model of listeners' expectancies for metrical regularity that may aid perception of meter in the absence of consistent cues.
1.3. Perceptual cues to musical meter
Which types of stimulus information do listeners use to perceive the temporal regularities of meter? Several studies suggest that listeners are sensitive to multiple temporal periodicities in complex auditory sequences (Jones & Yee, 1997; Palmer & Krumhansl, 1990; Povel, 1981). The statistical regularities of Western tonal music may provide some cues to temporal periodicities. For a given metrical level to be instantiated in a musical sequence, it is necessary that a sufficient number of successive beats be sounded to establish that periodicity. Statistical analyses of musical compositions indicate that composers vary the frequency of events across metrical levels (Palmer & Krumhansl, 1990; Palmer, 1996b), which provides sufficient information to differentiate among meters (Brown, 1992). Although this approach is limited by its reliance on a priori knowledge about the contents of an entire musical sequence, it supports our assumption that musical sequences contain perceptual cues to multiple temporal periodicities, which are perceived simultaneously during rhythm perception.
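A deliberately crude way to operationalize this statistical cue, under our own illustrative assumptions rather than the specific analyses of Palmer & Krumhansl (1990) or Brown (1992), is to score each candidate period and phase by how many onsets in the sequence it catches, exploiting the higher event frequency at strong metrical positions.

```python
def meter_score(onsets, period, phase, tol=0.05):
    """Count onsets within `tol` seconds of a candidate beat grid."""
    return sum(
        1 for t in onsets
        if abs((t - phase + period / 2) % period - period / 2) < tol
    )

# Hypothetical onsets with more events at measure beginnings (0.75 s apart).
onsets = [0.00, 0.25, 0.75, 1.50, 1.75, 2.00, 2.25, 3.00]
candidates = [(0.50, 0.0), (0.75, 0.0), (1.00, 0.0)]
best = max(candidates, key=lambda c: meter_score(onsets, *c))
print("best (period, phase):", best)  # -> (0.75, 0.0)
```

As the limitation noted above implies, this scoring requires the entire sequence in advance; the oscillator framework instead tracks periodicities as the sequence unfolds.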
One problem faced by models …