1. Introduction
The ease with which people perceive and enjoy music provides cognitive science with significant challenges. Among the most important of these is the perception of time and temporal regularity in auditory sequences. Listeners tend to perceive musical sequences as highly regular; people without any musical training snap their fingers or clap their hands to the temporal structure they perceive in music with seemingly little effort. In particular, listeners hear sounded musical events in terms of durational categories corresponding to the eighth-notes, quarter-notes, half-notes, and so forth, of musical notation. This effortless ability to perceive temporal regularity in musical sequences is remarkable because the actual event durations in music performances deviate significantly from the regularity of duration categories (Clarke, 1989; Gabrielsson, 1987; Palmer, 1989; Repp, 1990). In addition, listeners perceive these temporal fluctuations or deviations from duration categories as systematically related to performers' musical intentions (Clarke, 1985; Palmer, 1996a; Sloboda, 1983; Todd, 1985). For example, listeners tend to perceive duration-lengthening near structural boundaries as indicative of phrase endings (while still hearing regularity). Thus, on the one hand, listeners perceive durations categorically in spite of temporal fluctuations, while on the other hand listeners perceive those fluctuations as related to the musical intentions of performers (Sloboda, 1985; Palmer, 1996a). Music performance provides an excellent example of the temporal fluctuations with which listeners must cope in the perception of music and other complex auditory sequences.
The perceptual constancy that listeners experience in the presence of physical change is not unique to music. Listeners recognize speech, for example, amidst tremendous variability across speakers. Early views of speaker normalization treated extralinguistic (nonstructural) variance as noise to be filtered out in speech recognition. More recently, talker-specific characteristics of speech, such as gender, dialect, and speaking rate, have come to be viewed as helpful for the identification of linguistic categories (cf. Nygaard, Sommers, & Pisoni, 1994; Pisoni, 1997). We take a similar view here: stimulus variability in music performances may help listeners identify rhythmic categories. Patterns of temporal variability in music performance have been shown to be systematic and intentional (Bengtsson & Gabrielsson, 1983; Palmer, 1989), and are likely to be perceptually informative.
We describe an approach to rhythm perception that addresses both the perceptual categorization of continuously changing temporal events and perceptual sensitivity to those temporal fluctuations in music performance. Our approach assumes that people perceive a rhythm (a complex, temporally patterned sequence of durations) in relation to the activity of a small system of internal oscillations that reflects the rhythm's temporal structure. Internal self-sustained oscillations are the perceptual correlates of beats; multiple internal oscillations that operate at different periods (but with specific phase and period relations) correspond to the hierarchical levels of temporal structure perceived in music. The relationship between this system of internal oscillations and the external rhythm of an auditory sequence governs both listeners' categorization of temporal intervals and their response to temporal fluctuations as deviations from categorical expectations.
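To make the idea of an internal oscillation tracking an external rhythm concrete, the following is a minimal sketch of a single beat tracker that corrects its phase and period after each event onset. It is not the model described in this article: the linear correction rule, the function name `track_beats`, and the gain parameters `phase_gain` and `period_gain` are all assumptions chosen for illustration. The returned deviations stand in, loosely, for departures of the performance from the tracker's temporal expectations.

```python
def track_beats(onsets, period, phase_gain=0.5, period_gain=0.2):
    """Follow a beat through onset times (in seconds) with linear correction.

    Hypothetical illustration only: a simple phase/period correction rule,
    not the oscillator model described in the article.
    Returns the final period estimate and the signed deviation of each
    onset from the beat time that was expected for it.
    """
    expected = onsets[0]  # assume the first onset falls on a beat
    deviations = []
    for onset in onsets:
        # Step the expectation forward while the onset lies more than
        # half a period ahead of it, so it refers to the nearest beat.
        while onset - expected > period / 2:
            expected += period
        # Positive deviation: the event arrived later than expected.
        error = onset - expected
        deviations.append(error)
        # Nudge both the expected beat time and the period toward the event.
        expected += phase_gain * error
        period += period_gain * error
    return period, deviations


# A performance that is steadily slower than the initial expectation:
# onsets every 0.6 s, tracker starting with a period of 0.5 s.
slow_onsets = [0.6 * i for i in range(30)]
final_period, devs = track_beats(slow_onsets, period=0.5)
print(round(final_period, 2))  # settles close to the performed period
```

With these gains the tracker adapts over a few beats rather than instantly, so a single lengthened event produces a large deviation without immediately redefining the beat. Several such trackers with related periods could represent several metrical levels, but how the levels are coupled is left open here.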
This article describes a computational model of the listeners' perceptual response: a dynamical system that tracks temporal structures amidst the expressive variations of music performance, and interprets deviations from its temporal expectations as musically expressive. We test the model in two experiments by examining its response to performances in which the same pianists performed the same piece of music with different interpretations (Palmer, 1996a; Palmer & van de Sande, 1995). We consider two types of expressive timing common to music performance that correlate with performers' musical intentions: lengthening of events that mark phrase structure boundaries, and temporal spread or asynchrony among chord tones (tones that are notated as simultaneous) that mark the melody (primary musical voice). Two aspects of the model of rhythm perception are assessed. First, we evaluate the model's ability to track different temporal periodicities within music performances. This tests its capacity for following temporal regularity in the face of significant temporal fluctuation. Second, we compare the model's ability to detect temporal irregularities against the structural intentions of performers. This gauges its sensitivity to musically expressive temporal gestures that are known to be informative for listeners. Additionally, we observe that some types of small but systematic temporal irregularities (chord asynchronies) can improve tracking in the presence of much larger temporal fluctuations (rubato). Comparisons of the model's beat-tracking of systematic temporal fluctuations and of random fluctuations in simulated performances indicate that performed deviations from precise temporal regularity are not noise; rather, temporal fluctuations are informative for listeners in a variety of ways. 
In the next section, we review music-theoretic descriptions of temporal structures in music, and in the following section, we describe the temporal fluctuations that occur in music performance.
1.1. Rhythm, metrical structure, and music notation
Generally speaking, rhythm is the whole feeling of movement in time, including pulse, phrasing, harmony, and meter (Apel, 1972; Lerdahl & Jackendoff, 1983). More commonly, however, rhythm refers to the temporal patterning of event durations in an auditory sequence. Beats are perceived pulses that mark equally spaced (subjectively isochronous) points in time, either in the form of sounded events or hypothetical (unsounded) time points. Beat perception is established by the presence of musical events; however, once a sense of beat has been established, it may continue in the mind of the listener even if the event train temporarily comes into conflict with the pulse series, or after the event train ceases (Cooper & Meyer, 1960). This point is an important motivator for our theoretical approach; once established, beat perception must be able to continue in the presence of stimulus conflict or in the absence of stimulus input. Music theories describe metrical structure as an alternation of strong and weak beats over time. One theory conceptualizes metrical structure as a grid of beats at various time scales (Lerdahl & Jackendoff, 1983), as shown in Fig. 1; these are similar to metrical grids proposed in phonological theories of speech (Liberman & Prince, 1977). According to this notational convention, horizontal rows of dots represent levels of beats, and the relative spacing and alignment among the dots at adjacent levels captures the relationship between the hypothetical periods and phases of the beat levels. Metrical accents are indicated in the grid by the number of coinciding dots. Points at which many beats coincide are called strong beats; points at which few beats coincide are called weak beats. 
Although these metrical grids are idealized (music performances contain more complex period and phase relationships among beat levels than those captured by metrical grids), the music-theoretic invariants reflected in these grids inform our model of the perception of temporal regularity in music.
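The grid convention described above can be sketched in a few lines of code. The example assumes an idealized grid in which every beat level is strictly periodic and all levels are aligned at position 0; the function names `accent_counts` and `render_grid`, and the choice of periods 1, 2, 4, and 8 (in units of the fastest beat level), are illustrative assumptions rather than part of the theory.

```python
def accent_counts(n_positions, level_periods):
    """Count, for each time position, how many beat levels have a beat there.

    level_periods lists the period of each level in units of the fastest
    level. All levels are assumed to be in phase at position 0.
    """
    return [
        sum(1 for period in level_periods if pos % period == 0)
        for pos in range(n_positions)
    ]


def render_grid(n_positions, level_periods):
    """Draw one row of dots per beat level, fastest level first."""
    rows = []
    for period in level_periods:
        row = " ".join("." if pos % period == 0 else " "
                       for pos in range(n_positions))
        rows.append(row.rstrip())
    return "\n".join(rows)


# Eight positions with four nested levels (periods 1, 2, 4, 8).
levels = [1, 2, 4, 8]
print(render_grid(8, levels))
print(accent_counts(8, levels))  # [4, 1, 2, 1, 3, 1, 2, 1]
```

Positions with the highest counts correspond to strong beats and positions with the lowest counts to weak beats. Performed music departs from this idealization, since the period and phase relations among levels are not fixed.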