When Drums Talk: How We Distinguish Speech from Music
We are surrounded by all kinds of sounds, and we are usually good at telling them apart. For instance, when we turn on the radio, we immediately notice whether music is playing or someone is talking. But what happens when speech and music sound similar? Which sound characteristics help us distinguish them? A team of scientists from the Max Planck Institute for Empirical Aesthetics in Frankfurt, the Max Planck NYU Center for Language, Music, and Emotion (CLaME), and Arizona State University decided to investigate this question.
Music and language processing have been compared repeatedly, but similarities and differences between the two domains are challenging to quantify. This is particularly the case when the domains overlap, as happens, for example, with rhymes or rap music. To better understand the boundaries between these two domains, the international research team ran an online study involving more than one hundred people from a total of 15 different native-language backgrounds.
The study focused on the “talking” Dùndún drum used in southwestern Nigeria as both a musical instrument and a medium of communication. This drum imitates the tonal language of the Yorùbá, thus creating what is known as a “speech surrogate.” Participants in the study were provided with basic knowledge about the Dùndún drum, although roughly half of them were already familiar with it.
The researchers compared the acoustic characteristics of drum speech vs. drum music in recordings of both. They also asked participants to listen to the same recordings and indicate whether they thought they were hearing speech or music.
“Most participants were able to identify a large number of the excerpts in the way they were intended by the performer—albeit with an unsurprising bias towards the music-like category. Those who were already familiar with the instrument did particularly well, but the others did better than they would have if they had just chosen the answer randomly,” explains Pauline Larrouy-Maestri of the Max Planck Institute for Empirical Aesthetics.
With the data they collected, the researchers developed a statistical model that can be used to predict when a sound sample will be perceived as music-like or speech-like. The model shows that listeners rely on a number of acoustic features to make this distinction.
Of these features, loudness, pitch, timbre, and timing were found to be significant. For example, a regular rhythm and frequent changes in timbre make a sequence sound more music-like, while lower intensity and fewer changes in pitch make it sound more speech-like. Familiarity with the instrument appears to influence how a listener registers these acoustic features.
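For readers who want a concrete picture of what such a predictive model can look like, the following is a minimal sketch, not the researchers' actual model. It assumes a plain logistic regression, uses purely synthetic numbers rather than the study's data, and the four feature names are illustrative stand-ins for the timing, timbre, loudness, and pitch cues described above.

```python
import math
import random

# Hypothetical feature names standing in for the cues named in the article.
FEATURES = ["rhythm_regularity", "timbre_change_rate", "intensity", "pitch_change_rate"]


def make_synthetic_excerpts(n, seed=0):
    """Generate made-up excerpts; label 1 = music-like, 0 = speech-like.

    Assumption for illustration only: music-like excerpts score somewhat
    higher on every feature, matching the directions reported in the text.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        is_music = rng.random() < 0.5
        center = 0.5 if is_music else -0.5
        x = [rng.gauss(center, 1.0) for _ in FEATURES]
        data.append((x, 1 if is_music else 0))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_music_prob(w, b, x):
    """Probability that an excerpt with features x is heard as music-like."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(data, lr=0.1, epochs=200):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    w = [0.0] * len(FEATURES)
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = predict_music_prob(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


train = make_synthetic_excerpts(400, seed=1)
held_out = make_synthetic_excerpts(200, seed=2)
w, b = fit_logistic(train)

# Share of held-out excerpts whose predicted category matches the label.
accuracy = sum(
    (predict_music_prob(w, b, x) >= 0.5) == (y == 1) for x, y in held_out
) / len(held_out)

for name, weight in zip(FEATURES, w):
    print(f"{name}: {weight:+.2f}")
print(f"held-out accuracy: {accuracy:.2f}")
```

In a sketch like this, a positive weight means that higher values of a feature push the prediction toward "music-like", which mirrors the kind of statement the study makes about regular rhythm or frequent timbre changes. Modeling the role of familiarity would require additional terms that this example leaves out.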
The study’s findings, recently published in the journal “Frontiers in Psychology”, provide empirical evidence for the relevance of acoustic features as well as insights into the role of a listener’s cultural background, thus producing new knowledge about the formation of perceptual categories in speech and music.
Max-Planck-Institut für empirische Ästhetik
Durojaye, C., Fink, L., Roeske, T., Wald-Fuhrmann, M. and Larrouy-Maestri, P. (2021). Perception of Nigerian Dùndún Talking Drum Performances as Speech-Like vs. Music-Like: The Role of Familiarity and Acoustic Cues. Frontiers in Psychology 12:652673.