The McGurk effect is a perceptual phenomenon that demonstrates an interaction between hearing and vision in speech perception. It is a compelling illusion in which humans perceive mismatched audiovisual speech as a completely different syllable. It suggests that speech perception is multimodal, that is, that it involves information from more than one sensory modality. The effect may be experienced when a video of one phoneme's production is dubbed with a sound recording of a different phoneme being spoken. Often, the perceived phoneme is a third, intermediate phoneme. For example, when the auditory syllable /ba-ba/ is dubbed over the lip movements of /ga-ga/, the perception is of /da-da/. The McGurk effect is sometimes called the McGurk-MacDonald effect.
It was first described by Harry McGurk and John MacDonald in a 1976 paper. The effect was discovered by accident: while conducting a study on how infants perceive language at different developmental stages, McGurk and his research assistant, MacDonald, asked a technician to dub a video with a phoneme other than the one spoken. When the video was played back, both researchers heard a third phoneme rather than the one spoken or mouthed in the video. McGurk and MacDonald originally believed that the effect resulted from the shared phonetic and visual properties of /b/ and /g/: the sound 'bah' is acoustically closer to 'dah', while 'gah' is visually closer to 'dah' than to 'bah'. These similarities create conflicting signals in the auditory and visual processing centers, and the brain, needing to find a common factor between the two, perceives 'dah'.[4] Research suggests that visual information plays an important role in how well we process auditory information, even when sufficient auditory information is presented in a clear and accurate format.[5] Listeners unconsciously watch mouth and facial movements, as a form of lip reading, to understand a speaker's meaning. When the facial area or facial movements are obscured, whether through the McGurk effect or lack of visual contact with the face, miscommunication is more likely.
Further studies have shown that the McGurk effect appears with other consonants and vowels and that it persists across whole sentences, though the illusion is stronger with certain vowels, consonants, and words.[6] Studies have shown that the McGurk effect is stronger when the auditory and visual syllable stimuli use matching vowel combinations than when they use non-matching vowel combinations. The same holds for matching and non-matching vowel combinations in spoken-word stimuli.
| Sounds/Matching Vowels | Words/Matching Vowels |
| --- | --- |
| ba/ga | bat/gat |
| be/ge | bad/gad |
| bi/gi | moo/goo |
| bo/go | bent/vest |
| bu/gu | might/die |
| Sounds/Nonmatching Vowels | Words/Nonmatching Vowels |
| --- | --- |
| ga/bi | bat/vet |
| ba/gi | bet/vat |
| bi/ga | mail/deal |
| gi/ba | mat/dead |
| be/gu | met/gal |
When subjects watched a video of a speaker mouthing "My gag kok me koo grive" dubbed with the audio "My bap pop me poo brive", most reported hearing "My dad taught me to drive".[7] The effect is very robust; that is, knowledge about it seems to have little influence on one's perception of it. Subjects can be told of the effect before watching a dubbed video and will still hear the third phoneme while watching, yet if they close their eyes they hear the actual auditory stimulus. Subjects report hearing a third phoneme even when watching a dubbed video of themselves mouthing the sound. In this respect the illusion differs from certain optical illusions, which break down once one "sees through" them. Precise synchronization of mouth movements and dubbed words, clarity of image, and the gaze patterns of the subjects play little role in whether subjects hear the third phoneme: the auditory stimulus can lag the visual stimulus by as much as 250 ms, or lead it by as much as 60 ms, before the effect begins to weaken.[8]
Research has shown that the McGurk effect occurs in languages other than English. It has a strong effect in Italian, German, Spanish, and Hungarian, and a weaker effect in Japanese, Chinese, and Thai. These latter languages have simpler phonological systems with fewer consonant contrasts and fewer visually distinct contrasts. It is possible that languages with more complex phonological characteristics require more attention to visual cues.[9] Using audiovisual presentation to decipher the lyrics of a song yields a better outcome than auditory stimuli alone, but the McGurk effect is not as strong for sung as for spoken words. This may be due to differences in the movement of the jaw and lips: in speech these movements are for the most part minimal, whereas in singing the mouth is opened wider and lip movements are exaggerated to produce higher pitches, fuller sound, and articulated sung vowels.[10]
Research into the McGurk effect is being used to produce more accurate speech-recognition programs that combine a video camera with lip-reading software. It has also been examined in relation to witness testimony: Wareham and Wright's 2005 study showed that inconsistent visual information can change the perception of spoken utterances, suggesting that the McGurk effect may influence everyday perception in many ways.
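The integration behind such audiovisual speech recognizers can be illustrated with the kind of multiplicative fusion described in the Massaro reference below: each modality assigns a degree of support to each candidate phoneme, the supports are multiplied, and the products are normalized. The sketch below is a minimal illustration of that idea; the support values are invented for illustration, not measured data.

```python
# Minimal sketch of multiplicative audiovisual fusion, in the spirit of
# Massaro's model of auditory-visual integration. The numeric support
# values below are illustrative assumptions, not experimental results.

def fuse(audio_support, visual_support):
    """Multiply per-phoneme support from each modality and normalize."""
    combined = {p: audio_support[p] * visual_support[p] for p in audio_support}
    total = sum(combined.values())
    return {p: v / total for p, v in combined.items()}

# Audio /ba/ dubbed onto a /ga/ face: the audio channel favors /ba/ but
# gives some support to the acoustically similar /da/; the visual channel
# favors /ga/ but gives some support to the visually similar /da/.
audio = {"ba": 0.60, "da": 0.35, "ga": 0.05}
visual = {"ba": 0.05, "da": 0.35, "ga": 0.60}

percept = fuse(audio, visual)
best = max(percept, key=percept.get)
# /da/ wins: it is the only candidate with substantial support in BOTH
# modalities, mirroring the fused "da-da" percept described above.
```

The multiplication acts as a soft conjunction: a phoneme strongly supported by one modality but nearly ruled out by the other ends up with low combined support, which is why the intermediate /da/ can beat both /ba/ and /ga/.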
References
Massaro, Dominic W. (May–June 1998). "Speech recognition and sensory integration: a 240-year-old theorem helps explain how people and machines can integrate auditory and visual information to understand speech". American Scientist. 86: 236–239. doi:10.1511/1998.25.236.
Bovo, R. (August 2009). "The McGurk phenomenon in Italian listeners". ACTA Otorhinolaryngologica Italica. 29: 203–208.
Quinto, Lena (August 2010). "A comparison of the McGurk effect for spoken and sung syllables". Attention, Perception and Psychophysics. 72 (6): 1450–1454. doi:10.3758/APP.72.6.1450. PMID 20675792.
Green, Kerry P. "Studies of the McGurk Effect: Implications for Theories of Speech Perception".