Lipreading from Moving Dots

What is this image?
Point your cursor at the image to see how other people answer this question.

Now start the video below, and watch the dots move. Can you tell what the image is when the dots are moving? PLEASE START THE MOVIE NOW.


When the dots are moving, most people are able to identify the image as depicting a mouth (and lower half of the face) articulating a syllable. The image was created using the point-light technique. This technique involves placing small fluorescent dots on the face of a speaker, and then filming the speaker under special lighting conditions so that only the dots can be seen.
Here is the point-light pattern used to create the images above:

Do point-light faces provide visual speech information? Research has shown that untrained observers can lipread vowels, syllables, and some simple words from point-light faces. Point-light faces can also be used to help us hear noisy auditory speech sentences. The movie below shows the type of point-light stimuli that have been tested with sentences. As you watch the video, notice how the number of point-lights that are placed on the face changes. PLEASE START THE MOVIE NOW.

The sentence spoken in the movie is "The tree fell on the house". While it might be difficult to lipread this sentence with no sound, if you were to hear a noisy, distorted version of the sentence, the point-light image would help your comprehension. In this movie, you are seeing three different point-light patterns. The first pattern contains point-lights on the lips. The second pattern adds points to the teeth and tongue-tip. The third pattern adds additional points to the cheeks, jaw, nose, and forehead. Research shows that the point-lights on the lips, teeth, and tongue-tip are particularly useful for enhancing noisy auditory speech. In this way, the point-light technique can help us determine which information is most important for visual speech perception.

Does point-light speech integrate with auditory speech? To answer this question, we will use the McGurk effect. In the movie below, you will be seeing and hearing a single point-light syllable repeating. Watch the point-light mouth closely, but concentrate on what you're hearing. After you feel certain of what you perceive, stop the movie and continue reading the text below. PLEASE START THE MOVIE NOW.

Now start the movie again and close your eyes. Listen to the movie repeat until you are sure of what you hear. When you feel certain of what you hear, stop the movie and continue reading the text below.
PLEASE RESTART THE MOVIE NOW.

If you're like most people, you hear a 'va' syllable with your eyes open, and a 'ba' syllable with your eyes closed. Thus, point-light speech integrates well with auditory speech. This suggests that the speech perception function treats point-light speech in the same way it treats regular (non-reduced) visual speech.

What is interesting about point-light speech? Point-light images contain no obvious facial features such as skin, teeth, or the shadows produced in an open mouth. In fact, when these images are not moving, observers have no idea they are looking at a face. Point-light speech shows that isolated speech movements provide visual speech information. The fact that visible speech movements can be informative and can integrate well with auditory speech has important implications. Point-light research can help us design better telecommunication systems for the hearing impaired, as well as computers that perform speech and face recognition.

However, facial features shown without movement can also provide some visual speech information. Look at the images below. Can you tell what vowels this speaker is articulating? Point your cursor at the faces for the answers.

The fact that we can lipread from photographs suggests that there is visual speech information in frozen mouth positions.

What type of visual speech information is most important: face movements or face features? Researchers are still debating this question. There is evidence that point-light speech images integrate better with auditory speech than do photographs of faces. Still, faces that contain both movements and features integrate best with auditory speech. Perhaps both types of information are important to the speech perception function.