by Susanne Fuchs1, Melanie Weirich1, Christian Kroos2, Natalie Fecher1, Daniel Pape3,
and Sabine Koppetsch4
If one walks through the first level of the main building at the Humboldt University in Berlin and looks at the portraits of the researchers who studied there, became professors, and in some cases won Nobel prizes, one may conclude that the most important visual signs of a famous person are being a man and having a beard.
Wearing a beard has a long socio-cultural tradition going at least back to the Pharaohs. The ancient Egyptians associated facial hair with the sexual, religious and social power of the monarch. Indeed, Queen Hatshepsut wore a bodkin beard after her accession to the throne (Wietig, 2005). Lack of facial hair was long considered a sign of weakness
or divine punishment. The first recorded radical shavings were ordered by Alexander the Great to prevent the Persians from pulling his soldiers’ beards during hand-to-hand fighting. Another tradition relates beards to fertility.
Today, belief in bearded monarchs, male or female, has declined. The general acceptance of facial hair and specific styles of facial hair appears dependent on sex, culture, nation, and fashion. According to the American Mustache Institute, mustache acceptance is between 16 and 35% in the U.S., though between 72 and 94% in Germany. This paper concerns the influence of facial hair on audio-visual speech intelligibility in noise. It is known that watching the speaker’s face increases the intelligibility of speech in noisy environments (Grant and Seitz, 2000). By observing the cyclical opening and closing of the visible jaw, an observer can identify the rhythmic structure of the spoken utterance or even the focus of a particular sequence (Dohen, Lœvenbruck, and Hill, 2005).
Facial hair can cover parts of the face such as the upper lip, the teeth, and the larynx. This modifies the visible area of the open mouth, and hence facial hair is responsible for a kind of natural impoverishment of the visual speech signal. Under normal conditions such impoverishment may be marginal for the intelligibility of speech, since auditory information is fully available. However, under noisy conditions such as a cocktail party (in audiovisual speech research terms: multi-talker babble noise), visual cues may be crucial for increasing speech intelligibility (assuming that listeners want to understand their communicative partners). Based on these considerations, we hypothesize that:
(1) Facial hair hiding visible articulatory movements leads to lower speech intelligibility under noisy auditory conditions, longer reaction time, and lower confidence in recognizing the relevant target words.
(2) The shape and location of the beard are crucial for the reduction in speech intelligibility in noise. A mustache hiding upper lip movement has a larger impact on visual speech intelligibility than a long chin beard, which hides only the larynx. So, in terms of speech intelligibility, is it time for a shave?
Methods
Investigating the interference of facial hair with visual speech intelligibility poses the problem of accurately controlling the amount and shape of facial hair across several speakers while keeping the recording situation constant. Since it is difficult to find participants willing to grow and then cut their beards as needed, we decided to use artificial beards made from natural hair. Two different types were chosen: mustache and long chin beard.
Figure 1: Subjects 1, 2 and 3 without beard (left), with a mustache (middle), and with a long chin beard (right).
Stimuli: Video Recordings
Three male speakers in their mid-20s were recorded (see Figure 1). None had natural facial hair above 3 mm at the time of the recording.
The speakers were selected according to their hair colour and texture, which had to fit the colour and texture of the attached facial hair. Two types of facial hair were obtained in a specialist mask shop. They consisted of natural hair woven into a strip of gauze. The gauze strip was attached to the facial surface with glue (Mastix).
Each speaker read various target words embedded in carrier sentences in three conditions: no beard (beard0), mustache (beard1), and long chin beard (beard2).
Speech Material
Twenty nouns were selected as target words on the basis of their high frequency in the mental lexicon and their semantic content. Their meaning had to fit four carrier sentences without being predictable from the semantic context of the carrier sentence. We tried to make the corpus as phonetically balanced as possible. Only words consisting of two syllables were chosen.
Design
To avoid participants seeing the same speaker with different beards and hence becoming aware of the aim of the study, beard condition was made a between-subject factor while ‘speaker’ was kept within-subject. Thus, a participant would see all three speakers with the same beard type. The audio-only control condition (A) was designed to mirror the audio-visual (AV) condition. Commercial multi-speaker babble noise was added to the original sound track, with its loudness set to result in a final signal-to-noise ratio of 3 dB. For the audio-visual condition (AV), the original video and the noisy audio signal were presented. Each target word plus carrier sentence was presented in six different versions (beard0-AV, beard1-AV, beard2-AV, beard0-A, beard1-A, beard2-A).
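Setting a target signal-to-noise ratio amounts to scaling the babble noise relative to the speech power before mixing. The following is a minimal sketch of that computation, not the authors' actual tooling; it assumes NumPy and mono waveforms as float arrays:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db=3.0):
    """Add noise to a speech signal at a target signal-to-noise ratio (in dB)."""
    # Loop or trim the noise so it covers the whole utterance.
    noise = np.resize(noise, speech.shape)
    p_speech = np.mean(speech ** 2)   # mean power of the speech
    p_noise = np.mean(noise ** 2)     # mean power of the raw noise
    # SNR(dB) = 10*log10(p_speech / p_noise_scaled); solve for the scale factor.
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise
```

At 3 dB, the speech carries roughly twice the power of the added noise, which leaves the auditory signal degraded but usable.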
Procedures and Participants
The participants were seated approximately 50 cm away from the monitor on which the stimuli were presented and listened to the stimuli via Sennheiser HD 201 headphones. They were instructed to type the target word on a computer keyboard as soon as they thought they had recognized it. They were told that their response times were being measured, the measurement ending when they pressed the Enter key after typing the perceived word. Subjects were subsequently prompted to rate their confidence in having correctly identified the target word by selecting a software button with the computer mouse. The test trials (20 target words × 6 conditions × 2 repetitions) were preceded by 5 practice trials. The experiment took approximately 30 minutes.
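Taking the trial counts above at face value, a test block works out to 20 × 6 × 2 = 240 trials. A minimal sketch of such a randomized trial list follows (the word labels are placeholders, since the paper does not list the 20 German nouns; this is not the authors' actual presentation software):

```python
import itertools
import random

# Placeholder target words -- the actual 20 nouns are not listed in this paper.
WORDS = [f"word{i:02d}" for i in range(1, 21)]
CONDITIONS = ["beard0-AV", "beard1-AV", "beard2-AV",
              "beard0-A", "beard1-A", "beard2-A"]
REPETITIONS = 2

# Every target word appears in every condition, twice; order is randomized.
trials = [(word, cond)
          for word, cond in itertools.product(WORDS, CONDITIONS)
          for _ in range(REPETITIONS)]
random.shuffle(trials)

print(len(trials))  # 240 test trials, preceded by 5 practice trials
```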
Forty-four participants took part in the experiment. The participants were randomly assigned to one of three stimulus groups (speakers without beard, speakers with mustache, speakers with long chin beard), though across groups the same gender ratio and a similar age range were maintained.
Results and Discussion
There is clear evidence that speech intelligibility increases when watching the speaker’s face (AV) in comparison to the audio-only (A) condition. This increase is on average 17% for no beard, 20% for mustache, and 12% for long chin beard. Speakers with a mustache show the lowest speech intelligibility in all cases, but the largest audio-visual improvement. Speakers with a long chin beard show similar or even better intelligibility than the others, but the smallest improvement.
Since reaction times were similar for the audio-only and the audio-visual conditions, all data were pooled. However, results differed significantly with respect to beard condition: subjects showed significantly longer reaction times in the beard1 condition than in beard0 (pMCMC=0.0042) and beard2 (pMCMC=0.0001).
Similar to intelligibility, subjects were most confident when they rated the AV data for the speakers with a long chin beard. The confidence level was significantly higher in the audio-visual condition than in the audio-only condition (pMCMC=0.0001) for all beard types.
The comparison between no beard and mustache showed a trend in the expected direction: the mustache reduced intelligibility, lengthened reaction times, and left listeners less confident that they had perceived the relevant target word than in the no-beard condition. However, the findings for the long chin beard went against our expectations. We found an effect of facial hair in the audio-only condition (the long chin beard had the best intelligibility, followed by no beard and finally the mustache), where no visual information is available. Two likely explanations can be put forward. First, since the A and AV conditions were presented in random order and the number of target words was limited, a strong learning effect could have taken place. Second, since we glued the beards onto the facial skin of the speakers, the glue may have caused some irritation, prompting our speakers to produce the relevant sentences in a different way. When all the A and AV data were pooled, and recognition was split by beard condition and by occurrence of the relevant target words from the first to the sixth trial, a clear learning effect was found for no beard and mustache.
However, long chin beard showed not only a learning effect, but results also differed from the no beard and the mustache conditions. We interpreted this as evidence that indeed our speakers used different articulatory strategies when wearing the long chin beard. It may be that the artificial beard impeded natural jaw movements by preventing the surface of the skin from stretching as much as it usually does. This might have caused a different speaking behaviour in our speakers (e.g., hyper-articulation). Results and interpretation would be in agreement with recent findings on the effects of skin stretching on speech production and perception by Ito and colleagues (Ito, Tiede & Ostry, 2009). Accordingly, if you wear a false long chin beard to stay incognito, be aware that your speech may be more recognisable than without the beard.
Conclusion
We have good news for those whose facial hair is longer than 3 mm: There is no need to shave! A trend towards reduced intelligibility was found in the beard1 condition (mustache), but this trend was not significantly different from the beard0 (no beard) condition. Moreover, the improvement from audio-only to audio-visual intelligibility is larger for beard1 (mustache) than for beard0 (no beard). This can be explained by greater attentiveness of the listeners in the beard1 condition: listeners who were presented with an impoverished visual signal paid more attention to this visual information, thereby gaining increased intelligibility. The greater attentiveness may also be reflected in the significantly longer reaction times found for beard1. Thus, if you wear a mustache in a noisy auditory environment, please slow down your speaking rate and take a break from time to time, so listeners may process your speech. Moreover, be aware that people may be attracted by and focus on your beard (this might be particularly relevant for politicians).
Similar to the findings for intelligibility, listeners showed a non-significant trend towards greater confidence that they had perceived the target word correctly when they saw speakers without any facial hair. Again, politicians might be well advised to consider this, though we leave it to the reader to decide whether most politicians would prefer to be better or less well understood. Based on our data, we were not able to verify whether differences in the shape of the facial hair affect intelligibility. Such an investigation may be carried out in the future with participants who do not mind shaving their facial hair, re-growing it, trimming it, shaving again, re-growing, trimming, and so on, in accordance with the wishes of a bunch of phoneticians interested in visual speech intelligibility or—put more positively—in the name of science.
Acknowledgements
We would like to thank Jörg Dreyer for technical support, all participants, Jim Scobbie for proposing this work for AIR, and Jean-Luc Schwartz for useful comments. This work was supported by a grant from the BMBF. It is dedicated to all men with facial hair and Dieter Fuchs.
References
AMI (American Mustache Institute) website, retrieved August 2, 2009.
Dohen, M., H. Lœvenbruck, and H. Hill, “A Multi-measurement Approach to the Identification of the Audiovisual Facial Correlates of Contrastive Focus in French,” AVSP-2005, British Columbia, 2005, pp. 115–6.
Grant, K.W. and P.-F. Seitz, “The Use of Visible Speech Cues for Improving Auditory Detection of Spoken Sentences,” Journal of the Acoustical Society of America, vol. 108, no. 3, 2000, pp. 1197–1208.
Ito T., M. Tiede, and D.J. Ostry, “Somatosensory Function in Speech Perception,” Proceedings of the National Academy of Sciences U.S.A., vol. 106, no. 4, 2009, pp. 1245–8.
Wietig, W. Der Bart. Zur Kulturgeschichte des Bartes von der Antike bis zur Gegenwart. PhD dissertation at the University of Hamburg, 2005.
Note: An extended paper with the same title will be published in Curiosities and Regularities in Speech and Language, ed. by M. Zygis, C. Mooshammer, P. Hoole, and S. Fuchs.
Author Affiliations
1 Zentrum für Allgemeine Sprachwissenschaft, Berlin, Germany
2 University of Western Sydney, Australia
3 Instituto de Engenharia Electrónica e Telemática de Aveiro, Aveiro, Portugal
4 IB-Hochschule, Berlin, Germany
Images, with the exception of Charles Darwin and Figure 1, are courtesy of the NeatoShop.
_____________________
This article is republished with permission from the January-February 2010 issue of the Annals of Improbable Research.
Further research could also test for color/contrast effects. For example, a dark beard, having greater contrast, might be more easily perceived than one of a lighter shade, assuming the speakers have a relatively light complexion; the inverse might hold for speakers whose skin tone is similar to the shade of the facial hair.