Stevens is best known for his contributions to the fields of phonology, speech perception, and speech production. His best-known book, Acoustic Phonetics,[16] is organized according to the distinctive features of his phonological system.
Stevens is perhaps best known for his proposal of a theory that answers the question: Why are the sounds of the world's languages (their phonemes or segments) so similar to one another? On first learning a foreign language, one is struck by the remarkable differences that can exist between one language's sound system and that of any other. Stevens turned the student's perception on its head: rather than asking why languages are different, he asked, if the sound system of each language is completely arbitrary, why are languages so similar? His answer is the quantal theory of speech.[17] Quantal theory is supported by a theory of language change, developed in collaboration with Samuel Jay Keyser, which postulates the existence of redundant or enhancement features.[18]
Stevens' methodology for investigating speech sounds has three steps. The first is to use physics (mainly tube models) to model the shape of the articulators (e.g. the shapes of the front and back cavities, and whether or not the lips are rounded). From the articulatory tube models one can calculate the resonant frequencies, which are the formant frequencies. The second step is mainly experimental: speech data are collected and analyzed for comparison with the theoretical calculations. Tokens of interest are usually recorded in isolation and/or embedded in a controlled carrier phrase, spoken by several female and male native speakers of the language. The key to data collection is to control for as many factors as possible, so that the acoustic evidence of interest can be investigated with a minimum of artifacts. The last step is to compare the data with the theoretical predictions and to account for the differences that occur. Some differences arise because tube models are usually simplified and do not account for losses due to the softness of the vocal tract walls (though resistors can be added to the theoretical model). The subglottal system may also affect the vocal tract system when the glottal opening is large (see research on the effects of subglottal resonances on speech). Theoretical models give general predictions about what one can expect to find in real speech, and evidence from real speech can in turn help refine the original model and give better insight into the production of speech sounds.
Quantal theory aims to describe elegantly (using physics) and organize all the acoustic features of all possible sounds into a matrix (see chapter five of Acoustic Phonetics). The ultimate constraint on all speech sounds is the physical articulatory system itself, which supports the claim that there can be only a finite set of sounds among languages. The set of speech sounds is finite because, while the movement of the articulators is continuous, only certain configurations tend to be articulatorily and/or acoustically stable, giving rise to fixed formant frequencies and hence to sounds (vowels and consonants) that are relatively universal across languages. Each sound can thus be described by a handful of defining features (usually binary). For example, lip rounding (either on or off) is a feature; tongue height (either high or low) is another. In addition to these defining features, which serve as the essential description of the sounds, there are also enhancing features, which help to make the sounds more recognizable. For each feature, one can apply Stevens' methodology: first use a tube model of the articulators to predict the resonant frequencies, then collect data to examine the acoustic properties of the feature, and finally reconcile the data with the theoretical model and summarize the acoustic properties of that feature.
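The idea of describing each sound by a handful of binary features can be sketched as a small lookup table. The sketch below is illustrative only: the three features, the five vowels, and their values follow common textbook conventions and are assumptions, not a table taken from Acoustic Phonetics. The helper names `natural_class` and `distinguishing_features` are hypothetical.

```python
# Illustrative only: a tiny binary feature matrix with conventional textbook
# values (an assumption), not a table reproduced from Acoustic Phonetics.
FEATURES = ("high", "back", "round")

VOWELS = {
    "i": {"high": True,  "back": False, "round": False},
    "u": {"high": True,  "back": True,  "round": True},
    "e": {"high": False, "back": False, "round": False},
    "o": {"high": False, "back": True,  "round": True},
    "ɑ": {"high": False, "back": True,  "round": False},
}


def natural_class(**spec):
    """Return the vowels whose feature values match every entry in spec."""
    return sorted(
        vowel
        for vowel, feats in VOWELS.items()
        if all(feats[name] == value for name, value in spec.items())
    )


def distinguishing_features(a, b):
    """Return the features on which vowels a and b differ."""
    return [f for f in FEATURES if VOWELS[a][f] != VOWELS[b][f]]


print(natural_class(round=True))          # vowels produced with lip rounding
print(distinguishing_features("i", "u"))  # features separating /i/ from /u/
```

Each row of the table is one segment and each column one feature, so a class of sounds is simply the set of rows that agree on the chosen columns.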
For an introduction to speech science, one can first read The Speech Chain by P. Denes and E. Pinson, which gives a broad overview of the production and transmission of speech. It introduces spectrograms and formant frequencies, which are the main acoustic descriptions of sound segments.
As the vocal folds vibrate, puffs of air are pushed through and filtered by the vocal tract, producing sound. This sound source is modeled as a current source in a circuit that models the production of sound. Changes in the vocal tract change the sound that is produced.
The frequency of vibration of female vocal folds tends to be higher than that of male vocal folds, giving female voices a higher pitch than male voices.
Research (Hanson, H.M. 1997) has shown a difference in how females and males vibrate their vocal folds: the female glottis has a greater spread, which gives female voices a breathier quality than male voices.
The subglottal system is the system below the glottis in the human body. It includes the trachea, the bronchi, and the lungs. It is essentially a fixed system, so it does not change within an individual speaker. Research has shown that during the open phase of the glottal cycle (when the glottis is open), the subglottal system introduces coupling, which manifests acoustically as pole/zero pairs in the frequency domain. These pole/zero pairs are hypothesized to act as prohibited or unstable regions in the spectrum, serving as natural boundaries for vowel features such as +front or +back.
For adult males, the resonant frequencies of the subglottal system have been measured (using invasive methods) at 600, 1550, and 2200 Hz (Acoustic Phonetics, p. 197; Ishizaka et al.; Crane & Boves). The subglottal resonant frequencies of females are slightly higher because of their smaller dimensions. One non-invasive way of measuring these resonances is to place an accelerometer above the sternal notch (Henke) to record the acceleration of the skin during phonation. The vibration captures the resonant frequencies of the subglottal system.
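The hypothesis that subglottal resonances act as feature boundaries can be illustrated with a short sketch. It uses the adult-male resonances quoted above and compares a vowel's second formant (F2) with the second subglottal resonance. The text does not say which resonance pairs with which feature, so that pairing, the 50 Hz width of the "unstable" band, and the F2 values in the loop are all assumptions made for illustration. The function name `classify_backness` is hypothetical.

```python
# Adult-male subglottal resonances quoted in the text (Hz).
SUBGLOTTAL_RESONANCES_HZ = (600.0, 1550.0, 2200.0)


def classify_backness(f2_hz, sg2_hz=SUBGLOTTAL_RESONANCES_HZ[1], margin_hz=50.0):
    """Label a vowel token by where F2 falls relative to a subglottal resonance.

    ASSUMPTION: comparing F2 with the second resonance, and the 50 Hz margin
    for the unstable region, are illustrative choices, not values from the text.
    """
    if abs(f2_hz - sg2_hz) <= margin_hz:
        return "unstable"
    return "+back" if f2_hz < sg2_hz else "+front"


# Hypothetical F2 values, for illustration only.
for f2 in (900.0, 1560.0, 2100.0):
    print(f2, classify_backness(f2))
```

Because female subglottal resonances are slightly higher, a speaker-specific value would be passed as `sg2_hz` rather than relying on the default.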
The vocal tract is the passageway above the glottis, extending to the opening of the lips. A two-tube model is usually used to model the vocal tract, with one tube capturing the dimensions (cross-sectional area and length) of the back cavity and the other those of the front cavity. The resonant frequencies calculated from the tube model are the formant frequencies. To produce the schwa vowel /ə/, the vocal tract is relatively open all the way from the glottis to the mouth, so it can be modeled as a relatively uniform open tube whose resonant frequencies (formants) are evenly spaced. Radiation at the mouth lowers these resonant frequencies by about five percent (Acoustic Phonetics, p. 139). Female vocal tracts (14.1 cm on average) are shorter on average than male vocal tracts (17.7 cm on average), and therefore have higher formant frequencies.
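The uniform-tube approximation for schwa can be worked through numerically. The sketch below treats the vocal tract as a tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / (4L). The speed of sound (35,400 cm/s, roughly the value for warm, moist air) is an assumption not stated in the text, and the function name `uniform_tube_formants` is hypothetical. The optional `radiation_lowering` argument applies the roughly five percent reduction described above as a simple scale factor.

```python
SPEED_OF_SOUND_CM_PER_S = 35400.0  # ASSUMPTION: warm, moist air in the vocal tract


def uniform_tube_formants(length_cm, count=3, radiation_lowering=0.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L). Setting radiation_lowering=0.05 scales the
    result down by five percent, a crude stand-in for radiation at the mouth.
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    base = SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
    scale = 1.0 - radiation_lowering
    return [(2 * n - 1) * base * scale for n in range(1, count + 1)]


# Average vocal tract lengths quoted in the text.
for label, length_cm in (("male", 17.7), ("female", 14.1)):
    formants = uniform_tube_formants(length_cm)
    print(label, [round(f) for f in formants])
```

Under these assumptions the 17.7 cm tube gives formants near 500, 1500, and 2500 Hz, and the shorter 14.1 cm tube gives proportionally higher values, consistent with the evenly spaced formants described for schwa.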
Since the vocal tract walls are soft, energy is lost in the vocal tract, which increases the bandwidth of the formants.
When the velopharyngeal port opens during the production of certain sounds, such as /n/ and /m/, coupling is introduced due to the nasal cavity, which gives the output a nasal quality.
The quantal theory suggests that the phonological inventory of a language is defined primarily by the acoustic characteristics of each segment, with boundaries specified by the acoustic-articulatory mapping. The implication is that phonological segments must have some type of acoustic invariance.[19] Blumstein and Stevens[20] demonstrated what appeared to be an invariant relationship between the acoustic spectrum and the perceived sound: by adding energy to the burst spectrum of "pa" at a particular frequency, it is possible to turn it into "ta" or "ka" respectively, depending on the frequency. Presence of the extra energy causes perception of the lingual consonant; its absence causes perception of the labial.
Stevens' recent work has re-structured the theory of acoustic invariance into a shallow hierarchical perceptual model, the model of acoustic landmarks and distinctive features.
While on sabbatical at KTH in Sweden in 1962, Stevens volunteered as a participant in cineradiography experiments being conducted by Sven Öhman. Stevens' cineradiographic films are among the most widely distributed; copies exist on laserdisc, and some are available online.[21]
After returning to MIT, Stevens agreed to supervise the research of a dentistry student named Joseph S. Perkell. Perkell's knowledge of oral anatomy permitted him to trace Stevens' X-ray films onto paper, and to publish the results.[22]
Other contributions to the study of speech production include a model by which one can predict the spectral shape of turbulent speech excitation (depending on the dimensions of the turbulent jet), and work related to the vocal fold configurations that lead to different modes of phonation.[23]
In fact, the spectral properties (formants, formant bandwidths, and glottal characteristics) of all possible phonemes in all languages can theoretically be modeled and predicted using physics-based resonator models. Basic tube resonators give a general prediction of the formants of vowels. The basic model can be refined by adding resistors and/or capacitors to represent energy losses at the vocal tract walls. Acoustic coupling to the subglottal system can also be modeled by adding tubes to the model of the vocal tract, which introduces pole/zero pairs in the spectrum that represent the effects of subglottal coupling (the locations of these pole/zero pairs are the resonant frequencies of the subglottal system). Glottal characteristics such as vocal pitch (F0), open quotient (H1-H2), and degree of breathiness (H1-A3) can also be modeled and measured from the spectrum (Hanson & Stevens).
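As an illustration of measuring a glottal characteristic from the spectrum, the following sketch computes H1-H2, the difference in decibels between the amplitudes of the first and second harmonics. It is a minimal example under stated assumptions, not the procedure used by Hanson and Stevens: the signal is synthetic, F0 is known in advance, the analysis window spans a whole number of pitch periods, and no correction for formant influence is applied. The function names are hypothetical.

```python
import cmath
import math


def harmonic_amplitude(signal, sample_rate, freq_hz):
    """Amplitude of one sinusoidal component, from a single-frequency DFT."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate)
        for k, x in enumerate(signal)
    )
    return 2.0 * abs(acc) / n


def h1_minus_h2_db(signal, sample_rate, f0_hz):
    """Difference in dB between the first and second harmonic amplitudes."""
    h1 = harmonic_amplitude(signal, sample_rate, f0_hz)
    h2 = harmonic_amplitude(signal, sample_rate, 2.0 * f0_hz)
    return 20.0 * math.log10(h1 / h2)


# Synthetic signal (hypothetical values): two harmonics of a 200 Hz voice,
# the second at half the amplitude of the first.
SAMPLE_RATE = 16000
F0 = 200.0
signal = [
    math.sin(2 * math.pi * F0 * k / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2 * F0 * k / SAMPLE_RATE)
    for k in range(1600)  # 0.1 s, a whole number of pitch periods
]
print(round(h1_minus_h2_db(signal, SAMPLE_RATE, F0), 2))
```

Halving the second harmonic corresponds to an H1-H2 of about 6 dB. With recorded speech, F0 would first have to be estimated, and a window that does not span whole pitch periods would need tapering to limit spectral leakage.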
Stevens joined MIT as an assistant professor in 1954.[24] He became an associate professor in 1957, a full professor in 1963, and was appointed as the Clarence J. Lebel Chaired Professor in 1977.[7] One of his long-time collaborators, Dennis Klatt (who wrote DECtalk while working in Stevens' lab), said that "As a leader, Ken is known for his devotion to students and his miraculous ability to run a busy laboratory while appearing to manage by a principle of benevolent anarchy."[4]
The first doctoral thesis Stevens signed at MIT was that of his fellow student, James L. Flanagan, in 1955. Flanagan started graduate school at MIT in the same year as Stevens, but without a prior master's degree; he earned his M.S. in 1950 under Beranek's supervision, then finished his doctoral thesis under Stevens' supervision in 1955.[25]
Stevens estimated in 2001 that he had supervised approximately forty Ph.D. candidates.[5]
On the occasion of his receipt of the Gold Medal of the Acoustical Society of America, in 1995, colleagues wrote of Stevens' Speech Group that "during its existence of almost four decades" it "has been outstanding in the support that it has provided to women researchers, many of whom have gone on to populate the upper echelons of research labs throughout the world."[4] Stevens' laboratory has been referred to by colleagues as a "national treasure".[6]
Stevens was active in the Acoustical Society of America from his time as a graduate student. He was a member of the executive council from 1963 to 1966,[26] Vice President from 1971 to 1972, and President of the Society from 1976 to 1977.[27] He was a Fellow of the ASA. In 1983 he received its Silver Medal in Speech Communication, and in 1995 he received the Gold Medal from the society.[4]
Stevens was also active in the IEEE, where he held the rank of IEEE Life Fellow. In 2004, Ken Stevens and Gunnar Fant were the joint first winners of the IEEE James L. Flanagan Speech and Audio Processing Award.[28]
Stevens was a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering,[29] a member of the National Academy of Sciences,[30] and a 1999 recipient of the United States National Medal of Science.[6]