Dialectology

Dialectology (from Ancient Greek διάλεκτος, dialektos 'talk, dialect' and -λογία, -logia) is the scientific study of dialects and other forms of language variation, especially variation associated with geographic region. Dialectologists investigate differences in pronunciation, grammar, and vocabulary, and how such differences pattern across communities and change over time.^[1]^[2]

The field developed in the 19th century alongside historical linguistics and became closely associated with large-scale dialect surveys and the production of dialect maps and linguistic atlases. Such work typically relies on systematic data collection (for example, questionnaires, interviews, and recordings) and represents the geographic distribution of linguistic features using concepts such as isoglosses and dialect boundaries.

From the mid-20th century onward, dialectology has increasingly overlapped with sociolinguistics and variationist approaches, extending its focus from primarily rural, long-established speakers to include urban varieties, social differentiation, and the effects of migration and language contact. Topics commonly discussed in dialectological research include mutual intelligibility and the language–dialect distinction, diglossia, dialect continua, and the relationship between regional varieties and standard forms in pluricentric languages.

History

Early developments

Dialectology emerged from nineteenth-century comparative work on the Indo-European languages, where variation across dialects and written records was used to infer earlier stages such as Proto-Indo-European.^[3] Developments in phonetics and sound change laws encouraged greater attention to living dialects as a source of linguistic data, since they preserve details often absent from written texts. Over time, this work helped establish dialects as objects of study in their own right.

English dialects

In London, comments on dialect differences appear in 12th-century sources, and a large number of dialect glossaries (focussing on vocabulary) were published in the 19th century.^[4] Philologists also studied dialects because they preserved earlier forms of words.^[4]

In Britain, the philologist Alexander John Ellis described the pronunciation of English dialects in an early phonetic system in volume 5 of his series On Early English Pronunciation. The English Dialect Society was later set up by Joseph Wright to record dialect words in the British Isles. This culminated in the production of the six-volume English Dialect Dictionary in 1905. The English Dialect Society was then disbanded, as its work was considered complete, although some regional branches (e.g. the Yorkshire Dialect Society) still operate today.

Traditional studies in dialectology were generally aimed at producing dialect maps, in which lines were drawn to indicate boundaries between different dialect areas. These maps displayed the geographical distribution of linguistic features using isoglosses, which are lines marking the boundaries where different linguistic variants are used.^[5] The move away from traditional methods of language study, however, caused linguists to become more concerned with social factors. Dialectologists therefore began to study social as well as regional variation. The Linguistic Atlas of the United States, begun in the 1930s, was amongst the first dialect studies to take social factors into account.

Under the leadership of Harold Orton, the University of Leeds became a centre for the study of English dialect and set up an Institute of Dialect and Folk Life Studies. In the 1950s, the university undertook the Survey of English Dialects, which covered all of England, some bordering areas of Wales, and the Isle of Man. In addition, the university produced more than 100 monographs on dialect before the death of Harold Orton in 1975.^[6] The institute closed in September 1983 to accommodate budget cuts at the university, but its dialectological studies are now part of a special collection, the Leeds Archive of Vernacular Culture, in the university's Brotherton Library.^[7]

This shift in interest from purely regional to broader social factors led to the birth of sociolinguistics, which combines dialectology with the social sciences. However, Graham Shorrocks has argued that there was always a sociological element to dialectology and that many of the conclusions of sociolinguists (e.g. the relationships with gender, class and age) can be found in earlier work by traditional dialectologists.^[8]

In the US, Hans Kurath began the Linguistic Atlas of the United States project in the 1930s, intended to consist of a series of in-depth dialectological studies of regions of the country. The first of these, the Linguistic Atlas of New England, was published in 1939.^[9] Kurath specified that at least two informants should be chosen from each county, representing different educational levels and ages, moving beyond earlier approaches that focused exclusively on elderly rural speakers. This methodological innovation introduced social stratification as an important dimension of linguistic geography.^[10] Later works in the same project were published or planned for the Middle Atlantic and South Atlantic states, the North Central States, the Upper Midwest, the Rocky Mountain States, the Pacific Coast and the Gulf States,^[11] though in less detail, owing to the large amount of work needed to process the data fully.

Later large-scale and influential studies of American dialectology have included the Dictionary of American Regional English, based on data collected in the 1960s and published between 1985 and 2013, focusing on lexicon; and the Atlas of North American English, based on data collected in the 1990s and published in 2006, focusing on pronunciation.^[12]

The sociolinguistic turn

The development of variationist sociolinguistics in the 1960s, pioneered by William Labov, fundamentally transformed dialectology. Labov's 1963 study of Martha's Vineyard and his 1966 study of the social stratification of English in New York City introduced quantitative methods and demonstrated that linguistic variation is systematically correlated with social factors such as class, age, and ethnicity.^[13] His New York City department store study showed that pronunciation of post-vocalic /r/ varied systematically with the social prestige of different retail establishments, establishing that linguistic variables could serve as indicators of social status.^[14]

African American Vernacular English

African American Vernacular English (AAVE) has been an important focus of dialectological research. In the late 1960s and early 1970s, William Labov's studies of the linguistic features of AAVE were influential in establishing that it should not be stigmatized as substandard, but rather respected as a variety of English with its own systematic grammatical rules.

AAVE exhibits distinctive features across multiple linguistic levels. It is typically non-rhotic, dropping the /r/ sound when not followed by a vowel. The dialect includes features such as copula omission (e.g., "Sharon gon come"), the habitual or invariant "be" marking regularly occurring actions (e.g., "Billy don't be telling lies"), and intensified negation through double negatives (e.g., "She don wan nothin").^[15]

Research has shown that AAVE spread from its original rural Southern base to become predominantly urban and nationwide following the Great Migration, with contemporary urban features now diffusing back into rural areas. The study of AAVE has contributed significantly to dialectology by demonstrating how social factors like racial segregation, community identity, and socioeconomic status shape language development.

French dialects

Jules Gilliéron published a linguistic atlas of 25 French-speaking locations in Switzerland in 1880. In 1888, Gaston Paris called for a survey of the dialects of France, which seemed likely to be superseded by Standard French in the near future, and Gilliéron responded by proposing the Atlas Linguistique de la France. The principal fieldworker for the atlas, Edmond Edmont, surveyed 639 rural locations in French-speaking areas of France, Belgium, Switzerland and Italy. The questionnaire initially included 1400 items, later increased to over 1900. The atlas was published in 13 volumes between 1902 and 1910.^[16] The success of Gilliéron and Edmont's methods led to the construction of many linguistic atlases in the United States and throughout the world, establishing a model that influenced dialectological research for decades.

German dialects

The first comparative dialect study in Germany was Die Mundarten Bayerns (The Dialects of Bavaria) in 1821 by Johann Andreas Schmeller, which included a linguistic atlas.^[17]

In 1873, L. Liebich surveyed the German-speaking areas of Alsace by a postal questionnaire that covered phonology and grammar. He never published any of his findings.^[18]

In 1876, Eduard Sievers published Grundzüge der Phonetik (Elements of Phonetics) and a group of scholars formed the Neogrammarian school. This work in linguistics covered dialectology in German-speaking countries. In the same year, Jost Winteler published a monograph on the dialect of Kerenzen in the Canton of Glarus in Switzerland, which became a model for monographs on particular dialects.^[19]

Also in 1876, Georg Wenker, a young school librarian from Düsseldorf based in Marburg, sent postal questionnaires out over Northern Germany. These questionnaires contained a list of sentences written in Standard German. These sentences were then transcribed into the local dialect, reflecting dialectal differences. He later expanded his work to cover the entire German Empire, including dialects in the east that have become extinct since the territory was lost to Germany. Wenker's work later became the Deutscher Sprachatlas at the University of Marburg. After Wenker's death in 1911, work continued under Ferdinand Wrede and later questionnaires covered Austria as well as Germany.^[20]

Italian dialects and Corsican

The first treatment of Italian dialects was by Dante Alighieri in his treatise De vulgari eloquentia in the early fourteenth century.

The founder of scientific dialectology in Italy was Graziadio Isaia Ascoli, who, in 1873, founded the journal Archivio glottologico italiano, still active today together with L'Italia dialettale, which was founded by Clemente Merlo in 1924, and the more recent Rivista italiana di dialettologia.

After completing his work in France, Edmond Edmont surveyed 44 locations in Corsica for the Atlas Linguistique de la Corse.^[21]

Two students of Gilliéron, Karl Jaberg and Jakob Jud, surveyed dialects in Italy and southern Switzerland for the Sprach- und Sachatlas Italiens und der Südschweiz.^[22] This survey influenced the work of Hans Kurath in the US.^[23]

Scots dialects and Gaelic

The Linguistic Survey of Scotland began in 1949 at the University of Edinburgh.^[24] The first part of the survey researched dialects of Scots in the Scottish Lowlands, the Shetland Islands, the Orkney Islands, Northern Ireland, and the two northernmost counties of England: Cumberland (since merged into Cumbria) and Northumberland. Three volumes of results were published between 1975 and 1985.^[25] The second part studied dialects of Gaelic, including mixed use of Gaelic and English, in the Scottish Highlands and Western Isles. Results were published under the name of Cathair Ó Dochartaigh in five volumes between 1994 and 1997.^[26]

Concepts and terminology

Isoglosses

An isogloss is a line on a dialect map marking the geographic boundary of a particular linguistic feature, such as a specific pronunciation, word choice, or grammatical construction. When researchers survey a region, they often find that each linguistic element has its own geographic distribution. For example, in the Upper Midwest of the United States, one can draw an isogloss separating areas where people say "paper bag" from areas where they say "paper sack."^[27]

When multiple isoglosses coincide or run close together, they form what dialectologists call a bundle of isoglosses, which marks a more substantial dialect boundary. In the American Midwest, for instance, isoglosses separating "pail" (northern) from "bucket" (southern), different pronunciations of "greasy," and various other features bundle together to mark a boundary between Northern and Midland dialect areas.
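
The notion of a bundle can be illustrated with a minimal sketch. It assumes that each locality along a transect has been coded for a few features; the locality labels and variant values below are invented for illustration and are not survey data. Counting how many features change between neighbouring localities shows where isoglosses coincide.

```python
# Hypothetical transect of localities, ordered north to south.
# All values are invented for illustration, not taken from any survey.
localities = ["A", "B", "C", "D"]
features = {
    "container": ["pail", "pail", "bucket", "bucket"],
    "greasy":    ["s", "s", "z", "z"],
    "bag":       ["bag", "sack", "sack", "sack"],
}

def isogloss_counts(localities, features):
    """Count, for each pair of neighbouring localities, how many features differ."""
    counts = {}
    for i in range(len(localities) - 1):
        pair = (localities[i], localities[i + 1])
        counts[pair] = sum(
            1 for values in features.values() if values[i] != values[i + 1]
        )
    return counts

counts = isogloss_counts(localities, features)
# The pair of neighbours crossed by the most isoglosses is a candidate bundle.
bundle = max(counts, key=counts.get)
print(counts)
print("Densest boundary lies between:", bundle)
```

In this toy data two isoglosses fall between B and C while only one falls between A and B, so the B/C boundary would be the stronger candidate for a dialect boundary.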

Major dialects are typically demarcated by such bundles of isoglosses. Notable examples include the Benrath line that distinguishes High German from other West Germanic languages, and the La Spezia–Rimini Line that divides Northern Italian languages from Central Italian dialects. In American English, the North–Midland isogloss demarcates numerous linguistic features including the Northern Cities vowel shift.

Mutual intelligibility

Some have attempted to distinguish dialects from languages by saying that dialects of the same language are understandable to each other's speakers. In linguistics, two varieties are considered mutually intelligible if speakers can understand one another without prior knowledge of the other's speech or with little effort. Mutual intelligibility can be symmetric (both speakers understand each other equally well) or asymmetric (one speaker comprehends better than the other).
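
The symmetric and asymmetric cases can be made concrete with a small sketch. It assumes that comprehension in each direction has somehow been scored between 0 and 1; the function name, the threshold, and the tolerance are illustrative assumptions rather than established measures.

```python
def classify_intelligibility(a_understands_b, b_understands_a,
                             threshold=0.7, tolerance=0.1):
    """Classify a pair of varieties from two directional comprehension scores.

    Both scores are assumed to lie between 0 and 1. The threshold and
    tolerance are arbitrary illustrative values, not empirical constants.
    """
    if max(a_understands_b, b_understands_a) < threshold:
        return "not mutually intelligible"
    if abs(a_understands_b - b_understands_a) <= tolerance:
        return "symmetric"
    return "asymmetric"

print(classify_intelligibility(0.9, 0.85))  # similar scores in both directions
print(classify_intelligibility(0.9, 0.5))   # one side understands much better
```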

This simple criterion proves untenable, as the case of Italian and Spanish shows. Native speakers of the two may understand each other to a degree ranging from limited to considerable, depending on the topic of discussion and the speakers' experience with linguistic variety, yet few people would want to classify Italian and Spanish as dialects of the same language in any sense other than historical. Their phonology, syntax, morphology, and lexicon are sufficiently distinct that they are regarded as separate languages that developed from a common ancestor, Latin.

Conversely, some varieties classified as dialects of a single language may have limited mutual intelligibility. Cantonese speakers, for example, cannot readily understand Mandarin speakers, yet both are commonly classified as dialects of Chinese. This shows that the distinction between dialect and language is often influenced by political and social factors rather than purely linguistic criteria.

Diglossia

Diglossia is a sociolinguistic situation in which two varieties of a language (or two closely related languages) are used side by side within a speech community for different functions. Typically, a high (H) variety is associated with education, government, religion, and other formal domains, while a low (L) variety is used in everyday conversation and other informal settings, often by the same speakers.^[28]

Diglossia is frequently discussed in dialectology because the functional division between varieties can interact with regional variation and with the processes of standardization. One often-cited historical example is the use of Sanskrit in high-prestige written and ritual contexts alongside vernacular Middle Indo-Aryan varieties (often grouped under the label Prakrit) in everyday speech.^[28]

Dialect continuum

A dialect continuum (or dialect chain) is a network of dialects in which geographically adjacent varieties are usually mutually intelligible, but intelligibility tends to decrease with distance. Dialect continua are often discussed in connection with widely distributed language families and long-established speech communities, where innovations spread through local contact rather than from a single center.^[1]

A commonly cited example is the Dutch–German continuum, in which intermediate dialects can form a chain between the two standard languages even though standard Dutch and standard German are not fully mutually intelligible. In a similar way, the closely related Romance varieties of western and southern Europe have often been described as forming continua, with gradual transitions between neighboring varieties in many areas.^[1]

Dialect continua are frequently used to illustrate that dialect boundaries are not always discrete: linguistic features may change gradually across space, and social and political factors (including the influence of standard languages) can create sharper divisions than the underlying patterns of local variation would suggest.^[1]
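
The chain-like structure of a continuum can be sketched with a toy model. It assumes five hypothetical varieties, each coded for four binary features, and treats two varieties as mutually intelligible when they differ in at most one feature; the data and the cut-off are invented for illustration.

```python
# Hypothetical varieties along a chain, each coded for four binary features.
names = ["V1", "V2", "V3", "V4", "V5"]
profiles = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

def distance(p, q):
    """Number of features on which two varieties differ."""
    return sum(a != b for a, b in zip(p, q))

def intelligible(p, q, max_differences=1):
    """Illustrative assumption: intelligible if at most one feature differs."""
    return distance(p, q) <= max_differences

# Every neighbouring pair in the chain is intelligible...
adjacent_ok = all(
    intelligible(profiles[i], profiles[i + 1]) for i in range(len(profiles) - 1)
)
# ...but the two ends of the chain are not.
ends_ok = intelligible(profiles[0], profiles[-1])
print(adjacent_ok, ends_ok)
```

Each step along the chain changes only one feature, so no single boundary stands out, even though the end points differ on every feature.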

Pluricentrism

A pluricentric language is a language that has two or more standard forms. An example is Hindustani, which encompasses two standard varieties, Urdu and Hindi. Another example is Norwegian, with Bokmål having developed closely with Danish and Swedish, and Nynorsk as a partly reconstructed language based on older dialects. Both are recognized as official languages in Norway.^[30]

In dialectology, discussion of pluricentricity often overlaps with broader questions about how closely related varieties are classified. For example, the abstand and ausbau framework introduced by sociolinguist Heinz Kloss distinguishes between varieties treated as separate languages primarily because of linguistic distance (abstand languages) and those whose status reflects sociocultural development such as standardization and use in public domains (ausbau languages).^[31]^[1] In contexts such as dialect continua, closely related varieties may develop distinct standards, blurring the boundary between dialect and language while remaining mutually intelligible to varying degrees.^[31]

In a sense, a set of dialects can be understood as belonging to a single diasystem, an abstraction of which each dialect is a part. In generative phonology, the differences between dialects can be derived through rules. An example is Occitan (a cover term for a set of related varieties spoken in Southern France), where 'cavaL' (from late Latin caballus, "horse") is the diasystemic form for the following realizations:

Languedocien dialect: caval [kaβal] (L > [l], sometimes velar, used concurrently with French borrowed forms chival or chivau);
Limousin dialect: chavau [tʃavau] (ca > cha and -L > -u);
Provençal dialect: cavau [kavau] (-L > -u, used concurrently with French borrowed forms chival or chivau);
Gascon dialect: cavath [kawat] (final -L > [t], sometimes palatalized, and used concurrently with French borrowed form chibau);
Auvergnat and Vivaro-alpine dialects: chaval [tʃaval] (same treatment of ca cluster as in Limousin dialect).
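
The derivations listed above can be sketched as ordered rewrite rules applied to the diasystemic form. This is a deliberately simplified illustration that operates on spellings rather than on phonological representations; the rule tables and the function name are assumptions made for the example, not a published analysis.

```python
DIASYSTEMIC_FORM = "cavaL"

# Each dialect is modelled as an ordered list of (pattern, replacement) rewrites
# on the spelling. This simplification ignores phonetic detail such as [β] or [w].
RULES = {
    "Languedocien": [("L", "l")],
    "Limousin": [("ca", "cha"), ("L", "u")],
    "Provençal": [("L", "u")],
    "Gascon": [("L", "th")],
    "Auvergnat / Vivaro-alpine": [("ca", "cha"), ("L", "l")],
}

def realize(form, rules):
    """Apply each rewrite in order to derive a dialect spelling."""
    for old, new in rules:
        form = form.replace(old, new)
    return form

realizations = {dialect: realize(DIASYSTEMIC_FORM, rules)
                for dialect, rules in RULES.items()}
for dialect, spelling in realizations.items():
    print(f"{dialect}: {spelling}")
```

Keeping one shared underlying form and a short rule list per dialect mirrors the idea that the varieties differ by rule rather than by unrelated vocabulary.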

The pluricentric approach may be used in practical situations. For instance, when such a diasystem is identified, it can be used to construct a diaphonemic orthography that emphasizes commonalities between the varieties. Such a goal may or may not fit with sociopolitical preferences. Conversely, traditions internal to the field of dialectology may delay the diversification of a given language into multiple standards (see the One Standard German Axiom for an example) or may fail to do so (see Luxembourgish).

Methods

A variety of methods are used to collect data on regional dialects and to select informants. Early dialect research, which aimed to document conservative local forms with minimal influence from ongoing change or contact, often prioritized older speakers in rural communities. Traditional dialectologists frequently sought what later scholars termed NORMs (non-mobile, older, rural, males), who were assumed to preserve relatively conservative features of local speech.^[32] More recently, particularly in interaction with sociolinguistics, dialectology has placed greater emphasis on patterns of variation and change within communities, including the speech of younger speakers and speakers in urban settings.^[32]

Questionnaires

Some of the earliest dialectological studies collected data using written questionnaires that asked informants to report on features of their dialect. This methodology has seen a comeback in recent decades,^[33] especially with the availability of online questionnaires, which can collect data from very large numbers of informants at little expense to the researcher.

Dialect research in the 20th century predominantly used face-to-face interview questionnaires to gather data. There are two main types of questionnaires: direct and indirect. Researchers using the direct method for their face-to-face interviews will present the informant with a set of questions that demand a specific answer and are designed to gather lexical and/or phonological information. For example, the linguist may ask the subject the name for various items, or ask him or her to repeat certain words.

Indirect questionnaires are typically more open-ended and take longer to complete than direct questionnaires. A researcher using this method will sit down with a subject and begin a conversation on a specific topic. For example, the researcher may question the subject about farm work, food and cooking, or some other topic, and gather lexical and phonological information from the subject's answers. The researcher may also begin a sentence and allow the subject to finish it, or ask a question that does not demand a specific answer, such as "What are the most common plants and trees around here?"^[34] The sociolinguistic interview may be used for dialectological purposes as well: informants are engaged in a long-form, open-ended conversation intended to let them produce a large volume of speech in a vernacular style.

There are two basic methods of data collection for large-scale surveys: fieldwork and survey by correspondence. Fieldwork, in which a trained investigator transcribes dialectal forms directly (or through recording), affords more precise data and enables the questionnaire to include a greater number of diverse questions. The correspondence method, where questionnaires are sent to respondents (historically to rural schoolteachers), can encompass more locations at less cost, though the data may be less accurate.^[35]

Whereas lexical, phonological and inflectional variations can be easily discerned, information related to larger forms of syntactic variation is much more difficult to gather. Another problem is that informants may feel inhibited and refrain from using dialectal features.^[36]

Recording technology

The development of audio recording technology fundamentally transformed dialectological research. The advent of tape recording in the 1950s and 1960s meant that dialectologists could preserve complete speech samples for later analysis under laboratory conditions, rather than relying solely on fieldworkers' real-time transcriptions. This allowed researchers to examine continuous speech and study how consistently speakers used different dialect features.^[37]

Contemporary approaches

In recent years, technological innovations have enabled researchers to expand dialectological study. Geographic Information Systems (GIS) provide digital tools for collecting, managing, analysing, and visualising spatially referenced language data, and are increasingly used to map dialect features and to create digital linguistic atlases.^[38]

The use of computerized systems for geolinguistic data processing has evolved continuously since the 1970s. Modern linguistic geography projects are fully integrated within Digital Humanities and are governed by principles of application interoperability, free data reuse, and interdisciplinarity.^[39]

Researchers may also collect relevant excerpts from books that are entirely or partially written in a dialect. The major drawback is that the authenticity of the material may be difficult to verify.^[36] Since the advent of social media, it has become possible for researchers to collect large volumes of geotagged posts from platforms such as Twitter in order to document regional differences in the way language is used in such posts.
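
As a rough illustration of how geotagged text might be summarized, the sketch below tallies two competing lexical variants by region. The posts, region labels, and variant list are invented for the example; a real study would also need to handle tokenization, sampling bias, and location accuracy.

```python
from collections import Counter, defaultdict

# Hypothetical (region, text) pairs, invented for illustration only.
posts = [
    ("North", "grab a pail of water"),
    ("North", "left the pail outside"),
    ("South", "a bucket of paint"),
    ("South", "carry the bucket"),
    ("South", "fill the pail"),
]
VARIANTS = ("pail", "bucket")

def variant_shares(posts, variants):
    """Return, per region, the share of each variant among all variant tokens."""
    counts = defaultdict(Counter)
    for region, text in posts:
        words = text.lower().split()
        for variant in variants:
            counts[region][variant] += words.count(variant)
    shares = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        if total:
            shares[region] = {v: counter[v] / total for v in variants}
    return shares

shares = variant_shares(posts, VARIANTS)
for region, share in shares.items():
    print(region, share)
```

Shares of this kind, computed per area, are the sort of values that could then be plotted on a map to suggest where an isogloss might lie.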