Project Description | Corpus of New York City English: Audio-Aligned and Parsed

This project was born out of the New York City English Research Group. It aims to further the study of New York City English (NYCE), the varieties of English particular to New York City and the surrounding region, through the development and use of an innovative audio-aligned and parsed corpus of New Yorkers’ speech. The project combines recent advances in speech corpus development tools with the special talents and backgrounds of undergraduates at the City University of New York (CUNY) to create the first such corpus of New York City English (the CUNY-CoNYCE). The CUNY-CoNYCE is based on interviews with New Yorkers across the five boroughs and Long Island, conducted by CUNY undergraduates from Queens College, Lehman College (The Bronx), and the College of Staten Island. Because our students come predominantly from neighborhoods across the five boroughs of New York City and Long Island, they are uniquely positioned to gather and produce, collectively, large quantities of speech data from all over the region. The final product will be a freely accessible online corpus of approximately 1,000,000 words of NYCE speech, audio-aligned and grammatically annotated, accompanied by a full set of digital, text-searchable recordings of the speech from which the corpus is transcribed.

The structure of the corpus is based on the one developed for the related Audio-Aligned and Parsed Corpus of Appalachian English (AAPCAPPE).

In addition to answering questions about language variation and change in NYCE, the corpus will support research across many areas of linguistics, especially phonetics, phonology, morphology, syntax, sociolinguistics, and discourse analysis. Because data collection incorporates oral history and sociological measures of ethnic affiliation, the CUNY-CoNYCE will also be a useful tool for sociologists and anthropologists studying lived experience in urban settings, inter-ethnic relations, and the recent history of New York life. The project will also provide transformative research experiences for dozens of CUNY undergraduates. Additionally, users of the corpus will develop an understanding and appreciation of the grammar of non-standard dialects and of the functions of non-standard speech as a necessary linguistic resource for social integration.

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.