Automated phonetic transcription for varieties of English: wav2vec 2.0 fine-tuned on the Buckeye Corpus
Abstract
Reliable automated phonetic transcription would vastly increase the database for phonological analysis and theorizing, and advances in speech recognition technology stand to bring us to that goal. We present a wav2vec 2.0 model fine-tuned on the Buckeye corpus of conversational English. We experiment with different amounts of training data for fine-tuning, as well as different gender and age distributions in the training data. We find that good results are achieved with about two hours of training data, and that performance is generally robust to skews in the makeup of the training data. These findings are encouraging for the project of extending these methods to languages and varieties that are less well resourced. We also compare our model on the Buckeye test set to a group of universal models, as well as to models trained on the TIMIT corpus. These comparisons suggest that targeted fine-tuning is worthwhile where the data exist. As a first step in extending our model to other varieties, we also evaluated on the TIMIT corpus test set. Our Buckeye-tuned model continues to outperform the universal models on the TIMIT test set, but by a smaller margin. To make our models broadly accessible, we have released them publicly along with a web-based interface that supports input and output in Praat TextGrid format.
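The released interface exchanges time-aligned transcriptions in Praat TextGrid format. As a minimal sketch of that output step, the helper below serializes hypothetical time-aligned phone labels (such as might be derived from a fine-tuned wav2vec 2.0 model's frame-level predictions) into a single-tier long-format TextGrid; the function name and the example intervals are illustrative, not the paper's actual implementation.

```python
def to_textgrid(intervals, tier_name="phones"):
    """Serialize (start, end, label) tuples, in seconds, as a
    Praat long-format TextGrid string with one IntervalTier."""
    xmin = intervals[0][0]
    xmax = intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        f'xmin = {xmin}',
        f'xmax = {xmax}',
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        f'        xmin = {xmin}',
        f'        xmax = {xmax}',
        f'        intervals: size = {len(intervals)}',
    ]
    for i, (start, end, label) in enumerate(intervals, 1):
        lines += [
            f'        intervals [{i}]:',
            f'            xmin = {start}',
            f'            xmax = {end}',
            f'            text = "{label}"',
        ]
    return "\n".join(lines) + "\n"

# Hypothetical model output: three phone intervals for a short utterance.
phones = [(0.00, 0.12, "dh"), (0.12, 0.25, "ah"), (0.25, 0.40, "k")]
print(to_textgrid(phones))
```

The resulting file opens directly in Praat alongside the audio, which is how TextGrid-based workflows typically inspect and hand-correct automatic transcriptions.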
Keywords: automated transcription, speech recognition, IPA transcription, wav2vec 2.0, Buckeye corpus
How to Cite:
Partridge, V., Pater, J., Bhangla, P., Nirheche, A. & Prickett, B., (2026) “Automated phonetic transcription for varieties of English: wav2vec 2.0 fine-tuned on the Buckeye Corpus”, Proceedings of the Annual Meetings on Phonology 2(1). doi: https://doi.org/10.7275/amphonology.3874