Using OpenAI's automatic speech recognition (ASR) system Whisper to Automate the Transcription of L2 Learners' Spoken Texts
2023-06-04, 09:20–09:50 (Asia/Tokyo), C2
Language: English

While generative AI models have become famous for their ability to produce text, the underlying transformer architecture can also be used for other tasks. This presentation will examine the viability of using OpenAI's automatic speech recognition (ASR) system Whisper in L2 research. While there has been an increase in the use of models like Whisper for transcribing L1 speech (e.g. Lin, 2023; Seyedi et al., 2022), using ASR with speech produced by L2 English learners can be difficult due to factors such as pronunciation errors, disfluencies, and atypical grammatical constructions (Wang et al., 2021). The presenters examine the error rate in 100 samples of L2 learners' spoken texts recorded in the classroom. These include 50 presentations, each with a single speaker, and 50 discussions, each with two to four speakers. The resulting Whisper transcripts were compared against professionally produced transcripts and against a cleaned version of each transcript that had been checked for errors. While the results show that the accuracy of automated transcription comes close to that of professional transcription, the presenters will highlight areas where ASR-based transcription struggles, along with best practices for recording and for cleaning transcripts when using models such as Whisper to transcribe L2 presentations and discussions.
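
The abstract does not say which error metric was used. As an illustration only, a comparison between an automated transcript and a reference transcript is often summarized as a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the automated transcript into the reference, divided by the number of words in the reference. The sketch below assumes that metric; the function name `word_error_rate` and the example sentences are hypothetical, and the only normalization applied is lowercasing and whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: WER is the metric of interest; the abstract does not confirm this.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one missing "the".
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

With transcripts of learner speech, the choice of reference matters: scoring against a verbatim transcript that keeps disfluencies will generally give a different figure than scoring against a cleaned one.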


This presentation examines the accuracy of OpenAI’s Whisper, a transformer-based automatic speech recognition (ASR) model, for transcribing L2 English learners’ presentations (one speaker) and discussions (two to four speakers).