Abstract

Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, we show that naive Bayes classification can be used to identify non-native utterances of English. The advantage of our method is that it relies on text, not on acoustic features, and can be used when the acoustic source is not available. We demonstrate that both read and spontaneous utterances can be classified with high accuracy, and that classification of errorful speech recognizer hypotheses is more accurate than classification of perfect transcriptions. We also characterize part-of-speech sequences that play a role in detecting non-native speech.
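As a rough illustration of the text-only approach the abstract describes, the following is a minimal sketch of a multinomial naive Bayes classifier over utterance text. It is not the authors' implementation: the feature set (word unigrams and bigrams), the add-one smoothing, the class labels, the names `utterance_features` and `NaiveBayesText`, and the four toy training utterances are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def utterance_features(text):
    """Lowercased word unigrams plus adjacent-word bigrams (an assumed feature set)."""
    words = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


class NaiveBayesText:
    """Multinomial naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # smoothing constant
        self.class_counts = Counter()               # utterances seen per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()                          # all features seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for feat in utterance_features(text):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(feature | class) for each class."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_utts in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_utts / total)
            for feat in utterance_features(text):
                if feat in self.vocab:  # features never seen in training are skipped
                    score += math.log((counts[feat] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy utterances, for illustration only (not data from the paper).
train_texts = [
    "i would like to book a flight",
    "could you tell me the way to the station",
    "i want go to station",
    "please tell me where is hotel",
]
train_labels = ["native", "native", "non_native", "non_native"]

model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict("i want go to hotel"))
```

Because the classifier sees only token sequences, the same code could in principle be run on either reference transcriptions or speech recognizer hypotheses; a part-of-speech variant would replace the words with tags before feature extraction.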

Keywords

Computer science, Natural language processing, Speech recognition, Natural language, Artificial intelligence, Naive Bayes classifier, Support vector machine

Publication Info

Year
2001
Type
article
Pages
1-8
Citations
48
Access
Closed

Cite This

Laura Mayfield Tomokiyo, Rosie Jones (2001). You're not from 'round here, are you? pp. 1-8. https://doi.org/10.3115/1073336.1073367

Identifiers

DOI
10.3115/1073336.1073367