Abstract
Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, we show that naive Bayes classification can be used to identify non-native utterances of English. The advantage of our method is that it relies on text, not on acoustic features, and can be used when the acoustic source is not available. We demonstrate that both read and spontaneous utterances can be classified with high accuracy, and that classification of errorful speech recognizer hypotheses is more accurate than classification of perfect transcriptions. We also characterize part-of-speech sequences that play a role in detecting non-native speech.
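As an illustration of the text-only classification described above, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the authors' implementation: the feature set (word unigrams and bigrams), the smoothing choice, the names `features` and `NaiveBayes`, and the four training sentences are all assumptions invented for this example.

```python
import math
from collections import Counter, defaultdict


def features(tokens, n=2):
    """Return the unigrams plus the n-grams of a token sequence as strings."""
    feats = list(tokens)
    feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # utterances per class
        self.feat_counts = defaultdict(Counter)   # feature counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            for f in features(tokens):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def log_scores(self, tokens):
        """Return log P(class) + sum of log P(feature | class) per class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for f in features(tokens):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Invented toy data, for illustration only (not from the paper's corpus).
train = [
    ("i would like to book a room".split(), "native"),
    ("could you tell me the price".split(), "native"),
    ("i want book room for tomorrow".split(), "non-native"),
    ("please tell me how much price".split(), "non-native"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
```

Calling `model.predict(tokens)` on a tokenized utterance returns the higher-scoring label. The same `features` function accepts any token sequence, so it could in principle be applied to part-of-speech tags instead of words, provided a tagger is available to produce them.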
Publication Info
- Year: 2001
- Type: article
- Pages: 1-8
- Citations: 48
- Access: Closed
Identifiers
- DOI: 10.3115/1073336.1073367