Abstract

Today's speech recognition technology is mature enough to be useful for many practical applications. In this context, it is of paramount importance to train accurate acoustic models for many languages within given resource constraints such as data, processing power, and time. Multilingual training has the potential to solve the data issue and close the performance gap between resource-rich and resource-scarce languages. Neural networks lend themselves naturally to parameter sharing across languages, and distributed implementations have made it feasible to train large networks. In this paper, we present experimental results for cross- and multi-lingual network training of eleven Romance languages on 10k hours of data in total. The average relative gains over the monolingual baselines are 4%/2% (data-scarce/data-rich languages) for cross-lingual and 7%/2% for multi-lingual training. However, the additional gain from jointly training the languages on all data comes at an increased training time of roughly four weeks, compared to two weeks (monolingual) and one week (cross-lingual).
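The parameter sharing described above can be illustrated with a minimal sketch: hidden layers are shared across all languages, while each language keeps its own softmax output layer. All dimensions, language codes, and state counts below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 40    # acoustic feature dimension (assumed)
HIDDEN_DIM = 64  # shared hidden layer width (assumed)

# Shared parameters: a single hidden layer used by every language.
W_shared = rng.standard_normal((FEAT_DIM, HIDDEN_DIM)) * 0.1
b_shared = np.zeros(HIDDEN_DIM)

# Language-specific output layers; the number of output states per
# language is hypothetical.
output_layers = {
    lang: (rng.standard_normal((HIDDEN_DIM, n_states)) * 0.1,
           np.zeros(n_states))
    for lang, n_states in {"fr": 500, "pt": 450, "ro": 300}.items()
}

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(features, lang):
    """State posteriors for one language; the hidden layer is shared."""
    h = np.maximum(0.0, features @ W_shared + b_shared)  # ReLU hidden layer
    W_out, b_out = output_layers[lang]
    return softmax(h @ W_out + b_out)

x = rng.standard_normal((2, FEAT_DIM))  # a batch of 2 feature frames
post = forward(x, "fr")
print(post.shape)  # (2, 500)
```

In multi-lingual training, gradients from every language's data would update `W_shared`, while each `output_layers[lang]` is updated only by that language's data; in cross-lingual training, the shared layers would instead be pre-trained on one language and transferred.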

Keywords

Computer science, Artificial neural networks, Deep neural networks, Training set, Artificial intelligence, Natural language processing, Machine learning, Computer networks

Publication Info

Year
2013
Type
article
Pages
8619-8623
Citations
287
Access
Closed

Citation Metrics

287 (OpenAlex)

Cite This

Georg Heigold, Vincent Vanhoucke, Andrew Senior et al. (2013). Multilingual acoustic models using distributed deep neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8619-8623. https://doi.org/10.1109/icassp.2013.6639348

Identifiers

DOI
10.1109/icassp.2013.6639348