Abstract

Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This paper provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then, we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multitalker separation), and speech dereverberation, as well as multimicrophone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.

Keywords

Separation (statistics), Computer science, Source separation, Supervised learning, Artificial intelligence, Deep learning, Speech recognition, Discriminative model, Generalization, Speech processing, Monaural, Machine learning, Artificial neural network, Mathematics

Publication Info

Year: 2018
Type: article
Volume: 26
Issue: 10
Pages: 1702-1726
Citations: 1453
Access: Closed

Citation Metrics

OpenAlex: 1453
Influential: 62
CrossRef: 1116

Cite This

DeLiang Wang, Jitong Chen (2018). Supervised Speech Separation Based on Deep Learning: An Overview. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(10), 1702-1726. https://doi.org/10.1109/taslp.2018.2842159

Identifiers

DOI: 10.1109/taslp.2018.2842159
PMID: 31223631
PMCID: PMC6586438
arXiv: 1708.07524
