Abstract
In recent years, autoencoders, a family of deep learning-based methods for representation learning, have been advancing data-driven research owing to their variability and nonlinear power for multimodal data integration. Despite their success, current implementations lack standardization, versatility, comparability and generalizability. Here we present AUTOENCODIX, an open-source framework designed as a standardized and flexible pipeline for the preprocessing, training and evaluation of autoencoder architectures. These architectures, such as ontology-based and cross-modal autoencoders, provide key advantages over traditional methods by offering explainable embeddings or the ability to translate across data modalities. We apply the framework to datasets from pan-cancer studies (The Cancer Genome Atlas) and single-cell sequencing, as well as in combination with imaging data. Our studies provide user-centric insights and recommendations for navigating architectures, hyperparameters and important trade-offs in representation learning. These include the ability to reconstruct the input data, the quality of embeddings for downstream machine learning models and the reliability of ontology-based embeddings for explainability.
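For orientation, the sketch below illustrates the core technique the abstract refers to: an encoder compresses each input into a low-dimensional embedding, a decoder reconstructs the input, and reconstruction error drives training. This is a minimal, assumption-laden PyTorch example, not AUTOENCODIX's actual API; the class name, layer sizes and synthetic data are all hypothetical.

```python
# Minimal autoencoder sketch (illustrative only; not AUTOENCODIX code).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 32):
        super().__init__()
        # Encoder: maps, e.g., an omics profile to a low-dimensional embedding.
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: reconstructs the input from the embedding.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_features),
        )

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)       # embedding usable by downstream ML models
        x_hat = self.decoder(z)   # reconstruction of the input
        return z, x_hat

# Training loop: minimize mean squared reconstruction error.
model = Autoencoder(n_features=1000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 1000)  # stand-in batch of 64 feature vectors
for _ in range(10):
    optimizer.zero_grad()
    z, x_hat = model(x)
    loss = loss_fn(x_hat, x)
    loss.backward()
    optimizer.step()
```

A cross-modal variant of this pattern would feed one data modality into the encoder and train the decoder to output a different modality, so the shared embedding supports translation across modalities; an ontology-based variant would constrain latent dimensions to biological ontology terms so the embedding is interpretable.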
Publication Info
- Year: 2025
- Type: article
Identifiers
- DOI: 10.1038/s43588-025-00916-4