Abstract
In recent years, autoencoders, a family of deep learning-based methods for representation learning, have been advancing data-driven research owing to their variability and nonlinear power for multimodal data integration. Despite their success, current implementations lack standardization, versatility, comparability and generalizability. Here we present AUTOENCODIX, an open-source framework designed as a standardized and flexible pipeline for the preprocessing, training and evaluation of autoencoder architectures. These architectures, such as ontology-based and cross-modal autoencoders, provide key advantages over traditional methods by offering explainable embeddings or the ability to translate across data modalities. We apply the framework to datasets from pan-cancer studies (The Cancer Genome Atlas) and single-cell sequencing, as well as in combination with imaging data. Our studies provide user-centric insights and recommendations for navigating architectures, hyperparameters and important trade-offs in representation learning. These include the ability to reconstruct the input data, the quality of embeddings for downstream machine learning models and the reliability of ontology-based embeddings for explainability.
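For orientation, the sketch below illustrates the core technique the abstract refers to: an encoder compresses each input into a low-dimensional embedding, a decoder reconstructs the input, and reconstruction error drives training. This is a minimal, assumption-laden PyTorch example, not AUTOENCODIX's actual API; the class name, layer sizes and synthetic data are all hypothetical.

```python
# Minimal autoencoder sketch (illustrative only; not AUTOENCODIX code).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 32):
        super().__init__()
        # Encoder: maps, e.g., an omics profile to a low-dimensional embedding.
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: reconstructs the input from the embedding.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_features),
        )

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)       # embedding usable by downstream ML models
        x_hat = self.decoder(z)   # reconstruction of the input
        return z, x_hat

# Training loop: minimize mean squared reconstruction error.
model = Autoencoder(n_features=1000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(64, 1000)  # stand-in batch of 64 feature vectors
for _ in range(10):
    optimizer.zero_grad()
    z, x_hat = model(x)
    loss = loss_fn(x_hat, x)
    loss.backward()
    optimizer.step()
```

A cross-modal variant of this pattern would feed one data modality into the encoder and train the decoder to output a different modality, so the shared embedding supports translation across modalities; an ontology-based variant would constrain latent dimensions to biological ontology terms so the embedding is interpretable.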
Publication Info
- Year: 2025
- Type: article
Identifiers
- DOI: 10.1038/s43588-025-00916-4