Abstract
This paper addresses anomaly detection in multi-source heterogeneous data within the ETL (Extract-Transform-Load) process and proposes an intelligent detection framework that integrates temporal modeling with attention mechanisms. The method combines feature encoding, gated recurrent modeling, and multi-head attention to dynamically aggregate multidimensional features and model temporal dependencies in ETL logs. At the feature level, a unified encoder maps raw logs, monitoring metrics, and task status information into a high-dimensional latent space, keeping feature scales consistent and preserving information. At the temporal level, a GRU-based model captures long-term dependencies, improving the model's ability to track how anomaly patterns evolve. At the attention level, a multi-head mechanism weights time segments and feature dimensions so that the model focuses adaptively on key moments and important features. Finally, the model combines anomaly scoring with distribution consistency constraints to identify and discriminate potential anomalies. Experimental results show that the proposed framework significantly outperforms traditional rule-based detection, statistical methods, and basic deep models across a range of ETL task scenarios in detection accuracy, stability, and generalization. The findings support integrating temporal modeling and attention mechanisms for anomaly detection in complex data streams and offer a feasible approach to building reliable and scalable intelligent ETL monitoring systems.
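The pipeline described in the abstract (encode features, run a GRU over time, pool the hidden states with several attention heads, then emit an anomaly score) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no dimensions, layer definitions, or loss, so every name here (`init_params`, `gru_step`, `attention_pool`, `anomaly_score`), the hidden size, the number of heads, and the sample window are assumptions. The attention is simplified to one learned query vector per head, the weights are random and untrained, and the distribution consistency constraint, which would presumably be a training-time loss term, is omitted.

```python
import math
import random

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def gru_step(p, x, h):
    """One GRU update; biases are omitted for brevity (an assumption)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    # Convention used here: z close to 1 means "take the candidate state".
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(states, queries):
    """Simplified multi-head pooling: each query vector is one head over time steps."""
    scale = math.sqrt(len(states[0]))
    heads, weights = [], []
    for q in queries:
        w = softmax([sum(qi * si for qi, si in zip(q, s)) / scale for s in states])
        weights.append(w)
        heads.append([sum(wt * s[d] for wt, s in zip(w, states)) for d in range(len(q))])
    context = [v for head in heads for v in head]  # concatenate the heads
    return context, weights

def anomaly_score(window, p):
    """Return a score in (0, 1) and the per-head attention weights over time."""
    h = [0.0] * p["hidden"]
    states = []
    for x in window:
        e = [math.tanh(v) for v in matvec(p["We"], x)]  # unified feature encoding
        h = gru_step(p, e, h)
        states.append(h)
    context, weights = attention_pool(states, p["queries"])
    logit = sum(w * c for w, c in zip(p["w_out"], context))
    return sigmoid(logit), weights

def init_params(n_features, hidden=4, n_heads=2, seed=0):
    """Random, untrained parameters; all sizes are illustrative assumptions."""
    rng = random.Random(seed)
    p = {"hidden": hidden, "We": rand_matrix(hidden, n_features, rng)}
    for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh"):
        p[name] = rand_matrix(hidden, hidden, rng)
    p["queries"] = rand_matrix(n_heads, hidden, rng)
    p["w_out"] = [rng.uniform(-0.1, 0.1) for _ in range(hidden * n_heads)]
    return p

# Hypothetical window: three time steps of (log rate, metric, task status) features.
params = init_params(n_features=3)
window = [[0.2, 0.1, 0.0], [0.3, 0.1, 0.0], [5.0, 0.9, 1.0]]
score, weights = anomaly_score(window, params)
```

With untrained weights the score carries no meaning; the point is the data flow. Each row of `weights` sums to 1 and shows how one head distributes its focus across the time steps, which is the quantity the abstract refers to as adaptive focus on key moments.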
Publication Info
- Year: 2025
- Type: article
- Citations: 0
- Access: Closed
Identifiers
- DOI: 10.20944/preprints202512.0884.v1