Abstract
Driver distraction remains one of the leading causes of traffic accidents. Although deep learning approaches such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers have been extensively applied for distracted driving detection, their performance is often hindered by limited real-time efficiency and high false detection rates. To address these challenges, this paper proposes an efficient dual-stream neural architecture, termed DualStream-AttnXGS, which jointly leverages visual and pose information to improve distraction recognition accuracy. In the RGB stream, an enhanced EfficientNetB0 backbone is employed, where Ghost Convolution and Coordinate Attention modules are integrated to strengthen feature representation while maintaining lightweight computation. A compound loss function combining Center Loss and Focal Loss is further introduced to promote inter-class separability and stabilize training. In parallel, the keypoint stream extracts human skeletal features using YOLOv8-Pose, which are subsequently classified through a compact ensemble model based on XGBoost v2.1.4 and Gradient Boosting. Finally, a Softmax-based probabilistic fusion strategy integrates the outputs of both streams for the final prediction. The proposed model achieved 99.59% accuracy on the SFD3 dataset while attaining 99.12% accuracy on the AUCD2 dataset, demonstrating that the proposed dual-stream architecture provides a more effective solution than single-stream models by leveraging complementary visual and pose information.
Affiliated Institutions
Related Publications
Cascaded Pyramid Network for Multi-person Pose Estimation
The topic of multi-person pose estimation has been largely improved recently, especially with the development of convolutional neural network. However, there still exist a lot o...
GPU-acceleration for Large-scale Tree Boosting
In this paper, we present a novel massively parallel algorithm for accelerating the decision tree building procedure on GPUs (Graphics Processing Units), which is a crucial step...
GhostNet: More Features From Cheap Operations
Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources. The redundancy in feature maps is an importa...
Efficient object localization using Convolutional Networks
Recent state-of-the-art performance on human-body pose estimation has been achieved with Deep Convolutional Networks (ConvNets). Traditional ConvNet architectures include poolin...
Dynamic Convolution: Attention Over Convolution Kernels
Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and t...
Publication Info
- Year
- 2025
- Type
- article
- Volume
- 15
- Issue
- 24
- Pages
- 12974-12974
- Citations
- 0
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.3390/app152412974