Abstract
Construction sites are complex environments where traditional safety monitoring methods often suffer from low detection accuracy and limited interpretability. To address these challenges, this study proposes a modular multimodal agent framework that integrates computer vision, knowledge representation, and large language model (LLM)-based reasoning. First, a CLIP model fine-tuned with Low-Rank Adaptation (LoRA) is combined with YOLOv10 to recognize construction activities and personal protective equipment (PPE). Second, a construction safety knowledge graph is built and coupled with Retrieval-Augmented Generation (RAG) to supply structured domain knowledge and improve contextual understanding. Third, the FusedChain prompting strategy guides LLMs through step-by-step safety risk reasoning. Experimental results show that the proposed approach achieves 97.35% accuracy in activity recognition and an average F1-score of 0.84 in PPE detection, and that it significantly outperforms existing methods in hazard reasoning. The modular design also allows more advanced foundation models to be swapped in as they become available, indicating strong potential for real-world deployment in intelligent construction safety management.
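The three-stage data flow summarized in the abstract (perception, knowledge retrieval, step-by-step prompting) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Observation` class, the `REQUIRED_PPE` table, the `retrieve_rules` and `build_prompt` helpers, and the prompt wording are hypothetical and are not taken from the paper. A plain dictionary stands in for the knowledge graph, and the prompt is not the actual FusedChain template.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Observation:
    """Output of the perception stage (hypothetical structure)."""
    activity: str           # activity label, e.g. from a CLIP-style classifier
    ppe_detected: set[str]  # PPE classes, e.g. from a YOLO-style detector


# Toy stand-in for the safety knowledge graph: activity -> required PPE.
# These entries are illustrative only, not taken from the paper.
REQUIRED_PPE = {
    "welding": {"helmet", "gloves", "face_shield"},
    "scaffolding": {"helmet", "harness"},
}


def retrieve_rules(activity: str) -> set[str]:
    """Retrieval step: look up the PPE required for an activity."""
    return REQUIRED_PPE.get(activity, set())


def build_prompt(obs: Observation) -> str:
    """Compose a step-by-step reasoning prompt from perception and retrieval."""
    required = retrieve_rules(obs.activity)
    missing = sorted(required - obs.ppe_detected)
    return "\n".join([
        f"Observed activity: {obs.activity}",
        f"Detected PPE: {', '.join(sorted(obs.ppe_detected)) or 'none'}",
        f"Required PPE: {', '.join(sorted(required)) or 'unknown'}",
        f"Missing PPE: {', '.join(missing) or 'none'}",
        "Step 1: Identify hazards associated with the activity.",
        "Step 2: Assess the risk raised by any missing PPE.",
        "Step 3: Recommend corrective actions.",
    ])
```

Calling `build_prompt(Observation("welding", {"helmet"}))` yields a prompt that lists the missing items before the reasoning steps. Keeping each stage behind its own function mirrors the modularity claim: the detector, the retrieval backend, or the language model could be replaced without touching the other stages.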
Publication Info
- Year: 2025
- Type: article
- Volume: 15
- Issue: 24
- Pages: 4439-4439
- Citations: 0
- Access: Closed
Identifiers
- DOI: 10.3390/buildings15244439