Abstract

Ensuring navigational safety is one of the most critical challenges in autonomous maritime navigation research, requiring accurate real-time assessment of collision risk and prompt navigational decisions based on that assessment. Traditional rule-based systems using radar and the Automatic Identification System (AIS) exhibit fundamental limitations in simultaneously analyzing discrete objects, such as vessels and buoys, alongside continuous environmental boundaries, such as coastlines and bridges. Recent research has addressed these limitations with artificial intelligence, but most studies have focused primarily on object detection. This study proposes a structured tag-based multimodal navigation safety framework that reasons over maritime scenes by integrating YOLO-based object detection with the LLaVA vision–language model, generating outputs that include a risk level assessment, a navigation action recommendation, a reasoning explanation, and object information. The proposed method achieved 86.1% accuracy in risk level assessment and 76.3% accuracy in navigation action recommendation. Through a hierarchical early stopping system using delimiter-based tags, the system reduced output token generation, relative to natural language outputs, by 95.36% for essential inference results and 43.98% for detailed inference results.
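
The delimiter-tag mechanism described in the abstract can be sketched in a few lines of Python. The tag names, field values, and the two-level (essential vs. detailed) cutoff below are illustrative assumptions; the abstract does not specify the paper's actual tag vocabulary or stop strings.

```python
import re

# Illustrative delimiter tags and field values; the abstract does not specify
# the paper's actual tag vocabulary, so everything below is an assumption.
ESSENTIAL_TAGS = ["RISK", "ACTION"]    # essential results: risk level, action
DETAILED_TAGS = ["REASON", "OBJECTS"]  # detailed results: explanation, objects

def stop_sequence(mode: str) -> str:
    """Closing tag at which decoding may halt for the requested detail level.

    In 'essential' mode, generation stops right after the action
    recommendation, so the longer reasoning and object fields are never
    produced (the paper reports 95.36% fewer output tokens for essential
    results versus free-form natural language).
    """
    return "</ACTION>" if mode == "essential" else "</OBJECTS>"

def truncate_at_stop(generated: str, mode: str) -> str:
    """Simulate hierarchical early stopping by cutting at the stop tag."""
    stop = stop_sequence(mode)
    idx = generated.find(stop)
    return generated[: idx + len(stop)] if idx != -1 else generated

def parse_tagged_output(text: str) -> dict:
    """Extract whichever tagged fields are present in the model output."""
    fields = {}
    for tag in ESSENTIAL_TAGS + DETAILED_TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if match:
            fields[tag.lower()] = match.group(1).strip()
    return fields

if __name__ == "__main__":
    # Hypothetical full model output for one maritime scene.
    raw = ("<RISK>HIGH</RISK><ACTION>ALTER_COURSE_STARBOARD</ACTION>"
           "<REASON>Crossing vessel approaching on the port bow.</REASON>"
           "<OBJECTS>vessel:1;buoy:2</OBJECTS>")
    print(parse_tagged_output(truncate_at_stop(raw, "essential")))
    # -> {'risk': 'HIGH', 'action': 'ALTER_COURSE_STARBOARD'}
    print(parse_tagged_output(raw))  # all four fields
```

In an actual deployment, the stop string would be registered with the decoder (e.g., as a stop sequence in the generation API) rather than applied after the fact; that is what makes the token savings real, since tokens beyond the chosen closing tag are never generated at all.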

Publication Info

Year: 2025
Type: Article
Volume: 13
Issue: 12
Pages: 2339
Citations: 0
Access: Closed

Cite This

Dong-Hyun Kim, Ju-Yeon Yoo (2025). Structured Prompt-Based Vision–Language Reasoning for Risk Assessment and Navigation Decisions in Maritime Navigation. Journal of Marine Science and Engineering, 13(12), 2339. https://doi.org/10.3390/jmse13122339

Identifiers

DOI: 10.3390/jmse13122339