Abstract
Small language models (SLMs) have shown promise for relation extraction (RE) when extracting RDF triples guided by SHACL shapes focused on common datatype properties. This paper investigates how SLMs handle both datatype and object properties for complete RDF graph extraction. We show that the key bottleneck is the long-tail distribution of rare properties. To address this issue, we evaluate several strategies: stratified sampling, weighted loss, dataset scaling, and template-based synthetic data augmentation. We show that the best strategy for performing equally well over unbalanced target properties is to build a training set in which the number of occurrences of each property exceeds a given threshold. To enable reproducibility, we publicly release our datasets, experimental results, and code. Our findings offer practical guidance for training shape-aware SLMs and highlight promising directions for future work in semantic RE.
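The threshold-based strategy the abstract identifies as best can be sketched as oversampling rare properties until each one meets a minimum occurrence count. The function below is a minimal illustration, not the authors' implementation; the example dictionary keys (`text`, `property`) and the DBpedia-style property names are hypothetical placeholders.

```python
import random
from collections import defaultdict

def build_balanced_training_set(examples, min_count, seed=0):
    """Oversample rare properties so every target property appears at
    least `min_count` times (a sketch of the threshold strategy; the
    paper's actual procedure may differ, e.g. using synthetic templates
    instead of duplication)."""
    rng = random.Random(seed)
    by_prop = defaultdict(list)
    for ex in examples:
        by_prop[ex["property"]].append(ex)
    balanced = list(examples)
    for prop, exs in by_prop.items():
        deficit = min_count - len(exs)
        # Duplicate randomly chosen examples of under-represented properties.
        balanced.extend(rng.choice(exs) for _ in range(max(0, deficit)))
    rng.shuffle(balanced)
    return balanced

# Hypothetical long-tailed corpus: one property dominates.
corpus = (
    [{"text": f"s{i}", "property": "dbo:birthDate"} for i in range(10)]
    + [{"text": "s_rare", "property": "dbo:spouse"}]
)
balanced = build_balanced_training_set(corpus, min_count=5)
```

In practice, duplication could be replaced by the template-based synthetic augmentation the paper evaluates, generating new sentences for each under-represented property rather than repeating existing ones.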
Publication Info
- Year: 2025
- Type: preprint
- Pages: 9-17
Identifiers
- DOI: 10.1145/3731443.3771342