Publication

Exploiting Foundation Models for Efficient Labeling of Deformable Linear Objects

Published in: Springer, Cham
Year: 2025
Authors: Alessio Caporali, Kevin Galassi, Matteo Pantano, Gianluca Palli
Project Member: UNIBO
Abstract

The integration of robotic solutions for manipulating Deformable Linear Objects (DLOs) faces significant challenges due to the complexity of perceiving them. One way to address this issue is to use deep learning algorithms; however, these require extensive training data. This paper introduces a method for efficiently labeling DLOs in images at the pixel level, starting from sparse annotations of key points. The method allows the generation of a real-world dataset of DLO images for segmentation purposes with minimal human effort. The approach comprises four main steps. First, a user operates a spatial sensor to record key points along the real-world DLOs in the scene. Second, a robot equipped with an eye-in-hand camera collects multiple images of the scene by following an ellipsoidal trajectory. Third, a neural network, framed as a regression task, is employed to correct and align the input key points with the centerlines of the DLOs. Finally, a pre-trained foundation model, specifically the Segment Anything Model (SAM), transforms the sparse annotations into a comprehensive pixel-wise mask. A comparison with a baseline approach demonstrates that the proposed method increases the intersection over union (IoU) score by approximately 14% without requiring specific fine-tuning procedures. The proposed method therefore constitutes a cornerstone for enabling low-effort DLO labeling and, in turn, the integration of deep-learning-based DLO perception.
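As an illustration of the final step, the sketch below shows how sparse key-point annotations could be turned into a pixel-wise mask by prompting SAM with point prompts through the publicly released segment_anything package. The checkpoint file name, image path, point coordinates, and mask-selection strategy are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np
import cv2
from segment_anything import SamPredictor, sam_model_registry

# Load a pre-trained SAM backbone (checkpoint path is an assumption; any
# released SAM checkpoint from the segment_anything repository would work).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Image of the scene captured by the eye-in-hand camera (RGB, HxWx3, uint8).
image = cv2.cvtColor(cv2.imread("dlo_scene.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Sparse key points projected onto the image and refined to lie on the DLO
# centerline; the coordinates here are placeholders for illustration.
point_coords = np.array([[210, 340], [260, 310], [315, 295], [370, 300]])
point_labels = np.ones(len(point_coords), dtype=int)  # 1 = foreground prompt

# Prompt SAM with the points and keep the highest-scoring candidate mask.
masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
dlo_mask = masks[np.argmax(scores)]  # boolean HxW pixel-wise segmentation mask
```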