Publication

Exploiting Foundation Models for Efficient Labeling of Deformable Linear Objects

Published in: Springer, Cham
Year: 2025
Authors: Alessio Caporali, Kevin Galassi, Matteo Pantano, Gianluca Palli
Project Member: UNIBO
Abstract

The integration of robotic solutions for manipulating Deformable Linear Objects (DLOs) faces significant challenges due to the complexity of perceiving them. One way to address this issue is to use deep learning algorithms; however, these require extensive training data. This paper introduces a method for efficiently labeling DLOs in images at the pixel level, starting from sparse annotations of key points. The method allows the generation of a real-world dataset of DLO images for segmentation purposes with minimal human effort. The approach comprises four main steps. First, a user operates a spatial sensor to record key points along the real-world DLOs in the scene. Second, a robot equipped with an eye-in-hand camera collects multiple images of the scene by following an ellipsoidal trajectory. Third, a neural network, framed as a regression task, is employed to correct and align the input key points with the centerlines of the DLOs. Finally, a pre-trained foundation model, specifically the Segment Anything Model (SAM), transforms the sparse annotations into a comprehensive pixel-wise mask. A comparison with a baseline approach demonstrates that the proposed method increases the intersection over union (IoU) score by approximately 14% without requiring specific fine-tuning procedures. The proposed method therefore constitutes a cornerstone for enabling low-effort DLO labeling and, in turn, the integration of deep-learning-based DLO perception.
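As an illustration of the final step, the sketch below shows how sparse key-point annotations could be turned into a pixel-wise mask by prompting SAM with point prompts through the publicly released segment_anything package. The checkpoint file name, image path, point coordinates, and mask-selection strategy are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np
import cv2
from segment_anything import SamPredictor, sam_model_registry

# Load a pre-trained SAM backbone (checkpoint path is an assumption; any
# released SAM checkpoint from the segment_anything repository would work).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Image of the scene captured by the eye-in-hand camera (RGB, HxWx3, uint8).
image = cv2.cvtColor(cv2.imread("dlo_scene.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Sparse key points projected onto the image and refined to lie on the DLO
# centerline; the coordinates here are placeholders for illustration.
point_coords = np.array([[210, 340], [260, 310], [315, 295], [370, 300]])
point_labels = np.ones(len(point_coords), dtype=int)  # 1 = foreground prompt

# Prompt SAM with the points and keep the highest-scoring candidate mask.
masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
dlo_mask = masks[np.argmax(scores)]  # boolean HxW pixel-wise segmentation mask
```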