Designing Hand-Object Interaction Representations For Better Grasp Priors
Citation:
Morales, Theo Martin, Designing Hand-Object Interaction Representations For Better Grasp Priors, Trinity College Dublin, School of Computer Science & Statistics, Computer Science, 2025
Abstract:
Accurate modelling of hand-object interactions (HOI) is critical for applications in areas such as computer vision, augmented/mixed/virtual reality, video game production and robotics. In computer vision, HOI modelling encompasses tasks like hand-object pose estimation, grasp synthesis, object manipulation and mesh reconstruction. These challenges share many characteristics, but early HOI research focused on end-to-end methods tailored to specific tasks. These approaches suffered from limited transferability and generalization due to reliance on spurious correlations in the data. Recent research favours explicit representations that capture key features of hand grasps and object affordances, offering better generalization and transferability. Through the use of local coordinate frames, dense hand-object correspondences and contact mapping, explicit representations have shown promise in improving accuracy on HOI tasks. However, designing representations that are both expressive and lightweight remains challenging, as it requires balancing real-time applicability with model generalizability.

This thesis investigates the following research questions: (1) Can we design new shape, pose and contact representations to overcome the limitations of existing representations while maintaining their expressiveness? and (2) Can we design new models to demonstrate the expressiveness and versatility of these representations for various applications? These questions are explored within the scope of static and dynamic grasp denoising and synthesis, through the lens of deep learning for 3D vision. The focus of this research is on the expressiveness, versatility and efficiency of the proposed solutions when combined with state-of-the-art (SOTA) learning methods and test-time adaptation, as motivated by effective recent trends in the field.

This thesis answers the research questions via three main contributions. Firstly, we investigate whether generalizable internal representations can be learned via meta-learning in a new paradigm for hand-object pose prediction. Secondly, we design a novel lightweight and fully-differentiable hand-object interaction field that captures shape and pose in an object-centric frame. We show how this representation can be used to denoise static grasps with 8% higher accuracy than the state of the art. Thirdly, we extend the representation to model hand surface contacts with a parametric and continuous probabilistic representation of contacts. We demonstrate how this improved representation can be leveraged for both synthesis and denoising of static grasps, outperforming specialized state-of-the-art methods in contact, pose and plausibility metrics while handling both tasks. This combined representation improves on the SOTA, as it is continuous, fully-differentiable and approximately 70× faster to compute.
Author's Homepage:
https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:MORALEST
Description:
APPROVED
Author: Morales, Theo Martin
Advisor:
O'Sullivan, Carol
Publisher:
Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material:
Thesis
Collections
Availability:
Full text available