OVTAS Diagram
ICRA 2026

OVTAS: Zero-Shot Action Segmentation

Exploring Vision-Language Models for Open-Vocabulary Zero-Shot Action Segmentation. We demonstrate how VLMs can decompose complex long-form videos into meaningful steps without prior training examples.
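As an illustrative sketch only (not the paper's actual method): one simple way to get zero-shot segments from a VLM is to embed each frame and each candidate step description, assign every frame to its most similar step, and merge contiguous runs of the same step into segments. The function and toy embeddings below are hypothetical.

```python
import numpy as np

def segment_by_similarity(frame_embs, step_embs):
    """Assign each frame to the best-matching step description, then merge
    contiguous runs of the same step into (start, end, step_id) segments.

    frame_embs: (T, D) frame features (e.g. from a VLM image encoder)
    step_embs:  (K, D) step-text features (e.g. from the text encoder)
    """
    # Cosine similarity: L2-normalize both sets, then take dot products.
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    s = step_embs / np.linalg.norm(step_embs, axis=1, keepdims=True)
    labels = (f @ s.T).argmax(axis=1)  # per-frame best-matching step

    # Merge identical consecutive labels into temporal segments.
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t - 1, int(labels[start])))
            start = t
    return segments

# Toy example: frames 0-1 resemble step 0, frames 2-3 resemble step 1.
frames = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
steps = np.array([[1.0, 0.0], [0.0, 1.0]])
print(segment_by_similarity(frames, steps))  # → [(0, 1, 0), (2, 3, 1)]
```

Because the step texts are free-form, any vocabulary of actions can be queried at inference time without retraining, which is the core appeal of the open-vocabulary setting.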

Causality Graph
TVCG 2025

Causality in Mixed Reality

Visualizing Causality in Mixed Reality for Manual Task Learning. A framework for identifying and visualizing the "why" behind physical actions, helping users understand task dependencies intuitively.

RA-L 2023 / ICRA 2024

Interacting Objects Dataset

A novel dataset and taxonomy for Object-Object Interactions (OOI). Unlike typical Human-Object Interaction datasets, we focus on the physical dynamics between objects themselves to enable richer scene understanding.

NA-VQA Framework
Under Review

NA-VQA: Narrative Aligned QA

Narrative Aligned Long Form Video Question Answering. A method for grounding QA in the narrative structure of movies and long videos, improving reasoning over long temporal horizons.

AnnotateXR Workflow
JCISE 2024

AnnotateXR: Data for Vision

An extended reality workflow for automating data annotation. We allow researchers to collect and auto-label high-fidelity training data for computer vision by simply performing the task in XR.
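A minimal sketch of the idea behind auto-labeling, under assumptions not taken from the project itself: since an XR system tracks object poses, a 2D bounding-box label can be derived by projecting an object's known 3D corner points through the camera intrinsics. The helper below is hypothetical and omits distortion, occlusion, and clipping.

```python
import numpy as np

def project_bbox(corners_cam, K):
    """Project 3D corner points (camera frame, meters) through intrinsics K
    to a 2D pixel-space bounding box (x_min, y_min, x_max, y_max).
    Hypothetical helper; a real XR annotation pipeline is more involved.
    """
    pts = (K @ corners_cam.T).T       # (N, 3) homogeneous pixel coordinates
    uv = pts[:, :2] / pts[:, 2:3]     # perspective divide
    x0, y0 = uv.min(axis=0)           # tight axis-aligned box over projections
    x1, y1 = uv.max(axis=0)
    return (float(x0), float(y0), float(x1), float(y1))

# Toy camera (focal length 100 px, principal point at (50, 50)) and a small
# object 1 m in front of the camera.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
corners = np.array([[-0.1, -0.1, 1.0], [0.1, 0.1, 1.0]])
print(project_bbox(corners, K))  # roughly (40, 40, 60, 60)
```

Because pose tracking supplies the geometry for free, every frame captured while the user performs the task yields labels at no extra annotation cost.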