STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos

Muna Almushyti; Frederick W. B. Li; IEEE

doi:10.1109/ICPR56361.2022.9956030

Conference proceeding

Muna Almushyti, Frederick W. B. Li and IEEE

2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), Vol.2022-, pp.3287-3294

International Conference on Pattern Recognition

01/01/2022

DOI: https://doi.org/10.1109/ICPR56361.2022.9956030

Abstract

Computer Science

Computer Science, Artificial Intelligence

Engineering

Engineering, Electrical & Electronic

Imaging Science & Photographic Technology

Science & Technology

Technology

Recognizing human-object interactions is challenging because of their spatio-temporal changes. We propose the Spatio-Temporal Interaction Transformer-based (STIT) network to reason about such changes. Specifically, spatial transformers learn the context of humans and objects at a specific frame time. A temporal transformer then learns higher-level relations between the spatial context representations at different time steps, capturing long-term dependencies across frames. We further investigate multiple hierarchy designs for learning human interactions. We achieve superior performance on the Charades, Something-Something v1 and CAD-120 datasets compared with baseline models that do not learn human-object relations and with prior graph-based networks. We also achieve state-of-the-art accuracy of 95.93% on the CAD-120 dataset [1] using RGB data only.
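
The two-level idea described in the abstract (spatial attention within each frame, then temporal attention across frames) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes single-head dot-product self-attention with identity projections (no learned weights, no feed-forward layers or positional encodings) and mean pooling to summarize each frame. The names `self_attention`, `mean_pool` and `stit_sketch` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), so there are no learned parameters.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector (an assumed summary step)."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def stit_sketch(video):
    """video: list of frames; each frame: list of entity feature vectors (human and objects)."""
    # Spatial level: entities attend to each other within one frame.
    frame_context = [mean_pool(self_attention(frame)) for frame in video]
    # Temporal level: per-frame context vectors attend to each other across time.
    return mean_pool(self_attention(frame_context))


random.seed(0)
# Toy input: 3 frames, 4 entities per frame, 8-dimensional features.
video = [[[random.gauss(0, 1) for _ in range(8)] for _ in range(4)] for _ in range(3)]
clip_feature = stit_sketch(video)
print(len(clip_feature))  # 8
```

In a real model the resulting clip-level vector would feed a classifier, and each attention step would use learned projections. The sketch only shows how the spatial output becomes the temporal input.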

Details

Title: STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos
Creators - without role: Muna Almushyti - Durham University
Frederick W. B. Li - Durham University
IEEE
Publication Details: 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), Vol.2022-, pp.3287-3294
Series: International Conference on Pattern Recognition
Publisher: IEEE
Number of pages: 8
Grant note: University of Manchester N8 research partnership EP/T022167/1 / EPSRC; UK Research & Innovation (UKRI); Engineering & Physical Sciences Research Council (EPSRC) University of Durham University of York
Identifiers: 9928731508331
Academic Unit: Qassim University
Language: English
Resource Type: Conference proceeding