Open Access System for Information Sharing


Thesis

Scene Understanding with Contextual Reasoning via Message Passing

Title
Scene Understanding with Contextual Reasoning via Message Passing
Authors
정든솔
Date Issued
2024
Publisher
Pohang University of Science and Technology (포항공과대학교)
Abstract
Understanding holistic scenes is crucial for addressing real-world problems. Successful holistic scene understanding requires models to infer relationships or interactions between objects within an image. This dissertation focuses on scene graph generation and human-object interaction detection, which are pivotal for deciphering object relationships. Because these relationships are complex and depend on other objects or relationships, contextual reasoning between objects and relationships is essential.
In this work, we introduce novel contextual reasoning methods that capture high-level context information as messages and integrate this information into object and relationship features. The essence of this dissertation is the development of these methods for extracting high-level context. Unlike traditional graph neural networks that propagate context information linearly from nodes to edges or vice versa, our methods account for all potential relevancies among components within each task. For scene graph generation, we propose a method that employs four types of attention: node-to-node, node-to-edge, edge-to-node, and edge-to-edge. To address the sparsity of ground-truth scene graphs, we also develop a module that selectively eliminates invalid edges, thereby enhancing contextual reasoning through valid connections. This approach demonstrates the effectiveness of eliminating invalid edges in scene graph generation. Similarly, for human-object interaction detection, we introduce a method that combines unary, pairwise, and ternary relation contexts across human, object, and interaction branches, capturing multiplex context information.
However, these methods depend on human-annotated labels to identify invalid edges, or use human-designed functional forms such as unary, pairwise, and ternary relations, which may detract from the model's ability to autonomously extract the necessary context information. To mitigate this limitation, we propose replacing the conventional multi-head attention module used for scene graphs with a sparse attention module that adaptively generates masks specific to each image. This module is trained end-to-end solely with the scene graph generation loss, thereby producing more relevant and effective contextual reasoning masks.
This dissertation presents various contextual reasoning approaches for scene graph generation and human-object interaction detection, enhancing the understanding of holistic scenes. Each proposed method achieves state-of-the-art results on standard benchmarks, with extensive ablation studies validating the effectiveness of each component. The contributions of this research offer insights into addressing realistic challenges in computer vision, emphasizing the critical importance of contextual reasoning in predicting relationships and paving the way for future research in this field.
URI
http://postech.dcollection.net/common/orgView/200000807181
https://oasis.postech.ac.kr/handle/2014.oak/124076
Article Type
Thesis
Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
