In this project, we aim to propose a novel task for semantic video retrieval. Existing methods and benchmarks for video retrieval have mostly focused on short clips with superficial visual cues, such as same actions or objects. However, long and untrimmed videos encountered in real-world scenarios contain multiple semantic elements with complex structures; evaluating similarities between them with existing methods can be sub-optimal. Following from this claim, we propose a systematic and scalable approach to measuring similarties between long videos by leveraging dense captions, assuming that each caption well describes partial semantic segments.
Compositional Video Learning
Long-term video is composed of multiple complex semantics.
We aim to solve challenging problems like the 'cups and balls trick game'
by considering semantic units that make up semantics and disentangling
the composed relationships among semantic units.
Temporal Information embedded Video Scene Graph Generation
Video scene graph generation has been an emerging research topic, which aims to interpret a video as a temporally-evolving graph structure by representing video objects as nodes and their relationships as edges. Compared to the task of scene graph generation from images, it is more challenging because of the dynamic relationships between objects and the temporal dependencies between frames. We are working on embedding these temporal information into the model by extracting it effectively from semantic features in a frame.
Video Question Answering is a task that receives video and question as inputs and outputs an answer to the question. In order to solve this problem efficiently, we are conducting research on how to clearly grasp the spatio-temporal semantic structure of video and extract the core semantic structure.
The personlalized multimodal dialog research using personalized memory.
Protein Structure Prediction
Research to enhance the performance of prediction of protein structure which is pivotal in determining their functions.
The study involves investigating multiple sequence alignments(MSA) and prediction models for protein structure prediction.
Carbon Absorption Prediction
Development of a deep learning algorithm for predicting global carbon absorption using multimodal data.