Most text-to-video retrieval systems function as black boxes, offering limited insight into their decision-making. X-CoT (Explainable Chain-of-Thought) addresses this limitation by leveraging large language models to generate human-readable explanations and improved rankings through chain-of-thought reasoning. The framework not only enhances retrieval accuracy but also enables a deeper understanding of model behavior and dataset quality.
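As a minimal illustration of chain-of-thought reranking, the sketch below prompts an LLM to reason over candidate video captions before emitting a ranking. The `llm_complete` callable and the prompt and output formats are hypothetical placeholders for illustration, not the actual X-CoT pipeline.

```python
# Hypothetical sketch of LLM-based chain-of-thought reranking for retrieval.
# `llm_complete` stands in for any text-completion API; the prompt format
# below is illustrative, not taken from the X-CoT paper.

def cot_rerank(query: str, candidate_captions: list[str], llm_complete) -> list[int]:
    """Rerank candidate videos (represented by captions) with step-by-step reasoning."""
    numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(candidate_captions))
    prompt = (
        f"Query: {query}\n"
        f"Candidate video captions:\n{numbered}\n"
        "Think step by step about how well each caption matches the query, "
        "explain your reasoning, then output the candidate indices from best "
        "to worst on a final line formatted as: RANKING: i, j, k, ...\n"
    )
    response = llm_complete(prompt)  # free-form explanation followed by a ranking line
    ranking = [l for l in response.splitlines() if l.startswith("RANKING:")][-1]
    return [int(tok) for tok in ranking.removeprefix("RANKING:").split(",")]
```

A key design point is that the explanation is produced alongside the ranking, so every reordering decision comes with a human-readable rationale.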
Generative Adversarial Networks (GANs) and related generative models synthesize new, realistic data samples that resemble the training distribution. My work in this area focuses on improving the efficiency, robustness, and interpretability of generative models through innovations such as custom activation functions (e.g., PMish), model compression via adaptive rank decomposition, and enhanced loss formulations that stabilize training.
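As one concrete example, the snippet below sketches a parametric Mish-style activation in PyTorch. The exact placement of the learnable β inside the softplus is an assumed parameterization for illustration, not necessarily the published PMish form; it reduces to standard Mish when β = 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PMish(nn.Module):
    """Parametric Mish-style activation: x * tanh(softplus(beta * x)).

    NOTE: the placement of the learnable beta is an assumed parameterization
    for illustration, not necessarily the exact published PMish form.
    """
    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = nn.Parameter(torch.tensor(beta))  # learned jointly with the model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reduces to standard Mish (x * tanh(softplus(x))) when beta == 1.
        return x * torch.tanh(F.softplus(self.beta * x))
```

Because β is a trainable parameter, the network can adapt the activation's curvature during training instead of fixing it by hand.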
Data augmentation techniques expand and diversify training datasets by generating transformed samples. My research explores foreground–background and patch-based augmentation strategies that enhance generalization in recognition tasks, particularly for source-free domain adaptation (SFDA) and person re-identification. These methods reduce reliance on large labeled datasets while maintaining model performance.
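The PyTorch sketch below illustrates the patch-based idea: an image is split into a grid of non-overlapping patches that are randomly permuted and reassembled. It is an illustrative transform, not the exact augmentation used in my papers.

```python
import torch

def shuffle_patches(img: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Randomly permute non-overlapping patches of a C x H x W image tensor.

    Illustrative patch-based augmentation, not a specific published recipe.
    """
    c, h, w = img.shape
    assert h % patch == 0 and w % patch == 0, "image dims must be divisible by patch"
    # Split into a grid of patches: (C, H/p, W/p, p, p) -> (n_patches, C, p, p)
    patches = img.unfold(1, patch, patch).unfold(2, patch, patch)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)
    patches = patches[torch.randperm(patches.size(0))]  # shuffle grid order
    # Reassemble the shuffled grid back into a full image
    gh, gw = h // patch, w // patch
    grid = patches.reshape(gh, gw, c, patch, patch).permute(2, 0, 3, 1, 4)
    return grid.reshape(c, h, w)
```

Shuffling patches destroys global layout while preserving local texture, which pushes the model toward cues that survive spatial rearrangement.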
Domain adaptation seeks to transfer knowledge from a labeled source domain to an unlabeled target domain with a different data distribution. My work advances this field through unsupervised domain adaptation and source-free domain adaptation (SFDA) frameworks, the latter adapting pre-trained models without any access to source data. Recent efforts include Shuffle PatchMix (SPM) and Dual-Region Augmentation (DRA), which improve feature alignment and robustness under distribution shift.
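Below is a rough sketch of the shuffle-then-mix idea behind SPM, reusing the `shuffle_patches` helper from the sketch above: a patch-shuffled copy of one image is blended into another with a mixup-style weight. The fixed mixing coefficient and the exact procedure are illustrative assumptions, not the published SPM recipe.

```python
# Reuses shuffle_patches (and the torch import) from the sketch above.

def shuffle_patchmix(img_a: torch.Tensor, img_b: torch.Tensor,
                     patch: int = 16, lam: float = 0.7) -> torch.Tensor:
    """Blend img_a with a patch-shuffled copy of img_b (both C x H x W).

    The shuffle-then-mix recipe and the fixed mixing weight are illustrative
    assumptions, not the exact published SPM procedure.
    """
    shuffled_b = shuffle_patches(img_b, patch)
    return lam * img_a + (1.0 - lam) * shuffled_b  # mixup-style blend
```

In a full training pipeline the mixing weight would typically also blend any associated labels or pseudo-labels in the same proportion.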
Image denoising aims to restore clean images from noisy observations by leveraging spatial and contextual priors. My research focuses on developing efficient, learning-based denoising frameworks that generalize across diverse noise patterns and imaging conditions.
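As a generic example of the learning-based approach, the sketch below is a small DnCNN-style residual network that predicts the noise map and subtracts it from the noisy input. This is a standard baseline pattern, not a specific architecture from this line of work.

```python
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    """Minimal DnCNN-style denoiser: predict the noise and subtract it.

    A generic residual-learning sketch, not a specific published architecture.
    """
    def __init__(self, channels: int = 1, features: int = 64, depth: int = 8):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.BatchNorm2d(features), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, noisy: torch.Tensor) -> torch.Tensor:
        # Residual learning: the network estimates the noise map,
        # which is subtracted from the input to recover the clean image.
        return noisy - self.body(noisy)
```

Learning the residual (the noise) rather than the clean image directly is a common trick that simplifies the mapping the network must fit.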