Mamshad Nayeem Rizve

I am a Postdoctoral Scientist at Amazon Search Science & AI, working at the intersection of video understanding and large language models. I recently completed my Ph.D. at the Center for Research in Computer Vision, University of Central Florida (UCF), where I was advised by Prof. Mubarak Shah. I am broadly interested in computer vision and machine learning. My Ph.D. research primarily focused on learning with limited labels, including semi-supervised learning, few-shot learning, and self-supervised learning. I have also worked on activity detection, temporal action localization, learning with noisy labels, and multi-modal learning.

Email  /  CV  /  Google Scholar  /  Twitter  /  LinkedIn  /  GitHub

profile photo
Updates

  February 2024: Paper accepted to CVPR 2024
  January 2024: DoRA got accepted to ICLR 2024 as an Oral
  August 2023: Joined Amazon Search Science & AI as a Postdoctoral Scientist
  July 2023: Three papers accepted to ICCV 2023
  July 2023: Successfully defended my PhD dissertation
  February 2023: Two papers accepted to CVPR 2023
  October 2022: Patent granted for real-time spatio-temporal activity detection from untrimmed videos
  July 2022: Two papers accepted to ECCV 2022; TRSSL accepted as an Oral
  May 2022: Started summer internship at Microsoft
  March 2022: Paper accepted to CVPR 2022
  May 2021: Started summer internship at Aurora
  March 2021: Paper accepted to CVPR 2021
  January 2021: Paper accepted to ICLR 2021

Publications
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
Shashanka Venkataramanan, Mamshad Nayeem Rizve, João Carreira, Yuki M. Asano*, Yannis Avrithis*
ICLR, 2024 (Oral presentation; in top 1.2%)
arxiv / openreview / project page / dataset / code / bibtex

Using just one long unlabelled video from our new egocentric dataset, Walking Tours, we develop a new method, DoRA, that outperforms DINO pretrained on ImageNet on image and video downstream tasks.

Preserving Modality Structure Improves Multi-Modal Learning
Sirnam Swetha, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah
ICCV, 2023
arxiv / bibtex

Multi-modal self-supervised methods struggle with out-of-domain data because they ignore the inherent modality-specific semantic structure. To address this, we introduce a Semantic-Structure-Preserving Consistency approach that preserves modality-specific relationships in the joint embedding space using a Multi-Assignment Sinkhorn-Knopp algorithm. This approach achieves state-of-the-art performance on multiple benchmarks and generalizes to both in-domain and out-of-domain datasets.
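
For reference, below is a minimal sketch of the standard Sinkhorn-Knopp normalization that turns feature-to-prototype scores into balanced soft assignments; the Multi-Assignment variant used in the paper differs in its details, and the function name and hyperparameters here are illustrative.

```python
import torch

def sinkhorn_knopp(scores, eps=0.05, n_iters=3):
    """Balanced soft assignments from a (batch x prototypes) score matrix.

    Standard Sinkhorn-Knopp normalization; the Multi-Assignment variant
    described in the paper builds on this idea but is not reproduced here.
    """
    Q = torch.exp(scores / eps).T            # prototypes x batch
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True)      # rows: equal mass per prototype
        Q /= K
        Q /= Q.sum(dim=0, keepdim=True)      # columns: one unit per sample
        Q /= B
    return (Q * B).T                         # batch x prototypes, rows sum to 1
```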

CDFSL-V: Cross-Domain Few-Shot Learning for Videos
Sarinda Samarasinghe, Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah
ICCV, 2023
arxiv / bibtex / code

We introduce CDFSL-V, a challenging cross-domain few-shot learning problem in the video domain, and present a solution that combines self-supervised learning and curriculum learning to balance information from the source and target domains. Our approach employs a masked autoencoder-based self-supervised training objective to learn from both domains, and a progressive curriculum to transition from class-discriminative features in the source data to target-domain-specific features.

SSDA: Secure Source-Free Domain Adaptation
Sabbir Ahmed*, Abdullah Al Arafat*, Mamshad Nayeem Rizve*, Rahim Hossain, Zhishan Guo, Adnan Siraj Rakin
ICCV, 2023
paper / bibtex / code

We analyze the security challenges in Source-Free Domain Adaptation (SFDA), where the target domain owner lacks access to the source dataset and is unaware of the source model's training process, making it susceptible to Backdoor/Trojan attacks. We propose secure source-free domain adaptation (SSDA), which uses model compression and knowledge transfer with a spectral-norm-based penalty to defend the target model from backdoor attacks. Our evaluations demonstrate that SSDA effectively defends against such attacks with minimal impact on test accuracy.
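
As a rough illustration of the spectral-norm-based penalty mentioned above, the sketch below sums the largest singular values of a model's weight matrices so it can be added to the knowledge-transfer loss; the exact penalty used in SSDA may differ, and the weighting is hypothetical.

```python
import torch
import torch.nn as nn

def spectral_norm_penalty(model: nn.Module) -> torch.Tensor:
    """Sum of the largest singular values of all Linear/Conv2d weights.

    Bounding spectral norms limits how strongly any single layer can amplify
    (possibly backdoored) feature directions; a sketch, not SSDA's exact term.
    """
    norms = [
        torch.linalg.matrix_norm(m.weight.flatten(1), ord=2)   # largest singular value
        for m in model.modules()
        if isinstance(m, (nn.Linear, nn.Conv2d))
    ]
    return torch.stack(norms).sum()

# e.g., total_loss = distillation_loss + lam * spectral_norm_penalty(target_model)
```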

PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization
Mamshad Nayeem Rizve*, Gaurav Mittal*, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen
CVPR, 2023
paper / bibtex / video

PivoTAL approaches weakly-supervised temporal action localization from a localization-by-localization perspective by learning to localize action snippets directly. To this end, PivoTAL introduces a novel algorithm that exploits the inherent spatio-temporal structure of video data in the form of an action-specific scene prior, an action snippet generation prior, and a learnable Gaussian prior to generate pseudo-action snippets. These pseudo-action snippets provide additional supervision, complementing the weak video-level annotations during training.

TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
Ishan Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah
CVPR, 2023
arxiv / bibtex / code

We propose a student-teacher semi-supervised learning framework, where we distill knowledge from a temporally-invariant and temporally-distinctive teacher. Depending on the nature of the unlabeled video, we dynamically combine the knowledge of these two teachers based on a novel temporal similarity-based reweighting scheme.
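
A rough sketch of this idea is shown below: the student is distilled from a convex combination of the two teachers, with a per-video weight standing in for the paper's temporal-similarity-based reweighting; the names and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def distill_from_two_teachers(student_logits, inv_logits, dist_logits, w, T=2.0):
    """KL distillation from a mixture of two teachers.

    w: (batch, 1) weight in [0, 1] for the temporally-invariant teacher versus
    the temporally-distinctive one. In TimeBalance this weight is derived from
    a temporal-similarity score; here it is simply an input.
    """
    p_teacher = w * F.softmax(inv_logits / T, dim=-1) \
        + (1.0 - w) * F.softmax(dist_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```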

Towards Realistic Semi-Supervised Learning
Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah
ECCV, 2022 (Oral presentation; in top 2.7%)
arxiv / bibtex / code

We propose a new method for open-world semi-supervised learning that utilizes sample uncertainty and incorporates prior knowledge about class distribution to generate reliable class-distribution-aware pseudo-labels for samples belonging to both known and unknown classes.

OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning
Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
ECCV, 2022
arxiv / bibtex / code

OpenLDN utilizes a pairwise similarity loss with bi-level optimization to discover novel classes and transforms the open-world SSL problem into a standard SSL problem, outperforming current state-of-the-art methods with a better accuracy/training-time trade-off.
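
For intuition, here is a heavily simplified sketch of a pairwise similarity objective: predictions of samples that look alike are encouraged to agree, which is what allows novel classes to surface as consistent clusters. In this sketch the pairwise targets come directly from feature similarity, whereas OpenLDN learns them through bi-level optimization; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(feats, probs):
    """BCE between prediction agreement and feature-space similarity.

    feats: (N, D) features, probs: (N, C) softmax predictions. A sketch only;
    OpenLDN's actual pairwise targets are learned, not fixed as done here.
    """
    f = F.normalize(feats, dim=1)
    sim_targets = (f @ f.T).clamp(0, 1)                    # soft pairwise targets
    pred_agree = (probs @ probs.T).clamp(1e-6, 1 - 1e-6)   # agreement of posteriors
    return F.binary_cross_entropy(pred_agree, sim_targets.detach())
```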

UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning
Nazmul Karim, Mamshad Nayeem Rizve, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah
CVPR, 2022
arxiv / bibtex / code

UNICON is a robust sample selection approach for training with high label noise. It incorporates a Jensen-Shannon divergence based uniform sample selection mechanism and contrastive learning.
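
A minimal sketch of the selection idea is shown below: measure the Jensen-Shannon divergence between each prediction and its (possibly noisy) label, then keep the lowest-divergence samples from every class so the clean subset stays class-balanced. The keep ratio and helper names are illustrative, not UNICON's exact procedure.

```python
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    """Row-wise Jensen-Shannon divergence between two probability matrices."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.add(eps).log() - b.add(eps).log())).sum(dim=1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def select_clean_uniform(probs, labels, num_classes, keep_ratio=0.5):
    """Class-balanced ('uniform') selection of the most agreeable samples."""
    div = js_divergence(probs, F.one_hot(labels, num_classes).float())
    keep = []
    for c in range(num_classes):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        k = int(keep_ratio * idx.numel())
        keep.append(idx[div[idx].argsort()[:k]])       # lowest divergence first
    return torch.cat(keep)
```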

TCLR: Temporal Contrastive Learning for Video Representation
Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah
CVIU, 2022
arxiv / elsevier / bibtex / code / video

We propose a new temporal contrastive learning framework for self-supervised video representation learning, consisting of two novel losses that aim to increase the temporal diversity of learned features. The framework achieves state-of-the-art results on various downstream video understanding tasks, including significant improvement in fine-grained action classification for visually similar classes.
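
In the spirit of this framework, the sketch below contrasts clips from different timestamps of the same video so that features at different times are pushed apart; the exact losses in TCLR are more involved, and the shapes and names here are assumptions.

```python
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE where other timestamps of the same video act as negatives.

    anchors, positives: (V, T, D) clip features for V videos and T clips each;
    positives[v, t] is a second augmentation of the clip behind anchors[v, t].
    """
    V, T, _ = anchors.shape
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = torch.einsum("vtd,vsd->vts", a, p) / temperature    # (V, T, T)
    targets = torch.arange(T, device=anchors.device).repeat(V)   # diagonal = positive
    return F.cross_entropy(logits.reshape(V * T, T), targets)
```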

Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning
Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
CVPR, 2021
arxiv / bibtex / code / video

We propose a novel training mechanism for few-shot learning that simultaneously enforces equivariance and invariance to geometric transformations, allowing the model to learn features that generalize well to novel classes with few samples.
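
Conceptually, the objective can be sketched as two complementary terms: an equivariance head must recover which geometric transformation was applied, while the backbone features are pulled toward the untransformed view. The code below only illustrates that idea; the encoder, head, and loss combination are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def inv_equiv_losses(encoder, equiv_head, images, transforms, transform_ids):
    """Joint invariance/equivariance objective over geometric transforms.

    transforms: list of callables (e.g., fixed rotations); transform_ids: long
    tensor indicating which transform was applied to each image.
    """
    feats_orig = encoder(images)                                  # (B, D)
    transformed = torch.stack(
        [transforms[int(i)](img) for img, i in zip(images, transform_ids)]
    )
    feats_t = encoder(transformed)

    equiv_loss = F.cross_entropy(equiv_head(feats_t), transform_ids)      # predict the transform
    inv_loss = 1.0 - F.cosine_similarity(feats_t, feats_orig.detach(), dim=-1).mean()
    return inv_loss, equiv_loss
```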

In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning
Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah
ICLR, 2021
arxiv / openreview / bibtex / code / video

We propose an uncertainty-aware pseudo-label selection (UPS) framework that reduces pseudo-label noise encountered during training, and allows for the creation of negative pseudo-labels for multi-label classification and negative learning.
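
The selection rule itself is simple to sketch: a pseudo-label is kept only when the prediction is both confident and certain, and very-low-probability classes can be turned into negative pseudo-labels. The thresholds and the way uncertainty is estimated (e.g., variation over stochastic forward passes) are illustrative here.

```python
import torch

def select_pseudo_labels(probs, uncertainty,
                         tau_p=0.7, kappa_p=0.05,
                         tau_n=0.05, kappa_n=0.005):
    """Uncertainty-aware pseudo-label selection (UPS-style sketch).

    probs:       (N, C) class probabilities, e.g. averaged over stochastic passes
    uncertainty: (N, C) per-class uncertainty, e.g. std. dev. over those passes
    Returns boolean masks for positive and negative pseudo-labels.
    """
    positive = (probs >= tau_p) & (uncertainty <= kappa_p)   # confident and certain
    negative = (probs <= tau_n) & (uncertainty <= kappa_n)   # "definitely not this class"
    return positive, negative
```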

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos
Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh S Rawat, Mubarak Shah
ICPR, 2020 (Best paper award)
arxiv / project page / bibtex / slides / video

Gabriella consists of three stages: tubelet extraction, activity classification, and online tubelet merging. It utilizes a localization network for tubelet extraction, with a novel Patch-Dice loss to handle variations in actor size, and a Tubelet-Merge Action-Split (TMAS) algorithm to detect activities efficiently and robustly.
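
As a rough illustration of the Patch-Dice idea, the sketch below computes a soft Dice loss per spatial patch rather than per frame, so that small actors are not dominated by large ones; the patch size and exact formulation are assumptions.

```python
import torch

def patch_dice_loss(pred, target, patch=16, eps=1e-6):
    """Soft Dice loss averaged over non-overlapping spatial patches.

    pred, target: (B, H, W) foreground probabilities / binary masks,
    with H and W divisible by `patch`. A sketch, not the exact Patch-Dice loss.
    """
    B = pred.shape[0]
    p = pred.unfold(1, patch, patch).unfold(2, patch, patch).reshape(B, -1, patch * patch)
    t = target.unfold(1, patch, patch).unfold(2, patch, patch).reshape(B, -1, patch * patch)
    inter = (p * t).sum(dim=-1)
    dice = (2 * inter + eps) / (p.sum(dim=-1) + t.sum(dim=-1) + eps)
    return 1.0 - dice.mean()
```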

Patents
Methods of Real-Time Spatio-Temporal Activity Detection and Categorization from Untrimmed Video Segments
Yogesh S Rawat, Mubarak Shah, Aayush Jung Rana, Praveen Tirupattur, Mamshad Nayeem Rizve
US Patent 11468676
Details