publications
(*) denotes equal contribution
2025
- NeurIPS: Improved Training Technique for Shortcut Models. In the Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025
This paper introduces improved training techniques that enhance the performance and efficiency of shortcut models.
- ICCV: Supercharged One-step Text-to-Image Diffusion Models with Negative Prompts. In the International Conference on Computer Vision, 2025
The escalating demand for real-time image synthesis has driven significant advancements in one-step diffusion models, which inherently offer expedited generation speeds compared to traditional multi-step methods. However, this enhanced efficiency frequently comes at the cost of reduced controllability over image attributes. While negative prompting, typically implemented via classifier-free guidance (CFG), has proven effective for fine-grained control in multi-step models, its application to one-step generators remains largely unaddressed. Because one-step generation lacks the iterative refinement of multi-step diffusion, directly applying CFG leads to blending artifacts and diminished output quality. To fill this gap, we introduce Negative-Away Steer Attention (NASA), an efficient method that integrates negative prompts into one-step diffusion models. NASA operates within the intermediate representation space, leveraging cross-attention mechanisms to suppress undesired visual attributes. This strategy avoids the blending artifacts inherent in output-space guidance and achieves high efficiency, incurring only a 1.89% increase in FLOPs compared to the computational doubling of CFG. Furthermore, NASA can be seamlessly integrated into existing timestep distillation frameworks, enhancing the student’s output quality. Experimental results demonstrate that NASA substantially improves controllability and output quality, achieving an HPSv2 score of 31.21 and setting a new state of the art for one-step diffusion models.
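To make the mechanism concrete, below is a minimal PyTorch sketch of the negative-away steering idea described in the abstract: the image queries attend separately to the positive- and negative-prompt text embeddings, and the negative-conditioned cross-attention output is subtracted, with a small weight, from the positive one. The function name, the linear subtraction rule, and the `alpha` weight are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def nasa_cross_attention(q, k_pos, v_pos, k_neg, v_neg, alpha=0.5):
    """Sketch of negative-away steering in cross-attention output space.

    Not the authors' code: the exact steering rule and `alpha` value are
    assumptions. q: image queries (B, H, N_img, d); k_*/v_*: text keys and
    values for the positive / negative prompt (B, H, N_txt, d).
    """
    # Standard scaled dot-product cross-attention against each prompt.
    out_pos = F.scaled_dot_product_attention(q, k_pos, v_pos)
    out_neg = F.scaled_dot_product_attention(q, k_neg, v_neg)
    # Steer the intermediate representation away from the negative prompt.
    return out_pos - alpha * out_neg

# Toy shapes for illustration only.
B, H, N_img, N_txt, d = 2, 8, 1024, 77, 64
q = torch.randn(B, H, N_img, d)
k_pos, v_pos = torch.randn(B, H, N_txt, d), torch.randn(B, H, N_txt, d)
k_neg, v_neg = torch.randn(B, H, N_txt, d), torch.randn(B, H, N_txt, d)
steered = nasa_cross_attention(q, k_pos, v_pos, k_neg, v_neg)  # (B, H, N_img, d)
```

Since the extra work is one additional cross-attention over the text tokens rather than a second full network pass, the overhead stays far below the 2x cost of CFG, consistent with the roughly 1.89% FLOPs figure reported above.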
- arXiv: On the Expressiveness of Visual Prompt Experts. arXiv preprint arXiv:2501.18936, 2025
Visual Prompt Tuning (VPT) has proven effective for parameter-efficient adaptation of pre-trained vision models to downstream tasks by inserting task-specific learnable prompt tokens. Despite its empirical success, a comprehensive theoretical understanding of VPT remains an active area of research. Building on the recently established connection between Mixture of Experts (MoE) and prompt-based methods, wherein each attention head can be conceptualized as a composition of multiple MoE models, we reinterpret VPT as the introduction of new prompt experts into these MoE structures. We identify a key limitation in existing VPT frameworks: the restricted functional expressiveness of prompt experts, which remain static and thus limited in their adaptability. To address this, we propose Visual Adaptive Prompt Tuning (VAPT), a novel method that endows prompt experts with enhanced expressiveness while preserving parameter efficiency. Empirical evaluations on VTAB-1K and FGVC demonstrate that VAPT achieves substantial performance improvements, surpassing fully fine-tuned baselines by 7.34% and 1.04%, respectively. Moreover, VAPT consistently outperforms VPT while requiring fewer additional parameters. Furthermore, our theoretical analysis indicates that VAPT achieves optimal sample efficiency. Collectively, these results underscore the theoretical grounding and empirical advantages of our approach.
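As a rough illustration of the adaptive-prompt idea, the sketch below contrasts static VPT prompts with input-dependent ones: each learned prompt token is shifted by a lightweight bottleneck applied to the pooled patch embeddings, so the "prompt experts" can vary with the input while adding few parameters. The module name, pooling scheme, and bottleneck design are assumptions for illustration and do not reproduce the VAPT architecture.

```python
import torch
import torch.nn as nn

class AdaptivePromptTokens(nn.Module):
    """Sketch of input-adaptive prompt tokens in the spirit of VAPT.

    Static VPT prepends the same learned prompts to every input; here each
    prompt is shifted by a small function of the input tokens, making the
    prompt experts input-dependent. The design below is an assumption for
    illustration, not the authors' architecture.
    """

    def __init__(self, num_prompts: int, dim: int, hidden: int = 32):
        super().__init__()
        self.base_prompts = nn.Parameter(torch.empty(num_prompts, dim))
        nn.init.normal_(self.base_prompts, std=0.02)
        # Shared bottleneck keeps the extra parameter count low.
        self.adapter = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_prompts * dim),
        )
        self.num_prompts, self.dim = num_prompts, dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim) patch embeddings from a frozen ViT block.
        pooled = x.mean(dim=1)                                    # (batch, dim)
        shift = self.adapter(pooled).view(-1, self.num_prompts, self.dim)
        prompts = self.base_prompts.unsqueeze(0) + shift          # (batch, P, dim)
        # Prepend the adaptive prompts to the patch tokens.
        return torch.cat([prompts, x], dim=1)

# Toy usage: 10 prompts prepended to 196 patch tokens of width 768.
tokens = torch.randn(4, 196, 768)
out = AdaptivePromptTokens(num_prompts=10, dim=768)(tokens)       # (4, 206, 768)
```

Prepending the returned sequence mirrors how VPT inserts prompts into a frozen backbone; only the base prompts and the small adapter would be trained.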